A new dictionary

What Should a New Dictionary Look Like?

The corpus to be treated will have to be representative and at the same time selective. The core choice will be determined by the shape of the symbiotic body that in Tamil has been termed Ilakkiyam-Ilakkaṇam (from Skt. lakṣya-lakṣaṇa, “what is to be characterised and what characterises”), namely poetry and the complex of language-related theoretical disciplines that may be summed up as grammar and poetics. The poetic works considered worthy of theoretical consideration were for the most part composed during the first millennium, beginning with the classical corpus par excellence, the Caṅkam corpus. Commentaries, grammatical works in the wider sense and lexica, however, continued to be produced until the advent of modernity in the 19th century. For this reason, the corpus is slightly lopsided from a chronological perspective: it comprises the whole of the literature composed in the first millennium (poetic and theoretical), as well as the treatises of the second millennium and those more recent poetic works that grew up on the margins of the theoretical discussions, for example as illustrative texts. The corpus in question will be represented in the form of critical editions with annotated translations and corpus dictionaries, that is, full concordances of the words and their derivations employed in a particular text. Recording the size of each text (in number of verse lines) is an ongoing process; it has already been indicated for the larger part of the corpus.

The entries of the basic lexicon are automatically compiled from the head words of entries in the corpus dictionaries. On the theoretical side the head nouns have already been pre-defined by the Tamiḻ Ilakkaṇap Pērakarāti. Each basic Tamilex entry will be cross-referenced to the existing dictionaries (Dravidian Etymological Dictionary, Madras Tamil Lexicon, Tamiḻ Ilakkiyap Pērakarāti, Tamiḻ Ilakkaṇap Pērakarāti, the Vocabulario, and specialised dictionaries, for example for plant names, as well as the comparatively small number of articles concerned with semantic issues). In the case of loan-words or Sanskrit/Prakrit cognates, the entry will indicate the Sanskrit/Prakrit origin or parallels. Sub-entries will trace the semantic development or diversification, with quotations in historical sequence (quotations cross-referenced to the corpus dictionaries, editions and translations). Tamil meanings will be based on glosses in classical commentaries (as far as available) and on nikaṇṭu entries. English meanings will be based on contextual and historical-philological work, drawing on the ongoing discussions of the international team.
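The compilation step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's actual pipeline: the class `BasicEntry`, the function `compile_basic_lexicon`, the field names and the text labels `text_A` and `text_B` are all hypothetical, and the two head words are simply the verbs cited elsewhere in this description.

```python
from dataclasses import dataclass, field


@dataclass
class BasicEntry:
    """One entry of the basic lexicon (hypothetical structure)."""
    headword: str
    # Labels of the texts whose corpus dictionary contains this head word.
    sources: set = field(default_factory=set)
    # Cross-references to existing dictionaries, to be filled in by editors.
    cross_refs: dict = field(default_factory=dict)


def compile_basic_lexicon(corpus_dictionaries):
    """Merge the head words of several corpus dictionaries into one lexicon.

    `corpus_dictionaries` maps a text label to an iterable of head words.
    Returns a dict mapping each head word to a single BasicEntry.
    """
    lexicon = {}
    for text_label, headwords in corpus_dictionaries.items():
        for headword in headwords:
            # Reuse the existing entry if another text already supplied it.
            entry = lexicon.setdefault(headword, BasicEntry(headword=headword))
            entry.sources.add(text_label)
    return lexicon


# Illustrative input only: the labels and word lists are invented.
corpus_dictionaries = {
    "text_A": ["ceytal", "tūṅkutal"],
    "text_B": ["ceytal"],
}
lexicon = compile_basic_lexicon(corpus_dictionaries)
```

In such a design every head word yields exactly one basic entry, however many corpus dictionaries attest it, and the `sources` field keeps the link back to those dictionaries for later cross-referencing.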

A considerable share of basic vocabulary, for example the standard verb ceytal, “to do, to make”, has remained (relatively) stable over the two thousand years of Tamil literary history and does not pose great problems. Other words have remained common but undergone crucial semantic shifts, as in the case of the verb tūṅkutal, which means “to hang” in classical Tamil, but “to sleep” in modern Tamil. In such cases not only the semantic range but also the historical development has to be outlined in the examples. Most in need of attention, however, are the less common or even rare (refined or poetic) words, which from the modern reader’s point of view are often not easily distinguished from regional words (the two types are theorised as tiri-col and ticai-col in the Tamil grammatical tradition). It is here that the commentary glosses and the nikaṇṭu entries become a vital aid, although of course both have to be backed up by actual occurrences and quotations. Another problem, currently not even adequately described, is actual dialectal variation, which testifies to a substratum of dialects not only in the respective periods of literary production, but also simply in scribal environments. This is very visible in literature (and even more so in manuscripts and in early dictionaries such as Proença’s Vocabulario), not only in the form of “regional words” but also in spelling variations, which have only partly found their way into the dictionaries of the 20th century.

Tamil literary commentaries, which probably do not date back as far as the first millennium, are built on glosses, as in many traditions. The simplest form of commentary (and the one easiest to exploit for lexical purposes) is the arumpata urai, the “commentary on rare words”, essentially a collection of glosses. Only a small number of these survive, and they will be treated as core material. Much more difficult to use are the slightly later paraphrase commentaries (most commentaries on the theoretical texts begin with a paraphrase), where almost every word is paraphrased, but often with a view to the overall meaning rather than to semantic detail. The more sophisticated ones complicate matters further, as already mentioned, by poetic variation (words with an evident meaning may be glossed differently for beauty’s sake). In addition, paraphrases may be abbreviated with the aim of simply giving the gist. Here it is necessary to filter the available information for pertinent glosses. Commentary glosses in their turn can be shown to rely heavily on the nikaṇṭus.

Tamil has had a lexicographical tradition since ancient times. In particular, it has developed the “thesaurus” (nikaṇṭu) rather than the dictionary style of listing words. Words are not listed alphabetically, but grouped according to their mutual semantic affinity, as in a dictionary of synonyms, with extra sections devoted to homonyms and synonyms. The listing is in verse form, which is easy to memorise but often makes it difficult to disentangle entry words from metrical fillers. The two oldest nikaṇṭus, which will be included in our corpus, are the Tivākaram (Tivākarar, 9th c., 12 sections, 2,180 aphorisms, 9,500 entries) and the Piṅkalam (Piṅkala Muṉivar, 10th c., 10 sections, 4,121 aphorisms, 14,700 entries). These texts will have the special status of being both part of the corpus of historical texts in which the words were used and part of the “tools” (such as the Madras Tamil Lexicon) that are used to define the semantic scope of each word. Their inclusion is of paramount importance, since many glosses found in commentarial literature rely on them.

The example section of each entry will begin with the earliest attested passage for each meaning, separating literary from theoretical texts (ilakkiyam and ilakkaṇam) and following the genre distinctions (if valid in a particular case): first Caṅkam (Akam and Puṟam, that is, erotic and heroic poetry), then Kīḻkkaṇakku (didactic poetry), epic (peruṅkāppiyam < Skt. mahākāvya-), the devotional canons (Tirumuṟai and Tivyappirapantam), and finally the later literary genres (Pirapantam). Where possible, quotations will be based on critical editions, otherwise on the electronic versions of the standard editions. Where possible, semantic development will be distinguished from polysemy/homophony. Examples will be cross-referenced back to the electronic corpus and the corpus dictionaries.
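The ordering of examples described above can be illustrated with a small Python sketch. It rests on several assumptions that are not stated in the text: that literary examples precede theoretical ones, that genre takes precedence over date within the literary domain, and that each quotation carries an approximate century. The names `Quotation`, `order_examples`, `GENRE_ORDER` and `DOMAIN_ORDER` are hypothetical, and the sample references and centuries are invented placeholders, not real attestations.

```python
from dataclasses import dataclass

# Genre sequence as listed in the text; the short ASCII labels are assumptions.
GENRE_ORDER = ["cankam", "kilkkanakku", "epic", "devotional", "pirapantam"]
# Assumed here: literary (ilakkiyam) examples precede theoretical (ilakkanam).
DOMAIN_ORDER = ["ilakkiyam", "ilakkanam"]


@dataclass
class Quotation:
    ref: str             # placeholder reference into the electronic corpus
    domain: str          # "ilakkiyam" or "ilakkanam"
    genre: str           # one of GENRE_ORDER, or any other label
    approx_century: int  # hypothetical dating field


def order_examples(quotations):
    """Sort quotations by domain, then genre sequence, then approximate date."""
    def sort_key(q):
        # Genres outside the listed sequence are placed after the listed ones.
        if q.genre in GENRE_ORDER:
            genre_rank = GENRE_ORDER.index(q.genre)
        else:
            genre_rank = len(GENRE_ORDER)
        return (DOMAIN_ORDER.index(q.domain), genre_rank, q.approx_century)
    return sorted(quotations, key=sort_key)


# Invented placeholder data, for illustration only.
quotations = [
    Quotation("q3", "ilakkiyam", "pirapantam", 17),
    Quotation("q1", "ilakkiyam", "cankam", 2),
    Quotation("q4", "ilakkanam", "treatise", 12),
    Quotation("q2", "ilakkiyam", "epic", 5),
]
ordered_refs = [q.ref for q in order_examples(quotations)]
```

A sort key of this kind would be applied separately to the quotations of each meaning, so that the first item in each sub-entry is its earliest example within the first applicable genre.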