Information Coding and Kalidasa
I recently stumbled upon an interesting book: “Kalidasa Kosha” (“Thesaurus of Kalidasa”) by H L Shukla. What does this book have to offer NLP enthusiasts and language learners?
A spectacular linguistic analysis of 25,604 words of Kalidasa across his 7 stellar works: Raghuvamsha, Kumarasambhava, Meghadootha, Ritu Samhara, Abhigyana Shaakuntalam, Malvika Agnimitra and Vikramorvashiya.
The 650 pages of work, written in the meta language of tables and listings, are divided into 2 volumes. It essentially covers these topics. I was particularly interested in the first 2 topics.
1. Dictionary of High Frequency words (400 pages)
2. Meghadootha and Information Theory (40 pages)
3. Rediscovering Kalidasa (100 pages)
4. Kalidasa in legends
1. Dictionary of High Frequency words
Like a dictionary, it provides an alphabetical listing of the top 1800 high frequency words across his works. Each word entry lists the works in which the word is found and gives a granular reference by verse number, line number, word number and character number. It also captures metadata on root sounds and parts of speech, along with special notes on language usage. I was blown away by the meticulous dedication that went into creating this kind of database. The snapshot below, for instance, shows references to Purva Megha and Uttaramegha of Meghadootha.
However, all of this metadata is in Sanskrit and some of the notes in Hindi.
2. Information Theory on Meghadoota
While I was reading Meghadootha a few months back, I had several tenets of Information Theory spinning in my mind. I was looking forward to what this section presents.
The section is a non-descriptive listing of the words of Meghadootha in their various forms, running to 40 pages.
I am not covering the topics related to his eulogy, his timeline, birth place and other details, which were only briefly touched upon.
3. Unique Words in each work
It is interesting to note that he used Prakrit too in his works, as in the final token exchange between Bharat and Dushyanta. The following table gives a gist of unique words and percent of Sanskrit.
And from a parts-of-speech distribution point of view:
A frequency list of words. The following dump starts from the 66th word, as most of the higher frequency words are filler words.
4. Shannon’s Theory of Entropy and Coding
Based on the words of Meghadoota listed in section 2, the author claims that the entropy is 2.05 bits/syllable. Entropy measures how unpredictable the next syllable or word is. The more diverse the words, the higher the entropy and the weaker the correlation with the next syllable.
Interestingly, according to Herden, the entropy for English is between 0.76 and 1.56, for German 1.65, and for Russian between 2.07 and 2.11.
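To make the entropy idea concrete, here is a minimal sketch of a unigram Shannon entropy estimate, H = -Σ p·log2(p), in Python. It is only an illustration: the book's exact method for arriving at its figure is not reproduced here and may well account for context, and the toy syllable list and the function name `unigram_entropy` are made up for this example.

```python
import math
from collections import Counter

def unigram_entropy(units):
    """Shannon entropy in bits per unit: H = -sum(p * log2(p)).

    Assumption: each unit (syllable or word) is treated as independent
    of its neighbours, which ignores context entirely.
    """
    counts = Counter(units)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical toy sequence, not actual syllables from Meghadootha.
syllables = ["ka", "ma", "ka", "la", "ka", "ma", "ta", "ka"]
print(round(unigram_entropy(syllables), 2))  # 1.75
```

A sequence that repeats a single syllable gives 0 bits, while four equally likely syllables give 2 bits, so more diverse text pushes the number up.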
5. Word Count analysis
Apparently, 79% of the words were used only once, 11% twice and so on, which speaks to the poet’s versatility.
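A breakdown like this can be sketched in a few lines of Python. This is a minimal sketch that assumes the percentages are taken over distinct words (the book may count differently); the word list is a made-up toy sample and `frequency_spectrum` is a hypothetical helper name.

```python
from collections import Counter

def frequency_spectrum(words):
    """Return {n: percent of distinct words that occur exactly n times}."""
    word_counts = Counter(words)               # word -> occurrences
    spectrum = Counter(word_counts.values())   # occurrences -> number of words
    distinct = len(word_counts)
    return {n: 100 * k / distinct for n, k in sorted(spectrum.items())}

# Hypothetical toy sample, not a passage from Kalidasa.
words = "megha giri megha nadi vana giri megha pura".split()
print(frequency_spectrum(words))  # {1: 60.0, 2: 20.0, 3: 20.0}
```

The entry for 1 is the share of words used only once, the figure quoted above as 79%.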
The author briefly explores a small set of interesting word genres, such as how a king was addressed, abusive words, words for women, names of characters and names for relations.
7. Semantic change of word meanings
A list of 40 words showing how the meaning has changed from the pre-Kalidasa era to the Kalidasa era and on to Hindi.
8. Prakrit and Non Paninian words
9. Repeating Words
A method used to convey intensity and continuous action. In my opinion, it also aids the metrical fitting of the words in Meghadootha. Examples: khinna khinna, bhuyah bhuyah, mandam mandam.
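Spotting such repetitions in a transliterated text is easy to sketch. The snippet below is a naive illustration that assumes whitespace-separated, romanized words; in real verses sandhi often changes the surface form of one of the two words, so a plain equality check like this would miss many cases. The function name `repeated_pairs` is made up for this example.

```python
def repeated_pairs(line):
    """Return words that occur twice in a row in a whitespace-separated line."""
    tokens = line.lower().split()
    # Compare each token with its immediate successor.
    return [a for a, b in zip(tokens, tokens[1:]) if a == b]

# The three pairs quoted above, strung together as a toy input (not a verse).
print(repeated_pairs("khinna khinna bhuyah bhuyah mandam mandam"))
# ['khinna', 'bhuyah', 'mandam']
```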
10. Redundant Words
Superfluous characters, perhaps added for sound effect and to fit the meter.
The author devotes about 11 pages to compound and verb idioms. Seems interesting.
A great treasure of work for analysing Kalidasa from an objective, statistical and mathematical point of view. For a beginner trying to get used to the styles and vocabulary of Kalidasa, however, I feel there is too much Entropy in reading through so many pages and not finding much to latch on to.
I was hoping to find a simple English translation of the high frequency words. If we were to have NLP tables with filters on root words, alphabet beginnings, genre of words and prefix analysis, with hyperlinks to individual works, this would be much more of a delight for early adopters and learners of Kalidasa’s works.
In vocabulary and style, Meghadoota for instance is a subset of Valmiki’s Sundarakandam. A genre based representation of flora, fauna, landscapes, relations and emotions can give a lucid introduction to the vocabulary. I have made some notes and hope to publish them. Maybe there are works in the public domain that need to be discovered.
On Information Coding and Entropy, my 2 cents on Meghadootha. At the verse level, each verse contains 4 lines and each line has 3 markers for the meter. I feel Kalidasa packs up to 12 elements into one verse in a very terse and intriguing manner. This is a benchmark for compression. And his weaving style is unique. The first 4 long syllables, like a thick stroke of a brush, indicate motion or emotion, and the subsequent 5 short syllables are a quick cadence of vivid description.
Instead of saying in a mundane way “there is a house with windows, the women are drying their hair with dhoop, the smoke is escaping and augmenting the colour of the cloud”,
he first introduces the motion, creating a curiosity in the mind. Roughly, it would translate to
Escaping through the lattices and augmenting your colour - the culturing of the hair by dhoop
and in Sanskrit it is just 1 line of the verse.
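The line structure described above can be written down as a small sketch. It assumes the standard textbook pattern of the mandakranta meter, which is not taken from the book: 17 syllables per line, with pauses after the 4th, 10th and 17th syllable, giving the 3 groups per line and 12 per verse. The G/L notation (G for a long syllable, L for a short one) and the helper names are my own, and the sketch does not attempt to scan actual Sanskrit text.

```python
# G = long (guru) syllable, L = short (laghu) syllable.
# Assumed textbook pattern: 4 long, then 5 short and 1 long, then 7 more.
MANDAKRANTA = "GGGG" + "LLLLLG" + "GLGGLGG"  # 4 + 6 + 7 = 17 syllables

def split_at_pauses(weights, groups=(4, 6, 7)):
    """Split a 17-character weight string into its three pause groups."""
    if len(weights) != sum(groups):
        raise ValueError("expected %d syllables" % sum(groups))
    parts, start = [], 0
    for size in groups:
        parts.append(weights[start:start + size])
        start += size
    return parts

def fits_mandakranta(weights):
    """Check a weight string against the pattern.

    The last syllable of a line is conventionally allowed either weight,
    so only the first 16 positions are compared.
    """
    return len(weights) == len(MANDAKRANTA) and weights[:-1] == MANDAKRANTA[:-1]

print(split_at_pauses(MANDAKRANTA))  # ['GGGG', 'LLLLLG', 'GLGGLGG']
```

Four lines of three groups each give the 12 slots per verse mentioned above.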
However, if we were to consider entropy at the scope of the whole work, there is by contrast such harmony in the words to fit into the mandakranta meter. The author has not explored this part. I have made a mental assessment of some parts that fit into the equation. Hope to write on it some time.
Maybe there should be a term like ‘Harmony’ for poetry analysis. This was for me the first work whose beat revealed the metrical pattern without a priori knowledge. Such is the cohesiveness. In a way the ‘mandakranta meter’ and ‘vidarbha style’ are the frameworks within which Meghadootha is composed. Perhaps it could be compared with another poetic work following the same meter to check the pulse of Harmony. I wonder if this can be quantified by a mathematical formula.
To conclude, it is amazing to see a work on Information Theory and Kalidasa. Hope to see digital versions which enable students of Sanskrit to discover the realms of words.