Data Science with Bhagawat Gita

Sujatha R
14 min read · May 29, 2019

A peek into Visualizations, High frequency words, Alphabet Dictionary & Linguistic cues in the Divine Song !

Bhagawat Gita — Philosophical Conversations between the grief struck Arjuna & Lord Krishna

A friend once said, “Reading the Gita had downloaded enormous serenity in me. No matter what happens outside, it is still from inside”

It takes a lot of courage and serendipity to read the Gita. I had long been putting it off for fear of the grammar, the vocabulary and the consequent clouding of attention. Arjuna traversed from the sorrowful aisles of Vishada to the opulences of Vibhuthi. Like a bud unfolding each petal to welcome the Spring, perhaps the song of the Gita, the words that weave it and the translations will aid in the blossoming of the Gita Philosophy in the hearts of its seekers.

Arjuna was blessed with a vision to behold the divine cosmic form. At a very gross level, can we behold the words that weave this Song? For those who are bewildered by the vast grammar tables in Sanskrit, is there an entry way for Gita Parayanam and Bhaja Govindam without the obstacles of Dukrinkaranam (grammar)? Recently I was reading about Twitter Analysis with Data Science and it struck me, “How about doing it with Gita?”

Methodology Followed

The English verses of Gita were taken from Kaggle Data Science Site. For the purpose of searching and counting patterns and creating frequency tables, Python and Data Science Modules were used. Input patterns were provided based on familiar verses of Gita.

What is the number of words? What was the conversation style? Were there any grammar patterns to learn? How did the Alphabet wise Dictionary look?

Let us begin with the number of chapters and verses in each.

Now let us look into the number of words in each Chapter. About 10K words make up these 700 verses. Seems like a big number to digest... Like a Data Scientist, let us work in the Frequency Domain. Not all the 10K words are unique. Let us make a table of frequent words and their frequency.
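A minimal sketch of how such a frequency table could be built with Python's standard library. The tokenizer and the sample line (a sandhi-split rendering of verse 2.47, used purely as sample input) are illustrative assumptions, not the exact code used for this article.

```python
import re
import unicodedata
from collections import Counter


def tokenize(text):
    """Split transliterated text into lowercase words, keeping IAST diacritics."""
    # NFC composes letters such as 'ā' or 'ṇ' into single code points,
    # so the \w pattern below treats them as ordinary letters.
    text = unicodedata.normalize("NFC", text.lower())
    # Hyphens, punctuation and whitespace all act as separators.
    return re.findall(r"\w+", text)


def frequency_table(text, top=10):
    """Return the `top` most frequent words together with their counts."""
    return Counter(tokenize(text)).most_common(top)


# Sample input only: verse 2.47 with the sandhi split by hand.
sample = (
    "karmaṇi eva adhikāras te mā phaleṣhu kadāchana "
    "mā karma-phala-hetur bhūr mā te saṅgo astu akarmaṇi"
)

for word, count in frequency_table(sample, top=2):
    print(word, count)
```

On the real text, the same function would be fed the full set of verses, and the long tail of words that occur only once becomes visible as soon as `top` is raised.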

Constructs of the Gita language

For simplicity, let us bucket the entire 10K words into Connectors, Pronouns, Verbs and Nouns.


Just like in English, the connector words seem to top the list. Usually they are discarded in Data Science Sentiment Analysis as they are filler words that obscure the main message. However, let us use them to get some cues on the mood and the conversation style.

Was it one sided preaching? Were there any questions? Any negations? Were the sentences complex? Were there examples and illustrations?

I am sure you will be able to make a guess by skimming through the words below and their frequencies.

High Frequency Connector Words in Bhagawat Gita

Complex sentence connections?


ca (and) 392 evam (thus) 30
eva (only/indeed) 173 hi (certainly) 67 tu (indeed) 66
iti (like this) 65 tasmāt (therefore) 25

[~850 occurrences]

abhyāsena tu kaunteya vairāgyeṇa cha … [Certainly with practise and with detachment]
evam ukto hṛiṣhīkeśho [Hrishikesha, thus addressed] [1.24]
karmaṇi eva adhikāras te [Only you have right over …]
iti ahaṁ vāsudevasya [Like this I have heard] [18.74]

Cause Effect & Illustrations

when-then, where-there, as-so

yadā tadā (when-then) 24
yatra tatra atra kutra (where-there) 20
yathā tataḥ (like-similarly) 70

yadā yadā hi dharmasya ... tadā ātmānaṁ [Whenever there is ... at that time] [4.7]

yatra yogeśhvaraḥ kṛiṣhṇo yatra pārtho … tatra śhrīr ... [Where there is Krishna & Arjuna, there is wealth ...] [18.78]

yathā ākāśha-sthito nityaṁ... tathā sarvāṇi bhūtāni ... [Just as the wind rests always in the sky ... so do all beings rest in Me] [9.6]
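The when-then, where-there and as-so pairing can be spotted mechanically. Below is a small sketch, assuming a verse is given as a plain transliterated string; the function name `find_correlatives` and the three-pair table are illustrative choices based on the examples above, not the article's actual code.

```python
import re
import unicodedata

# Relative-correlative pairs taken from the example verses in this section.
CORRELATIVES = {"yadā": "tadā", "yatra": "tatra", "yathā": "tathā"}


def _nfc(text):
    """Normalise so that letters with diacritics compare reliably."""
    return unicodedata.normalize("NFC", text)


def find_correlatives(verse):
    """Return the (relative, correlative) pairs whose two halves both occur in the verse."""
    tokens = set(re.findall(r"\w+", _nfc(verse.lower())))
    return [
        (rel, cor)
        for rel, cor in CORRELATIVES.items()
        if _nfc(rel) in tokens and _nfc(cor) in tokens
    ]


# Fragments exactly as quoted above (elisions included).
print(find_correlatives("yadā yadā hi dharmasya ... tadā ātmānaṁ"))
print(find_correlatives("yatra yogeśhvaraḥ kṛiṣhṇo yatra pārtho … tatra śhrīr ..."))
```

Because the check works on whole words, it will miss pairs hidden by sandhi; splitting the sandhi first would be a prerequisite on the raw text.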

Lot of Q&A

Negations & Questions : no, do not, how, why

na (no) 256 mā (do not) 9 katham (how) 9 kim (why) 21 [305]

mā phaleṣhu kadāchana [never to the fruits] [2.47]
na enaṁ chhindanti śhastrāṇi [weapons cannot cut this (soul)] [2.23]


mahā 43 sarva 67 param 36 śrī 29 punaḥ 21 [196]

The frequent connectors listed above add up to about 1500 of the 10K words!!
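The bucket totals are simple sums over a hand-picked word list. As a rough sketch (the names `bucket_total` and `coverage` are hypothetical), here is the arithmetic for the first connector group, using the counts quoted in its table:

```python
from collections import Counter

# Counts quoted above for the first connector group.
counts = Counter({
    "ca": 392, "eva": 173, "hi": 67, "tu": 66,
    "iti": 65, "evam": 30, "tasmāt": 25,
})

# Hand-curated bucket; in practice this set is picked manually
# from the top of the frequency table.
CONNECTORS = {"ca", "eva", "hi", "tu", "iti", "evam", "tasmāt"}


def bucket_total(counts, bucket):
    """Sum the occurrences of every word that belongs to the bucket."""
    return sum(n for word, n in counts.items() if word in bucket)


def coverage(counts, bucket, total_words):
    """Fraction of the whole text accounted for by the bucket."""
    return bucket_total(counts, bucket) / total_words


print(bucket_total(counts, CONNECTORS))               # 818
print(f"{coverage(counts, CONNECTORS, 10_000):.1%}")  # 8.2%
```

The seven words listed sum to 818, in line with the rounded figure of about 850 given for that group.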


Was it predominantly a conversation between two friends (First and Second person)? Did it have references to others (normal people, exemplary people, objects), that is, the Third person?

First Person: aham (I) 103 mām (me) 84 me 61 mama (my) 24 mayā (by me) 22 mayi (in me) 20

Second Person: tvam (you) 31 tvām (to you) 20

Third Person: ye te (those who - they) 95 yaḥ saḥ (he who - he) 140 tasya (his) 21

yat - tat (that which - that) 170 idam (this) 39 etat (this) 25 ayam (this) 24 iha (in this) 21


mayi āveśhya mano ye māṁ nitya-yuktā upāsate
śhraddhayā parayopetās
te me yuktatamā matāḥ [12.2] [ye-te-me] [mayi māṁ]

Those (ye) who fix their mind on me (mayi) and always engage in devotion to me (māṁ), they (te) are considered by me (me) to be the best yogis.

ahaṁ vaiśhvānaro bhūtvā …[15.14] I am the digestive fire…

iti te jñānam ākhyātaṁ guhyād guhyataraṁ mayā … [18.63] In this manner the knowledge which is utmost secretive has been revealed to you by me …

The small list of the above pronouns revolving around ‘me’, ‘you’, ‘they’ and ‘it’ sums up to about 900 words.

Common Pronouns in Bhagawat Gita


Let us look into what the Nouns speak, as they contain the main message. I have clustered them under themes for ease of classification.

Names of Arjuna & Krishna

arjun 48 pārth 42 kauntey 25 bhārat 22 kṛṣṇ 14 bharat ṛṣabh 8 guḍākeś 4

bhagavān 28 keśav 7 hṛṣīkeśa 5 govinda 2


Related to Moderation

yoga 105 jñān 91 yukta (moderation) 38 sama (equanimity) 41


Related to Karma (Action)

karma 119 kuru (do) 18 phal (fruit) 32 saṅg (attachment) 34 akarma 8 dharma 20


Related to Bhakthi (Devotion)

brahma (creator) 42 bhava (emotion) 10 bhakt (devout) 25 deva (demi gods) 35 yagna (sacrifice) 12 bhog (enjoy) 15 priya (dear) 21 upāsana (worship) 6


Ascent through Sense Organs, Mind, Intellect & Soul

indriya (sense organs) 43 mana (mind) 39 buddhi (intellect) 55 ātma (soul) 136


Various Emotions

kāma (lust) 44 duḥk (sad) 28 sukha (happy) 32 rāga (love) 10 dveṣa (hate) 13 bhaya (fear) 8 krodhaḥ (anger) 6


Physical Elements & Traits

bhūtā (elements) 72 prakṛti (physical nature) 29 guṇa (attributes) 23 janma (birth) 17 mṛtyu (death) 10 tāmasa (inert) 15 rājasa (passionate) 14 sāttvik (good) 15


The above Noun list accounts for around 1200 of the 10K words in the Gita.


Let us look into the action words and get a sense of Verb usage in the Gita.

uvāca (narrated) 64 śṛṇu (hear) 13

The different forms of the word ‘am’

asmi (am) 39 asti (is) 18 asi (you are) 17

The verbs are rarely used in their root form. The context and plurality information of the Verbs is embedded in the suffix. It was a conversation between Lord Krishna & Arjuna. That makes me think we should skim through First & Second person singular verbs that could have been used for their direct conversations.

First Person Singular

Think of these as mostly Krishna telling about himself. For instance icchāmi — I wish


paśyāmi (I see) 7
icchāmi (I wish) 6
visṛjāmi (I create) 2
hṛṣyāmi (I take pleasure) 2
dadāmi (I give) 2
sambhavāmi (I appear) 2
śaknomi (I am able) 1


pravakṣyāmi (I shall explain) 4
vakṣyāmi (I shall explain) 4
Total 34

Second Person — Should be mostly Krishna addressing his friend Arjuna.

asi (you are) 17
arhasi (befitting you) 10
avāpsyasi (you will attain) 4 prāpsyasi (you will attain) 2
icchasi (you desire) 3

Third person verbs would be interesting. Actions pertaining to different kinds of people (the exemplary and not so exemplary), actions pertaining to objects. It is interesting to note that singular usage seems more prevalent than plural usage.

Interestingly, in Sanskrit there are two kinds of verbs. Crudely: a) the sophisticated higher order verbs like arise-born-bloom-gain and b) day to day action verbs which are associated with nouns like sit-eat-give-go. Based on the suffix patterns one can differentiate both of them.

Third Person Singular

For instance: saḥ ucyate (he is called)

ucyate (is called) 28

Sophisticated: vidyate (is there) 9 upāsate (worships) 5 paryupāsate (worships) 5 labhate (gains) 5 jāyate (takes birth) 5 viśiṣyate (becomes special) 5

Regular: bhavati (happens) 17 paśyati (sees) 12 adhigacchati (attains) 8 juhvati (offers) 5 kāṅkṣati (desires) 4 vindati (enjoys) 4 praṇaśyati (is lost) 3 tiṣṭhati (stands) 3 prakāśayati (illuminates) 3

tyajet (should sacrifice) 3 bhavet (would be) 2 āgacchet (should come) 1

Total 150

Third Person Plural

viśanti (enter) 7 gacchanti (go) 6 paśyanti (see) 5 bhavanti (are) 3 nibadhnanti (bind) 3 bhajanti (pray) 2

Total 26
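The suffix cues used in the verb lists above can be sketched as a small lookup. This is a deliberately crude illustration: the suffix table is my own simplification of the patterns shown here (not a full conjugation table), and `classify_verb` is a hypothetical helper name.

```python
import unicodedata


def nfc(text):
    """Normalise so that 'ā' and friends compare as single letters."""
    return unicodedata.normalize("NFC", text)


# Suffix cues read off the lists above; checked in order.
VERB_SUFFIXES = [
    (nfc("āmi"), "first person singular"),
    ("asi", "second person singular"),
    ("anti", "third person plural"),
    ("ati", "third person singular, regular"),
    ("ate", "third person singular, sophisticated"),
]


def classify_verb(word):
    """Return a label based on the word ending, or None if no cue matches."""
    word = nfc(word.lower())
    for suffix, label in VERB_SUFFIXES:
        if word.endswith(suffix):
            return label
    return None


for w in ["paśyāmi", "arhasi", "bhavati", "labhate", "gacchanti", "yogaḥ"]:
    print(w, "->", classify_verb(w))
```

A pure suffix match over-generates (a noun such as gati also ends in "ati"), so the candidates it produces still need a manual look before they are counted as verbs.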

Past Tense Singular

dṛṣṭavān (saw) 2 proktavān (told) 2 abravīt (spoke) 3

Let us look into “Accompanying Verbs” aka Gerunds. They are not the main verb in the sentence but accompany the main verb to make the sentence construction more terse.

“Having” verbs: after having done XXX, did yyy

dṛṣṭvā tu pāṇḍavānīkaḿ…. rājā vacanam abravīt [BG 1.2] [Having seen the Pandava army ... the king spoke]

dṛṣṭvā (having seen) 11 tyaktvā (having relinquished) 13 jñātvā (having known) 12 bhūtvā (having become) 9 kṛtvā (having done) 8 uktvā (having said) 6 hatvā (having killed) 6 śrutvā (after hearing) 3

āśritya (having resided) 3 labdhvā (having obtained) 2

Total 70

“To XXX”

… na kaśhchit kartum arhati [no one is able to cause ...] [BG 2.17]

na tvaḿ śocitum arhasi [you do not deserve to lament] [BG 2.27]

kartum (to do) 7 śocitum (to lament) 3 hantum (to kill) 3 veditum (to know) 2 āptum (to attain) 2

Total 17

Doing XXX , I am doing yyy (Sense of accompanying verb XXX done in parallel with yyy)

paśyan śṛṇvan spṛśan jighran
aśnan gacchan svapan śvasan …. vartante iti dhārayan [BG 5.8-5.9]

seeing, hearing, touching, smelling (accompanying verbs)…

kurvan (doing) 5 paśyan (seeing) 3 smaran (remembering) 3 yuñjan (uniting) 3

Total 17

That was a total of 446 Verbs out of the 10K Words!!

In this manner of skimming through high frequency words, we have covered about 1500 Connectors, 900 Pronouns, 450 Verbs and 1200 Nouns. That makes it around 4K out of the 10K words in the text.

And we did get a feeling of the language and the conversation style. This is absolutely IMPRESSIVE..

Noun Context Patterns

A peek into a few popular noun patterns for different contexts. In Sanskrit, the context is embedded in the form of a suffix. This is the wonder of Sanskrit: because each word carries its own role, a sentence largely keeps its meaning no matter how the words are shuffled.

For beginners, I like this approach as it reinforces context patterns in a natural way. I have taken them from popular verses which follow a consistent pattern.

Kartha, the Subject(aḥ)

saḥ (he) yaḥ (he who )
manaḥ (mind) yogaḥ puruṣaḥ(person)
arjunaḥ sañjayaḥ

Karma, the object (am)

patraṁ puṣhpaṁ phalaṁ toyaṁ [9.26] leaf, flower, fruit, water

Karana, the Instrument of Action: [With] (ena/eṇa suffix)

abhyāsena tu kaunteya vairāgyeṇa cha gṛihyate [6.35] With Practise, With Detachment

abhyāsena (with practise)
vairāgyeṇa (with detachment)
yogena (with Yoga)

Sense of Purpose or Offering: (āya)

paritrāṇāya sādhūnāṁ vināśhāya cha duṣhkṛitām … [BG 4.8]

paritrāṇāya (to protect)
vināśhāya (to annihilate)
sansthāpana-arthāya (to establish)

Sense of Position: At Kurukshetra, At the War ..

dharma-kṣhetre kuru-kṣhetre … [BG1.1]

dharmakshetre (at the place of dharma), kurukshetre (at kurukshetra), madhye (in between)

Cause & Effect Scenarios: From X arises Y.. From Y arises Z.. Or Compared to X, Y is … Compared to Y, Z is… In Sanskrit, the common noun suffix for Cause is ‘āt’ or ‘ād’.

krodhād bhavati sammohaḥ … [BG 2.63]

annād bhavanti bhūtāni ... [BG 3.14]

śhreyo hi jñānam abhyāsāj jñānād dhyānaṁ viśhiṣhyate [BG 12.12]

krodhāt (from anger) sammohāt (from confusion) smṛiti-bhranśhāt (from clouding of memory) buddhi-nāśhāt (from clouding of judgement)

annād (from food) parjanyād (from rain) yajñād (from sacrifice)

Relational Words: Arjuna’s Bow, Arjuna’s Sorrow.. The common noun suffix for Relation is ‘sya’

tasya (his) dharmasya (dharma’s)
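The suffix cues listed above lend themselves to a simple lookup. The sketch below is only an illustration of the idea under the assumption that the ending alone decides the context; `noun_case` and the suffix table are hypothetical, and real declension has far more endings than these.

```python
import unicodedata


def nfc(text):
    return unicodedata.normalize("NFC", text)


# Endings from the examples above, longest first so that
# 'ena' is tried before the bare 'e' of the locative.
CASE_SUFFIXES = [
    ("ena", "instrument (with)"),
    ("eṇa", "instrument (with)"),
    ("āya", "purpose (to / for)"),
    ("sya", "relation (of)"),
    ("āt", "cause (from)"),
    ("ād", "cause (from)"),
    ("aḥ", "subject"),
    ("aṁ", "object"),
    ("am", "object"),
    ("e", "position (at / in)"),
]
CASE_SUFFIXES = [(nfc(s), label) for s, label in CASE_SUFFIXES]


def noun_case(word):
    """Guess the context of a noun from its ending; 'unknown' if no cue matches."""
    word = nfc(word.lower())
    for suffix, label in CASE_SUFFIXES:
        if word.endswith(suffix):
            return label
    return "unknown"


for w in ["yogaḥ", "phalaṁ", "abhyāsena", "paritrāṇāya", "madhye", "krodhāt", "dharmasya"]:
    print(w, "->", noun_case(w))
```

The lookup cannot tell a noun from a verb (ucyate also ends in "e"), so it is best applied only to words already bucketed as nouns.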

Plurality in Nouns. Let us see a few examples of different noun contexts in plural forms.

Object: indriyāṇi (sense organs) śastrāṇi (weapons) bhūtāni (elements)

Sense of Relativeness: This pattern is found consistently in verses 10.21 to 10.31 of Vibhuthi Yoga. In the description of plurality and opulences, Krishna describes “among A I am B, among C I am D, among E I am F…”

vedānāṁ sāma-vedo ’smi devānām asmi vāsavaḥ
indriyāṇāṁ manaśh chāsmi bhūtānām asmi chetanā [10.22]

I am the Sāma Veda among the Vedas, and Indra among the celestial gods. Among the senses I am the mind; Among the living beings I am consciousness.

Sense of Location: When I first heard, I was intrigued by this chapter ending line…What did these suffixes mean? They have such a rhythmic effect.

Shrimad-Bhagavad-Gitasu Upanishadsu Brahmavidyayam Yogashastre Shri Krishna-Arjuna-samvade

In the Gitas, In the Upanishads, In the Brahma Vidhya & Yoga Shastras, In the conversation between Krishna & Arjuna

Gita Dictionary analysis

We are done with the frequency analysis. Let’s change gears and inspect how many words begin with each letter.

Any guesses on the popular letters? I have represented characters as per the Sanskrit Alphabet matrix (swara & vyanjana) and, for simplicity, clustered the consonant pairs.

Looks like words starting with ‘a’, ‘s’, ‘m’ are very popular.. Interesting that there were no words from the retroflex series ‘ṭ’, ‘ḍ’, ‘ṇ’.
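Counting by initial letter needs a little care in IAST, because aspirated consonants and the diphthongs ai / au are written with two characters. A minimal sketch, assuming the dictionary is a list of transliterated words (`initial_letter` and `initial_histogram` are illustrative names):

```python
import unicodedata
from collections import Counter


def nfc(text):
    return unicodedata.normalize("NFC", text)


# Two-character IAST letters: aspirated consonants and the diphthongs.
DIGRAPHS = tuple(
    nfc(d)
    for d in ("kh", "gh", "ch", "jh", "ṭh", "ḍh", "th", "dh", "ph", "bh", "ai", "au")
)


def initial_letter(word):
    """Return the first Sanskrit letter of a word, treating digraphs as one letter."""
    word = nfc(word.lower())
    return word[:2] if word.startswith(DIGRAPHS) else word[:1]


def initial_histogram(words):
    """Count unique words per initial letter (a dictionary view, not an occurrence view)."""
    unique = {nfc(w.lower()) for w in words if w}
    return Counter(initial_letter(w) for w in unique)


sample = ["aham", "ātmā", "aiśvaram", "karma", "khe", "brahma", "bhagavān", "bhagavān"]
hist = initial_histogram(sample)
for letter in sorted(hist):
    print(letter, hist[letter])
```

Feeding it every occurrence instead of the unique words would answer a different question, namely which letters open the most running text.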

Let us peek into the list of the top word for each letter.

aham 103 ātmā 37 iti 65 īśvaraḥ 8 uvāca 64 ūrdhvam 3 eva 173 aiśvaram 4 oṁ 4 auṣadhīḥ 1 ṛṣabha 8

karma 84 khe 1 guṇa 16

ca 392 chinna 3 jñānam 31 jhaṣāṇām 1

tat 93 deva 16 dhanam 12 na 256

pārtha 38 phalam 16 brahma 42 bhagavān 28 mām 84

yat 79 rūpam 16 loke 12 vā 25

śrī 29 ṣaṭ 2 saḥ 74 hi 67

Words of Direction, Time & Count

ūrdhva-mūlam adhaḥ-śhākham [15.1] aśhvattha tree with its roots above and branches below.
ūrdhva (up) adhaḥ (down)

namaḥ purastād atha pṛiṣhṭhatas [11.40] Salutations from front and rear
puratah (front) pṛiṣhṭhah (back)

aham ādiśh cha madhyaṁ cha bhūtānām anta eva cha [10.20]
ādi (beginning), madhya (middle), anta (end)

Other descriptors used for indicating a feel of infinity, imperishability, eternity and plurality are akshaya, avyaya, anantha, sanathana, nana, bahu, aneka. These are naturally to be found in the Vibhuthi Yoga and Vishwa Roopa Darshanam chapters.

paśhya me pārtha rūpāṇi śhataśho ’tha sahasraśhaḥ
nānā-vidhāni divyāni nānā-varṇākṛitīni cha [11.5]

śhataśhaḥ (hundreds) sahasraśhaḥ (thousands) nānā (many)

Making of Complex Words

The sheer number of sophisticated words can seem overwhelming. A simple trick is to take cues from the prefixes and suffixes and see their affinity with the underlying word.

The prefixes add a sense of direction and command to the words thereby highlighting, negating, reciprocating or bringing a sense of Reinforcement. Truly this makes each word an episode by itself.

Ascertaining (forward): pravakṣyāmi (shall reveal/tell) pravṛttim (acting properly) pratiṣṭhitā (firmly established; tishta is to stand)

[pravakṣyāmi] Even the intelligent are bewildered in determining what is action and what is inaction. Now I shall explain to you what action is, knowing which you shall be liberated from all sins. 4.16

Reciprocating (backward) : pratiyotsyāmi (fight back; yuddh meaning fight)

How can I counterattack men like Bhisma and Drona? 2.4

Sense of Circling : pariprasnena (enquiry), parityāgī (renounce), parikliṣṭam (grudging), paritrāṇāya(to protect; tra means to protect)

[pariprasnena] Inquire from him and render service unto the Spiritual Master. 4.34

Transcend (beyond): ativartate (transcends; vartate is dwelling) atitaranti (cross over; taranti is to swim)

(Supreme/Rooted): adhigacchati (attains) adhiṣṭhānam (body; sthanam is place)

[adhiṣṭhānam] The body, the doer, the various sense organs, the many kinds of efforts and Divine Providence — these are the five factors of action. 18.14

In Proximity : upāshritya (refuge near me) upadraṣṭā(overseer or inner seer) upasaṅgamya (approaching near)

[upadraṣṭā] Yet in this body there is a transcendental enjoyer, who is the Lord, the supreme proprietor, the overseer, permitter and the Supersoul. 13.23

Generate (Give rise to): udbhavam (generate)

Presence of the quality (with): sammohaḥ (clouding of memory) samupasthitam (all present) saṁśuddhiḥ (total purification) sambhūtam (born) sambhavāmi (appear) sañjāyate (develops; jāyate is ‘is born’)

Absence of Quality : nirahaṅkāraḥ nirvedam nirmuktāḥ nirvikāraḥ

Descend (down): avagaccha (understand; gaccha is go)
avajānanti (disregard) avajñātam (with contempt)

Follow (behind): anubandhan (bound), anukampā (compassion, kampa is vibration), anuvartante (follow) anupaśyati (see; pasyati is to see) anusmara (remember) anudarśanam (observing; darshan is to see)

Towards / transitioning : abhijānāti (get to know; jānāti is know)
abhijāyate (take birth; ja is birth) abhirakṣitam (well protected)
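The prefix cues above can be turned into a rough word splitter. This is a sketch under strong assumptions: it matches plain string prefixes only, the sense labels are paraphrased from the headings above, and `split_prefix` is a hypothetical name.

```python
import unicodedata

# Prefix -> sense, paraphrased from the groups listed above.
PREFIX_SENSES = {
    "prati": "reciprocating (backward)",
    "pari": "circling",
    "pra": "ascertaining (forward)",
    "ati": "transcending (beyond)",
    "adhi": "supreme / rooted",
    "upa": "in proximity",
    "ud": "giving rise to",
    "sam": "presence of the quality (with)",
    "nir": "absence of the quality",
    "ava": "descending (down)",
    "anu": "following (behind)",
    "abhi": "towards / transitioning",
}

# Longest prefixes first, so that 'prati' wins over 'pra'.
_ORDERED = sorted(PREFIX_SENSES, key=len, reverse=True)


def split_prefix(word):
    """Return (prefix, remainder, sense); prefix and sense are None when nothing matches."""
    word = unicodedata.normalize("NFC", word.lower())
    for prefix in _ORDERED:
        if word.startswith(prefix) and len(word) > len(prefix):
            return prefix, word[len(prefix):], PREFIX_SENSES[prefix]
    return None, word, None


for w in ["pravakṣyāmi", "pratiyotsyāmi", "adhigacchati", "anupaśyati", "upāshritya", "karma"]:
    print(w, "->", split_prefix(w))
```

Two limitations are worth keeping in mind. Vowel sandhi hides some prefixes (upāshritya starts with "upā", so the plain "upa" test misses it), and a word may merely happen to start with the same letters as a prefix, so the output is a list of candidates rather than a verdict.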


My knowledge of Gita meanings and recital comes from the handpicked verses of my Yoga Course at SVYASA university a few years back. Composed in the Anushtup meter poetic style, there was a poise and resonance in its rendering. Based on what seemed like repeating patterns to my ear, I conducted this experiment on the Gita text obtained from the Kaggle website and deployed Python and Data Science modules for text parsing, iterating and bookkeeping of the patterns. As the English character set is lossy and cannot represent all the Indian Alphabets accurately, an extended version with diacritics (IAST) that is used internationally in Sanskrit transliteration has been used for the analysis.

The techniques described are still rudimentary, as they involved multi-stage parsing. In the first pass, the words were listed in descending order of frequency. The connector set was extracted manually from this list. The nouns, owing to their different suffixes for different contexts, need to be normalized, which further increases their frequency. This was done in iteration 2 for the high frequency nouns. In addition, they were grouped with related words to make the word list more relevant. The verb list was obtained by extracting words with popular suffix endings. Due to the mixed approach, the numbers shown in this article are indicative and serve only to illustrate the idea.
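The second pass, where differently suffixed forms of a noun are merged, could look roughly like the sketch below. The suffix list, the minimum stem length and the counts in the example are all made up for illustration; this is not the normalisation actually used for the numbers in this article.

```python
import unicodedata
from collections import Counter


def nfc(text):
    return unicodedata.normalize("NFC", text)


# Hypothetical ending list, longest first.
SUFFIXES = sorted(
    (nfc(s) for s in ["asya", "ena", "eṇa", "āya", "āt", "ād", "aḥ", "aṁ", "am", "e", "a"]),
    key=len,
    reverse=True,
)
MIN_STEM = 3  # keep short words such as 'ca' untouched


def stem(word):
    """Strip one known ending, provided a stem of at least MIN_STEM letters remains."""
    word = nfc(word.lower())
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return word[: -len(suffix)]
    return word


def normalise_counts(counts):
    """Merge the counts of all forms that share a stem."""
    merged = Counter()
    for word, n in counts.items():
        merged[stem(word)] += n
    return merged


# Toy counts, not taken from the Gita text.
raw = Counter({"yogaḥ": 5, "yogam": 3, "yogena": 2, "yoge": 1, "ca": 4})
print(normalise_counts(raw))
```

Merging raises the count of the stem and shrinks the number of unique words, which is the effect described above. A bare stem such as "yog" resembles the truncated entries (jñān, phal, bhakt) in the noun tables.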


Normalisation of all words would reduce the variance and number of unique words drastically. In addition, bucketing the words automatically into the 4 categories and rerunning the frequency distribution would give more precise data. Currently in the absence of normalisation, it seems the distribution of 10k words goes like this. This gives an idea about the words that are not so frequent.

Unique words: 3848
On average, a word occurs 3 times
Standard Deviation from average: 10
Minimum number of occurrences of a word: 1
50% of words occur only once
75% of words occur twice or less
Maximum occurrence of a word was 392
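Summary figures of this kind can be derived from the word counts with the standard library alone. The sketch below uses a toy counter (the numbers in it are invented) and a hypothetical `describe` helper; the quartile method is Python's default, which may differ slightly from the tool that produced the figures above.

```python
import statistics
from collections import Counter


def describe(counts):
    """Summarise how often each unique word occurs."""
    values = sorted(counts.values())
    quartiles = statistics.quantiles(values, n=4)  # three cut points
    return {
        "unique_words": len(values),
        "total_words": sum(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
        "min": values[0],
        "median": statistics.median(values),
        "p75": quartiles[2],
        "max": values[-1],
        "share_once": sum(v == 1 for v in values) / len(values),
    }


# Toy counts for illustration only.
toy = Counter({"ca": 6, "eva": 3, "na": 1, "hi": 1, "tu": 1})
for key, value in describe(toy).items():
    print(f"{key}: {value}")
```

A low median next to a high maximum is the signature of the long tail seen here: a handful of connectors dominate while most words appear once or twice.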


This gives a head start and a feel of the language for the seekers of Gita. By focusing on the Gita's high frequency patterns, the effort needed to grasp the whole gamut of grammar shrinks considerably, as we are mostly focussing on masculine gender words (leaving out neuter and feminine), the Present Tense, and singular Verbs and Nouns (leaving out dual and, to an extent, plural). The Gerunds are very nicely woven into the verses.

It could be used in the field of NLP and Education in Schools and Universities. For instance, Visualizations and grammar woven around a topic make the subject easier to study. Top 20 words from each alphabet, top 20 verbs, top 20 nouns give a theme around learning.

The idea used could be extended to Visualise messages and Language styles of each Veda, Ramayana, Patanjali Sutras, Lalitha Sahasranama, Works of Shankaracharya etc.

There are subtler aspects like highlighting Acoustic Patterns in words, Anushtup and other meter visualisations in sub verses, Chapter wise sentiments, Verse wise sentiment, Speciality about first-last word of a Verse, Classifications etc.

About the Author

Sujatha Ratnala enjoys novel teaching methodologies, exploring Linguistic Patterns and Phonetic Sciences. An engineer by profession, she has authored the book ‘Samskruthe Yogashala-Yoga Class in Sanskrit’ and written several articles on Medium on Science, Technology, Mathematics, Intuition, Patterns and Yoga.


