Abstract: Have you ever wondered how realistic Tolkien’s languages really are? Or how they might differ statistically from natural languages? Chances are, you haven’t, but we’re here to tell you anyway. In this blog post, we investigate the verisimilitude of Quenya, one of Tolkien’s Elvish dialects. Specifically, we consider how the Quenya translation of the New Testament differs from natural language translations in terms of Shannon’s information entropy.
People construct fictional languages to add a semblance of reality to fictional worlds. We have Dothraki and High Valyrian from Game of Thrones, Klingon from Star Trek and Quenya (amongst about eight others) from Tolkien’s Lord of the Rings universe. Tolkien may be considered the father of fictional languages. He spent considerable time and effort over many years making his languages as realistic as possible. They have histories, they have etymologies, they even have uncertainties – just like a naturally evolved language.
But they are not natural languages: their evolution has been simulated by one individual. So how might we identify hidden differences? Languages evolve through a more-or-less random process, and humans are not good at simulating random processes by hand. If you ask someone to write down a sequence of coin tosses, they are unlikely to include long runs of heads or tails, even though such runs are statistically plausible. Our ultimate question: can we catch Tolkien doing something similar?
As it happens, if you look at the information entropies of a text in Elvish, compared with the information entropies of natural language translations of the same text, some differences emerge. The text we picked is Fauskanger’s (2015) Neo-Quenya translation of the New Testament (Neo-Quenya being Quenya + neologisms added by the translator), and we compare it with the King James (1611, revised 2004), New International (1978), and Biblia Sacra juxta Vulgatam Clementinam (1592) translations. The last of these is the 16th-century revision of the original AD 382 Latin translation. We found entropies for the following 10 sections of each translation:
- New Testament
- Gospel of Matthew
- Gospel of Mark
- Gospel of Luke
- Gospel of John
- Acts of the Apostles
- Paul’s Letters
A quick sidenote on entropy: Developed in 1948 by Claude Shannon, entropy is a measure of information rate (Shannon, 1948). In our case, we are looking at the average number of bits per word in translations of the New Testament and its segments.
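To make "bits per word" concrete: if a word w makes up a fraction p(w) of all the words in a text, the entropy is H = -Σ p(w) log2 p(w), summed over the distinct words. Here is a minimal sketch of that calculation. The function name `word_entropy` and the tokenisation (lowercase, split on whitespace) are assumptions made for illustration, not necessarily the preprocessing used for the results in this post.

```python
import math
from collections import Counter


def word_entropy(text: str) -> float:
    """Return the unigram Shannon entropy of `text` in bits per word."""
    # Naive tokenisation (an assumption): lowercase, split on whitespace.
    words = text.lower().split()
    if not words:
        return 0.0
    counts = Counter(words)
    total = len(words)
    # H = -sum(p * log2(p)) over the distinct words, with p = count / total.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Four equally likely words carry log2(4) = 2 bits per word.
print(word_entropy("alpha beta gamma delta"))
```

A text that repeats a single word has an entropy of zero, and a text in which every word is different has the maximum possible entropy for its length. Punctuation stays attached to words with this tokenisation, so a real comparison would probably want a more careful tokeniser.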
What we found:
The New International and King James translations have similar entropy distributions (their medians fall within each other’s 95% confidence intervals). The distributions of Neo-Quenya and Latin entropies are more similar to each other than to the English translations, but they are not statistically similar, because their medians fall outside each other’s 95% confidence intervals. The split into English and non-English languages is probably due to inflection – English is very weakly inflected, whereas Latin and Neo-Quenya are heavily inflected.
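The comparison rule used above (does each median fall inside the other distribution's 95% confidence interval?) can be sketched as follows. The sketch assumes a percentile bootstrap, which is only one possible way to get such intervals and is not necessarily how the intervals reported here were computed. The names `median_ci` and `medians_mutually_covered` are hypothetical, and the numbers in the example are toy values, not entropies from this study.

```python
import random
import statistics


def median_ci(values, n_resamples=2000, level=0.95, seed=0):
    """Percentile-bootstrap confidence interval for the median of `values`."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    # Resample with replacement and record the median of each resample.
    medians = sorted(
        statistics.median(rng.choices(values, k=len(values)))
        for _ in range(n_resamples)
    )
    lo_idx = int((1 - level) / 2 * n_resamples)
    hi_idx = int((1 + level) / 2 * n_resamples) - 1
    return medians[lo_idx], medians[hi_idx]


def medians_mutually_covered(a, b):
    """True if each sample's median lies inside the other's interval."""
    lo_a, hi_a = median_ci(a)
    lo_b, hi_b = median_ci(b)
    return (lo_a <= statistics.median(b) <= hi_a
            and lo_b <= statistics.median(a) <= hi_b)


# Toy numbers only: two clearly separated samples fail the check.
low = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
high = [x + 100.0 for x in low]
print(medians_mutually_covered(low, high))
```

With only a handful of sections per translation, bootstrap intervals like these are coarse, so the outcome of the check should be read as indicative rather than conclusive.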
So, is Quenya statistically dissimilar to natural languages? Maybe. The difference between Latin and Neo-Quenya could be because Latin has evolved and Quenya is constructed, or because Quenya is not Latin. To be more certain, we’d need to include comparisons with other heavily inflected languages that are also not Latin.
Biblia Sacra juxta Vulgatam Clementinam (1592). Trans. by Jerome. https://www.wilbourhall.org/pdfs/vulgate.pdf.
I Vinya Vere: The New Testament in Neo-Quenya (2015). Trans. by H.K. Fauskanger. https://folk.uib.no/hnohf/nqnt.htm.
King James Version (2004). www.turnbacktogod.com/king-james-bible-kjv-bible-as-pdf/. First translated 1611.
New International Version (1978). www.turnbacktogod.com/wp-content/uploads/2011/02/NIV-Bible-PDF.pdf.
Shannon, Claude (1948). “A mathematical theory of communication”. In: The Bell System Technical Journal 27 (3), pp. 379-423.
The University of Adelaide