
The Brand New BBT Dictionary App!
An aid to BBT production: The BBT produces books in many languages. Necessity is the mother of invention, and necessity has led us into machine learning, natural language processing, and a fusion of science, computational linguistics, and Srila Prabhupada’s books. Years ago, we had fairly stable editorial teams in most of our languages. Nowadays devotees move between services more often, so we’ve had to find ways to help each new editorial team use our layout software to hyphenate and finalize their files, work best done by native speakers.
We’re working hard at developing our text repository, known more technically as a multilingual parallel text corpus. Combining the text repository with machine learning, natural language processing, dictionaries, databases, and other tools gives us fuller control over the huge amount of text we handle each year. To deal with the immediate problem mentioned above, and to improve all our production processes, Hare Krishna Dasa designed a dictionary utility that creates a list of unique words from a set of BBText files. It then pre-hyphenates those words using OpenOffice hyphenation dictionaries. The BBT’s proprietary software, PP/HJ, then uses the hyphenated words to create hyphenated text files that can go straight to proofreading. We’re also building pre-hyphenated Sanskrit dictionaries to make the end result cleaner for the editorial teams.
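The pre-hyphenation step can be sketched in a few lines. OpenOffice hyphenation dictionaries (the hyph_*.dic files read by the Hyphen library) hold TeX-style patterns for Frank Liang’s algorithm: digits between letters score each gap, and a gap whose highest score is odd is a legal break. The Python sketch below is a minimal illustration under stated assumptions, not the BBT utility itself: it assumes BBText markup has already been stripped so the input is plain text, the pattern list is a two-entry toy set rather than a real dictionary, and all function names are hypothetical.

```python
import re
import unicodedata


def parse_patterns(patterns):
    """Turn TeX-style patterns such as "h1n" into {letters: gap values}."""
    table = {}
    for pat in patterns:
        letters = re.sub(r"\d", "", pat)
        values = [0] * (len(letters) + 1)
        pos = 0
        for ch in pat:
            if ch.isdigit():
                values[pos] = int(ch)  # score of the gap before letters[pos]
            else:
                pos += 1
        table[letters] = values
    return table


def hyphenate(word, table, left_min=2, right_min=2):
    """Liang's algorithm: odd gap scores allow a break, even scores forbid it."""
    work = "." + word.lower() + "."  # dots anchor patterns to the word edges
    values = [0] * (len(work) + 1)
    for i in range(len(work)):
        for j in range(i + 1, len(work) + 1):
            # Every pattern matching work[i:j] raises the gap scores it covers
            for k, v in enumerate(table.get(work[i:j], ())):
                values[i + k] = max(values[i + k], v)
    pieces, start = [], 0
    for m in range(left_min, len(word) - right_min + 1):
        # values[m + 1] is the gap before word[m] (work has a leading dot)
        if values[m + 1] % 2 == 1:
            pieces.append(word[start:m])
            start = m
    pieces.append(word[start:])
    return "-".join(pieces)


def unique_words(texts):
    """Sorted, lowercased, distinct words (letters only) from plain-text inputs."""
    words = set()
    for text in texts:
        # NFC composes letter + combining mark (s + dot below -> ṣ) into one letter
        text = unicodedata.normalize("NFC", text)
        words.update(w.lower() for w in re.findall(r"[^\W\d_]+", text))
    return sorted(words)


def prehyphenate(texts, patterns):
    """Map each unique word in the texts to its pre-hyphenated form."""
    table = parse_patterns(patterns)
    return {w: hyphenate(w, table) for w in unique_words(texts)}
```

With the toy patterns ["h1n", "a1g"], the input "Krishna speaks to Arjuna." yields entries such as krishna → krish-na. A real dictionary contains thousands of patterns, and a higher even score at the same gap (for example from a pattern like "sh2n") would suppress that break.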
We’ve been gradually building the multilingual parallel parsable text corpus mentioned above. A corpus is more than a collection of book files; it includes tagged text files, dictionaries compiled from every word in a text, and dictionaries that recognize the roots and stems of words. Such dictionaries will make translation easier and allow us to develop basic synonym lists to aid automatic translation, parallel viewing, spellchecking, and all sorts of useful grammatical analysis. We’ll also parse proper nouns, allowing us, for example, to link dictionary or glossary entries to those words in ebooks and apps. Sanskrit terms and their equivalents will be parsable as well.
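Linking proper nouns to glossary entries could start as simply as the hypothetical sketch below, which finds each glossary term in a paragraph and wraps it in a link. It assumes a glossary that maps each surface form to an entry ID, and the "#entry" link format is made up for illustration. Text is NFC-normalized first so that a diacritic typed as one character and one typed as a letter plus combining mark compare equal; returned offsets refer to that normalized text. Exact matching like this misses declined forms of a name, which is where stem-aware dictionaries come in.

```python
import re
import unicodedata


def find_glossary_terms(text, glossary):
    """Return (start, end, entry_id) for each glossary term in NFC-normalized text."""
    text = unicodedata.normalize("NFC", text)
    entries = {unicodedata.normalize("NFC", k): v for k, v in glossary.items()}
    if not entries:
        return []
    # Try longer terms first so "Bhagavad-gītā" wins over "gītā" at the same spot
    terms = sorted(entries, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, terms)) + r")\b")
    return [(m.start(), m.end(), entries[m.group(0)]) for m in pattern.finditer(text)]


def link_terms(text, glossary):
    """Wrap every glossary term in a (hypothetical) HTML link to its entry."""
    text = unicodedata.normalize("NFC", text)
    out, last = [], 0
    for start, end, entry in find_glossary_terms(text, glossary):
        out.append(text[last:start])
        out.append(f'<a href="#{entry}">{text[start:end]}</a>')
        last = end
    out.append(text[last:])
    return "".join(out)
```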
All this will allow side-by-side comparison of a book in multiple languages, or of different editions in a single language, at the paragraph or even sentence level. In our book apps, you’ll be able to swipe from Russian to English to German and find yourself on the same paragraph in each language. For editorial team members asked to do fidelity checks, this kind of parallel analysis between languages can be quite useful. So although our internal app development might seem more pedantic than practical, we’re excited about it! Our dictionary app has a number of practical applications, not least that it will make BBT source texts accessible for translation, verification, quality control, progress control, version control, audio overlay, and export to a variety of formats.
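Paragraph-level alignment of the kind that lets a reader swipe between languages can be modeled by storing every version of a paragraph under one shared ID. The sketch below assumes such IDs already exist (the "bg-2.13-p1" style is hypothetical) and pairs the lookup with a word-level comparison of two editions using Python’s standard difflib module. It illustrates the idea only; it is not how the BBT apps are built.

```python
import difflib


class ParallelCorpus:
    """Paragraphs stored under a shared ID, one text per language or edition."""

    def __init__(self):
        self._paras = {}  # paragraph ID -> {language/edition label -> text}

    def add(self, para_id, label, text):
        self._paras.setdefault(para_id, {})[label] = text

    def switch(self, para_id, label):
        """The same paragraph in another language, or None if not yet available."""
        return self._paras.get(para_id, {}).get(label)

    def labels(self, para_id):
        """Sorted labels (languages or editions) available for a paragraph."""
        return sorted(self._paras.get(para_id, {}))


def word_changes(old, new):
    """Word-level differences between two editions of one paragraph."""
    a, b = old.split(), new.split()
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [
        (tag, " ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
```

For instance, word_changes("the soul passes into another body", "the soul similarly passes into another body") returns [("insert", "", "similarly")].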
Our most recent iteration of the dictionary app works with these languages: Afrikaans, Bulgarian, Catalan, Croatian, Czech, Danish, Dutch, English (British and American), Estonian, French, Galician, German, Greek, Hungarian, Icelandic, Italian, Latvian, Lithuanian, Norwegian, Polish, Portuguese (Brazilian and European), Romanian, Russian, Sanskrit, Serbian (Cyrillic and Latin), Slovak, Slovenian, Spanish, Swedish, Telugu, Ukrainian, and Zulu.
Planned for the next version: Adding machine learning features so that the app will recognize a word’s language, including whether a word has been imported from another language, is pure Sanskrit, or is Sanskrit declined according to the rules of a particular language. The app will then spellcheck and pre-hyphenate accordingly.
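The planned language recognition will use machine learning; as a much simpler stand-in, the sketch below guesses a word’s language from character bigrams with a naive Bayes score (equal priors, add-one smoothing), after first checking for IAST letters such as ṛ, ṣ, and ṇ that the other listed languages don’t normally use. The class and method names, the letter set, and the language codes are assumptions for illustration, and telling pure Sanskrit from Sanskrit declined in another language would need separate training data for each case.

```python
import math
import unicodedata
from collections import Counter

# Assumed set of IAST letters (dot-below letters plus ṅ and ṁ) that the other
# listed languages don't normally use; ś, ā, ī, ū, ñ are left out because
# Polish, Latvian, Lithuanian, or Spanish also use them.
IAST_ONLY = set("ṛṝḷḹṭḍṇṣṃṁḥṅ")


def bigrams(word):
    w = "^" + word.lower() + "$"  # ^ and $ mark the word's start and end
    return [w[i:i + 2] for i in range(len(w) - 1)]


class LanguageGuesser:
    """Naive Bayes over character bigrams, with equal priors and add-one smoothing."""

    def __init__(self):
        self.counts = {}    # language code -> Counter of bigrams
        self.vocab = set()  # every bigram seen in training

    def train(self, lang, words):
        counter = self.counts.setdefault(lang, Counter())
        for word in words:
            grams = bigrams(unicodedata.normalize("NFC", word))
            counter.update(grams)
            self.vocab.update(grams)

    def log_score(self, lang, word):
        counter = self.counts[lang]
        total = sum(counter.values())
        size = len(self.vocab) + 1  # one extra slot for unseen bigrams
        return sum(math.log((counter[g] + 1) / (total + size)) for g in bigrams(word))

    def guess(self, word):
        word = unicodedata.normalize("NFC", word)
        if any(ch in IAST_ONLY for ch in word.lower()):
            return "sa"  # distinctive IAST letters settle it before any statistics
        return max(self.counts, key=lambda lang: self.log_score(lang, word))
```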
