
Input methods for the search function in GRETIL

Transcriptions from Indian languages make extensive use of diacritic marks, e.g. ś, ṣ, ā, ṝ. The standard transliteration scheme is IAST: https://en.wikipedia.org/wiki/International_Alphabet_of_Sanskrit_Transliteration

To facilitate search queries, many online resources allow alternative transliterations that avoid these diacritic marks. The most widely used of these alternative transliterations is the Harvard-Kyoto system (https://en.wikipedia.org/wiki/Harvard-Kyoto).

The Cologne Sanskrit Lexica, for example, allow for different input systems, but Harvard-Kyoto is the standard one: https://www.sanskrit-lexicon.uni-koeln.de/scans/PWScan/2020/web/webtc/indexcaller.php

In order to look up the word ‘kāraṇa’ one would type in ‘kAraNa’.
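The mapping behind this kind of input can be sketched in a few lines of Python. This is only an illustration of the Harvard-Kyoto convention, not the code used by the Cologne Sanskrit Lexica; the names `HK_TO_IAST` and `hk_to_iast` are made up for this example.

```python
# Harvard-Kyoto sequences that differ from IAST; all other letters are identical.
HK_TO_IAST = {
    "lRR": "ḹ", "RR": "ṝ", "lR": "ḷ",
    "A": "ā", "I": "ī", "U": "ū", "R": "ṛ",
    "M": "ṃ", "H": "ḥ", "G": "ṅ", "J": "ñ",
    "T": "ṭ", "D": "ḍ", "N": "ṇ", "z": "ś", "S": "ṣ",
}

# Try longer sequences first so that "RR" is not read as two separate "R".
_KEYS = sorted(HK_TO_IAST, key=len, reverse=True)


def hk_to_iast(text: str) -> str:
    """Convert a Harvard-Kyoto string to IAST (illustrative sketch).

    Assumption: "lR" is always read as vocalic ḷ, although it could in
    principle also stand for l followed by ṛ.
    """
    out = []
    i = 0
    while i < len(text):
        for key in _KEYS:
            if text.startswith(key, i):
                out.append(HK_TO_IAST[key])
                i += len(key)
                break
        else:
            # No special sequence here: copy the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


print(hk_to_iast("kAraNa"))
```

Because every Harvard-Kyoto sequence corresponds to exactly one IAST letter, a query typed this way identifies exactly one diacritically precise form.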

Some websites do not use Harvard-Kyoto and instead allow for both an input with diacritic marks and an input without diacritic marks.

For an example see: https://dsal.uchicago.edu/dictionaries/pali/

In order to look up the word ‘kāraṇa’ one would simply type in ‘karana’.
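How the Chicago site implements this is not known to me, but the usual technique is to strip the diacritics from both the query and the indexed words before comparing them. A minimal sketch, with `fold_diacritics` as a hypothetical helper name:

```python
import unicodedata


def fold_diacritics(text: str) -> str:
    """Reduce a romanised string to its plain base letters."""
    # NFD splits e.g. "ā" into "a" plus a combining macron.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks, keep the base letters.
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Lower-casing is an extra assumption so that capitalised words match too.
    return base.lower()


print(fold_diacritics("kāraṇa"))
```

The folding is many-to-one: ‘kāraṇa’ and ‘karaṇa’ both reduce to ‘karana’, which is why this kind of input returns several candidate words rather than a single one.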

GRETIL would greatly benefit from letting users choose between different search inputs. Ideally, one would be able to select an input system in a similar way to the Cologne Sanskrit Lexica (https://www.sanskrit-lexicon.uni-koeln.de/scans/PWScan/2020/web/webtc/indexcaller.php):

  1. Kyoto-Harvard: (only) ‘kAraNa’ yields (only) ‘kāraṇa’
  2. Roman Unicode: (only) ‘kāraṇa’ yields (only) ‘kāraṇa’
  3. ~informal: ‘karana’ yields ‘kāraṇa, karaṇa, kāraṇā ...’
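The behaviour of the three variants can be sketched as a single lookup function with a mode switch. Everything here is an assumption made for illustration: the word list, the mode names and the function `search` are hypothetical and say nothing about how TextGrid or the Cologne site work internally.

```python
import re
import unicodedata

# Hypothetical mini-corpus of IAST word forms, not taken from GRETIL.
WORDS = ["kāraṇa", "karaṇa", "kāraṇā", "kāla"]

# Harvard-Kyoto sequences that differ from IAST.
HK = {
    "lRR": "ḹ", "RR": "ṝ", "lR": "ḷ", "A": "ā", "I": "ī", "U": "ū",
    "R": "ṛ", "M": "ṃ", "H": "ḥ", "G": "ṅ", "J": "ñ", "T": "ṭ",
    "D": "ḍ", "N": "ṇ", "z": "ś", "S": "ṣ",
}
# Longest alternatives first, so "RR" wins over "R".
HK_PATTERN = re.compile("|".join(sorted(HK, key=len, reverse=True)))


def nfc(text):
    """Normalise so precomposed and decomposed spellings compare equal."""
    return unicodedata.normalize("NFC", text)


def fold(text):
    """Strip all diacritics and lower-case."""
    nfd = unicodedata.normalize("NFD", text)
    return "".join(c for c in nfd if not unicodedata.combining(c)).lower()


def search(query, mode, words=WORDS):
    if mode == "kyoto-harvard":
        # Variant 1: convert the query to IAST, then match exactly.
        target = nfc(HK_PATTERN.sub(lambda m: HK[m.group()], query))
        return [w for w in words if nfc(w) == target]
    if mode == "roman-unicode":
        # Variant 2: the query must already be the precise IAST form.
        return [w for w in words if nfc(w) == nfc(query)]
    if mode == "informal":
        # Variant 3: compare diacritic-free forms, so several words may match.
        key = fold(query)
        return [w for w in words if fold(w) == key]
    raise ValueError(f"unknown input mode: {mode}")


for mode, query in [("kyoto-harvard", "kAraNa"),
                    ("roman-unicode", "kāraṇa"),
                    ("informal", "karana")]:
    print(mode, query, search(query, mode))
```

In this sketch the first two modes each return only ‘kāraṇa’, while the informal mode returns ‘kāraṇa’, ‘karaṇa’ and ‘kāraṇā’, mirroring the three variants listed above.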

If such a selection between different inputs cannot be offered specifically for the GRETIL corpus, whether because of the general architecture of TextGrid or because of repercussions on other corpora, then variant 3 (~informal) seems slightly better than variant 2 (Roman Unicode), which requires typing in the diacritically precise form. At least in the case of Sanskrit, words sometimes drastically alter their phonological appearance due to grammatical effects, so the exact form occurring in a text can be hard to predict. This might be mitigated by the use of ‘*’ and ‘?’, which is of course already possible.
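To illustrate how wildcards help with inflected forms, here is a small sketch using Python's standard `fnmatch` module. It assumes that ‘*’ stands for any run of characters and ‘?’ for exactly one character, which may differ in detail from the TextGrid search syntax; the word list and the function `wildcard_search` are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical list of forms: kāraṇa with some case endings, plus karaṇa.
FORMS = ["kāraṇa", "kāraṇam", "kāraṇāt", "kāraṇena", "karaṇa"]


def wildcard_search(pattern, forms=FORMS):
    """Return the forms matching a pattern containing '*' and '?'."""
    return [form for form in forms if fnmatchcase(form, pattern)]


# '*' covers any ending, so one query finds all inflected forms of the stem.
print(wildcard_search("kāraṇ*"))

# '?' covers exactly one character after the stem.
print(wildcard_search("kāraṇa?"))
```

Note that the stem still has to be typed with its diacritics here, so wildcards address the inflection problem but not the input problem.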