Romanization is a common strategy for text entry in many writing systems, given the simplicity of typing with a Latin alphabet relative to the native script, particularly on mobile devices. The best-known example of this is the Pinyin system for Chinese, but it is also widely used for South and Southeast Asian writing systems such as Devanagari, Tamil, and Thai, as well as for Arabic, Persian, and Japanese, among others. In this talk, I will present an extension to a mobile keyboard input decoder based on finite-state transducers that provides general transliteration support, and demonstrate its use for input of South Asian languages using a QWERTY keyboard. On-device keyboard decoders must operate under strict latency and memory constraints, and I will present several transducer optimizations that allow for high-accuracy decoding under such constraints. Our methods yield substantial accuracy improvements and latency reductions over an existing baseline transliteration keyboard approach. The resulting system was launched for 22 languages in Google Gboard in the first half of 2017.
Brian Roark has been a research scientist at Google since 2013.
More information: http://www.lanzaroark.org/brian-roark