Google Translate – Statistical Translation

  • Google Translate used Systran’s engine until 2007, when Google switched to its own proprietary
    statistical translation engine.

  • Statistical translation relies on phrase correlations learned from pairs of pre-translated
    parallel texts.

  • Statistical translation consists of four basic steps:
    1. Sentences from the input language are broken down into small phrases.
    2. Google searches for each phrase in the source-language side of its pre-translated document pairs.
    3. It then goes to the parallel translated document and looks up the corresponding phrase.
    4. The translated phrases are reassembled and given as output.
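
The steps above can be sketched in a few lines of Python. This is only a toy illustration, not Google's implementation: the phrase table, its English-to-French entries, and the names `PHRASE_TABLE`, `split_into_phrases`, and `translate` are all hypothetical, and the greedy longest-match split stands in for the statistical scoring a real system would apply when choosing among competing phrases.

```python
# Hypothetical phrase table standing in for phrase pairs mined from
# pre-translated parallel documents (English -> French).
PHRASE_TABLE = {
    ("good", "morning"): "bonjour",
    ("thank", "you"): "merci",
    ("my", "friend"): "mon ami",
}

MAX_PHRASE_LEN = 3  # longest phrase (in words) we try to match


def split_into_phrases(words, table, max_len=MAX_PHRASE_LEN):
    """Step 1: greedily break the input into the longest known phrases."""
    phrases = []
    i = 0
    while i < len(words):
        # Try the longest candidate first, then shrink it.
        for n in range(min(max_len, len(words) - i), 0, -1):
            candidate = tuple(words[i:i + n])
            # A single unknown word is kept as its own phrase.
            if candidate in table or n == 1:
                phrases.append(candidate)
                i += n
                break
    return phrases


def translate(sentence, table=PHRASE_TABLE):
    words = sentence.lower().split()
    phrases = split_into_phrases(words, table)
    # Steps 2 and 3: look up each phrase and take its parallel translation;
    # phrases missing from the table pass through unchanged.
    translated = [table.get(p, " ".join(p)) for p in phrases]
    # Step 4: reassemble the translated phrases into the output.
    return " ".join(translated)


print(translate("Good morning my friend"))
```

A production system would keep several candidate translations per phrase, each with a probability, and pick the combination that scores best overall rather than the first match.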

  • This approach leverages Google’s powerful search engine and massive computing power.

  • Google recently fed approximately 200 billion words of parallel translated documents
    from United Nations archives into its system as training data. This has resulted in a
    significant improvement in translation accuracy.

  • The drawback of this approach is that it does not apply explicit grammatical rules, since its
    algorithms are based on statistical analysis rather than hard-coded, rule-based analysis.

  • The main benefit of this approach is that it scales across languages. Rule-based translation
    systems require the manual development of linguistic rules, which is costly and does not carry
    over to other languages. Statistical systems are not tailored to any specific pair of
    languages; they simply need large bodies of parallel text to train on.
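
To illustrate why a statistical system is not tied to a language pair, here is a minimal sketch that learns a word-level translation table purely from co-occurrence counts in parallel sentences. It is a simplified stand-in, not Google's training method: the function name `learn_word_table`, the tiny English-French corpus, and the use of the Dice coefficient as the correlation score are all assumptions made for the example.

```python
from collections import Counter
from itertools import product


def learn_word_table(parallel_pairs):
    """Map each source word to the target word it co-occurs with most
    strongly, scored with the Dice coefficient (an illustrative choice)."""
    src_count = Counter()   # sentences containing each source word
    tgt_count = Counter()   # sentences containing each target word
    pair_count = Counter()  # sentence pairs containing both words

    for src, tgt in parallel_pairs:
        src_words = set(src.lower().split())
        tgt_words = set(tgt.lower().split())
        src_count.update(src_words)
        tgt_count.update(tgt_words)
        pair_count.update(product(src_words, tgt_words))

    table = {}
    best_score = {}
    for (s, t), both in pair_count.items():
        # Dice coefficient: 1.0 when the two words always appear together.
        score = 2 * both / (src_count[s] + tgt_count[t])
        if score > best_score.get(s, 0.0):
            best_score[s] = score
            table[s] = t
    return table


# Hypothetical toy corpus of aligned sentence pairs.
corpus = [
    ("the house", "la maison"),
    ("the car", "la voiture"),
    ("a house", "une maison"),
    ("a car", "une voiture"),
]
table = learn_word_table(corpus)
print(table["house"], table["car"])
```

Nothing in the function refers to English or French. Passing it aligned sentences in any other pair of languages would produce a table for that pair, which is the property the bullet above describes. No linguistic rules are written by hand; only the training text changes.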

  • This is why Google offers 60 languages while Babelfish offers only 14, even though Babelfish
    has been in operation for significantly longer.

 

Google Translate: http://translate.google.com/#

 

Smart-Translator