Google Translate – Statistical Translation
- Google Translate used Systran’s engine until 2007, when Google developed its own proprietary
statistical translation engine.
- Statistical translation relies on phrase correlations learned from pairs of pre-translated parallel texts
- Statistical translation consists of three basic steps:
  - Sentences in the input language are broken down into small phrases
  - Google searches for each phrase in its pre-translated document pairs, then looks up the
    corresponding phrase in the parallel translation
  - The translated phrases are reassembled and given as output
- This leverages Google’s powerful search engines and massive computing power
- Recently, Google uploaded approximately 200 billion words of parallel translated documents
from United Nations archives to train its system. This has resulted in a significant improvement
in translation accuracy.
- The drawback of this approach is that it does not apply explicit grammatical rules, since its
algorithms are based on statistical analysis rather than hard-coded, rule-based analysis
- The main benefit of this approach is language independence. Rule-based translation systems
require the manual development of linguistic rules, which is costly and does not carry over to
other languages. Statistical systems, by contrast, are not tailored to any specific pair of
languages; they simply need large bodies of parallel text to train from.
- This is why Google Translate supports 60 languages while Babelfish supports only 14, even though
Babelfish has been in operation for significantly longer.
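
The three steps described above (split into phrases, look up each phrase, reassemble) can be
illustrated with a toy sketch in Python. This is not Google's implementation: the tiny
phrase-pair list, the names ALIGNED_PHRASES, build_phrase_table and translate, and the
greedy longest-match strategy are all assumptions made for illustration. A real statistical
system learns phrase alignments and probabilities from very large parallel corpora.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: phrase pairs assumed to be already aligned
# from pre-translated parallel texts (English -> Spanish).
ALIGNED_PHRASES = [
    ("good morning", "buenos días"),
    ("good morning", "buenos días"),
    ("good morning", "buen día"),
    ("my friend", "mi amigo"),
    ("my friend", "mi amigo"),
    ("thank you", "gracias"),
]


def build_phrase_table(pairs):
    """Count how often each source phrase maps to each target phrase
    and keep the most frequent target (the "statistical" part)."""
    counts = defaultdict(Counter)
    for source, target in pairs:
        counts[source][target] += 1
    return {src: tgts.most_common(1)[0][0] for src, tgts in counts.items()}


def translate(sentence, table, max_len=3):
    """Break the sentence into phrases, look each one up, reassemble."""
    words = sentence.lower().split()
    output = []
    i = 0
    while i < len(words):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in table:
                output.append(table[phrase])
                i += n
                break
        else:
            # No known phrase starts here: pass the word through unchanged.
            output.append(words[i])
            i += 1
    return " ".join(output)


table = build_phrase_table(ALIGNED_PHRASES)
print(translate("Good morning my friend", table))
```

Note that nothing in the sketch encodes a grammatical rule, and nothing is specific to English
or Spanish: swapping in phrase pairs for another language pair is enough to change the
translation direction, which mirrors the trade-off described in the bullets above.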
Google Translate: http://translate.google.com/#