Google Translate has been the butt of many jokes since its inception. However, things are changing: within a few years, Google could own the best and most comprehensive machine translation technology in the world. Best of all, Google is sharing it with you and the rest of the world.
At Google I/O, Josh Estelle and Rohit Khare gave a talk on what is now called the Translate API (application programming interface) and how to use it when going global, for example to translate a website.
In the video below, they explain why translation is important, how Google Translate carries out machine translation and how the Translate API can be used for future translation projects. Have a watch, but if you prefer words, we have summarised some of the main points below.
Google: The Importance of Machine Translation
Estelle starts off by stating that even if you speak multiple languages, translation is not an easy task – it is time-consuming and requires a lot of knowledge. If humans are already struggling with translation, you can only imagine how hard it is to teach a computer how to carry out the task!
However, according to Estelle, Google happily took on this challenge.
The first machine translations date back more than sixty years. In fact, Estelle says, translation was the first non-numerical task that people tried to get computers to carry out. In 1954, IBM collaborated with Georgetown University on a machine translation project, and the two held a public demonstration on the IBM 701, IBM’s first commercially available scientific computer.
Of course, the machine translations produced back then were neither as accurate nor as long as the ones today’s systems can produce, but they made people enthusiastic about the concept and propelled research. Researchers believed machine translation would be perfected within three to five years. Let’s just say that took a little longer than expected…
However, Estelle believes that machine translation has improved drastically in the last ten to fifteen years, for a number of reasons:
• The computational power of electronic devices has grown remarkably. Computers and even smartphones are capable of much more than a researcher in the 1950s could ever imagine.
• There is now a huge amount of data available: Google Translate draws on data from across the web to improve its service.
• Research has led to statistical machine translation. This type of translation is not based on grammar rules, but uses statistics and probability to “learn” a language.
Google Translate – A Brief History
Estelle then recalls Google’s mission:
To organize the world’s information and make it universally accessible and useful.
At Google, machine translation is regarded as an essential part of making information universally accessible. Without machine translation, Estelle says, a lot of content on the web would not be available to the majority of the world’s population, especially to those who do not speak English.
Google Translate has been a Google service since 2001. Although Google itself had only existed for three years at that point, the need for machine translation was already apparent to the owners of the company. Initially, the service supported only 5 languages, but by 2003 this number had grown to 8. For the first few years, the service’s technology was licensed from a third party. The software was rule-based, and even though its translations were better than those of the first machine translation systems, they left a lot to be desired.
In 2003, developers decided to take a “Google approach” to machine translation: gather a lot of data and apply a lot of computing power. Two years later, the company took part in the NIST (National Institute of Standards and Technology) machine translation evaluation. The result? Google Translate produced significantly more accurate translations than its competitors, proving that a data-driven approach to translation was very effective. However, the service was far from fast: translating the 1,000 sentences in the evaluation took 40 hours, or about 2.4 minutes per sentence.
In 2006, Google Translate was launched to the public. Initially, the service was only available for Russian and Arabic, but this gradually expanded. Moreover, before 2007 Google Translate could only translate to and from English, whereas since 2008 any language pair has been possible. The number of available languages continues to grow; in fact, it is growing so fast that the figure of 66 languages on Estelle’s slide is already out of date! The service now supports 71 languages. Only a few weeks ago, Estelle says, Bosnian and Javanese were added.
How Does Google Translate Work?
Estelle says that people and machines “learn” translation in very different ways: humans learn a language by studying grammar rules and vocabulary, while in statistical machine translation computers take a completely different approach. They are fed a great many examples and derive the rules of a language from these texts. The computer compares pieces of text, and when the same target word or character turns up repeatedly, the software calculates which source word most likely corresponds to it.
To make the translations as accurate as possible, this method is applied to billions of documents, preferably “parallel documents,” i.e. texts whose original has been translated or localized into other languages. Documents from the United Nations and the European Parliament, for example, have proven to be a particularly good source of such texts. However, most of the data comes from the web, which raises a problem: as Estelle puts it, the data on the web is a “mess.” Even so, the Google Translate system is sophisticated enough to make use of it.
Google Translate uses a phrase-based translation model, which assigns probabilities to three aspects of language. First, there is segmentation: how can a sentence be broken up, and which parts belong together? Second, there is translation: one phrase translates into another. Third, there is distortion: the phrase order can differ between two languages. Google Translate builds its translation model from these probabilities. When a new text is entered, the service searches its data for possible translations and chooses the most probable option.
Google Translate is available as a standalone service, but it is also built into products such as Chrome, where websites can be translated instantly. According to Estelle, this changes the way people use the internet: language barriers are slowly but surely being broken down. More than 200 million people use some form of Google Translate today, 92% of them from outside the US, and the service carries out 1 billion translations a day.
The service may already be quite sophisticated, but Google isn’t stopping here! Estelle says the company plans to make the service available in even more languages and to keep improving translation quality. Google is also aiming for a ubiquitous service and real-time translation.
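The talk does not show Google’s actual training code. The counting idea described above can still be illustrated with IBM Model 1, a classic textbook algorithm that learns word-translation probabilities from parallel sentence pairs using expectation maximization. The sketch below is a toy built on simplifying assumptions: a tiny hand-made corpus, whitespace tokenization and no NULL word. The function names are hypothetical, and production systems are far more elaborate.

```python
from collections import defaultdict

def train_ibm_model1(pairs, iterations=20):
    """Estimate t[(tgt_word, src_word)] = P(tgt_word | src_word) from sentence pairs.

    pairs: list of (source_tokens, target_tokens). Toy version: no NULL word.
    """
    tgt_vocab = {w for _, tgt in pairs for w in tgt}
    uniform = 1.0 / len(tgt_vocab)
    # Start with every co-occurring (target, source) pair equally likely.
    t = {(f, e): uniform for src, tgt in pairs for f in tgt for e in src}

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of target word f aligned to source word e
        total = defaultdict(float)  # expected count of source word e aligned to anything
        # E-step: split each target word's "credit" among the source words in its pair.
        for src, tgt in pairs:
            for f in tgt:
                norm = sum(t[(f, e)] for e in src)
                for e in src:
                    share = t[(f, e)] / norm
                    count[(f, e)] += share
                    total[e] += share
        # M-step: turn expected counts back into probabilities per source word.
        t = {(f, e): c / total[e] for (f, e), c in count.items()}
    return t

def best_translation(t, src_word):
    """Most probable target word for src_word under the trained table."""
    return max((p, f) for (f, e), p in t.items() if e == src_word)[1]

# Hypothetical parallel corpus: German source, English target.
pairs = [
    ("das haus".split(), "the house".split()),
    ("das buch".split(), "the book".split()),
    ("ein buch".split(), "a book".split()),
]
t = train_ibm_model1(pairs)
for word in ["das", "haus", "buch", "ein"]:
    print(word, "->", best_translation(t, word))
```

No single pair tells the model that “haus” means “house.” That association emerges because “the” is increasingly explained by “das”, which appears with “the” in two sentences. This leaves “house” as the best match for “haus”.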
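To make the three ingredients concrete, here is a toy scorer for a short sentence. It enumerates segmentations, phrase translations and phrase orders, multiplies the phrase translation probabilities, and applies a distortion penalty of `alpha` per word jumped. The phrase table, its probabilities and the value of `alpha` are invented for illustration. Real phrase-based systems also use a language model and a beam search rather than brute-force enumeration, so this is not how Google Translate itself is implemented.

```python
from itertools import permutations, product

# Hypothetical toy phrase table: source phrase -> [(target phrase, probability)].
PHRASE_TABLE = {
    ("la",): [("the", 0.9)],
    ("maison",): [("house", 0.8), ("home", 0.2)],
    ("bleue",): [("blue", 0.9)],
    ("la", "maison"): [("the house", 0.6)],
    ("maison", "bleue"): [("blue house", 0.9)],
}

def segmentations(words, start=0):
    """Yield every split of words[start:] into contiguous spans (start, end) found in the table."""
    if start == len(words):
        yield []
        return
    for end in range(start + 1, len(words) + 1):
        if tuple(words[start:end]) in PHRASE_TABLE:
            for rest in segmentations(words, end):
                yield [(start, end)] + rest

def distortion(order):
    """Total jump distance between consecutive phrases (end is exclusive; 0 means monotone)."""
    total, prev_end = 0, 0
    for start, end in order:
        total += abs(start - prev_end)
        prev_end = end
    return total

def translate(sentence, alpha=0.5):
    """Return (best_translation, score), score = product of phrase probs * alpha**distortion."""
    words = sentence.split()
    best = (None, 0.0)
    for seg in segmentations(words):          # 1. segmentation
        for order in permutations(seg):       # 3. distortion (phrase reordering)
            options = [PHRASE_TABLE[tuple(words[s:e])] for s, e in order]
            for choice in product(*options):  # 2. phrase translation
                score = alpha ** distortion(order)
                for _, p in choice:
                    score *= p
                if score > best[1]:
                    best = (" ".join(text for text, _ in choice), score)
    return best

print(translate("la maison bleue"))
```

In this toy table, the two-word phrase “maison bleue” → “blue house” beats the monotone word-by-word output (“the house blue”). It also beats a reordered word-by-word output, which reaches the right word order but pays a distortion penalty.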
How to Use the Translation API
After Estelle’s part of the talk, Rohit Khare takes over to tell the audience more about the Translate API. Google uses Google Translate both in its user-facing products and in its internal applications. For example, it is used in international chats, so that users can understand each other.
According to Khare, the core technology of Google Translate is “open for business.” It is part of the Google Cloud Platform, and using the Translate API costs 20 dollars per million characters translated. By comparison, one dollar spent on localization often gets you no further than one or two words, so Khare believes the price is very reasonable.
Google Translate detects the source language by itself and can also handle HTML input automatically. The Translate API is also available in, for example, Google Apps Script. According to Khare, the most important thing clients must realize about the Translate API is that it provides the two key features of statistical machine translation: language detection, and translation between the languages Google supports.
Google is still interested in launching free products, but it offers products for commercial use as well. These commercial products rest on two big pillars: attribution, so that people know they are reading a machine translation, and machine-readable marking, so that machines can recognize machine translations as well.
The Translate API is mainly used by developers, and mobile app developers in particular are discovering its benefits. However, many business process applications use it as well. At Google itself, for example, Khare reveals that bug reports and similar documents are translated into English using the API. The service is also used by media monitoring applications and hybrid translation companies.
These examples are all about internal business use, but most use of the Translate API happens on the web, for example to translate e-commerce reviews. Hotels and restaurants are especially avid users of the service, since translation makes their reviews available to many more potential customers. The API is also used for multimedia content, language learning and social communities.
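The price Khare quotes makes back-of-the-envelope budgeting easy. The helper below simply applies the $20-per-million-characters figure from the talk, and current Google Cloud pricing may differ. For simplicity, it assumes that every Python string character (Unicode code point) is billed as one character. The page count in the usage example is hypothetical.

```python
# Price per million characters as quoted in the talk; check current pricing before relying on it.
PRICE_PER_MILLION_CHARS = 20.0

def count_chars(texts):
    """Total characters across texts, assuming one billed character per code point."""
    return sum(len(text) for text in texts)

def machine_translation_cost(num_chars, price_per_million=PRICE_PER_MILLION_CHARS):
    """Cost in US dollars to machine-translate num_chars characters."""
    return num_chars / 1_000_000 * price_per_million

def chars_per_dollar(price_per_million=PRICE_PER_MILLION_CHARS):
    """How many characters one dollar buys at the given rate."""
    return 1_000_000 / price_per_million

# Hypothetical example: a 500-page manual at roughly 3,000 characters per page.
manual_chars = 500 * 3_000
print(f"{manual_chars:,} characters cost ${machine_translation_cost(manual_chars):.2f}")
print(f"$1 buys about {chars_per_dollar():,.0f} characters")
```

At that rate, one dollar covers tens of thousands of characters, compared with the one or two words Khare says a dollar of localization often buys.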
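To illustrate those two features, here is a minimal sketch of calling the REST interface with only the Python standard library. It assumes the following, all of which should be checked against Google’s current documentation:

- the Cloud Translation API v2 (“Basic”) endpoint at `translation.googleapis.com/language/translate/v2`;
- an API key passed as the `key` query parameter;
- the v2 response shapes used in the parsers.

Leaving out `source` asks the API to detect the source language, and `format="html"` asks it to treat the input as HTML. `YOUR_API_KEY` is a placeholder.

```python
import json
import urllib.parse
import urllib.request

# Assumed v2 (Basic) endpoint; verify against the current Cloud Translation docs.
BASE_URL = "https://translation.googleapis.com/language/translate/v2"

def build_translate_request(texts, target, api_key, source=None, fmt="text"):
    """Build (url, JSON body) for a translate call. Omitting source lets the API detect it."""
    body = {"q": list(texts), "target": target, "format": fmt}
    if source is not None:
        body["source"] = source
    return BASE_URL + "?" + urllib.parse.urlencode({"key": api_key}), body

def build_detect_request(texts, api_key):
    """Build (url, JSON body) for a language-detection call."""
    return BASE_URL + "/detect?" + urllib.parse.urlencode({"key": api_key}), {"q": list(texts)}

def send(url, body):
    """POST the JSON body and return the decoded JSON response (needs network and a valid key)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))

def parse_translations(response):
    """Extract (translatedText, detectedSourceLanguage or None) pairs from a translate response."""
    return [
        (item["translatedText"], item.get("detectedSourceLanguage"))
        for item in response["data"]["translations"]
    ]

def parse_detections(response):
    """Return the top detected language code for each input text."""
    return [candidates[0]["language"] for candidates in response["data"]["detections"]]

# Example (needs network access and a real key):
# url, body = build_translate_request(["<p>Hallo Welt</p>"], "en", "YOUR_API_KEY", fmt="html")
# print(parse_translations(send(url, body)))
```

Keeping request building and response parsing separate from `send` means both can be tested offline against sample responses.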
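The two pillars can be illustrated with a small helper that adds a visible attribution line for readers and a machine-readable language tag for software. The tag uses a BCP 47 private-use subtag of the form `en-x-mtfrom-de` (“English, machine-translated from German”). Treat that form, the default attribution text and the `mt-attribution` class name as assumptions, and check Google’s current attribution requirements before using them.

```python
import html

def mark_machine_translation(translated_html, target_lang, source_lang,
                             attribution="Translated by Google"):
    """Wrap machine-translated HTML so that both people and machines can tell it is MT.

    Assumed convention: a BCP 47 private-use subtag "x-mtfrom-<source>" on the lang attribute.
    translated_html is inserted as-is, so it must already be trusted or sanitized markup.
    """
    # Machine-readable marker: target language plus the language it was translated from.
    lang_tag = f"{target_lang}-x-mtfrom-{source_lang}"
    # Human-readable marker: a visible attribution line (placeholder text).
    notice = f'<p class="mt-attribution">{html.escape(attribution)}</p>'
    return f'<div lang="{html.escape(lang_tag)}">{translated_html}{notice}</div>'

print(mark_machine_translation("<p>Hello world</p>", "en", "de"))
```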
You might think human translation is always preferable to machine translation, but according to Khare it is not. There are cases where machine translation is the better option:
• If you only need the gist of a text, machine translation may be the way to go: human translation is more expensive, and you only need the main points.
• Your reach can greatly improve with machine translation.
• The amount of text you can translate is much greater with machine translation.
• If users have requested the translation themselves and know that they are being shown a machine translation, it is a great option as well.
Khare concludes by saying that Google’s translation services are a great way to help your business enter the global market. They can open up new opportunities in the form of new markets, growth, compliance and new connections. They are definitely an asset for your company!
Have you changed your mind about Google Translate? Can you see how your business could use it? Check out our other blog post, Translation, Internationalization and Localization: Google’s Tips for Taking Your Website Global, which shows where Google’s localization and translation tools can support your business.