Think of five things you could not live without. Chances are they’re around you right now. It is very likely that your computer, your mobile phone, or both are on that list. What made screens ubiquitous in everyday life was essentially our desire to consume, share and spread written information. Even in our hyperconnected world, we remain creatures of habit: we like stories.
An important part of what we do online is to read and write these stories. We read and share news and blogs, we comment on forums, we read reviews and compare products before buying them, we write emails, we post our own story every day. Every day billions of words are written, shared, "tweeted", posted. Shared experiences (Facebook), shared travel (TripAdvisor), shared knowledge (Quora, Wikipedia), shared opinions about products (Amazon)... We sell, trade and share our goods (eBay), our houses (Airbnb), our cars (Uber), our time and professional knowledge, free and paid (Fiverr, oDesk, Elance).
Language: the final frontier
There is a new global economy based on sharing without frontiers, and yet we are limited by language barriers. The experience of a netizen fluent in English is very different from that of someone who speaks only Portuguese, Spanish or Russian. Communicating and sharing information between people who do not share a common language is very difficult, if not impossible. Now that the Internet has existed for several decades, now that we have found so many and such diverse uses for it, now that we have the ability, in theory, to communicate with people across the world, it is time to break the language barrier through technology, through sharing and, inevitably, via our computers and mobile phones.
Google Translate is perhaps the most widely used translation software in the world today. Although there is no official data, we can assume that billions of words are translated every day using this software. Google Translate uses a statistical method: the software analyzes millions of translations of books, official documents and websites, originally made by human beings, to find patterns and estimate the likelihood that the English word "car" is equivalent to the French "voiture", or whether "right", in a given context, should be translated as "droit" or "juste". For each word, the software chooses the option that seems most probable. This probabilistic analysis includes elements of artificial intelligence, to the extent that the software learns to translate better the more it is used.
Machine translation today produces translations of reasonable quality in many language pairs, English to French, Spanish or Portuguese for example. Translation into Asian languages, on the other hand, still gives dismal results. And anyway, reasonable is not enough! You have certainly experienced this already: an automatic translation felt good enough to understand that email written in another language. But did you feel comfortable enough to reply using the machine translation without fear of sounding unnatural or just plain wrong? Probably not. Is there a solution for this?
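For the curious, the statistical idea described above can be shown with a toy sketch. This is not how Google Translate is actually implemented; it only illustrates the principle of counting human translations and picking the most probable option. The example data, the one-word "context" and the names `train` and `translate_word` are all assumptions made up for this illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (English word, context word, French translation).
# It stands in for patterns mined from millions of human translations.
ALIGNED_EXAMPLES = [
    ("car", "drive", "voiture"),
    ("car", "park", "voiture"),
    ("right", "legal", "droit"),
    ("right", "legal", "droit"),
    ("right", "human", "droit"),
    ("right", "answer", "juste"),
    ("right", "answer", "juste"),
]


def train(examples):
    """Count how often each source word was translated as each target word."""
    counts = defaultdict(Counter)
    for source, context, target in examples:
        counts[(source, context)][target] += 1  # evidence for this context
        counts[(source, None)][target] += 1     # context-free fallback
    return counts


def translate_word(counts, source, context=None):
    """Return the most probable translation and its estimated probability."""
    candidates = counts.get((source, context)) or counts.get((source, None))
    if not candidates:
        return source, 0.0  # unknown word: leave it untranslated
    total = sum(candidates.values())
    target, count = candidates.most_common(1)[0]
    return target, count / total


model = train(ALIGNED_EXAMPLES)
print(translate_word(model, "car"))
print(translate_word(model, "right", context="answer"))
print(translate_word(model, "right"))
```

With this toy data, "right" next to "answer" comes out as "juste", while "right" with no context falls back to the overall most frequent choice. Real systems work on whole phrases and far richer context, which is where the difficulty lies.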
Moving beyond statistical translation tools
The online job market for translation has never been so big. Several companies are trying to reinvent the age-old work of the translator and bring it to the internet. However, despite all the tools and aids that translators have gained over the past few years, translation is still viewed more as an art than a science, as work that can only be done by a community of dedicated and experienced professionals. But there is actually a much larger community: according to Wikipedia, there are more bilingual people in the world than monolingual ones. Surprising, right? What if there were a way to harness the power of all these bilingual people who are willing to share their knowledge of languages, and to combine it with the artificial intelligence of software? The result would be a platform able to translate millions of words each day, breaking down the barriers created by languages. This is what we are trying to do with Unbabel: to develop the technology for this approach.
Challenge accepted!
To achieve this, many varied pieces of technology are needed: a statistical translation engine that learns from each translation, and an algorithm that delivers each translation task to the right person according to the languages they understand, the reputation they have earned on the platform, the quality of their work, the topics that interest them, etc.
We would also need a display showing both the original text and the translated text, an advanced text editor, the ability to choose between alternatives for a given word and have that choice carry through to the rest of the text, and software to collect the millions of texts that require translation. All this in a product that is frictionless, where money exchange (when it happens) is mediated by the platform, and where users can have fun: the three ingredients of a successful sharing economy product.
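To make the task-delivery idea above a little more concrete, here is a minimal sketch of how a task might be matched to a person. It does not describe Unbabel's actual algorithm; the `Translator` and `Task` classes, the 0-to-1 reputation scale and the `topic_bonus` weight are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Translator:
    name: str
    languages: set
    reputation: float  # hypothetical scale from 0.0 to 1.0
    topics: set = field(default_factory=set)


@dataclass
class Task:
    source_lang: str
    target_lang: str
    topic: str


def score(translator, task, topic_bonus=0.2):
    """Score a translator for a task, or return None if they cannot do it."""
    if not {task.source_lang, task.target_lang} <= translator.languages:
        return None  # must understand both languages
    bonus = topic_bonus if task.topic in translator.topics else 0.0
    return translator.reputation + bonus


def assign(task, translators):
    """Pick the eligible translator with the highest score, if any."""
    scored = [(score(t, task), t) for t in translators]
    eligible = [(s, t) for s, t in scored if s is not None]
    if not eligible:
        return None
    return max(eligible, key=lambda pair: pair[0])[1]


community = [
    Translator("Ana", {"en", "pt"}, 0.7, {"travel"}),
    Translator("Boris", {"en", "ru"}, 0.9, {"tech"}),
    Translator("Carla", {"en", "pt"}, 0.8, {"tech"}),
]
chosen = assign(Task("en", "pt", "travel"), community)
print(chosen.name if chosen else "no one available")
```

A real platform would also have to weigh availability, past quality on similar texts and fairness between contributors; a single additive score is just the simplest starting point.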
The challenge is huge: imagine a world where we could preserve every language and culture and help them thrive everywhere, and yet still be able to communicate universally. The pieces of technology required for this are already being developed by universities and companies. They only work with human intervention, with the precious help of translators and the bilingual community. This integration may be the key: Man and Machine working together to create a smaller, more connected world.
Guest post written by Vasco Pedro.
Founder and CEO of Unbabel, I am passionate about creating amazing products by combining a set of extraordinarily talented individuals, a thorough understanding of the problem, scalable solutions and a lot of hard work. I have a PhD in Language Technologies from Carnegie Mellon.