Google Translate and other current machine translation programs are based on bilingual corpora, i.e., collections of translated texts. They translate a text by breaking it into bits, finding similarities in the corpus, selecting the corresponding bits in the other language and then stringing the translation snippets together again. It works surprisingly well, but it means that current machine translation can never get better than existing translations (errors in the corpus will get replicated), and also that it’s practically impossible to add a language that very few translations exist for (this is for instance a challenge for adding Scots, because very few people translate to or from this language).
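The snippet-matching idea can be sketched in a few lines. This is a toy illustration, not how Google Translate is actually implemented: the phrase table here is made up by hand, standing in for the phrase pairs that a real system would mine from a bilingual corpus.

```python
# Toy sketch of corpus-based translation: a phrase table maps source
# snippets to target snippets; the sentence is split greedily into the
# longest phrases found in the table, and the translated snippets are
# strung back together. (Hand-made table; a real one comes from a corpus.)

PHRASE_TABLE = {
    "good morning": "guten Morgen",
    "how are you": "wie geht es dir",
    "my friend": "mein Freund",
}

def translate(sentence: str) -> str:
    words = sentence.lower().split()
    out, i = [], 0
    while i < len(words):
        # try the longest phrase starting at position i first
        for j in range(len(words), i, -1):
            phrase = " ".join(words[i:j])
            if phrase in PHRASE_TABLE:
                out.append(PHRASE_TABLE[phrase])
                i = j
                break
        else:
            out.append(words[i])  # unknown bit: copied through untranslated
            i += 1
    return " ".join(out)

print(translate("good morning my friend"))  # guten Morgen mein Freund
```

Notice how the limitations described above fall out of the design: a wrong entry in the table is replicated in every output, and a language with no table at all simply cannot be translated.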
My prediction is that the next big breakthrough in computational linguistics will involve deducing meaning from monolingual corpora, i.e., figuring out the meaning of a word by analysing how it’s used. If somebody then manages to construct a computational representation of meaning (perhaps aided by brain research), it should theoretically be possible to translate from one language into another without ever having seen a translation before, by turning language into meaning and back into another language. I’ve no idea when this is going to happen, but I presume Google and other big software companies are throwing big money at this problem, so it might not be too far away. My gut feeling would be 10–20 years from now.
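The core of the “meaning from usage” idea can be shown with a minimal sketch. The assumption (the distributional hypothesis) is that words used in similar contexts have similar meanings, so a word can be represented by counts of its neighbouring words. The tiny corpus below is invented for illustration:

```python
# Sketch of deducing meaning from a monolingual corpus: each word is
# represented by a vector of the words it co-occurs with. Words used in
# similar contexts ("cat", "dog") end up with similar vectors.

from collections import Counter
from math import sqrt

corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the dog",
    "the dog chases the cat",
]

def context_vector(target, sentences, window=2):
    """Count the words appearing within `window` positions of `target`."""
    counts = Counter()
    for s in sentences:
        words = s.split()
        for i, w in enumerate(words):
            if w == target:
                for j in range(max(0, i - window), min(len(words), i + window + 1)):
                    if j != i:
                        counts[words[j]] += 1
    return counts

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb)

cat, dog, milk = (context_vector(w, corpus) for w in ("cat", "dog", "milk"))
# "cat" and "dog" occur in near-identical contexts, so their vectors are
# more similar to each other than either is to "milk"
print(cosine(cat, dog) > cosine(cat, milk))  # True
```

Turning such usage-based representations into a full representation of meaning that could anchor translation is, of course, the hard part of the prediction.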
Interestingly, once this form of machine translation has been invented, translating between two language varieties will be just as easy as translating between two separate languages. So you could translate a text in British English into American English, or formal language into informal, or Geordie into Scouse. You could even ask for Wuthering Heights as J.K. Rowling would have written it.
Also, the computer could analyse your use of language and start mimicking it – using the same words and phrases with the same pronunciation. In effect, it could start sounding like you (or like your mum, Alex Salmond or Marilyn Monroe if you so desired).
This will have huge repercussions for dialects and small languages.
At the moment, we’re surrounded by big languages – they dominate written materials as well as TV and movies, and most computer interfaces work best in them. It’s also hard to speak a non-standard variety of a big language, because speech recognition and machine translation programs tend to fall over when the way you speak doesn’t conform to the standard. Scottish people are very aware of this, as shown by the famous elevator sketch.
However, if my predictions come true, all of that will change. As soon as a corpus exists (and that can include spoken language, not just written texts), the computer should be able to figure out how to speak and understand this variety. Because translation is always easier and more accurate between similar language varieties than between very different ones, people might prefer to get everything translated or dubbed into their own variety. So you will never need to hear RP or American English again if you don’t want to – you can get everything in your own variety of Scottish English instead. Or in broad Scots. Or in Gaelic.
Every village used to have its own speech variety (its patois, to use the French term). The Reformation initiated a process of language standardisation, and this got a huge boost when all children started going to school to learn to read and write (not necessarily well, but always in the standard language). When radio was invented, the spoken language started converging, too, and television made this even more ubiquitous. We’re now in a situation where lots of traditional languages and dialects are threatened with extinction.
If computers start being good at picking up the local lingo, all of that will potentially change again. There will be no great incentive to learn a standard variety of a language when your computer can always bridge the gap for people who don’t understand your own. The languages of the world might start diverging again. That will be interesting.