Scots on Smartphones

Writin Scots uisin SwiftKey’s predictive keyboard.
A’ve been fasht for a lang time at predictive keyboards wadna recognise Scots ava – ilka time ye uised a perfecklie normal wird, it wad get chynged tae a completelie different Inglis wird at juist happent tae leuk similar.

Sae A wis weel chuft whan ane o ma clients, Scottish Language Dictionaries, gat a email fae Julien Baley fae SwiftKey (a Lunnon-based companie awnt bi Microsoft) twa-three months syne anent addin Scots tae thair predictive keyboard for Android an iOS. A dae aw the data stuff for SLD, sae o coorse A wis chosen tae wirk wi Julien on this.

A extractit the relevant bits fae the new edition o the Concise Scots Dictionary an sent this tae Julien. Forby, A gied him a earlie version o a corpus (a collection o texts) o modren Scots. He separatelie contactit Andy Eagle an gat the heidwirds fae his Online Scots Dictionary.

Suin efter this, Julien sent me the first version o the keyboard. At this pynt, it didna ken the Scots inflections, an it wis makkin some unco substitutions (e.g., aw → oweraw), sae A advised him on the grammar o Scots an on the substitutions. The final bit wis tae leuk at wirds he fund in the corpus at wisna in the dictionars, an the keyboard wis redd.

Ye can doonload SwiftKey on yer Android smartphone the day, but gin ye hae a iPhone, ye maun wait a few mair days (technical issues pat it aff).

SwiftKey will lair fae the wey fowks uise it, sae it’ll get better an better.

A howp this will see monie mair fowks writin Scots wi confidence, an ultimatelie lead tae better support for Scots in programs an on wabsteids. Wad it no be great gin Scots wis supportit in yer spellchecker, in Google Translate, an as Facebook interface leid?

PS: A wis chuft tae see stories aboot this in The National, The Herald an Bella Caledonia.

AlphaDiplomacy Zero?

Diplomacy game photo by condredge.
When I was still at university, I did several courses in AI, and in one of them we spent a lot of time looking at why Go was so hard for computers to play well. I was therefore very impressed when DeepMind created AlphaGo two years ago and started beating professional players, because it happened sooner than I had expected. And I am now overwhelmed by the version called AlphaGo Zero, which is so much better:

Previous versions of AlphaGo initially trained on thousands of human amateur and professional games to learn how to play Go. AlphaGo Zero skips this step and learns to play simply by playing games against itself, starting from completely random play. In doing so, it quickly surpassed human level of play and defeated the previously published champion-defeating version of AlphaGo by 100 games to 0.

It is able to do this by using a novel form of reinforcement learning, in which AlphaGo Zero becomes its own teacher. The system starts off with a neural network that knows nothing about the game of Go. It then plays games against itself, by combining this neural network with a powerful search algorithm. As it plays, the neural network is tuned and updated to predict moves, as well as the eventual winner of the games.
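The self-play idea can be shown in miniature. The sketch below is my own illustration and nothing like AlphaGo Zero’s actual architecture (no neural network, no tree search) — it just applies the same principle, a player that starts from random play and learns purely from games against itself, to the toy game of Nim (take 1–3 stones; whoever takes the last stone wins):

```python
import random

random.seed(0)
N = 5                      # starting number of stones
# One value per (stones-remaining, stones-to-take) pair, shared by both players
Q = {s: {a: 0.0 for a in range(1, min(3, s) + 1)} for s in range(1, N + 1)}
ALPHA, EPS = 0.5, 0.1      # learning rate and exploration rate

def best(s):
    """The currently highest-valued move with s stones remaining."""
    return max(Q[s], key=Q[s].get)

for _ in range(20_000):    # self-play episodes
    s = N
    while s > 0:
        a = random.choice(list(Q[s])) if random.random() < EPS else best(s)
        s2 = s - a
        # Value of the resulting state from the mover's perspective:
        # taking the last stone wins (target 1); otherwise the position is
        # worth minus the opponent's best value (a negamax-style update).
        target = 1.0 if s2 == 0 else -max(Q[s2].values())
        Q[s][a] += ALPHA * (target - Q[s][a])
        s = s2

print(best(5))   # → 1: take one stone, leaving the opponent a losing 4
```

Starting from completely random values, the learner discovers on its own that leaving the opponent a multiple of four stones is a guaranteed win — no human games needed.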

I’m wondering whether the same methodology could be used to create a version of Diplomacy.

The game of Diplomacy was invented by Allan B. Calhamer in 1954. The seven players represent the great powers of pre-WWI Europe, but unlike in many other board games, there are no dice – nothing is random. In effect it’s more like chess for seven players, except for the addition of diplomacy, i.e., negotiation. For instance, if I’m France and attack England on my own, it’s likely our units will simply bounce; to succeed, I need to convince Germany or Russia to join me, or I need to convince England I’m their friend and that it’ll be perfectly safe to move all their units to Russia or Germany without leaving any of them behind.

Implementing a computer version of Diplomacy without the negotiation aspect isn’t much use (or fun), and implementing human negotiation capabilities is a bit beyond the ability of current computational linguistics techniques.

However, why not simply let AlphaDiplomacy Zero develop its own language? It would probably look rather odd to a human observer, perhaps a bit like Facebook’s recent AI experiment.

Well, weirder than this, of course, because Facebook’s Alice and Bob started out with standard English. AlphaDiplomacy Zero might decide that “Jiorgiougj” means “Let’s gang up on Germany”, and that “Oihuergiub” means “I’ll let you have Belgium if I can have Norway.”

It would be fascinating to study this language afterwards. How many words would it have? How complex would the grammar be? Would it be fundamentally different from human languages? How would it evolve over time?

It would also be fascinating for students of politics and diplomacy to study AlphaDiplomacy’s negotiation strategies (once the linguists had translated it). Would it come up with completely new approaches?

I really hope DeepMind will try this out one day soon. It would be truly fascinating, not just as a board game, but as a study in linguistic universals and politics.

It would tick so many of my boxes in one go (linguistics, AI, Diplomacy and politics). I can’t wait!

The future belongs to small and weird languages

Tlingit photo by David~O.
Google Translate and other current machine translation programs are based on bilingual corpora, i.e., collections of translated texts. They translate a text by breaking it into bits, finding similarities in the corpus, selecting the corresponding bits in the other language and then stringing the translation snippets together again. It works surprisingly well, but it means that current machine translation can never get better than existing translations (errors in the corpus will get replicated), and also that it’s practically impossible to add a language that very few translations exist for (this is for instance a challenge for adding Scots, because very few people translate to or from this language).
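The lookup-and-stitch idea can be sketched in a few lines. This is a toy illustration with a hand-made five-entry English–Danish phrase table — real systems learn millions of weighted phrase pairs from the corpus and score competing segmentations — not how Google Translate is actually engineered:

```python
# A toy phrase table, as if extracted from a bilingual corpus:
# each source snippet maps to the target snippet it was aligned with.
PHRASE_TABLE = {
    "the spirit": "ånden",
    "is willing": "er villig",
    "but": "men",
    "the flesh": "kødet",
    "is weak": "er svag",
}

def translate(sentence):
    """Greedily match the longest known phrase at each position,
    then string the target-language snippets together again."""
    words = sentence.lower().split()
    out, i = [], 0
    while i < len(words):
        for j in range(len(words), i, -1):     # try longest match first
            phrase = " ".join(words[i:j])
            if phrase in PHRASE_TABLE:
                out.append(PHRASE_TABLE[phrase])
                i = j
                break
        else:
            out.append(words[i])               # unknown word: copy as-is
            i += 1
    return " ".join(out)

print(translate("The spirit is willing but the flesh is weak"))
# → "ånden er villig men kødet er svag"
```

The limitation described above falls straight out of this design: the system can only ever recombine snippets it has seen translated before, so a language with few existing translations gives it almost nothing to work with.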

My prediction is that the next big breakthrough in computational linguistics will involve deducing meaning from monolingual corpora, i.e., figuring out the meaning of a word by analysing how it’s used. If somebody then manages to construct a computational representation of meaning (perhaps aided by brain research), it should theoretically be possible to translate from one language into another without ever having seen a translation before, by turning language into meaning and back into another language. I’ve no idea when this is going to happen, but I presume Google and other big software companies are throwing big money at this problem, so it might not be too far away. My gut feeling would be 10–20 years from now.
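The “meaning from usage” idea has a long pedigree in distributional semantics, and its core can be shown with a toy co-occurrence model — a sketch on a four-sentence corpus, nowhere near a real meaning representation:

```python
from collections import Counter
from math import sqrt

corpus = (
    "the cat sat on the mat . the dog sat on the rug . "
    "the cat chased the mouse . the dog chased the cat ."
).split()

def context_vector(word, window=2):
    """Count the words appearing within `window` positions of `word` —
    the word's usage profile, standing in for its meaning."""
    vec = Counter()
    for i, w in enumerate(corpus):
        if w == word:
            lo, hi = max(0, i - window), min(len(corpus), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vec[corpus[j]] += 1
    return vec

def cosine(a, b):
    """Similarity of two usage profiles (1.0 = identical contexts)."""
    dot = sum(a[k] * b[k] for k in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm

# "cat" and "dog" occur in near-identical contexts, so their vectors
# end up much closer to each other than either is to "mat":
print(cosine(context_vector("cat"), context_vector("dog")))
```

Even this tiny model groups the two animals together purely from how they are used — scale the corpus up by a few orders of magnitude and the same principle starts to capture surprisingly fine shades of meaning.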

Interestingly, once this form of machine translation has been invented, translating between two language varieties will be just as easy as translating between two separate languages. So you could translate a text in British English into American English, or formal language into informal, or Geordie into Scouse. You could even ask for Wuthering Heights as J.K. Rowling would have written it.

Also, the computer could analyse your use of language and start mimicking it – using the same words and phrases with the same pronunciation. In effect, it could start sounding like you (or like your mum, Alex Salmond or Marilyn Monroe if you so desired).

This will have huge repercussions for dialects and small languages.

At the moment, we’re surrounded by big languages – they dominate written materials as well as TV and movies, and most computer interfaces work best in them. It’s also hard to speak a non-standard variety of a big language, because speech recognition and machine translation programs tend to fall over when the way you speak doesn’t conform. Scottish people are very aware of this, as shown by the famous elevator sketch.

However, if my predictions come true, all of that will change. As soon as a corpus exists (and that can include spoken language, not just written texts), the computer should be able to figure out how to speak and understand this variety. Because translation is always easier and more accurate between similar language varieties than between very different ones, people might prefer to get everything translated or dubbed into their own variety. So you will never need to hear RP or American English again if you don’t want to – you can get everything in your own variety of Scottish English instead. Or in broad Scots. Or in Gaelic.

Every village used to have its own speech variety (its patois, to use the French term). The Reformation initiated a process of language standardisation, and this got a huge boost when all children started going to school to learn to read and write (not necessarily well, but always in the standard language). When radio was invented, the spoken language started converging, too, and television made this even more ubiquitous. We’re now in a situation where lots of traditional languages and dialects are threatened with extinction.

If computers start being good at picking up the local lingo, all of that will potentially change again. There will be no great incentive to learn a standard variety of a language if your computer can always bridge the gap if other people don’t understand it. The languages of the world might start diverging again. That will be interesting.

Being the god of kitchen plans


I once took a very interesting postgrad course in genetic algorithms (taught by Zbigniew Michalewicz), and since then it’s been a technique I occasionally pull out of my sleeve.

For a number of years I’ve been producing kitchen plans telling all members of the family when they’ve got to cook, set the table, fill the dishwasher and tidy up the kitchen. In my experience, it’s very hard to get kids to do anything if you try to convince them on the day that it’s their turn, but if you put up a plan on the fridge, they’ll moan once, on the day you put it up, and will then get on with it.

However, in a family as big as ours, making such a plan is a very difficult problem. For instance, I can’t cook on Mondays, nobody should do anything on their birthday, Marcel is only here during holidays, and nobody should do two things on the same day or the same task two days in a row, etc., etc. To make it even worse, if anybody has skipped a task the month before (or done too much), the new plan should take it into account.

For a long time, I’ve been using an old-fashioned program to generate these plans, and its complexity has grown and grown with time. At the same time, the quality has decreased because it was simply getting too hard for the computer to satisfy all the competing requirements.

I therefore decided to employ genetic algorithms.

I created a world consisting of 27 islands, each with a population of kitchen plans (initially they were all random, but consisting of the right number of tasks for each person). I then let them live their rich and satisfying lives, having sex, producing offspring (sometimes with random mutations), and as a good Darwinist I ensured only the fittest individuals survived to have kids. After 200 generations, I took the fittest plan from each island, put them all on a new island of champions and let them evolve for another 1000 generations. I then took the fittest plan of all time, put it on our fridge, and extinguished the world. Basically I was the God of kitchen plans for ten minutes.

The result is so much better than the old plans, but then this type of problem really lends itself to the genetic approach: It’s very easy to assign a fitness value to a plan (-5 points if I cook on a Monday, -2 points for a repeated task, etc.) — in fact it’s much easier than actually producing a plan — and it’s also easy to figure out how to breed plans (e.g., take all the cooking tasks from one plan and combine them with the other tasks from another one).
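A stripped-down version of the approach fits in a page. The names and constraints below are made up for illustration (my real program has many more rules, plus the island model described above), but the fitness-then-breed skeleton is the same:

```python
import random

random.seed(1)
PEOPLE = ["Anna", "Ben", "Cora"]     # hypothetical family members
DAYS = 14                            # length of the plan

def random_plan():
    """A plan is simply one cook per day, initially chosen at random."""
    return [random.choice(PEOPLE) for _ in range(DAYS)]

def fitness(plan):
    """Higher is better: penalise back-to-back duty and unfair workloads."""
    score = 0
    for i in range(1, DAYS):
        if plan[i] == plan[i - 1]:
            score -= 2               # same cook two days in a row
    counts = [plan.count(p) for p in PEOPLE]
    score -= max(counts) - min(counts)   # uneven share of the work
    return score

def breed(a, b):
    """Offspring: first half of one plan, second half of the other,
    with an occasional random mutation."""
    child = a[:DAYS // 2] + b[DAYS // 2:]
    if random.random() < 0.1:
        child[random.randrange(DAYS)] = random.choice(PEOPLE)
    return child

population = [random_plan() for _ in range(50)]
for _ in range(200):                 # 200 generations
    population.sort(key=fitness, reverse=True)
    survivors = population[:20]      # only the fittest get to have kids
    population = survivors + [breed(*random.sample(survivors, 2))
                              for _ in range(30)]

best = max(population, key=fitness)
print(fitness(best))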

That’s basically the basis for successful evolution: If it’s hard to calculate the right answer, but you can rank individuals by fitness and figure out how to let the fit ones have sex, then your solution will evolve over time, and as a bonus, you will be a god for a little while.

Converting photos into embroidery

I recently discovered that VistaPrint have started selling embroidered polo shirts.

However, their embroidery program is rather fussy with regards to the images it can deal with, so you can’t just upload a normal photo. As they write, there must be “no small lettering or tiny detail” and “no photographic imagery or gradients”.

So what do you do if you want to get a photo embroidered? Here’s what I did to the photo below:

Marcel before and after embroidery.

I first opened up a photo in the Gimp, cut out Marcel’s head and placed it on a white background. The result is the photo on the left.

I then opened up this photo in Inkscape and selected Path->Trace Bitmap. I then selected Colours and specified a low number of colours. Some of the resulting colours were rather similar, so I changed them to something very different. Finally I exported it as a bitmap.

This bitmap was now acceptable to VistaPrint, so I could change the colours back to something more reasonable, and their embroidery preview is shown on the right. In many cases, their program will still complain, so you might need to simplify the paths in Inkscape, reduce the number of colours, or use a simpler photo to start with.
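The colour-reduction step can also be done programmatically. The snippet below is a pure-Python sketch of the idea — snapping every channel to a few evenly spaced values, which collapses gradients into flat areas of colour; on a real photo you’d use a library routine such as Pillow’s Image.quantize instead:

```python
def posterize(pixels, levels=4):
    """Snap each RGB channel to one of `levels` evenly spaced values,
    turning smooth gradients into flat patches an embroidery program
    can cope with."""
    step = 255 / (levels - 1)
    return [tuple(int(round(c / step) * step) for c in px) for px in pixels]

# A horizontal gradient: 256 pixels fading from black to red
# (in practice these would be the pixels of your photo).
gradient = [(x, 0, 0) for x in range(256)]
flat = posterize(gradient, levels=4)

print(len(set(flat)))   # → 4: the 256 shades collapse to four colours
```

Inkscape’s Trace Bitmap with a low colour count does essentially this, plus vectorisation — which is why the traced image passes VistaPrint’s “no gradients” check.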

So long as you start with a reasonable photo, you should be able to create a beautiful embroidered polo shirt in this way. Have fun!

Will Google Glass lead to Google Glove?

Google Glass Prototype, a photo by Ars Electronica on Flickr.
I’m really excited by the upcoming release of Google Glass. Until now, there’s been an inherent conflict in smartphone design between creating a huge screen and making the device small enough that you can be bothered carrying it around at all times. (It’s interesting how mobile phones were getting smaller and smaller until the advent of smartphones meant a larger screen was required, at which point they started growing again.)

Although Google Glass is looking great, I’m sure it’ll evolve rapidly over the next few years. Apart from increasing the resolution, I expect it to expand from one eye to two, to allow for three-dimensional display. I also wonder whether it’s really the best idea to put the display above the line of vision rather than below it — if you’re using it for reading a book, surely it must feel like holding the book above your head.

However, the main area for improvement is how you interact with it. Google Glass apparently requires you to touch the frame to control it, which is essentially one-dimensional and tiring. But traditional devices such as keyboards, mice and touch-screens are not going to be very effective, either. I’m not sure what they’ll come up with, but I won’t be surprised if Google Glass 3D in 2016 is accompanied by a Google Glove (or perhaps just by small sensors on your finger nails).

The spirit was willing but the flesh was weak

When I was a linguistics fresher back in 1990, we were told a well-known anecdote about the early days of machine translation: When the sentence “The spirit is willing, but the flesh is weak” (an allusion to Mark 14:38) was translated into Russian and then back to English, the result was “The vodka is good, but the meat is rotten”.

I vaguely remember trying out this sentence in the early days of Google Translate, with amusing results.

However, I recently decided to try it again, and imagine my surprise when I realised that Google Translate can translate this exact phrase into any of the available languages and back into English without making a single error.

The obvious explanation is that Google must have added Mark 14:38 to the training corpus to ensure that nobody mocks them for getting it wrong.

However, it’s only this specific sentence that it handles so well. As soon as you start moving the words around or adding extra words, the quality of the translation decreases. For instance, “The spirit is willing, but the flesh is weak” becomes “Ånden er rede, men kødet er skrøbeligt” when translated into Danish, but “The spirit in the bottle is willing, but the flesh in the box is weak” becomes “Ånden i flasken er villig, men kødet i boksen er svag”. I’m not saying this translation is bad, but I find it interesting how it suddenly becomes unable to add the neuter -t to svag, although it managed perfectly well to add it to skrøbelig.

It’s quite interesting to investigate how Google Translate handles the individual words in this sentence. For instance, in the case of translating “spirit”, it appears the singular normally triggers the soul sense, whereas the plural triggers the alcohol sense. The result is that “The house of the spirits” gets translated into Danish as “Huset af spiritus” (“The house of alcohol”) rather than the expected “Åndernes hus”.