What is the purpose of Dansk Sprognævn?

Several articles in today's Danish newspapers describe the relocation of Dansk Sprognævn (the Danish Language Council) from Copenhagen to Bogense, primarily from the employees' point of view (and they are naturally not happy).

The newspapers are rather uncritical of the council's description of itself. For example, Berlingske has a lengthy interview with the council's director, Sabine Kirchmeier, in which she is not asked many critical questions, not even when her statements are quite subjective, as in the following:

Dansk Sprognævn is not a closed environment where we each sit behind our own computer, typing. We are a very outward-looking institution that runs research projects and seminars and all sorts of other activities. Some of us are external examiners at the universities and teach at the universities, and we have PhD students who are supervised by us, and students who write their master's theses with us. In many ways we function as a university department, and if people in Moderniseringsstyrelsen (the Agency for Modernisation), or wherever the decision was made, had read our annual report, they would have seen that we also receive visits from school classes and hold seminars on language with public institutions and educational institutions. And so on and so forth.

The activities she describes here do indeed sound very much as if they belong in a university department. This is supported by her answer to the reasonable question of whether they couldn't simply hire some other skilled language professionals: “No, because there aren't very many people with a PhD in our field. New employees first have to be trained for the job, and it takes three years to get a relevant PhD.” I have worked in the dictionary business for over fifteen years, and this is the first time I have heard of a place where a PhD is required to work there (whereas it is very normal at universities).

And according to the law, DSN is not a university department at all:

§ 1. Dansk Sprognævn is a state institution whose task is to follow the development of the Danish language, to give advice and information about the Danish language, and to determine Danish orthography.

Subsection 2. The council shall

1. collect new words, phrases and word usages, including abbreviations,

2. answer linguistic questions from authorities and the public about the structure and use of the Danish language, including giving guidance on the spelling and pronunciation of foreign names,

3. publish writings about the Danish language, in particular guides to the use of the mother tongue, and cooperate with terminology bodies, dictionary editorial offices and public institutions that authorise or register place names, personal names and product names.

Subsection 3. Dansk Sprognævn shall work on a scientific basis. In its work the council shall take into account the function of the language as a bearer of tradition and cultural continuity and as a mirror of contemporary culture and social conditions.

Subsection 4. In matters concerning the relationship with other languages, the council negotiates with corresponding bodies in the countries concerned. The council shall in particular cooperate with language councils and corresponding bodies in the Nordic countries.

[…]

§ 2. Dansk Sprognævn edits and publishes the official Danish orthographic dictionary. The orthography determined by the council is published therein.

Subsection 2. In connection with the publication of new editions of the orthographic dictionary, the council may on its own initiative make changes and updates of a non-fundamental nature.

[…]

§ 3. The council issues a report on its work every year. In the report or by other means, the council publishes at least once a year a selection of the statements it has issued during the year.

It seems to me that the council has over-interpreted “to follow the development of the Danish language” “on a scientific basis”. The two parts are not in the same sentence, but it seems to me that they have taken it as carte blanche to create a university department that researches the development of the Danish language. The problem is that this is not necessarily what they get their money for. This is supported by an email from the minister to Politiken (no free link):

The Minister for Culture, Mette Bock (LA), denies that the relocation will be a problem for the council.

“Dansk Sprognævn today carries out important tasks for the whole country, which they can also carry out from Bogense. The jobs will be a great gain for the town,” she writes in an email to Politiken. […] “The council today carries out two main tasks, which concern giving advice about language and following the development of the language. These are tasks that are carried out digitally and by telephone. That does not require a particular physical location in Copenhagen,” writes Mette Bock.

It is of course really rotten for the affected employees, who thought they were going to work at a university department attached to the University of Copenhagen and not at a dictionary publisher in Bogense, but that is not really the government's problem.

It wouldn't surprise me at all if they have been discreetly asked to scale down their academic activities (that could, for example, explain their last move from the university to the old Radio House), but have not wanted to listen. Their structure (with a director, a board and a board of representatives) could well make it hard for the sitting Minister for Culture to force through a change by any means other than moving them.

If Dansk Sprognævn sees itself as a university department, they should probably call for the law to be rewritten so that it matches reality, and then they should probably be transferred to one of the universities. The alternative is probably to move to Bogense and hire some new employees who are good at carrying out the tasks the law has determined they should perform.

Emil fra Lønneberg and Father Christmas

Emil and Father Christmas.
Anna (who has just turned ten) reads aloud to me for a bit every evening to get better at Danish (and I read aloud to her, too). At the moment she is reading Emil fra Lønneberg, and it's going quite well.

Sometimes it does go wrong, though, as for example when she cheerfully said the following:

Emil peered up into the chimney, and then he saw something funny. In the hole right above his head hung a red Father Christmas (“julemand”), looking down at him.

“Hello there,” said Emil. “Now you'll see someone who can climb!”

In the original the word is “julemåne” (a moon) rather than “julemand” (Father Christmas), but that's not nearly as funny!

In the same way she read the following a couple of pages later, but that was perhaps on purpose, for she did it with a mischievous smile:

But in the Katholt lake, among white water lilies, Emil and Alfred swam around in the cool water, and in the sky hung Father Christmas (“julemanden”), red as a lantern, shining for them.

“You and me, Alfred,” said Emil.

“Yes, you and me, Emil,” said Alfred. “I should think so!”

Incidentally, Phyllis decided to test Léon on the first passage, and he made the very same mistake as Anna, so it must be an obvious mistake for Danish-Scots.

Scots on Smartphones

Writing Scots using SwiftKey’s predictive keyboard.
I’ve long been annoyed that predictive keyboards wouldn’t recognise Scots at all – every time you used a perfectly normal word, it would get changed to a completely different English word that just happened to look similar.

So I was well chuffed when one of my clients, Scottish Language Dictionaries, got an email from Julien Baley from SwiftKey (a London-based company owned by Microsoft) a few months ago about adding Scots to their predictive keyboard for Android and iOS. I do all the data stuff for SLD, so of course I was chosen to work with Julien on this.

I extracted the relevant bits from the new edition of the Concise Scots Dictionary and sent this to Julien. In addition, I gave him an early version of a corpus (a collection of texts) of modern Scots. He separately contacted Andy Eagle and got the headwords from his Online Scots Dictionary.

Soon after this, Julien sent me the first version of the keyboard. At this point, it didn’t know the Scots inflections, and it was making some strange substitutions (e.g., aA oweraw), so I advised him on the grammar of Scots and on the substitutions. The final bit was to look at words he found in the corpus that weren’t in the dictionaries, and the keyboard was ready.

You can download SwiftKey on your Android smartphone today, but if you have an iPhone, you must wait a few more days (technical issues put it off).

SwiftKey will learn from the way folk use it, so it’ll get better and better.

I hope this will see many more folk writing Scots with confidence, and ultimately lead to better support for Scots in programs and on websites. Wouldn’t it be great if Scots was supported in your spellchecker, in Google Translate, and as a Facebook interface language?

PS: I was chuffed to see stories about this in The National, The Herald and Bella Caledonia.

AlphaDiplomacy Zero?

When I was still at university, I did several courses in AI, and in one of them we spent a lot of time looking at why Go was so hard to implement. I was therefore very impressed when DeepMind created AlphaGo two years ago and started beating professional players, because it was sooner than I had expected. And I am now overwhelmed by the version called AlphaGo Zero, which is so much better:

Previous versions of AlphaGo initially trained on thousands of human amateur and professional games to learn how to play Go. AlphaGo Zero skips this step and learns to play simply by playing games against itself, starting from completely random play. In doing so, it quickly surpassed human level of play and defeated the previously published champion-defeating version of AlphaGo by 100 games to 0.

It is able to do this by using a novel form of reinforcement learning, in which AlphaGo Zero becomes its own teacher. The system starts off with a neural network that knows nothing about the game of Go. It then plays games against itself, by combining this neural network with a powerful search algorithm. As it plays, the neural network is tuned and updated to predict moves, as well as the eventual winner of the games.

I’m wondering whether the same methodology could be used to create a version of Diplomacy.

The game of Diplomacy was invented by Allan B. Calhamer in 1954. The seven players represent the great powers of pre-WWI Europe, but unlike many other board games, there are no dice – nothing is random. In effect it’s more like chess for seven players, except for the addition of diplomacy, i.e., negotiation. For instance, if I’m France and attack England on my own, it’s likely our units will simply bounce; to succeed, I need to convince Germany or Russia to join me, or I need to convince England I’m their friend and that it’ll be perfectly safe to move all their units to Russia or Germany without leaving any of them behind.

Implementing a computer version of Diplomacy without the negotiation aspect isn’t much use (or fun), and implementing human negotiation capabilities is a bit beyond the ability of current computational linguistics techniques.

However, why not simply let AlphaDiplomacy Zero develop its own language? It will probably look rather odd to a human observer, perhaps a bit like Facebook’s recent AI experiment:

Well, weirder than this, of course, because Facebook’s Alice and Bob started out with standard English. AlphaDiplomacy Zero might decide that “Jiorgiougj” means “Let’s gang up on Germany”, and that “Oihuergiub” means “I’ll let you have Belgium if I can have Norway.”

It would be fascinating to study this language afterwards. How many words would it have? How complex would the grammar be? Would it be fundamentally different from human languages? How would it evolve over time?

It would also be fascinating for students of politics and diplomacy to study AlphaDiplomacy’s negotiation strategies (once the linguists had translated it). Would it come up with completely new approaches?

I really hope DeepMind will try this out one day soon. It would be truly fascinating, not just as a board game, but as a study in linguistic universals and politics.

It would tick so many of my boxes in one go (linguistics, AI, Diplomacy and politics). I can’t wait!

The future belongs to small and weird languages

Google Translate and other current machine translation programs are based on bilingual corpora, i.e., collections of translated texts. They translate a text by breaking it into bits, finding similarities in the corpus, selecting the corresponding bits in the other language and then stringing the translation snippets together again. It works surprisingly well, but it means that current machine translation can never get better than existing translations (errors in the corpus will get replicated), and also that it’s practically impossible to add a language that very few translations exist for (this is for instance a challenge for adding Scots, because very few people translate to or from this language).
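To make the "breaking it into bits and stringing the snippets together" idea concrete, here is a minimal sketch in Python. It is not how Google Translate is actually implemented; the phrase table, its English-to-Danish entries and the function name `translate` are all made up for illustration, standing in for snippets mined from a real bilingual corpus.

```python
# Toy phrase table: every entry is hypothetical, standing in for
# snippets that a real system would mine from a bilingual corpus.
PHRASE_TABLE = {
    ("good", "morning"): ["god", "morgen"],
    ("thank", "you"): ["tak"],
    ("good",): ["god"],
    ("friend",): ["ven"],
}


def translate(sentence, table=PHRASE_TABLE):
    """Greedy longest-match lookup; unknown words pass through unchanged."""
    words = sentence.lower().split()
    longest = max(len(source) for source in table)
    output, i = [], 0
    while i < len(words):
        # Try the longest snippet first, then fall back to shorter ones.
        for size in range(min(longest, len(words) - i), 0, -1):
            chunk = tuple(words[i:i + size])
            if chunk in table:
                output.extend(table[chunk])
                i += size
                break
        else:
            # No translation has ever been seen for this word.
            output.append(words[i])
            i += 1
    return " ".join(output)
```

With this table, `translate("Good morning friend")` gives `god morgen ven`, whereas a word that never occurs in the table (say the Scots word "bairn") is simply copied across untranslated, which is the small-language problem in miniature.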

My prediction is that the next big break-through in computational linguistics will involve deducing meaning from monolingual corpora, i.e., figuring out the meaning of a word by analysing how it’s used. If somebody then manages to construct a computational representation of meaning (perhaps aided by brain research), it should then theoretically be possible to translate from one language into another without ever having seen a translation before, by turning language into meaning and back into another language. I’ve no idea when this is going to happen, but I presume Google and other big software companies are throwing big money at this problem, so it might not be too far away. My gut feeling would be 10–20 years from now.
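A very rough sketch of what "figuring out the meaning of a word by analysing how it’s used" could look like is shown below. It only counts neighbouring words and compares the counts, which is a long way from a real representation of meaning; the tiny corpus and the names `context_vectors` and `cosine` are assumptions made for the example.

```python
from collections import Counter, defaultdict
from math import sqrt

# Tiny made-up monolingual corpus; a real one would hold millions of sentences.
CORPUS = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the mouse",
    "the dog chases the ball",
    "the car needs petrol",
]


def context_vectors(sentences, window=2):
    """Count, for each word, the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for i, word in enumerate(words):
            lo, hi = max(0, i - window), min(len(words), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][words[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a if key in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


vectors = context_vectors(CORPUS)
```

In this toy corpus, "cat" and "dog" turn up in nearly the same contexts, so `cosine(vectors["cat"], vectors["dog"])` comes out higher than `cosine(vectors["cat"], vectors["car"])`, without the program ever having seen a translation or a definition.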

Interestingly, once this form of machine translation has been invented, translating between two language varieties will be just as easy as translating between two separate languages. So you could translate a text in British English into American English, or formal language into informal, or Geordie into Scouse. You could even ask for Wuthering Heights as J.K. Rowling would have written it.

Also, the computer could be analysing your use of language and start mimicking it – using the same words and phrases with the same pronunciation. In effect, it could start sounding like you (or like your mum, Alex Salmond or Marilyn Monroe if you so desired).

This will have huge repercussions for dialects and small languages.

At the moment, we’re surrounded by big languages – they dominate written materials as well as TV and movies, and most computer interfaces work best in them. It’s also hard to speak a non-standard variety of a big language, because speech recognition and machine translation programs tend to fall over when the way you speak doesn’t conform. Scottish people are very aware of this, as shown by the famous elevator sketch:

However, if my predictions come true, all of that will change. As soon as a corpus exists (and that can include spoken language, not just written texts), the computer should be able to figure out how to speak and understand this variety. Because translation is always easier and more accurate between similar language varieties than between very different ones, people might prefer to get everything translated or dubbed into their own variety. So you will never need to hear RP or American English again if you don’t want to – you can get everything in your own variety of Scottish English instead. Or in broad Scots. Or in Gaelic.

Every village used to have its own speech variety (its patois to use the French term). The reformation initiated a process of language standardisation, and this got a huge boost when all children started going to school to learn to read and write (not necessarily well, but always in the standard language). When radio was invented, the spoken language started converging, too, and television made this even more ubiquitous. We’re now in a situation where lots of traditional languages and dialects are threatened with extinction.

If computers start being good at picking up the local lingo, all of that will potentially change again. There will be no great incentive to learn a standard variety of a language if your computer can always bridge the gap if other people don’t understand it. The languages of the world might start diverging again. That will be interesting.

Scots-medium schools

Bokmål and Nynorsk.
If Scots is a language – and it’s almost universally accepted today that this is the case – why is it treated as a regional accent by schools?

The typical approach is to learn a few songs in Scots, and perhaps even to read a short story or a play in high school, but not much else. There’s also now an awareness that kids shouldn’t get told off for talking Scots or using Scots words when speaking English. Surely this approach only makes sense if Scots is some variety of English.

If Scots is a language then it should be taught in separate classes, not as part of English lessons. And using Scots words when speaking English should be regarded as a case of code-switching – something which is common in all bilingual areas, but hardly a thing to be encouraged.

And last but not least, we should have Scots-medium schools. It’s absolutely wonderful that we have so many Gaelic-medium schools in Scotland now, but surely we should have Scots-medium ones, too. Schools where Scots is the language of tuition, apart perhaps from the English lessons.

It could be similar to the situation in Norway, where all pupils have to learn both Bokmål (similar to Danish) and Nynorsk (based on the dialects). However, some schools are Bokmål-medium and teach Nynorsk as a separate subject, and others are Nynorsk-medium and teach Bokmål as a subject.

Surely we could do the same here? Of course it will take a while to get there – the teachers will need training (even if they’re native speakers of Scots), and a lot of textbooks would need to be translated – but it would do wonders for the Scots language.

Relearning how to spell

Lower case a with circumflex.
The #JeSuisCirconflexe shitstorm that is currently engulfing France is a reminder of how hard it is to implement an orthographic reform. People who witnessed Denmark’s “mayonnaise war” (when the Danish language academy wanted to change the spelling of mayonnaise to majonæse) or the German spelling reform fights will not be surprised. People who’ve invested many hours in becoming good spellers in order to feel clever and superior simply don’t want any reforms that make them worse at spelling than primary school children.

This is probably the reason why systematic spelling reforms that are really easy to learn often get accepted without too much of a fight. For instance, it’s my impression that the change of “aa” to “å” in Danish in 1948 was implemented without too much pain (albeit slowly because typewriters and typesetters didn’t have access to that letter at first), and the bit of the German spelling reform that changed “ß” to “ss” after a short vowel (but not after a long one) was much less contested than the other changes (such as the change from “radfahren” to “Rad fahren”) that require more of an effort to remember.

I therefore suspect that the French would have been happier with a reform that dropped all the circumflexes rather than the one at hand that removes it in coût and paraître but keeps it in and je croîs (“I grow”). It’s simply too hard to learn the new rules for people who’ve left school already.

What does this mean for the prospects for changing the spelling of the English language? (Let’s just ignore for a moment the fact that there isn’t any language board that could instigate such a reform – it would be relatively easy for the major dictionary publishers of the English-speaking world to get together and create one if there was a demand.)

Some reforms that would seem straightforward in one part of the world are of course impossible because of pronunciation differences. For instance, many Americans use the same vowel in father and hot, but changing the spelling of the former to fother would be a disaster elsewhere. In the same way, people from southern England might want to drop the silent r’s, but of course they’re not silent in Scotland and most of America. Even changes that would be popular in most places would often face steep resistance in small areas – for instance only people from Scotland and Northern Ireland would object strongly to changing the spelling of bird and nerd to burd and nurd, but they really wouldn’t be popular here.

A reform that changed those words that go against all the normal rules – e.g., gauge → gaige, debt → det, night → nite – would be eminently sensible, but the experience from other languages makes me think it would face enormous resistance, especially if the new spellings were made obligatory rather than just optional variants.

The only type of reform that would stand a chance would probably be wholesale changes of letters or letter groups, such as changing “ph” to “f(f)” or initial “x” to “z”, but to be honest changes like these wouldn’t make English significantly easier to spell, and what’s the point in that case?
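Just to show how mechanical such a wholesale change would be, here is a small Python sketch. It is only an illustration under simplifying assumptions: it always writes a single “f” (ignoring the “ff” option), and the function name `wholesale_reform` is made up.

```python
import re


def wholesale_reform(text):
    """Apply two blanket rules: every "ph" becomes "f", initial "x" becomes "z"."""
    def swap(replacement):
        def apply(match):
            # Preserve a leading capital, e.g. "Philip" -> "Filip".
            if match.group(0)[0].isupper():
                return replacement.upper()
            return replacement
        return apply

    text = re.sub(r"ph", swap("f"), text, flags=re.IGNORECASE)
    text = re.sub(r"\bx", swap("z"), text, flags=re.IGNORECASE)
    return text
```

For example, `wholesale_reform("The philosopher played the xylophone")` returns `The filosofer played the zylofone`. A blanket rule like this also turns "uphill" into "ufill", so even the simplest reform would need a list of exceptions.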

A proper English spelling reform would be marvellous, but I doubt it’ll happen during my lifetime.