OK Computer: how to work with automation and AI on the web


Automated systems powered by new breakthroughs in Artificial Intelligence will soon begin to have an impact on the web industry. People working on the web will have to learn new design disciplines and tools to stay relevant. Based on the talk “OK Computer” that I gave at a number of conferences in Autumn 2015.

In 1996 Intel began work on a supercomputer called ASCI Red. Over the lifetime of its development it cost £43m (adjusted for today’s rate), and at its peak it was capable of processing 1.3 teraflops. ASCI Red was retired in 2006, the same year the PlayStation 3 launched. The PS3 cost £425, and its GPU was capable of processing 1.8 teraflops.

IBM’s Watson is a computer built for learning. It was created with the aim of beating the human champions of the US game show Jeopardy!, and in 2011 it did just that, by a wide margin. (In the picture below, Watson is the one in the middle.)

The Watson computer on the gameshow, Jeopardy. Credit: IBM

It’s hard to find development costs for Watson, but a conservative estimate would put them at £12m over ten years. Four years after the Jeopardy victory, the CogniToys Dino, a smart toy for children, will go on sale. It costs £80, and is powered by Watson.

In 2012 Google used a network of 16,000 connected processors to teach a computer how to recognise photographs of cats. Three years later, the same technology can successfully identify a photo of a “man using his laptop while his cat looks at the screen”.

Man using his laptop while his cat looks at the screen

I’ve told these three stories to make a point: cheap and plentiful processing has allowed Artificial Intelligence to improve enormously, and very quickly.

There are lots of different strands to A.I., but the big recent breakthroughs have been in deep learning using neural networks. Very broadly, a neural network is a series of nodes that each perform a single action of analysis or classification on an input. The result of that action is passed on to another set of nodes for further processing, until the network returns a final output, stating with some degree of certainty what the input is. It can return a rough answer quickly, or a more precise answer slowly. It’s sort of how our brains work.
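The node-and-layer idea can be sketched in a few lines of Python. This toy network is purely illustrative: the weights below are made up, and real deep learning systems have vast numbers of nodes and learn their weights from data rather than having them hand-written.

```python
import math

def forward(layers, inputs):
    """Propagate an input through a tiny fully connected network.

    `layers` is a list of weight matrices; each node applies a sigmoid
    to the weighted sum of its inputs, and the result feeds the next layer.
    """
    activations = inputs
    for weights in layers:
        activations = [
            1 / (1 + math.exp(-sum(w * a for w, a in zip(row, activations))))
            for row in weights
        ]
    return activations  # final outputs: confidence-like values in (0, 1)

# Two inputs -> two hidden nodes -> one output node.
network = [
    [[0.5, -0.6], [0.9, 0.2]],  # hidden layer weights (invented values)
    [[1.0, -1.0]],              # output layer weights (invented values)
]
print(forward(network, [1.0, 0.0]))
```

The single output number is the “degree of certainty” mentioned above; a real classifier would have one output per possible label.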

As an illustration of the potential of deep learning, computer scientists in Germany have used neural networks to analyse the painting styles of the Old Masters and apply them to photographs. This is not simply applying image filters — the system detects sky, water, trees, and buildings, and paints them according to the styles of artists such as Turner, Van Gogh, and (shown here) Munch.

A photo of a street, and the same image rendered in the style of Munch

And having conquered the world of art, these A.I. systems are now coming for your job.

This graph from an article on Gartner.com shows a rough approximation of the likelihood of your job being automated by learning systems in the near future. It works on two axes: routine and structure.

Graph showing the likelihood of a job being automated

To summarise: if your job deals with abstractions and varies greatly from day to day, you’re probably safe. But if you deal with hard data, like numbers, and you do the same thing every day, it’s probably time to start nervously looking over your shoulder. As a stark illustration of this, recent figures show that the number of employees in the finance departments of major US companies has dropped by 40% since 2004.

More directly related to the web industry, a recent study by Oxford University and Deloitte indicated that it’s “not very likely” that a web design and development professional will lose their job to automation in the next twenty years. However, their definition of “not very likely” is a 21% chance; to put that more bluntly, one in five of us could be out of work, due to an overall shrinking of the available job market.

There are already signs that automated systems could replace programmers in the future. MuScalpel, a system developed by University College London, can transplant code from one codebase to another without prior knowledge of either; in one test it copied the H.264 media codec between projects, taking 26 hours to do so — a feat which took a team of human engineers 20 days. And Helium, from MIT and Adobe, can learn (without guidance) the function of code and optimise it, providing efficiency gains of between 75 and 500 percent in tests.

These systems are a few years away from market, but we’re already starting to see automation move into the industry in smaller ways. Services such as DWNLD, AppMachine, and The Grid offer users a tailored mobile app or website within minutes, with styles and themes based on content and information pulled from existing social profiles and brand assets. These services, and others like them, will become smarter and more widely available, skimming away a whole level of brochure sites usually offered by small digital agencies or individuals.

A common criticism of services like The Grid is that they can only produce identikit designs, with no flair or imagination. But look at the collection of websites below; people designed these, and flair and imagination are nowhere to be seen.

Screenshots of homogenous website design

These screenshots are taken from Travis Gertz’s excellent article, Design Machines, in which he highlights the problem:

The work we produce is repeatable and predictable. It panders to a common denominator. We build buckets and templates to hold every kind of content, then move on to the next component of the system.

Digital design is a human assembly line.

Looking back at the Gartner chart on the likelihood of automation, I’d say that “a human assembly line” would sit somewhere near the bottom left. And we’ve only ourselves to blame. Gertz again:

While we’ve been streamlining our processes and perfecting our machine-like assembly techniques, others have been watching closely and assembling their own machines.

We’ve designed ourselves right into an environment ripe for automation.

All of the workflows we’ve built, the component libraries, the processes and frameworks we’ve made… they make us more efficient, but they also make us more automatable.

However, brilliant writer that he is, Gertz doesn’t just identify the problem; he also offers a solution. And that solution is:

Humans are unpredictable mushy bags of irrationality and emotion.

This is a good thing, because a computer can never be this; it can never make judgements of taste or intuition. Many people are familiar with the Turing test, where a human operator has to decide whether they’re talking to another human or a bot. But there’s a lesser-known test, the Lovelace test, which sets creativity as the benchmark of human intelligence. To pass Lovelace, an artificial agent must create an original artefact — a piece of music, say, or a poem — that it was never engineered to produce. Further, that result must be reproducible, and impossible for the agent’s original creator to explain.

The idea is that Lovelace should be impossible for an artificial agent to pass. Creativity should be impossible for a computer. And it’s this, not tools, that offers us the opportunity to make our roles safe from automation.

Andrew Ng, who helped develop Google’s deep learning systems and now works at the Chinese search company Baidu, has serious concerns that automation is going to be responsible for many job losses in the future, and believes the best course of action is to teach people to be unlike computers:

We need to enable a lot of people to do non-routine, non-repetitive tasks. Teaching innovation and creativity could be one way to get there.

But as well as learning to be creative, we should also become centaurs — that is, learn to enhance our abilities by combining our instincts with an intelligent use of artificial intelligence. Many smart people have begun considering the implications of this; Cennydd Bowles wrote:

A.I. is becoming a cornerstone of user experience. This is going to be interesting (read: difficult) for designers.

To the current list of design disciplines we already perform — visual, interaction, service, motion, emotion, experience — we will need to add one more: intelligence design.

Earlier I said that A.I. is improving very quickly; it’s also becoming much more widely available. Services that were once only available to the internet giants are now open to everyone through APIs and products, at reasonable-to-free prices.

Remember Watson? All of its power is available through IBM’s Developer Cloud; the Bluemix cloud platform and a Node SDK give you access to powerful and sophisticated image and text services via RESTful APIs. IBM wants Watson to be the ubiquitous platform for AI, as Windows was for the home PC and Android is for mobile; as a result, Developer Cloud is free for developers, and reasonably priced for businesses.

What do you get with Watson? For a start, some visual recognition tools, like the Google one I mentioned at the beginning of this piece. Upload an image, and Watson will make an educated guess at explaining its content. It works well in most cases (although it was bizarrely convinced that a portrait of me contained images of wrestling).
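The useful part of an “educated guess” is the confidence score attached to each label. As a sketch (the response shape below is invented for illustration, not Watson’s actual API schema), an app consuming such a service might keep only the labels the classifier is reasonably sure about:

```python
def confident_labels(predictions, threshold=0.5):
    """Keep only labels the classifier is reasonably sure about.

    `predictions` mimics the (label, score) pairs an image-recognition
    service might return; the format is illustrative, not a real schema.
    """
    return [label for label, score in
            sorted(predictions, key=lambda p: p[1], reverse=True)
            if score >= threshold]

# Invented scores for a photo of a cat next to a laptop.
guesses = [("cat", 0.92), ("laptop", 0.81), ("wrestling", 0.07)]
print(confident_labels(guesses))  # the low-confidence "wrestling" is dropped
```

Tuning that threshold is a design decision: too low and users see nonsense tags, too high and the service seems to know nothing.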

These identification errors should reduce in frequency as you give Watson more data, because deep learning thrives on training data. That’s why all the major online photo tools, from Google Photos to Flickr, entice you with huge storage limits, in many cases practically unlimited; they want you to upload more, because it makes their services better for everyone. These services include automatic tagging of photos and content-based search; Google Photos in particular is very good at this, easily finding pictures of animals, places, or even abstract concepts like art.

Google Photos search results for ‘street art’

Eventually these offerings will raise the expectations of users; if your photo service doesn’t offer smart search, it’s going to seem very dumb by comparison.

Watson can also find themes in batches of photos, offering insights into users’ interests and allowing for better targeting. This is another reason why photo services want you in: you become more attractive to advertisers.

I should add that Watson is not the only game in town for image recognition; alternatives include startups like Clarifai, MetaMind, and SkyMind, and Microsoft’s Project Oxford, which powers their virtual assistant Cortana. Project Oxford has best-in-class face APIs, able to detect, recognise, and verify faces, deduce age, and find similar faces; how you feel about that will largely depend on your level of trust in Microsoft.

While image recognition is interesting and useful, the ‘killer app’ of AI is natural language understanding. The ability to comprehend conversational language is so useful that every major platform is adopting it; Spotlight in OS X El Capitan allows you to search for “documents I worked on last week”, while asking Google “how long does it take to drive from here to the capital of France?” returns directions to Paris.

If you want to add natural language understanding to your own apps, one of the best tools around is the Alchemy API, originally a startup product but now part of the Watson suite. It offers sentiment analysis, entity and keyword extraction, concept tagging, and much more.
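To make “sentiment analysis” concrete, here is a deliberately crude, dictionary-based version in Python. Real services like Alchemy use trained models rather than fixed word lists; this toy, with its tiny invented lexicon, only illustrates what a sentiment score is.

```python
# Tiny invented lexicons; a real service learns these from data.
POSITIVE = {"good", "great", "excellent", "love", "happy"}
NEGATIVE = {"bad", "poor", "terrible", "hate", "angry"}

def sentiment(text):
    """Crude lexicon-based sentiment: 1 positive, -1 negative, 0 neutral."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return (score > 0) - (score < 0)

print(sentiment("I love this great service"))   # positive
print(sentiment("terrible bad hotel"))          # negative
```

Even this toy shows why sentiment is hard: “not bad” would score as negative here, which is exactly the kind of context a trained model handles better.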

Natural language understanding is a key component in a new wave of recommendation engines, such as those used in Pocket and Apple News. Existing recommendation engines tend to use ‘neighbourhood modelling’, basing recommendations on social graph interaction; the new AI-powered engines instead understand the concepts contained in text content, allowing it to be better matched with other, similar content.
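A minimal sketch of that concept-matching step, assuming the concepts have already been extracted from each article (the titles and concept sets below are invented):

```python
def concept_similarity(a, b):
    """Jaccard overlap between two sets of extracted concepts."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def recommend(article_concepts, library, top_n=2):
    """Rank other articles by how many concepts they share.

    A stand-in for the matching an AI-powered recommendation
    engine performs after language understanding has run.
    """
    ranked = sorted(library.items(),
                    key=lambda item: concept_similarity(article_concepts, item[1]),
                    reverse=True)
    return [title for title, _ in ranked[:top_n]]

library = {
    "Deep learning on a budget": {"neural networks", "gpu", "cost"},
    "A history of the gramophone": {"music", "vinyl", "history"},
    "Training image classifiers": {"neural networks", "images", "training"},
}
print(recommend({"neural networks", "images"}, library))
```

The hard part, of course, is the extraction itself; that is what services like Alchemy sell.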

Where AI really excels, however, is when applied to conversation. Talking to a computer is nothing new; to give just one example, IKEA have had a customer service chatbot called Anna on their website since at least 2008. But although Anna can answer a straightforward question, she has no memory; if you don’t provide the same information in a follow-up question as you did in the previous one, you’ll get a different answer. This isn’t really a conversation, which has requirements as defined here by Kyle Dent:

A conversation is a sequence of turns where each utterance follows from what’s already been said and is relevant to the overall interaction. Dialog systems must maintain a context over several turns.

Maintained context is what today’s AI conversations offer that was previously missing. Google are using it to trial automated support bots, trained on thousands of previously recorded support calls. (The same bots, trained on old movies, can be surprisingly philosophical.) And it’s what powers the new wave of virtual assistants: Cortana, Siri, Google’s voice search, and Amazon Echo. The latter is particularly interesting because it lives in your home, not your phone, car, or TV; it’s the first move into this space, soon to be joined by robots with personality, like Jibo or Mycroft.
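Maintained context can be illustrated with a few lines of Python. This toy bot fills a ‘city’ slot in one turn and reuses it in later ones; everything about it, from the naive slot-filling rule to the replies, is invented purely for illustration.

```python
class Dialog:
    """Minimal illustration of maintained context across turns.

    Each turn can add slots (like 'city'); later turns reuse them,
    which is what separates a conversation from isolated Q&A.
    """
    def __init__(self):
        self.context = {}

    def turn(self, utterance):
        words = utterance.lower().split()
        if "in" in words:                      # naive slot filling
            self.context["city"] = words[words.index("in") + 1]
        if "weather" in words:
            city = self.context.get("city")
            return f"Weather in {city}" if city else "Which city?"
        return "OK"

bot = Dialog()
bot.turn("I live in paris")
print(bot.turn("What's the weather like?"))  # the bot remembers the city
```

A stateless bot like Anna effectively creates a fresh `Dialog` for every question, which is why it forgets what you just told it.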

All of these virtual assistants share another feature: you can interact with them by voice. Voice isn’t essential to conversation, but it helps; it’s much easier than using a keyboard in most cases, especially in countries like China, with its complicated character input and high levels of rural illiteracy.

Making computers recognise words has been possible for a surprisingly long time; Bell Labs came up with a voice recognition system back in 1952, although it could only recognise numbers, and only when spoken by a specific person. There was a further breakthrough in the 1980s with the ‘hidden Markov model’, but deep learning has hugely improved voice recognition in the past three years; Google says its voice recognition error rate has dropped from around 25% (one misidentified word in every four) in 2012 to 8% (one in twelve) earlier this year — and it’s improving all the time. Baidu says their error rate is 6% (one in seventeen), and they handle some 500 million voice searches every day.

Voice recognition is available in some browsers, namely Chrome and Firefox, through the Web Speech API, and many other products, including Watson and Project Oxford, offer speech-to-text services. These all require a microphone input, of course, which unfortunately rules out Safari and any iOS browser.

But while voice recognition can identify the individual words in an utterance, it doesn’t in any way understand the meaning of those words, or the intent behind them. That’s where the previously mentioned breakthroughs in natural language understanding come in. There are a growing number of voice-based language understanding products now available, including Project Oxford’s LUIS (Language Understanding Intelligent Service — I love a good acronym) and the startup api.ai. The market leader in parsing complex sentences is Houndify, from SoundHound, but the service I like for its ease of use is Wit.

Wit was once a startup but is now owned by Facebook. It’s free to use, although all of your data belongs to Facebook (which may be a deal-breaker for some) and is available to every other user of the service — because, as I said earlier, more data gives deep learning systems more power. It has SDKs for multiple platforms, but where it wins for me is its training system, which makes it very easy to create an intent framework and correct misinterpreted words.
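The shape of such an intent framework can be sketched in Python. The intents, example utterances, and word-overlap scoring below are all invented for illustration; Wit’s real training system is far more sophisticated than this, but the structure of “examples in, intent plus confidence out” is the same idea.

```python
# Hypothetical training data: each intent has a few example utterances.
INTENTS = {
    "set_alarm":  ["wake me up", "set an alarm", "alarm for tomorrow"],
    "play_music": ["play some music", "put a song on", "play my playlist"],
}

def classify(utterance):
    """Return the best-matching intent and a rough confidence (0..1),
    scored by how many of an example's words appear in the utterance."""
    words = set(utterance.lower().split())
    best, best_score = None, 0.0
    for intent, examples in INTENTS.items():
        for example in examples:
            ex_words = set(example.split())
            score = len(words & ex_words) / len(ex_words)
            if score > best_score:
                best, best_score = intent, score
    return best, best_score

print(classify("please play some music"))
```

“Correcting misinterpreted words”, in this framing, just means adding the misheard utterance as a new example under the right intent.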

Wit is the power behind M, Facebook’s entry into the virtual assistant market. M is notable because it lives only inside Facebook Messenger, a pattern I’m sure we’re going to see much more of in the future: the AI-powered shift from the graphical user interface to the conversational; from GUI to CUI.

There’s a reason that Facebook paid an estimated £15 billion for WhatsApp, and it’s not solely their £6.5 billion in sales: it’s because messaging apps are huge, and WhatsApp is the biggest of all, with some 900 million monthly active users. What’s more, messaging apps are growing incredibly quickly; they’re the fastest-growing online behaviour of the last five years, and an estimated 1.1 billion new users are set to come on board in the next three years.

And messaging apps as we know them in the West are actually very limited compared to messaging apps in Asia, especially China, where they are more like platforms than apps: you can shop, bank, book a doctor’s appointment… basically, anything you can do on the web today. Messaging apps are a huge growth area, and they’re going to be powered by conversational assistants.

We can see this beginning already with apps like chatShopper (currently Germany only), which lets you talk to a personal shopper to make purchases through WhatsApp, and Digit, a savings app that communicates almost entirely by text message. These currently use a mix of automated and human operators (this is also how Facebook’s M works right now), but as AI becomes more intelligent the bots will take over from the humans in many cases.

More advanced, fully automated services include x.ai’s ‘Amy’, and the apparently very similar Clara. These are meeting assistants that work by email; you ask them to find a suitable time and place for a meeting, and they communicate with all the participants until the arrangements are made, then email you back with the final details.

Conversational UI is an idea whose time has come, enabled only now by AI and natural language understanding. To add it to your own apps you could look at Watson’s Dialog service (a similar service is also apparently coming to Wit in the near future), or a startup such as re:infer. But it’s not only a case of plumbing in a service; it will also require an addition to the list of design disciplines I mentioned previously: conversation design.

I should note that new interaction models are still prone to old problems; security, privacy, and trust should always be paramount in your applications. Remember the Samsung TV scandal earlier this year? Do we really want a repeat of that, but with an artificially intelligent Barbie in children’s bedrooms?

The ready availability of deep learning services has come upon us so quickly that we’ve barely realised; many of the services I’ve mentioned didn’t exist even 18 months ago. This is a little bit scary, and a huge opportunity. There’s little doubt that AI is going to take routine jobs from the web industry; so as AI improves, we need to improve with it. The way to do that is to harness AI for our own use, and apply creative, irrational, human thinking to it.

Cross-posted to Medium.
