
WebSkills - a proposal for open intelligent assistants

It's clear at this point that intelligent assistants - and more broadly, ambient computing devices that you interact with naturally, rather than holding like a smartphone or laptop - are going to play an important part in our digital future.

Of the platforms doing the rounds at the moment, I'm most excited by Alexa, because of its relative openness: Amazon has made it available as an operating system for manufacturers, so it'll start showing up in cars and offices, and they've treated their product line-up as a series of proofs of concept. Nice.

Still, you need to wire Alexa Skills (their name for apps) into their APIs in a relatively closed way. Back-end deals need to be done for new functionality, and so on.

What if that didn't need to be the case? Picture this:

1. I'm using my favorite web service. It lets me know that I can install its functionality into my WebSkills-compatible intelligent assistant, either using UI on the site itself, or through a strip at the top of the page, a bit like how Safari on the iPhone tells you about relevant apps. I push the button, because I'd love to be able to talk to this service whenever I need.

2. My device prompts me to make sure I want to authenticate with this skill and install it in my assistant. Sure I do.

3. What's actually happening is that an endpoint, referenced in the website's HTML through a <link rel="webskill" href="..." trigger="service name"> tag, is being registered with my device. (No, trigger isn't a valid link attribute right now, but bear with me.) The trigger is the unique service phrase that can be used to invoke the request. For example, if the trigger was "Wolfram Alpha", the request to the assistant might be of the form, "Alexa, ask Wolfram Alpha what is the GDP of Bhutan?"

4. When a request is made, the intelligent assistant looks to see if the trigger phrase has been registered. It then calls the associated URL from the link tag using a GET request with a q query-string parameter that contains the full text of the request.

5. The endpoint returns either text to be read out, or the contents of a WAV or MP3 audio file. The intelligent assistant dutifully plays it back.
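The discovery step (3) is simple enough to sketch with nothing but the standard library. Here's one way a device might scan a page for webskill link tags — the page content and endpoint URL are hypothetical, and the dict shape for a registered skill is just an assumption for illustration:

```python
from html.parser import HTMLParser

class WebSkillFinder(HTMLParser):
    """Collects <link rel="webskill"> tags from a page's markup."""

    def __init__(self):
        super().__init__()
        self.skills = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "link" and a.get("rel") == "webskill":
            # Record the endpoint and its spoken trigger phrase.
            self.skills.append({"href": a.get("href"), "trigger": a.get("trigger")})

page = """<html><head>
<link rel="webskill" href="https://example.com/skill" trigger="Wolfram Alpha">
</head><body></body></html>"""

finder = WebSkillFinder()
finder.feed(page)
print(finder.skills)
# → [{'href': 'https://example.com/skill', 'trigger': 'Wolfram Alpha'}]
```

A real device would fetch the page over HTTP and confirm the install with the user (step 2) before persisting anything, but the parsing itself is this small.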

This is one example of a simple mechanism that would allow any provider on the internet to add intelligent assistant skills in a cross-platform way. It's unsophisticated, but it would allow a thousand intelligent assistant platforms to bloom, with the web at their core, rather than a few monopolistic platforms.
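To make the dispatch side (steps 4–5) concrete, here's a minimal sketch of how an assistant might match an utterance against its registered triggers and build the GET request described above. The registry structure and endpoint URL are hypothetical; per the proposal, q carries the full text of the request:

```python
from urllib.parse import urlencode

def build_request(utterance, registry):
    """Match a spoken request against registered triggers and return
    the URL the assistant would GET, or None if no trigger matches."""
    lowered = utterance.lower()
    for trigger, endpoint in registry.items():
        # Assume the "ask <trigger> ..." invocation form from the example above.
        if lowered.startswith(f"ask {trigger.lower()} "):
            # q contains the full text of the request, as proposed.
            return f"{endpoint}?{urlencode({'q': utterance})}"
    return None

registry = {"Wolfram Alpha": "https://example.com/skill"}
url = build_request("ask Wolfram Alpha what is the GDP of Bhutan?", registry)
print(url)
# → https://example.com/skill?q=ask+Wolfram+Alpha+what+is+the+GDP+of+Bhutan%3F
```

The endpoint would then answer with plain text or an audio file for the device to play. Whether q should carry the whole utterance or just the part after the trigger is exactly the kind of detail a spec would need to pin down.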

I'd love feedback! It's easy to talk about these kinds of projects, but talk is cheap, so my next plan is to build a proof of concept.