This morning, driving down to the office, I asked myself whether speech recognition really is the next great breakthrough, and why. If so, why has it taken so long for it to become mature enough to be used in our daily lives?
Can we add it to the list that includes the mouse (eighties), the graphical user interface (eighties, and only a success because of the mouse) and the touch screen (used since the eighties in industrial settings, but only widely accepted with the introduction of the iPhone® and later the iPad®)? I think we can.
This won’t happen today, or tomorrow, but soon. Just like the technologies mentioned earlier, it will drastically change the way we interact with the systems around us. If you had asked me 15 years ago whether I could imagine using, today, a mobile phone with only a touch screen to keep track of my e-mail, calendar and financial situation and to stay in touch with my friends (without using the actual phone functionality), what do you think my answer would have been? It was impossible to imagine a device containing more technology and processing power than the latest satellites being launched. With my background and interests though, my answer probably would have been, ‘when and where can I get it?!’.
So what will this new solution look like, and how are we going to use it? Is it going to evolve into something like what Star Trek suggested (I’m not a diehard Star Trek fan, by the way)? Or like in Minority Report and Iron Man (I love those films)? It probably will, providing an intuitive way of communicating with the technology around us: asking questions and giving simple tasks.
Speech recognition has been available for quite some time, and although performance and accuracy still increase with each new version, the basic concepts have not changed a lot. The next chapter in this era of speech technology is language understanding: not just recognising the words and presenting them on screen, or requiring the user to dictate clear, literal commands, but interpreting what has been said and giving meaning to it. That implies the system can handle many different variations of the same sentence and still come up with the same result.
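To make that last point a bit more concrete, here is a deliberately tiny Python sketch of the idea that differently worded sentences should lead to the same result. It is only an illustration: the intent names, the keyword lists and the `interpret` function are all made up for this post, and real language understanding relies on far richer models than keyword matching.

```python
import re

# Hypothetical intents and trigger words, invented purely for this illustration.
INTENT_KEYWORDS = {
    "show_report": {"show", "open", "display"},
    "prescribe_medication": {"prescribe", "prescription"},
}

# Polite filler words that carry no meaning for this toy example.
FILLER = {"please", "could", "can", "you", "me", "the", "a", "for", "to"}


def interpret(utterance: str) -> str:
    """Map a free-form sentence to an intent name, or "unknown"."""
    # Lower-case the text, split it into words and drop the filler.
    words = set(re.findall(r"[a-z']+", utterance.lower())) - FILLER
    best_intent, best_score = "unknown", 0
    for intent, keywords in INTENT_KEYWORDS.items():
        # Score an intent by how many of its trigger words were spoken.
        score = len(words & keywords)
        if score > best_score:
            best_intent, best_score = intent, score
    return best_intent


if __name__ == "__main__":
    for sentence in (
        "Please show me the report",
        "Could you open the report for this patient?",
        "Display the patient report",
    ):
        print(sentence, "->", interpret(sentence))
```

All three sentences are worded differently, yet each one resolves to the same `show_report` intent, which is the behaviour described above in miniature.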
This is exactly what we are currently researching and working on at G2 Speech. Are we able, today, to come up with an intuitive way of interacting with a system, and are we then able to use that data for reporting on patient care and on quality indicators for a department or a hospital? Are we able to use that data for decision support, giving immediate feedback to the consultant who is creating a report for a patient and prescribing medications?
Time will tell! Come back to me in 15 years…
If you’re interested in finding out more about how language understanding works, take a look at Google’s video. The eight-minute video explores the progress of speech recognition.
Feedback, comments and ideas on this blog are welcome!