Author: Edward Caulfield
Release Date: May 28, 2017
Tim Cook, Apple’s CEO, has spent the last several months spreading the word about how Augmented Reality is one of the most significant new technologies of our time. And while the iPhone 8 is rumoured to contain some Augmented Reality capabilities, very little is publicly known.
Microsoft, on the other hand, has clearly been working hard for a long time to integrate Augmented Reality into its product offerings. The company is rumoured to have spent upwards of $1B developing Augmented Reality technology and has released both HoloLens, a $3,000 Augmented Reality headset, and the Windows 10 Creators Update, which embeds Augmented Reality technology into the Windows OS.
Google has clearly also been working hard to bring together disparate technologies that, when combined, give them a significant position in the Augmented Reality wars.
I'll start with the one thing that I expect most people will yawn at: Augmented Reality in Chrome using WebAR. At Google I/O this year it was announced that work was progressing on adding WebAR (an open source project based upon WebVR) to the Chrome (open source) browser. Why should we care? Because the current practice of having every Augmented Reality toolmaker create their own scanner is simply ridiculous. It doesn’t scale. While it may work in the short term, downloading yet another app so that you can view someone’s Augmented Reality experience is burdensome and wasteful for the consumer. We know from decades of experience that users value convenience over just about everything else, and having Augmented Reality capabilities in Chrome, or any other browser for that matter, will be convenient. As a result, it will drive adoption. I suspect that proprietary Augmented Reality scanners will never go away completely, but they will be relegated to niche requirements.
Then we move on to the machine learning and object recognition capabilities found in Google Photos and Google Lens, driven by TensorFlow. TensorFlow is Google’s open source project for Machine Intelligence. Object Recognition is currently a very significant challenge in Augmented Reality, with several companies working for years to make it work and only a few succeeding. Google’s investment in a Machine Intelligence engine that drives object recognition carries over directly into Augmented Reality. The same Machine Intelligence capabilities that can differentiate a dog from a cat, and tell you precisely what kind of flower you are looking at, can also determine the make and model of the fuel pump you are holding in your hand. You’re just going to have to train it.
The third piece of the puzzle is Google’s Tango. Tango is the tool that gathers the precise physical characteristics of your surroundings and generates a point cloud representation of them. Tango gives Object Recognition a boost, because instead of just having an image of an object, you now have its correct dimensions as well. Physical dimensions give the machine intelligence algorithms a very helpful set of additional features. Tango also helps when you position virtual objects in your Augmented Reality experience, giving you information about the placement of real objects and ensuring that occlusion works correctly.
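To make the occlusion point concrete, here is a minimal sketch in Python. It assumes that a per-pixel depth (distance from the camera, in metres) is already available for both the real scene and the virtual object; the function name and the tiny depth maps are hypothetical and do not come from Tango's actual API. A virtual pixel is drawn only where the virtual object is closer to the camera than the real surface behind it.

```python
from typing import List

# Marker for pixels where there is no virtual content at all.
NO_VIRTUAL = float("inf")


def visible_virtual_pixels(
    real_depth: List[List[float]],
    virtual_depth: List[List[float]],
) -> List[List[bool]]:
    """Return True for each pixel where the virtual object should be drawn.

    A virtual pixel is visible only if it is strictly closer to the camera
    than the real surface at the same pixel; otherwise the real object
    occludes it.
    """
    return [
        [virt < real for real, virt in zip(real_row, virt_row)]
        for real_row, virt_row in zip(real_depth, virtual_depth)
    ]


# Hypothetical one-row depth maps (metres from the camera).
real = [[2.0, 2.0, 0.5]]            # a wall at 2 m, then a chair at 0.5 m
virtual = [[1.0, NO_VIRTUAL, 1.0]]  # a virtual vase at 1 m in two pixels

mask = visible_virtual_pixels(real, virtual)
print(mask)
```

In this toy scene the vase is drawn in front of the wall but hidden where the chair stands closer to the camera than the vase.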
How does this all work together? Let’s take it a step at a time.
1) You open your browser and point your device at an object. The image of the object is captured along with, thanks to Tango, its physical dimensions.
2) This information, or a subset of it, is fed into the TensorFlow Machine Intelligence engine to identify the object scanned.
3) You then overlay the object with your Augmented Reality experience, using Tango to ensure that your placement of virtual objects is in sync with the real world.
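The three steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Google's implementation: the Capture class, the KNOWN_OBJECTS catalogue and both functions are hypothetical, and a nearest-size lookup stands in for a trained TensorFlow model. It only shows how measured dimensions can serve as features for identification and how the same measurements anchor a virtual object.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Capture:
    """Step 1: what the device records (all fields are assumptions)."""
    image: bytes      # camera frame; unused by this toy classifier
    size_m: Vec3      # width, height, depth of the object in metres
    position_m: Vec3  # centre of the object's base (x, y is up, z)


# Hypothetical reference sizes; a real system would use a trained model.
KNOWN_OBJECTS: Dict[str, Vec3] = {
    "vase": (0.15, 0.30, 0.15),
    "coffee table": (1.20, 0.45, 0.60),
}


def identify(capture: Capture) -> str:
    """Step 2: pick the known object whose dimensions are closest."""
    def squared_distance(reference: Vec3) -> float:
        return sum((a - b) ** 2 for a, b in zip(capture.size_m, reference))

    return min(KNOWN_OBJECTS, key=lambda name: squared_distance(KNOWN_OBJECTS[name]))


def place_on_top(capture: Capture) -> Vec3:
    """Step 3: anchor virtual content at the top of the real object."""
    x, y, z = capture.position_m
    return (x, y + capture.size_m[1], z)


# A vase-sized object standing on a 0.45 m high table.
capture = Capture(image=b"", size_m=(0.14, 0.31, 0.16), position_m=(1.0, 0.45, 2.0))
label = identify(capture)
anchor = place_on_top(capture)
print(label, anchor)
```

Here the object is identified as a vase, and the virtual flowers would be anchored at the height of its rim.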
Google is doing all the technical heavy lifting so that you can focus on your Augmented Reality content.
Let’s walk through a few examples to make this a little more relevant.
1) You point your device at the coffee table in the middle of your living room. On the coffee table is an empty vase. The system recognises this and places a variety of seasonal flowers into the vase. Your device is not a phone or tablet, but rather “always on” Smart glasses that can dynamically reconfigure your environment, adding and removing virtual items at will, to suit your desires. I see the walls as white, you see them as pink. I see Roses in the vase, you see Tulips. An empty room becomes an art gallery filled with representations of any object you wish, or a meeting room that automatically populates with the virtual presence of your colleagues. You see whatever you want, whenever you want, however you want it.
2) Your auto mechanic is now an expert on every vehicle make and model. Wearing Smart glasses, she identifies things simply by looking at them. Sounds are analysed and probable root causes are displayed, along with a series of diagnostic steps and the necessary tools. The correct replacement parts are identified and ordered immediately once a root cause is determined.
3) You “go to school” and all of your lessons are now three dimensional. People no longer talk about things to explain them; you explore things virtually. Instead of seeing a picture of a heart on a page, you experience a 3D representation of it in front of you, at any size. A solar system can be compressed to the size of a classroom, and an atom can be held in your hand and spun like a basketball. If you’re curious to dig deeper, you can drill into the model and learn to your heart’s content. Everyone’s experience is individualised. Education becomes exploration, not rote learning.
While it will take years of effort to fully realise these examples, each and every one exists in the market today, just to a lesser degree than described here. Certainly, it will take years to migrate our knowledge into the new tools, and there are many technical details still to be ironed out. The important thing is that the tools to realise this are no longer lingering in someone’s imagination; they are on the table in front of us, waiting to be put to use.
Let’s see how Apple and Microsoft react.