Google I/O ’17: Moving Toward an ‘A.I.-First World’
In just the past few decades, everything from our commerce and social interactions to our entertainment and news media has been thoroughly transformed by immediate, constant access to the Internet through mobile devices. A.I. and machine learning represent the next phase of this transformation. Much will change as devices learn to interpret and respond to inputs dynamically, evolving along with our needs, desires, and environment.
A few weeks ago I had the privilege of attending Google I/O, where Google’s business leaders, scientists, and developers presented nothing less than a transformative vision for the future of its product suite and of computing more broadly. The crux of it, as Google’s chief executive Sundar Pichai stated during the keynote address, is that “In an A.I.-first world, we are rethinking all [Google] products.” Just as notably, Google has built tools that let non-experts leverage the power of machine learning in their own applications.
The foundation of “A.I. first” at Google is TensorFlow, an open source library for machine learning. At its core, TensorFlow expresses a neural network as a series of operations on tensors (multidimensional arrays), most of which come down to matrix multiplication and vector math. (If you’re interested in digging into the core mathematical concepts underlying neural networks, I highly recommend Michael Nielsen’s online book Neural Networks and Deep Learning as a starting point!) Google has also introduced higher-level libraries that run on top of TensorFlow, making it easier to build, train, and evaluate machine learning models.
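To make the “matrix multiplication and vector math” concrete, here is a minimal sketch of a single fully connected layer. It is written in plain NumPy rather than TensorFlow so the arithmetic stays visible; the `dense_layer` helper, the ReLU activation, and all of the numbers are hypothetical choices for illustration, not TensorFlow’s API.

```python
import numpy as np

def dense_layer(x, weights, bias):
    """One fully connected layer: multiply inputs by a weight matrix,
    add a bias vector, then apply a ReLU nonlinearity (max(0, z))."""
    return np.maximum(0.0, x @ weights + bias)

# A batch of 2 example inputs, each with 3 features (illustrative values).
x = np.array([[1.0, 2.0, 3.0],
              [0.5, -1.0, 2.0]])

# Weights mapping 3 input features to 2 output units, plus one bias per unit.
weights = np.array([[0.1, -0.2],
                    [0.4, 0.3],
                    [-0.5, 0.2]])
bias = np.array([0.1, -0.1])

hidden = dense_layer(x, weights, bias)
print(hidden)  # one row of 2 activations per input example
```

A deep network is essentially many such layers stacked, with the weights adjusted during training; libraries like TensorFlow handle that bookkeeping (and run it efficiently on large tensors) so you don’t have to write it by hand.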
The Keras API is one such library for writing models. Describing the impetus for Keras during a session at Google I/O, researcher François Chollet said, “We really believe that, in the future, deep learning will be part of the toolbox of every developer… Everyone needs intelligent applications. To make this future possible, we need to make deep learning really easy to use. You shouldn’t need to be an expert to start leveraging deep learning to solve big problems.”
Chollet demonstrated how Keras enables just that. The problem he posed started with a video of a man packing boxes into a car trunk: the machine learning model had to answer a natural language question about the video’s content, “What is the man doing?” To do so, the model must draw conclusions not only from the content of each individual frame but also from the order of the frames. Otherwise, “watching” the same video, the model might conclude that the man was unpacking the trunk! Chollet stressed that, until very recently, it would have taken a team of experts months to develop such a model. In stark contrast, the Keras approach is a “democratization of deep learning” that empowers anyone with basic Python scripting skills to build one.
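Why does frame order matter so much? The minimal NumPy sketch below, which assumes random stand-in vectors for the frames and random weights, contrasts two ways of summarizing a sequence. Averaging the frame vectors gives the same result for a clip and its reverse, so it cannot distinguish packing from unpacking. A simple recurrent update, a much-simplified relative of an LSTM, processes frames one at a time and generally ends in a different state when the order is reversed. The helpers `mean_pool` and `simple_rnn` are hypothetical names for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-frame feature vectors: 5 frames, 4 features each.
frames = rng.normal(size=(5, 4))
reversed_frames = frames[::-1]  # the same clip played backwards

def mean_pool(seq):
    """Order-insensitive summary: the average of all frame vectors."""
    return seq.mean(axis=0)

def simple_rnn(seq, w_h, w_x):
    """Order-sensitive summary: a basic recurrent update
    h_t = tanh(W_h @ h_{t-1} + W_x @ x_t), returning the final state h."""
    h = np.zeros(w_h.shape[0])
    for x_t in seq:
        h = np.tanh(w_h @ h + w_x @ x_t)
    return h

# Random recurrent weights: 3 hidden units, 4 input features.
w_h = rng.normal(size=(3, 3))
w_x = rng.normal(size=(3, 4))

# Averaging gives the same summary for the clip and its reverse.
print(np.allclose(mean_pool(frames), mean_pool(reversed_frames)))  # True

# The recurrent summary depends on the order in which frames arrive.
forward_state = simple_rnn(frames, w_h, w_x)
backward_state = simple_rnn(reversed_frames, w_h, w_x)
print(forward_state, backward_state)
```

An LSTM adds gates that control what the state keeps or forgets over long sequences, but it shares this key property: the final state reflects the order of its inputs.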
First, you run each frame of the video through a convolutional neural network (CNN), producing one vector per frame. These frame vectors are in turn fed to a sequence-processing network built with an LSTM (long short-term memory) architecture. An LSTM is designed to retain information over both long and short spans of a sequence, which makes it suitable for reducing the frame vectors into a single vector that captures both the frames’ content and their order. A similar process turns the words of the question, “What is the man doing?”, into a single question vector. Finally, the video and question vectors are concatenated and passed to a classifier that selects the answer: “Packing.” Keras has built-in best practices to do all of this with minimal configuration and engineering overhead.
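The pipeline above maps naturally onto the Keras functional API. Below is a minimal sketch assuming TensorFlow 2.x with its bundled Keras; it is not the code from Chollet’s talk. The layer sizes, the small frame-level CNN (a stand-in for a larger, typically pretrained network), and the constants `FRAME_SIZE`, `VOCAB_SIZE`, `QUESTION_LEN`, and `NUM_ANSWERS` are illustrative assumptions, and the model is untrained.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Illustrative sizes (assumptions, kept small so the sketch runs quickly).
FRAME_SIZE = 64      # each frame resized to 64x64 RGB
VOCAB_SIZE = 10000   # number of distinct words in questions
QUESTION_LEN = 20    # questions padded/truncated to 20 word indices
NUM_ANSWERS = 1000   # candidate answers, e.g. "packing"

# Video input: a variable-length sequence of frames.
video = keras.Input(shape=(None, FRAME_SIZE, FRAME_SIZE, 3), name="video")

# Step 1: a small CNN, applied to every frame via TimeDistributed,
# turns each frame into one 64-dimensional vector.
x = layers.TimeDistributed(layers.Conv2D(32, 3, activation="relu"))(video)
x = layers.TimeDistributed(layers.MaxPooling2D())(x)
x = layers.TimeDistributed(layers.Conv2D(64, 3, activation="relu"))(x)
frame_vectors = layers.TimeDistributed(layers.GlobalAveragePooling2D())(x)

# Step 2: an LSTM reduces the ordered frame vectors to one video vector.
video_vector = layers.LSTM(256)(frame_vectors)

# Step 3: the question (as word indices) is embedded and reduced by its own LSTM.
question = keras.Input(shape=(QUESTION_LEN,), dtype="int32", name="question")
word_vectors = layers.Embedding(VOCAB_SIZE, 256)(question)
question_vector = layers.LSTM(128)(word_vectors)

# Step 4: concatenate both vectors and classify over the candidate answers.
merged = layers.concatenate([video_vector, question_vector])
hidden = layers.Dense(128, activation="relu")(merged)
answer_probs = layers.Dense(NUM_ANSWERS, activation="softmax")(hidden)

model = keras.Model(inputs=[video, question], outputs=answer_probs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Smoke test on random inputs: 2 clips of 4 frames each, plus 2 questions.
video_batch = np.random.rand(2, 4, FRAME_SIZE, FRAME_SIZE, 3).astype("float32")
question_batch = np.random.randint(0, VOCAB_SIZE, size=(2, QUESTION_LEN)).astype("int32")
probs = np.asarray(model([video_batch, question_batch]))
print(probs.shape)  # (2, 1000)
```

Because `TimeDistributed` applies the same CNN weights to every frame, the model accepts clips of any length. Training it would require a labeled dataset of (video, question, answer) examples, which this sketch does not include.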
I was most impressed and inspired by one of the last sessions on the final day of the conference. The talk was on Project Magenta, a Google Brain project exploring applications of machine learning to music and art. Music and art are so fundamental to human culture that many people would argue they cannot, or should not, be territory for A.I. The speaker, Douglas Eck, met such challenges head on, arguing that musicians and artists throughout history have appropriated and distorted new technologies to test the boundaries of expression. He demoed a model that can predict how two sounds might be fused to create something more complex, but also more subtle and natural, than the mere average of their individual soundwaves. A cow clarinet! A bass organ! “Instruments” that would be impossible to construct from physical materials. The result, which you can experiment with using this elegant interface, was truly awesome.
“A.I. first” was the hottest topic and most pervasive theme at Google I/O this year. Advancements in pragmatic tooling like TensorFlow and Keras will allow more and more developers to leverage the power of machine learning, while creative applications like Project Magenta will push us to test boundaries in new and unexpected ways.
Needless to say, I cannot wait to see how Google’s “A.I. first” story continues to unfold, and how we will bring these advancements to bear for our partners and the products we build here at Prolific.