Implementing Google Voice Actions into Your Android App
Vincent Huang

Vincent Huang, Android Engineer

What are Google Now Voice Actions?

The Google Voice Interactions API was recently added as part of Android Marshmallow. It is separate from the search-by-voice functionality that Google Voice Search provided last year. Google Voice Interactions give developers the ability to communicate directly with their users. The concept is relatively simple: the user makes a request in Google Now, ranging from “Take a picture” to “Play Thriller on myMusicApp”, and Android will open any app that is able to handle that specific request. If further interaction is necessary, the app can ask the user for it out loud with additional voice interactions.

What are the Benefits?

Google Voice Actions provides another way for users to interact with their apps; users who can’t be bothered to touch their screens can now verbally navigate through an app. The interface is fairly intuitive and interactive: an overlay for voice prompts covers part of the app screen. As the user provides information to the app, the app can dynamically update and adapt based on the user’s voice input. At the same time, users can still swipe and scroll through the screen during a voice interaction.

This is another excellent way to drive users to your app. As of now, 55% of teens and 41% of adults use Google Now. Contributing to this is Google Now’s success over Cortana and Siri as a verbal search engine. Many third-party apps have already implemented Google Voice Actions, including TripAdvisor, Eat24, and TuneIn. Implementing Google Voice Actions is another way for developers to maintain an edge over their competitors.

Google Now tops in sheer volume of questions successfully answered

What Voice Actions are available?

As a relatively new feature, voice actions are currently limited in number: few enough that I can list them all right here:

  • Set an alarm for 7 am (Alarm)
  • Set a timer for 5 minutes (Alarm)
  • Call Bosco (Communication)
  • Take a picture (Media)
  • Record a video (Media)
  • Play Thriller on myMusicApp (Media)
  • Search for cat videos on MyApp (Search)
  • Open ProlificInteractive.com (Web Browser)

This is a comprehensive list of the System Voice Actions immediately available to developers. To add your own, like “Lock the front door”, you need to fill out the form here and have it approved by Google. Personally, I’d rather physically lock the door for peace of mind.

Show me some code!

There are 4 basic steps for setting up a full Google Voice Action Interaction:

    1. Update Gradle Build to Android M:
android {
  compileSdkVersion 23
  buildToolsVersion "23.0.1"

  defaultConfig {
    minSdkVersion 23
    targetSdkVersion 23
  }
}
    2. Set up an Intent Filter in the Manifest to receive the voice intent from Google Now:

For “Take a picture”:

<intent-filter>
    <action android:name="android.media.action.STILL_IMAGE_CAMERA"/>
    <category android:name="android.intent.category.DEFAULT"/>
    <category android:name="android.intent.category.VOICE"/>
</intent-filter>

For Searching:

<intent-filter>
    <action android:name="com.google.android.gms.actions.SEARCH_ACTION"/>
    <category android:name="android.intent.category.DEFAULT"/>
    <category android:name="android.intent.category.VOICE"/>
</intent-filter>

As you can see, both the “VOICE” and “DEFAULT” categories are required.

    3. Check for voice interaction:
if (!isVoiceInteractionRoot() || !isVoiceInteraction()) {
    //Not a voice interaction, proceed normally
} else {
    beginVoiceInteraction();
}

Alternatively, if your voice interaction includes a search query:

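// The action must match the one declared in the manifest filter above.
// SearchIntents comes from Google Play services (com.google.android.gms.actions).
if (SearchIntents.ACTION_SEARCH.equals(intent.getAction())) {
    String query = intent.getStringExtra(SearchManager.QUERY);
    handleVoiceQuery(query);
}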
    4. Use Google’s Voice Interaction API
import android.app.VoiceInteractor;
import android.app.VoiceInteractor.PickOptionRequest;
import android.app.VoiceInteractor.PickOptionRequest.Option;
import android.os.Bundle;

// One option can have many synonyms. Each option gets its own index.
Option voiceOption1 = new Option("Green", 0);
voiceOption1.addSynonym("Olive");
voiceOption1.addSynonym("Emerald");

Option voiceOption2 = new Option("Red", 1);
voiceOption2.addSynonym("Crimson");
voiceOption2.addSynonym("Burgundy");

// Add as many options as you'd like within the option array; this will
// increase the chances of a successful response.
VoiceInteractor.Prompt prompt = new VoiceInteractor.Prompt("What is your favorite color?");
getActivity().getVoiceInteractor()
        .submitRequest(new PickOptionRequest(prompt, new Option[]{voiceOption1, voiceOption2}, null) {
            @Override
            public void onPickOptionResult(boolean finished, Option[] selections, Bundle result) {
                if (finished && selections.length == 1) {
                    // Use the index given to each option to determine what was said
                    int pickedIndex = selections[0].getIndex();
                }
            }

            @Override
            public void onCancel() {
                getActivity().finish();
            }
        });

For a full Voice Actions demo app, I recommend trying the two codelab apps that Google has on its website.
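The two checks in step 3 can be folded into one routing decision. The sketch below is plain Java rather than Android code: it assumes you pass in the values you would read from `getIntent()`, `isVoiceInteraction()`, and `isVoiceInteractionRoot()`, and the class and method names (`VoiceLaunchRouter`, `route`, `searchQuery`) are made up for illustration. It checks for a search launch first, on the assumption that the action string should match the one declared in the manifest filter.

```java
import java.util.Optional;

// Plain-Java sketch: the Android lookups are passed in as arguments,
// so the routing logic can be exercised off-device.
final class VoiceLaunchRouter {
    enum Route { NORMAL, VOICE_SEARCH, VOICE_INTERACTION }

    // Action string declared in the search intent filter in the manifest.
    static final String SEARCH_ACTION = "com.google.android.gms.actions.SEARCH_ACTION";

    // Returns the trimmed query if this launch is a voice search with usable text.
    static Optional<String> searchQuery(String action, String query) {
        if (!SEARCH_ACTION.equals(action) || query == null) {
            return Optional.empty();
        }
        String trimmed = query.trim();
        return trimmed.isEmpty() ? Optional.<String>empty() : Optional.of(trimmed);
    }

    // Decides how the activity should start, given what the launch looked like.
    static Route route(String action, String query,
                       boolean voiceInteraction, boolean voiceInteractionRoot) {
        if (searchQuery(action, query).isPresent()) {
            return Route.VOICE_SEARCH;
        }
        if (voiceInteraction && voiceInteractionRoot) {
            return Route.VOICE_INTERACTION;
        }
        return Route.NORMAL;
    }

    public static void main(String[] args) {
        System.out.println(route(SEARCH_ACTION, " cat videos ", false, false));
        System.out.println(route("android.media.action.STILL_IMAGE_CAMERA", null, true, true));
        System.out.println(route("android.intent.action.MAIN", null, false, false));
    }
}
```

Keeping the decision in a pure function like this means the activity only has to switch on the returned route, and the null and empty-query cases are handled in one place.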

How does the Google Voice Interactions API work?

There are four steps to creating a Voice Interaction conversation:

      1. Retrieve the voice interactor
      2. Submit a request to the voice interactor
      3. Google handles the conversation
      4. Handle the callback (this is created within the request)

The most important part for developers is the request; everything else is on Google’s end. A request generally consists of three things: a prompt to say to the user, the expected responses, and a callback to handle the response.
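To make that three-part shape concrete, here is a small plain-Java model of a pick-option request. None of these types are the Android classes: `PickOptionSketch`, its `Option`, and `submit` are hypothetical stand-ins, and the case-insensitive exact matching is my own simplification, not a description of how Google resolves speech.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.IntConsumer;

// Illustration only: a toy model of prompt + expected responses + callback.
final class PickOptionSketch {

    // One expected response: a label, an index to report back, and synonyms.
    static final class Option {
        final String label;
        final int index;
        final List<String> synonyms = new ArrayList<>();

        Option(String label, int index) {
            this.label = label;
            this.index = index;
        }

        Option addSynonym(String synonym) {
            synonyms.add(synonym);
            return this;
        }

        // True if the spoken text equals the label or any synonym, ignoring case.
        boolean matches(String spoken) {
            String heard = spoken.trim().toLowerCase(Locale.ROOT);
            if (label.toLowerCase(Locale.ROOT).equals(heard)) {
                return true;
            }
            for (String s : synonyms) {
                if (s.toLowerCase(Locale.ROOT).equals(heard)) {
                    return true;
                }
            }
            return false;
        }
    }

    // Stand-in for submitting a request: show the prompt, "hear" a reply,
    // and invoke the callback with the matching option's index.
    // Returns false when nothing matched.
    static boolean submit(String prompt, List<Option> options,
                          String spokenReply, IntConsumer onPicked) {
        System.out.println("Prompt: " + prompt);
        for (Option o : options) {
            if (o.matches(spokenReply)) {
                onPicked.accept(o.index);
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Option> options = new ArrayList<>();
        options.add(new Option("Green", 0).addSynonym("Olive").addSynonym("Emerald"));
        options.add(new Option("Red", 1).addSynonym("Crimson").addSynonym("Burgundy"));
        submit("What is your favorite color?", options, "emerald",
                index -> System.out.println("Picked option index " + index));
    }
}
```

The point of the model is that several spoken strings collapse onto one index, so the callback only needs to switch on that index rather than on the raw text.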

Here is a list of voice requests available to developers:

      • PickOptionRequest (VoiceInteractor.Prompt prompt, Option[] options, Bundle extras): Provide an array of options; each option can contain a list of different response strings that represent one value. As you can see above, I have one option, “Green”, which also applies to “Emerald” and “Olive”.
      • AbortVoiceRequest (VoiceInteractor.Prompt prompt, Bundle extras): This is used to abort the current voice interaction and return to the app’s basic UI. Generally used when the voice interaction is unable to proceed further.
      • CommandRequest (String command, Bundle args): Enter a command string that Google will take to retrieve relevant information (A non-working example is: com.google.voice.commands.REQUEST_NUMBER_BAGS). There are no commands that I know of at the moment, but these will most likely be critical for finding information like numbers, dates, etc. in the future.
      • CompleteVoiceRequest (VoiceInteractor.Prompt prompt, Bundle extras): Similar to AbortVoiceRequest, except this is used when a voice interaction has completed successfully.
      • ConfirmationRequest (VoiceInteractor.Prompt prompt, Bundle extras): This is generally used for unsafe yes/no operations that require touch (like using a credit card for payment).
      • Prompt (CharSequence[] voicePrompts, CharSequence visualPrompt): This is the prompt argument that most requests take. The first argument is the set of prompts spoken to the user, and the second is the prompt shown on screen. NOTE: I have so far been unable to get the visual prompt to work; by default it will display the voice prompt.

What should I watch out for?

There are several kinks in the Voice Actions system that still need to be ironed out. Below is a list of things I ran into while developing a Google Voice Action demo app for making reservations:

      • The Voice Interaction API is only available when the app is started via specific Voice Actions. When attempting to use the voice interaction API during a Search Voice Action, nothing happens at all.
      • Voice Actions that use your specific app name require you to have the app placed publicly in the app store so the name can get indexed by Google. This is a relatively large setback for actual testing because if you want to test this, you will have to use the command prompt to fake a voice query.
      • The multilingual functionality of Voice Interactions is limited. For example, I can’t use ‘Sí’ when using Google’s “ConfirmationRequest”.
      • There isn’t a simple Voice Interaction API call available yet to handle unknown variables. Some examples include asking for a date, number of bags, etc. Google will most likely add functionality for these in the future.
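For the testing workaround mentioned above (faking a voice query from the command prompt), the following sketch assembles an `adb shell am start` invocation for the search action. It is written in Java so the arguments can be handed to `ProcessBuilder` without a local shell in between. The package name `com.example.myapp` and the helper names are placeholders, and it assumes your app declares the `com.google.android.gms.actions.SEARCH_ACTION` filter shown earlier and reads the `query` extra.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Builds the adb command that fakes a "Search for ... on MyApp" voice query.
final class FakeVoiceSearch {
    static final String SEARCH_ACTION = "com.google.android.gms.actions.SEARCH_ACTION";

    static List<String> adbArgs(String packageName, String query) {
        // adb joins its arguments into a single device-side shell command,
        // so the query is single-quoted to survive as one argument there.
        String quoted = "'" + query.replace("'", "'\\''") + "'";
        return new ArrayList<>(Arrays.asList(
                "adb", "shell", "am", "start",
                "-a", SEARCH_ACTION,   // intent action
                "-e", "query", quoted, // string extra carrying the search text
                packageName));         // restrict the intent to your app
    }

    public static void main(String[] args) {
        List<String> command = adbArgs("com.example.myapp", "cat videos");
        System.out.println(String.join(" ", command));
        // With a device attached and adb on the PATH, this would run it:
        // new ProcessBuilder(command).inheritIO().start().waitFor();
    }
}
```

If you type the printed command into a terminal by hand instead, your local shell will consume the single quotes first, so multi-word queries may need an extra layer of quoting.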

If you’re interested in further reading, be sure to check out the resources below:

Google Voice Action Demo
Google Voice Action Demo
Voice Action Guide
Voice Action Developer API