Many retailers use chatbots to communicate efficiently with customers, and some are starting to extend this service to popular instant messaging platforms. Here, Senior Cloud Data Architect Yoyu Li and Cloud Systems Developer Olga Lugai share how they used Google Cloud AI building blocks to build a WhatsApp chatbot for a fictitious food retailer in just two weeks.

WhatsApp is one of the world’s favorite messaging and calling platforms. Many businesses and organizations have been using the WhatsApp API to reach their target audience and, not surprisingly, we have seen chatbots used in combination with the API. However, while most chatbots are capable of understanding text, multimedia messages – such as images and voice notes – remain largely unexplored. Therefore, we have built this demo to showcase how you can utilize Google Cloud AI Hub to make sense of the multimedia data produced in such conversations.

Whatsapp message

In this demo, a fictitious retail business wants to develop a WhatsApp chatbot to deal with customer inquiries or complaints. To provide users with a more natural experience, the business wants:

  • To encourage users to send product pictures or snapshots of a product barcode, instead of typing in the exact product names
  • To be able to redirect key information (such as the type of the inquiry, the product and store referred to) to appropriate channels to better assist the customers
  • To be able to provide multi-lingual support to their customers
  • To be able to analyze the data at a global level in order to provide better service to their customers, such as uncovering frequently raised issues, problematic products or suppliers

Making use of the fully-managed Cloud AI services on Google Cloud Platform, we built this demo within two weeks.

The Overall Architecture

Due to restrictions on WhatsApp’s API (currently in a limited public preview), we used Twilio’s WhatsApp API wrapper in this demo for ease of implementation. In a conventional approach, we would have integrated the services using the Twilio Plugin for Dialogflow and Cloud Functions for ‘custom fulfillments’ in Dialogflow. However, as Dialogflow does not currently preserve the media in a message, and we want to be able to make sense of the pictures users send through, we had to take a different approach.

So, we use Cloud Functions to handle the messages from the Twilio API directly, and we still use Dialogflow for entity extraction and context management. This way, in addition to Dialogflow, we can also plug in multiple AI services from Google Cloud AI Hub.

Demo approach
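As a rough sketch of this routing step, the function below inspects the form fields that Twilio posts to a webhook (`Body`, `From`, `NumMedia`, `MediaUrl0`, ...) and decides which downstream service should see the message. The function name, the target labels and the shape of the returned dictionary are hypothetical choices for illustration, not the demo's actual code.

```python
def route_message(form):
    """Decide how to process one incoming Twilio webhook payload.

    `form` is a dict of the form fields Twilio posts to the webhook.
    The return shape is a hypothetical convention for this sketch.
    """
    sender = form.get("From", "")
    text = form.get("Body", "").strip()
    num_media = int(form.get("NumMedia", "0") or 0)

    # Collect MediaUrl0, MediaUrl1, ... for every attachment that is present.
    media_urls = [
        form[f"MediaUrl{i}"] for i in range(num_media) if f"MediaUrl{i}" in form
    ]

    if media_urls:
        target = "vision"      # images: product search or barcode extraction
    elif text:
        target = "dialogflow"  # plain text: entity extraction in Dialogflow
    else:
        target = "ignore"      # nothing usable in the message
    return {"sender": sender, "text": text, "media_urls": media_urls, "target": target}
```

Keeping the routing decision in a plain function like this makes it easy to unit test without calling Twilio or any Google Cloud service.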

Extracting Key Information from the Messages

No matter the format of the messages – text or image – to assist the customers, we first need to extract some key information from each conversation. For example, which product is the customer referring to? Is it a complaint or general inquiry?

For text messages, we can use Dialogflow’s entity extraction to distill information such as product names and store names from the text. For example, if the customer sends “I bought a bottle of Nutella”, and “Nutella” is in our custom entity set for the product catalog, we know the product the customer is talking about is Nutella. For images, we need to use the Vision API, which we will discuss later.

Custom Entity Extraction for Product Names

Naturally, customers don’t always use the word ‘complaint’ to express dissatisfaction. So we use a combination of Dialogflow and Sentiment Analysis from the Google Cloud AI building blocks to understand the customer’s intention. For example, if the customer says “I’m not happy with” a product, we can still understand that they are raising a complaint.
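One way to combine the two signals is sketched below. It assumes the matched intent name and a sentiment score in the range -1.0 to 1.0 have already been obtained; the intent names and the -0.25 cut-off are hypothetical values that would need tuning on real conversations.

```python
def classify_inquiry(intent_name, sentiment_score, threshold=-0.25):
    """Label a message as a complaint or a general inquiry.

    intent_name: intent matched by Dialogflow (names here are hypothetical).
    sentiment_score: value in [-1.0, 1.0]; negative means negative sentiment.
    threshold: assumed cut-off, to be tuned on real conversations.
    """
    # An explicit complaint intent always wins.
    if intent_name == "complaint":
        return "complaint"
    # Otherwise fall back to sentiment: clearly negative text is a complaint.
    if sentiment_score <= threshold:
        return "complaint"
    return "inquiry"
```

With this rule, a message such as “I’m not happy with this jam” would be treated as a complaint whenever its sentiment score falls below the threshold, even if no complaint intent was matched.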

Identifying Products in Images

In order to identify the products in images attached to the WhatsApp messages, we use the Product Search feature in Google Cloud Vision API. Product Search allows developers to train and utilize a customizable multi-class classification model to detect products in images.

To create a product set in Product Search, we first take some pictures of each product that we want to include in the catalog, then label those pictures with their product ID. We submit the data, together with the product metadata (such as product category), through the Cloud Vision API, and Product Search then automatically selects the best algorithm and trains and deploys the model for us.

To do a product search once the training is complete, we simply call the API with an image URL, and the API tells us how likely it is that the image matches one or more specific products in the catalog.

Below are some example images we used to train the model. We labeled each picture as either jam or mug. We set a minimum likelihood threshold of 50% in the chatbot application. After the training is completed, if a customer sends a picture of the jam to the chatbot, and Product Search reports a 75% match with a jam picture used in the training, the chatbot will understand that jam is the product referred to in the conversation.

Product search training

Example Pictures Used in Product Search Training
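The threshold check in the chatbot can be sketched as follows. It assumes the Product Search response has already been reduced to a list of (product ID, score) pairs with scores between 0 and 1; the function name and that input shape are illustrative assumptions rather than the API's own response format.

```python
def best_match(results, min_score=0.5):
    """Return the product ID of the highest-scoring match, or None.

    results: list of (product_id, score) pairs, score in [0, 1].
    min_score: the 50% minimum likelihood threshold used by the chatbot.
    """
    # Discard anything below the threshold before picking a winner.
    candidates = [(pid, score) for pid, score in results if score >= min_score]
    if not candidates:
        return None
    return max(candidates, key=lambda pair: pair[1])[0]
```

For the jam example, `best_match([("jam", 0.75), ("mug", 0.2)])` returns `"jam"`, while a picture that matches nothing above 50% yields `None`, so the chatbot can ask the customer for more detail.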

Extracting Barcodes

Extracting barcodes from images, however, is not a typical computer vision challenge – barcodes are designed for machines’ convenience. Interestingly, Google only provides a barcode extraction service as part of ML Kit, which processes images locally on mobile devices. But in WhatsApp, we do not control the application itself, so we have to process the images and extract the barcode remotely. Luckily, the technology to extract barcodes is quite mature, so all we need to do is deploy an existing solution to Google Cloud. In this solution, we hosted the module in Cloud Run. Cloud Run is a fully managed container service priced on compute seconds and requests, so there is no standing cost if the API is not triggered.

Barcode Recognition

Extracting The Barcode from An Image within WhatsApp

The Memory of A Conversation

Now we should be able to extract key information from both texts and images. But one of the persistent challenges in chatbot building is that machines don’t inherently have a good memory.

We need short-term memory for the conversation, as we all know how frustrating it can be when the person we are speaking to can’t recall something we said just a couple of minutes before.

We also need long-term memory to remember the customer names and preferences in order to provide a more personal experience.

We use the context object in Dialogflow as the short-term memory to temporarily store information during the course of the conversation:

Context = {
     Parameters: {product-name: "Cien Hand Gel", store-name: "Holborn Store"}
}

Example of A Context Object

As for the ‘long-term memory’, we keep customer information such as names and preferences in the database, and we use the mobile number passed from Twilio to retrieve the details.
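The two kinds of memory can be sketched in a few lines. Here a plain dictionary stands in for both the Dialogflow context parameters and the customer database; the function names, the sample record and the phone number are hypothetical, and the only external detail relied on is that Twilio prefixes WhatsApp senders with `whatsapp:`.

```python
# Hypothetical in-memory stand-in for the customer database.
CUSTOMERS = {
    "+441234567890": {"name": "Alex", "preferred_language": "en"},
}


def update_context(context, new_parameters):
    """Merge newly extracted parameters into the short-term context.

    Empty values are skipped so that a later message without a product
    name does not erase the one mentioned earlier in the conversation.
    """
    merged = dict(context)
    for key, value in new_parameters.items():
        if value:
            merged[key] = value
    return merged


def lookup_customer(twilio_from, customers=CUSTOMERS):
    """Fetch the long-term profile for a sender such as 'whatsapp:+44...'."""
    number = twilio_from.split(":", 1)[-1]  # drop the 'whatsapp:' channel prefix
    return customers.get(number)
```

Calling `update_context` once per message builds up the product and store names over several turns, while `lookup_customer` returns `None` for numbers that are not yet in the database.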

Multilingual Support

It is common for a multi-national retailer to have customers around the globe. Implementing a solution to support multiple languages can be expensive; but for the demo, we can achieve this easily by utilizing a building block in Google Cloud AI Hub that everyone’s familiar with – the Translate API.

When a user sends a message that is not in English, we first use the language detection API to recognize which language it is, then we translate it into English and forward it to Dialogflow for processing.

Finally, we translate the response from Dialogflow back into the original language the customer used, before sending the reply. Google Translate API works reasonably well for our purpose:

Translate API

Using Translate API for Multilingual Support
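The detect, translate, process and translate-back flow can be sketched as a single function. The three callables passed in are placeholders for the real service calls (language detection, translation and the Dialogflow round trip); their signatures are assumptions made for this sketch, which keeps the flow testable without any cloud credentials.

```python
def reply_in_user_language(message, detect, translate, handle, base_language="en"):
    """Run one message through detect -> translate -> process -> translate back.

    detect(text) -> language code
    translate(text, source, target) -> translated text
    handle(text) -> reply text in base_language (e.g. from Dialogflow)
    All three callables are placeholders for the real service calls.
    """
    language = detect(message)
    if language == base_language:
        # Already in English: no translation needed in either direction.
        return handle(message)
    english = translate(message, language, base_language)
    reply = handle(english)
    return translate(reply, base_language, language)
```

Skipping translation for English messages avoids two unnecessary API calls on what is likely the most common path.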

What About Emojis?

On a side note, emojis can be supported too. Emojis are just special characters; the only difficulty is that most of us cannot type in bytes or read Unicode code points like U+1F604. Instead of U+1F604, we say :smile:. So we need to programmatically translate emojis into aliases and vice versa. In the demo, we use the emoji Python library to ‘emojize’ and ‘demojize’ characters:

>>> import emoji
>>> # emoji 2.0 and later: language='alias' replaces the older use_aliases=True
>>> print(emoji.emojize('Google Cloud is :thumbsup:', language='alias'))
Google Cloud is 👍
>>> print(emoji.demojize('Google Cloud is 👍'))
Google Cloud is :thumbs_up:

Gaining Business Insights

The application of the data collected from the WhatsApp messages can result in improved customer service and also assist in strategic conversations. When it is structured and presented using the visualization tools, it can become a valuable resource for various teams across the business.

The data is stored in BigQuery (the GCP data warehouse service), where it can be queried directly to provide actionable insights. However, in order to analyze complex data more efficiently, we can connect Tableau, a popular data visualization tool, to BigQuery to show live data and create interactive dashboards.

Instead of loading all data into Tableau, we run a query to get only the information we need to answer certain business questions. If other data sources have information on products and services that we want to include in the visualization, it is possible to add multiple connections. Excel spreadsheets, CSV files and relational databases are some examples of the file types and sources that can be combined with the WhatsApp data to enhance our visualizations.

Geographic chart

Example of a geographic chart created from the multiple data sources

After multiple charts are joined in a complete dashboard, it can be published and presented to other teams in your business. It can become an invaluable tool for various departments to answer location-specific questions, help to respond to common customer concerns, and make strategic improvements.

Live data from WhatsApp feeding into the dashboards allows you to see the state of the products and services in real time and make the right decision at the right time.

How We Built This

To recap, the diagram below offers us an overview of the final architecture:

Final architecture

The architectural components used in this demo are 100% serverless, so there is no infrastructure that we need to spin up or maintain. We built this demo in an as-code manner, meaning that the configuration of each component is codified, so we can easily track changes and redeploy the solution to a new environment in minimal time.

We used Terraform to configure some of the GCP elements, such as Cloud Functions, Cloud Run, BigQuery (for analytics) and Datastore (for the customer database). However, many of the critical components used in the demo, such as Dialogflow and Product Search in the Vision API, were not yet supported by Terraform. For Dialogflow, we can configure the agent with a JSON template; for Product Search, we use Python code to automate the labeling, training, and deployment process.

We also built an automated test and deploy pipeline in Bitbucket, which was key to our success in developing the demo in just two weeks.


We have presented an end-to-end demo of using WhatsApp as the customer service platform for a fictitious retail customer, using Twilio, Tableau, and Google Cloud AI building blocks.

From this demo, we can see Google Cloud offers a powerful toolkit for building AI-enabled applications. Throughout the project, we did not need to handcraft any ML models ourselves; instead, we simply made use of the high-level APIs, such as Language and Vision APIs, to achieve what would otherwise be challenging to accomplish.

We have seen more and more businesses and organizations adopting WhatsApp as a communication tool to interact with their target audience. This solution is highly adaptable to different scenarios, especially where multimedia processing capability is desired. As the next step, we look forward to getting our hands on the WhatsApp API (once it comes out of the limited preview), so we can extend the features to a wider range of media formats, such as audio, video and location data.
