How Clustaar uses Machine Learning to help you improve your bot

By Guillaume Lebourgeois, March 11, 2019

Chatbot Setup

Setting up a chatbot takes time and requires various resources: business experts, developers, support agents. Input from the business experts and support agents is essential for defining the scope of the first version of the bot. They draw on experience from their respective fields to choose the most important themes the bot must handle.

It is a good way to get up and running quickly, reduce the load on the support team, and gather conversations.

Chatbot Maintenance & Evolution

Clustaar provides great tools to monitor your bot performance, and analyze its behavior.

  1. A high-level dashboard, with an overview of activity
  2. Metrics on user satisfaction over time
  3. Intents & story statistics
  4. User Queries navigator

Using these, you’ll be able to track the health of your bot and its progress, and easily spot maintenance quick wins: an intent that needs improvement or should be removed, a scenario that can be completed.

Then, spending time in the User Queries navigator will help you better understand the questions your users ask, and whether the bot is able to address them. However, this tool can become quite frustrating when the volume of conversations begins to rise.

Well, a rising volume of conversations is good news, and we don’t want you to be frustrated when it happens 😉 That’s why we created the “Intent Suggestions” feature!

Intent Suggestions Feature

Check out our knowledge base for in-depth details on Intent Suggestions. For newcomers, just know that this powerful feature is able to detect:

  • new formulation suggestions to add and improve existing intents
  • new intent recommendations for you to create, with a set of suggested formulations

This feature is intended to improve the efficacy of your bot and is made possible by Clustaar’s AI. These suggestions enable the bot to answer the questions your users are really asking. It also helps detect changes in the questions users ask, so the bot automatically follows the evolution of your product and users.

Let’s take a look at how this sorcery happens!

Learning from conversations

It is important to understand that this magic can only happen with a substantial volume of data. We need a few thousand questions asked in the last two months to be able to provide you with accurate and meaningful results.

We encourage you to deploy the chatbot on pages with high visibility: the more visible it is, the more data you’ll have, and the more questions will be handled automatically.

For those with significantly lower traffic: don’t worry, as described earlier, Clustaar provides all the tools to easily improve your bot with a bit of work.

Data Cleansing

If your bot is eligible, i.e. it has enough data within the last two months, its conversations will be analyzed automatically. The computation begins with a thorough data cleansing step, in order to remove as much noise as possible.

  • We parse all queries and detect entities such as dates (“tomorrow”, “in three hours”, etc.). They are all aggregated under the same notion of TIME, so that questions such as “I want a room for tomorrow” and “I want a room in three hours” are treated as similar when building an intent.
  • All exceptional vocabulary is removed, so that it does not pollute the dataset
  • Very short or very long queries are removed, as most of them are not “real” questions
  • Queries with only low-frequency tokens are judged unessential and removed as well
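
The cleansing steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Clustaar's actual pipeline: the `TIME_PATTERN` regular expression, the function names, and the thresholds (`min_len`, `max_len`, `min_freq`) are all hypothetical, and a real entity detector would cover far more cases.

```python
import re
from collections import Counter

# Hypothetical pattern standing in for a real date/time entity detector.
TIME_PATTERN = re.compile(
    r"\b(tomorrow|today|tonight|in \w+ (?:minutes?|hours?|days?))\b"
)

def normalize(query: str):
    """Lowercase the query, replace time expressions with TIME, and tokenize."""
    text = TIME_PATTERN.sub("TIME", query.lower())
    return re.findall(r"[a-zA-Z']+", text)

def clean(queries, min_len=2, max_len=20, min_freq=2):
    """Apply the cleansing rules; all thresholds are illustrative assumptions."""
    tokenized = [normalize(q) for q in queries]
    freq = Counter(tok for toks in tokenized for tok in toks)
    kept = []
    for toks in tokenized:
        if not (min_len <= len(toks) <= max_len):
            continue  # very short or very long query
        common = [t for t in toks if freq[t] >= min_freq]  # drop rare vocabulary
        if not common:
            continue  # query made only of low-frequency tokens
        kept.append(common)
    return kept

cleaned = clean([
    "I want a room for tomorrow",
    "I want a room in three hours",
    "hi",
    "xyzzy plugh",
])
```

In this toy run, both room requests end up as the same token list ending in `TIME`, while the one-word greeting and the query made only of rare words are discarded.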

Machine Learning

To deal with the curse of dimensionality, we have to reduce the number of dimensions to compute. We do this by reworking the dimensions to keep only the most important ones, which are enough to represent most of the data. This is done with Latent Semantic Indexing, keeping enough dimensions to reach an inertia (the share of the variance retained) of 70%.

The data set is then simplified enough to be able to learn from it, and find interesting themes.
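
One way to read the 70% target is: keep the smallest number of latent dimensions whose cumulative share of the total inertia reaches 0.70. The sketch below assumes the singular values of the term-document matrix are already available and that inertia is measured by their squares; the function name and the sample values are illustrative, not taken from Clustaar's code.

```python
def components_for_inertia(singular_values, target=0.70):
    """Return the smallest k whose top-k components reach `target` of total inertia.

    Assumption: each component's inertia is its squared singular value.
    """
    energies = sorted((s * s for s in singular_values), reverse=True)
    total = sum(energies)
    if total == 0:
        return 0
    cumulative = 0.0
    for k, energy in enumerate(energies, start=1):
        cumulative += energy
        if cumulative / total >= target:
            return k
    return len(energies)

# Squares are 25, 9, 4, 1, 1 (total 40): one component gives 62.5%, two give 85%.
k = components_for_inertia([5, 3, 2, 1, 1])
```

With these sample values, two dimensions are enough to pass the 70% mark, so the remaining three would be dropped.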

Algorithm Choice

We did a lot of R&D to find the most relevant algorithms for our purpose, and finally decided to work with Hierarchical Clustering, which gave us the best results. This algorithm belongs to the class of unsupervised Machine Learning algorithms, and tries to create “groups” (clusters) of queries that seem semantically similar and can be associated together.

We use an agglomerative approach: each observation starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy.

Hierarchical Clustering is particularly interesting in our context because it does not require the desired number of clusters to be provided “manually”. Indeed, the number of clusters can vary depending on the bot and on the variety of subjects discussed by users. Choosing it in advance would be difficult, and would put the final quality of the clusters at risk.

Stopping Criterion

One of the challenges of Hierarchical Clustering is finding an adequate Stopping Criterion that ensures cluster quality. Usually, people use a distance criterion or a number criterion.

  • distance criterion: stops merging clusters when the members of the resulting cluster would be too distant from each other
  • number criterion: stops creating new clusters when a maximum number of clusters has been reached

We opted for the distance criterion, as the number of clusters can vary from one bot to another, depending on the volume of data and the theme of the bot. We then experimented with different types of distances (euclidean, cosine) and different data sets, in order to approximate the most relevant stopping criterion and validate our choices.
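
To make the agglomerative approach and the distance criterion concrete, here is a minimal pure-Python sketch. Several choices are assumptions made for illustration only: the cosine distance, the average linkage between clusters, and the `max_distance` threshold of 0.3 are not stated in this article.

```python
import math

def cosine_distance(a, b):
    """1 minus the cosine similarity of two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0

def average_linkage(c1, c2, vectors):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(cosine_distance(vectors[i], vectors[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))

def agglomerate(vectors, max_distance=0.3):
    """Merge the closest clusters until the closest pair is too far apart."""
    # Each observation starts in its own cluster.
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], vectors)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        # Distance criterion: stop once the closest pair exceeds the threshold.
        if d > max_distance:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

# Toy query vectors: two point roughly along one axis, two along the other.
groups = agglomerate([(1, 0), (0.9, 0.1), (0, 1), (0.1, 0.9)])
```

Note that the number of clusters is never passed in: it falls out of the threshold, which is the property that motivated the distance criterion.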

Clustering Computing

Once all the queries have been cleaned, we launch our Clustering Algorithm upon them.

These clusters vary in size, in the importance of the queries they contain (frequent or not), and in density. We will use these characteristics later to rank the clusters by quality. But first, we have to qualify them: would they be useful to enrich existing intents, or to build new ones? To find out, we have to go through another computing step.

Queries Matching

Inside each cluster, we go through each query and use the NLP Engine to see if it matches an existing intent. We then have two types of clusters:

  • Those with few or no matches are good candidates for the creation of new intents, which will efficiently enlarge the scope of your bot
  • Those with many matches are good candidates to provide new formulations for an existing intent in order to improve it
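
The split between the two types of clusters can be sketched as a simple ratio test. Everything here is hypothetical: `toy_match` is a stand-in for the NLP Engine, and the 0.5 `match_ratio` cut-off between “few” and “many” matches is an arbitrary value chosen for the example.

```python
def classify_cluster(queries, match, match_ratio=0.5):
    """Label a cluster by the share of its queries that match an existing intent.

    `match` takes a query and returns (intent_name, score), or None if no match.
    """
    matched = sum(1 for q in queries if match(q) is not None)
    ratio = matched / len(queries) if queries else 0.0
    return "enrich_existing_intent" if ratio >= match_ratio else "new_intent"

def toy_match(query):
    # Hypothetical matcher: pretend only an "opening_hours" intent exists.
    return ("opening_hours", 0.9) if "open" in query else None

hours = classify_cluster(
    ["are you open today", "when do you open", "opening times"], toy_match
)
refunds = classify_cluster(["refund my order", "i want a refund"], toy_match)
```

Here the opening-hours cluster would feed new formulations into the existing intent, while the refund cluster would be proposed as a new intent.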

Queries Ordering

Inside each cluster, the queries are ordered using two parameters:

  1. first, the queries matching an intent, ordered by matching score: this helps evaluate the risks of creating a new intent or improving an existing one. It is also a chance to challenge the existing intents, and see if some of their formulations should be removed, moved or adapted
  2. then, the other queries, ordered from the most occurrences to the fewest
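
This two-level ordering can be sketched as follows. The dictionary keys (`text`, `score`, `occurrences`) are illustrative, and sorting matched queries from the highest score to the lowest is an assumption, since the direction is not specified here.

```python
def order_queries(queries):
    """Put intent-matching queries first (by score), then the rest by occurrences.

    Each query is a dict with 'text', 'occurrences', and 'score'
    ('score' is None when the query matched no intent).
    """
    matched = [q for q in queries if q["score"] is not None]
    unmatched = [q for q in queries if q["score"] is None]
    matched.sort(key=lambda q: q["score"], reverse=True)
    unmatched.sort(key=lambda q: q["occurrences"], reverse=True)
    return matched + unmatched

ordered = order_queries([
    {"text": "refund please", "occurrences": 40, "score": None},
    {"text": "when do you open", "occurrences": 5, "score": 0.6},
    {"text": "cancel my order", "occurrences": 90, "score": None},
    {"text": "opening hours", "occurrences": 12, "score": 0.95},
])
ordered_texts = [q["text"] for q in ordered]
```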

Clusters Ordering

Lastly, we order the list of Clusters so that the most relevant ones are displayed first. The most relevant Clusters are the ones that contain high-volume queries and have good density.

The density represents how “focused” a cluster is: are its queries all about the exact same theme, or do they diverge?
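
As a rough illustration of ranking by volume and density, the sketch below scores each cluster by multiplying the two. Both formulas are assumptions made for the example: density is taken as `1 / (1 + mean distance to the centroid)`, and the product `volume * density` is only one of many possible ways to combine the two signals.

```python
import math

def density(vectors):
    """Higher when members sit close to the centroid (1.0 = all identical)."""
    centroid = [sum(col) / len(vectors) for col in zip(*vectors)]
    spread = sum(math.dist(v, centroid) for v in vectors) / len(vectors)
    return 1.0 / (1.0 + spread)

def order_clusters(clusters):
    """Sort clusters by volume * density, most relevant first.

    Each cluster is a dict with 'name', 'volume' (total occurrences of its
    queries), and 'vectors' (one vector per query).
    """
    return sorted(
        clusters,
        key=lambda c: c["volume"] * density(c["vectors"]),
        reverse=True,
    )

ranked = order_clusters([
    {"name": "small_focused", "volume": 10, "vectors": [(1, 0), (1, 0)]},
    {"name": "large_diffuse", "volume": 100, "vectors": [(0, 0), (2, 0)]},
    {"name": "large_focused", "volume": 100, "vectors": [(1, 0), (1, 0)]},
])
ranked_names = [c["name"] for c in ranked]
```

In this toy data, the large and focused cluster comes first, and the diffuse cluster is penalized despite having the same volume.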


The hard work is done: Clustaar’s AI has worked out which improvements should be made, and now you just have to push the button!