My team at work has recently been playing with chatbots, trying to understand the technology available, how to use it, and the most appropriate places to use one.
Chatbot technology
Chatbots tend to have at least two pieces: a framework for the bot logic itself and channels where the bot interacts with users. Many frameworks support a set of channels out of the box and provide APIs for building custom connections to channels they don't support.
A third piece, common but not required, is artificial intelligence. The most common kind is natural language processing (NLP), which takes text and extracts intents and values from it using machine learning, but there are other services for speech analysis, for parsing FAQ-style data into conversational questions and answers, and more.
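To make "intents and values" concrete, here is a minimal sketch of the kind of result an NLP service hands back to the bot. It is a toy keyword matcher standing in for a trained model, not how LUIS or any other product works internally. The `Recognition` shape, the intent names, and the keyword lists are all hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Recognition:
    """Hypothetical result shape: a top intent, a score, and extracted values."""
    intent: str
    score: float
    entities: dict = field(default_factory=dict)


# Hypothetical keyword lists standing in for a trained model.
INTENT_KEYWORDS = {
    "BookFlight": ["book", "flight", "fly"],
    "CheckWeather": ["weather", "forecast", "rain"],
}


def recognize(text: str) -> Recognition:
    """Score each intent by the share of its keywords found in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    best_intent, best_score = "None", 0.0
    for intent, keywords in INTENT_KEYWORDS.items():
        score = sum(1 for k in keywords if k in words) / len(keywords)
        if score > best_score:
            best_intent, best_score = intent, score
    # Extract a destination value with a simple pattern: "to <Capitalized word>".
    entities = {}
    match = re.search(r"\bto ([A-Z][a-z]+)", text)
    if match:
        entities["destination"] = match.group(1)
    return Recognition(best_intent, best_score, entities)


result = recognize("I want to book a flight to Paris")
print(result.intent, result.entities)
```

The bot logic only ever sees the structured result, which is why the NLP piece can be swapped out without touching the framework or the channels.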
We surveyed the landscape and found lots of products in the chatbot space, many of which blur the edges between the framework, channel, and artificial intelligence pieces. A sampling of what we found:
- BotKit
- Amazon Lex
- IBM Watson
- Facebook Wit.ai
- Google DialogFlow
- Microsoft Bot Framework
- Microsoft LUIS
- Microsoft QnA Maker
- Microsoft Text Analytics API
- Microsoft Bing Spell Check API
- Microsoft Bing Speech API
- Microsoft Bing Web Search API
Our team settled on Microsoft Bot Framework for the framework and Microsoft LUIS for the NLP. We're a Microsoft environment, so these were natural extensions of the products and services we already use.
Things that are easy
After spending some time trying to build some concrete solutions with chatbots, we discovered a few things that are easy to do:
- Workflows with discrete steps. Any user path that has sequential, distinct steps is straightforward with the dialogs in Microsoft Bot Framework. They come with prompts for different types of data, state management, and input validation, all out of the box.
- FAQ. Microsoft's QnA Maker makes quick work of any kind of FAQ-style data source, and then uses machine learning to let the chatbot handle natural language questions and return relevant answers.
- Discerning general intent. With LUIS, figuring out broadly what the user wants to do is easy.
- Handling common tasks that have a concrete action. Any time you can say "when the user says something along the lines of [x], do [y]", the task is easy.
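The first item, a workflow with discrete steps, can be sketched in a few lines. This is a framework-agnostic illustration and does not use the Bot Framework API. The `Step` and `Workflow` classes, the reservation scenario, and the 1-20 limit are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Step:
    key: str                                  # where the answer is stored
    prompt: str                               # question shown to the user
    parse: Callable[[str], Optional[object]]  # returns None when input is invalid


def parse_party_size(text: str) -> Optional[int]:
    """Accept whole numbers from 1 to 20; anything else is invalid."""
    text = text.strip()
    if text.isdecimal() and 1 <= int(text) <= 20:
        return int(text)
    return None


def parse_name(text: str) -> Optional[str]:
    """Accept any non-empty name."""
    return text.strip() or None


@dataclass
class Workflow:
    steps: list
    index: int = 0
    answers: dict = field(default_factory=dict)

    @property
    def done(self) -> bool:
        return self.index >= len(self.steps)

    def current_prompt(self) -> str:
        return self.steps[self.index].prompt

    def handle(self, text: str) -> str:
        """Validate the reply for the current step, then advance or re-prompt."""
        step = self.steps[self.index]
        value = step.parse(text)
        if value is None:
            return "Sorry, I didn't understand that. " + step.prompt
        self.answers[step.key] = value
        self.index += 1
        return "All set, thanks!" if self.done else self.current_prompt()


booking = Workflow([
    Step("name", "What name is the reservation under?", parse_name),
    Step("party_size", "How many people (1-20)?", parse_party_size),
])
print(booking.current_prompt())
for reply in ["Ada", "lots", "4"]:
    print(booking.handle(reply))
print(booking.answers)
```

The invalid reply "lots" re-prompts without losing the name already collected. This works so easily because the bot always knows which question it just asked.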
Things that are hard
Beyond the easy things, there are many things we want to do that are hard:
- Context switching. Let's say a user is in the middle of a dialog, one of the easy things, but they respond with something that indicates they want to do something else. What happens here? Do you make them finish the dialog? Do you cancel the dialog and lose all of their progress? Do you save the state and return them to it when they're done with the interruption? Do you ask if they want to return when they're done? The hard part here is that there isn't a single best approach: all of these could be viable options depending on the user and their context.
- State management. Our natural conversations are full of micro-state changes. We talk about something, that leads to something else, which then goes back to a previous comment. Tracking the state that matters versus the state that doesn't is hard.
- Revising intent. This is part of state management: when the user has sent a message with a clear intent and then wants to revise that intent in a follow-up message, handling it is hard!
- Natural Language Processing of items with small differences. LUIS is impressive and gives you a lot right out of the box, but training it to differentiate between similar but distinct intents is difficult.
- Any task that requires analysis or nuance. Chatbots cannot replace humans. Anything that requires a human to analyze or understand the intent cannot be done with a chatbot (obviously, right?).
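For the context-switching item, one of the options listed (save the state, then ask whether to return) can be sketched with a stack of dialogs. This is only an illustration of that single option, not a recommendation and not any framework's API. The `Dialog` and `Conversation` classes are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Dialog:
    """Hypothetical dialog: a name plus whatever progress it has collected."""
    name: str
    state: dict = field(default_factory=dict)


class Conversation:
    """Keeps active dialogs on a stack so an interruption pauses progress instead of destroying it."""

    def __init__(self) -> None:
        self.stack = []

    def begin(self, dialog: Dialog) -> None:
        # Starting a dialog while another is active is the interruption:
        # the earlier dialog stays underneath with its state untouched.
        self.stack.append(dialog)

    def cancel_all(self) -> None:
        # The "cancel and lose all progress" option, for comparison.
        self.stack.clear()

    def finish_current(self) -> str:
        """End the top dialog and offer to resume the one beneath it, if any."""
        finished = self.stack.pop()
        if self.stack:
            paused = self.stack[-1]
            return f"Done with {finished.name}. Want to get back to {paused.name}?"
        return f"Done with {finished.name}."


chat = Conversation()
chat.begin(Dialog("booking", {"name": "Ada"}))
chat.begin(Dialog("weather"))  # the user changes the subject mid-booking
print(chat.finish_current())
print(chat.stack[-1].state)
```

The mechanics are simple. The hard part described above is deciding when a message counts as an interruption, and which of the options suits a given user.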
Early conclusions
We're still at the beginning of this whole thing, but some conclusions stand out already. Chatbots are good at handling common, easily understood tasks. As a front line for customer service, or as a different interaction model around a knowledge base, they can offer users a new, and perhaps better, way of getting the information they need. That's what we're keeping an eye out for: user interactions that currently require a person to deliver a concrete answer or solution to a concrete question.