A lot of attention has been focused lately on the pros and cons of CRM tools, much of it from the point of view of “which tool can I buy and what is it going to do for me?” One thing that’s been overlooked is that for CRM to function properly, it needs constant, meaningful input from an outside system. That input is the raw material for its real work, which is the analysis of what the customer wants and whether he got it.
We’re used to thinking of two main data inputs from the customer. One, IVR, has been around forever and is widely understood. There are not too many new ways to construct a branching tree routing script, or to parse an incoming account number. Application development for IVR has been made easy enough for an intelligent non-expert to do a credible job putting together working, useful apps.
The other input mode is the very rich, multi-textured Web interface. This has developed so recently that there are still many competing techniques for feeding customer input from a website through to a CRM system and into a contact center. All the many data-gathering and connection modes fall into this camp: email, click-to-call, text chat, and Web-based ecommerce forms.
Almost off the radar, though, is another technology that has reached strong maturity. Speech recognition, which is making big waves out in the consumer technology world, is still seen as something of an afterthought in the CRM/contact center world. In fact, it’s a lot more than just a replacement for touch-tone input.
Why is it different? I’d like to argue that it creates the opportunity for a completely different type of interaction than does IVR. IVR, despite having “interactive” as part of its name, is really a one-way channel. Customers enter ID and pull out a small subset of data that pertains to them. Remember why it caught on in the first place: it automated the dumping of small bits of repetitive info to the caller, keeping those calls away from expensive agents. Relatively low tech, profoundly efficient, easy to diagram into an existing call flow — the very definition of no-brainer.
But when you add speech to the same call, you add several orders of magnitude of complexity. Forget about the complexity of the tech that you need to operate it; instead, concentrate on the complexity of the information flow back and forth between you and the customers. Instead of asking questions that get answered only in numeric digits, you can draw out responses that are far more nuanced and subtle. Few people are going to enter an address using keys that have three letters apiece. Asking for a stock quote using the Schwab IVR is hard enough: each letter of the alphabet is assigned a two-key code. They had to send customers little wallet cards to remind them of the alphabetic cipher just to be able to retrieve stock quotes. This was in 1996, before they were heavily online, and before they installed a speech rec system.
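To get a feel for how clumsy a two-key alphabetic code is, here is a minimal sketch in Python. The scheme shown (the key a letter is printed on, followed by the letter's position on that key) is an assumption chosen for illustration, not Schwab's documented cipher, and the names `KEYPAD`, `encode_symbol`, and `decode_digits` are hypothetical.

```python
# Hypothetical two-key letter code: the key the letter is printed on, then the
# letter's position on that key. Illustration only; not Schwab's actual scheme.
KEYPAD = {
    "2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
    "6": "MNO", "7": "PQRS", "8": "TUV", "9": "WXYZ",
}

# Build the "wallet card": every letter mapped to its two-digit code.
LETTER_TO_CODE = {
    letter: key + str(position)
    for key, letters in KEYPAD.items()
    for position, letter in enumerate(letters, start=1)
}
CODE_TO_LETTER = {code: letter for letter, code in LETTER_TO_CODE.items()}


def encode_symbol(symbol: str) -> str:
    """Return the digits a caller would have to press for a ticker symbol."""
    return "".join(LETTER_TO_CODE[ch] for ch in symbol.upper())


def decode_digits(digits: str) -> str:
    """Turn a string of two-digit codes back into letters, as the IVR would."""
    pairs = [digits[i:i + 2] for i in range(0, len(digits), 2)]
    return "".join(CODE_TO_LETTER[pair] for pair in pairs)


if __name__ == "__main__":
    presses = encode_symbol("IBM")
    print(presses, "->", decode_digits(presses))
```

Under this assumed scheme every letter costs the caller two key presses and a glance at the card, where a speech system would simply hear the name spoken.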
Stock quotes are one of those really basic information retrieval apps that the Web does really well as a replacement for the phone system. But you can do things that are so much richer, limited only by the system resources available to parse the speech, and the power of the recognition engine.
SpeechWorks, one of the companies with powerful speech rec tools, says that to run high-quality speech applications, you need four things:
• state-of-the-art technology;
• high-level building blocks (essentially, this means that the recognition engine contains prefabricated modules for handling certain types of speech);
• tight integration on robust telephony platforms; and
• tools for analyzing and tuning applications.
That’s to make the speech rec work; to make the application it’s running a success as well, you also need:
• appropriate application development procedures;
• an understanding of what your app is ultimately supposed to do for you, in terms of what would make it a success; and
• a good user interface.
SpeechWorks says that, based on several of their customer installs, the average cost of an agent-handled call is $1.50 per minute; by contrast, the average cost of a speech-rec attended call is just $0.25-$0.35. That’s not too surprising, and they rightly say that the speech-rec costs vary based on the underlying contract the call center has with its local telcos for long-distance traffic.
At one of their installations, the length of customer interactions was reduced from 12.5 minutes through touch tone to two to three minutes using speech. This goes right to the question of speech rec as a rough equivalent to IVR. If anything like that reduction can be repeated across the board, or even in a significant minority of applications, then speech looks a lot better as a way into the database despite the higher level of technology it needs to implement.
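A quick back-of-the-envelope calculation makes these figures concrete. The sketch below rests on two assumptions of mine: that the $0.25-$0.35 speech-rec figure is a per-minute rate like the agent figure, and that a three-minute call is a fair basis for comparison. The function names are illustrative.

```python
# Figures quoted from SpeechWorks in the text above.
AGENT_COST_PER_MIN = 1.50           # dollars per minute, agent-handled call
SPEECH_COST_PER_MIN = (0.25, 0.35)  # assumed to be per minute as well
TOUCH_TONE_MINUTES = 12.5           # interaction length using touch tone
SPEECH_MINUTES = (2.0, 3.0)         # interaction length using speech


def call_cost(minutes: float, rate_per_min: float) -> float:
    """Cost in dollars of one call, rounded to the cent."""
    return round(minutes * rate_per_min, 2)


def percent_reduction(before: float, after: float) -> float:
    """How much shorter 'after' is than 'before', as a percentage."""
    return round(100 * (before - after) / before, 1)


if __name__ == "__main__":
    minutes = 3.0  # assumed call length for the comparison
    print("agent:", call_cost(minutes, AGENT_COST_PER_MIN))
    print("speech:", [call_cost(minutes, rate) for rate in SPEECH_COST_PER_MIN])
    print("time saved (%):",
          [percent_reduction(TOUCH_TONE_MINUTES, m) for m in SPEECH_MINUTES])
```

On those assumptions, a three-minute call costs $4.50 with an agent against $0.75 to $1.05 with speech, and the drop from 12.5 minutes to two or three minutes is a reduction of 76 to 84 percent in interaction time.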
From a call flow and design point of view, though, it would be a mistake to think of speech rec as “talking IVR.” When I speak of a richer interaction, I mean this: you don’t have to delineate options one through four and leave the person scratching his head to figure out where his particular problem fits into your schema. The sophisticated application will acknowledge that there are ambiguities of response, and will tailor prompts to try to zero in on what the customer needs without being as linear as IVR.
People who are expert at using the system can shortcut through it, for example, or can barge in (that is, talk while the system is talking and have it know that it should stop and listen).
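The difference from a numbered menu can be sketched in a few lines of Python. This is a toy that assumes the recognition engine has already turned the caller's speech into text; the intents, keywords, and the `next_prompt` function are hypothetical, and barge-in is not modeled because it lives in the telephony platform rather than in the dialog logic.

```python
import re
from typing import List

# Hypothetical intents and trigger words; a real application would lean on the
# recognition engine's grammars rather than a hand-built keyword table.
INTENT_KEYWORDS = {
    "billing": {"bill", "charge", "payment", "invoice"},
    "order_status": {"order", "shipment", "delivery", "tracking"},
    "technical_support": {"broken", "error", "install", "crash"},
}


def match_intents(utterance: str) -> List[str]:
    """Return every intent whose keywords appear in the recognized text."""
    words = set(re.findall(r"[a-z']+", utterance.lower()))
    return sorted(
        intent for intent, keywords in INTENT_KEYWORDS.items() if words & keywords
    )


def next_prompt(utterance: str) -> str:
    """Route when the request is clear; otherwise ask a narrowing question."""
    intents = match_intents(utterance)
    if len(intents) == 1:
        return "ROUTE:" + intents[0]
    if len(intents) > 1:
        options = " or ".join(intent.replace("_", " ") for intent in intents)
        return "ASK:Is this about " + options + "?"
    return "ASK:Sorry, could you tell me a bit more about what you need?"


if __name__ == "__main__":
    print(next_prompt("There is a charge on my bill I don't understand"))
    print(next_prompt("My payment went through but the delivery never came"))
```

The caller never hears "press one for billing." An unambiguous request is routed at once, and an ambiguous one produces a follow-up question built from what the caller actually said.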
Those points apply to the IVR/speech rec comparison, which is how you look at speech rec when its main purpose is to identify the person and route to the right agent. But again, the interaction can be richer, used to gather information that you didn’t have already. Once you’ve used the system to identify the caller, you can ask questions that have more detailed answers, even questions that are tailored to a particular audience or context. The stronger the speech rec engine, the more you’ll be able to parse out of what a caller says. Again, its strength in the long run is not going to be that it gets the information to the caller at a lower cost; rather, it’s going to be that it gets information from the caller to you in a more meaningful and spontaneous way. It’s easier to say something into a phone than it is to fill out a survey and mail it back, or even to fill out a form on a website.
When I look at the spectrum of CRM-style applications that are rolling out over the next two years (and it’s a long list), the common element is the need for an information channel that reliably brings information from the customer into the company. We’re used to customers calling when they want something, and parsing the data that comes with the call is so old-hat it’s never even mentioned anymore. We’re quickly getting used to customers using email and the Web for interactions.
In a recent conversation, a very smart Dictaphone executive opined that we’re moving inexorably to a point where all transactions are recorded, stored, and analyzed using advanced data mining techniques. He was speaking from the point of view of quality assurance and agent performance, as well as customer satisfaction measurement. It strikes me that if universal recording and archival does arrive, the collection of speech rec data, added to the analysis, could be a valuable (if a bit spooky) addition.
CRM, so much a buzzword now, is an idea that stands in for a range of future technologies, some of which will catch on and some of which won’t. It’s the theory that matters: that information flows freely between all systems and is parsed somewhere inside the organization, probably away from the call center. The success of CRM will depend on controlling the most customer information at the lowest cost. Right now it seems that speech recognition has a good shot at replacing IVR as the primary information-gathering tool for phone-only interactions.