Speech Rec: The Gateway to CRM?

A lot of attention has been focused lately on the pros and cons of CRM tools — much of it from the point of view of “which tool can I buy and what is it going to do for me?” One of the things that’s been overlooked is that for CRM to function properly, it needs to receive constant, meaningful input from an outside system: input that acts as the raw meat for its real work, which is the analysis of what the customer wants and whether he got it.

We’re used to thinking of two main data inputs from the customer. One, IVR, has been around forever and is widely understood. There are not too many new ways to construct a branching tree routing script, or to parse an incoming account number. Application development for IVR has been made easy enough for an intelligent non-expert to do a credible job putting together working, useful apps.
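A branching tree routing script of the sort described is simple enough to sketch. The menu layout and queue names below are invented for illustration:

```python
# A toy branching-tree routing script of the kind described: each menu
# node maps a keypad digit to either a sub-menu (dict) or a queue (str).
MENU = {
    "1": {"1": "billing_queue", "2": "payments_queue"},
    "2": "support_queue",
    "3": "sales_queue",
}

def route(keypresses):
    """Walk the tree one digit at a time until a queue name is reached."""
    node = MENU
    for digit in keypresses:
        node = node[digit]
        if isinstance(node, str):
            return node
    return "agent_queue"  # ran out of digits without reaching a leaf

print(route(["1", "2"]))  # → payments_queue
```

The whole app is a lookup table, which is exactly why a non-expert can build one — and exactly why the interaction can never be more than a menu.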

The other input mode is the very rich, multi-textured Web interface. This has developed so recently that there are many techniques for feeding customer input from a website through to a CRM system and into a contact center. All the many data-gathering and connection modes fall into this camp: email, click-to-call, text chat, and Web-based ecommerce forms.

Almost off the radar, though, is another technology that has reached strong maturity. Speech recognition, which is making big waves out in the consumer technology world, is still seen as something of an afterthought in the CRM/contact center world. In fact, it’s a lot more than just a replacement for touch-tone input.
Why is it different? I’d like to argue that it creates the opportunity for a completely different type of interaction than does IVR. IVR, despite having “interactive” as part of its name, is really a one-way channel. Customers enter ID and pull out a small subset of data that pertains to them. Remember why it caught on in the first place: it automated the dumping of small bits of repetitive info to the caller, keeping those calls away from expensive agents. Relatively low tech, profoundly efficient, easy to diagram into an existing call flow — the very definition of no-brainer.

But when you add speech to the same call, you add several orders of complexity. Forget about the complexity of the tech that you need to operate it; instead, concentrate on the complexity of the information flow back and forth between you and the customers. Instead of asking questions that get answered only in numeric digits, you can draw out responses that are far more nuanced and subtle. Few people are going to enter an address using keys that have three letters apiece. Asking for a stock quote using the Schwab IVR is hard enough — each letter of the alphabet is assigned a two-key code. They had to send customers little wallet cards to remind them of the alphabetic cipher just to be able to retrieve stock quotes. This was in 1996, before they were heavily online, and before they installed a speech rec system.
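Schwab’s actual cipher isn’t documented here, but a plausible two-key scheme (each letter encoded as its keypad key followed by its position on that key) can be sketched to show why callers needed wallet cards:

```python
# Hypothetical two-key alphabetic cipher: each letter becomes its phone
# keypad key followed by its position on that key (so 'B', the 2nd letter
# on key 2, becomes "22"). Schwab's real scheme may have differed.
KEYPAD = {
    "2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
    "6": "MNO", "7": "PQRS", "8": "TUV", "9": "WXYZ",
}

def encode_symbol(symbol):
    """Translate a ticker symbol into a sequence of two-key codes."""
    codes = []
    for letter in symbol.upper():
        for key, letters in KEYPAD.items():
            if letter in letters:
                codes.append(key + str(letters.index(letter) + 1))
                break
    return " ".join(codes)

print(encode_symbol("IBM"))  # → 43 22 61
```

Six keypresses just to say “IBM” — which is the whole argument for letting the caller simply speak the name.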

Stock quotes are one of those really basic information retrieval apps that the Web does really well as a replacement for the phone system. But you can do things that are so much richer, limited only by the system resources available to parse the speech, and the power of the recognition engine.

SpeechWorks, one of the companies with powerful speech rec tools, says that to run high-quality speech applications, you need four things:
§  state-of-the-art technology;
§  high-level building blocks (essentially this means that the recognition engine contains prefabricated modules for handling certain types of speech);
§  tight integration on robust telephony platforms; and
§  tools for analyzing and tuning applications.
That’s to make the speech rec work; to make the application it’s running a success as well, you also need:
§  appropriate application development procedures;
§  an understanding of what your app is ultimately supposed to do for you, in terms of what would make it a success; and
§  a good user interface.

SpeechWorks says that based on several of their customer installs, the average cost of an agented call per minute is $1.50; by contrast, the average cost of a speech-rec-attended call is just $0.25-$0.35. That’s not too surprising, and they rightly say that the speech-rec costs vary based on the underlying contract the call center has with its local telcos for long distance traffic.
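Taking SpeechWorks’ figures at face value, the per-call economics are easy to work out. The call volume and handle time in the example are invented for illustration:

```python
# Rough per-call economics using the figures SpeechWorks cites.
AGENT_COST_PER_MIN = 1.50    # cited average for an agented call
SPEECH_COST_PER_MIN = 0.30   # midpoint of the cited $0.25-$0.35 range

def monthly_savings(calls_per_month, avg_minutes):
    """Dollars saved by diverting calls to a speech-rec front end."""
    return calls_per_month * avg_minutes * (AGENT_COST_PER_MIN - SPEECH_COST_PER_MIN)

# Example: 10,000 automatable calls a month at 3 minutes apiece.
print(monthly_savings(10_000, 3.0))  # roughly $36,000 a month
```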

At one of their installations, the length of customer interactions was reduced from 12.5 minutes through touch tone to two to three minutes using speech. This goes right to the question of speech rec as a rough equivalent to IVR. If anything like that reduction can be repeated across the board, or even in a significant minority of applications, then speech looks a lot better as a way into the database despite the higher level of technology it needs to implement.
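The capacity implication of that handle-time reduction is straightforward: with a fixed number of ports, throughput scales inversely with handle time. Using the cited figures:

```python
# Throughput per port scales inversely with average handle time.
TOUCH_TONE_MIN = 12.5   # average interaction length cited for touch tone
SPEECH_MIN = 2.5        # midpoint of the cited two-to-three-minute range

calls_per_port_hour_ivr = 60 / TOUCH_TONE_MIN      # 4.8 calls per hour
calls_per_port_hour_speech = 60 / SPEECH_MIN       # 24.0 calls per hour

print(round(calls_per_port_hour_speech / calls_per_port_hour_ivr, 1))  # → 5.0
```

Five times the calls through the same ports, before you count a single agent-minute saved.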

From a call flow and design point of view, though, it would be a mistake to think of speech rec as “talking IVR.” When I speak of a richer interaction, I mean this: you don’t have to delineate options one through four and leave the person scratching his head to figure out where his particular problem fits into your schema. The sophisticated application will acknowledge that there are ambiguities of response, and will tailor prompts to try to zero in on what the customer needs without being as linear as IVR.

People who are expert at using the system can shortcut through it, for example, or can barge in (that is, talk while the system is talking and have it know that it should stop and listen).
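The non-linear prompting described above can be sketched as a simple intent matcher. The intents and phrases here are invented, and a real recognition engine does far more than substring matching:

```python
# Instead of "press 1 for X," the system matches free-form phrases to
# intents and narrows the question only when the answer is ambiguous.
INTENTS = {
    "balance": ["balance", "how much", "what do i owe"],
    "payment": ["pay", "payment", "bill"],
    "agent":   ["agent", "person", "representative"],
}

def classify(utterance):
    """Return the single matching intent, or 'reprompt' if unclear."""
    matches = {intent
               for intent, phrases in INTENTS.items()
               for p in phrases if p in utterance.lower()}
    if len(matches) == 1:
        return matches.pop()
    # Zero or multiple matches: ask a narrower question rather than
    # restarting a linear menu.
    return "reprompt"

print(classify("I want to pay my bill"))  # → payment
```

An expert caller who states the whole request up front skips straight to fulfillment; a hesitant one gets a follow-up question, not a menu recital.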

Those points apply to the IVR/speech rec comparison, which is how you look at speech rec when its main purpose is to identify the person and route to the right agent. But again, the interaction can be richer, used to gather information that you didn’t have already. Once you’ve used the system to identify the caller, you can ask questions that have more detailed answers, even questions that are tailored to a particular audience or context. The stronger the speech rec engine, the more you’ll be able to parse out of what a caller says. Again, its strength in the long run is not going to be that it gets the information to the caller at a lower cost; rather, it’s going to be that it gets information from the caller to you in a more meaningful and spontaneous way. It’s easier to say something into a phone than it is to fill out a survey and mail it back, or even to fill out a form on a website.

When I look at the spectrum of CRM-style applications that are rolling out over the next two years (and it’s a long list), the common element is the need for an information channel that brings information reliably from the customer into the company. We’re used to customers calling when they want something, and parsing the data that comes with the call is so old-hat it’s never even mentioned anymore. We’re quickly getting used to customers using email and Web for interactions.

In a recent conversation, a very smart Dictaphone executive opined that we’re moving inexorably to a point where all transactions are recorded, stored, and analyzed using advanced data mining techniques. He was speaking from the point of view of quality assurance and agent performance, as well as customer satisfaction measurement. It strikes me that if universal recording and archival does arrive, the collection of speech rec data, added to the analysis, could be a valuable (if a bit spooky) addition.

CRM, so much a buzzword now, is an idea that stands in for a range of future technologies, some of which will catch on and some of which won’t. It’s the theory that matters: that information flows freely between all systems and is parsed somewhere inside the organization, probably away from the call center. CRM in practice will depend on controlling the most customer information at the lowest cost. Right now it seems that speech recognition has a good shot at replacing IVR as the primary information gathering tool for phone-only interactions.

Financial Services Out In Front

Visa International is betting that “v-commerce,” the heinously-named catch-all term for telephone transactions enhanced by speech recognition, is a big part of their future.

Since 1995, Visa International has been an investor in Nuance, and has participated in pilot programs to add automated speech to cardholder transactions — basic things like card activation, card replacement, and travel planning.

These are things that, like all good IVR apps, don’t need an agent for the basic, introductory information gathering stage of the transaction. Only when things get more complicated, or the consumer gets confused and tries to bail on the automated system, does an agent really become necessary.

Now Visa and Nuance are working on developing apps for use by the member banks for customers to self-serve over the phone. The effort has some important names from the call center field behind it, but no speech companies other than Nuance. Nuance’s speech rec is as good as anyone’s, but it’s not the only one, so the impact of an association like this lies in the power of end-users like Visa to push a particular technology.

Several v-commerce applications are currently available for Visa member banks, including speech banking and automated bill payment.

Another really interesting real-world application of speech rec is the installation that went into Ameritrade (the brokerage firm) in early 2000. Ameritrade decided to take their existing (and recently installed) speech rec front-end and expand its capacity.

Ameritrade’s system lets their brokerage customers check their accounts and act on their investment decisions via telephone using natural speech recognition. The speech-enabled system was introduced to Ameritrade customers on March 10, 2000, and handled more than 650,000 calls in its first nine trading days. That huge response was a major factor in Ameritrade’s recent decision to increase the port capacity of its InterVoice-Brite call automation system. The expansion is planned for this month.

Since the speech-enabled system was implemented, Ameritrade’s call completion volume has significantly increased. The system currently handles an average of more than 85,000 calls on trading days. More than 40% of callers are already opting to use the speech recognition capabilities. This self-service transaction option gives Ameritrade customers a faster, more convenient method for rapid stock transactions while enabling the company to increase the efficiency of its call center by reducing call wait times and freeing agents to process more complex customer service requests.

The system isn’t speech-only; rather, the call flow involves an interesting hybrid of traditional user-entered touch tone digits and speech input. To use the self-service stock trading system, callers enter their account code and personal identification number using traditional keypad entry. Then the system asks which stock the caller would like to trade. Rather than using a tedious touch-tone entry method, callers speak the company name or stock symbol in a natural voice.
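That hybrid call flow can be sketched as follows. The function names are invented; a real IVR platform would supply the telephony primitives:

```python
# Hybrid flow: keypad entry for account and PIN (well-structured, private
# data), speech for the stock name (awkward on a keypad).
def trade_call_flow(get_digits, get_speech):
    account = get_digits("Enter your account number.")
    pin = get_digits("Enter your PIN.")
    symbol = get_speech("Which stock would you like to trade?")
    return {"account": account, "pin": pin, "symbol": symbol}

# Simulated caller input standing in for real telephony I/O:
result = trade_call_flow(
    get_digits=lambda prompt: "123456",
    get_speech=lambda prompt: "IBM",
)
print(result["symbol"])  # → IBM
```

The design choice is worth noting: each input mode is used where it’s strongest, instead of forcing the whole call through one channel.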

It runs on InterVoice-Brite’s OneVoice platform and uses technology from SpeechWorks. The system’s vocabulary exceeds 60,000 words and even recognizes popular stock nicknames, such as “Big Blue” for IBM. (Though I believe you’d have to be really wanting in common sense to actually try to trade stock using that kind of nickname.)
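The nickname handling amounts to resolving spoken aliases to a canonical ticker before the trade is placed. The table below is a minimal invented example, not the system’s actual vocabulary:

```python
# Spoken aliases resolve to canonical tickers before order placement.
NICKNAMES = {
    "big blue": "IBM",
    "ma bell": "T",
}

def resolve(spoken):
    """Map a recognized phrase to a ticker, defaulting to the phrase itself."""
    text = spoken.strip().lower()
    return NICKNAMES.get(text, text.upper())

print(resolve("Big Blue"))  # → IBM
```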

InterVoice-Brite has also put in speech-enabled stock systems for DMG & Partners Securities, Lim and Tan Securities, Keppel Securities in Singapore, and Hyundai Securities in Korea.

The system will be able to interpret more than 80% of first and last names in the United States. That’s going to make the system more viable for applications like health insurance benefits verification, travel reservation verification and cancellation, and inquiry applications where callers are asked to leave their name and number for an informational callback.

New multilingual capabilities simultaneously support two or more languages on a single system. These multilingual capabilities will enable application developers to create self-service applications in 10 different languages. Callers can also respond to prompts in their own dialect depending on geographic location and regional demographics.
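Per-language prompt selection on a single system might look like the sketch below. The prompt text and fallback policy are assumptions for illustration, not the vendor’s actual design:

```python
# Per-dialect prompt sets on one system, chosen per call.
PROMPTS = {
    ("en", "US"): "Which stock would you like to trade?",
    ("en", "UK"): "Which share would you like to trade?",
    ("es", "US"): "¿Qué acción desea negociar?",
}

def prompt_for(language, region):
    # Assumed fallback: US English when the exact dialect isn't provisioned.
    return PROMPTS.get((language, region), PROMPTS[("en", "US")])

print(prompt_for("en", "UK"))  # → Which share would you like to trade?
```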

Languages supported now include English (US, UK, Australian and Singaporean), Spanish (Latin and US), French (European and Canadian), Chinese (Mandarin) and German. The system features a vocabulary with more than 70,000 words and supports an increased number of ports to support higher call volumes. Additional tools include custom vocabulary development, industry-specific grammar libraries and a self-tuning feature.