Psyber Psychiatry, Commentary
Speech recognition programs
Faster, more accurate, easier to use.
Speech recognition technology—once too slow and inaccurate for clinical practice—is increasingly helping psychiatrists record patient notes, dictate letters and lengthy reports, and operate their computers.
Is speech recognition right for your practice? This article reviews the available programs and offers tips for choosing one.
Speaking of progress
Clinicians in radiology and pathology were among the first to use limited programs that responded to voice commands and short phrases. Shortcuts that tell the computer to type boilerplate passages have been available for more than 25 years. Early speech recognition programs required users to speak in a stilted manner, pausing between words.
By 1998, several “continuous speech” systems were available, but most were not suitable for a solo or small-group psychiatric practice. Some included diagnosis-specific report templates that could be customized for initial assessments and progress notes.1 These programs were useful for dictating short memos or e-mails but not for composing longer documents, because editing was awkward and punctuation, formatting, and on-screen navigation were difficult.2
Today’s programs recognize natural, continuous speech without pauses between words. As the clinician speaks into a microphone at a normal pace, text appears at speeds approaching 160 words per minute with 90% to 98% accuracy. New versions of some programs save the dictation as audio files, so an assistant can listen to it later and correct the transcription. These programs also filter out superfluous utterances and carry out macro commands, including commands for handling e-mail.
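The accuracy range matters more than it may first appear: 90% accuracy means roughly 1 word in 10 needs correcting, whereas 98% means about 1 in 50. The short Python sketch below makes that concrete for a hypothetical 600-word evaluation report. It assumes accuracy is simply the fraction of words transcribed correctly and that the speaker sustains 160 words per minute; neither assumption comes from the products' documentation.

```python
def estimate_dictation(word_count: int, accuracy: float,
                       words_per_minute: float = 160.0) -> tuple[float, float]:
    """Return (expected misrecognized words, minutes of uninterrupted dictation).

    Assumes `accuracy` is the fraction of words transcribed correctly and that the
    speaker sustains `words_per_minute`; real-world rates vary with speaker and setting.
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    if word_count < 0 or words_per_minute <= 0:
        raise ValueError("word_count must be >= 0 and words_per_minute > 0")
    expected_errors = word_count * (1.0 - accuracy)
    minutes = word_count / words_per_minute
    return expected_errors, minutes


# Hypothetical 600-word evaluation report at both ends of the quoted accuracy range.
for acc in (0.90, 0.98):
    errors, minutes = estimate_dictation(600, acc)
    print(f"{acc:.0%} accuracy: about {errors:.0f} corrections, {minutes:.2f} min of dictation")
```

Under these assumptions the loop prints about 60 corrections at 90% and about 12 at 98%, for the same 3.75 minutes of dictation, which suggests why accuracy tends to matter more than raw speed when choosing a program.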
Newer speech recognition products are much easier to use than older programs. They can be mastered within days and are cost-effective for solo or small-group practices. Single-user programs range in cost from no additional charge (speech recognition is included with Microsoft Office XP and with newer Macintosh operating systems) to approximately $175 for ViaVoice Professional and about $800 for Dragon Naturally Speaking Version 8.0, which is highly recommended.3
How to get started
Speech recognition programs require a powerful computer: a processor speed above 1 gigahertz (GHz), at least 512 megabytes (MB) of random access memory (RAM), and an operating system no older than Windows 2000 or Mac OS X version 10.1.
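As a rough checklist, the minimums above can be written as a small Python sketch. The `Machine` record, the operating-system labels, and the use of internal version numbers (Windows 2000 is NT 5.0, Windows XP is 5.1) are assumptions made for illustration. The sketch does not detect hardware itself, so the values must be read from the computer's system information.

```python
from dataclasses import dataclass

# Minimums quoted in the article. Windows 2000 is internally NT version 5.0.
MIN_CPU_GHZ = 1.0
MIN_RAM_MB = 512
MIN_OS_VERSION = {"windows": (5, 0), "macos": (10, 1)}


@dataclass
class Machine:
    cpu_ghz: float
    ram_mb: int
    os_family: str               # hypothetical labels: "windows" or "macos"
    os_version: tuple[int, int]  # e.g. (5, 1) for Windows XP, (10, 3) for Mac OS X 10.3


def unmet_requirements(m: Machine) -> list[str]:
    """Return the requirements this machine fails; an empty list means it qualifies."""
    problems = []
    if m.cpu_ghz <= MIN_CPU_GHZ:  # the article asks for a processor faster than 1 GHz
        problems.append(f"processor {m.cpu_ghz} GHz is not faster than {MIN_CPU_GHZ} GHz")
    if m.ram_mb < MIN_RAM_MB:
        problems.append(f"{m.ram_mb} MB of RAM is less than {MIN_RAM_MB} MB")
    minimum = MIN_OS_VERSION.get(m.os_family)
    if minimum is None:
        problems.append(f"unsupported operating system: {m.os_family}")
    elif m.os_version < minimum:  # tuples compare element by element
        problems.append(f"{m.os_family} {m.os_version} is older than {minimum}")
    return problems


print(unmet_requirements(Machine(1.6, 512, "windows", (5, 1))))  # Windows XP machine: []
print(unmet_requirements(Machine(0.8, 256, "macos", (10, 2))))   # fails CPU and RAM checks
```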
Even after the software is installed, some clicking and typing may still be necessary. Learning when to talk and when to type can help users work more efficiently and avoid repetitive strain injury.
Critical factors for successful use include user motivation and training (or consultation with a reseller), a specialized vocabulary and language model, and a high-quality sound card and microphone. The most sophisticated audio hardware available is recommended; it usually must be purchased separately.
Most speech recognition programs allow several users to train their own voice profiles on the same computer. Users can dictate directly into word-processing applications, and some products also accept dictation into other office programs.
Dragon Naturally Speaking Professional Medical Solutions Version 8.0 (www.dragontalk.com/DNS_MED_PRO.htm) is widely recognized for its performance, high accuracy, and easy-to-use interface. Users can dictate directly into a PC for immediate transcription, or into selected digital recorders or personal digital assistants for later transcription. A recording of the dictation can be saved with the computer-transcribed document for later proofreading.
The program can analyze previously completed reports to learn the user’s word-use patterns and build a personalized vocabulary. The optional but useful medical vocabulary includes many terms specific to psychiatry and psychology.
Dragon responds to voice commands and macros and features online training and user guides. It works in most Windows-based applications but is not available for Macintosh.4
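Dragon’s method for learning from previous reports is proprietary and is not described here. The hypothetical Python sketch below only illustrates the general idea: scan past reports for words that recur but are missing from a general vocabulary, so they can be reviewed and added. The function name, the word pattern, and the two-occurrence threshold are all assumptions.

```python
import re
from collections import Counter

# Lower-case words, allowing internal hyphens or apostrophes (e.g. "follow-up").
WORD = re.compile(r"[a-z]+(?:[-'][a-z]+)*")


def candidate_terms(reports: list[str], base_vocabulary: set[str],
                    min_count: int = 2) -> list[tuple[str, int]]:
    """Return (word, count) pairs for words that appear at least `min_count` times
    in past reports but are missing from `base_vocabulary`, most frequent first."""
    counts = Counter()
    for text in reports:
        counts.update(WORD.findall(text.lower()))
    known = {w.lower() for w in base_vocabulary}
    return [(w, n) for w, n in counts.most_common() if n >= min_count and w not in known]


past_reports = [
    "Patient started on aripiprazole; denies akathisia.",
    "Akathisia resolved after aripiprazole dose reduction.",
]
general_words = {"patient", "started", "on", "denies", "resolved", "after", "dose", "reduction"}
print(candidate_terms(past_reports, general_words))
# -> [('aripiprazole', 2), ('akathisia', 2)]
```

A real product would also use the surrounding words to model how terms are used in context; simple frequency counting only flags candidates for the user to confirm.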
IBM’s ViaVoice Release 10 (uk.scansoft.com/viavoice) is available in six languages and several editions, with versions for Windows and Macintosh (the Macintosh version is also optimized for G4 processors). Its manufacturer supports selected digital handheld recorders.
ViaVoice can analyze previous documents and save recorded dictation. It is strong on voice navigation and recognizes file names and toolbar buttons, so users can open a file by speaking its name or activate a command by saying it.
There are some drawbacks, however. Specialized medical vocabularies must be obtained from third parties, which can create additional technical obstacles or require a developer’s assistance. Also, several reviewers consider ViaVoice less robust, accurate, and fast than Dragon for lengthy or medical dictation.5
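ViaVoice’s voice navigation, described above, has to decide whether a spoken phrase is a command, a file name, or text to be typed. The Python sketch below shows one simple way to make that decision. It is a hypothetical illustration, not ViaVoice’s interface, and the command and file names are invented examples.

```python
from typing import Callable


def make_dispatcher(commands: dict[str, Callable[[], str]],
                    files: dict[str, str]) -> Callable[[str], str]:
    """Route a recognized utterance to a command, a file-open action, or dictation.

    `commands` maps spoken command names to actions and `files` maps spoken file
    names to paths; keys must be lower-case. Both are hypothetical stand-ins for
    what a real program reads from its menus, toolbar, and file system.
    """
    def dispatch(utterance: str) -> str:
        phrase = " ".join(utterance.lower().split())  # normalize case and spacing
        if phrase in commands:                        # e.g. a toolbar button's name
            return commands[phrase]()
        if phrase.startswith("open "):                # "open <file name>"
            name = phrase[len("open "):]
            if name in files:
                return f"opening {files[name]}"
        return f"dictation: {utterance}"              # anything else is treated as text
    return dispatch


dispatch = make_dispatcher(
    commands={"save document": lambda: "document saved"},
    files={"progress note": "notes/progress_note.doc"},
)
print(dispatch("Save document"))                   # document saved
print(dispatch("open progress note"))              # opening notes/progress_note.doc
print(dispatch("Patient reports improved sleep"))  # dictation: Patient reports improved sleep
```

A real product must also decide when a phrase such as "save document" is meant as text rather than as a command; the explicit switching between dictation and command modes in Microsoft Office XP is one way to avoid that ambiguity.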
Philips SpeechMagic (www.speech.philips.com) comprises sophisticated tools for document creation, transcription, and voice commands that integrate with larger information systems. The products are network-based and scalable and are designed primarily for large groups or medical centers. They cannot be purchased directly from Philips; instead, they are installed by its distributors and software vendors.
Microsoft Office XP (office.microsoft.com) includes a speech recognition feature, provided through the operating system as an alternative input method, that offers dictation and voice-command modes. It works with any Office program and offers a “taste” of speech recognition, but its functions are extremely limited. It requires awkward switching between dictation and commands, does not filter out extraneous noises, and has no specialized medical vocabulary.
Apple Speech Recognition (www.apple.com/macosx/features/speech/), included in Mac OS X, is rudimentary and is suited primarily to controlling the computer by voice command. It requires no training and can also read English text aloud.
Speech recognition programs that can be integrated with telephones, wireless phones, and tablet screens are in development. Microsoft has released speech-control software for Pocket PC devices that run on Windows Mobile 2003 and recognition software for navigating the Web.
Before long, personal digital assistants with built-in speech recognition may answer spoken questions or commands with a computer-synthesized voice, making the clumsy stylus and keypad obsolete.
- Huang MP, Alessi NE. The Internet and the future of psychiatry. Am J Psychiatry 1996;153:861-9.
- Fulton S. Chart comparing features of Windows-based continuous speech programs. www.out-loud.com/features.html.
- Taintor Z. Computers, the patient, and the psychiatrist. In: Dickstein LJ, Riba MB, Oldham JM, eds. Review of psychiatry. Vol 16. Washington, DC: American Psychiatric Press; 1997.
Dr. Green reports no financial relationship with any company whose products are mentioned in this article. The opinions expressed by Dr. Green are his and do not necessarily reflect those of Current Psychiatry.
The author thanks Dan Newman, author of several books and video guides on speech recognition, and Len Zullo, chief executive officer, Assistive Technologies Inc., for their help in researching this article and for personal communications about current product features and comparisons.
1. Leipsic JS. Computer speech recognition in psychiatry. Psychiatric Times. 1998;15(8):54-6.
2. Miastkowski S. Can we talk? PC World. January 1999:127-36.
3. Manes S. Speech! Speech! Forbes. February 28, 2005. Available at: http://www.forbes.com/forbes/2005/0228/054_print.html. Accessed June 21, 2005.
4. Dragon Naturally Speaking Professional Users Guide, Version 6. Burlington, MA: ScanSoft; 2002:1-25, 125-40, 185-8.
5. Newman D. Talk to your computer: speech recognition made easy. Berkeley, CA: Waveside Publishing; 2000:9-41, 122-3.
Dr. Green is a distinguished fellow, American Psychiatric Association, and chair of information services, San Diego Psychiatric Society. Psyber Psychiatry, edited by John S. Luo, MD, is published monthly at www.currentpsychiatry.com.