PTFS Best Practices in Speech to Text ECM

Audio files managed in an ECM platform pose two inherent problems: the spoken content of a file cannot be searched, and users cannot jump directly to the point in a recording where the passage of interest occurs. To solve these problems, PTFS has developed best practices for speech-to-text conversion and for the subsequent management of the converted content in the ArchivalWare ECM platform.

The first step in the process is determining whether the audio files need to be scrubbed of background noise. In many recordings of human speech, background noise (common in old and low-quality audio) poses a challenge for the speech-to-text engine. PTFS has an easy-to-use cleanup tool that removes background noise from speech recordings while preserving and enhancing the spoken words.
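To make the cleanup step concrete, here is a minimal sketch of one of the simplest noise-reduction techniques, a frame-based noise gate. It is not the PTFS cleanup tool, whose internals are not described here. The function name noise_gate, the frame size, and the threshold are all assumptions chosen for illustration, and samples are assumed to be floats in the range -1.0 to 1.0.

```python
import math


def noise_gate(samples, frame_size=4, threshold=0.1):
    """Silence every frame whose RMS level falls below the threshold.

    Hypothetical illustration only: production cleanup tools typically use
    more sophisticated methods that also reduce noise underneath the speech.
    """
    cleaned = []
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        # Root-mean-square level of this frame.
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        if rms >= threshold:
            cleaned.extend(frame)                 # loud enough: keep as-is
        else:
            cleaned.extend([0.0] * len(frame))    # quiet: treat as noise
    return cleaned


# A quiet (noise-only) frame followed by a louder (speech-like) frame.
audio = [0.01, -0.02, 0.01, 0.0, 0.5, -0.6, 0.4, -0.5]
gated = noise_gate(audio)
```

A gate like this only suppresses noise in the gaps between speech, so it should be read as a starting point rather than a substitute for a dedicated cleanup tool.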

Once a clean version free of this noise is obtained, speech recognition products are used to avoid manual transcription. Each product has specific strengths and weaknesses. To minimize the shortcomings of any one product while maximizing the benefits, multiple engines are run and a voting algorithm selects the best text output from among them. This engine-voting approach was commonplace in the late 1990s, the early days of high-performance OCR.
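The voting idea can be sketched as a word-level majority vote across engine outputs. This is an illustrative assumption, not the PTFS algorithm: the function names are hypothetical, the first transcript is arbitrarily used as the alignment reference, and alignment is done with Python's standard difflib rather than a speech-specific method.

```python
from collections import Counter
from difflib import SequenceMatcher


def align_to_reference(reference, hypothesis):
    """For each reference position, return the hypothesis word aligned to it, or None."""
    aligned = [None] * len(reference)
    matcher = SequenceMatcher(a=reference, b=hypothesis, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        same_length = (i2 - i1) == (j2 - j1)
        # Use matching runs and one-for-one substitutions; skip inserts and deletes.
        if tag == "equal" or (tag == "replace" and same_length):
            for k in range(i2 - i1):
                aligned[i1 + k] = hypothesis[j1 + k]
    return aligned


def vote(transcripts):
    """Pick the most common word at each position of the first transcript."""
    tokenized = [t.lower().split() for t in transcripts]
    reference = tokenized[0]
    ballots = [[word] for word in reference]
    for hypothesis in tokenized[1:]:
        for pos, word in enumerate(align_to_reference(reference, hypothesis)):
            if word is not None:
                ballots[pos].append(word)
    # On a tie, the reference engine's word wins because it was counted first.
    return " ".join(Counter(b).most_common(1)[0][0] for b in ballots)


# Three hypothetical engine outputs; each engine makes a different mistake or none.
engine_outputs = [
    "the quick brown box",
    "the quick brown fox",
    "the quack brown fox",
]
consensus = vote(engine_outputs)
```

In this example the first engine's error is outvoted by the other two. A real system would likely also weigh per-engine confidence scores and handle inserted or dropped words, which this sketch ignores.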

As the conversion is performed, pointers into the digital audio file are created that link the text to its position in the recording. A text search of the transcribed data can subsequently be performed, and the operator can click on a search result to jump to the corresponding place in the audio recording. This saves time by letting the operator skip over major portions of an audio file that contain no conversation of interest.
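One way to picture these pointers is as a start time stored with every transcribed word, so that a text hit can be turned into a playback offset. The sketch below assumes that representation; the TimedWord type, the find_phrase function, and the sample data are hypothetical and are not taken from ArchivalWare.

```python
from dataclasses import dataclass


@dataclass
class TimedWord:
    text: str
    start_seconds: float  # offset of this word in the audio file


def find_phrase(transcript, phrase):
    """Return the start time of every occurrence of the phrase in the transcript."""
    target = phrase.lower().split()
    if not target:
        return []
    words = [w.text.lower() for w in transcript]
    hits = []
    for i in range(len(words) - len(target) + 1):
        if words[i:i + len(target)] == target:
            # The pointer for a hit is the start time of its first word.
            hits.append(transcript[i].start_seconds)
    return hits


# Hypothetical transcript with word-level timestamps.
transcript = [
    TimedWord("good", 0.0),
    TimedWord("morning", 0.4),
    TimedWord("the", 95.2),
    TimedWord("shipment", 95.4),
    TimedWord("arrives", 96.1),
    TimedWord("tuesday", 96.7),
]
offsets = find_phrase(transcript, "shipment arrives")
```

A player could then seek directly to each returned offset, which is what lets the operator skip the stretches of audio between hits.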

This process is especially groundbreaking because it allows, for the first time, a semi-automated declassification tool for audio files.