A high signal-to-noise ratio, visual cues from the speaker's mouth and body language, a good understanding of both the subject matter and the language being used, and audio processing such as frequency notch filtering and noise-suppression filters can all do a lot to turn otherwise unintelligible audio into accurate closed captioning.
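As an illustration of the notch filtering mentioned above, here is a minimal Python sketch of a single second-order (biquad) notch, using the coefficient formulas from Robert Bristow-Johnson's Audio EQ Cookbook. It assumes mono samples as floats and a steady tonal interference at a known frequency f0, such as mains hum or an engine's fundamental. The function names and the default Q of 30 are illustrative choices, not part of any particular tool. A notch removes only a narrow band, so broadband engine noise and its harmonics would need additional notches or a separate noise-suppression stage.

```python
import math
from typing import List, Sequence, Tuple


def notch_coefficients(f0: float, fs: float, q: float) -> Tuple[Tuple[float, float, float], Tuple[float, float]]:
    """Biquad notch coefficients (Audio EQ Cookbook form), normalized so a0 == 1."""
    w0 = 2.0 * math.pi * f0 / fs          # centre frequency in radians per sample
    alpha = math.sin(w0) / (2.0 * q)      # smaller alpha (higher Q) means a narrower notch
    cos_w0 = math.cos(w0)
    a0 = 1.0 + alpha
    b = (1.0 / a0, -2.0 * cos_w0 / a0, 1.0 / a0)   # feed-forward: zeros on the unit circle at f0
    a = (-2.0 * cos_w0 / a0, (1.0 - alpha) / a0)   # feedback a1, a2: poles just inside it
    return b, a


def notch_filter(samples: Sequence[float], f0: float, fs: float, q: float = 30.0) -> List[float]:
    """Attenuate a narrow band around f0 Hz with a direct form I biquad."""
    (b0, b1, b2), (a1, a2) = notch_coefficients(f0, fs, q)
    x1 = x2 = y1 = y2 = 0.0               # previous inputs and outputs, initially silent
    out: List[float] = []
    for x0 in samples:
        y0 = b0 * x0 + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        out.append(y0)
        x2, x1 = x1, x0                   # shift the input history
        y2, y1 = y1, y0                   # shift the output history
    return out


def rms(values: Sequence[float]) -> float:
    """Root-mean-square level, used to compare signal strength before and after."""
    return math.sqrt(sum(v * v for v in values) / len(values))
```

For example, notch_filter(samples, f0=60.0, fs=8000.0) should attenuate a 60 Hz hum while leaving a 1 kHz tone essentially untouched once the filter's start-up transient has died away. A higher Q gives a narrower notch but a longer transient.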
That said, simply listening to a difficult passage about ten times usually produces a transcription that is close enough for most work. People who have lost their sight often develop enhanced auditory discrimination skills, and their help can be enlisted in special circumstances.
When I was transcribing the tapes of Stan in his backyard, with Charlie, Stan and Marylin in the background, the signal-to-noise ratio was so poor because of the noise of the buggy's engine running that some of the speech was unintelligible and will likely always remain so. Audio pareidolia (hearing words in sounds that do not actually contain them) is sometimes a help and at other times a hindrance to an accurate transcript. Another technique is a consensus approach, in which several transcripts of the same passage are compared.
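The consensus approach can be sketched as a word-by-word majority vote, loosely in the spirit of NIST's ROVER method for combining speech recognizer outputs. This minimal Python version assumes, hypothetically, that the transcripts have already been aligned so that they contain the same number of words; aligning transcripts that differ in length is the harder part and is not shown. Words without a strict majority are marked with a placeholder, mirroring the fact that some passages stay unintelligible.

```python
from collections import Counter
from typing import List, Sequence


def consensus_transcript(transcripts: Sequence[str], placeholder: str = "[inaudible]") -> str:
    """Word-by-word majority vote across several transcripts of the same passage.

    Hypothetical assumption: word i in every transcript refers to the same
    spoken word. Words are compared case-insensitively and the winner is
    returned in lower case.
    """
    token_lists = [t.split() for t in transcripts]
    if not token_lists:
        return ""
    length = len(token_lists[0])
    if any(len(tokens) != length for tokens in token_lists):
        raise ValueError("transcripts must be pre-aligned to the same number of words")
    result: List[str] = []
    for position in zip(*token_lists):
        word, votes = Counter(w.lower() for w in position).most_common(1)[0]
        # Require a strict majority; otherwise flag the word as uncertain.
        result.append(word if votes * 2 > len(position) else placeholder)
    return " ".join(result)
```

With three listeners hearing "we went to the store", "we want to the store" and "we went to a store", the vote yields "we went to the store". With only two disagreeing listeners, the disputed word becomes "[inaudible]", which is why an odd number of transcribers is convenient.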
Another aspect of speech-to-text rendering is the set of decision rules governing how "uh"s, pauses and slang are rendered and punctuated. For example, should "c'mon" be rendered as "come on", or "I'll see y'all" as "I will see you all"?
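Decision rules like these are easiest to apply consistently when they are written down as an explicit table. The Python sketch below assumes a hypothetical "clean verbatim" house style that drops the fillers "uh", "um" and "er" and expands a few contractions; the word lists are illustrative only, and a strict verbatim style would leave the text unchanged.

```python
import re
from typing import Dict

# Hypothetical house-style tables; a real style guide would be much longer.
EXPANSIONS: Dict[str, str] = {
    "c'mon": "come on",
    "y'all": "you all",
    "i'll": "I will",
}
# A filler plus any comma just before or after it, e.g. "I, um, think".
_FILLER = re.compile(r",?\s*\b(?:uh|um|er)\b,?", re.IGNORECASE)
# Words, optionally with one internal apostrophe (c'mon, y'all, i'll).
_WORD = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


def normalize(text: str, verbatim: bool = False) -> str:
    """Apply the clean-verbatim rules, or return the text untouched if verbatim."""
    if verbatim:
        return text
    cleaned = _FILLER.sub("", text)

    def expand(match):
        word = match.group(0)
        return EXPANSIONS.get(word.lower(), word)

    cleaned = _WORD.sub(expand, cleaned)
    cleaned = re.sub(r"\s{2,}", " ", cleaned).strip()  # tidy leftover spacing
    return cleaned[:1].upper() + cleaned[1:]           # restore sentence-initial capital
```

Under these assumed rules, normalize("Uh, c'mon, i'll see y'all there.") returns "Come on, I will see you all there.", while normalize(..., verbatim=True) keeps the original wording for cases where the fillers themselves matter.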