Quick & Dirty python: Converting a text file to audio (.wav)

This is a bit of a tangent, but for some crazy reason I wanted to convert some text to audio so I could listen to it while I drive. A quick Google search didn’t turn up any freeware that could handle the 53 page document. There are some cool websites that do text to mp3, like vozme and YAKiToMe!, but they didn’t convert the whole document. I then found pyTTS, a Python package that serves as a wrapper for the Microsoft Speech API (SAPI), which has been at version 5 since 2000. But I couldn’t easily find a version of pyTTS for Python 2.6, so I decided to see if I could roll my own.

As it turns out, getting Python to talk using SAPI is relatively easy with the comtypes package. Reading a plain text file aloud can be done in a few lines.

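from comtypes.client import CreateObject

infile = "c:/temp/text.txt"

# Create the SAPI voice object (Windows only)
engine = CreateObject("SAPI.SpVoice")

f = open(infile, 'r')
theText = f.read()
f.close()

# Speak the text through the default audio output
engine.Speak(theText)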


And it wasn’t that much more to have it write out a .wav file:

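from comtypes.client import CreateObject

# Creating the SAPI objects also generates the comtypes.gen.SpeechLib wrapper
engine = CreateObject("SAPI.SpVoice")
stream = CreateObject("SAPI.SpFileStream")

# Needed for the SSFMCreateForWrite constant used below
from comtypes.gen import SpeechLib

infile = "c:/temp/text.txt"
outfile = "c:/temp/text4.wav"

# Send the voice output to a .wav file instead of the speakers
stream.Open(outfile, SpeechLib.SSFMCreateForWrite)
engine.AudioOutputStream = stream

f = open(infile, 'r')
theText = f.read()
f.close()

# Speak the text into the file, then close the stream to finish writing it
engine.Speak(theText)
stream.Close()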



And with that chunk of code, I was able to convert my whole document into a 4 hour long .wav file (over 600 MB), which I then converted to .mp3 (200 MB) with another software package. The voice is a bit robotic but not too bad. I just hope the content that I converted (a database specification standard) doesn’t put me to sleep while I drive.
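
As a rough sanity check on that file size, here is a small sketch of the arithmetic. It assumes the .wav was written as uncompressed 22,050 Hz, 16-bit, mono PCM. That format is an assumption for illustration, not something stated in the post, and the function name is made up for this example.

```python
# Assumed output format (not confirmed by the post): 22,050 Hz, 16-bit, mono PCM
SAMPLE_RATE_HZ = 22050
BYTES_PER_SAMPLE = 2
CHANNELS = 1


def wav_data_bytes(seconds, sample_rate=SAMPLE_RATE_HZ,
                   bytes_per_sample=BYTES_PER_SAMPLE, channels=CHANNELS):
    """Return the size in bytes of uncompressed PCM audio of the given length.

    The small .wav header is ignored.
    """
    return int(seconds * sample_rate * bytes_per_sample * channels)


four_hours = 4 * 60 * 60  # seconds
size = wav_data_bytes(four_hours)
print("%.0f MB" % (size / 1e6))  # 635 MB
```

Under that assumption, four hours of speech comes to roughly 635 MB, which is in line with the "over 600 MB" file described above. A different sample rate or bit depth would scale the result proportionally.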


3 thoughts on “Quick & Dirty python: Converting a text file to audio (.wav)”

  1. Hello!

    My MBA group and I have started a blog on GIS and its various applications in business. While we aren’t as technically focused as you are on some GIS aspects, we have really enjoyed getting to see just how much GIS has helped with!

    Please give our blog a look – let us know what you think!




  2. Copying and pasting your above code almost worked for me. I also needed the line below in order to make it run. Very handy in some testing of converting the result of intersected features into an audio description of that area… Thanks

    from comtypes.gen import SpeechLib

    • Thanks for the info, Timdine. I wonder if that is an OS thing, since I just double-checked and I didn’t need to make any modifications (I’m on a Win 7 machine).

