As you might already know, Python is a versatile programming language, which means that we can build a wide range of our own tools with it. In this tutorial, we will learn to synthesize speech: we will get Python to read PDFs aloud, and even translate them for us before reading them to us.

What we’re going to do here is get Python to read us a PDF and translate it for us. First, we’ll create an English audiobook. The first step is to extract the text from the PDF. For this, we use the module known as tika. As usual, Tika is installed with pip (pip install tika).

Tika is a module used for content detection and extraction. Once it is installed, we import its parser module (from tika import parser).

Next, we need the from_file() method. Its first argument is the name of the PDF file. Optional arguments select the type of data wanted: by default, everything is returned, from metadata to content. In the tika-python package, service='meta' returns only the metadata, service='text' returns only the text, and xmlContent=True returns the content as XML.

raw = parser.from_file('comment.pdf')

Once we have the data, we need to extract just the text. We do this by selecting the “content” from raw.

raw_text = raw['content']
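
As a minimal sketch of the Tika steps above, the helper below wraps the extraction in a function and tidies the result. It assumes the tika package and a Java runtime are available when extract_text_with_tika() is actually called. Both function names and the whitespace cleanup are illustrative additions, not part of Tika. The guard for a missing "content" value is there because Tika may find no text at all, for example in a scanned PDF.

```python
import re


def clean_text(raw_text):
    """Collapse runs of whitespace; treat a missing value as empty text."""
    if not raw_text:
        return ""
    return re.sub(r"\s+", " ", raw_text).strip()


def extract_text_with_tika(pdf_path):
    """Extract the text of a PDF with Tika (hypothetical helper name)."""
    # Imported lazily so clean_text() works without tika installed.
    from tika import parser

    raw = parser.from_file(pdf_path)
    # raw is a dictionary; "content" may be None when no text was found.
    return clean_text(raw.get("content"))


print(clean_text("  Hello,\n\n  world!  "))
```

With tika installed, raw_text = extract_text_with_tika('comment.pdf') would replace the two lines above.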

The problem with Tika is that it can crash when a document has too many pages. If the PDF is short and sweet, by all means, do use Tika. For longer documents, let’s look at another method as well: the PyPDF2 module, which is also installed with pip (pip install PyPDF2).

So let’s begin:

First, we import PyPDF2, open the document of interest with the open() function, and read from it with the PdfFileReader() class. The open() function takes two arguments here: the first is the name of the file to be read, and the second is the mode to read it in. Here, “rb” stands for read binary. The PdfFileReader class then takes pdf_document as its argument. Note that PdfFileReader and the other PyPDF2 names used in this tutorial come from PyPDF2 releases before 3.0.

pdf_document = open("welcome.pdf", "rb")


pdf_document_read = PyPDF2.PdfFileReader(pdf_document)

Then, we collect the total number of pages using the numPages property. We need it because we will create a for loop that goes from the first page to the last page, reading each page as it goes.

number_of_pages = pdf_document_read.numPages

We then begin a for loop to iterate over the pages. PyPDF2 numbers pages from 0, so the range must start at 0 to include the first page.

for page in range(number_of_pages):

Then, we need to get one page by using the getPage() method, and extract the text from within using the extractText() method.

    one_page = pdf_document_read.getPage(page)

    raw_text = one_page.extractText()
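
Newer PyPDF2 releases (and the successor package, pypdf) expose the same steps as PdfReader, reader.pages, and extract_text(). The following is a minimal sketch of the page loop written against that newer interface. collect_page_texts() and read_pdf_pages() are hypothetical helper names, and the two stand-in classes exist only so that the loop can be tried without a PDF file.

```python
def collect_page_texts(reader):
    """Return a list with the text of every page, first page included."""
    texts = []
    for page in reader.pages:  # iterating avoids off-by-one index errors
        texts.append(page.extract_text() or "")
    return texts


def read_pdf_pages(pdf_path):
    """Open a PDF and return its page texts (assumes a recent PyPDF2)."""
    # Imported lazily so the demo below runs without PyPDF2 installed.
    from PyPDF2 import PdfReader

    with open(pdf_path, "rb") as pdf_document:
        return collect_page_texts(PdfReader(pdf_document))


# Stand-ins that mimic the attribute and method the loop relies on.
class FakePage:
    def __init__(self, text):
        self._text = text

    def extract_text(self):
        return self._text


class FakeReader:
    def __init__(self, texts):
        self.pages = [FakePage(text) for text in texts]


demo_texts = collect_page_texts(FakeReader(["Page one", None, "Page three"]))
print(demo_texts)
```

Iterating over reader.pages, rather than over a range of numbers, means that the first page cannot be skipped by accident.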

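To read the text aloud, we use the pyttsx3 module (pip install pyttsx3). We first import it and initialize the speech engine using init(), as in engine = pyttsx3.init().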

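We can read the current voices, volume, and rate using engine.getProperty() and change them using engine.setProperty(). The setProperty() method takes two values: the property to change and its new value. In this case, I have set the voice to a female one (voices[1].id on my system; the available voices vary by platform), with maximal volume (1.0) and a rate of 128 words per minute.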

voices = engine.getProperty('voices')

engine.setProperty('voice', voices[1].id)

volume = engine.getProperty('volume')

engine.setProperty('volume', 1.0)

rate = engine.getProperty('rate')

engine.setProperty('rate', 128)

We then use engine.say() to queue the text for speech and engine.runAndWait() to get it read aloud.

engine.say(raw_text)

engine.runAndWait()
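
Which voice sits at voices[1] depends on the operating system and the installed speech engines, and some systems expose only one voice. As a hedged sketch, the helper below applies the same three properties but falls back to the first voice when the requested index is missing, and keeps the volume within the 0.0 to 1.0 range. configure_engine() and the stand-in classes are illustrative names; with pyttsx3 installed, you would pass in pyttsx3.init() instead of the stand-in.

```python
def configure_engine(engine, voice_index=1, volume=1.0, rate=128):
    """Apply voice, volume, and rate to a pyttsx3-style engine."""
    voices = engine.getProperty("voices")
    if voices:
        # Fall back to the first voice if the requested one is not installed.
        index = voice_index if 0 <= voice_index < len(voices) else 0
        engine.setProperty("voice", voices[index].id)
    # Keep the volume inside the 0.0-1.0 range.
    engine.setProperty("volume", max(0.0, min(1.0, volume)))
    engine.setProperty("rate", rate)
    return engine


# Stand-ins so the helper can be tried without a speech engine.
class FakeVoice:
    def __init__(self, voice_id):
        self.id = voice_id


class FakeEngine:
    def __init__(self, voice_ids):
        self.properties = {"voices": [FakeVoice(v) for v in voice_ids]}

    def getProperty(self, name):
        return self.properties.get(name)

    def setProperty(self, name, value):
        self.properties[name] = value


demo_engine = configure_engine(FakeEngine(["only-voice"]), volume=1.5)
print(demo_engine.properties["voice"], demo_engine.properties["volume"])
```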

The complete code would look something like this:

import PyPDF2
import pyttsx3

pdf_document = open("welcome.pdf", "rb")  # "rb" = read binary
pdf_document_read = PyPDF2.PdfFileReader(pdf_document)
number_of_pages = pdf_document_read.numPages

engine = pyttsx3.init()  # set up the speech engine once, before the loop
voices = engine.getProperty('voices')
engine.setProperty('voice', voices[1].id)
engine.setProperty('volume', 1.0)
engine.setProperty('rate', 128)

for page in range(number_of_pages):  # pages are numbered from 0
    one_page = pdf_document_read.getPage(page)
    raw_text = one_page.extractText()
    engine.say(raw_text)
    engine.runAndWait()

pdf_document.close()
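
One possible refinement of the complete program, offered as a sketch rather than a required change: queue every non-empty page with say() and call runAndWait() once at the end. speak_pages() is a hypothetical helper, and RecordingEngine merely records the calls so that the flow can be checked without audio hardware.

```python
def speak_pages(engine, page_texts):
    """Queue each non-empty page for speech, then play the whole queue."""
    spoken = 0
    for text in page_texts:
        if text and text.strip():  # skip blank or image-only pages
            engine.say(text)
            spoken += 1
    if spoken:
        engine.runAndWait()
    return spoken


class RecordingEngine:
    """Stand-in for a pyttsx3 engine that only records what it is asked."""

    def __init__(self):
        self.queued = []
        self.runs = 0

    def say(self, text):
        self.queued.append(text)

    def runAndWait(self):
        self.runs += 1


demo = RecordingEngine()
count = speak_pages(demo, ["First page.", "   ", "", "Last page."])
print(count, demo.runs)
```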

In the earlier example, we had an English text read aloud in English. Now, we’ll translate the text into another language and get the translated text read aloud. The first part of the code is the same as in the previous section: everything up to and including the PyPDF2 code is required. Once the for loop starts, however, we will change the code a bit. Here, we need to add the translation and get it spoken in the accent of the destination language.

First, install googletrans with pip (pip install googletrans).

Now, let’s begin translating the text.

from googletrans import Translator

Next, we create a Translator() instance.

translator = Translator()

We use the translate() method. Its first argument is the text to translate, and the dest argument is the destination language, that is, the language into which the text must be translated. In this case, I have chosen to translate the text into French (‘fr’).

translated = translator.translate(raw_text, dest='fr')

Once we have translated the text, we need to extract the text portion.

translated_2 = translated.text
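
Online translation services usually limit how much text a single request may contain, and a whole page can be long. A hedged sketch of one way around this is to split the text on whitespace into chunks below a chosen size and translate each chunk separately. The 4000-character default is an arbitrary illustrative value, not a documented googletrans limit. The translate_chunk parameter stands for any callable that maps a string to its translation, for example lambda s: translator.translate(s, dest='fr').text.

```python
def split_into_chunks(text, max_chars=4000):
    """Split text on whitespace into chunks of roughly max_chars or fewer.

    Line breaks are not preserved, and a single word longer than
    max_chars is kept whole in a chunk of its own.
    """
    chunks, current = [], ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


def translate_in_chunks(text, translate_chunk, max_chars=4000):
    """Translate text piece by piece and join the results with spaces."""
    pieces = split_into_chunks(text, max_chars)
    return " ".join(translate_chunk(piece) for piece in pieces)


# str.upper stands in for a real translation call in this demo.
demo_chunks = split_into_chunks("aa bb cc", max_chars=5)
demo_result = translate_in_chunks("aa bb cc", str.upper, max_chars=5)
print(demo_chunks, demo_result)
```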

This stores the translated text in the variable translated_2. Now, we need a module that will synthesize speech from the translated text and save it as an mp3. For this, we need gTTS and playsound:

pip install gTTS


pip install playsound

import gtts

from playsound import playsound

The gtts.gTTS() class accepts several arguments, but here we will use only two: the text to be read and the language to read it in. In this case, I have chosen to read the text in French (fr). We are using gTTS here instead of pyttsx3 because of the accents it offers. When a text is read in French with gTTS, it sounds like a French person is reading the text instead of a native English speaker.

tts = gtts.gTTS(translated_2, lang="fr")

Next, we save the spoken text into an mp3 using the save() method. In this case, I have chosen to name it text.mp3:

tts.save("text.mp3")

In order to play the saved mp3, we use playsound():

playsound("text.mp3")

The complete code would look something like this:

import PyPDF2
import gtts
from googletrans import Translator
from playsound import playsound

pdf_document = open("welcome.pdf", "rb")  # "rb" = read binary
pdf_document_read = PyPDF2.PdfFileReader(pdf_document)
number_of_pages = pdf_document_read.numPages

translator = Translator()  # create the translator once, before the loop

for page in range(number_of_pages):  # pages are numbered from 0
    one_page = pdf_document_read.getPage(page)
    raw_text = one_page.extractText()

    translated = translator.translate(raw_text, dest='fr')
    translated_2 = translated.text

    tts = gtts.gTTS(translated_2, lang="fr")
    tts.save("text.mp3")
    playsound("text.mp3")

pdf_document.close()
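
Because the loop saves every page to text.mp3, each page overwrites the previous one. If you want to keep the audio as an audiobook, one option is to give every page its own file. The sketch below is an illustration under stated assumptions: mp3_name_for_page() and save_page_audio() are hypothetical helpers, the page_001.mp3 naming scheme is arbitrary, and the gTTS call needs the gtts package and network access when it actually runs.

```python
def mp3_name_for_page(page_index, prefix="page"):
    """Build a file name such as page_001.mp3 from a zero-based page index."""
    return f"{prefix}_{page_index + 1:03d}.mp3"


def save_page_audio(text, page_index, lang="fr"):
    """Save one page of text as its own mp3; return the file name or None."""
    if not text or not text.strip():
        return None  # nothing to read on this page
    # Imported lazily so the naming helper works without gtts installed.
    import gtts

    file_name = mp3_name_for_page(page_index)
    gtts.gTTS(text, lang=lang).save(file_name)
    return file_name


print(mp3_name_for_page(0), mp3_name_for_page(11))
```

Inside the loop, save_page_audio(translated_2, page) would then take the place of the gTTS and save() lines.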

About the author

Kalyani Rajalingham

I’m a linux and code lover.