Speech recognition is a technique that converts the human voice to text. It is an important concept in artificial intelligence, where we often need to give voice commands to a machine such as a driverless car.

We are going to implement speech-to-text in Python. For this, we have to install the following packages:

  1. pip install SpeechRecognition
  2. pip install PyAudio

Next, we import the speech_recognition library and initialize the recognizer. Without an initialized recognizer, we cannot pass audio in as input, and nothing will be recognized.

<img alt="" src="https://kirelos.com/wp-content/uploads/2021/06/echo/speech-to-text-01.png" height="108" width="1004">
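As a minimal sketch of this step in text form, the import and initialization might look like the following. The alias linuxHint matches the name used in the snippets later in this article. The helper name create_recognizer is hypothetical, and the import sits inside the function only so that the snippet loads even before the packages are installed.

```python
def create_recognizer():
    """Return a new Recognizer from the SpeechRecognition package."""
    # The package is installed as SpeechRecognition but imported as
    # speech_recognition; linuxHint is just an alias for it.
    import speech_recognition as linuxHint

    # The record and recognize calls used in this article are methods
    # of this object.
    return linuxHint.Recognizer()
```

Calling create_recognizer() once and reusing the returned object for both recording and recognition is usually enough for a small script.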

There are two ways to pass the input audio to the recognizer:

  1. Recorded audio
  2. The default microphone

This time we use the default option, the microphone. That is why we use the Microphone class, as shown below:

with linuxHint.Microphone() as microphone:

But if we want to use pre-recorded audio as the input source, the syntax looks like this:

with linuxHint.AudioFile(filename) as source:

Now we use the record method. The syntax of the record method is:

record(source, duration)

Here the source is our microphone, and duration is the recording time in seconds. We pass duration=10, which tells the system to accept voice from the user through the microphone for 10 seconds and then stop recording automatically.

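Then we use the recognize_google() method, which accepts the recorded audio and converts it to text. This method sends the audio to Google's speech recognition service, so it needs an internet connection.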

<img alt="" src="https://kirelos.com/wp-content/uploads/2021/06/echo/speech-to-text-02.png" height="449" width="1004">
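Putting the microphone steps together, a rough sketch could look like this. It assumes both SpeechRecognition and PyAudio are installed and a working microphone is available. The function name transcribe_microphone and the choice to return None for unintelligible speech are illustrative assumptions, not necessarily what the screenshot shows.

```python
def transcribe_microphone(seconds=10):
    """Record from the default microphone and return the recognized text."""
    # Imported here so the function can be defined before the packages
    # are installed; both are needed once the function is called.
    import speech_recognition as linuxHint

    recognizer = linuxHint.Recognizer()

    # The with block opens the default microphone and closes it on exit.
    with linuxHint.Microphone() as microphone:
        # Listen for the given number of seconds, then stop.
        audio = recognizer.record(microphone, duration=seconds)

    try:
        # Send the audio to Google's service and get the text back.
        return recognizer.recognize_google(audio)
    except linuxHint.UnknownValueError:
        # Raised when the speech could not be understood.
        return None
```

A call such as print(transcribe_microphone(seconds=10)) would then listen for ten seconds and print the result. A network failure raises the library's RequestError, which this sketch deliberately leaves unhandled.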

The code above accepts input from the microphone. Sometimes, however, we want to use pre-recorded audio as the input. The code for that is given below; its syntax was already explained above.

<img alt="" src="https://kirelos.com/wp-content/uploads/2021/06/echo/speech-to-text-03.png" height="389" width="1004">
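For the pre-recorded case, a comparable sketch is shown here. The function name transcribe_file is hypothetical, and the file is assumed to be in a format that AudioFile can read, such as WAV, AIFF, or FLAC.

```python
def transcribe_file(filename):
    """Return the text recognized in a pre-recorded audio file."""
    # Imported here so the function can be defined before the package
    # is installed; PyAudio is not needed for reading files.
    import speech_recognition as linuxHint

    recognizer = linuxHint.Recognizer()

    # AudioFile plays the same role as Microphone did for live input.
    with linuxHint.AudioFile(filename) as source:
        # Without a duration, record() reads until the end of the file.
        audio = recognizer.record(source)

    return recognizer.recognize_google(audio)
```

The only change from the microphone version is the source object, so the record and recognize_google calls stay the same.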

We can also change the language option in the recognize_google method. For example, we change the language from English to Hindi, as shown below:

<img alt="" src="https://kirelos.com/wp-content/uploads/2021/06/echo/speech-to-text-04.png" height="581" width="1004">
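The language switch comes down to one keyword argument, as this small sketch suggests. The helper name recognize_in_language is hypothetical; it expects a recognizer and an audio object that were already created with the record method.

```python
def recognize_in_language(recognizer, audio, language="hi-IN"):
    """Convert recorded audio to text in the given language."""
    # recognize_google accepts a language tag: "hi-IN" is Hindi (India),
    # while "en-US" is the library's default.
    return recognizer.recognize_google(audio, language=language)
```

Passing language="en-US" switches the same call back to English.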