Did you ever want to create your own Alexa, Siri & co.? Python’s fantastic speech recognition package enables you to quickly create your own custom commands. And the best part: you can decide what kind of speech recognition you want - online or offline! So let’s get started!

Also see the GitHub repo associated with this article.

The aim

We want to write a script that does two things for us.

  1. Recognize whether we say the keyphrase “play the radio” or not.
  2. If so, open a browser tab, navigate to an online radio page and click the play button.

The following steps require some familiarity with Python, Anaconda (or Miniconda) and virtual environments. If you don’t know Python yet, great! You can start right now with Socratica’s Netflix-like, well-structured, incredibly entertaining and captivating Python tutorial. If you’re into Python already, I definitely recommend watching it anyway. 🦉

Setting up a virtual environment

Create a virtual environment named “speech” with conda and install the necessary packages. Execute these commands one after another and check for errors.

conda create -n speech
conda activate speech
conda install -c conda-forge speechrecognition
conda install -c conda-forge selenium
conda install -c conda-forge pyaudio

If you get lost with conda commands take a look at the cheat sheet.
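Before moving on, it can help to confirm that the packages are importable from inside the activated environment. The following is a minimal sketch, not part of the original script: the helper name `missing_modules` and the `REQUIRED` list are made up for illustration, and `pyaudio` is included because `sr.Microphone` relies on PyAudio for microphone access.

```python
from importlib.util import find_spec

# Import names of the packages this article relies on (illustrative list).
# "pyaudio" is listed because sr.Microphone needs PyAudio to access the microphone.
REQUIRED = ["speech_recognition", "selenium", "pyaudio"]


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

If something is reported as missing, re-run the corresponding `conda install` command with the “speech” environment activated.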

Download browser drivers for selenium

For Chrome, download Google’s ChromeDriver; for Firefox, get geckodriver. Put the downloaded file in a folder that you can easily reference from the script.
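A wrong driver path is a common source of confusing startup errors, so it may be worth checking the path before handing it to selenium. This is only a sketch under that assumption: `check_driver` is a hypothetical helper, not part of selenium. Recent selenium releases (4.6 and later) also ship Selenium Manager, which can usually locate or fetch a matching driver on its own when no path is given.

```python
from pathlib import Path


def check_driver(path):
    """Return the driver path as a string, or raise if no file exists there."""
    driver = Path(path).expanduser()
    if not driver.is_file():
        raise FileNotFoundError(
            f"No browser driver found at {driver}. "
            "Download it and adjust the path in the script."
        )
    return str(driver)
```

The returned string can then be used wherever the script expects the driver location.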

Write and customize the script

Let’s take the script and go through it.

# importing the necessary packages
import speech_recognition as sr
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# initialise speech_recognition
r = sr.Recognizer()
mic = sr.Microphone()

# listen!
with mic as source:
    audio = r.listen(source)

# here we use Google's speech recognition for convenience
phrase = r.recognize_google(audio)

# if what we say matches "play the radio"...
if phrase == "play the radio":
    print(phrase)

    # open Chrome by initialising our browser driver
    # just change the path to yours
    # this is the Windows .exe version, it works the same for Linux
    # (Selenium 4 takes the driver path via a Service object)
    browser = webdriver.Chrome(service=Service(executable_path="C:/.../chromedriver.exe"))

    # navigate to an online radio page, here Deutschlandfunk
    browser.get("https://srv.deutschlandradio.de/themes/dradio/script/aod/index.html?audioMode=2&audioID=4&state=")

    # and click the play button!
    # (Selenium 4 uses find_element with a By locator)
    browser.find_element(By.XPATH, '//*[@class="mkdraod-audio-play"]').click()

Done! The script works now and can easily be customized. If we want to choose between two different phrases, let’s write a function that opens the page and clicks the button. This time we’ll use Firefox.

import speech_recognition as sr
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.service import Service

# define a function that opens a link and clicks a particular button indicated by a css class
def open_click(link, css_class=None):
    # Selenium 4 takes the driver path via a Service object
    browser = webdriver.Firefox(service=Service(executable_path="C:/.../geckodriver.exe"))
    browser.get(link)
    if css_class:
        # match elements whose class attribute is exactly css_class
        browser.find_element(By.XPATH, '//*[@class="' + css_class + '"]').click()

r = sr.Recognizer()
mic = sr.Microphone()

# listen!
with mic as source:
    audio = r.listen(source)

# transcribe once the microphone has been released
phrase = r.recognize_google(audio)

# play Deutschlandfunk radio
if phrase == "play the radio":
    print(phrase)
    open_click("https://srv.deutschlandradio.de/themes/dradio/script/aod/index.html?audioMode=2&audioID=4&state=", "mkdraod-audio-play")

# open youtube video
# if you want an adblocker to work with a remotely controlled browser see
# https://stackoverflow.com/questions/20832159/python-using-adblock-with-selenium-and-firefox-webdriver
if phrase == "play some romantic music":
    print(phrase)
    open_click("https://www.youtube.com/watch?v=xzZ76dlOVlM", "ytp-large-play-button ytp-button")
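Once there are more than a couple of phrases, a chain of `if` statements gets unwieldy. One possible alternative, sketched here as an assumption rather than as part of the original script, is a dictionary that maps each phrase to an action. The names `normalize`, `make_commands` and `dispatch` are made up for this example; `open_click` is expected to behave like the function from the script. Normalizing the transcript also guards against differences in capitalization or spacing.

```python
RADIO_URL = "https://srv.deutschlandradio.de/themes/dradio/script/aod/index.html?audioMode=2&audioID=4&state="
VIDEO_URL = "https://www.youtube.com/watch?v=xzZ76dlOVlM"


def normalize(phrase):
    """Lower-case the transcript and collapse repeated whitespace."""
    return " ".join(phrase.lower().split())


def make_commands(open_click):
    """Map each normalized phrase to a zero-argument action.

    open_click is passed in so the table works with any function that
    accepts (link, css_class), such as the one defined in the script.
    """
    return {
        "play the radio": lambda: open_click(RADIO_URL, "mkdraod-audio-play"),
        "play some romantic music": lambda: open_click(VIDEO_URL, "ytp-large-play-button ytp-button"),
    }


def dispatch(phrase, commands):
    """Run the action registered for phrase; return False if there is none."""
    action = commands.get(normalize(phrase))
    if action is None:
        return False
    action()
    return True
```

In the script, `dispatch(phrase, make_commands(open_click))` would then replace the two `if` blocks, and adding a new voice command becomes a single extra dictionary entry.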

Choose your engine

In this example I’m using Google’s speech recognition for convenience. Google’s engine handles the hard part of recognizing your voice and turning it into clean text. Since it works as a web API, it requires you to be online. However, the speech recognition package also supports offline use: CMU Sphinx for offline recognition and Snowboy for offline hotword detection. Getting these to work takes a little more effort, but it enables you to do a lot more.
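Switching engines mostly comes down to calling a different method on the recognizer: `recognize_google` for the online engine and `recognize_sphinx` for CMU Sphinx, which additionally needs the `pocketsphinx` package installed. The helper below is a small sketch of how the choice could be made configurable. `ENGINES` and `transcribe` are hypothetical names, not part of the speech recognition package.

```python
# Map a short engine name to the Recognizer method that implements it.
ENGINES = {
    "google": "recognize_google",  # online, used throughout this article
    "sphinx": "recognize_sphinx",  # offline, requires the pocketsphinx package
}


def transcribe(recognizer, audio, engine="google"):
    """Transcribe audio with the chosen engine and return the text."""
    try:
        method_name = ENGINES[engine]
    except KeyError:
        raise ValueError(
            f"unknown engine {engine!r}, expected one of {sorted(ENGINES)}"
        ) from None
    return getattr(recognizer, method_name)(audio)
```

With the objects from the script, `phrase = transcribe(r, audio, engine="sphinx")` would run the recognition offline. Either engine raises `sr.UnknownValueError` when the speech cannot be understood, so wrapping the call in a `try` block is a sensible next step.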

Have fun experimenting! 🐳