Have you ever wanted to build your own Alexa, Siri & co.? Python’s fantastic SpeechRecognition package lets you quickly create your own custom voice commands. And the best part: you decide what kind of speech recognition you want, online or offline! So let’s get started!

## The aim

We want to write a script that does two things for us.

1. Recognize whether we say the keyphrase “play the radio” or not.
2. If so, open a browser tab, navigate to an online radio page and click the play button.
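Step 1 boils down to comparing the recognized text against a fixed keyphrase. Here is a minimal sketch of that comparison, assuming the recognizer returns plain text whose capitalization and spacing may vary. The helper names `normalize` and `is_keyphrase` are made up for illustration and are not part of any library.

```python
def normalize(text):
    """Lowercase the text and collapse runs of whitespace into single spaces."""
    return " ".join(text.lower().split())


def is_keyphrase(phrase, keyphrase="play the radio"):
    """Return True if the recognized phrase equals the keyphrase after normalizing both."""
    return normalize(phrase) == normalize(keyphrase)


# the comparison ignores case and stray spaces, but not different words
print(is_keyphrase("  Play the  Radio "))
print(is_keyphrase("play the video"))
```

An exact match after normalizing keeps the behaviour predictable: only the full keyphrase triggers the action, not a sentence that merely contains it.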

The following steps require some familiarity with Python, Anaconda (or Miniconda) and virtual environments. If you don’t know Python yet, great! You can start right now with Socratica’s Netflix-like, well-structured, incredibly entertaining and captivating Python tutorial. If you’re into Python already, I definitely recommend watching it anyway.🦉

## Setting up a virtual environment

Create a virtual environment named “speech” with conda and install the necessary packages. Execute these commands one after another and check for errors.

```
conda create --name speech
conda activate speech
conda install -c conda-forge speechrecognition
conda install -c conda-forge selenium
```
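To check for errors after the installation, a quick import check inside the activated environment can help. This is a small sketch using only the standard library, and the helper name `missing_modules` is hypothetical. It also looks for `pyaudio`, because `sr.Microphone` relies on PyAudio. Whether your installation already pulled it in is an assumption you should verify. If it is reported missing, it may need to be installed separately.

```python
import importlib.util

# module names as they are imported in Python, not the conda package names
REQUIRED_MODULES = ["speech_recognition", "selenium", "pyaudio"]


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are importable.")
```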

If you get lost with conda commands, take a look at the cheat sheet.

## Write and customize the script

Let’s take the script and go through it.

```python
# import the necessary packages
import speech_recognition as sr
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# initialise speech_recognition
r = sr.Recognizer()
mic = sr.Microphone()

# listen!
with mic as source:
    audio = r.listen(source)

# here we use Google's speech recognition for convenience
phrase = r.recognize_google(audio)

# if what we say matches "play the radio"...
if phrase == "play the radio":
    print(phrase)

    # open Chrome by initialising our browser driver
    # just change the path to yours
    # this is the Windows .exe version, it works the same for Linux
    # (Selenium 4 expects the driver path wrapped in a Service object)
    browser = webdriver.Chrome(service=Service("C:/.../chromedriver.exe"))

    # navigate to an online radio page, here Deutschlandfunk
    browser.get("https://srv.deutschlandradio.de/themes/dradio/script/aod/index.html?audioMode=2&audioID=4&state=")

    # and click the play button!
    browser.find_element(By.XPATH, '//*[@class="mkdraod-audio-play"]').click()
```
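One detail about the XPath in the script: `//*[@class="mkdraod-audio-play"]` only matches when the element's `class` attribute is exactly that string. If the page gives the button additional classes, the lookup fails. As a hedged alternative, the sketch below builds an XPath that matches an element carrying a given class among others, using the common `contains(concat(...))` idiom. The helper name `xpath_for_class` is hypothetical.

```python
def xpath_for_class(css_class):
    """Build an XPath matching any element whose class list contains css_class."""
    if not css_class or '"' in css_class or any(ch.isspace() for ch in css_class):
        raise ValueError("expected a single class name without spaces or quotes")
    # padding both sides with spaces makes sure only whole class names match
    return (
        '//*[contains(concat(" ", normalize-space(@class), " "), " '
        + css_class
        + ' ")]'
    )


print(xpath_for_class("mkdraod-audio-play"))
```

The returned string can be passed to the same element lookup in place of the literal XPath.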

Done! The script works now and can easily be customized. If we want to choose between two different phrases, let’s write a function that opens the page and clicks the button. This time we’ll use Firefox.

```python
import speech_recognition as sr
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.service import Service


# define a function that opens a link and clicks a particular button indicated by a css class
def open_click(link, css_class=None):
    # Selenium 4 expects the driver path wrapped in a Service object
    browser = webdriver.Firefox(service=Service("C:/.../geckodriver.exe"))
    browser.get(link)
    if css_class:
        browser.find_element(By.XPATH, '//*[@class="' + css_class + '"]').click()


r = sr.Recognizer()
mic = sr.Microphone()

with mic as source:
    audio = r.listen(source)

phrase = r.recognize_google(audio)

# play Deutschlandfunk radio
if phrase == "play the radio":
    print(phrase)
    open_click("https://srv.deutschlandradio.de/themes/dradio/script/aod/index.html?audioMode=2&audioID=4&state=", "mkdraod-audio-play")

# open youtube video
# if you want an adblocker to work with a remotely controlled browser see
# https://stackoverflow.com/questions/20832159/python-using-adblock-with-selenium-and-firefox-webdriver
if phrase == "play some romantic music":
    print(phrase)
    open_click("https://www.youtube.com/watch?v=xzZ76dlOVlM", "ytp-large-play-button ytp-button")
```
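With more than a couple of phrases, the chain of `if` statements gets long. One possible way to keep it tidy is a lookup table that maps each phrase to its link and CSS class. The sketch below is an assumption about how you might structure this, not part of the original script. `COMMANDS` and `dispatch` are hypothetical names, and the browser action is passed in as a function so the lookup logic can be tried out without a microphone or a browser.

```python
# phrase -> (link, css class of the button to click)
COMMANDS = {
    "play the radio": (
        "https://srv.deutschlandradio.de/themes/dradio/script/aod/index.html?audioMode=2&audioID=4&state=",
        "mkdraod-audio-play",
    ),
    "play some romantic music": (
        "https://www.youtube.com/watch?v=xzZ76dlOVlM",
        "ytp-large-play-button ytp-button",
    ),
}


def dispatch(phrase, action, commands=COMMANDS):
    """Call action(link, css_class) for a known phrase; return False if the phrase is unknown."""
    target = commands.get(phrase.strip().lower())
    if target is None:
        return False
    link, css_class = target
    action(link, css_class)
    return True


# stand-in action for a dry run: print instead of opening a browser
dispatch("play some romantic music", print)
```

In the real script you would pass `open_click` as the action, for example `dispatch(phrase, open_click)`. Adding a new voice command then only means adding one entry to the table.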