Mining Instagram location IDs
How to quickly mine location IDs from Instagram’s location explorer.
Instagram's awful geotag architecture
Since Facebook’s closing of the Graph API, it has become incredibly hard to mine Instagram location IDs. Until now, the most convenient method has been to use Instagram’s native search (e.g. type the name of a place such as “aurajoki-turku”) and copy the Instagram location ID from the URL, such as “246052326” from https://www.instagram.com/explore/locations/246052326/aurajoki-turku/. The same functionality is implemented by instagram-scraper. Install the Python package and type
|
|
and you’ll get
|
|
which gives an idea of the incredibly messy architecture behind Instagram locations. As every user can create these tags regardless of any existing tags, you can imagine what that does to their precision and quality. Technically, you can never be sure to get good results or even to pick the Instagram location ID with the most posts. Also, the coordinates are sometimes completely wrong, so be careful here. Anyway, instagram-scraper did a good job by providing this CLI search.
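Copying the ID out of the URL by hand gets tedious once you have more than a few links. A minimal sketch of how the ID and the name part could be pulled out of a location URL follows; the function name parseLocationUrl and the returned field names are illustrative assumptions, and the pattern only relies on the URL shape shown above.

```javascript
// Hypothetical helper: split an Instagram location URL into ID and slug.
// Assumes the path looks like /explore/locations/<id>/<slug>/ as in the
// example URL above; returns null for anything else.
function parseLocationUrl(url) {
  const pattern = /\/explore\/locations\/([^\/?#]+)(?:\/([^\/?#]*))?/;
  const match = pattern.exec(url);
  if (!match) return null;
  return { id: match[1], slug: match[2] || "" };
}

const parsed = parseLocationUrl(
  "https://www.instagram.com/explore/locations/246052326/aurajoki-turku/"
);
// parsed.id is "246052326", parsed.slug is "aurajoki-turku"
console.log(parsed);
```

The same function accepts relative links (starting at /explore/...), which is the form you usually find in the href attributes of a page.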
Instagram location explorer
Instagram has a rather hidden page that lists some of its locations, sorted first by country and then by the most common places within a country, such as cities or very distinctive landmarks: https://www.instagram.com/explore/locations. Of course (how could it be any different) you cannot find all Instagram location IDs there, only approximately 1000. If you’d like to mine posts from different locations in the German city Jena, first click on Germany, then Jena. You’ll land on https://www.instagram.com/explore/locations/c566755/jena-germany/.
Now the logic of the page is somewhat similar to Instagram’s infinite scroll, only that here it is not infinite (unfortunately) and new entries must be loaded with a button. And here we go, let’s have some fun with jQuery… or not. Too early to play! jQuery isn’t loaded on Instagram, and you cannot simply load it with JavaScript as described in this Stack Overflow question due to a security policy. Instead, don’t waste too much time thinking about it and paste the jQuery source straight into your browser console, e.g. copied from jQuery’s CDN. Copy, paste, enter and we are good to go.
Load as many links as possible
Click the “More” button for as long as new links appear, but don’t do it too fast, otherwise your IP will get blocked in the blink of an eye. A 700 ms break seems reasonable and worked well for me. With this script we’ll do it 20 times, which should definitely reach the maximum number of page links. If the “More” button is still there after execution, just re-execute the script.
|
|
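A minimal sketch of such a click loop, written with plain DOM calls so that it also runs without jQuery. The helper names sleep and clickMore and the findButton callback are illustrative, and the lookup assumes that the button’s visible label is exactly “More”; adjust it if Instagram’s markup differs.

```javascript
// Resolve after `ms` milliseconds.
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Press a "load more" button up to `times` times with a pause in between.
// `findButton` is a hypothetical callback returning the button or null.
// Resolves with the number of clicks that were actually made.
async function clickMore(findButton, times = 20, pauseMs = 700) {
  let clicks = 0;
  for (let i = 0; i < times; i++) {
    const button = findButton();
    if (!button) break; // no button found: nothing left to load
    button.click();
    clicks++;
    await sleep(pauseMs); // stay slow to avoid getting blocked
  }
  return clicks;
}

// Browser console usage (assumption: the button text is "More").
if (typeof document !== "undefined") {
  clickMore(
    () =>
      Array.from(document.querySelectorAll("button, a")).find(
        (el) => el.textContent.trim() === "More"
      ) || null
  ).then((n) => console.log("clicked " + n + " times"));
}
```

The loop stops early when no button is found, so running it again after a slow page load is harmless.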
Extract the location IDs
Just look for all elements with the class “aMwHK”, extract the information into a dictionary and copy it to the clipboard (works in Chrome).
|
|
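A minimal sketch of this extraction step, again with plain DOM calls instead of jQuery. It assumes that every “aMwHK” element is, or sits inside, the link of one location and that its text is the location title; the function name buildLocationMap and the variable names are illustrative. The copy() call is a console utility of Chrome’s DevTools, not standard JavaScript, so it is guarded.

```javascript
// Turn a list of { href, title } pairs into an { id: title } dictionary.
// Links that do not point to /explore/locations/<id>/ are skipped.
function buildLocationMap(links) {
  const result = {};
  for (const { href, title } of links) {
    const match = /\/explore\/locations\/([^\/?#]+)/.exec(href || "");
    if (match) result[match[1]] = (title || "").trim();
  }
  return result;
}

// Browser console usage (assumption: ".aMwHK" marks one location each).
if (typeof document !== "undefined") {
  const links = Array.from(document.querySelectorAll(".aMwHK")).map((el) => {
    const anchor = el.closest("a") || el.querySelector("a");
    return {
      href: anchor ? anchor.getAttribute("href") : "",
      title: el.textContent,
    };
  });
  const locations = buildLocationMap(links);
  console.log(locations);
  // copy() only exists in the DevTools console.
  if (typeof copy === "function") copy(locations);
}
```

With such a dictionary, Object.keys(locations) gives only the IDs and Object.values(locations) only the titles.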
Our result is a nicely formatted dictionary, copied to the clipboard.
|
|
In case you only need the location titles or only the location IDs, simply run copy(loctitle) or copy(loc).
Now you can easily mine the respective posts with Fast Instagram Scraper, preprocess Instagram data or perform some analysis as we did for Instagreens Bonn! In case you have any hints for better large-scale Instagram location ID mining, feel free to contact me!