Social Media Mining
A personal shortlist of Python-based scrapers that worked well in 2020.
Instagram
instagram-scraper is currently the best Python-based web scraper out there. It reliably scrapes pictures, videos and their metadata. A typical use is scraping all posts from an Instagram location, identified by its so-called “location ID”. Add “--media-types none” to retrieve only the metadata. Unfortunately, this does not make the scraper any faster.
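A location scrape could be assembled roughly like this. This is only a sketch: the location ID is made up, and the `--location` and `--media-metadata` flags are written as I remember them from the tool's README, so check them against `instagram-scraper --help` first. Location scrapes may also need login options, which are left out here.

```python
import shlex


def build_location_command(location_id, metadata_only=False):
    """Return the argument list for an instagram-scraper location scrape.

    Assumption: --location and --media-metadata exist as recalled from
    the README; verify with `instagram-scraper --help`.
    """
    cmd = ["instagram-scraper", "--location", str(location_id), "--media-metadata"]
    if metadata_only:
        # From the post: skip pictures and videos, keep only the metadata.
        cmd += ["--media-types", "none"]
    return cmd


# "1234567" is a hypothetical location ID, not a real place.
command = build_location_command("1234567", metadata_only=True)
print(shlex.join(command))

# To actually run it (requires instagram-scraper to be installed):
# import subprocess
# subprocess.run(command, check=True)
```

Building the command as a list rather than one string keeps quoting problems away once a location ID or a destination path comes from user input.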
Flickr
flickr-scrape works with an official Flickr API key. It must be executed from the directory that contains scraper.py. Optionally, you can restrict the search with a bounding box.
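For the bounding box, the Flickr API's flickr.photos.search method expects a comma-delimited `bbox` value in the order minimum longitude, minimum latitude, maximum longitude, maximum latitude. A small helper like the hypothetical `format_bbox` below can catch swapped coordinates before a long scrape starts. How flickr-scrape wants the box passed on the command line is not covered here, so check its README for that.

```python
def format_bbox(min_lon, min_lat, max_lon, max_lat):
    """Validate a bounding box and format it as a Flickr API `bbox` string.

    Order follows flickr.photos.search: min_lon, min_lat, max_lon, max_lat.
    """
    if not (-180 <= min_lon < max_lon <= 180):
        raise ValueError("longitudes must satisfy -180 <= min_lon < max_lon <= 180")
    if not (-90 <= min_lat < max_lat <= 90):
        raise ValueError("latitudes must satisfy -90 <= min_lat < max_lat <= 90")
    return ",".join(str(value) for value in (min_lon, min_lat, max_lon, max_lat))


# Rough box around Berlin; the coordinates are illustrative only.
print(format_bbox(13.0, 52.3, 13.8, 52.7))
```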
Twitter
twitter-scraper is backed by a pretty active community and works as straightforwardly as the previous scrapers.
Data processing in pandas
Python loves data. All of the scrapers above export their data as *.json files, which can be read into pandas immediately.
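Reading such an export is a one-liner with `pd.read_json`. The sketch below uses a tiny in-memory sample instead of a real file so it runs anywhere; the field names `text` and `likes` are invented for illustration and will differ per scraper.

```python
import io

import pandas as pd

# Hypothetical sample standing in for a scraper's export file.
sample = '[{"text": "hello", "likes": 3}, {"text": "world", "likes": 5}]'

# With a real export, pass the file path instead, e.g. pd.read_json("export.json").
df = pd.read_json(io.StringIO(sample))

print(df.shape)
print(df["likes"].sum())
```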
Only instagram-scraper wraps its JSON output in an unnecessary extra parent node. This is easily dealt with by selecting the list under that node before building the DataFrame.
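One way to strip the parent node is sketched below. I am assuming the wrapper key is `GraphImages`, which is how I remember the export, but the hypothetical `unwrap` helper simply takes whatever single top-level key it finds, so it does not depend on that name. The sample records are made up.

```python
import json

import pandas as pd

# Hypothetical sample mimicking the wrapped export; "GraphImages" is assumed.
raw = """
{"GraphImages": [
    {"id": "a1", "is_video": false, "edge_liked_by": {"count": 12}},
    {"id": "b2", "is_video": true, "edge_liked_by": {"count": 7}}
]}
"""


def unwrap(data, parent_key=None):
    """Return the list of posts stored under the single parent node."""
    if parent_key is None:
        if len(data) != 1:
            raise ValueError("expected exactly one top-level key")
        parent_key = next(iter(data))
    return data[parent_key]


posts = unwrap(json.loads(raw))

# json_normalize flattens nested fields into dotted column names.
df = pd.json_normalize(posts)
print(df.columns.tolist())
```

With a real export, replace `json.loads(raw)` with `json.load(f)` on the opened file.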