
The standard way - requests

Install the requests package with pip install requests or conda install requests. Usage is straightforward:

import requests
response = requests.get("https://webscraper.io/test-sites/e-commerce/allinone")
response

# Result
<Response [200]> # Status code 200 means OK

If you want to get the page source, just type:

response.text

# Result
'<!DOCTYPE html>\n<html lang="en">\n<head>\n\n\t\t\t<!-- Anti-flicker snippet (recommended)  -->\n<style>.async-hide {\n\t\topacity: 0 !important\n\t} </style>\n<script>(function (a, s, y, n, c, h, i, d, e) {\n\t\ts.className += \' \' + y;\n\t\th.start = 1 * new Date;\n\t\th.end = i = function () {\n\t\t\ts.className = s.className.replace(RegExp(\' ?\' + y), \'\')\n\t\t};\n\t\t(a[n] = a[n] || []).hide = h;\n\t\tsetTimeout(function () {\n\t\t\ti();\n\t\t\th.end = null\n\t\t}, c);\n\t\th.timeout = c;\n\t})(window, document.documentElement, \'async-hide\', \'dataLayer\', 4000,\n\t\t{\'GTM-NVFPDWB\': true});</script>\n\t\n\t<!-- Google Tag Manager -->\n<script>(function (w, d, s, l, i) {\n\t\tw[l] = w[l] || [];\n\t\tw[l].push({\n\t\t\t\'gtm.start\':\n\t\t\t\tnew Date().getTime(), event: \'gtm.js\'\n\t\t});\n\t\tvar f = d.getElementsByTagName(s)[0],\n\t\t\tj = d.createElement(s), dl = l != \'dataLayer\' ? \'&l=\' + l : \'\';\n\t\tj.async = true;\n\t\tj.src =\n\t\t\t\'https://www.googletagmanager.com/gtm.js?id=\' + i + dl;\n\t\tf.parentNode.insertBefore(j, f);\n\t})(window, document, \'script\', \'dataLayer\', \'GTM-NVFPDWB\');</script>\n<!-- End Google Tag Manager -->\n\t<title>Web Scraper Test Sites</title>\n\t<meta charset="utf-8">\n\t<meta http-equiv="X-UA-Compatible"... # and so on
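Before trusting the body, it is usually worth checking the status code. A minimal sketch using the requests API (response.ok and raise_for_status() are built into the library):

```python
import requests

response = requests.get("https://webscraper.io/test-sites/e-commerce/allinone")

# response.ok is True for any status code below 400
if not response.ok:
    # raise_for_status() raises requests.HTTPError for 4xx/5xx answers
    response.raise_for_status()

html = response.text  # safe to use the body now
```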

Parsing a JSON object

If you expect a JSON object, you can parse it directly:

import requests
response = requests.get("https://some.example/json")
response_obj = response.json() # or: import json; json.loads(response.text)
response_obj
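Under the hood, response.json() does the same as json.loads(response.text). The equivalence can be sketched on a literal payload (the JSON string below is made up for illustration):

```python
import json

# stand-in for response.text of a JSON endpoint (hypothetical payload)
payload = '{"site": "webscraper.io", "status": "ok", "items": [1, 2, 3]}'

# what response.json() does internally
response_obj = json.loads(payload)
response_obj["items"]  # -> [1, 2, 3]
```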

Requesting with Torpy

Torpy enables you to request a page via the Tor network. Simply install via pip install torpy and perform a request as follows:

from torpy.http.requests import TorRequests
with TorRequests() as tor_requests:
    with tor_requests.get_session() as sess:
        response = sess.get("https://webscraper.io/test-sites/e-commerce/allinone") # fire request
        print(response)

# Result
<Response [200]> # Status code 200 means OK

Access the page source just like above with:

response.text

# or for JSON 
response_obj = response.json() # or json.loads(response.text)
response_obj

If you want to dig deeper into simple scraping, take a look at beautifulsoup4!
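For a taste, here is a minimal beautifulsoup4 sketch (pip install beautifulsoup4). The HTML snippet below is made up; with a live page you would pass response.text instead:

```python
from bs4 import BeautifulSoup

# stand-in for response.text; any HTML string works
html = '<html><body><h1>Web Scraper Test Sites</h1><a href="/page">link</a></body></html>'

soup = BeautifulSoup(html, "html.parser")
title = soup.h1.get_text()  # text content of the first <h1>
href = soup.a["href"]       # value of the first link's href attribute
```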