The standard way - requests
Install the requests package with pip install requests
or conda install requests. Usage is straightforward:
import requests
response = requests.get("https://webscraper.io/test-sites/e-commerce/allinone")
response
# Result
<Response [200]>  # Status code for OK
If you want to get the page source just type:
response.text
# Result
'<!DOCTYPE html>\n<html lang="en">\n<head>\n\n\t\t\t<!-- Anti-flicker snippet (recommended) -->\n<style>.async-hide {\n\t\topacity: 0 !important\n\t} </style>\n<script>(function (a, s, y, n, c, h, i, d, e) {\n\t\ts.className += \' \' + y;\n\t\th.start = 1 * new Date;\n\t\th.end = i = function () {\n\t\t\ts.className = s.className.replace(RegExp(\' ?\' + y), \'\')\n\t\t};\n\t\t(a[n] = a[n] || []).hide = h;\n\t\tsetTimeout(function () {\n\t\t\ti();\n\t\t\th.end = null\n\t\t}, c);\n\t\th.timeout = c;\n\t})(window, document.documentElement, \'async-hide\', \'dataLayer\', 4000,\n\t\t{\'GTM-NVFPDWB\': true});</script>\n\t\n\t<!-- Google Tag Manager -->\n<script>(function (w, d, s, l, i) {\n\t\tw[l] = w[l] || [];\n\t\tw[l].push({\n\t\t\t\'gtm.start\':\n\t\t\t\tnew Date().getTime(), event: \'gtm.js\'\n\t\t});\n\t\tvar f = d.getElementsByTagName(s)[0],\n\t\t\tj = d.createElement(s), dl = l != \'dataLayer\' ? \'&l=\' + l : \'\';\n\t\tj.async = true;\n\t\tj.src =\n\t\t\t\'https://www.googletagmanager.com/gtm.js?id=\' + i + dl;\n\t\tf.parentNode.insertBefore(j, f);\n\t})(window, document, \'script\', \'dataLayer\', \'GTM-NVFPDWB\');</script>\n<!-- End Google Tag Manager -->\n\t<title>Web Scraper Test Sites</title>\n\t<meta charset="utf-8">\n\t<meta http-equiv="X-UA-Compatible"... # and so on
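requests can also assemble query strings for you. A minimal sketch using a prepared request, which builds the final URL locally without sending anything over the network (the page parameter is hypothetical, for illustration only):

```python
import requests

# Build the request object locally; prepare() assembles the final URL
# without performing any network I/O.
req = requests.Request(
    "GET",
    "https://webscraper.io/test-sites/e-commerce/allinone",
    params={"page": 2},  # hypothetical query parameter
)
prepared = req.prepare()
print(prepared.url)
# https://webscraper.io/test-sites/e-commerce/allinone?page=2
```

In everyday use you would pass params directly to requests.get(...), which does the same URL encoding under the hood.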
Parsing a JSON object
If you expect a JSON object, parse it directly:
import requests
response = requests.get("https://some.example/json")
response_obj = response.json()  # or, with import json: json.loads(response.text)
response_obj
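If the body is not valid JSON, response.json() raises a decoding error. The underlying behaviour can be sketched with the standard json module alone (the JSON string here is a stand-in for response.text):

```python
import json

# Parse a JSON string into a Python object (dict in this case).
obj = json.loads('{"title": "Web Scraper Test Sites", "ok": true}')
print(obj["title"])
# Web Scraper Test Sites

# Non-JSON bodies raise json.JSONDecodeError (a ValueError subclass),
# so wrap the call if the content type is uncertain.
try:
    json.loads("<!DOCTYPE html>")
except json.JSONDecodeError as err:
    print("not JSON:", err.msg)
```

Recent requests versions raise their own JSONDecodeError subclass from response.json(), so catching ValueError covers both cases.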
Requesting with Torpy
Torpy enables you to request a page via the Tor network. Simply install it via pip install torpy
and perform a request as follows:
from torpy.http.requests import TorRequests

with TorRequests() as tor_requests:
    with tor_requests.get_session() as sess:
        response = sess.get("https://webscraper.io/test-sites/e-commerce/allinone")  # fire request
        print(response)
# Result
<Response [200]>  # Status code for OK
Access the page source just like above with:
response.text
# or for JSON
response_obj = response.json()  # or json.loads(response.text)
response_obj
If you want to dig deeper into simple scraping, make use of beautifulsoup4!
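As a taste, here is a minimal sketch parsing an HTML snippet with beautifulsoup4 and Python's built-in html.parser; the markup below is a stand-in for the response.text you fetched above, and the class name and links are invented for illustration:

```python
from bs4 import BeautifulSoup

# Stand-in for response.text from one of the requests above.
html = """
<html><body>
  <h1>Web Scraper Test Sites</h1>
  <a class="title" href="/product/1">Item one</a>
  <a class="title" href="/product/2">Item two</a>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")
print(soup.h1.text)
# Web Scraper Test Sites

# Collect the href of every <a> with the (hypothetical) class "title".
links = [a["href"] for a in soup.find_all("a", class_="title")]
print(links)
# ['/product/1', '/product/2']
```

Install it with pip install beautifulsoup4, then feed it response.text instead of the literal string.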