If you want to scrape dynamic websites built with JavaScript frameworks like React.js, Vue.js, or Angular.js, you have to put in extra effort. It is a simple but lengthy process if you go about installing libraries like Selenium and Puppeteer, and headless browsers like Phantom.js. But there is a tool that handles all of this load for us: Scrapingpass, which offers APIs and tools for web scraping.
This tool will help us scrape dynamic websites through many rotating proxies so that we don't get blocked, and it also provides a CAPTCHA-clearing facility. Under the hood, it uses headless Chrome to render and scrape dynamic websites.
What will we need?
Web scraping splits into two simple parts:
- Fetching data by making an HTTP request
- Extracting important data by parsing the HTML DOM
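The two steps above can be sketched as follows. To keep the example self-contained, a hard-coded HTML snippet stands in for what the HTTP request of step 1 would return:

```python
from bs4 import BeautifulSoup

# In a real run this HTML would come from step 1, an HTTP request, e.g.:
#   html = requests.get("https://example.com").text
# A hard-coded snippet stands in for it here.
html = "<html><body><h1>Example Domain</h1></body></html>"

# Step 2: parse the HTML DOM and extract the data we need.
soup = BeautifulSoup(html, "html.parser")
print(soup.find("h1").text)  # -> Example Domain
```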
We will be using Python with the Scrapingpass API:
- Beautiful Soup is a Python library for pulling data out of HTML and XML files.
- Requests allows you to send HTTP requests very easily.
Setup
Our setup is pretty simple. Just create a folder and install Beautiful Soup and Requests. To create the folder and install the libraries, type the commands given below. I'm assuming that you already have Python 3.x installed.
mkdir scraper
pip install beautifulsoup4
pip install requests
Now, create a file inside that folder with any name you like. I'm using scraping.py.
First, you have to sign up for the Scrapingpass API; it will provide you with 1000 free credits. Then just import Beautiful Soup and Requests in your file, like this:
from bs4 import BeautifulSoup
import requests
Scrape the dynamic content
Now, we are familiar with Scrapingpass and how it works, but for reference you should read the documentation of this API. It will give you a clear idea of how the API works. Now, we'll scrape Amazon for Python book titles.
Python books on Amazon
Now we have 16 books on this page. We'll fetch the rendered HTML through the Scrapingpass API and then use Beautiful Soup to parse out a JSON response. In a single line, we'll be able to scrape Amazon. For requesting the API I will use Requests.
r = requests.get('https://api.scrapingpass.com/scrape?api_key=<your-api-key>&url=https://www.amazon.com/s?k=python+books&ref=nb_sb_noss_2&dynamic=true').text
This will give you the HTML code of the target URL.
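Note that the target URL above contains its own query string, so in practice it is safer to percent-encode it before embedding it in the API URL. A small sketch using the standard library's `urlencode` with the same parameters (the API key stays a placeholder):

```python
from urllib.parse import urlencode

params = {
    "api_key": "<your-api-key>",  # placeholder, as above
    "url": "https://www.amazon.com/s?k=python+books&ref=nb_sb_noss_2",
    "dynamic": "true",
}
# Build the request URL with the target URL safely percent-encoded.
full_url = "https://api.scrapingpass.com/scrape?" + urlencode(params)
print(full_url)
```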
Now, you have to use BeautifulSoup to parse the HTML.
soup = BeautifulSoup(r, 'html.parser')
Every title is an "h2" tag with a "class" attribute of "a-size-mini a-spacing-none a-color-base s-line-clamp-2". You can check that in the image below.
Chrome dev tools
First, we'll find all those tags using the soup variable.
allbooks = soup.find_all("h2", {"class": "a-size-mini a-spacing-none a-color-base s-line-clamp-2"})
Then we'll start a loop to reach the title of every book on that page, using the length of the variable "allbooks".
l = {}
u = list()
for i in range(0, len(allbooks)):
    l["title"] = allbooks[i].text.replace("\n", "")
    u.append(l)
    l = {}
print({"Titles": u})
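The same loop can be written more compactly as a list comprehension. A behaviour-equivalent sketch, with a tiny hard-coded HTML snippet standing in for the response `r`:

```python
from bs4 import BeautifulSoup

# Stand-in for the HTML the API returns; the real `r` comes from
# the requests.get(...) call above.
r = '<h2 class="a-size-mini a-spacing-none a-color-base s-line-clamp-2">Python Tricks</h2>'

soup = BeautifulSoup(r, "html.parser")
allbooks = soup.find_all("h2", {"class": "a-size-mini a-spacing-none a-color-base s-line-clamp-2"})

# One dict per title, newlines stripped, exactly like the loop above.
u = [{"title": h2.text.replace("\n", "")} for h2 in allbooks]
print({"Titles": u})  # -> {'Titles': [{'title': 'Python Tricks'}]}
```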
The list "u" has all the titles, and we just need to print it. Printing the list "u" after the for loop gives us a JSON response. It will look like this:
{
  "Titles": [
    { "title": "Python for Beginners: 2 Books in 1: Python Programming for Beginners, Python Workbook" },
    { "title": "Python Tricks: A Buffet of Awesome Python Features" },
    { "title": "Python Crash Course, 2nd Edition: A Hands-On, Project-Based Introduction to Programming" },
    { "title": "Learning Python: Powerful Object-Oriented Programming" },
    { "title": "Python: 4 Books in 1: Ultimate Beginner's Guide, 7 Days Crash Course, Advanced Guide, and Data Science, Learn Computer Programming and Machine Learning with Step-by-Step Exercises" },
    { "title": "Intro to Python for Computer Science and Data Science: Learning to Program with AI, Big Data and The Cloud" },
    { "title": "Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython" },
    { "title": "Automate the Boring Stuff with Python: Practical Programming for Total Beginners" },
    { "title": "Python: 2 Books in 1: The Crash Course for Beginners to Learn Python Programming, Data Science and Machine Learning + Practical Exercises Included. (Artifical Intelligence, Numpy, Pandas)" },
    { "title": "Python for Beginners: 2 Books in 1: The Perfect Beginner's Guide to Learning How to Program with Python with a Crash Course + Workbook" },
    { "title": "Python: 2 Books in 1: The Crash Course for Beginners to Learn Python Programming, Data Science and Machine Learning + Practical Exercises Included. (Artifical Intelligence, Numpy, Pandas)" },
    { "title": "The Warrior-Poet's Guide to Python and Blender 2.80" },
    { "title": "Python: 3 Manuscripts in 1 book: — Python Programming For Beginners — Python Programming For Intermediates — Python Programming for Advanced" },
    { "title": "Python: 2 Books in 1: Basic Programming & Machine Learning — The Comprehensive Guide to Learn and Apply Python Programming Language Using Best Practices and Advanced Features." },
    { "title": "Learn Python 3 the Hard Way: A Very Simple Introduction to the Terrifyingly Beautiful World of Computers and Code (Zed Shaw's Hard Way Series)" },
    { "title": "Python Tricks: A Buffet of Awesome Python Features" },
    { "title": "Python Pocket Reference: Python In Your Pocket (Pocket Reference (O'Reilly))" },
    { "title": "Python Cookbook: Recipes for Mastering Python 3" },
    { "title": "Python (2nd Edition): Learn Python in One Day and Learn It Well. Python for Beginners with Hands-on Project. (Learn Coding Fast with Hands-On Project Book 1)" },
    { "title": "Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems" },
    { "title": "Hands-On Deep Learning Architectures with Python: Create deep neural networks to solve computational problems using TensorFlow and Keras" },
    { "title": "Machine Learning: 4 Books in 1: Basic Concepts + Artificial Intelligence + Python Programming + Python Machine Learning. A Comprehensive Guide to Build Intelligent Systems Using Python Libraries" }
  ]
}
We now have an array of Python objects containing the titles of the Python books from the Amazon website. In this way, we can scrape the data from any dynamic website.
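To keep the result instead of only printing it, the response can be written to disk with the standard json module. A sketch assuming `u` is the list of title dicts built above (shortened to one entry here):

```python
import json

# Stand-in for the list built in the loop above.
u = [{"title": "Python Tricks: A Buffet of Awesome Python Features"}]

# Serialize with indentation so the file is human-readable.
payload = json.dumps({"Titles": u}, indent=2)
with open("titles.json", "w") as f:
    f.write(payload)

print(payload)
```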
Conclusion
In this article, we first understood what dynamic websites are and how we can scrape data using Scrapingpass and BeautifulSoup, regardless of the type of website.
Abhishek Kumar