Introduction to Selenium

Selenium is an open-source automated testing framework used to validate web applications across different browsers and platforms. It was created in 2004 by Jason Huggins, a programmer at ThoughtWorks. He built it after he had to test a web application manually multiple times, which was inefficient and tiring.

The Selenium API has the advantage of controlling Firefox and Chrome through an external adapter. It also has a much larger community than Puppeteer.

It is an executable module that runs a script on a browser instance.

Today it’s mainly used for web scraping and automation purposes.

Uses of Selenium 

  1. Clicking on buttons
  2. Filling forms
  3. Scrolling
  4. Taking screenshots

Requirements

Generally, web scraping is split into two parts:

  1. Fetching data by making an HTTP request
  2. Extracting important data by parsing the HTML DOM
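The two steps above can be sketched in a few lines. To keep the example self-contained, a hardcoded HTML string stands in for the response you would normally fetch in step 1 (the markup and class names here are made up for illustration):

```python
from bs4 import BeautifulSoup

# A hardcoded snippet stands in for the HTTP response of step 1 (no network needed).
html = """
<html><body>
  <div class="book">
    <span class="title">Python Crash Course</span>
    <span class="price">$25.00</span>
  </div>
</body></html>
"""

# Step 2: parse the HTML DOM and extract the important data.
soup = BeautifulSoup(html, "html.parser")
title = soup.find("span", {"class": "title"}).text
price = soup.find("span", {"class": "price"}).text
print(title, price)
```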

Libraries & Tools

  • Beautiful Soup is a Python library for pulling data out of HTML and XML files.
  • Selenium is used to automate browser interaction from Python.
  • Chrome download page
  • Chrome driver binary

Setup

Our setup is pretty simple. Just create a folder and install Beautiful Soup & Selenium. To create the folder and install the libraries, type the commands given below. I’m assuming that you have already installed Python 3.x.

mkdir scraper
pip install beautifulsoup4
pip install selenium

Quickstart

Once you’ve installed all the libraries, create a Python file inside the folder. I’m using scraping.py. Import all the libraries as shown below; also import time so that we can let the page load completely.

from selenium import webdriver
from bs4 import BeautifulSoup
import time

We are going to scrape the titles and prices of Python books from Walmart.

Preparing the Food

Now, since we have all the ingredients to prepare the scraper, we should make a GET request to the target URL on Walmart to get the raw HTML data.

options = webdriver.ChromeOptions()
options.add_argument('--headless')
# Pass the options so Chrome actually starts headless.
driver = webdriver.Chrome('F:/chromed/chromedriver', options=options)
url = "https://www.walmart.com/search/?query=python%20books"
# Make the GET request to the target URL.
driver.get(url)

A headless Chrome instance will be launched, controlled through the ChromeDriver adapter.

Here are two interesting webdriver properties:

  1. driver.stop_client Called after executing a quit command.
  2. driver.name Returns the name of the underlying browser for this instance.

Now, to pull the raw HTML out of the page, we have to use BeautifulSoup.

time.sleep(4)
soup = BeautifulSoup(driver.page_source, 'html.parser')
driver.close()
books = list()
k = {}

Now we’ll use BeautifulSoup to parse the HTML. driver.page_source returns the raw HTML of the page. I have also declared an empty list and an empty dictionary to build a JSON object from the data we are going to scrape.

After inspecting the title in Chrome developer tools, we can see that the title is stored in a “div” tag with the class “search-result-product-title listview”.

Similarly, the price is stored in a “span” tag with the class “price display-inline-block arrange-fit price price-main”. We also have to dive deeper inside this tag to find the “visuallyhidden” span, which holds the price in text format.
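To see how this nested lookup works, here is a minimal sketch against a simplified stand-in for one Walmart search result (the class names match the ones above, but the HTML itself is made up for illustration):

```python
from bs4 import BeautifulSoup

# Simplified stand-in for one Walmart search result (markup is illustrative only).
html = """
<div class="search-result-product-title listview">Product TitlePython Basics</div>
<span class="price display-inline-block arrange-fit price price-main">
  <span class="visuallyhidden">$19.99</span>
</span>
"""
soup = BeautifulSoup(html, "html.parser")

title = soup.find("div", {"class": "search-result-product-title listview"}).text
# The full price lives in the nested "visuallyhidden" span, so we dive
# into the outer price span and read the inner one as plain text.
price = soup.find(
    "span", {"class": "price display-inline-block arrange-fit price price-main"}
).find("span", {"class": "visuallyhidden"}).text.strip()
print(title, price)
```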

try:
    Title = soup.find_all("div", {"class": "search-result-product-title listview"})
except:
    Title = None
try:
    Price = soup.find_all("span", {"class": "price display-inline-block arrange-fit price price-main"})
except:
    Price = None

We now have all the titles and prices stored in list format in the variables Title and Price, respectively.

We are going to start a for loop so that we can reach each and every book.

for i in range(0, len(Title)):
    try:
        k["Title{}".format(i+1)] = Title[i].text.replace("\n", "")
    except:
        k["Title{}".format(i+1)] = None
    try:
        k["Price{}".format(i+1)] = Price[i].find("span", {"class": "visuallyhidden"}).text.replace("\n", "")
    except:
        k["Price{}".format(i+1)] = None
    books.append(k)
    k = {}

So, finally, we have all the prices and titles stored inside the list books. After printing it, we get:
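To produce the output below, the books list just needs to be wrapped under a single key and pretty-printed; a small sketch with the standard json module (the two entries here are stand-ins for the scraped values):

```python
import json

# Stand-in for the scraped list of {"TitleN": ..., "PriceN": ...} dictionaries.
books = [
    {"Title1": "Product TitlePython : Advanced Predictive Analytics", "Price1": "$111.66"},
    {"Title2": "Product TitlePython", "Price2": "$6.99"},
]

# Wrap the list under one key and pretty-print it as JSON.
output = json.dumps({"PythonBooks": books}, indent=1)
print(output)
```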

{
 "PythonBooks": [
 {
 "Title1": "Product TitlePython : Advanced Predictive Analytics",
 "Price1": "$111.66"
 },
 {
 "Title2": "Product TitlePython",
 "Price2": "$6.99"
 },
 {
 "Title3": "Product TitlePython : Learn How to Write Codes-Your Perfect Step-By-Step Guide",
 "Price3": "$16.05"
 },
 {
 "Title4": "Product TitlePython: The Complete Beginner’s Guide",
 "Price4": "$14.99"
 },
 {
 "Price5": "$48.19",
 "Title5": "Product TitlePython : The Complete Reference"
 },
 {
 "Title6": "Product TitleThe Greedy Python : Book & CD",
 "Price6": "$10.55"
 },
 {
 "Price7": "$24.99",
 "Title7": "Product TitlePython: 2 Manuscripts in 1 Book: -Python for Beginners -Python 3 Guide (Paperback)"
 },
 {
 "Title8": "Product TitleBooks for Professionals by Professionals: Beginning Python Visualization: Crafting Visual Transformation Scripts (Paperback)",
 "Price8": "$67.24"
 },
 {
 "Title9": "Product TitlePython for Kids: A Playful Introduction to Programming (Paperback)",
 "Price9": "$23.97"
 },
 {
 "Price10": "$17.99",
 "Title10": "Product TitlePython All-In-One for Dummies (Paperback)"
 },
 {
 "Title11": "Product TitlePython Tutorial: Release 3.6.4 (Paperback)",
 "Price11": "$14.53"
 },
 {
 "Price12": "$13.58",
 "Title12": "Product TitleCoding for Kids: Python: Learn to Code with 50 Awesome Games and Activities (Paperback)"
 },
 {
 "Price13": "$56.10",
 "Title13": "Product TitlePython 3 Object Oriented Programming (Paperback)"
 },
 {
 "Title14": "Product TitleHead First Python: A Brain-Friendly Guide (Paperback)",
 "Price14": "$35.40"
 },
 {
 "Title15": "Product TitleMastering Object-Oriented Python — Second Edition (Paperback)",
 "Price15": "$44.99"
 },
 {
 "Title16": "Product TitlePocket Reference (O’Reilly): Python Pocket Reference: Python in Your Pocket (Paperback)",
 "Price16": "$13.44"
 },
 {
 "Title17": "Product TitleData Science with Python (Paperback)",
 "Price17": "$39.43"
 },
 {
 "Title18": "Product TitleHands-On Deep Learning Architectures with Python (Paperback)",
 "Price18": "$29.99"
 },
 {
 "Price19": "$37.73",
 "Title19": "Product TitleDjango for Beginners: Build websites with Python and Django (Paperback)"
 },
 {
 "Title20": "Product TitleProgramming Python: Powerful Object-Oriented Programming (Paperback)",
 "Price20": "$44.21"
 }
 ]
}

Similarly, you can scrape any JavaScript-enabled website using Selenium and Python. If you don’t want to run these scrapers on your own server, you can try Scrapingpass, which is a proxy API for web scraping.

Conclusion

In this article, we saw how we can scrape data using Selenium & BeautifulSoup regardless of the type of website. I hope you now feel more comfortable scraping sites.
