Scrape and Compare E-commerce Product Prices Using a Proxy Scraper

In this post, we are going to learn web scraping with Python. We will scrape websites like Walmart, eBay, and Amazon for the price of the Microsoft Xbox One X 1TB Black Console. With the same scraper you will be able to scrape pricing for any product from these websites.

The proxy scraper we use, Scrapingdog, helps us scrape dynamic websites through a large pool of rotating residential proxies so that we don't get blocked. It also offers a CAPTCHA-solving facility and uses headless Chrome to render dynamic websites.

Requirements

Generally, web scraping is split into two parts:

  1. Fetching data by making an HTTP request
  2. Extracting important data by parsing the HTML DOM

Libraries & Tools

  1. Beautiful Soup is a Python library for pulling data out of HTML and XML files.
  2. Requests allows you to send HTTP requests very easily.
  3. A proxy API for web scraping (Scrapingdog in this post) that returns the HTML of the target URL.

Setup

Our setup is pretty simple. Just create a folder and install Beautiful Soup, requests, and lxml (the parser Beautiful Soup is asked to use later in this post). To create the folder and install the libraries, type the commands below.

mkdir scraper
cd scraper
pip install beautifulsoup4
pip install requests
pip install lxml

Now, create a file inside that folder with any name you wish. I'm using scraping.py.

First, sign up for the Scrapingdog API; it provides 1000 free credits. Then import Beautiful Soup and requests in your file, like this:

from bs4 import BeautifulSoup
import requests

We are going to scrape Xbox pricing from Walmart, eBay & Amazon.

Preparing the Food

Now that we have all the ingredients to prepare the scraper, we should make a GET request to the target URLs on Walmart, eBay & Amazon to get the raw HTML data. If you're not familiar with the scraping tool, I would urge you to go through its documentation. We'll use requests to make the HTTP GET requests.

# Passing the target URL through params= lets requests URL-encode its own ? and & characters
api = "https://api.scrapingdog.com/scrape"
ebay = requests.get(api, params={"api_key": "<Your-API-key>", "url": "https://www.ebay.com/itm/Microsoft-Xbox-One-X-1TB-Black-Console/153480514383?epid=238382386&hash=item23bc26cb4f:g:AX8AAOSwk~xcjnHL"}).text
amazon = requests.get(api, params={"api_key": "<Your-API-key>", "url": "https://www.amazon.com/Microsoft-Xbox-One-Console-Wireless-Controller/dp/B07WDGB9P5/ref=sr_1_2?dchild=1&keywords=xbox&qid=1589211220&sr=8-2"}).text
walmart = requests.get(api, params={"api_key": "<Your-API-key>", "url": "https://www.walmart.com/ip/Microsoft-Xbox-One-X-1TB-Console-Black-CYV-00001/276629190"}).text
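
The eBay and Amazon target URLs carry their own query strings (for example ?epid=...&hash=...), so they have to be percent-encoded before being embedded as the url parameter. Otherwise everything after the first & is read as a parameter of the API call rather than of the product page. requests does this encoding when values are passed via params=. The sketch below shows the same idea with only the standard library. The helper name build_api_url is hypothetical, while the endpoint and the api_key and url parameter names are the ones used in this post.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

API_ENDPOINT = "https://api.scrapingdog.com/scrape"


def build_api_url(api_key, target_url):
    """Return the API URL with the target URL safely percent-encoded."""
    query = urlencode({"api_key": api_key, "url": target_url})
    return f"{API_ENDPOINT}?{query}"


target = (
    "https://www.ebay.com/itm/Microsoft-Xbox-One-X-1TB-Black-Console/153480514383"
    "?epid=238382386&hash=item23bc26cb4f:g:AX8AAOSwk~xcjnHL"
)
api_url = build_api_url("<Your-API-key>", target)
print(api_url)

# Round trip: decoding the query string gives back the full target URL,
# including its own epid and hash parameters.
recovered = parse_qs(urlsplit(api_url).query)["url"][0]
print(recovered == target)
```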

This gives you the HTML of each target URL.

Now use BeautifulSoup to parse the HTML.

soupEbay = BeautifulSoup(ebay, 'lxml')
soupAmazon = BeautifulSoup(amazon, 'lxml')
soupWalmart = BeautifulSoup(walmart, 'lxml')

The eBay price is stored in a "span" tag with class "notranslate". Similarly, the Amazon price is stored in a "span" tag with class "a-size-medium a-color-price priceBlockBuyingPriceString", and the Walmart price is stored in a "span" tag with class "price-group".

Then declare an empty dictionary and an empty list to build a JSON object of the prices.
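
To see how these class lookups behave, here is a small sketch run against made-up markup rather than the live pages. The HTML string is a hypothetical stand-in that only reuses the class names mentioned above, and it uses Python's built-in html.parser so it runs without lxml. One detail worth knowing: when find is given a string with several classes, it matches only if the tag's class attribute is exactly that string in that order, whereas a CSS selector matches the classes in any order.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down markup standing in for the real product pages
html = """
<span class="notranslate">US $599.00</span>
<span class="a-size-medium a-color-price priceBlockBuyingPriceString">$318.00</span>
<span class="price-group">$367.45</span>
"""
soup = BeautifulSoup(html, "html.parser")

ebay_price = soup.find("span", {"class": "notranslate"}).get_text(strip=True)

# Multi-class string: matches the exact value of the class attribute
amazon_price = soup.find(
    "span", {"class": "a-size-medium a-color-price priceBlockBuyingPriceString"}
).get_text(strip=True)

# CSS selector: matches tags that carry both classes, in any order
amazon_price_css = soup.select_one(
    "span.priceBlockBuyingPriceString.a-color-price"
).get_text(strip=True)

walmart_price = soup.find("span", {"class": "price-group"}).get_text(strip=True)

print(ebay_price, amazon_price, amazon_price_css, walmart_price)
```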

l={}
u=list()

Then we'll use the variables soupEbay, soupAmazon and soupWalmart to extract the prices by specifying the tags mentioned above, using BeautifulSoup's find function.

# find() returns None when the tag is missing, so .text raises AttributeError
try:
    l["priceEbay"] = soupEbay.find("span", {"class": "notranslate"}).text.replace("US ", "")
except AttributeError:
    l["priceEbay"] = None
try:
    l["priceAmazon"] = soupAmazon.find("span", {"class": "a-size-medium a-color-price priceBlockBuyingPriceString"}).text
except AttributeError:
    l["priceAmazon"] = None
    # print(soupAmazon.find("div", {"class": "a-section a-spacing-small"}))
try:
    l["priceWalmart"] = soupWalmart.find("span", {"class": "price-group"}).text
except AttributeError:
    l["priceWalmart"] = None
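
The three try/except blocks follow the same pattern, so they could also be driven by a small table of selectors, which makes it easier to add another vendor or product later. This is only a sketch of that idea: SELECTORS, extract_price and collect_prices are hypothetical names, the tag and class pairs are the ones from this post, and the sample HTML is made up so the code runs without network access.

```python
from bs4 import BeautifulSoup

# (tag, class) pairs taken from this post; the table layout is an assumption
SELECTORS = {
    "priceEbay": ("span", "notranslate"),
    "priceAmazon": ("span", "a-size-medium a-color-price priceBlockBuyingPriceString"),
    "priceWalmart": ("span", "price-group"),
}


def extract_price(html, tag, css_class):
    """Return the text of the first matching tag, or None if it is absent."""
    node = BeautifulSoup(html, "html.parser").find(tag, {"class": css_class})
    return node.get_text(strip=True) if node is not None else None


def collect_prices(pages):
    """pages maps the keys of SELECTORS to raw HTML strings."""
    return {
        key: extract_price(pages.get(key, ""), tag, css_class)
        for key, (tag, css_class) in SELECTORS.items()
    }


# Hypothetical stand-in HTML; the Amazon page is left out to show the None case
pages = {
    "priceEbay": '<span class="notranslate">US $599.00</span>',
    "priceWalmart": '<span class="price-group">$367.45</span>',
}
result = collect_prices(pages)
print(result)
```

A vendor whose tag is not found simply ends up as None, which mirrors what the except branches do.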

Now the dictionary holds the prices from all the vendors. We just need to append it to a list to get a JSON object.

u.append(l)
print("Xbox pricing",u)

Printing the list u shows the collected prices. Written as a JSON object, the result looks like this:

{
 "Xbox pricing": [
  {
   "priceWalmart": "$367.45",
   "priceEbay": "$599.00",
   "priceAmazon": "$318.00"
  }
 ]
}
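
Note that print("Xbox pricing", u) shows Python's own representation of the list (single quotes, and None for a missing price). To emit real JSON in the shape shown above, one option is the standard json module. A minimal sketch, with the price values copied from the sample output rather than scraped:

```python
import json

# Sample values copied from the output above, standing in for scraped data
l = {"priceWalmart": "$367.45", "priceEbay": "$599.00", "priceAmazon": "$318.00"}
u = [l]

# json.dumps produces double-quoted keys and turns None into null
payload = json.dumps({"Xbox pricing": u}, indent=1)
print(payload)
```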

We now have a list of Python dictionaries containing the Xbox prices. In the same way, we can scrape data from other websites without getting blocked.
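
Since the goal is to compare vendors, the scraped strings still have to be turned into numbers. The sketch below is one way to do that. parse_price and cheapest are hypothetical helper names, and it assumes prices look like "$367.45" or "US $1,299.00" (an optional currency prefix followed by a decimal number). The values are the sample prices from above.

```python
import re
from decimal import Decimal


def parse_price(text):
    """Return the first number in a price string as a Decimal, or None."""
    if text is None:
        return None
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        return None
    return Decimal(match.group().replace(",", ""))


def cheapest(prices):
    """Return (vendor_key, price) for the lowest available price, or None."""
    parsed = {key: parse_price(value) for key, value in prices.items()}
    available = {key: price for key, price in parsed.items() if price is not None}
    if not available:
        return None
    best = min(available, key=available.get)
    return best, available[best]


prices = {"priceWalmart": "$367.45", "priceEbay": "$599.00", "priceAmazon": "$318.00"}
print(cheapest(prices))
```

Decimal is used instead of float so that prices compare exactly as written, and vendors whose price could not be scraped (None) are skipped rather than treated as zero.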

Conclusion

In this article, we saw how to scrape data using a proxy scraper & BeautifulSoup, regardless of the type of website.
