API Reference
A brief general reference of several components of Scrapingpass API. Includes common concepts for both HTTP REST API and HTTP Proxy Mode. Special cases for each will be mentioned accordingly.
API Interfaces
References of individual interfaces are documented in their respective sections. Navigate through following links to quickly get started with either of these.
Options Reference
Here are the listed options which are supported by the API to be used to customize the scrapingpass process, rendering, behaviour, etc. Note that few of these have different default values for HTTP Proxy Mode though they can be overridden as well.
Options which are missing default values are mandatory and required to be specified. Similarly options which have listed credits adds up as extra credits. Options with missing credits are free to specify and don't incur any extra charges.
Identifier | Type | Description | HTTP Rest API Default | HTTP Proxy Mode Default | Credits |
---|---|---|---|---|---|
api_key | String | Scrapingpass API Key required for authorization. | - | - | - |
url | String | Actual URL of remote resource to scrape. | - | Same as request URL | - |
method | String | HTTP method to use while making request to remote. | Same as request method | Same as request method | - |
cookies | String | The cookies to include with remote request in format cookie1Name=value;cookie2Name=value; . | null | null | - |
forward_headers | Boolean | Whether to forward request headers to remote. | false | true | - |
transparent_mode | Boolean | Whether to keep API response headers, status code, etc transparent with remote. | false | true | - |
json | Boolean | Whether to return JSON response wrapping remote response entities. | false | false | - |
proxy_type | String | The type of proxy to use for remote request. | none | auto | - |
proxy_url | String | Proxy to use for remote request rather than default existing proxy. | none | none | - |
scrape_google | Boolean | Must be specified true when remote host is Google. | false | false | 10 |
device | String | Type of device to pretend while making remote request. | desktop | desktop | - |
extraction_rules | String | Stringified JSON object with key as title and value as a CSS selector. | none | none | - |
forward_user_agent | Boolean | Whether to forward user agent as received. | false | false | - |
request | Boolean | Explicitly specify to use normal request for remote. | true | true | - |
html | Boolean | Explicitly specify to return HTML of remote response. | true | true | 1 |
js_rendering | Boolean | Use JS Rendering rather than normal requests using headless browsers. | false | false | 4 |
screenshot | Boolean | Use JS Rendering and render screenshot of the page. | false | false | 2 |
pdf | Boolean | Use JS Rendering and render PDF of the page. | false | false | 3 |
execute_script | String | An arbitrary javascript to execute while in JS Rendering after page load. | none | none | - |
js_scroll | Boolean | Whether to take screenshot of full scrollable page in JS Rendering. | false | false | - |
js_scroll_wait | Number | Delay to enforce between each scrolls in JS Rendering in milliseconds. | 100 | 100 | - |
js_wait | Number | Number of seconds to wait in JS Rendering after page load or before rendering response. | 1 | 1 | - |
window_height | Number | Height of the browser viewport in case of JS Rendering. | 1080 | 1080 | - |
window_width | Number | Width of the browser viewport in case of JS Rendering. | 1920 | 1920 | - |
google_search | Boolean | Explicitly specify to use Google Search mode. | false | false | 10 |
query | String | The search term or query for Google Search mode. | - | ||
country_code | String | The geolocation to mimic remote requests from. | none | none | 4 |
note
It is recommended to URL encode all of the string type options before passing it to API.
api_key
The API Key for authorization with Scrapingpass API. For more details on its usage, consider checking specific authorization sections for each interfaces.
tip
Want to quickly try out our API? Get your API Key now and start exploring!
url
The actual URL which you want to scrape or request. This option is required and must be URL encoded. You don't need to specify this option in Proxy Mode as the request URL is considered for this.
method
The HTTP method to use while making request to remote. If this is POST/PUT request and you need to send any data, you can normally include it in your original request. Scrapingpass will forward all of data as received to the remote.
cookies
The HTTP cookies to include along with the request to remote. You must specify the cookies in format of name=value
pairs separated by a semicolon. For example, cookie1Name=value;cookie2Name=value;
.
You can specify cookies to scrape or get screenshot of some page which requires you to be logged in. Scrapingpass will normally resume and pick up the session using the cookies specified.
tip
You can also directly specify your cookie string in HTTP header as in Cookie
key with forward_headers set to true.
forward_headers
Determines whether to forward the headers along with request to remote. In REST API mode, if this option is enabled then request headers are forwarded to the remote.
The headers which are prefixed with Sp-
are always forwarded to the remote independent of the forward_headers option. For example, if you need to pass a Referer header, you can also do so by sending header with key Sp-Referer
. Scrapingpass will strip Sp- and forward your Referer header normally.
This option is enabled by default in the HTTP Proxy Mode.
tip
You don't need to prefix headers in HTTP Proxy Mode. Scrapingpass will automatically forward all of the headers as received along with the request to remote.
transparent_mode
Determines if the API response status code, headers must be transparent with remote response. If it's specified true, then scrapingpass will return response headers and status code in same way as if you were directly making request to remote.
This option is enabled by default in the HTTP Proxy Mode.
json
Determines if the API response should be wrapped within a JSON object. This wraps API response along with the actual remote response content, status code, headers, etc within a single JSON object.
Here is an example API response with json=true
specified. The data
field holds the actual response from the remote.
{
"code": "success",
"data": {
"body": "<!doctype html>\n<html>\n<head>\n <title>Example Domain</title>\n\n <meta charset=\"utf-8\" />\n <meta http-equiv=\"Content-type\" content=\"text/html; charset=utf-8\" />\n <meta name=\"viewport\" content=\"width=device-width, initial-scale=1\" />\n <style type=\"text/css\">\n body {\n background-color: #f0f0f2;\n margin: 0;\n padding: 0;\n font-family: -apple-system, system-ui, BlinkMacSystemFont, \"Segoe UI\", \"Open Sans\", \"Helvetica Neue\", Helvetica, Arial, sans-serif;\n \n }\n div {\n width: 600px;\n margin: 5em auto;\n padding: 2em;\n background-color: #fdfdff;\n border-radius: 0.5em;\n box-shadow: 2px 3px 7px 2px rgba(0,0,0,0.02);\n }\n a:link, a:visited {\n color: #38488f;\n text-decoration: none;\n }\n @media (max-width: 700px) {\n div {\n margin: 0 auto;\n width: auto;\n }\n }\n </style> \n</head>\n\n<body>\n<div>\n <h1>Example Domain</h1>\n <p>This domain is for use in illustrative examples in documents. You may use this\n domain in literature without prior coordination or asking for permission.</p>\n <p><a href=\"https://www.iana.org/domains/example\">More information...</a></p>\n</div>\n</body>\n</html>\n",
"encoding": "utf-8",
"headers": {
"Accept-Ranges": "bytes",
"Age": "62459",
"Cache-Control": "max-age=604800",
"Content-Encoding": "gzip",
"Content-Length": "648",
"Content-Type": "text/html; charset=UTF-8",
"Date": "Tue, 07 Dec 2021 09:24:56 GMT",
"Etag": "\"3147526947\"",
"Expires": "Tue, 14 Dec 2021 09:24:56 GMT",
"Last-Modified": "Thu, 17 Oct 2019 07:18:26 GMT",
"Server": "ECS (oxr/830E)",
"Vary": "Accept-Encoding",
"X-Cache": "HIT"
},
"status_code": 200
},
"message": null,
"status_code": 200
}
proxy_type
The type of proxy to use for making remote requests. The possible proxy types and their respective extra credits are listed. You may use any one of these proxy types depending on your requirement.
Proxy Type | Extra Credits Used |
---|---|
none | - |
auto | - |
data_center | 1 |
residential | 3 |
proxy_url
If you want to use your own proxy for remote requests and don't want to use the ones provided by Scrapingpass, you can specify your own proxy URL to this field. Note that it must be a vaild and reachable proxy URL.
scrape_google
Determines whether the requested remote host is of Google. This option must be specified for all remote Google hostnames.
device
The type of device to pretend to remote. This is controlled by using different user agents for remote requests. The available devices are listed.
desktop
mobile
extraction_rules
If you want to extract data from page and don't want to parse HTMLs, you can directly specify a stringified JSON object of key value pairs where key determines the identifier and value determines the exact CSS selector of element to extract text of.
For example, to extract the title and paragraph text from https://example.com/, the following extraction rule does the job.
extraction_rules={"title":"body > div > h1","paragraph":"body > div > p:nth-child(2)"}
Example response in case of extraction rules specified:
{
"title": "Example Domain",
"paragraph": "This domain is for use in illustrative examples in documents. You may use this\n domain in literature without prior coordination or asking for permission."
}
forward_user_agent
Determines whether to forward the User-Agent header as received with the remote request. Most of times this is helpful in scraping sites which tend to block few sets of user agents.
request
Determines whether to use normal HTTP request for remote rather than using headless browsers. This option is enabled by default.
html
Determines whether to return HTML body of remote response. This option is enabled by default. It can also be specified in conjunction with the rendering_rendering=true
to get rendered HTML of client side rendered web applications.
js_rendering
Determines whether to use JS Rendering using headless browsers for remote requests rather than normal HTTP request. This is suitable for modern client side rendered web applications.
screenshot
This options renders screenshot of the page using JS Rendering. If js_rendering option is missing, it is automatically passed for you.
pdf
This option renders PDF of the page using JS Rendering. If js_rendering option is missing, it is automatically passed for you.
execute_script
Sometimes its required to click on pop ups or some buttons while in JS Rendering to reveal the actual content. This options allows you to execute any arbitrary javascript after the page has been loaded.
js_scroll
Determines whether to take screenshot of full scrollable page in JS Rendering. This option can used in conjunction with the PDF and Screenshot options.
js_scroll_wait
The delay to enforce between each scrolls in JS Rendering in milliseconds. This option can used in conjunction with the PDF and Screenshot options.
js_wait
Number of seconds to wait in JS Rendering after page load or before rendering response. This can be useful for web applications which take significant amount of time to render.
window_height
Height of the browser viewport in case of JS Rendering.
window_width
Width of the browser viewport in case of JS Rendering.
google_search
Determines whether to use the Google Search mode. This mode takes care of proxy rotation, re captchas itself so that you can focus purely on extracting the search results.
query
The search term for Google Search mode. This option is mandatory and required with google_search=true option.
country_code
Along with using premium proxy types, you can also specify a country code to pretend remote request being made from. The supported countries are list in the table following.
Country Code | Country Name |
---|---|
af | Afghanistan |
am | Armenia |
az | Azerbaijan |
bh | Bahrain |
bd | Bangladesh |
bt | Bhutan |
bn | Brunei |
kh | Cambodia |
cn | China |
ge | Georgia |
hk | Hong Kong |
in | India |
id | Indonesia |
il | Israel |
jp | Japan |
jo | Jordan |
kz | Kazakhstan |
kw | Kuwait |
kg | Kyrgyzstan |
la | Laos |
mo | Macau |
my | Malaysia |
mv | Maldives |
mn | Mongolia |
mm | Myanmar |
np | Nepal |
om | Oman |
pk | Pakistan |
ps | Palestine |
ph | Philippines |
qa | Qatar |
sa | Saudi Arabia |
sg | Singapore |
kr | South Korea |
lk | Sri Lanka |
tw | Taiwan |
tj | Tajikistan |
th | Thailand |
tl | Timor-Leste |
ae | United Arab Emirates |
uz | Uzbekistan |
vn | Vietnam |
ye | Yemen |
ax | ΓΒ land Islands |
al | Albania |
ad | Andorra |
at | Austria |
by | Belarus |
be | Belgium |
ba | Bosnia and Herzegovina |
bg | Bulgaria |
hr | Croatia |
cy | Cyprus |
cz | Czech Republic |
dk | Denmark |
ee | Estonia |
fo | Faroe Islands |
fi | Finland |
fr | France |
de | Germany |
gi | Gibraltar |
gr | Greece |
gl | Greenland |
gg | Guernsey |
hu | Hungary |
is | Iceland |
ie | Ireland |
im | Isle of Man |
it | Italy |
je | Jersey |
lv | Latvia |
li | Liechtenstein |
lt | Lithuania |
lu | Luxembourg |
mk | Macedonia |
mt | Malta |
md | Moldova |
mc | Monaco |
me | Montenegro |
nl | Netherlands |
no | Norway |
pl | Poland |
pt | Portugal |
ro | Romania |
ru | Russia |
sm | San Marino |
rs | Serbia |
sk | Slovakia |
si | Slovenia |
es | Spain |
se | Sweden |
ch | Switzerland |
tr | Turkey |
ua | Ukraine |
gb | United Kingdom |
dz | Algeria |
ao | Angola |
bj | Benin |
bw | Botswana |
bf | Burkina Faso |
bi | Burundi |
cv | Cabo Verde |
cm | Cameroon |
td | Chad |
km | Comoros |
cg | Congo |
ci | CΓΒ΄te d'Ivoire |
dj | Djibouti |
eg | Egypt |
gq | Equatorial Guinea |
et | Ethiopia |
ga | Gabon |
gm | Gambia |
gh | Ghana |
gn | Guinea |
gw | Guinea-Bissau |
ke | Kenya |
ls | Lesotho |
lr | Liberia |
ly | Libya |
mg | Madagascar |
mw | Malawi |
ml | Mali |
mr | Mauritania |
mu | Mauritius |
yt | Mayotte |
ma | Morocco |
mz | Mozambique |
na | Namibia |
ne | Niger |
ng | Nigeria |
rw | Rwanda |
sn | Senegal |
sc | Seychelles |
sl | Sierra Leone |
so | Somalia |
za | South Africa |
ss | South Sudan |
sd | Sudan |
sz | Swaziland |
tz | Tanzania |
tg | Togo |
tn | Tunisia |
ug | Uganda |
zm | Zambia |
zw | Zimbabwe |
as | American Samoa |
au | Australia |
cx | Christmas Island |
ck | Cook Islands |
fj | Fiji |
pf | French Polynesia |
gu | Guam |
ki | Kiribati |
mh | Marshall Islands |
fm | Micronesia |
nr | Nauru |
nc | New Caledonia |
nz | New Zealand |
mp | Northern Mariana Islands |
pw | Palau |
pg | Papua New Guinea |
sb | Solomon Islands |
vu | Vanuatu |
ai | Anguilla |
ag | Antigua and Barbuda |
aw | Aruba |
bs | Bahamas |
bb | Barbados |
bz | Belize |
bm | Bermuda |
ca | Canada |
ky | Cayman Islands |
cr | Costa Rica |
cu | Cuba |
cw | CuraΓΒ§ao |
dm | Dominica |
do | Dominican Republic |
sv | El Salvador |
gd | Grenada |
gt | Guatemala |
ht | Haiti |
hn | Honduras |
jm | Jamaica |
mq | Martinique |
mx | Mexico |
ni | Nicaragua |
pa | Panama |
pr | Puerto Rico |
bl | Saint BarthΓΒ©lemy |
kn | Saint Kitts and Nevis |
lc | Saint Lucia |
mf | Saint Martin |
vc | Saint Vincent and the Grenadines |
tt | Trinidad and Tobago |
tc | Turks and Caicos Islands |
us | United States |
vg | Virgin Islands, British |
vi | Virgin Islands, United States |
ar | Argentina |
bo | Bolivia |
br | Brazil |
cl | Chile |
co | Colombia |
ec | Ecuador |
gy | Guyana |
py | Paraguay |
pe | Peru |
sr | Surinam |
uy | Uruguay |
ve | Venezuela |
API Max Concurrency Limit
Scrapingpass API enforces a max concurrency limit within the API to prevent abuse of the API. If user exceeds the max concurrency limit then API terminates with status code 429
.
You can always increase up your max concurrency limit by changing the API plan or contact us directly.
API Error Status Codes
Scrapingpass API returns different status codes depending on different error cases. They're listed below.
Status Code | Reason Phrase | Error Code | Description |
---|---|---|---|
400 | Bad Request | bad_request | Something went wrong while parsing user request. |
400 | Bad Request | options_validation_failed | Something wrong with options or API parameters. |
400 | Bad Request | user_proxy_unreachable | Proxy specified by the user is not reachable. |
400 | Bad Request | bad_proxy_request | Something wrong with proxy options or parameters in case of HTTP Proxy Mode. |
401 | Unauthorized | user_not_found | Invalid or missing API Key. |
403 | Forbidden | user_not_enough_balance | Not enough balance in wallet. |
403 | Forbidden | user_authorization_token_revoked | A revoked API Key has been specified. |
403 | Forbidden | user_balance_expired | Balance has been expired. You might need to recharge again. |
407 | Proxy Authentication Required | missing_proxy_authorization | Missing proxy authorization in case of HTTP Proxy Mode. |
429 | Too Many Requests | user_concurrency_limit_exceeded | User has exceeded their max concurrency limit. |
500 | Internal Server Error | unexpected_exception | Something went wrong with Scrapingpass API servers. |
Example API Error Response
{
"code": "user_not_found",
"data": null,
"error_data": {},
"error_type": "SCRAPINGPASS_ERROR",
"message": null,
"status_code": 401
}
API Credits Usage
Each request to the Scrapingpass API incurs some credits unless there has been some internal server error or some unexpected exception with the API itself.
Extra credits adds up based on the options specified. Exact extra credits used are listed within options reference page for options who cost extra credits.
If transparent mode is not enabled, then the API includes a Balance-Used
header which contains the total credits used for that particular request.