API Reference

A brief general reference for several components of the Scrapingpass API. It covers concepts common to both the HTTP REST API and the HTTP Proxy Mode; special cases for each are noted where they apply.

API Interfaces

Each interface is documented in its own section; see the section for the interface you plan to use to get started quickly.

Options Reference

The options listed here customize how Scrapingpass scrapes, renders, and responds. Note that a few of them have different default values in HTTP Proxy Mode, though these defaults can be overridden as well.

Options without a default value are mandatory and must be specified. Options with a listed credit value add that many extra credits to the request. Options without a credit value are free to specify and incur no extra charge.

| Identifier | Type | Description | HTTP REST API Default | HTTP Proxy Mode Default | Credits |
| --- | --- | --- | --- | --- | --- |
| api_key | String | Scrapingpass API Key required for authorization. | - | - | - |
| url | String | Actual URL of remote resource to scrape. | - | Same as request URL | - |
| method | String | HTTP method to use while making request to remote. | Same as request method | Same as request method | - |
| cookies | String | The cookies to include with remote request in format cookie1Name=value;cookie2Name=value;. | null | null | - |
| forward_headers | Boolean | Whether to forward request headers to remote. | false | true | - |
| transparent_mode | Boolean | Whether to keep API response headers, status code, etc transparent with remote. | false | true | - |
| json | Boolean | Whether to return JSON response wrapping remote response entities. | false | false | - |
| proxy_type | String | The type of proxy to use for remote request. | none | auto | - |
| proxy_url | String | Proxy to use for remote request rather than default existing proxy. | none | none | - |
| scrape_google | Boolean | Must be specified true when remote host is Google. | false | false | 10 |
| device | String | Type of device to pretend while making remote request. | desktop | desktop | - |
| extraction_rules | String | Stringified JSON object with key as title and value as a CSS selector. | none | none | - |
| forward_user_agent | Boolean | Whether to forward user agent as received. | false | false | - |
| request | Boolean | Explicitly specify to use normal request for remote. | true | true | - |
| html | Boolean | Explicitly specify to return HTML of remote response. | true | true | 1 |
| js_rendering | Boolean | Use JS Rendering rather than normal requests using headless browsers. | false | false | 4 |
| screenshot | Boolean | Use JS Rendering and render screenshot of the page. | false | false | 2 |
| pdf | Boolean | Use JS Rendering and render PDF of the page. | false | false | 3 |
| execute_script | String | An arbitrary javascript to execute while in JS Rendering after page load. | none | none | - |
| js_scroll | Boolean | Whether to take screenshot of full scrollable page in JS Rendering. | false | false | - |
| js_scroll_wait | Number | Delay to enforce between each scrolls in JS Rendering in milliseconds. | 100 | 100 | - |
| js_wait | Number | Number of seconds to wait in JS Rendering after page load or before rendering response. | 1 | 1 | - |
| window_height | Number | Height of the browser viewport in case of JS Rendering. | 1080 | 1080 | - |
| window_width | Number | Width of the browser viewport in case of JS Rendering. | 1920 | 1920 | - |
| google_search | Boolean | Explicitly specify to use Google Search mode. | false | false | 10 |
| query | String | The search term or query for Google Search mode. | - | | |
| country_code | String | The geolocation to mimic remote requests from. | none | none | 4 |
note

It is recommended to URL encode all string-type options before passing them to the API.
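As a rough illustration of the note above, the following Python sketch URL-encodes a set of options into a query string using the standard library. The helper name `build_query` and the placeholder API key are assumptions made for this example; only the option names come from the table.

```python
from urllib.parse import urlencode


def build_query(options: dict) -> str:
    """Return a URL-encoded query string for the given API options."""
    # Booleans are written as lowercase true/false, matching the json=true
    # spelling used elsewhere in this reference.
    normalized = {
        key: (str(value).lower() if isinstance(value, bool) else str(value))
        for key, value in options.items()
    }
    # urlencode percent-encodes reserved characters such as ":", "/", "?" and "&".
    return urlencode(normalized)


query = build_query({
    "api_key": "YOUR_API_KEY",  # placeholder, not a real key
    "url": "https://example.com/search?q=scraping&page=2",
    "json": True,
})
print(query)
```

Encoding the whole target URL this way keeps its own `?` and `&` characters from being mistaken for separators between API options.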

api_key

The API Key for authorization with the Scrapingpass API. For details on its usage, see the authorization section of each interface.

tip

Want to quickly try out our API? Get your API Key now and start exploring!

url

The URL that you want to scrape or request. This option is required and must be URL encoded. You don't need to specify it in Proxy Mode, where the request URL is used instead.

method

The HTTP method to use when making the request to the remote. For a POST or PUT request that needs to send data, include the data in your original request as usual; Scrapingpass forwards all data to the remote as received.

cookies

The HTTP cookies to include along with the request to remote. You must specify the cookies in format of name=value pairs separated by a semicolon. For example, cookie1Name=value;cookie2Name=value;.

You can specify cookies to scrape or take a screenshot of a page that requires you to be logged in. Scrapingpass will normally resume the session using the cookies specified.
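The cookie format described above can be produced with a small helper. This is a minimal sketch; the function name `format_cookies` is hypothetical, and the resulting string should still be URL encoded before it is sent as an option.

```python
def format_cookies(cookies: dict) -> str:
    """Serialize cookies as name=value pairs, each terminated by a semicolon."""
    return "".join(f"{name}={value};" for name, value in cookies.items())


cookie_string = format_cookies({"cookie1Name": "value", "cookie2Name": "value"})
print(cookie_string)  # cookie1Name=value;cookie2Name=value;
```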

tip

You can also pass your cookie string directly in the Cookie HTTP header with forward_headers set to true.

forward_headers

Determines whether to forward the request headers to the remote. In REST API mode, enabling this option forwards your request headers to the remote.

Headers prefixed with Sp- are always forwarded to the remote, regardless of the forward_headers option. For example, to pass a Referer header, send it with the key Sp-Referer; Scrapingpass strips the Sp- prefix and forwards the Referer header normally.
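The Sp- prefix convention can be applied on the client side before sending a REST API request. The sketch below assumes a hypothetical helper named `prefix_headers`; it only rewrites header names and does not send anything.

```python
def prefix_headers(headers: dict, prefix: str = "Sp-") -> dict:
    """Return a copy of headers in which every name carries the given prefix."""
    prefixed = {}
    for name, value in headers.items():
        # Leave names that already start with the prefix untouched.
        if name.lower().startswith(prefix.lower()):
            prefixed[name] = value
        else:
            prefixed[prefix + name] = value
    return prefixed


outgoing = prefix_headers({"Referer": "https://example.com/", "Accept-Language": "en-US"})
print(outgoing)
```

The returned dictionary can be passed as the headers of the API request; per the description above, the remote then receives `Referer` and `Accept-Language` without the prefix.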

This option is enabled by default in the HTTP Proxy Mode.

tip

You don't need to prefix headers in HTTP Proxy Mode. Scrapingpass will automatically forward all of the headers as received along with the request to remote.

transparent_mode

Determines whether the API response status code and headers mirror those of the remote response. If set to true, Scrapingpass returns the response headers and status code exactly as if you had made the request to the remote directly.

This option is enabled by default in the HTTP Proxy Mode.

json

Determines whether the API response should be wrapped in a JSON object. The wrapper combines the API response fields with the remote response content, status code, headers, etc. in a single JSON object.

Here is an example API response with json=true specified. The data field holds the actual response from the remote.

{
  "code": "success",
  "data": {
    "body": "<!doctype html>\n<html>\n<head>\n <title>Example Domain</title>\n\n <meta charset=\"utf-8\" />\n <meta http-equiv=\"Content-type\" content=\"text/html; charset=utf-8\" />\n <meta name=\"viewport\" content=\"width=device-width, initial-scale=1\" />\n <style type=\"text/css\">\n body {\n background-color: #f0f0f2;\n margin: 0;\n padding: 0;\n font-family: -apple-system, system-ui, BlinkMacSystemFont, \"Segoe UI\", \"Open Sans\", \"Helvetica Neue\", Helvetica, Arial, sans-serif;\n \n }\n div {\n width: 600px;\n margin: 5em auto;\n padding: 2em;\n background-color: #fdfdff;\n border-radius: 0.5em;\n box-shadow: 2px 3px 7px 2px rgba(0,0,0,0.02);\n }\n a:link, a:visited {\n color: #38488f;\n text-decoration: none;\n }\n @media (max-width: 700px) {\n div {\n margin: 0 auto;\n width: auto;\n }\n }\n </style> \n</head>\n\n<body>\n<div>\n <h1>Example Domain</h1>\n <p>This domain is for use in illustrative examples in documents. You may use this\n domain in literature without prior coordination or asking for permission.</p>\n <p><a href=\"https://www.iana.org/domains/example\">More information...</a></p>\n</div>\n</body>\n</html>\n",
    "encoding": "utf-8",
    "headers": {
      "Accept-Ranges": "bytes",
      "Age": "62459",
      "Cache-Control": "max-age=604800",
      "Content-Encoding": "gzip",
      "Content-Length": "648",
      "Content-Type": "text/html; charset=UTF-8",
      "Date": "Tue, 07 Dec 2021 09:24:56 GMT",
      "Etag": "\"3147526947\"",
      "Expires": "Tue, 14 Dec 2021 09:24:56 GMT",
      "Last-Modified": "Thu, 17 Oct 2019 07:18:26 GMT",
      "Server": "ECS (oxr/830E)",
      "Vary": "Accept-Encoding",
      "X-Cache": "HIT"
    },
    "status_code": 200
  },
  "message": null,
  "status_code": 200
}
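A client might unwrap this JSON response along the following lines. This is a sketch under assumptions: the `RemoteResponse` class and `unwrap` function are hypothetical names, and the field names are taken from the example response rather than from a formal schema.

```python
import json
from dataclasses import dataclass


@dataclass
class RemoteResponse:
    """Remote response fields taken from the "data" object of the wrapper."""
    status_code: int
    headers: dict
    body: str


def unwrap(raw: str) -> RemoteResponse:
    """Parse a json=true API response and return the wrapped remote response."""
    envelope = json.loads(raw)
    # The example response uses code "success"; treating anything else as a
    # failure is an assumption of this sketch.
    if envelope.get("code") != "success" or envelope.get("data") is None:
        raise ValueError(f"unexpected API response code: {envelope.get('code')!r}")
    data = envelope["data"]
    return RemoteResponse(data["status_code"], data["headers"], data["body"])


# Shortened version of the example response, used here as test input.
sample = json.dumps({
    "code": "success",
    "data": {
        "body": "<!doctype html><html><head><title>Example Domain</title></head></html>",
        "encoding": "utf-8",
        "headers": {"Content-Type": "text/html; charset=UTF-8"},
        "status_code": 200,
    },
    "message": None,
    "status_code": 200,
})

remote = unwrap(sample)
print(remote.status_code, remote.headers["Content-Type"])
```

Note that the outer `status_code` describes the API call, while `data.status_code` is the status returned by the remote.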

proxy_type

The type of proxy to use for remote requests. The available proxy types and the extra credits each one costs are listed below. Use whichever type suits your requirements.

| Proxy Type | Extra Credits Used |
| --- | --- |
| none | - |
| auto | - |
| data_center | 1 |
| residential | 3 |

proxy_url

If you want to use your own proxy for remote requests instead of the ones provided by Scrapingpass, specify its URL in this field. Note that it must be a valid and reachable proxy URL.

scrape_google

Indicates that the requested remote host belongs to Google. This option must be set to true for all remote Google hostnames.

device

The type of device to present to the remote. This is controlled by using different user agents for remote requests. The available devices are:

  • desktop
  • mobile

extraction_rules

If you want to extract data from a page without parsing the HTML yourself, specify a stringified JSON object of key-value pairs, where each key is an identifier of your choice and each value is the CSS selector of the element whose text should be extracted.

For example, to extract the title and paragraph text from https://example.com/, the following extraction rule does the job.

extraction_rules={"title":"body > div > h1","paragraph":"body > div > p:nth-child(2)"}

Example response in case of extraction rules specified:

{
  "title": "Example Domain",
  "paragraph": "This domain is for use in illustrative examples in documents. You may use this\n domain in literature without prior coordination or asking for permission."
}
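Because extraction_rules is a stringified JSON object and string options should be URL encoded, the value can be prepared in two steps: serialize, then percent-encode. The sketch below uses only the Python standard library; the helper name `encode_extraction_rules` is an assumption for this example.

```python
import json
from urllib.parse import quote, unquote


def encode_extraction_rules(rules: dict) -> str:
    """Stringify the rules as compact JSON and percent-encode the result."""
    compact = json.dumps(rules, separators=(",", ":"))
    # safe="" also encodes "/" so the value cannot be confused with a path.
    return quote(compact, safe="")


rules = {
    "title": "body > div > h1",
    "paragraph": "body > div > p:nth-child(2)",
}
encoded = encode_extraction_rules(rules)
print("extraction_rules=" + encoded)

# Decoding the value gives back the original rules.
assert json.loads(unquote(encoded)) == rules
```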

forward_user_agent

Determines whether to forward the User-Agent header, as received, with the remote request. This is often helpful when scraping sites that block certain user agents.

request

Determines whether to use a normal HTTP request for the remote rather than a headless browser. This option is enabled by default.

html

Determines whether to return the HTML body of the remote response. This option is enabled by default. It can also be specified in conjunction with js_rendering=true to get the rendered HTML of client-side rendered web applications.

js_rendering

Determines whether to use JS Rendering with headless browsers for remote requests rather than a normal HTTP request. This is suitable for modern client-side rendered web applications.

screenshot

This option renders a screenshot of the page using JS Rendering. If the js_rendering option is missing, it is automatically passed for you.

pdf

This option renders a PDF of the page using JS Rendering. If the js_rendering option is missing, it is automatically passed for you.

execute_script

Sometimes it's necessary to click on pop-ups or buttons while in JS Rendering to reveal the actual content. This option allows you to execute arbitrary JavaScript after the page has loaded.

js_scroll

Determines whether to take a screenshot of the full scrollable page in JS Rendering. This option can be used in conjunction with the pdf and screenshot options.

js_scroll_wait

The delay, in milliseconds, to enforce between scrolls in JS Rendering. This option can be used in conjunction with the pdf and screenshot options.

js_wait

Number of seconds to wait in JS Rendering after page load and before rendering the response. This can be useful for web applications that take a significant amount of time to render.

window_height

Height of the browser viewport in case of JS Rendering.

window_width

Width of the browser viewport in case of JS Rendering.

google_search

Determines whether to use the Google Search mode. This mode takes care of proxy rotation and reCAPTCHAs itself, so that you can focus purely on extracting the search results.

query

The search term for Google Search mode. This option is mandatory when google_search=true is specified.
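The dependency between google_search and query can be checked on the client before a request is sent. The function below is a hypothetical pre-flight check written for this example, not part of the API.

```python
def validate_google_search(options: dict) -> None:
    """Raise ValueError if Google Search mode is requested without a query."""
    # Accept either a Python bool or the string "true" for convenience.
    enabled = str(options.get("google_search", "false")).lower() == "true"
    if enabled and not options.get("query"):
        raise ValueError("query is required when google_search=true")


validate_google_search({"google_search": True, "query": "web scraping"})  # passes
```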

country_code

Along with using premium proxy types, you can specify a country code to make remote requests appear to originate from that country. The supported countries are listed in the following table.

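| Country Code | Country Name |
| --- | --- |
| af | Afghanistan |
| am | Armenia |
| az | Azerbaijan |
| bh | Bahrain |
| bd | Bangladesh |
| bt | Bhutan |
| bn | Brunei |
| kh | Cambodia |
| cn | China |
| ge | Georgia |
| hk | Hong Kong |
| in | India |
| id | Indonesia |
| il | Israel |
| jp | Japan |
| jo | Jordan |
| kz | Kazakhstan |
| kw | Kuwait |
| kg | Kyrgyzstan |
| la | Laos |
| mo | Macau |
| my | Malaysia |
| mv | Maldives |
| mn | Mongolia |
| mm | Myanmar |
| np | Nepal |
| om | Oman |
| pk | Pakistan |
| ps | Palestine |
| ph | Philippines |
| qa | Qatar |
| sa | Saudi Arabia |
| sg | Singapore |
| kr | South Korea |
| lk | Sri Lanka |
| tw | Taiwan |
| tj | Tajikistan |
| th | Thailand |
| tl | Timor-Leste |
| ae | United Arab Emirates |
| uz | Uzbekistan |
| vn | Vietnam |
| ye | Yemen |
| ax | Åland Islands |
| al | Albania |
| ad | Andorra |
| at | Austria |
| by | Belarus |
| be | Belgium |
| ba | Bosnia and Herzegovina |
| bg | Bulgaria |
| hr | Croatia |
| cy | Cyprus |
| cz | Czech Republic |
| dk | Denmark |
| ee | Estonia |
| fo | Faroe Islands |
| fi | Finland |
| fr | France |
| de | Germany |
| gi | Gibraltar |
| gr | Greece |
| gl | Greenland |
| gg | Guernsey |
| hu | Hungary |
| is | Iceland |
| ie | Ireland |
| im | Isle of Man |
| it | Italy |
| je | Jersey |
| lv | Latvia |
| li | Liechtenstein |
| lt | Lithuania |
| lu | Luxembourg |
| mk | Macedonia |
| mt | Malta |
| md | Moldova |
| mc | Monaco |
| me | Montenegro |
| nl | Netherlands |
| no | Norway |
| pl | Poland |
| pt | Portugal |
| ro | Romania |
| ru | Russia |
| sm | San Marino |
| rs | Serbia |
| sk | Slovakia |
| si | Slovenia |
| es | Spain |
| se | Sweden |
| ch | Switzerland |
| tr | Turkey |
| ua | Ukraine |
| gb | United Kingdom |
| dz | Algeria |
| ao | Angola |
| bj | Benin |
| bw | Botswana |
| bf | Burkina Faso |
| bi | Burundi |
| cv | Cabo Verde |
| cm | Cameroon |
| td | Chad |
| km | Comoros |
| cg | Congo |
| ci | Côte d'Ivoire |
| dj | Djibouti |
| eg | Egypt |
| gq | Equatorial Guinea |
| et | Ethiopia |
| ga | Gabon |
| gm | Gambia |
| gh | Ghana |
| gn | Guinea |
| gw | Guinea-Bissau |
| ke | Kenya |
| ls | Lesotho |
| lr | Liberia |
| ly | Libya |
| mg | Madagascar |
| mw | Malawi |
| ml | Mali |
| mr | Mauritania |
| mu | Mauritius |
| yt | Mayotte |
| ma | Morocco |
| mz | Mozambique |
| na | Namibia |
| ne | Niger |
| ng | Nigeria |
| rw | Rwanda |
| sn | Senegal |
| sc | Seychelles |
| sl | Sierra Leone |
| so | Somalia |
| za | South Africa |
| ss | South Sudan |
| sd | Sudan |
| sz | Swaziland |
| tz | Tanzania |
| tg | Togo |
| tn | Tunisia |
| ug | Uganda |
| zm | Zambia |
| zw | Zimbabwe |
| as | American Samoa |
| au | Australia |
| cx | Christmas Island |
| ck | Cook Islands |
| fj | Fiji |
| pf | French Polynesia |
| gu | Guam |
| ki | Kiribati |
| mh | Marshall Islands |
| fm | Micronesia |
| nr | Nauru |
| nc | New Caledonia |
| nz | New Zealand |
| mp | Northern Mariana Islands |
| pw | Palau |
| pg | Papua New Guinea |
| sb | Solomon Islands |
| vu | Vanuatu |
| ai | Anguilla |
| ag | Antigua and Barbuda |
| aw | Aruba |
| bs | Bahamas |
| bb | Barbados |
| bz | Belize |
| bm | Bermuda |
| ca | Canada |
| ky | Cayman Islands |
| cr | Costa Rica |
| cu | Cuba |
| cw | Curaçao |
| dm | Dominica |
| do | Dominican Republic |
| sv | El Salvador |
| gd | Grenada |
| gt | Guatemala |
| ht | Haiti |
| hn | Honduras |
| jm | Jamaica |
| mq | Martinique |
| mx | Mexico |
| ni | Nicaragua |
| pa | Panama |
| pr | Puerto Rico |
| bl | Saint Barthélemy |
| kn | Saint Kitts and Nevis |
| lc | Saint Lucia |
| mf | Saint Martin |
| vc | Saint Vincent and the Grenadines |
| tt | Trinidad and Tobago |
| tc | Turks and Caicos Islands |
| us | United States |
| vg | Virgin Islands, British |
| vi | Virgin Islands, United States |
| ar | Argentina |
| bo | Bolivia |
| br | Brazil |
| cl | Chile |
| co | Colombia |
| ec | Ecuador |
| gy | Guyana |
| py | Paraguay |
| pe | Peru |
| sr | Surinam |
| uy | Uruguay |
| ve | Venezuela |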

API Max Concurrency Limit

The Scrapingpass API enforces a max concurrency limit to prevent abuse. If a user exceeds this limit, the API responds with status code 429.

You can increase your max concurrency limit by changing your API plan or by contacting us directly.
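One common way for a client to cope with a 429 response is to retry with exponential backoff. The sketch below is not prescribed by the API; the function name, the retry count, and the delays are all assumptions. To stay independent of any HTTP library, it takes a `send` callable that performs one request and returns a `(status_code, body)` tuple.

```python
import time
from typing import Callable, Tuple


def call_with_backoff(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() until it returns a status other than 429 or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        # Wait 0.5s, 1s, 2s, ... before the next attempt, but not after the last one.
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return status, body


# Usage with a stand-in for a real request: two 429 responses, then success.
responses = iter([(429, ""), (429, ""), (200, "ok")])
delays = []
result = call_with_backoff(lambda: next(responses), sleep=delays.append)
print(result, delays)
```

Passing `sleep` as a parameter keeps the helper easy to test; in real use the default `time.sleep` applies.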

API Error Status Codes

The Scrapingpass API returns different status codes for different error cases. They are listed below.

| Status Code | Reason Phrase | Error Code | Description |
| --- | --- | --- | --- |
| 400 | Bad Request | bad_request | Something went wrong while parsing user request. |
| 400 | Bad Request | options_validation_failed | Something wrong with options or API parameters. |
| 400 | Bad Request | user_proxy_unreachable | Proxy specified by the user is not reachable. |
| 400 | Bad Request | bad_proxy_request | Something wrong with proxy options or parameters in case of HTTP Proxy Mode. |
| 401 | Unauthorized | user_not_found | Invalid or missing API Key. |
| 403 | Forbidden | user_not_enough_balance | Not enough balance in wallet. |
| 403 | Forbidden | user_authorization_token_revoked | A revoked API Key has been specified. |
| 403 | Forbidden | user_balance_expired | Balance has been expired. You might need to recharge again. |
| 407 | Proxy Authentication Required | missing_proxy_authorization | Missing proxy authorization in case of HTTP Proxy Mode. |
| 429 | Too Many Requests | user_concurrency_limit_exceeded | User has exceeded their max concurrency limit. |
| 500 | Internal Server Error | unexpected_exception | Something went wrong with Scrapingpass API servers. |

Example API Error Response

{
  "code": "user_not_found",
  "data": null,
  "error_data": {},
  "error_type": "SCRAPINGPASS_ERROR",
  "message": null,
  "status_code": 401
}
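A client could turn an error response like this into an exception. The sketch below is illustrative only: the `ScrapingpassError` class and `raise_for_error` function are hypothetical names, the field names follow the example response, and the rule that only 429 and 500 are worth retrying is an assumption rather than documented behaviour.

```python
import json


class ScrapingpassError(Exception):
    """Hypothetical exception carrying the fields of an API error response."""

    def __init__(self, status_code, code, message=None):
        super().__init__(f"{status_code} {code}" + (f": {message}" if message else ""))
        self.status_code = status_code
        self.code = code
        self.message = message

    @property
    def retryable(self) -> bool:
        # Assumption: only concurrency (429) and server-side (500) errors merit a retry.
        return self.status_code in (429, 500)


def raise_for_error(raw: str) -> dict:
    """Return the parsed JSON payload, or raise if its code is not "success"."""
    payload = json.loads(raw)
    if payload.get("code") != "success":
        raise ScrapingpassError(
            payload.get("status_code"), payload.get("code"), payload.get("message")
        )
    return payload


error_sample = json.dumps({
    "code": "user_not_found",
    "data": None,
    "error_data": {},
    "error_type": "SCRAPINGPASS_ERROR",
    "message": None,
    "status_code": 401,
})

try:
    raise_for_error(error_sample)
except ScrapingpassError as exc:
    print(exc, "retryable:", exc.retryable)
```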

API Credits Usage

Each request to the Scrapingpass API consumes credits, unless an internal server error or unexpected exception occurs within the API itself.

Extra credits add up based on the options specified. The exact extra credits are listed in the options reference for each option that costs extra.

If transparent mode is not enabled, then the API includes a Balance-Used header which contains the total credits used for that particular request.
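The Balance-Used header can be read from the response headers to keep a running total on the client. This is a minimal sketch: the helper name `credits_used` is hypothetical, and it assumes the header value is a plain integer, which the text above does not state explicitly.

```python
from typing import Mapping, Optional


def credits_used(headers: Mapping[str, str]) -> Optional[int]:
    """Return the credits reported in Balance-Used, or None if absent or not numeric."""
    for name, value in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == "balance-used":
            try:
                return int(value)
            except ValueError:
                return None
    return None


total = 0
for response_headers in ({"Balance-Used": "5"}, {"balance-used": "1"}, {"Content-Type": "text/html"}):
    total += credits_used(response_headers) or 0
print(total)  # 6
```

The header is missing when transparent mode is enabled, which is why the helper returns None instead of raising.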