API Reference

A brief general reference for the components of the Scrapingpass API. It covers concepts common to both the HTTP REST API and HTTP Proxy Mode; cases specific to either interface are noted where they apply.

API Interfaces

Each interface is documented in its own section. Refer to those sections to get started quickly with either interface.

Options Reference

The following options are supported by the API for customizing the scraping process, rendering, behaviour, etc. Note that a few of these have different default values in HTTP Proxy Mode, though those defaults can be overridden as well.

Options without a default value are mandatory and must be specified. Options with a listed credit cost add that many extra credits to the request. Options without a listed credit cost are free to specify and incur no extra charges.

Identifier	Type	Description	HTTP REST API Default	HTTP Proxy Mode Default	Credits
`api_key`	String	Scrapingpass API key required for authorization.	-	-	-
`url`	String	URL of the remote resource to scrape.	-	Same as request URL	-
`method`	String	HTTP method to use for the remote request.	Same as request method	Same as request method	-
`cookies`	String	Cookies to include with the remote request, in the format `cookie1Name=value;cookie2Name=value;`.	null	null	-
`forward_headers`	Boolean	Whether to forward request headers to the remote.	false	true	-
`transparent_mode`	Boolean	Whether the API response headers, status code, etc. mirror the remote response.	false	true	-
`json`	Boolean	Whether to return a JSON response wrapping the remote response's body, headers and status code.	false	false	-
`proxy_type`	String	Type of proxy to use for the remote request.	none	auto	-
`proxy_url`	String	Your own proxy URL to use instead of the proxies provided by Scrapingpass.	none	none	-
`scrape_google`	Boolean	Must be set to true when the remote host is Google.	false	false	10
`device`	String	Type of device to emulate for the remote request.	desktop	desktop	-
`extraction_rules`	String	Stringified JSON object mapping field names to CSS selectors.	none	none	-
`forward_user_agent`	Boolean	Whether to forward the User-Agent header as received.	false	false	-
`request`	Boolean	Use a normal HTTP request for the remote.	true	true	-
`html`	Boolean	Return the HTML of the remote response.	true	true	1
`js_rendering`	Boolean	Use JS Rendering with headless browsers instead of normal requests.	false	false	4
`screenshot`	Boolean	Use JS Rendering to render a screenshot of the page.	false	false	2
`pdf`	Boolean	Use JS Rendering to render a PDF of the page.	false	false	3
`execute_script`	String	Arbitrary JavaScript to execute in JS Rendering after page load.	none	none	-
`js_scroll`	Boolean	Whether to capture the full scrollable page in JS Rendering.	false	false	-
`js_scroll_wait`	Number	Delay between scrolls in JS Rendering, in milliseconds.	100	100	-
`js_wait`	Number	Seconds to wait in JS Rendering after page load, before rendering the response.	1	1	-
`window_height`	Number	Height of the browser viewport in JS Rendering.	1080	1080	-
`window_width`	Number	Width of the browser viewport in JS Rendering.	1920	1920	-
`google_search`	Boolean	Use Google Search mode.	false	false	10
`query`	String	Search term for Google Search mode.	-	-	-
`country_code`	String	Country to make remote requests appear to originate from.	none	none	4

Note: It is recommended to URL encode all string options before passing them to the API.

`api_key`

The API key used to authorize requests to the Scrapingpass API. For details on how to pass it, see the authorization section of each interface.

Tip: Want to quickly try out our API? Get your API key now and start exploring!

`url`

The URL you want to scrape or request. This option is required and must be URL encoded. In HTTP Proxy Mode you don't need to specify it, since the request URL itself is used.
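The sketch below shows one way to build a REST API request URL in Python with every option URL encoded. It assumes that all options, including api_key, are passed as query-string parameters, and API_ENDPOINT is a hypothetical placeholder; take the real endpoint from the HTTP REST API section.

```python
from urllib.parse import urlencode

# Hypothetical placeholder; use the endpoint from the HTTP REST API section.
API_ENDPOINT = "https://api.scrapingpass.example/"


def build_request_url(api_key: str, url: str, **options) -> str:
    """Build a REST API request URL with every option URL encoded.

    Assumes all options are sent as query-string parameters.
    """
    params = {"api_key": api_key, "url": url}
    for name, value in options.items():
        # Send Python booleans as lowercase "true"/"false", as written in this reference.
        if isinstance(value, bool):
            value = "true" if value else "false"
        params[name] = value
    # urlencode() percent-encodes every key and value, including ':', '/', '?', '&' and '='.
    return API_ENDPOINT + "?" + urlencode(params)


request_url = build_request_url(
    "YOUR_API_KEY", "https://example.com/?a=1&b=2", js_rendering=True
)
print(request_url)
```

Encoding the target URL matters here: left unencoded, the `&b=2` inside it would be read as a separate API option instead of part of url.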

`method`

The HTTP method to use for the remote request; it defaults to the method of your own request. For POST or PUT requests that carry data, include the data in your original request as usual, and Scrapingpass forwards it to the remote as received.
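A minimal sketch of a POST request through the REST API, using only Python's standard library. It assumes options are sent as query-string parameters, API_ENDPOINT is a hypothetical placeholder, and the method option is omitted because it defaults to the method of your own request. Which of your headers reach the remote still depends on forward_headers. The request is built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical placeholder; use the endpoint from the HTTP REST API section.
API_ENDPOINT = "https://api.scrapingpass.example/"

# Assumes options are sent as query-string parameters.
query = urlencode({"api_key": "YOUR_API_KEY", "url": "https://httpbin.org/post"})

# Form-encoded body that Scrapingpass should forward to the remote as received.
payload = urlencode({"name": "value"}).encode("utf-8")

req = Request(API_ENDPOINT + "?" + query, data=payload, method="POST")

# urllib.request.urlopen(req) would send it; here it is only built.
print(req.get_method(), req.data)
```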

`cookies`

The HTTP cookies to include with the remote request, given as name=value pairs separated by semicolons, for example cookie1Name=value;cookie2Name=value;.

Cookies are useful for scraping, or taking a screenshot of, a page that requires you to be logged in. Scrapingpass will normally resume the session identified by the cookies you specify.
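The cookie format is straightforward to produce from a dictionary. The helper below is a hypothetical sketch; URL encode its result before passing it to the API, since it contains `=` and `;`.

```python
def format_cookies(cookies: dict[str, str]) -> str:
    """Serialize cookies as name=value pairs, each terminated by a semicolon."""
    return "".join(f"{name}={value};" for name, value in cookies.items())


cookie_string = format_cookies({"session_id": "abc123", "theme": "dark"})
print(cookie_string)  # session_id=abc123;theme=dark;
```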

Tip: You can also send your cookie string directly in the Cookie HTTP header, with forward_headers set to true.

`forward_headers`

Determines whether the headers of your request are forwarded to the remote. In the HTTP REST API, this option is disabled by default; when enabled, your request headers are forwarded to the remote.

Headers prefixed with Sp- are always forwarded to the remote, regardless of the forward_headers option. For example, to pass a Referer header you can send it under the key Sp-Referer; Scrapingpass strips the Sp- prefix and forwards it as a normal Referer header.
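Adding the prefix is easy to automate. A small hypothetical helper for the REST API; pass its result as the headers of your request to Scrapingpass.

```python
def prefix_headers(headers: dict[str, str]) -> dict[str, str]:
    """Prefix each header name with 'Sp-' so Scrapingpass forwards it
    even when forward_headers is false."""
    return {f"Sp-{name}": value for name, value in headers.items()}


sp_headers = prefix_headers({"Referer": "https://www.google.com/", "Accept-Language": "en-US"})
print(sp_headers)
```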

This option is enabled by default in the HTTP Proxy Mode.

Tip: You don't need to prefix headers in HTTP Proxy Mode. Scrapingpass automatically forwards all headers, as received, along with the request to the remote.

`transparent_mode`

Determines whether the API response status code and headers mirror those of the remote response. When set to true, Scrapingpass returns the response headers and status code exactly as if you had made the request to the remote directly.

This option is enabled by default in the HTTP Proxy Mode.

`json`

Determines whether the API response is wrapped in a JSON object. When enabled, the remote response body, status code, headers, etc. are returned together within a single JSON object.

Here is an example API response with json=true specified. The data field holds the actual response from the remote.

{
    "code": "success",
    "data": {
        "body": "<!doctype html>\n<html>\n<head>\n    <title>Example Domain</title>\n\n    <meta charset=\"utf-8\" />\n    <meta http-equiv=\"Content-type\" content=\"text/html; charset=utf-8\" />\n    <meta name=\"viewport\" content=\"width=device-width, initial-scale=1\" />\n    <style type=\"text/css\">\n    body {\n        background-color: #f0f0f2;\n        margin: 0;\n        padding: 0;\n        font-family: -apple-system, system-ui, BlinkMacSystemFont, \"Segoe UI\", \"Open Sans\", \"Helvetica Neue\", Helvetica, Arial, sans-serif;\n        \n    }\n    div {\n        width: 600px;\n        margin: 5em auto;\n        padding: 2em;\n        background-color: #fdfdff;\n        border-radius: 0.5em;\n        box-shadow: 2px 3px 7px 2px rgba(0,0,0,0.02);\n    }\n    a:link, a:visited {\n        color: #38488f;\n        text-decoration: none;\n    }\n    @media (max-width: 700px) {\n        div {\n            margin: 0 auto;\n            width: auto;\n        }\n    }\n    </style>    \n</head>\n\n<body>\n<div>\n    <h1>Example Domain</h1>\n    <p>This domain is for use in illustrative examples in documents. You may use this\n    domain in literature without prior coordination or asking for permission.</p>\n    <p><a href=\"https://www.iana.org/domains/example\">More information...</a></p>\n</div>\n</body>\n</html>\n",
        "encoding": "utf-8",
        "headers": {
            "Accept-Ranges": "bytes",
            "Age": "62459",
            "Cache-Control": "max-age=604800",
            "Content-Encoding": "gzip",
            "Content-Length": "648",
            "Content-Type": "text/html; charset=UTF-8",
            "Date": "Tue, 07 Dec 2021 09:24:56 GMT",
            "Etag": "\"3147526947\"",
            "Expires": "Tue, 14 Dec 2021 09:24:56 GMT",
            "Last-Modified": "Thu, 17 Oct 2019 07:18:26 GMT",
            "Server": "ECS (oxr/830E)",
            "Vary": "Accept-Encoding",
            "X-Cache": "HIT"
        },
        "status_code": 200
    },
    "message": null,
    "status_code": 200
}
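With json=true, unpacking the envelope takes a few lines of Python. This sketch is hypothetical: it assumes any code other than "success" signals an error (as in the example error response further down this page), and it uses an abbreviated copy of the response above.

```python
import json


def unwrap_response(raw: str) -> tuple[int, dict[str, str], str]:
    """Return the remote status code, headers and body from a json=true response."""
    envelope = json.loads(raw)
    # Assumption: anything other than "success" is an error response.
    if envelope["code"] != "success":
        raise RuntimeError(f"Scrapingpass error: {envelope['code']}")
    data = envelope["data"]
    return data["status_code"], data["headers"], data["body"]


# Abbreviated version of the example response above.
sample = """{
    "code": "success",
    "data": {
        "body": "<!doctype html><html>...</html>",
        "encoding": "utf-8",
        "headers": {"Content-Type": "text/html; charset=UTF-8"},
        "status_code": 200
    },
    "message": null,
    "status_code": 200
}"""

status, headers, body = unwrap_response(sample)
print(status, headers["Content-Type"])  # 200 text/html; charset=UTF-8
```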

`proxy_type`

The type of proxy to use for remote requests. The available proxy types and the extra credits each one uses are listed below; choose whichever suits your requirements.

Proxy Type	Extra Credits Used
`none`	-
`auto`	-
`data_center`	1
`residential`	3

`proxy_url`

If you want to use your own proxy for remote requests instead of the ones provided by Scrapingpass, specify its URL in this field. It must be a valid and reachable proxy URL; otherwise the API returns the user_proxy_unreachable error.

`scrape_google`

Indicates that the remote host belongs to Google. This option must be set to true for all Google hostnames.

`device`

The type of device to emulate for the remote. This is done by sending a different user agent with remote requests. The available devices are:

`desktop`
`mobile`

`extraction_rules`

If you want to extract data from a page without parsing the HTML yourself, specify a stringified JSON object of key-value pairs: each key is the name of a field in the result, and each value is the CSS selector of the element whose text should be extracted.

For example, to extract the title and paragraph text from https://example.com/, the following extraction rule does the job.

extraction_rules={"title":"body > div > h1","paragraph":"body > div > p:nth-child(2)"}

Example response in case of extraction rules specified:

{
    "title": "Example Domain",
    "paragraph": "This domain is for use in illustrative examples in documents. You may use this\n    domain in literature without prior coordination or asking for permission."
}
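Building the rule string by hand is error-prone, so a short Python sketch may help: it stringifies the rules from the example above with compact separators, then URL encodes the result for the query string.

```python
import json
from urllib.parse import urlencode

rules = {
    "title": "body > div > h1",
    "paragraph": "body > div > p:nth-child(2)",
}

# Stringify compactly, matching the extraction_rules example above.
extraction_rules = json.dumps(rules, separators=(",", ":"))

# URL encode before sending, since the value contains quotes, spaces and '>'.
query = urlencode({"extraction_rules": extraction_rules})
print(extraction_rules)
print(query)
```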

`forward_user_agent`

Determines whether the User-Agent header of your request is forwarded, as received, with the remote request. This is often helpful when scraping sites that block certain user agents.

`request`

Determines whether to use a normal HTTP request for the remote rather than a headless browser. This option is enabled by default.

`html`

Determines whether to return the HTML body of the remote response. This option is enabled by default. It can also be combined with js_rendering=true to get the rendered HTML of client-side rendered web applications.

`js_rendering`

Determines whether to render the page with a headless browser (JS Rendering) instead of making a normal HTTP request. This is suitable for modern client-side rendered web applications.

`screenshot`

This option renders a screenshot of the page using JS Rendering. If the js_rendering option is not specified, it is enabled automatically.

`pdf`

This option renders a PDF of the page using JS Rendering. If the js_rendering option is not specified, it is enabled automatically.

`execute_script`

Sometimes you need to dismiss pop-ups or click buttons during JS Rendering to reveal the actual content. This option lets you execute arbitrary JavaScript after the page has loaded.
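Scripts are full of characters such as quotes, `#` and `?`, so URL encoding them matters. A minimal Python sketch, where the `#accept-cookies` selector is a hypothetical example of a consent button on the target page:

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical selector for a cookie-consent button on the target page.
script = "document.querySelector('#accept-cookies')?.click();"

query = urlencode({"js_rendering": "true", "execute_script": script})
print(query)

# Decoding the query string gives back the original script unchanged.
decoded = parse_qs(query)["execute_script"][0]
print(decoded == script)  # True
```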

`js_scroll`

Determines whether to capture the full scrollable page in JS Rendering. This option can be used together with the pdf and screenshot options.

`js_scroll_wait`

The delay between scrolls in JS Rendering, in milliseconds. This option can be used together with the pdf and screenshot options.

`js_wait`

The number of seconds to wait in JS Rendering after the page loads, before the response is rendered. This is useful for web applications that take a significant amount of time to render.

`window_height`

Height of the browser viewport in case of JS Rendering.

`window_width`

Width of the browser viewport in case of JS Rendering.
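As an illustration, the JS Rendering options above can be combined to capture a full-page PDF of a slow, client-side rendered app at a phone-sized viewport. The values are arbitrary examples, and the resulting query string still needs api_key and url added.

```python
from urllib.parse import urlencode

options = {
    "js_rendering": "true",  # implied by pdf, set explicitly for clarity
    "pdf": "true",
    "js_scroll": "true",     # capture the full scrollable page
    "js_scroll_wait": 250,   # milliseconds between scrolls (default 100)
    "js_wait": 3,            # seconds to wait after page load (default 1)
    "window_width": 390,     # viewport width (default 1920)
    "window_height": 844,    # viewport height (default 1080)
}

# urlencode() converts the integers to strings and keeps the dict's order.
query = urlencode(options)
print(query)
```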

`google_search`

Determines whether to use Google Search mode. This mode handles proxy rotation and reCAPTCHAs itself, so you can focus purely on extracting the search results.

`query`

The search term for Google Search mode. This option is required when google_search=true.
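A hypothetical Python helper that enforces this rule before building the Google Search options. It produces only the search-related part of the query string; api_key and any other options you need must still be added.

```python
from urllib.parse import urlencode


def google_search_query(query: str) -> str:
    """Build the Google Search mode part of a query string (hypothetical helper)."""
    if not query.strip():
        # query is mandatory whenever google_search=true.
        raise ValueError("query is required when google_search=true")
    return urlencode({"google_search": "true", "query": query})


print(google_search_query("web scraping api"))  # google_search=true&query=web+scraping+api
```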

`country_code`

Along with the premium proxy types, you can specify a country code to make remote requests appear to originate from that country. The supported countries are listed in the following table.

Country Code	Country Name
`af`	Afghanistan
`am`	Armenia
`az`	Azerbaijan
`bh`	Bahrain
`bd`	Bangladesh
`bt`	Bhutan
`bn`	Brunei
`kh`	Cambodia
`cn`	China
`ge`	Georgia
`hk`	Hong Kong
`in`	India
`id`	Indonesia
`il`	Israel
`jp`	Japan
`jo`	Jordan
`kz`	Kazakhstan
`kw`	Kuwait
`kg`	Kyrgyzstan
`la`	Laos
`mo`	Macau
`my`	Malaysia
`mv`	Maldives
`mn`	Mongolia
`mm`	Myanmar
`np`	Nepal
`om`	Oman
`pk`	Pakistan
`ps`	Palestine
`ph`	Philippines
`qa`	Qatar
`sa`	Saudi Arabia
`sg`	Singapore
`kr`	South Korea
`lk`	Sri Lanka
`tw`	Taiwan
`tj`	Tajikistan
`th`	Thailand
`tl`	Timor-Leste
`ae`	United Arab Emirates
`uz`	Uzbekistan
`vn`	Vietnam
`ye`	Yemen
`ax`	Åland Islands
`al`	Albania
`ad`	Andorra
`at`	Austria
`by`	Belarus
`be`	Belgium
`ba`	Bosnia and Herzegovina
`bg`	Bulgaria
`hr`	Croatia
`cy`	Cyprus
`cz`	Czech Republic
`dk`	Denmark
`ee`	Estonia
`fo`	Faroe Islands
`fi`	Finland
`fr`	France
`de`	Germany
`gi`	Gibraltar
`gr`	Greece
`gl`	Greenland
`gg`	Guernsey
`hu`	Hungary
`is`	Iceland
`ie`	Ireland
`im`	Isle of Man
`it`	Italy
`je`	Jersey
`lv`	Latvia
`li`	Liechtenstein
`lt`	Lithuania
`lu`	Luxembourg
`mk`	Macedonia
`mt`	Malta
`md`	Moldova
`mc`	Monaco
`me`	Montenegro
`nl`	Netherlands
`no`	Norway
`pl`	Poland
`pt`	Portugal
`ro`	Romania
`ru`	Russia
`sm`	San Marino
`rs`	Serbia
`sk`	Slovakia
`si`	Slovenia
`es`	Spain
`se`	Sweden
`ch`	Switzerland
`tr`	Turkey
`ua`	Ukraine
`gb`	United Kingdom
`dz`	Algeria
`ao`	Angola
`bj`	Benin
`bw`	Botswana
`bf`	Burkina Faso
`bi`	Burundi
`cv`	Cabo Verde
`cm`	Cameroon
`td`	Chad
`km`	Comoros
`cg`	Congo
`ci`	Côte d'Ivoire
`dj`	Djibouti
`eg`	Egypt
`gq`	Equatorial Guinea
`et`	Ethiopia
`ga`	Gabon
`gm`	Gambia
`gh`	Ghana
`gn`	Guinea
`gw`	Guinea-Bissau
`ke`	Kenya
`ls`	Lesotho
`lr`	Liberia
`ly`	Libya
`mg`	Madagascar
`mw`	Malawi
`ml`	Mali
`mr`	Mauritania
`mu`	Mauritius
`yt`	Mayotte
`ma`	Morocco
`mz`	Mozambique
`na`	Namibia
`ne`	Niger
`ng`	Nigeria
`rw`	Rwanda
`sn`	Senegal
`sc`	Seychelles
`sl`	Sierra Leone
`so`	Somalia
`za`	South Africa
`ss`	South Sudan
`sd`	Sudan
`sz`	Swaziland
`tz`	Tanzania
`tg`	Togo
`tn`	Tunisia
`ug`	Uganda
`zm`	Zambia
`zw`	Zimbabwe
`as`	American Samoa
`au`	Australia
`cx`	Christmas Island
`ck`	Cook Islands
`fj`	Fiji
`pf`	French Polynesia
`gu`	Guam
`ki`	Kiribati
`mh`	Marshall Islands
`fm`	Micronesia
`nr`	Nauru
`nc`	New Caledonia
`nz`	New Zealand
`mp`	Northern Mariana Islands
`pw`	Palau
`pg`	Papua New Guinea
`sb`	Solomon Islands
`vu`	Vanuatu
`ai`	Anguilla
`ag`	Antigua and Barbuda
`aw`	Aruba
`bs`	Bahamas
`bb`	Barbados
`bz`	Belize
`bm`	Bermuda
`ca`	Canada
`ky`	Cayman Islands
`cr`	Costa Rica
`cu`	Cuba
`cw`	Curaçao
`dm`	Dominica
`do`	Dominican Republic
`sv`	El Salvador
`gd`	Grenada
`gt`	Guatemala
`ht`	Haiti
`hn`	Honduras
`jm`	Jamaica
`mq`	Martinique
`mx`	Mexico
`ni`	Nicaragua
`pa`	Panama
`pr`	Puerto Rico
`bl`	Saint Barthélemy
`kn`	Saint Kitts and Nevis
`lc`	Saint Lucia
`mf`	Saint Martin
`vc`	Saint Vincent and the Grenadines
`tt`	Trinidad and Tobago
`tc`	Turks and Caicos Islands
`us`	United States
`vg`	Virgin Islands, British
`vi`	Virgin Islands, United States
`ar`	Argentina
`bo`	Bolivia
`br`	Brazil
`cl`	Chile
`co`	Colombia
`ec`	Ecuador
`gy`	Guyana
`py`	Paraguay
`pe`	Peru
`sr`	Surinam
`uy`	Uruguay
`ve`	Venezuela

API Max Concurrency Limit

The Scrapingpass API enforces a maximum concurrency limit to prevent abuse. If you exceed it, the API responds with status code 429 and the error code user_concurrency_limit_exceeded.

You can increase your maximum concurrency limit by changing your API plan or by contacting us directly.
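A client that hits the limit can back off and retry. The Python sketch below retries while a request returns 429, doubling the delay each time; `send` is a hypothetical callable that performs one API request and returns its HTTP status code, and the attempt count and delays are arbitrary choices, not values recommended by Scrapingpass. Capping your own parallelism at your plan's limit (for example through the `max_workers` of a `concurrent.futures.ThreadPoolExecutor`) avoids most 429 responses in the first place.

```python
import time
from typing import Callable


def with_retry(
    send: Callable[[], int],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() until it returns a status other than 429 or attempts run out.

    The delay doubles after each 429: base_delay, 2 * base_delay, 4 * base_delay, ...
    """
    status = send()
    for attempt in range(max_attempts - 1):
        if status != 429:
            break
        sleep(base_delay * 2 ** attempt)
        status = send()
    return status


# Simulated sender: two 429 responses followed by a success.
responses = iter([429, 429, 200])
delays: list[float] = []
status = with_retry(lambda: next(responses), sleep=delays.append)
print(status, delays)  # 200 [1.0, 2.0]
```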

API Error Status Codes

The Scrapingpass API returns the following status codes for different error cases.

Status Code	Reason Phrase	Error Code	Description
400	Bad Request	`bad_request`	Something went wrong while parsing the user's request.
400	Bad Request	`options_validation_failed`	Something is wrong with the options or API parameters.
400	Bad Request	`user_proxy_unreachable`	The proxy specified by the user is not reachable.
400	Bad Request	`bad_proxy_request`	Something is wrong with the proxy options or parameters in HTTP Proxy Mode.
401	Unauthorized	`user_not_found`	Invalid or missing API key.
403	Forbidden	`user_not_enough_balance`	Not enough balance in the wallet.
403	Forbidden	`user_authorization_token_revoked`	A revoked API key has been specified.
403	Forbidden	`user_balance_expired`	The balance has expired. You may need to recharge.
407	Proxy Authentication Required	`missing_proxy_authorization`	Missing proxy authorization in case of HTTP Proxy Mode.
429	Too Many Requests	`user_concurrency_limit_exceeded`	User has exceeded their max concurrency limit.
500	Internal Server Error	`unexpected_exception`	Something went wrong with Scrapingpass API servers.

Example API Error Response

{
    "code": "user_not_found",
    "data": null,
    "error_data": {},
    "error_type": "SCRAPINGPASS_ERROR",
    "message": null,
    "status_code": 401
}
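Assuming an error body in the shape shown above, a Python client can turn it into an exception. In this sketch ScrapingpassError and raise_for_api_error are hypothetical names, and treating user_concurrency_limit_exceeded and unexpected_exception as retryable is an assumption rather than documented behaviour.

```python
import json

# Treating these error codes as transient is an assumption, not documented behaviour.
RETRYABLE_CODES = {"user_concurrency_limit_exceeded", "unexpected_exception"}


class ScrapingpassError(Exception):
    """Hypothetical exception carrying the API status code and error code."""

    def __init__(self, status_code: int, code: str) -> None:
        super().__init__(f"{status_code} {code}")
        self.status_code = status_code
        self.code = code
        self.retryable = code in RETRYABLE_CODES


def raise_for_api_error(raw: str) -> None:
    """Raise ScrapingpassError if the body is a Scrapingpass error response."""
    body = json.loads(raw)
    if body.get("error_type") == "SCRAPINGPASS_ERROR":
        raise ScrapingpassError(body["status_code"], body["code"])


sample = (
    '{"code": "user_not_found", "data": null, "error_data": {}, '
    '"error_type": "SCRAPINGPASS_ERROR", "message": null, "status_code": 401}'
)
try:
    raise_for_api_error(sample)
    caught = None
except ScrapingpassError as err:
    caught = err
print(caught, caught.retryable)  # 401 user_not_found False
```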

API Credits Usage

Every request to the Scrapingpass API consumes credits, unless it fails because of an internal server error or an unexpected exception in the API itself.

Extra credits are added based on the options specified. The exact extra credits for each option that costs extra are listed in the Options Reference.

When transparent mode is not enabled, the API includes a Balance-Used header containing the total credits used by that request.
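Reading the header needs only the standard library. The sketch below uses `http.client.HTTPMessage`, the type of `response.headers` for responses returned by `urllib.request.urlopen`, filled with a simulated header; credits_used is a hypothetical helper, and parsing the value as an integer assumes whole-number credit values.

```python
from http.client import HTTPMessage
from typing import Optional


def credits_used(headers: HTTPMessage) -> Optional[int]:
    """Return the Balance-Used value, or None if absent (e.g. transparent_mode=true)."""
    value = headers.get("Balance-Used")  # header lookup is case-insensitive
    # Assumption: the header holds a whole number of credits.
    return int(value) if value is not None else None


# Simulated response headers; a real urllib response exposes them as response.headers.
headers = HTTPMessage()
headers["Balance-Used"] = "5"
print(credits_used(headers))  # 5
```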
