Developer Tools · 8 min read · By Mehadi Shawon · Published 2026-06-12 · Updated 2026-07-28

How to Scrape a Website Without Getting Blocked (2026 Guide)

Practical 2026 guide to web scraping without getting blocked: rotate IPs, mimic real browsers, respect robots.txt, and use the right tools and headers.


Quick answer

To scrape a website without getting blocked, rotate your IP with residential proxies, keep each IP to no more than 1–2 requests per second (slower is safer), send a realistic User-Agent and full browser headers, respect robots.txt, cache responses, and use a headless browser like Playwright for JavaScript-heavy sites.

Last Updated: June 2026 · Written by DigiMetrics Hub Team · 8 min read

If your scraper worked for ten minutes and then started getting 403s, CAPTCHAs, or empty pages, you are not alone. Modern websites use Cloudflare, Akamai, and DataDome to spot bots in milliseconds. The good news: you can scrape almost any public site reliably if you behave like a real visitor. Here is the 2026 playbook.

Why Websites Block Scrapers in the First Place

Sites block bots to protect their servers, stop competitors from scraping prices, defend against credential stuffing, and enforce their terms of service. A block does not mean the data is illegal to access; it means your traffic pattern looked suspicious. Typical red flags include:

Too many requests in a short window from one IP.
Missing or default User-Agent (python-requests/2.x is a giveaway).
No Referer, no cookies, no Accept-Language header.
Same request order on every page — humans click around randomly.
Headless browser fingerprints like navigator.webdriver = true.
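To see how obvious a default User-Agent is, this short Python sketch uses the standard-library `urllib.request` as a stand-in for `requests` (which announces itself as `python-requests/2.x` in the same way) and prints the header an unmodified client sends:

```python
import urllib.request

# build_opener() returns an OpenerDirector whose default headers
# identify the client by library name and Python version.
opener = urllib.request.build_opener()
default_headers = dict(opener.addheaders)

# Prints "Python-urllib/" followed by the interpreter's major.minor version,
# a string any bot-detection layer can match trivially.
print(default_headers["User-agent"])
```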

Step 1 — Throttle Your Requests

Speed is the single biggest reason scrapers get blocked. A real user reads a page for at least a few seconds before clicking. Add a randomized delay between 1 and 4 seconds. For large jobs, schedule them across hours instead of slamming the site in five minutes.
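A minimal Python sketch of that randomized delay, assuming your scraper walks a plain list of URLs. The function name `throttled` and its parameters are illustrative, not from any library; the `sleep` argument is injectable only so the pacing can be tested without actually waiting.

```python
import random
import time
from typing import Callable, Iterable, Iterator


def throttled(
    urls: Iterable[str],
    min_delay: float = 1.0,
    max_delay: float = 4.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[str]:
    """Yield URLs one at a time, pausing a random min_delay..max_delay
    seconds before every URL except the first."""
    for index, url in enumerate(urls):
        if index > 0:
            # Uniform jitter so the gaps between requests are never identical.
            sleep(random.uniform(min_delay, max_delay))
        yield url
```

Wrap your own fetch loop with it, e.g. `for url in throttled(urls): ...`. With the default window the average pause is 2.5 seconds, so 10,000 pages from one IP take roughly 25,000 seconds, close to seven hours before download time is even counted. That is why large jobs belong on a schedule rather than in a single burst.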


Step 2 — Send Realistic Headers

Browsers send a rich set of headers on every request. Match them. At minimum, include User-Agent, Accept, Accept-Language, Accept-Encoding, and a plausible Referer that matches the navigation path.

Open the target site in Chrome DevTools → Network tab.
Right-click the page's request and choose Copy → Copy as cURL.
Paste those exact headers into your scraper.
Rotate User-Agent strings every few requests.
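One way to keep those headers in one place and rotate the User-Agent every few requests, sketched in Python with the standard library. The User-Agent strings, the `Accept`/`Accept-Language` values, and the five-request rotation interval are placeholder assumptions; copy current values from your own DevTools session. One caveat when pasting headers verbatim: current Chrome builds advertise `br` (and often `zstd`) in `Accept-Encoding`, and your client must be able to decode whatever it advertises, so this sketch asks only for `gzip`, which it decompresses itself.

```python
import gzip
import itertools
import urllib.request
from typing import Dict, List, Optional

# Placeholder desktop User-Agent strings; refresh them from a real browser,
# since outdated version numbers stand out.
USER_AGENTS: List[str] = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.4 Safari/605.1.15",
]


class HeaderRotator:
    """Builds browser-like headers and switches User-Agent every N requests."""

    def __init__(self, user_agents: List[str], rotate_every: int = 5) -> None:
        if not user_agents or rotate_every < 1:
            raise ValueError("need at least one User-Agent and rotate_every >= 1")
        self._agents = itertools.cycle(user_agents)
        self._rotate_every = rotate_every
        self._count = 0
        self._current = next(self._agents)

    def headers(self, referer: Optional[str] = None) -> Dict[str, str]:
        # Move to the next User-Agent once the current one has served N requests.
        if self._count and self._count % self._rotate_every == 0:
            self._current = next(self._agents)
        self._count += 1
        headers = {
            "User-Agent": self._current,
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
            "Accept-Language": "en-US,en;q=0.9",
            # Only advertise an encoding this sketch can decode (see fetch).
            "Accept-Encoding": "gzip",
        }
        if referer:
            headers["Referer"] = referer
        return headers


def fetch(url: str, rotator: HeaderRotator, referer: Optional[str] = None) -> str:
    request = urllib.request.Request(url, headers=rotator.headers(referer))
    with urllib.request.urlopen(request, timeout=30) as response:
        body = response.read()
        # urllib does not decompress response bodies automatically.
        if response.headers.get("Content-Encoding") == "gzip":
            body = gzip.decompress(body)
        charset = response.headers.get_content_charset() or "utf-8"
        return body.decode(charset, errors="replace")
```

Passing the previous page's URL as `referer` keeps the Referer consistent with a plausible click path.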

Step 3 — Rotate IPs With Proxies

One IP making thousands of requests is the clearest bot signal there is. Use a proxy pool so each request (or each session) comes from a different address.

Datacenter proxies — cheapest, blocked the fastest.
Residential proxies — real consumer IPs, hardest to detect.
Mobile proxies — best for Instagram, TikTok, and other mobile-first apps.
Tor — free, but slow and increasingly fingerprinted.
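A round-robin pool is the simplest rotation strategy. This Python sketch wraps `urllib.request.ProxyHandler` and builds one opener per proxy; the proxy URLs and credentials are hypothetical placeholders for whatever your provider issues.

```python
import itertools
import urllib.request
from typing import List, Tuple

# Hypothetical endpoints: substitute the URLs your proxy provider gives you.
PROXIES: List[str] = [
    "http://user:pass@proxy1.example.net:8000",
    "http://user:pass@proxy2.example.net:8000",
    "http://user:pass@proxy3.example.net:8000",
]


class ProxyPool:
    """Round-robin over a list of proxy URLs, one urllib opener per proxy."""

    def __init__(self, proxies: List[str]) -> None:
        if not proxies:
            raise ValueError("proxy list is empty")
        self._cycle = itertools.cycle(proxies)

    def next_opener(self) -> Tuple[str, urllib.request.OpenerDirector]:
        proxy = next(self._cycle)
        # Route both http:// and https:// URLs through the same proxy.
        handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
        return proxy, urllib.request.build_opener(handler)
```

Call `next_opener()` before each request for per-request rotation, or keep one opener for a whole session when the site expects its cookies to stay tied to one address (`build_opener` also accepts an `urllib.request.HTTPCookieProcessor()` for that).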


Step 4 — Respect robots.txt and Rate Limits

Fetching /robots.txt before scraping is both ethical and practical. Disallowed paths are often the ones with the most aggressive bot detection, so honoring them keeps you out of trouble and off the block list.
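Python's standard library already parses robots.txt. This sketch feeds `urllib.robotparser.RobotFileParser` a made-up sample file so it runs offline; against a real site you would call `set_url(...)` and `read()` instead. The bot name `MyScraper` is hypothetical.

```python
import urllib.robotparser

# Sample robots.txt for illustration only.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /checkout/
Disallow: /account/
Crawl-delay: 5
"""

AGENT = "MyScraper"  # hypothetical bot name


def load_rules(robots_txt: str) -> urllib.robotparser.RobotFileParser:
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = load_rules(SAMPLE_ROBOTS_TXT)
for url in ("https://example.com/products/42", "https://example.com/checkout/cart"):
    print(url, "allowed" if rules.can_fetch(AGENT, url) else "disallowed")

# Crawl-delay is in seconds; None when the file does not set one.
print("crawl delay:", rules.crawl_delay(AGENT))
```

If `crawl_delay()` returns a number, treat it as the minimum gap between your requests to that site.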

Step 5 — Handle JavaScript-Rendered Pages

If the data only appears after the page runs JavaScript (common with client-side React or Vue apps), a plain HTTP request returns an empty shell. Use a headless browser:

Playwright — modern, cross-browser, great for stealth scraping.
Puppeteer — Chrome-only, huge ecosystem, easy stealth plugins.
Selenium — older, slower, still works for legacy stacks.
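Before reaching for a headless browser, it can pay to check whether the plain HTML already holds the content. The Python sketch below is a rough heuristic, not a standard test: it treats a page as a likely JavaScript shell when it contains `<script>` tags but fewer than an assumed 200 characters of visible text. Tune the threshold, or better, check for a string you know the real data contains.

```python
from html.parser import HTMLParser

SKIPPED_TAGS = {"script", "style", "noscript"}


class VisibleTextParser(HTMLParser):
    """Collects text outside <script>, <style>, and <noscript>; counts scripts."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.script_count = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1
            if tag == "script":
                self.script_count += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def looks_js_rendered(html: str, min_text_chars: int = 200) -> bool:
    """Heuristic: scripts present but almost no visible text suggests a JS shell."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    visible_text = " ".join(parser.chunks)
    return parser.script_count > 0 and len(visible_text) < min_text_chars
```

When it returns False for a page type, plain HTTP requests with good headers are usually enough, which avoids the headless-browser slowdown.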


Step 6 — Solve or Avoid CAPTCHAs

If you are hitting reCAPTCHA or hCaptcha, you have already been flagged. Solving services like 2Captcha and Anti-Captcha cost roughly $1–3 per 1,000 solves. The better long-term fix is to look less like a bot in the first place — slower requests, better headers, residential IPs.
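Detecting a challenge early lets you slow down instead of paying for solves. This is a rough Python sketch: the marker strings are assumptions based on the class names and script paths the standard reCAPTCHA and hCaptcha widgets embed, and doubling the delay up to 60 seconds is an illustrative policy, not a rule.

```python
import re

# Assumed markers: class names and script paths embedded by the standard
# reCAPTCHA and hCaptcha widgets. Not exhaustive; extend for your targets.
CAPTCHA_MARKERS = re.compile(
    r"g-recaptcha|recaptcha/api\.js|h-captcha|hcaptcha\.com/1/api\.js",
    re.IGNORECASE,
)


def looks_like_captcha(html: str) -> bool:
    """Return True if the page appears to contain a CAPTCHA challenge."""
    return CAPTCHA_MARKERS.search(html) is not None


def next_delay(current: float, html: str, max_delay: float = 60.0) -> float:
    """Double the per-request delay when a CAPTCHA appears, up to max_delay."""
    if looks_like_captcha(html):
        return min(current * 2, max_delay)
    return current
```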

Step 7 — Cache Aggressively

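Most scraping jobs re-fetch pages that have not changed. Cache responses locally, keyed by a hash of the URL, and revalidate them with the ETag or Last-Modified value the server sent: a conditional request (If-None-Match or If-Modified-Since) comes back as a lightweight 304 Not Modified when nothing changed. You scrape less, get blocked less, and finish faster.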
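A minimal standard-library Python sketch of that cache, assuming UTF-8 pages and a local `.scrape_cache` directory (both illustrative choices). Each URL maps to a JSON file named by its SHA-256 hash, and repeat fetches send `If-None-Match` / `If-Modified-Since` so an unchanged page costs only a 304.

```python
import hashlib
import json
import urllib.error
import urllib.request
from pathlib import Path
from typing import Dict, Optional

CACHE_DIR = Path(".scrape_cache")  # hypothetical cache location


def cache_path(url: str, cache_dir: Path = CACHE_DIR) -> Path:
    """One JSON file per URL, named by the SHA-256 of the URL."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    return cache_dir / f"{digest}.json"


def conditional_headers(entry: Optional[Dict[str, Optional[str]]]) -> Dict[str, str]:
    """Revalidation headers built from a previously cached response."""
    headers: Dict[str, str] = {}
    if entry:
        if entry.get("etag"):
            headers["If-None-Match"] = entry["etag"]
        if entry.get("last_modified"):
            headers["If-Modified-Since"] = entry["last_modified"]
    return headers


def fetch_cached(url: str, cache_dir: Path = CACHE_DIR, timeout: float = 30) -> str:
    path = cache_path(url, cache_dir)
    entry = json.loads(path.read_text(encoding="utf-8")) if path.exists() else None
    request = urllib.request.Request(url, headers=conditional_headers(entry))
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            entry = {
                "etag": response.headers.get("ETag"),
                "last_modified": response.headers.get("Last-Modified"),
                "body": response.read().decode("utf-8", errors="replace"),
            }
    except urllib.error.HTTPError as err:
        # urllib surfaces 304 Not Modified as an HTTPError; serve the cached copy.
        if err.code == 304 and entry is not None:
            return entry["body"]
        raise
    cache_dir.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(entry), encoding="utf-8")
    return entry["body"]
```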


Tools That Make Stealth Scraping Easier

Playwright + playwright-stealth.
puppeteer-extra with the stealth plugin.
Scrapy + scrapy-rotating-proxies.
ScraperAPI, ScrapingBee, ZenRows — managed scraping with proxy + CAPTCHA solving.
Crawlee (Apify) — open-source, modern, batteries included.
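If you go the Scrapy route, most of the steps in this guide map onto built-in settings. This `settings.py` fragment is a starting-point sketch: the delay values, User-Agent, and proxy addresses are assumptions, and the two middleware entries follow the scrapy-rotating-proxies README (install that package separately and check the docs for your installed version).

```python
# settings.py for a Scrapy project (sketch; values are starting points only)

ROBOTSTXT_OBEY = True                 # Step 4: honor robots.txt

DOWNLOAD_DELAY = 2.0                  # Step 1: base delay in seconds
RANDOMIZE_DOWNLOAD_DELAY = True       # wait 0.5x to 1.5x DOWNLOAD_DELAY
CONCURRENT_REQUESTS_PER_DOMAIN = 1

AUTOTHROTTLE_ENABLED = True           # adapt the delay to server latency
AUTOTHROTTLE_START_DELAY = 2.0
AUTOTHROTTLE_MAX_DELAY = 60.0

USER_AGENT = (                        # Step 2: placeholder browser User-Agent
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)
DEFAULT_REQUEST_HEADERS = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
}

HTTPCACHE_ENABLED = True              # Step 7: on-disk response cache
# RFC2616Policy revalidates stale entries with ETag / Last-Modified.
HTTPCACHE_POLICY = "scrapy.extensions.httpcache.RFC2616Policy"

# Step 3: scrapy-rotating-proxies (placeholder proxy addresses)
ROTATING_PROXY_LIST = [
    "proxy1.example.net:8000",
    "proxy2.example.net:8000",
]
DOWNLOADER_MIDDLEWARES = {
    "rotating_proxies.middlewares.RotatingProxyMiddleware": 610,
    "rotating_proxies.middlewares.BanDetectionMiddleware": 620,
}
```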

Common Mistakes That Get You Blocked

Scraping from a single static IP at full speed.
Leaving the default Python or Node User-Agent string.
Hitting the same URL pattern in perfect order.
Ignoring HTTP 429 (rate limit) responses — back off immediately.
Running headless Chrome with no stealth patches.
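When a 429 arrives, the polite response is to wait as long as the server asks via `Retry-After` (either a number of seconds or an HTTP date) and otherwise back off exponentially. This Python sketch uses the "full jitter" variant of exponential backoff; the 1-second base and 60-second cap are illustrative defaults, and the function names are hypothetical.

```python
import random
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(value: Optional[str], now: Optional[datetime] = None) -> Optional[float]:
    """Parse a Retry-After header, which is either delay-seconds or an HTTP date."""
    if not value:
        return None
    value = value.strip()
    if value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None  # unparseable header: fall back to exponential backoff
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


def backoff_delay(attempt: int, retry_after: Optional[str] = None,
                  base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based) after a 429."""
    hinted = retry_after_seconds(retry_after)
    if hinted is not None:
        return hinted  # the server said how long to wait; honor it
    # Full jitter: uniform in [0, min(cap, base * 2**attempt)].
    return random.uniform(0, min(cap, base * 2 ** attempt))
```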

Frequently Asked Questions

How many requests per second is safe?

Aim for 1–2 requests per second per IP for most sites. Large sites like Amazon or Google have the capacity to take more, though their bot detection is strict; new or small sites often rate-limit at 1 request every few seconds.

Should I use a VPN or a proxy for scraping?

A VPN gives you one new IP — fine for testing, useless at scale. Proxy pools give you hundreds or thousands of rotating IPs, which is what you actually need to scrape without getting blocked.

Does using a headless browser slow scraping down?

Yes — by 5–10x compared to plain HTTP requests. Only use a headless browser for pages that genuinely require JavaScript rendering. For everything else, plain requests with good headers are faster and cheaper.


Why does my scraper get blocked after a few requests?

You are almost certainly hitting the site too fast from a single IP with a default Python or Node User-Agent. Add a 1–2 second delay between requests, set a real browser User-Agent, and rotate IPs through a proxy pool.

Do I need paid proxies to scrape without getting blocked?

For small jobs, free public proxies work for hours. For anything serious — daily scraping, login-gated data, or e-commerce sites — residential proxies from providers like Bright Data or Smartproxy are far more reliable.

Is it legal to scrape a website that blocks scrapers?

Scraping publicly accessible, non-personal data is generally legal in the US (hiQ v. LinkedIn). However, bypassing technical protections may violate the CFAA, and breaching Terms of Service can expose you to a civil claim.

Can headless browsers be detected?

Yes. Default Puppeteer and Selenium leave fingerprints (navigator.webdriver, missing plugins). Use playwright-stealth or puppeteer-extra-plugin-stealth to patch the most common tells.
