How to Scrape a Website Without Getting Blocked (2026 Guide)
Practical 2026 guide to web scraping without getting blocked: rotate IPs, mimic real browsers, respect robots.txt, and use the right tools and headers.
Last Updated: June 2026 · Written by DigiMetrics Hub Team · 8 min read
If your scraper worked for ten minutes and then started getting 403s, CAPTCHAs, or empty pages, you are not alone. Modern websites use Cloudflare, Akamai, and DataDome to spot bots in milliseconds. The good news: you can scrape almost any public site reliably if you behave like a real visitor. Here is the 2026 playbook.
Why Websites Block Scrapers in the First Place
Sites block bots to protect their servers, stop competitors from scraping prices, defend against credential stuffing, and enforce their terms of service. A block does not mean the data is illegal to access; it means your traffic pattern looked suspicious.
- Too many requests in a short window from one IP.
- Missing or default User-Agent (python-requests/2.x is a giveaway).
- No Referer, no cookies, no Accept-Language header.
- Same request order on every page — humans click around randomly.
- Headless browser fingerprints like navigator.webdriver = true.
Step 1 — Throttle Your Requests
Speed is the single biggest reason scrapers get blocked. A real user reads a page for at least a few seconds before clicking. Add a randomized delay between 1 and 4 seconds. For large jobs, schedule them across hours instead of slamming the site in five minutes.
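A minimal sketch of the randomized delay in Python. `throttled` is a hypothetical helper, not part of any library. It yields URLs one at a time and sleeps a random 1 to 4 seconds between them. The optional `shuffle` flag also changes the visiting order on each run, which addresses the "same request order on every page" signal. `sleep` and `rng` are injectable so you can test the pacing without actually waiting.

```python
import random
import time
from typing import Callable, Iterable, Iterator, Optional


def throttled(
    urls: Iterable[str],
    min_delay: float = 1.0,
    max_delay: float = 4.0,
    shuffle: bool = False,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> Iterator[str]:
    """Yield URLs one by one, pausing a random min_delay..max_delay seconds between them."""
    rng = rng or random.Random()
    queue = list(urls)
    if shuffle:
        # Visit pages in a different order each run instead of a fixed crawl order.
        rng.shuffle(queue)
    for i, url in enumerate(queue):
        if i:  # no pause before the very first request
            sleep(rng.uniform(min_delay, max_delay))
        yield url
```

In a real job you would write `for url in throttled(urls, shuffle=True): html = fetch(url)`, where `fetch` stands for whatever HTTP call you already use. Because the sleep happens between iterations, the delay always sits between two requests.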
Step 2 — Send Realistic Headers
Browsers send a rich set of headers on every request. Match them. At minimum, include User-Agent, Accept, Accept-Language, Accept-Encoding, and a plausible Referer that matches the navigation path.
- Open the target site in Chrome DevTools → Network tab.
- Right-click the page's request and choose Copy → Copy as cURL.
- Paste those exact headers into your scraper.
- Rotate User-Agent strings across sessions (ideally together with the IP) rather than changing identity in the middle of one.
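You can partly automate the DevTools steps. This sketch uses a hypothetical helper, `headers_from_curl`, that parses a "Copy as cURL (bash)" command into a header dict. It reads curl's `-H`/`--header`, `-A`/`--user-agent`, and `-b`/`--cookie` flags. It assumes Chrome's bash-style quoting, and it assumes `-b` carries a cookie string rather than a cookie file. It ignores other flags such as `--compressed`.

```python
import shlex
from typing import Dict


def headers_from_curl(command: str) -> Dict[str, str]:
    """Extract request headers from a bash-style 'Copy as cURL' command (hypothetical helper)."""
    # Chrome splits the command over several lines with "\" + newline; join them first.
    tokens = shlex.split(command.replace("\\\n", " "))
    headers: Dict[str, str] = {}
    i = 0
    while i < len(tokens):
        flag = tokens[i]
        value = tokens[i + 1] if i + 1 < len(tokens) else None
        if flag in ("-H", "--header") and value is not None:
            # "name: value" -> split on the first colon only (values may contain colons).
            name, sep, val = value.partition(":")
            if sep:
                headers[name.strip()] = val.strip()
            i += 2
        elif flag in ("-A", "--user-agent") and value is not None:
            headers["User-Agent"] = value
            i += 2
        elif flag in ("-b", "--cookie") and value is not None:
            headers["Cookie"] = value
            i += 2
        else:
            i += 1  # "curl" itself, the URL, or a flag this sketch does not use
    return headers
```

You can pass the result straight to your HTTP client, for example `urllib.request.Request(url, headers=headers)`. Header names come out lower-case, as Chrome emits them, which is fine because HTTP header names are case-insensitive. Treat any captured Cookie value as a secret.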
Step 3 — Rotate IPs With Proxies
One IP making thousands of requests is the clearest bot signal there is. Use a proxy pool so each request (or each session) comes from a different address.
- Datacenter proxies — cheapest, blocked the fastest.
- Residential proxies — real consumer IPs, hardest to detect.
- Mobile proxies — best for Instagram, TikTok, and other mobile-first apps.
- Tor — free, but slow and increasingly fingerprinted.
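A minimal round-robin rotator. It assumes you already have a list of proxy URLs from your provider; the `proxy*.example.net` addresses in the usage below are placeholders. `ProxyRotator` is a hypothetical helper. `next_proxies()` returns the `{"http": ..., "https": ...}` mapping that both `urllib.request.ProxyHandler` and the `proxies=` argument of `requests` accept. Call it once per request for per-request rotation, or once per session for sticky sessions.

```python
import itertools
import urllib.request
from typing import Dict, Sequence


class ProxyRotator:
    """Cycle through a pool of proxy URLs (hypothetical helper, not a library class)."""

    def __init__(self, proxy_urls: Sequence[str]) -> None:
        if not proxy_urls:
            raise ValueError("proxy pool is empty")
        self._cycle = itertools.cycle(list(proxy_urls))

    def next_proxies(self) -> Dict[str, str]:
        # Same proxy for both schemes; HTTPS traffic is tunnelled through it with CONNECT.
        proxy = next(self._cycle)
        return {"http": proxy, "https": proxy}

    def next_opener(self) -> urllib.request.OpenerDirector:
        # A stdlib opener whose requests all go through the next proxy in the pool.
        return urllib.request.build_opener(urllib.request.ProxyHandler(self.next_proxies()))
```

Usage: `rotator = ProxyRotator(["http://user:pass@proxy1.example.net:8000", "http://user:pass@proxy2.example.net:8000"])`, then `rotator.next_opener().open(url, timeout=30)` for each request or session. A production pool would also drop proxies that keep failing. This sketch leaves that out.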
Step 4 — Respect robots.txt and Rate Limits
Fetching /robots.txt before scraping is both ethical and practical. Disallowed paths are often the ones with the most aggressive bot detection, so honoring them keeps you out of trouble and off the block list.
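Python's standard library already includes a robots.txt parser, `urllib.robotparser`. A minimal sketch that assumes your scraper identifies itself with the hypothetical User-Agent token `MyScraper`: load the file once per site, check each URL with `can_fetch()`, and honor any `Crawl-delay` the site publishes. If the site sets none, the sketch falls back to your own delay. The helper names `load_robots`, `allowed`, and `delay_for` are illustrative.

```python
from typing import Optional
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

USER_AGENT = "MyScraper"  # hypothetical bot name; use the token your scraper actually sends


def load_robots(site_root: str) -> RobotFileParser:
    """Download and parse <site_root>/robots.txt (this performs a network request)."""
    parser = RobotFileParser(urljoin(site_root, "/robots.txt"))
    parser.read()
    return parser


def allowed(parser: RobotFileParser, url: str, agent: str = USER_AGENT) -> bool:
    return parser.can_fetch(agent, url)


def delay_for(parser: RobotFileParser, agent: str = USER_AGENT, default: float = 2.0) -> float:
    # crawl_delay() returns None when the site sets no Crawl-delay for this agent.
    delay: Optional[float] = parser.crawl_delay(agent)
    return float(delay) if delay is not None else default
```

For example, `rp = load_robots("https://shop.example/")`. You would then skip any URL where `allowed(rp, url)` is false and sleep `delay_for(rp)` seconds between requests.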
Step 5 — Handle JavaScript-Rendered Pages
If the data only appears after the page runs JavaScript (common with client-side-rendered React, Vue, and Next.js apps), a plain HTTP request returns an empty shell. Use a headless browser:
- Playwright — modern, cross-browser, great for stealth scraping.
- Puppeteer — Node.js, Chrome-first (Firefox support is newer), huge ecosystem, easy stealth plugins.
- Selenium — older, slower, still works for legacy stacks.
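One way to apply this: fetch the page with a plain request first, and fall back to a headless browser only when the HTML looks like an empty JavaScript shell. This is a sketch under stated assumptions. `looks_like_js_shell` is a hypothetical heuristic. It counts visible text outside `<script>`, `<style>`, and `<noscript>`, and treats anything under 200 characters as a shell; tune that threshold for your target. The fallback uses Playwright's Python sync API, which requires `pip install playwright` followed by `playwright install chromium`.

```python
from html.parser import HTMLParser

_SKIP_TAGS = {"script", "style", "noscript"}


class _VisibleTextCounter(HTMLParser):
    """Counts text characters outside <script>, <style> and <noscript>."""

    def __init__(self) -> None:
        super().__init__()
        self.skip_depth = 0
        self.text_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag in _SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in _SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.text_chars += len(data.strip())


def looks_like_js_shell(html: str, min_text_chars: int = 200) -> bool:
    """Heuristic: very little visible text usually means content is rendered client-side."""
    counter = _VisibleTextCounter()
    counter.feed(html)
    counter.close()
    return counter.text_chars < min_text_chars


def fetch_rendered(url: str, timeout_ms: int = 30_000) -> str:
    """Render a page in headless Chromium and return the final HTML."""
    # Imported here so the heuristic above still works where Playwright is not installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page(locale="en-US")
            # "networkidle" waits until there has been no network activity for ~500 ms.
            page.goto(url, wait_until="networkidle", timeout=timeout_ms)
            return page.content()
        finally:
            browser.close()
```

Usage: get `html` from your normal client, then `if looks_like_js_shell(html): html = fetch_rendered(url)`. This keeps the slower browser path for pages that actually need it.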
Step 6 — Solve or Avoid CAPTCHAs
If you are hitting reCAPTCHA or hCaptcha, you have already been flagged. Solving services like 2Captcha and Anti-Captcha cost roughly $1–3 per 1,000 solves. The better long-term fix is to look less like a bot in the first place — slower requests, better headers, residential IPs.
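Rather than paying to solve every challenge, a scraper can detect one and slow down. A minimal sketch with a hypothetical helper, `looks_challenged`. The marker strings are assumed to be common CSS class names for the reCAPTCHA, hCaptcha, and Cloudflare Turnstile widgets; they are not an exhaustive or guaranteed list. The status codes are also a heuristic, since a 503 can be a genuine outage.

```python
from typing import Iterable

# Status codes that commonly accompany a block or challenge page (heuristic).
BLOCK_STATUSES = frozenset({403, 429, 503})

# Assumed markers: widget class names used by reCAPTCHA, hCaptcha and Cloudflare Turnstile.
CHALLENGE_MARKERS = ("g-recaptcha", "h-captcha", "cf-turnstile")


def looks_challenged(status: int, body: str, markers: Iterable[str] = CHALLENGE_MARKERS) -> bool:
    """Return True if a response looks like a block or CAPTCHA page rather than real content."""
    if status in BLOCK_STATUSES:
        return True
    lowered = body.lower()
    return any(marker in lowered for marker in markers)
```

When it returns True, stop that session and wait. Come back slower, from a different IP, or both. Retrying immediately tends to deepen the block.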
Step 7 — Cache Aggressively
Most scraping jobs re-fetch pages that have not changed. Cache responses locally, keyed by a hash of the URL, and revalidate them with the stored ETag or Last-Modified value so unchanged pages come back as a short 304 Not Modified. You scrape less, get blocked less, and finish faster.
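A minimal standard-library sketch of such a cache. It assumes one JSON file per URL, named by the URL's SHA-256 hash. `ResponseCache` and `fetch_cached` are hypothetical helpers. On a repeat visit, the stored validators go back to the server as `If-None-Match` and `If-Modified-Since`. The sketch decodes the body as-is, so it assumes you do not request gzip or Brotli via Accept-Encoding.

```python
import hashlib
import json
import urllib.error
import urllib.request
from pathlib import Path
from typing import Dict, Optional


class ResponseCache:
    """One JSON file per URL, named by the SHA-256 hash of the URL (hypothetical helper)."""

    def __init__(self, directory: str) -> None:
        self.dir = Path(directory)
        self.dir.mkdir(parents=True, exist_ok=True)

    def _path(self, url: str) -> Path:
        return self.dir / (hashlib.sha256(url.encode("utf-8")).hexdigest() + ".json")

    def get(self, url: str) -> Optional[dict]:
        path = self._path(url)
        if not path.exists():
            return None
        return json.loads(path.read_text(encoding="utf-8"))

    def put(self, url: str, body: str, etag: Optional[str] = None,
            last_modified: Optional[str] = None) -> None:
        entry = {"body": body, "etag": etag, "last_modified": last_modified}
        self._path(url).write_text(json.dumps(entry), encoding="utf-8")

    def conditional_headers(self, url: str) -> Dict[str, str]:
        """Validators from the cached copy, so the server can answer 304 Not Modified."""
        entry = self.get(url) or {}
        headers: Dict[str, str] = {}
        if entry.get("etag"):
            headers["If-None-Match"] = entry["etag"]
        if entry.get("last_modified"):
            headers["If-Modified-Since"] = entry["last_modified"]
        return headers


def fetch_cached(cache: ResponseCache, url: str, headers: Optional[Dict[str, str]] = None) -> str:
    req_headers = dict(headers or {})  # must not ask for compressed encodings (see above)
    req_headers.update(cache.conditional_headers(url))
    request = urllib.request.Request(url, headers=req_headers)
    try:
        with urllib.request.urlopen(request, timeout=30) as resp:
            charset = resp.headers.get_content_charset() or "utf-8"
            body = resp.read().decode(charset, errors="replace")
            cache.put(url, body, resp.headers.get("ETag"), resp.headers.get("Last-Modified"))
            return body
    except urllib.error.HTTPError as err:
        # urllib raises on 304; it means the cached copy is still current.
        cached = cache.get(url)
        if err.code == 304 and cached is not None:
            return cached["body"]
        raise
```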
Tools That Make Stealth Scraping Easier
- Playwright + playwright-stealth.
- puppeteer-extra with the stealth plugin.
- Scrapy + scrapy-rotating-proxies.
- ScraperAPI, ScrapingBee, ZenRows — managed scraping with proxy + CAPTCHA solving.
- Crawlee (Apify) — open-source, modern, batteries included.
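If you use Scrapy, most of the earlier steps map onto its built-in settings. Below is an illustrative `settings.py` fragment that assumes scrapy-rotating-proxies is installed. The middleware paths and priorities follow that package's README, and the proxy addresses are placeholders. Treat the numbers as starting points to tune per site.

```python
# settings.py (illustrative values; tune per target site)

# Step 4: fetch and obey robots.txt.
ROBOTSTXT_OBEY = True

# Step 1: pace requests. With RANDOMIZE_DOWNLOAD_DELAY, Scrapy waits a random
# 0.5x to 1.5x of the current delay, i.e. roughly 1 to 3 seconds to start with.
DOWNLOAD_DELAY = 2.0
RANDOMIZE_DOWNLOAD_DELAY = True
CONCURRENT_REQUESTS_PER_DOMAIN = 1

# AutoThrottle raises or lowers the delay based on observed server latency.
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 2.0
AUTOTHROTTLE_MAX_DELAY = 60.0
AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0

# Step 7: disk cache that revalidates with ETag / Last-Modified (RFC 2616 policy).
HTTPCACHE_ENABLED = True
HTTPCACHE_POLICY = "scrapy.extensions.httpcache.RFC2616Policy"

# Step 3: rotate proxies via scrapy-rotating-proxies (placeholder addresses).
ROTATING_PROXY_LIST = [
    "proxy1.example.net:8000",
    "proxy2.example.net:8000",
]
DOWNLOADER_MIDDLEWARES = {
    "rotating_proxies.middlewares.RotatingProxyMiddleware": 610,
    "rotating_proxies.middlewares.BanDetectionMiddleware": 620,
}
```

For Step 2, also set `USER_AGENT` to the string your own browser sends, because Scrapy's default identifies it as Scrapy.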
Common Mistakes That Get You Blocked
- Scraping from a single static IP at full speed.
- Leaving the default Python or Node User-Agent string.
- Hitting the same URL pattern in perfect order.
- Ignoring HTTP 429 (rate limit) responses — back off immediately.
- Running headless Chrome with no stealth patches.
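For the 429 case, a common pattern is to honor the server's `Retry-After` header when it is present, since that value may be a number of seconds or an HTTP date. Otherwise, back off exponentially with jitter. A minimal sketch; `backoff_seconds` is a hypothetical helper, and the 2-second base and 300-second cap are assumptions to tune.

```python
import random
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Callable, Optional


def backoff_seconds(
    attempt: int,
    retry_after: Optional[str] = None,
    base: float = 2.0,
    cap: float = 300.0,
    rng: Callable[[], float] = random.random,
) -> float:
    """Seconds to wait before retry number `attempt` (0-based) after a 429 or 503."""
    if retry_after:
        value = retry_after.strip()
        if value.isdigit():
            return float(value)  # e.g. "Retry-After: 120"
        try:
            when = parsedate_to_datetime(value)  # e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
        except (TypeError, ValueError):
            when = None  # malformed header: fall through to exponential backoff
        if when is not None:
            if when.tzinfo is None:
                when = when.replace(tzinfo=timezone.utc)
            return max(0.0, (when - datetime.now(timezone.utc)).total_seconds())
    # Exponential backoff with "equal jitter": half the ceiling is fixed and half is
    # random, so waits keep growing but parallel workers do not retry in lockstep.
    ceiling = min(cap, base * 2 ** min(attempt, 32))
    return ceiling / 2 + rng() * ceiling / 2
```

On each 429, call `time.sleep(backoff_seconds(attempt, resp.headers.get("Retry-After")))` before retrying. Reset `attempt` to 0 after a successful response.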
Frequently Asked Questions
How many requests per second is safe?
Aim for 1–2 requests per second per IP for most sites. Large sites like Amazon or Google can take more, but new or small sites often rate-limit at 1 request every few seconds.
Should I use a VPN or a proxy for scraping?
A VPN gives you one new IP — fine for testing, useless at scale. Proxy pools give you hundreds or thousands of rotating IPs, which is what you actually need to scrape without getting blocked.
Does using a headless browser slow scraping down?
Yes — by 5–10x compared to plain HTTP requests. Only use a headless browser for pages that genuinely require JavaScript rendering. For everything else, plain requests with good headers are faster and cheaper.
Why does my scraper get blocked after a few requests?
You are almost certainly hitting the site too fast from a single IP with a default Python or Node User-Agent. Add a 1–2 second delay between requests, set a real browser User-Agent, and rotate IPs through a proxy pool.
Do I need paid proxies to scrape without getting blocked?
For small, one-off jobs, free public proxies can get you by, but they are slow, short-lived, and run by operators you cannot vet. For anything serious (daily scraping, login-gated data, or e-commerce sites), residential proxies from providers like Bright Data or Smartproxy are far more reliable.
Is it legal to scrape a website that blocks scrapers?
Scraping publicly accessible, non-personal data is generally legal in the US (hiQ v. LinkedIn). However, bypassing technical protections may violate the CFAA, and breaching Terms of Service can expose you to a civil claim.
Can headless browsers be detected?
Yes. Default Puppeteer and Selenium leave fingerprints (navigator.webdriver, missing plugins). Use playwright-stealth or puppeteer-extra-plugin-stealth to patch the most common tells.