Python Scrapy Tunnel Proxy: Four IP Control Scenarios
Integrating a Scrapy spider with 16Yun Crawler Proxy: forcing a new IP, Keep-Alive, Proxy-Tunnel over HTTP, and Scrapy's HTTPS Proxy-Tunnel limitation.
16Yun Engineering Team | May 21, 2026 | 1 min read
Scrapy + Proxy Middleware
Scrapy integrates with proxy services through a downloader middleware. Its connection handling limits some scenarios, though, most notably Proxy-Tunnel over HTTPS.
Environment Setup
export PROXY_HOST=t.16yun.cn
export PROXY_PORT=31111
export PROXY_USERNAME=your-username
export PROXY_PASSWORD=your-password
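The variables above can be checked once at startup so that a missing value fails loudly instead of silently falling back to a placeholder. The following is a minimal sketch; the helper name `load_proxy_config` and the returned dictionary layout are illustrative assumptions, not part of the 16Yun tooling.

```python
import os


def load_proxy_config(env=os.environ):
    """Read the proxy settings from the environment (hypothetical helper).

    Raises RuntimeError listing every variable that is unset or empty.
    """
    keys = ("PROXY_HOST", "PROXY_PORT", "PROXY_USERNAME", "PROXY_PASSWORD")
    missing = [key for key in keys if not env.get(key)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {
        "host": env["PROXY_HOST"],
        "port": int(env["PROXY_PORT"]),  # ports are exported as strings
        "username": env["PROXY_USERNAME"],
        "password": env["PROXY_PASSWORD"],
    }
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to exercise without touching the real environment.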
Middleware
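# middlewares.py
import base64
import os


class TunnelProxyMiddleware:
    def process_request(self, request, spider):
        host = os.getenv("PROXY_HOST", "t.16yun.cn")
        port = os.getenv("PROXY_PORT", "31111")
        user = os.getenv("PROXY_USERNAME", "user")
        pwd = os.getenv("PROXY_PASSWORD", "password")

        # Requests that carry the same Proxy-Tunnel value ask for the same exit IP.
        tunnel = request.meta.get("proxy_tunnel")
        if tunnel:
            request.headers["Proxy-Tunnel"] = tunnel

        # Assumption: the proxy assigns a new exit IP per TCP connection, so a
        # non-persistent connection is requested when force_new is set.
        if request.meta.get("force_new"):
            request.headers["Connection"] = "close"

        # HTTP Basic credentials for the proxy, then route the request through it.
        auth = base64.b64encode(f"{user}:{pwd}".encode()).decode()
        request.headers["Proxy-Authorization"] = f"Basic {auth}"
        request.meta["proxy"] = f"http://{host}:{port}"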
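A downloader middleware only runs once it is registered in the project settings. The fragment below is a sketch of that registration; the package name `myproject` is a placeholder, and the priority `100` is an assumed choice that places the middleware ahead of Scrapy's built-in `HttpProxyMiddleware` (priority 750 by default).

```python
# settings.py
# "myproject" is a hypothetical package name; use your own project's module path.
DOWNLOADER_MIDDLEWARES = {
    # A lower number means process_request() runs earlier in the chain.
    "myproject.middlewares.TunnelProxyMiddleware": 100,
}
```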
Scenario Demo Spider
import json
import os
import random

import scrapy


class ScenarioSpider(scrapy.Spider):
    name = "scenario_demo"
    # Send one request at a time with a 1 s pause between requests.
    custom_settings = {"CONCURRENT_REQUESTS": 1, "DOWNLOAD_DELAY": 1}

    def start_requests(self):
        target = os.getenv("TARGET_URL", "https://httpbin.org/ip")
        tunnel = os.getenv("PROXY_TUNNEL", "")

        # Scenario A: ask for a new exit IP on every request.
        for i in range(3):
            yield scrapy.Request(target, callback=self.parse_result,
                                 meta={"scene": "A - Force New", "n": i + 1, "force_new": True},
                                 dont_filter=True)

        # Scenario B: plain requests using Scrapy's default Keep-Alive behavior.
        for i in range(3):
            yield scrapy.Request(target, callback=self.parse_result,
                                 meta={"scene": "B - Keep-Alive", "n": i + 1}, dont_filter=True)

        # Scenario C: the same Proxy-Tunnel value on every request, over plain HTTP.
        tv = tunnel or str(random.randint(1, 10000))
        for i in range(3):
            yield scrapy.Request("http://httpbin.org/ip", callback=self.parse_result,
                                 meta={"scene": "C-HTTP Tunnel", "n": i + 1, "proxy_tunnel": tv},
                                 dont_filter=True)

    def parse_result(self, response):
        # The response body is JSON; the exit IP is read from its "origin" field.
        d = json.loads(response.text)
        self.logger.info("【%s】#%d: IP=%s", response.meta["scene"], response.meta["n"], d.get("origin", ""))
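To read the spider's log at a glance, the observed IPs can be grouped by scene and counted. This is a small sketch with a hypothetical helper, `summarize_ips`, and made-up sample addresses from the documentation range 203.0.113.0/24; if the proxy behaves as the scenario names suggest, scenario A would show several distinct IPs while B and C would each show one.

```python
from collections import defaultdict


def summarize_ips(records):
    """Group (scene, ip) pairs and count requests and distinct IPs per scene."""
    by_scene = defaultdict(list)
    for scene, ip in records:
        by_scene[scene].append(ip)
    return {
        scene: {"requests": len(ips), "unique_ips": len(set(ips))}
        for scene, ips in by_scene.items()
    }


# Illustrative data only; real values come from the spider's log lines.
sample = [
    ("A - Force New", "203.0.113.10"),
    ("A - Force New", "203.0.113.11"),
    ("A - Force New", "203.0.113.12"),
    ("B - Keep-Alive", "203.0.113.20"),
    ("B - Keep-Alive", "203.0.113.20"),
    ("B - Keep-Alive", "203.0.113.20"),
]

for scene, stats in summarize_ips(sample).items():
    print(scene, stats)
```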
Limitations
| Scenario | Scrapy Support | Notes |
|---|---|---|
| A: Force new | ✅ | Middleware controls connection pool |
| B: Keep-Alive | ✅ | Default Scrapy behavior |
| C-HTTP: Proxy-Tunnel | ✅ | Add header in middleware |
| C-HTTPS: Proxy-Tunnel | ❌ Not supported | Scrapy's Twisted-based HTTP/1.1 handler can't add custom headers to the CONNECT request |
For HTTPS Proxy-Tunnel, use requests (custom HTTPAdapter), httpx (httpx.Proxy(headers=...)), or aiohttp (proxy_headers) instead.
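The underlying requirement is that Proxy-Tunnel must travel in the CONNECT request sent to the proxy, not in the encrypted request sent to the target. As a dependency-free illustration of that idea (a swapped-in technique, not one of the libraries named above), Python's standard `http.client` accepts extra CONNECT headers through `set_tunnel()`. The helper names `tunnel_headers` and `open_tunnel` are hypothetical, and the sketch assumes the proxy accepts Basic credentials in `Proxy-Authorization`, as the middleware above does.

```python
import base64
import http.client


def tunnel_headers(username, password, tunnel_id):
    """Build the headers that should accompany the CONNECT request."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return {
        "Proxy-Authorization": f"Basic {token}",
        "Proxy-Tunnel": str(tunnel_id),
    }


def open_tunnel(proxy_host, proxy_port, target_host, headers, timeout=10):
    """Prepare an HTTPS connection routed through the proxy.

    No network traffic happens here; the socket opens on the first request.
    """
    conn = http.client.HTTPSConnection(proxy_host, proxy_port, timeout=timeout)
    # set_tunnel() makes http.client send "CONNECT target_host:443" to the
    # proxy first and include `headers` in that CONNECT request.
    conn.set_tunnel(target_host, 443, headers=headers)
    return conn


# Usage sketch (performs real network I/O, so it is left commented out):
# conn = open_tunnel("t.16yun.cn", 31111, "httpbin.org",
#                    tunnel_headers("your-username", "your-password", 42))
# conn.request("GET", "/ip")
# print(conn.getresponse().read().decode())
```

Reusing the same `tunnel_id` across connections is what is expected to pin the exit IP; that behavior belongs to the proxy service and is not something this client code can verify on its own.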
HTTPS Proxy-Tunnel Support by Framework
| Framework | HTTPS Tunnel | Method |
|---|---|---|
| Python requests | ✅ | Custom HTTPAdapter.proxy_headers() |
| Python httpx | ✅ | httpx.Proxy(headers=...) |
| Python aiohttp | ✅ | proxy_headers parameter |
| Python Scrapy | ❌ | Can't inject CONNECT headers |
| Node.js axios | ✅ | https-proxy-agent options |
| Go net/http | ✅ | Transport.ProxyConnectHeader |