Python Scrapy Tunnel Proxy: Four IP Control Scenarios

Integrating a Scrapy spider with 16Yun Crawler Proxy: forcing a new IP, Keep-Alive, Proxy-Tunnel over HTTP, and Scrapy's limitation with Proxy-Tunnel over HTTPS.

16Yun Engineering Team · May 21, 2026 · 1 min read

Scrapy + Proxy Middleware

Scrapy integrates with proxy services through downloader middleware. One scenario is out of reach, however: for HTTPS targets, Scrapy's download handler gives middleware no way to add a Proxy-Tunnel header to the CONNECT request that opens the tunnel.

Environment Setup

export PROXY_HOST=t.16yun.cn
export PROXY_PORT=31111
export PROXY_USERNAME=your-username
export PROXY_PASSWORD=your-password
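
The middleware in the next section falls back to placeholder credentials when a variable is unset, which tends to surface later as a proxy authentication error. A minimal sketch of a fail-fast loader follows; `ProxyConfig` and `load_proxy_config` are hypothetical names introduced for illustration, not part of Scrapy or any 16Yun SDK.

```python
import base64
import os
from typing import Mapping, NamedTuple


class ProxyConfig(NamedTuple):
    url: str          # value for request.meta["proxy"]
    auth_header: str  # value for the Proxy-Authorization header


def load_proxy_config(env: Mapping[str, str] = os.environ) -> ProxyConfig:
    """Build proxy settings from the PROXY_* variables, failing fast if any is unset."""
    names = ("PROXY_HOST", "PROXY_PORT", "PROXY_USERNAME", "PROXY_PASSWORD")
    missing = [n for n in names if not env.get(n)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    host, port, user, pwd = (env[n] for n in names)
    # HTTP Basic credentials: base64("username:password")
    token = base64.b64encode(f"{user}:{pwd}".encode()).decode()
    return ProxyConfig(f"http://{host}:{port}", f"Basic {token}")
```

Calling `load_proxy_config()` once at start-up reports every missing variable in a single error instead of letting requests fail one by one.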

Middleware

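# middlewares.py
import base64
import os


class TunnelProxyMiddleware:
    """Route every request through the tunnel proxy and apply per-request IP control."""

    def process_request(self, request, spider):
        host = os.getenv("PROXY_HOST", "t.16yun.cn")
        port = os.getenv("PROXY_PORT", "31111")
        user = os.getenv("PROXY_USERNAME", "user")
        pwd = os.getenv("PROXY_PASSWORD", "password")

        # Scenario C: requests that carry the same Proxy-Tunnel value share an IP.
        tunnel = request.meta.get("proxy_tunnel")
        if tunnel:
            request.headers["Proxy-Tunnel"] = tunnel

        # Scenario A: the spider sets meta["force_new"]. Closing the connection
        # after each response is assumed here to be what makes the proxy assign
        # a new IP on the next request.
        if request.meta.get("force_new"):
            request.headers["Connection"] = "close"

        # Basic credentials for the proxy itself.
        auth = base64.b64encode(f"{user}:{pwd}".encode()).decode()
        request.headers["Proxy-Authorization"] = f"Basic {auth}"
        request.meta["proxy"] = f"http://{host}:{port}"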
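
The middleware only takes effect once it is enabled in the project settings. A minimal sketch, assuming the class lives in a package called `myproject` (a placeholder name) and that a priority of 740 is acceptable for your setup:

```python
# settings.py
# "myproject" is a placeholder; use your own project's module path.
DOWNLOADER_MIDDLEWARES = {
    # Lower numbers run earlier in process_request. Scrapy's built-in
    # HttpProxyMiddleware sits at 750 by default, so 740 runs just before it.
    "myproject.middlewares.TunnelProxyMiddleware": 740,
}
```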

Scenario Demo Spider

import json
import os
import random

import scrapy


class ScenarioSpider(scrapy.Spider):
    name = "scenario_demo"
    # One request at a time so the logged IPs are easy to compare.
    custom_settings = {"CONCURRENT_REQUESTS": 1, "DOWNLOAD_DELAY": 1}

    def start_requests(self):
        target = os.getenv("TARGET_URL", "https://httpbin.org/ip")
        tunnel = os.getenv("PROXY_TUNNEL", "")

        # Scenario A: flag each request with force_new for the middleware.
        for i in range(3):
            yield scrapy.Request(target, callback=self.parse_result,
                meta={"scene": "A - Force New", "n": i + 1, "force_new": True},
                dont_filter=True)
        # Scenario B: no flags; Scrapy keeps connections alive by default.
        for i in range(3):
            yield scrapy.Request(target, callback=self.parse_result,
                meta={"scene": "B - Keep-Alive", "n": i + 1}, dont_filter=True)

        # Scenario C (plain HTTP only): reuse one Proxy-Tunnel value for all requests.
        tv = tunnel or str(random.randint(1, 10000))
        for i in range(3):
            yield scrapy.Request("http://httpbin.org/ip", callback=self.parse_result,
                meta={"scene": "C - HTTP Tunnel", "n": i + 1, "proxy_tunnel": tv},
                dont_filter=True)

    def parse_result(self, response):
        # httpbin.org/ip answers with {"origin": "<client IP>"}.
        d = json.loads(response.text)
        self.logger.info("[%s] #%d: IP=%s", response.meta["scene"], response.meta["n"], d.get("origin", ""))
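
To check a run, compare the logged IPs within each scenario. The helper below is a small sketch for doing that once the (scene, IP) pairs have been collected from the log; `classify` and `summarize` are hypothetical names. The expectation that scenario A rotates while B and C stay on one IP is an assumption drawn from the scenario names, so verify it against your own output.

```python
from collections import defaultdict


def classify(ips):
    """Label the list of exit IPs observed for one scenario."""
    if len(ips) < 2:
        return "not enough data"
    unique = len(set(ips))
    if unique == 1:
        return "same IP every time"
    if unique == len(ips):
        return "new IP every time"
    return "mixed"


def summarize(records):
    """Group (scene, ip) pairs by scene and classify each group."""
    grouped = defaultdict(list)
    for scene, ip in records:
        grouped[scene].append(ip)
    return {scene: classify(ips) for scene, ips in grouped.items()}
```

For example, feeding it three distinct addresses for "A - Force New" and three identical ones for "B - Keep-Alive" labels the first group "new IP every time" and the second "same IP every time".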

Limitations

Scenario | Scrapy Support | Notes
A: Force new | ✅ Supported | Middleware controls connection pool
B: Keep-Alive | ✅ Supported | Default Scrapy behavior
C-HTTP: Proxy-Tunnel | ✅ Supported | Add header in middleware
C-HTTPS: Proxy-Tunnel | ❌ Not supported | Twisted HTTP/1.1 connector can't inject CONNECT headers
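
The HTTPS row comes down to where the header travels. For an HTTPS target the proxy reads only the CONNECT request that opens the tunnel; everything sent afterwards is encrypted end to end with the target. The sketch below builds the CONNECT request a client would have to send for the proxy to see Proxy-Tunnel. `build_connect_request` is a hypothetical helper written for illustration, not Scrapy or Twisted code.

```python
import base64


def build_connect_request(target_host, target_port, user, pwd, tunnel=None):
    """Return the bytes a client sends to the proxy to open an HTTPS tunnel."""
    token = base64.b64encode(f"{user}:{pwd}".encode()).decode()
    lines = [
        f"CONNECT {target_host}:{target_port} HTTP/1.1",
        f"Host: {target_host}:{target_port}",
        f"Proxy-Authorization: Basic {token}",
    ]
    if tunnel is not None:
        # The proxy can only act on Proxy-Tunnel if it appears here.
        lines.append(f"Proxy-Tunnel: {tunnel}")
    # Header lines end with CRLF; an empty line ends the request head.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")
```

A header set on the Scrapy request itself ends up on the GET inside the tunnel, where the target server receives it and the proxy never does.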

For HTTPS Proxy-Tunnel, use requests (custom HTTPAdapter), httpx (httpx.Proxy(headers=...)), or aiohttp (proxy_headers) instead.
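
The same idea can be tried without any third-party package. The sketch below uses the standard library's `http.client` rather than one of the three libraries named above, because its `set_tunnel()` accepts headers for the CONNECT request directly. `tunnel_headers` and `fetch_ip_via_tunnel` are hypothetical helper names, and the target `httpbin.org/ip` is taken from the demo spider.

```python
import base64
import http.client
import json


def tunnel_headers(user, pwd, tunnel):
    """Headers meant for the proxy, to be sent on the CONNECT request."""
    token = base64.b64encode(f"{user}:{pwd}".encode()).decode()
    return {"Proxy-Authorization": f"Basic {token}", "Proxy-Tunnel": str(tunnel)}


def fetch_ip_via_tunnel(proxy_host, proxy_port, user, pwd, tunnel):
    """GET https://httpbin.org/ip through the proxy and return the reported IP."""
    conn = http.client.HTTPSConnection(proxy_host, int(proxy_port), timeout=30)
    # set_tunnel() makes http.client issue CONNECT first; these headers go on
    # that CONNECT request, not on the encrypted GET that follows.
    conn.set_tunnel("httpbin.org", 443, headers=tunnel_headers(user, pwd, tunnel))
    try:
        conn.request("GET", "/ip")
        return json.loads(conn.getresponse().read())["origin"]
    finally:
        conn.close()
```

Calling `fetch_ip_via_tunnel` several times with the same `tunnel` value and comparing the returned addresses is one way to confirm the behaviour against your own account.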

HTTPS Proxy-Tunnel Support by Framework

Framework | HTTPS Tunnel | Method
Python requests | ✅ | Custom HTTPAdapter.proxy_headers()
Python httpx | ✅ | httpx.Proxy(headers=...)
Python aiohttp | ✅ | proxy_headers parameter
Python Scrapy | ❌ | Can't inject CONNECT headers
Node.js axios | ✅ | https-proxy-agent options
Go net/http | ✅ | Transport.ProxyConnectHeader
