Python feapder and ScrapySplash with Tunnel Proxies

feapder: A Lightweight Crawler Framework

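feapder is a lightweight Python crawler framework. Proxies are configured in the spider's download_midware method, which runs before each request is sent.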

import os

import feapder


class DemoSpider(feapder.AirSpider):
    def start_requests(self):
        # httpbin echoes the caller's IP, which makes the exit IP visible.
        yield feapder.Request("https://httpbin.org/ip")

    def download_midware(self, request):
        # Read proxy settings from the environment, with placeholder defaults.
        host = os.getenv("PROXY_HOST", "t.16yun.cn")
        port = os.getenv("PROXY_PORT", "31111")
        user = os.getenv("PROXY_USERNAME", "user")
        pwd = os.getenv("PROXY_PASSWORD", "password")
        proxy_url = f"http://{user}:{pwd}@{host}:{port}"
        # Route both HTTP and HTTPS traffic through the same proxy endpoint.
        request.proxies = {"http": proxy_url, "https": proxy_url}
        return request

    def parse(self, request, response):
        print(response.text)


if __name__ == "__main__":
    DemoSpider().start()
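
The proxy URL assembly inside `download_midware` can be pulled into a small helper so it can be checked without running a spider. This is a minimal sketch: `build_proxies` is a hypothetical name, not part of feapder, and the percent-encoding step assumes credentials may contain characters such as `@` or `:`.

```python
import os
from urllib.parse import quote


def build_proxies(env=None):
    """Build a requests-style proxies mapping from environment variables.

    Hypothetical helper for illustration; it is not a feapder API.
    """
    env = os.environ if env is None else env
    host = env.get("PROXY_HOST", "t.16yun.cn")
    port = env.get("PROXY_PORT", "31111")
    # Percent-encode credentials so "@" or ":" cannot break the URL structure.
    user = quote(env.get("PROXY_USERNAME", "user"), safe="")
    pwd = quote(env.get("PROXY_PASSWORD", "password"), safe="")
    proxy_url = f"http://{user}:{pwd}@{host}:{port}"
    # Both schemes go through the same HTTP proxy endpoint.
    return {"http": proxy_url, "https": proxy_url}
```

Inside the spider, `request.proxies = build_proxies()` would then replace the inline dictionary.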

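Scenario	feapder implementation
A: Force an IP switch	By default, feapder opens a new connection for every request
B: Keep the same IP	Reuse the `AirSpider` instance (default)
C: Proxy-Tunnel	Add a `Proxy-Tunnel` header to `request.headers`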
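
The `Proxy-Tunnel` row can be sketched as a header helper. The sketch assumes the convention commonly used by tunnel proxies: requests carrying the same `Proxy-Tunnel` value keep the same exit IP, and a new value asks for a new one. That behavior is an assumption here and should be checked against the provider's documentation. `tunnel_headers` and the 1 to 10000 ID range are hypothetical.

```python
import random


def tunnel_headers(tunnel_id=None, base_headers=None):
    """Return a copy of base_headers with a Proxy-Tunnel header added.

    Hypothetical helper: pass a fixed tunnel_id to keep one exit IP,
    or omit it to get a random ID (assumed to trigger an IP switch).
    """
    headers = dict(base_headers or {})  # copy so the caller's dict is untouched
    if tunnel_id is None:
        tunnel_id = random.randint(1, 10000)
    headers["Proxy-Tunnel"] = str(tunnel_id)
    return headers
```

In `download_midware`, this could be applied as `request.headers = tunnel_headers(tunnel_id=1)`. For HTTPS targets, an HTTP client normally sends request headers inside the CONNECT tunnel, so whether the proxy actually sees `Proxy-Tunnel` depends on the client and the provider.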

ScrapySplash: Scraping JS-Rendered Pages

ScrapySplash sends pages to Splash, which renders the JavaScript and returns the resulting HTML. The proxy is configured on the Splash side:

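# settings.py
import os

SPLASH_URL = "http://splash:8050"

# Project-level setting: scrapy-splash does not read this by itself, so the
# spider has to pass it on to Splash (for example via the `proxy` argument).
SPLASH_PROXY = {
    "host": os.getenv("PROXY_HOST", "t.16yun.cn"),
    "port": int(os.getenv("PROXY_PORT", "31111")),
    "username": os.getenv("PROXY_USERNAME", "user"),
    "password": os.getenv("PROXY_PASSWORD", "password"),
}

DOWNLOADER_MIDDLEWARES = {
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}

# SplashDeduplicateArgsMiddleware is a spider middleware, not a downloader one.
SPIDER_MIDDLEWARES = {
    "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
}

DUPEFILTER_CLASS = "scrapy_splash.SplashAwareDupeFilter"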

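Splash sits between the crawler and the proxy: it receives render requests from Scrapy and fetches the pages through the crawler proxy: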

Scrapy → Splash (renders JS) → 亿牛云 (16yun) crawler proxy → target site
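
One way to hand the upstream proxy to Splash is the `proxy` argument of its render endpoints, which accepts a proxy URL. The sketch below builds that argument from a dict shaped like `SPLASH_PROXY`. `splash_args` is a hypothetical helper, and the `wait` value is an arbitrary assumption.

```python
from urllib.parse import quote


def splash_args(proxy, wait=1.0):
    """Build Splash render arguments that route page fetches via a proxy.

    Hypothetical helper; `proxy` is a dict with host, port, username, password.
    """
    # Percent-encode credentials so special characters do not break the URL.
    user = quote(proxy["username"], safe="")
    pwd = quote(proxy["password"], safe="")
    proxy_url = f"http://{user}:{pwd}@{proxy['host']}:{proxy['port']}"
    return {"wait": wait, "proxy": proxy_url}


# Usage inside a spider (needs scrapy-splash, so it is left as a comment here):
# from scrapy_splash import SplashRequest
# yield SplashRequest(url, self.parse,
#                     args=splash_args(self.settings.getdict("SPLASH_PROXY")))
```

Passing the proxy per request keeps the credentials in the Scrapy project rather than in the Splash container. Splash proxy profiles are an alternative when every request should use the same upstream.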

Use-Case Comparison

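Framework	Best suited for	How the proxy is configured
feapder	Lightweight projects and fast development	Injected in `download_midware`
ScrapySplash	Pages that require JS rendering	Upstream proxy configured on the Splash side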
