Python feapder & ScrapySplash Tunnel Proxy: Framework Extensions

Integrating the 16Yun Crawler Proxy with feapder AirSpider and Scrapy+Splash.

16Yun Engineering Team | May 11, 2026 | 1 min read

feapder: Lightweight Crawler Framework

feapder is a lightweight Python crawler framework. Proxies are configured in the spider's download_midware method.

import os

import feapder


class DemoSpider(feapder.AirSpider):
    def start_requests(self):
        # httpbin.org/ip echoes the caller's IP, so the response shows the proxy's exit IP.
        yield feapder.Request("https://httpbin.org/ip")

    def download_midware(self, request):
        # Read proxy settings from the environment; the defaults are placeholders.
        host = os.getenv("PROXY_HOST", "t.16yun.cn")
        port = os.getenv("PROXY_PORT", "31111")
        user = os.getenv("PROXY_USERNAME", "user")
        pwd = os.getenv("PROXY_PASSWORD", "password")
        proxy_url = f"http://{user}:{pwd}@{host}:{port}"
        # The same HTTP proxy endpoint is used for both http and https targets.
        request.proxies = {"http": proxy_url, "https": proxy_url}
        return request

    def parse(self, request, response):
        print(response.text)


if __name__ == "__main__":
    DemoSpider().start()
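
Scenario | feapder implementation
A: Force IP switch | feapder opens a new connection for every request by default
B: Keep the same IP | Reuse the AirSpider instance (default)
C: Proxy-Tunnel | Add a Proxy-Tunnel header to request.headers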
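
Scenario C can be sketched as a small helper that builds the header. This is a minimal sketch, assuming the proxy groups requests by the value of the Proxy-Tunnel header, so that reusing one value keeps requests in the same tunnel and a new value asks for a different one. The function name tunnel_headers and the 1 to 10000 range are illustrative choices, not part of feapder or of the proxy's documented API.

```python
import random


def tunnel_headers(tunnel_id=None):
    """Return a Proxy-Tunnel header dict (hypothetical helper).

    Pass the same tunnel_id to stay in one tunnel; omit it to pick a new one.
    """
    if tunnel_id is None:
        # Assumption: any integer is accepted as a tunnel id; the range is arbitrary.
        tunnel_id = random.randint(1, 10000)
    return {"Proxy-Tunnel": str(tunnel_id)}


# Possible use inside download_midware (replaces any headers already set):
#     request.headers = tunnel_headers(42)
```

Generating the id once per logical session and passing it to every request in that session would cover scenario B as well, under the same assumption.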

ScrapySplash: Scraping JS-Rendered Pages

ScrapySplash renders JavaScript through Splash and returns the resulting HTML. The proxy is configured on the Splash side:

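# settings.py
import os

SPLASH_URL = "http://splash:8050"

# Custom project setting: scrapy-splash does not apply it automatically,
# so the spider has to pass these values on to Splash itself.
SPLASH_PROXY = {
    "host": os.getenv("PROXY_HOST", "t.16yun.cn"),
    "port": int(os.getenv("PROXY_PORT", "31111")),
    "username": os.getenv("PROXY_USERNAME", "user"),
    "password": os.getenv("PROXY_PASSWORD", "password"),
}

DOWNLOADER_MIDDLEWARES = {
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}

# SplashDeduplicateArgsMiddleware is a spider middleware, not a downloader middleware.
SPIDER_MIDDLEWARES = {
    "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
}

DUPEFILTER_CLASS = "scrapy_splash.SplashAwareDupeFilter"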

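Splash itself acts as an intermediate proxy layer: it receives the rendering requests and forwards them to the crawler proxy:

Scrapy → Splash (renders JS) → 16Yun Crawler Proxy → target site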
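
One way to hand the upstream proxy to Splash is the proxy argument of its render endpoints, which accepts a proxy URL of the form [protocol://][user:password@]proxyhost[:port]. The sketch below turns the SPLASH_PROXY dictionary from settings.py into such an argument set. The helper name splash_args and the 0.5 second wait are assumptions made for illustration, and scrapy-splash does not appear to read SPLASH_PROXY on its own, so the spider has to pass the result explicitly.

```python
from urllib.parse import quote


def splash_args(proxy, wait=0.5):
    """Build Splash render arguments that route traffic through an upstream proxy.

    `proxy` is a dict with host, port, username and password keys,
    shaped like SPLASH_PROXY in settings.py (hypothetical helper).
    """
    # Percent-encode credentials so characters such as "@" or ":" keep the URL valid.
    user = quote(proxy["username"], safe="")
    pwd = quote(proxy["password"], safe="")
    proxy_url = f"http://{user}:{pwd}@{proxy['host']}:{proxy['port']}"
    return {"wait": wait, "proxy": proxy_url}


# Possible use in a spider:
#     from scrapy_splash import SplashRequest
#     args = splash_args(self.settings.getdict("SPLASH_PROXY"))
#     yield SplashRequest(url, self.parse, args=args)
```

Passing the proxy per request keeps the Splash container itself free of credentials. A Splash proxy profile configured on the server is an alternative when every request should use the same upstream proxy.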

Use Case Comparison

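Framework | Use case | Proxy configuration
feapder | Lightweight, rapid development | Injected in download_midware
ScrapySplash | Pages that require JS rendering | Upstream proxy configured on the Splash side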
