Issue Playbook: Translating the docs in Proxy Routing and Health Scoring

Focuses on proxy health scoring, route correction, and stable lanes, with a practical Scrapy routing and metrics implementation.

16Yun Engineering Team | Mar 9, 2026 | 3 min read

Context and Problem Definition

This article targets one concrete production failure: distorted proxy health scores that cause wrong routing. It is not broad guidance but an executable Scrapy fix path with clear acceptance and rollback boundaries.

Typical symptom: high-priority jobs are routed to high-latency or high-ban nodes, which degrades the stable lane and puts SLAs at risk. The root cause is usually inconsistent routing policy, not just proxy quality.

Issue signals used for this exact problem:

  • scrapy/scrapy#747: Support for socks5 proxy (comments: 54)
  • scrapy/scrapy#7060: Fix flaky test_download_with_proxy_https_timeout() (comments: 25)
  • scrapy-plugins/scrapy-splash#99: Proxy connection is being refused (comments: 15)

External evidence supplements (used only when local evidence has gaps):

  • None; the local issue evidence had no gaps.

Insight Framework

  • Static-threshold routing cannot represent time-varying proxy quality.
  • Scoring model must combine latency, error rate, block signal, and freshness.
  • Health score must be tied to traffic priority to avoid premium-node misuse.

Method Path

  1. Build EWMA health scores with minute-level decay.
  2. Split proxy pool into stable lane and exploration lane.
  3. Allow high-priority traffic only on stable lane.
  4. Run 60-second probe jobs to refresh scores and trigger route correction.
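Step 1 asks for minute-level decay, whereas the `compute_health_score` snippet later in this article uses a fixed alpha per sample, so its decay depends on how often probes run. One common way to make decay depend on elapsed time instead is to derive alpha from the gap since the previous sample, alpha = 1 - exp(-dt / tau). This is a minimal sketch assuming tau = 60 seconds; `decayed_alpha` and `ewma_update` are hypothetical helpers, not part of Scrapy.

```python
import math

# Assumption: a 60-second time constant, i.e. "minute-level" decay.
TAU_SECONDS = 60.0


def decayed_alpha(elapsed_s: float, tau_s: float = TAU_SECONDS) -> float:
    """Weight of a new sample arriving elapsed_s seconds after the previous one."""
    if elapsed_s <= 0:
        return 0.0  # same-timestamp duplicate: give it no weight
    return 1.0 - math.exp(-elapsed_s / tau_s)


def ewma_update(prev_score: float, sample: float, elapsed_s: float) -> float:
    """Blend a fresh 0-100 sample into the previous score with time-based decay."""
    alpha = decayed_alpha(elapsed_s)
    return alpha * sample + (1.0 - alpha) * prev_score
```

With tau = 60 s, a sample arriving one minute after the last one carries about 63% of the weight (1 - e^-1), while a sample arriving after 10 seconds carries about 15%, so bursts of probes cannot swing the score faster than real time allows.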

Architecture and Data Flow

Ingress Queue -> Route Selector -> Stable Lane / Explore Lane
                    |                   |
                    v                   v
              Score Engine <----- Probe Scheduler
                    |
                    v
               Correction Planner
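The Probe Scheduler in this flow has to refresh every node's score on a fixed cadence. The sketch below is a hypothetical extension of the `probe_cycle` snippet later in this article: it probes all proxies concurrently with asyncio and caps each probe at the refresh interval, so one slow node cannot push a cycle past the 60-second constraint. The `probe` coroutine, the `ProbedProxy` class, and the neutral starting score of 50 are assumptions; counting a failed or timed-out probe as a 0 sample is a design choice, not something this playbook prescribes.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class ProbedProxy:
    url: str
    score: float = 50.0  # assumed neutral starting score for new nodes


async def probe_loop(pool, probe, interval_s=60.0, alpha=0.25, cycles=None):
    """Refresh every proxy's score once per interval, probing all proxies concurrently.

    probe(proxy) is a caller-supplied coroutine returning an instant 0-100 score.
    Each probe is capped at interval_s; failed or timed-out probes count as a 0 sample.
    Pass cycles=None to run forever, or an integer to stop after that many cycles.
    """
    loop = asyncio.get_running_loop()
    done = 0
    while cycles is None or done < cycles:
        started = loop.time()
        tasks = [asyncio.wait_for(probe(p), timeout=interval_s) for p in pool]
        results = await asyncio.gather(*tasks, return_exceptions=True)
        for proxy, result in zip(pool, results):
            instant = 0.0 if isinstance(result, BaseException) else float(result)
            # Same EWMA form as compute_health_score (alpha = 0.25 by default).
            proxy.score = alpha * instant + (1 - alpha) * proxy.score
        done += 1
        if cycles is None or done < cycles:
            # Sleep for the remainder of the interval before the next cycle.
            await asyncio.sleep(max(0.0, interval_s - (loop.time() - started)))
```

In production this would run as a long-lived task, e.g. `asyncio.run(probe_loop(pool, probe=my_probe))`, where `my_probe` is your own probe coroutine.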

Operational constraints:

  • Score refresh interval must stay within 60 seconds.
  • High-priority jobs cannot enter nodes below threshold.
  • Exploration traffic ratio must be hard-limited to protect stable lane.
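The hard cap on exploration traffic can be enforced with a simple counter-based lane picker. In this sketch, high-priority requests always go to the stable lane, and any other request goes to the exploration lane only if doing so keeps the running exploration share of all routed requests at or below the cap (0.12, matching EXPLORE_TRAFFIC_RATIO in the configuration matrix). `LaneRouter` is a hypothetical class; a production version would likely use a rolling window instead of lifetime counters.

```python
from dataclasses import dataclass


@dataclass
class LaneRouter:
    """Hypothetical router that hard-limits the share of exploration traffic."""

    explore_ratio: float = 0.12  # EXPLORE_TRAFFIC_RATIO from the configuration matrix
    total: int = 0               # all requests routed so far
    explored: int = 0            # requests sent to the exploration lane

    def pick_lane(self, priority: str) -> str:
        self.total += 1
        # High-priority traffic never leaves the stable lane.
        if priority == "high":
            return "stable"
        # Explore only while the running share stays at or below the cap.
        if (self.explored + 1) / self.total <= self.explore_ratio:
            self.explored += 1
            return "explore"
        return "stable"
```

Because the check runs before every exploration decision, the exploration share can never exceed the cap, no matter how much normal traffic arrives.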

Configuration Matrix

Config                | Recommended Value | Why                               | Bad Pattern
----------------------|-------------------|-----------------------------------|-------------------------------
HEALTH_EWMA_ALPHA     | 0.25              | balance fresh samples and history | only use last request
LATENCY_WEIGHT        | 0.35              | include latency in core score     | rank by success rate only
ERROR_WEIGHT          | 0.40              | penalize failure aggressively     | equal weights for all signals
BAN_WEIGHT            | 0.25              | capture anti-bot blocks           | ignore 403/429 signals
STABLE_LANE_THRESHOLD | 78                | protect high-priority quality     | same threshold for all traffic
EXPLORE_TRAFFIC_RATIO | 0.12              | discover new nodes continuously   | no cap on exploration
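The matrix defines three signal weights that sum to 1.0, while the `compute_health_score` snippet below applies fixed per-signal penalties instead. If you want the score to follow the matrix directly, one option is a weighted penalty over normalized signals, with freshness folded in as a hard cutoff. This is a sketch only: the latency ceiling (`LATENCY_CEILING_MS = 3000`) and the freshness cutoff (`MAX_SAMPLE_AGE_S = 120`) are illustrative assumptions, not values from this playbook, and `ScoreConfig` is a hypothetical container.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoreConfig:
    ewma_alpha: float = 0.25        # HEALTH_EWMA_ALPHA
    latency_weight: float = 0.35    # LATENCY_WEIGHT
    error_weight: float = 0.40      # ERROR_WEIGHT
    ban_weight: float = 0.25        # BAN_WEIGHT
    stable_threshold: float = 78.0  # STABLE_LANE_THRESHOLD
    explore_ratio: float = 0.12     # EXPLORE_TRAFFIC_RATIO

    def __post_init__(self):
        # Reject configs where the signal weights drift away from a sum of 1.0.
        total = self.latency_weight + self.error_weight + self.ban_weight
        if abs(total - 1.0) > 1e-9:
            raise ValueError(f"signal weights must sum to 1.0, got {total}")
        if not 0.0 < self.ewma_alpha <= 1.0:
            raise ValueError("ewma_alpha must be in (0, 1]")


# Assumptions for illustration only.
LATENCY_CEILING_MS = 3000.0  # latency at or above this counts as the worst case
MAX_SAMPLE_AGE_S = 120.0     # older samples are treated as unusable


def _clamp01(x: float) -> float:
    return min(max(x, 0.0), 1.0)


def weighted_instant_score(cfg, latency_ms, error_rate, ban_rate, sample_age_s):
    """Return a 0-100 instant score from normalized, weighted penalties."""
    if sample_age_s > MAX_SAMPLE_AGE_S:
        return 0.0  # stale data must not keep a node in rotation
    penalty = (
        cfg.latency_weight * _clamp01(latency_ms / LATENCY_CEILING_MS)
        + cfg.error_weight * _clamp01(error_rate)
        + cfg.ban_weight * _clamp01(ban_rate)
    )
    return round(100.0 * (1.0 - penalty), 2)
```

The result can be fed into the EWMA step as the instant sample. For example, 1500 ms latency, a 10% error rate, and no bans give a penalty of 0.35 * 0.5 + 0.40 * 0.1 = 0.215, an instant score of 78.5, which sits just above the stable-lane threshold.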

Key Code Snippets

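# routing/health_score.py
HEALTH_EWMA_ALPHA = 0.25  # weight of the newest sample (see Configuration Matrix)

def compute_health_score(latency_ms, error_rate, ban_rate, prev_score):
    # Instant score: start at 100 and subtract latency, error, and ban penalties.
    instant = 100 - (latency_ms * 0.03) - (error_rate * 40) - (ban_rate * 50)
    instant = max(0, min(100, instant))  # clamp to [0, 100]
    # EWMA: blend the fresh sample with the previous score.
    alpha = HEALTH_EWMA_ALPHA
    return round(alpha * instant + (1 - alpha) * prev_score, 2)
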
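# routing/selector.py
STABLE_LANE_THRESHOLD = 78  # minimum score for high-priority traffic

def select_proxy(candidates, priority: str):
    # High-priority jobs may only use stable-lane nodes (score >= threshold).
    if priority == "high":
        lane = [p for p in candidates if p.score >= STABLE_LANE_THRESHOLD]
    else:
        lane = list(candidates)
    # Pick the best-scoring node without sorting (and mutating) the caller's list.
    return max(lane, key=lambda p: p.score) if lane else None
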
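# routing/probe_scheduler.py
from routing.health_score import compute_health_score

async def probe_cycle(pool):
    # Refresh every proxy's score once per cycle; schedule this every 60 seconds.
    # run_probe(proxy) is not shown here; it must return latency_ms, error_rate, ban_rate.
    for proxy in pool:
        metrics = await run_probe(proxy)
        proxy.score = compute_health_score(
            latency_ms=metrics.latency_ms,
            error_rate=metrics.error_rate,
            ban_rate=metrics.ban_rate,
            prev_score=proxy.score,
        )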
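The snippets above do not yet touch Scrapy itself. One way to wire the selection rule in is a downloader middleware that sets `request.meta["proxy"]`, the per-request key that Scrapy's built-in HttpProxyMiddleware honors. This is a sketch under assumptions: `HealthRoutingMiddleware`, `Proxy`, the `route_priority` meta key, and the `myproject.middlewares` path are hypothetical, and the block deliberately avoids importing Scrapy so it runs standalone.

```python
from dataclasses import dataclass, field


@dataclass
class Proxy:
    url: str      # e.g. "http://user:pass@host:port" (hypothetical)
    score: float  # current EWMA health score, 0-100


@dataclass
class HealthRoutingMiddleware:
    """Hypothetical Scrapy downloader middleware that picks a proxy by health score.

    Enable it with an order below HttpProxyMiddleware (750 by default), e.g.
    DOWNLOADER_MIDDLEWARES = {"myproject.middlewares.HealthRoutingMiddleware": 740}.
    The pool is assumed to be kept up to date by the probe scheduler.
    """

    pool: list = field(default_factory=list)
    stable_threshold: float = 78.0  # STABLE_LANE_THRESHOLD
    misses: int = 0                 # high-priority requests with no eligible node

    def process_request(self, request, spider=None):
        # "route_priority" is an assumed meta key set by the spider, not a Scrapy built-in.
        priority = request.meta.get("route_priority", "normal")
        if priority == "high":
            lane = [p for p in self.pool if p.score >= self.stable_threshold]
        else:
            lane = self.pool
        if not lane:
            # A real project might raise scrapy.exceptions.IgnoreRequest here.
            self.misses += 1
            return None
        best = max(lane, key=lambda p: p.score)
        request.meta["proxy"] = best.url  # consumed by HttpProxyMiddleware
        return None  # let Scrapy continue processing the request
```

Keeping the lane rule in a single middleware gives the "single policy entry" that the rollout checklist asks for, instead of spreading proxy choice across spiders.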

Failure Cases and Troubleshooting

Failure scenario: after a routing upgrade, static success-rate ranking incorrectly promoted short-lived, low-latency nodes.

Troubleshooting sequence:

  1. Validate that score inputs include ban_rate and data freshness.
  2. Ensure high-priority jobs are restricted to the stable lane.
  3. Check whether exploration traffic has overflowed its ratio cap and is polluting the stable lane.
  4. Compare wrong_route_ratio and SLA pass rate before/after correction.
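Step 4 needs a concrete definition of wrong_route_ratio, which this playbook does not spell out. This sketch assumes it is the share of high-priority requests dispatched to a node whose score was below STABLE_LANE_THRESHOLD at routing time, and that each request record carries a `met_sla` flag computed by your own SLA rule. `RouteRecord`, `wrong_route_ratio`, `sla_pass_rate`, and `compare` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class RouteRecord:
    priority: str             # "high" or "normal"
    score_at_dispatch: float  # node health score when the request was routed
    met_sla: bool             # whether the request met its SLA (your own definition)


def _high(records: Sequence[RouteRecord]) -> list:
    return [r for r in records if r.priority == "high"]


def wrong_route_ratio(records: Sequence[RouteRecord], threshold: float = 78.0) -> float:
    """Share of high-priority requests routed to a node below the stable-lane threshold."""
    high = _high(records)
    if not high:
        return 0.0
    return sum(r.score_at_dispatch < threshold for r in high) / len(high)


def sla_pass_rate(records: Sequence[RouteRecord]) -> float:
    """Share of high-priority requests that met their SLA."""
    high = _high(records)
    if not high:
        return 0.0
    return sum(r.met_sla for r in high) / len(high)


def compare(before: Sequence[RouteRecord], after: Sequence[RouteRecord]) -> dict:
    """(before, after) pairs for the two metrics in troubleshooting step 4."""
    return {
        "wrong_route_ratio": (wrong_route_ratio(before), wrong_route_ratio(after)),
        "sla_pass_rate": (sla_pass_rate(before), sla_pass_rate(after)),
    }
```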

Performance Metrics and Load Testing

Load tests should cover baseline, peak, and anti-bot escalation profiles.

Acceptance thresholds:

  • wrong_route_ratio <= 2%
  • high_priority_success_rate >= 95%
  • latency_p95 <= 1.9s
  • proxy_switch_jitter reduced by >= 30%
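The thresholds above can be turned into an automated gate for the load-test pipeline. This sketch assumes proxy_switch_jitter is a single number per run where lower is better, measured on a baseline run and again after the change; how jitter is computed is not defined in this playbook. `LoadTestResult` and `failed_thresholds` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoadTestResult:
    wrong_route_ratio: float           # 0.0-1.0
    high_priority_success_rate: float  # 0.0-1.0
    latency_p95_s: float               # seconds
    switch_jitter_before: float        # proxy_switch_jitter of the baseline run
    switch_jitter_after: float         # proxy_switch_jitter after the change


def failed_thresholds(r: LoadTestResult) -> list:
    """Return the acceptance thresholds this run misses (an empty list means pass)."""
    failures = []
    if r.wrong_route_ratio > 0.02:
        failures.append("wrong_route_ratio <= 2%")
    if r.high_priority_success_rate < 0.95:
        failures.append("high_priority_success_rate >= 95%")
    if r.latency_p95_s > 1.9:
        failures.append("latency_p95 <= 1.9s")
    # Jitter must drop by at least 30% relative to the baseline run.
    if r.switch_jitter_before <= 0:
        failures.append("proxy_switch_jitter baseline missing")
    elif (r.switch_jitter_before - r.switch_jitter_after) / r.switch_jitter_before < 0.30:
        failures.append("proxy_switch_jitter reduced by >= 30%")
    return failures
```

Running this against each of the baseline, peak, and anti-bot escalation profiles gives one pass/fail list per profile, which maps directly onto the "regression load test passed all thresholds" rollout item.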

Vendor Comparison and 16Yun Positioning

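Only capabilities relevant to this issue are listed here:

  • API Proxy: whitelist management, RESTful API, multiple billing models
  • Crawler Tunnel Proxy: cross-IDC architecture, millisecond-level detection, automatic IP switching
  • Dedicated Proxy: dedicated exclusive IPs, strong security isolation, low-latency response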

This issue requires proxy orchestration that is observable and correctable; a mix of 16Yun tunnel and dedicated proxies better supports this kind of layered routing.

Rollout Checklist

  • single policy entry implemented in middleware
  • key configs split by priority and environment
  • regression load test passed all thresholds
  • rollback can be executed within 10 minutes
  • alerts set for 429/403/latency thresholds
  • change audit log recorded for this rollout
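For the alerting item, one simple option is a sliding-window check over recent responses. This sketch assumes a 60-second window, a 5% ceiling on the combined 403/429 share (an illustrative value, not from this playbook), and reuses the 1.9 s latency_p95 acceptance threshold; p95 uses the nearest-rank method. `BlockRateAlert` is a hypothetical class that you would feed from your own Scrapy signal handler or middleware.

```python
from collections import deque


class BlockRateAlert:
    """Hypothetical sliding-window alert on the 403/429 share and p95 latency."""

    def __init__(self, window_s=60.0, max_block_rate=0.05, max_latency_s=1.9, min_samples=20):
        self.window_s = window_s
        self.max_block_rate = max_block_rate  # assumption; tune per target domain
        self.max_latency_s = max_latency_s    # mirrors the latency_p95 acceptance threshold
        self.min_samples = min_samples        # avoid alerting on tiny samples
        self.events = deque()                 # (timestamp_s, http_status, latency_s)

    def record(self, now, status, latency_s):
        self.events.append((now, status, latency_s))
        # Drop events that have fallen out of the window.
        while self.events and self.events[0][0] < now - self.window_s:
            self.events.popleft()

    def check(self):
        n = len(self.events)
        if n < self.min_samples:
            return []
        alerts = []
        blocked = sum(1 for _, status, _ in self.events if status in (403, 429))
        if blocked / n > self.max_block_rate:
            alerts.append(f"block rate {blocked / n:.1%} over {self.max_block_rate:.0%}")
        latencies = sorted(lat for _, _, lat in self.events)
        p95 = latencies[(95 * n + 99) // 100 - 1]  # nearest-rank p95 via integer math
        if p95 > self.max_latency_s:
            alerts.append(f"latency p95 {p95:.2f}s over {self.max_latency_s}s")
        return alerts
```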
