Per-endpoint proxy strategy, fallbacks, external APIs, and image-cache pipeline — derived from current code, not docs that drift.
| Endpoint | Default Proxy | Fallback / Alternative | Implementation Details |
|---|---|---|---|
| Place Search · /api/place-search | Zyte (L3) | Direct if scrapeUrl fails; Auto L1→L3 via autocomplete cache check | proxyLevel: 3, minProxyLevels: 3; GoogleMapHandler / GoogleMapCidHandler; Fallback: Direct fetch with User-Agent header |
| Place Autocomplete · /api/place-autocomplete | Auto (L1→L3) | None (retry on failure) | proxyLevel: 1, maxProxyLevels: 3; GoogleMapAutocompleteHandler; Tries Direct first, escalates to Zyte if blocked |
| Web Search (SERP) · /api/web-search | Auto (L1→L3) | Multiple DOM parsers; Deep crawl with configurable proxy | minProxyLevels: 1, maxProxyLevels: 3; GoogleHandler via SerpService; Configurable via ?maxProxyLevels= param |
| Image Search · /api/image-search | Zyte (L3) | Wikimedia; Unsplash/Pexels | proxyLevel: 3 (configurable via ?level=); GoogleImageHandler; Falls back to Wikimedia Commons, Unsplash, Pexels |
| Google Lens · /api/google/lens | Zyte Browser | None (direct Zyte API) | Zyte API with browserHtml: true; GoogleLensHandler (direct API, not scrapeUrl); Uses Zyte actions API for page navigation |
| YouTube Transcript · /api/youtube/transcript | ScrapeCreators | Zyte (L3) + Client Rotation; IOS → ANDROID → WEB clients | Primary: ScrapeCreators API (no proxy); Fallback: youtube-transcriptplus + Zyte; Controlled by YOUTUBE_TRANSCRIPT_PROVIDER env (default: scrapecreators) |
| YouTube Detail · /api/youtube/detail | Zyte (L3) | ScrapeCreators if parsing fails; Multiple HTML parsers | proxyLevel: 3 (configurable via ?proxyLevel=); HTML parsing: ld+json, meta tags, ytInitialData, ytInitialPlayerResponse; If parsing fails (no title/author): Fallback to ScrapeCreators /youtube/video API |
| YouTube Search · /api/youtube/search | Zyte (L3) | Handler auto-detection | proxyLevel: 3 (configurable via ?proxyLevel=); YoutubeHandler via scrapeUrl; Block detection disabled for speed |
| Hotels · /api/hotels | Zyte (L3) | Zyte for image enrichment | proxyLevel: 3, minProxyLevels: 3; HotelService + GoogleHotelSearchHandler; Image enrichment via Google Image Search |
| Flights · /api/flights/calendar | Zyte (L3) | Travel service fallbacks | proxyLevel: 3; FlightService via scrapeUrl; Google Flights scraping |
| Amazon Image · /api/amazon/image-search | Zyte (L3) | Amazon service fallbacks | proxyLevel: 3; AmazonHandler via HttpServiceWithZyte; Product Advertising API integration |
| Amazon Search · /api/amazon/search | Zyte (L3) | Configurable via ?proxies= | proxyLevel: 3 (configurable via ?proxies=); AmazonHandler via HttpServiceWithZyte; Supports named proxy chain: Direct,Private,Zyte |
| Instagram · /api/instagram/search | ZyteJS | Configurable: Zyte, ZyteJS, Browserless, Residential | proxy: 'ZyteJS' (configurable via ?proxy=); InstagramHandler via createProxyService; Options: Zyte, ZyteJS, Browserless, Residential, Private, Direct |
| TikTok Tag Detail · /api/tiktok/challenge-detail | Direct (Free) | None needed | No proxy required; TikTokInternalApiService (free endpoint); Direct API call without any proxy, returns challengeId + stats |
| TikTok Hashtag Capture · /api/tiktok/posts-capture | Browserless | Zyte (browser); Configurable via ?proxy= | proxy: 'Browserless' (configurable); TikTokService with networkCapture on item_list; Browser navigates /tag/{tag}, scrolls, intercepts XHR responses for POI data |
| TikTok Keyword Search (NEW) · /api/tiktok/keyword-search | Direct (worker IP) | Residential; Zyte via ?proxy= | Direct hit on tiktok.com/api/search/general/full/; TikTokInternalApiService.searchByKeyword: no browser, ~1.5s/call; Requires R2 auth blob (cookies + msToken/X-Bogus/X-Gnarly/verifyFp), which replays the per-session sig across arbitrary keywords; Image covers cached via wsrv.nl → R2 with variant fallback (cover → originCover → dynamicCover) |
| TikTok Refresh Auth (NEW) · /api/tiktok/refresh-auth (POST) | N/A | None (R2 write only) | Admin endpoint that refreshes the R2 auth blob; Accepts a cURL paste (text/plain) or { cookies, sig } JSON; Writes to _config/tiktok-auth.json in R2; Call when /keyword-search starts returning 503 (sig expired); 60s in-worker mem cache |
| TikTok Internal API · /api/tiktok/internal-api | Residential | Configurable via ?proxy= | EXPERIMENTAL: hashtag pagination via challenge/item_list; Tries direct challenge-detail first, falls back to tag-page fetch via proxy; Posts may require valid X-Bogus signatures (less reliable than keyword-search path) |
| TikTok Video Detail · /api/tiktok/video · /api/tiktok/video-detail | Auto (L1→L3) | tikwm.com for video URL (stage 0); ZyteJS for HTML scrape | Multi-stage: tikwm → direct API → proxy chain → full browser; Parses `__UNIVERSAL_DATA_FOR_REHYDRATION__` (no JS rendering needed); 88% cost reduction vs always using browser; Includes itemStruct + POI + transcript |
| TikTok Video Proxy · /api/tiktok/proxy-video | Direct | None | Streams video bytes with proper Referer/Origin; Supports Range requests (HTTP 206), inline vs attachment via ?dl=1; Bypasses TikTok/tikwm hotlink protection via dynamic Referer detection |
| TikTok Cache Flush · /api/tiktok/flush-cache | N/A | None (R2 delete) | Utility that clears cached results; Accepts ?hashtag= (hashtag + posts-capture cache) or ?q= (keyword-search cache); Walks R2 prefix list for tiktok-hashtag_, tiktok-capture_, tiktok-search_ |
| Google Shorts (cross-platform) (NEW) · /api/google-shorts/search | Auto (L1→L3) | Per-platform site: filter; tiktok / instagram / youtube | Google Short Videos tab (udm=39) crawl; ?source=tiktok\|instagram\|youtube · ?pages=1-5; Test UI has an "All" pill that fans out to all three sources in parallel; Supersedes the removed /api/tiktok/search-google |
| Page Extract · /api/page/extract | Auto (L1→L3) | Configurable via params | Default: ['Direct', 'Private', 'Zyte']; Generic handler via scrapeUrl; Params: ?proxyLevel=, ?maxProxyLevels=, ?javascriptRendering= |
| Crawler Scrape · /api/crawler/scrape | Auto (L1→L3) | Configurable via params | Default: ['Direct', 'Private', 'Zyte']; Direct scrapeUrl call; Params: ?proxyLevel=, ?javascript=, ?cache=, ?retry= |

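
Several rows accept a named proxy or proxy chain through the query string (for example ?proxies=Direct,Private,Zyte on Amazon Search). The sketch below shows one way such a value could be turned into an ordered chain. The helper name parseProxyChain, the case-insensitive matching, and the drop-unknown-names behaviour are all assumptions for illustration and are not taken from the code base; only the proxy names come from the table.

```typescript
// Proxy names as listed in the Instagram row's option list.
const KNOWN_PROXIES = ["Direct", "Private", "Zyte", "ZyteJS", "Browserless", "Residential"] as const;
type ProxyName = (typeof KNOWN_PROXIES)[number];

// Hypothetical helper: parse a "?proxies=" value into an ordered chain.
// Assumptions: names match case-insensitively, unknown names are dropped,
// duplicates collapse, and an empty result falls back to the given default.
function parseProxyChain(raw: string | null | undefined, fallback: ProxyName[]): ProxyName[] {
  if (!raw) return fallback;
  const chain: ProxyName[] = [];
  for (const part of raw.split(",")) {
    const wanted = part.trim().toLowerCase();
    const match = KNOWN_PROXIES.find((name) => name.toLowerCase() === wanted);
    if (match !== undefined && chain.indexOf(match) === -1) chain.push(match);
  }
  return chain.length > 0 ? chain : fallback;
}
```

For instance, parseProxyChain("Direct,Private,Zyte", ["Zyte"]) keeps the three names in the order given, while a missing parameter yields the fallback chain unchanged.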
L0/L1 (Direct): No proxy, direct HTTP request. Fastest but often blocked.
L2 (Private): Basic proxy rotation for light protection bypass.
L3 (Zyte): Zyte Smart Proxy Manager for anti-bot bypass.
L4 (Residential): Residential IPs for maximum stealth.
Browserless/JS: Headless browser rendering for JavaScript-heavy sites.
Auto-Escalate (L1→L3): Tries Direct first, escalates to Private then Zyte when blocked.
API Fallback: If scraping/parsing fails, fallback to external API (e.g., ScrapeCreators).
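
The auto-escalation described in the legend can be sketched as a small planner plus a loop. This is a minimal illustration, not the actual scrapeUrl logic: it assumes proxyLevel is the starting level and maxProxyLevels the highest level to try, ignores minProxyLevels (whose exact semantics the table does not spell out), and uses a hypothetical attempt callback in place of a real fetch.

```typescript
// L1, L2, L3 in the order given by the legend.
const LEVELS = ["Direct", "Private", "Zyte"] as const;
type Level = (typeof LEVELS)[number];

// Assumed semantics: start at proxyLevel, stop at maxProxyLevels (both 1-based).
function planEscalation(proxyLevel: number, maxProxyLevels: number): Level[] {
  const start = Math.min(Math.max(Math.trunc(proxyLevel), 1), LEVELS.length);
  const end = Math.min(Math.max(Math.trunc(maxProxyLevels), start), LEVELS.length);
  return LEVELS.slice(start - 1, end);
}

// Hypothetical result shape for one fetch attempt through one proxy.
interface Attempt {
  ok: boolean;
  blocked: boolean;
  body?: string;
}

// Walk the chain in order and return the first unblocked, successful attempt.
async function fetchWithEscalation(
  url: string,
  chain: Level[],
  attempt: (url: string, via: Level) => Promise<Attempt>,
): Promise<{ via: Level; body: string } | null> {
  for (const via of chain) {
    const result = await attempt(url, via);
    if (result.ok && !result.blocked) return { via, body: result.body ?? "" };
  }
  return null; // every level was blocked or failed
}
```

Under these assumptions, planEscalation(1, 3) yields the full Direct, Private, Zyte chain used by the Auto (L1→L3) rows, and planEscalation(3, 3) yields Zyte only, matching the rows pinned to L3.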
ScrapeCreators: YouTube transcripts (primary, controlled by YOUTUBE_TRANSCRIPT_PROVIDER) and YouTube Detail fallback (/youtube/video API)
Wikimedia Commons: Image search fallback
Unsplash/Pexels: Premium image fallback
Zyte API: Proxy service for anti-bot bypass
Browserless: Headless browser as a service
wsrv.nl (a.k.a. images.weserv.nl): Free public image proxy + transcoder for hosts that block our worker IP (TikTok CDN, Google /proxy/, etc.)
tikwm.com: Third-party TikTok video URL extractor (Stage 0 of video-detail multi-stage chain)
Serper: Google web/video search API
TikTok CDN (p16-/p19-common-sign.tiktokcdn-us.com) returns 403 to every server-side IP we tested: Cloudflare worker IPs, our Residential proxy, and Zyte's HTTP proxy. wsrv.nl is the only path we found that can fetch these signed URLs.
Flow per cover/avatar:
1. cacheImage() hashes the original tiktokcdn URL → media/cached-image_<sha256>.jpg
2. If R2 has it → return cached key (sub-100ms).
3. Miss → fetch via wsrv.nl/?url=<url>&w=400&output=webp&q=85 (width-only, preserves 9:16 portrait aspect).
4. Store webp to R2 (~30-50KB per cover).
5. Resolve bare R2 key to full URL at response boundary: https://cache.contextforce.com/... in prod, http://localhost:8787/api/cached-image/... in dev.
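
The five steps above can be sketched as follows. This is an illustration under stated assumptions, not the real cacheImage(): the CacheDeps bundle (sha256Hex, has, put, fetchBytes) is a hypothetical stand-in for Web Crypto, the R2 bucket, and the network so the sketch runs anywhere, and resolveKey takes the base URL as a parameter because the exact path after the host is not given here.

```typescript
// Hypothetical dependency bundle standing in for hashing, R2, and the network.
interface CacheDeps {
  sha256Hex(input: string): Promise<string>;
  has(key: string): Promise<boolean>; // stand-in for an R2 existence check
  put(key: string, bytes: Uint8Array): Promise<void>;
  fetchBytes(url: string): Promise<Uint8Array | null>; // null when the fetch does not succeed
}

// Step 3: width-only resize keeps the portrait aspect ratio.
function wsrvUrl(originalUrl: string): string {
  return "https://wsrv.nl/?url=" + encodeURIComponent(originalUrl) + "&w=400&output=webp&q=85";
}

async function cacheImage(originalUrl: string, deps: CacheDeps): Promise<string | null> {
  // Step 1: key is derived from the hash of the original URL.
  const key = "media/cached-image_" + (await deps.sha256Hex(originalUrl)) + ".jpg";
  // Step 2: cache hit, nothing to fetch.
  if (await deps.has(key)) return key;
  // Step 3: cache miss, fetch through wsrv.nl.
  const bytes = await deps.fetchBytes(wsrvUrl(originalUrl));
  if (bytes === null) return null;
  // Step 4: store the transcoded bytes under the bare key.
  await deps.put(key, bytes);
  return key;
}

// Step 5: turn the bare key into a full URL at the response boundary.
// How the key is joined to the base is an assumption.
function resolveKey(key: string, baseUrl: string): string {
  return baseUrl.replace(/\/+$/, "") + "/" + key;
}
```

Returning the bare key and resolving it only at the response boundary keeps stored values independent of the environment, so the same R2 entries serve both prod and dev.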
Variant fallback chain (keyword-search only): TikTok issues three independently signed cover URLs per post: cover, originCover, and dynamicCover. wsrv intermittently returns 404 for URLs routed via Fastly (p19) due to its egress-IP reputation. We try all three variants concurrently per post and keep the first one for which wsrv returns 200. This raised the cover hit rate from ~70% to ~100%.
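
A minimal sketch of the "try all variants concurrently, keep the first success" idea follows. The names Covers, firstNonNull, and cacheFirstCover are hypothetical, and the cacheOne callback is assumed to resolve to an R2 key on success and to null (or reject) on failure; the real implementation may differ.

```typescript
// The three cover variants named above; any of them may be missing on a post.
interface Covers {
  cover?: string;
  originCover?: string;
  dynamicCover?: string;
}

// Resolve with the first non-null result; resolve null only once every task has failed.
function firstNonNull<T>(tasks: Array<Promise<T | null>>): Promise<T | null> {
  return new Promise((resolve) => {
    let pending = tasks.length;
    if (pending === 0) {
      resolve(null);
      return;
    }
    const miss = (): void => {
      pending -= 1;
      if (pending === 0) resolve(null);
    };
    for (const task of tasks) {
      // A rejection counts as a miss, the same as a null result.
      task.then((value) => (value !== null ? resolve(value) : miss()), miss);
    }
  });
}

// Start all variant fetches at once and keep whichever succeeds first.
function cacheFirstCover(
  post: Covers,
  cacheOne: (url: string) => Promise<string | null>,
): Promise<string | null> {
  const urls = [post.cover, post.originCover, post.dynamicCover].filter(
    (u): u is string => typeof u === "string" && u.length > 0,
  );
  return firstNonNull(urls.map((u) => cacheOne(u)));
}
```

Running the variants concurrently instead of one after another means a slow or failing variant does not delay the post, at the cost of up to three wsrv requests per cover.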
Refresh cycle: the msToken in the keyword-search R2 auth blob rotates server-side (TTL unknown; observed to last days). When /keyword-search starts returning 503 {authExpired:true}, POST a fresh cURL paste (DevTools → Network → search/general/full → Copy as cURL) to /api/tiktok/refresh-auth.
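
A client-side check for the expired-auth signal, plus the shape of the refresh request, might look like the sketch below. It assumes the 503 body is JSON with an authExpired flag, as the note above suggests. The helper names and the RefreshRequest shape are hypothetical, and the caller is expected to prepend its own host to the path.

```typescript
// True only for a 503 whose JSON body carries authExpired: true.
function isAuthExpired(status: number, bodyText: string): boolean {
  if (status !== 503) return false;
  try {
    const parsed: unknown = JSON.parse(bodyText);
    return (
      typeof parsed === "object" &&
      parsed !== null &&
      (parsed as { authExpired?: unknown }).authExpired === true
    );
  } catch {
    return false; // non-JSON 503 bodies are treated as ordinary failures
  }
}

// Hypothetical description of the refresh call; pass it to whatever HTTP client is in use.
interface RefreshRequest {
  path: string;
  method: "POST";
  headers: { "content-type": string };
  body: string;
}

// The endpoint accepts the raw cURL paste as text/plain.
function buildRefreshRequest(curlPaste: string): RefreshRequest {
  return {
    path: "/api/tiktok/refresh-auth",
    method: "POST",
    headers: { "content-type": "text/plain" },
    body: curlPaste.trim(),
  };
}
```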