Bug Description
The perplexity_research tool (which uses sonar-deep-research model) fails with TypeError: fetch failed for any non-trivial query that triggers actual multi-step research.
Root Cause
performChatCompletion() in server.js sends requests without stream: true. For fast models (sonar, sonar-reasoning-pro), this works fine because they respond within seconds. However, sonar-deep-research performs multi-step web research that can take minutes. Without streaming:
- The API receives the request and begins deep research
- The HTTP connection sits idle while the model works
- Cloudflare drops the connection at exactly 60 seconds of inactivity
- The client receives SocketError: UND_ERR_SOCKET with bytesRead: 0
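To connect the generic TypeError: fetch failed with the socket error in the last step, a small helper along these lines could be used when debugging. This is only a sketch: `describeFetchFailure` is a hypothetical name, and it assumes Node's fetch attaches the underlying undici error as `err.cause`, with a `code` and (for socket errors) a `socket.bytesRead` field.

```javascript
// Sketch: surface the low-level cause behind Node's generic "fetch failed".
// Assumption: the undici error is attached as `err.cause`, carrying a `code`
// (e.g. "UND_ERR_SOCKET") and, for socket errors, `socket.bytesRead`.
function describeFetchFailure(err) {
  const cause = err && err.cause;
  if (!cause) {
    return { code: null, bytesRead: null, likelyIdleDrop: false };
  }
  const code = cause.code ?? null;
  const bytesRead = cause.socket?.bytesRead ?? null;
  // A socket error with zero bytes read matches the pattern described above:
  // the connection was closed before any response data arrived.
  const likelyIdleDrop = code === "UND_ERR_SOCKET" && bytesRead === 0;
  return { code, bytesRead, likelyIdleDrop };
}
```

Calling it from a `catch` block around the `fetch` call would log whether a failure looks like the idle-connection drop or something else.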
Evidence
Without streaming (fails):
$ curl -s -o /dev/null -w "HTTP:%{http_code} TIME:%{time_total}s SIZE:%{size_download}" \
--max-time 120 \
-X POST "https://api.perplexity.ai/chat/completions" \
-H "Authorization: Bearer $PERPLEXITY_API_KEY" \
-H "Content-Type: application/json" \
-d '{"model":"sonar-deep-research","messages":[{"role":"user","content":"Compare SQLite-vec and ChromaDB briefly"}]}'
HTTP:000 TIME:60.092740s SIZE:0 (exit code 56 - connection reset)
With streaming (works):
$ curl -s --max-time 300 \
-X POST "https://api.perplexity.ai/chat/completions" \
-H "Authorization: Bearer $PERPLEXITY_API_KEY" \
-H "Content-Type: application/json" \
-d '{"model":"sonar-deep-research","messages":[{"role":"user","content":"Compare SQLite-vec and ChromaDB briefly"}],"stream":true}'
# Returns 1.1MB response with 29 search queries, 350k reasoning tokens
# Keep-alive pings (": ping - ...") prevent Cloudflare timeout
Simple queries (no research needed) succeed without streaming because the model returns in ~2 seconds, before the 60-second Cloudflare timeout:
$ curl ... -d '{"model":"sonar-deep-research","messages":[{"role":"user","content":"What is 2+2?"}]}'
# HTTP:200 TIME:2.3s — no actual research triggered
Node.js native fetch also confirms the issue:
// Complex query → SocketError: UND_ERR_SOCKET, bytesRead: 0
// Simple query → Success (responds in <3s)
Suggested Fix
In performChatCompletion(), when the model is sonar-deep-research, add stream: true to the request body and consume the SSE chunks, collecting the final response before returning. The SSE keep-alive pings (: ping) sent by the Perplexity API will prevent Cloudflare from dropping the connection.
Something like:
const body = {
model: model,
messages: messages,
...(model === "sonar-deep-research" && { stream: true }),
};
Then handle the SSE response by buffering chunks and assembling the final message.
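The buffering step could look roughly like the following. It is a minimal sketch, not the server's actual code: `createSseCollector` and `readStreamedCompletion` are hypothetical names, and it assumes each `data:` line carries an OpenAI-style JSON chunk with the text in `choices[0].delta.content` and that the stream may end with `data: [DONE]`. Both assumptions should be checked against real Perplexity output.

```javascript
// Sketch: accumulate an SSE stream into one final message.
// Assumption: each "data:" line is a JSON chunk with choices[0].delta.content.
function createSseCollector() {
  let pending = ""; // partial line carried over between network chunks
  let content = ""; // assembled message text
  let done = false; // true once a "[DONE]" sentinel is seen (if one is sent)

  function handleLine(line) {
    // Blank lines separate events; lines starting with ":" are comments,
    // which is how keep-alive pings (": ping - ...") arrive.
    if (line === "" || line.startsWith(":")) return;
    if (!line.startsWith("data:")) return; // ignore event:/id:/retry: fields
    const payload = line.slice(5).trim();
    if (payload === "[DONE]") {
      done = true;
      return;
    }
    let chunk;
    try {
      chunk = JSON.parse(payload);
    } catch {
      return; // skip lines that are not valid JSON
    }
    const delta = chunk?.choices?.[0]?.delta?.content;
    if (typeof delta === "string") content += delta;
  }

  return {
    // Feed decoded text; only complete lines are processed.
    push(text) {
      pending += text;
      const lines = pending.split(/\r?\n/);
      pending = lines.pop(); // last element is an incomplete line (or "")
      for (const line of lines) handleLine(line);
    },
    // Process any trailing line and return the assembled result.
    finish() {
      if (pending) handleLine(pending);
      pending = "";
      return { content, done };
    },
  };
}

// Hypothetical helper: drain a fetch() Response through the collector.
// Relies on response.body being async-iterable, as it is in Node.js.
async function readStreamedCompletion(response) {
  const collector = createSseCollector();
  const decoder = new TextDecoder();
  for await (const bytes of response.body) {
    collector.push(decoder.decode(bytes, { stream: true }));
  }
  collector.push(decoder.decode()); // flush any buffered bytes
  return collector.finish();
}
```

Carrying the partial line over between chunks matters because a network chunk can end in the middle of a JSON payload. Citations or usage metadata, if present in the chunks, would need to be collected separately.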
Environment
- MCP Server: @perplexity-ai/mcp-server v0.6.2
- Node.js: v24.13.0
- OS: Linux (WSL2)
- Other tools work fine: perplexity_search, perplexity_ask, and perplexity_reason all succeed consistently