When using an OpenAI-compatible API endpoint (llama.cpp, ik_llama.cpp, llama-swap) with streaming responses in a chat, the visible text output in the Cortex chat randomly freezes during generation. However, the underlying model continues generating tokens, and the full response is eventually received once generation completes; it just isn't displayed incrementally during the stall.
This creates a misleading user experience: it appears the model has stopped responding when in fact it is still working.
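For reference, a minimal sketch that can help confirm the stall is on the client/UI side rather than the server side: it streams a chat completion directly from the endpoint and logs when each chunk arrives. The endpoint URL, port, and model name below are assumptions; adjust them to the local llama.cpp / llama-swap setup.

```typescript
// Minimal check (Node 18+): stream a chat completion from an OpenAI-compatible
// server and log the arrival time of each SSE chunk, independent of any chat UI.
const ENDPOINT = "http://localhost:8080/v1/chat/completions"; // assumed llama.cpp default port

async function main(): Promise<void> {
  const res = await fetch(ENDPOINT, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "local-model", // placeholder; llama.cpp typically serves whatever model is loaded
      stream: true,
      messages: [{ role: "user", content: "Write a long story." }],
    }),
  });

  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  const start = Date.now();
  let buffer = "";

  // Timestamp every streamed delta so gaps in network delivery (vs. gaps
  // only in the rendered chat output) are easy to spot.
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });

    // SSE lines can be split across chunks; keep the trailing partial line.
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? "";

    for (const line of lines) {
      if (!line.startsWith("data: ")) continue;
      const payload = line.slice("data: ".length).trim();
      if (payload === "[DONE]") return;
      const delta = JSON.parse(payload).choices?.[0]?.delta?.content ?? "";
      console.log(`+${Date.now() - start}ms`, JSON.stringify(delta));
    }
  }
}

main().catch(console.error);
```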
- VS Code Version: latest
- OS Version: Arch Linux