
Streaming stops visually in chat UI while generation continues in background (OpenAI-compatible) #37

@hardWorker254

Description

When using an OpenAI-compatible API endpoint (llama.cpp, ik_llama.cpp, llama-swap) to stream responses into a chat, the visible text output in the Cortex chat randomly freezes during generation. The underlying model keeps generating tokens, however, and the full response eventually appears once generation completes; it simply is not displayed incrementally during the stall.

This creates a misleading user experience: it looks as if the model has stopped responding when in fact it is still working.

  • VS Code Version: latest
  • OS Version: Arch Linux
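
One way to check that the server keeps streaming while the UI is frozen is a small standalone client like the sketch below. It is only an illustration, not part of the original report: the base URL, API key, and model name are placeholders for a local llama.cpp / llama-swap server. If the gaps between deltas stay small while the Cortex chat appears stalled, the pause is in the UI rendering rather than in generation.

```python
# Minimal sketch: stream from an OpenAI-compatible endpoint and timestamp
# each delta to confirm tokens keep arriving even while the chat UI stalls.
# The base_url, api_key, and model name below are placeholders.
import time

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # placeholder: local llama.cpp / llama-swap server
    api_key="sk-no-key-required",         # local servers typically ignore the key
)

stream = client.chat.completions.create(
    model="local-model",  # placeholder model name
    messages=[{"role": "user", "content": "Write a long story."}],
    stream=True,
)

last = time.monotonic()
for chunk in stream:
    if not chunk.choices:
        continue  # some servers emit chunks without choices (e.g. usage-only)
    now = time.monotonic()
    delta = chunk.choices[0].delta.content or ""
    # Large gaps here would mean the server itself paused; in this issue the
    # gaps stay small, so the stall is only in how the chat UI renders them.
    print(f"+{now - last:6.2f}s {delta!r}")
    last = now
```

Running this alongside a stalled chat session makes it easy to compare what the server sends with what the Cortex chat actually displays.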
