Defer Streaming and Keep Langflow Focused on Evidence Flows

2026-05-31 · decision · 6 min read

Summary

Streaming is useful, but it is not the next priority. The current priority is to strengthen the Langflow flows: search planning, SearXNG retrieval, source filtering, crawler extraction, reranking, evidence chunks, and citation-aware answering. Streaming work is paused and recorded as a later architectural layer. When we return to it, the best path is a lightweight FastAPI gateway around Langflow rather than forcing token streaming into every Langflow flow.

Decision

Pause streaming implementation for now. Keep the active work focused on building and testing Langflow flows:

Search API-style flow
Search with content flow
Web search/open URL tool flow
Sonar-style grounded QA flow
Agent run schema flow
Visual Onyx-style crawler flow
Planner/reranker/citation Sonar flow

When streaming becomes the priority again, implement it outside Langflow as a reusable gateway:

Client / CLI / frontend / agent
-> FastAPI Streaming Gateway
-> Langflow Evidence Builder Flow
-> OpenAI-compatible LLM streaming
-> SSE events back to client

Langflow should remain the visual workflow engine for evidence construction. The gateway should own streaming protocol concerns.

Alternatives Considered

Put streaming directly inside Langflow

This would mean trying to make each Langflow flow stream progress or token deltas through the normal Langflow run API. Why this is not the preferred path:

Langflow is strongest as a visual orchestration layer.
The current /api/v1/run/{flow_id} pattern is better for complete run results.
Streaming needs protocol-level control: text/event-stream, keep-alive, cancellation, event ordering, and structured error events.
Forcing every flow to define its own streaming behavior would make the system harder to reuse.
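
The protocol-level concerns in the list above can be made concrete with a small sketch. The helper names (format_sse, keep_alive, error_event) are hypothetical and not part of Langflow or FastAPI. They only illustrate the text/event-stream wire format: a named event, an optional id for ordering, a comment line used as keep-alive, and a structured error payload.

```python
import json
from typing import Optional


def format_sse(event: str, data: dict, event_id: Optional[int] = None) -> str:
    """Serialize one Server-Sent Events frame (hypothetical helper)."""
    lines = []
    if event_id is not None:
        # An increasing id lets clients detect gaps and check ordering.
        lines.append(f"id: {event_id}")
    lines.append(f"event: {event}")
    # json.dumps escapes newlines, so the payload always fits on one data: line.
    lines.append(f"data: {json.dumps(data, ensure_ascii=False)}")
    # A blank line terminates the frame.
    return "\n".join(lines) + "\n\n"


def keep_alive() -> str:
    """Comment frame: ignored by SSE clients, but keeps idle connections open."""
    return ": keep-alive\n\n"


def error_event(code: str, message: str, event_id: Optional[int] = None) -> str:
    """Structured error event instead of silently dropping the connection."""
    return format_sse("error", {"code": code, "message": message}, event_id)
```

Keeping this framing in one place is the main argument for a shared gateway: individual flows never need to know how events are serialized.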

Build a new full frontend and backend framework immediately

This would mean building a Perplexity-like UI, backend, stream protocol, and flow gateway all at once. Why this is not the preferred path:

The immediate bottleneck is not UI.
The core product quality depends more on Langflow search/crawler/reranker/citation flows.
A frontend should come after the API behavior stabilizes.

Add a small FastAPI gateway later

This is the chosen future direction. The gateway can expose:

POST /v1/search
POST /v1/search/stream
POST /v1/chat/completions
POST /v1/agent/runs

It can support:

OpenAI-compatible request shapes
Perplexity-like response shapes
SSE streaming
tool trace events
search progress events
answer token deltas
citation events
error and cancellation handling
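
As an illustration of the answer token deltas item, the sketch below wraps tokens in the publicly documented OpenAI chat.completion.chunk shape and ends the stream with the [DONE] sentinel. The function names, the run id, and the model name are assumptions made for this example. The token list stands in for a real LLM stream.

```python
import json
import time
from typing import Iterable, Iterator, Optional


def make_chunk(run_id: str, model: str, content: Optional[str] = None,
               finish_reason: Optional[str] = None,
               created: Optional[int] = None) -> dict:
    """Build one OpenAI-style streaming chunk (hypothetical helper)."""
    delta = {} if content is None else {"content": content}
    return {
        "id": run_id,
        "object": "chat.completion.chunk",
        "created": int(time.time()) if created is None else created,
        "model": model,
        "choices": [{"index": 0, "delta": delta, "finish_reason": finish_reason}],
    }


def to_openai_stream(run_id: str, model: str,
                     tokens: Iterable[str]) -> Iterator[str]:
    """Yield SSE data frames: one per token, a final stop chunk, then [DONE]."""
    for token in tokens:
        yield f"data: {json.dumps(make_chunk(run_id, model, content=token))}\n\n"
    # The last chunk has an empty delta and carries the finish reason.
    yield f"data: {json.dumps(make_chunk(run_id, model, finish_reason='stop'))}\n\n"
    yield "data: [DONE]\n\n"
```

Gateway-specific events such as citations or tool traces do not fit this shape, so they would likely travel on a separate route such as /v1/search/stream rather than inside the OpenAI-compatible stream.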

Rationale

The current Langflow work has already moved from simple web-connected answering to a stronger Perplexity-like skeleton:

question
-> search planning
-> SearXNG retrieval
-> domain filtering
-> source scoring
-> URL opening / crawler extraction
-> evidence chunking
-> separate LLM node
-> citation formatter

That is the right foundation to strengthen first. The latest main flow is:

Perplexity Sonar - Planner Reranker Citations MVP
https://langflow.example.com/flow/$FLOW_ID

This flow already validates the most important architecture rule:

LLM calls are visible as their own Langflow nodes.
Custom components prepare evidence, but they do not hide the LLM.

Streaming should respect that same boundary:

Langflow prepares evidence.
The gateway streams the answer.

This separation keeps Langflow readable and easy to modify while still making it possible to offer a Perplexity/OpenAI-style API later.

Trade-offs & Risks

Pausing streaming means the current system still returns complete responses rather than progressive token deltas. Long search-and-answer runs may feel slower because users wait until the full answer is ready.

If streaming is implemented too late, frontend and API consumers may start depending on the raw Langflow run API. That would make a later migration to a /v1/* facade more costly.

If streaming is implemented inside each flow instead of in a shared gateway, every future workflow will need a custom streaming format. That would fragment clients and make Agent traces harder to standardize.

The accepted trade-off is:

Short-term: focus on stronger Langflow flows and tolerate non-streaming output.
Long-term: add a shared gateway that streams many workflows consistently.

Revisit Trigger

Return to streaming when one of these becomes true:

The main Langflow flows are stable enough that API ergonomics becomes the bottleneck.
A frontend needs Perplexity-like progressive output.
Agent runs need live tool traces.
Long crawler or PDF workflows need progress events.
External clients need OpenAI-compatible stream: true.

When revisiting, start with this minimum plan:

Create a Langflow Evidence Builder flow that outputs grounded_prompt, sources, and chunks.
Build a small FastAPI gateway.
Implement /health.
Implement /v1/chat/completions with stream=false as a proxy to the current Sonar flow.
Implement /v1/chat/completions with stream=true using SSE.
Call the Evidence Builder flow first.
Stream LLM tokens from an OpenAI-compatible model endpoint.
Emit citation and done events at the end.
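
For the stream=false proxy step, the gateway mostly needs to translate an incoming question into a call to the Langflow run endpoint. The sketch below only builds the request and makes no network call. The payload fields (input_value, input_type, output_type) and the x-api-key header follow the commonly documented shape of /api/v1/run/{flow_id}, but they are assumptions here and should be checked against the deployed Langflow version. build_run_request is a hypothetical helper name.

```python
import json
import urllib.request


def build_run_request(base_url: str, flow_id: str, question: str,
                      api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST to the Langflow run endpoint."""
    url = f"{base_url.rstrip('/')}/api/v1/run/{flow_id}"
    # Assumed payload shape; verify against the deployed Langflow version.
    body = {
        "input_value": question,
        "input_type": "chat",
        "output_type": "chat",
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )
```

Sending the request with urllib.request.urlopen and mapping the result into an OpenAI-style response is left out, because the response structure depends on the flow's output components.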

The first event schema should be:

run_started
evidence_started
evidence_ready
source_selected
llm_started
content_delta
answer_done
citation
error
done
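
One way to pin down this schema is an enum plus a generator that emits the events in the order listed. This is a minimal sketch under stated assumptions: build_evidence and stream_llm are hypothetical callables standing in for the Evidence Builder flow and the OpenAI-compatible model endpoint, and the payload fields are illustrative. Only the event names come from the list above.

```python
from enum import Enum
from typing import Callable, Iterable, Iterator, Tuple


class Event(str, Enum):
    RUN_STARTED = "run_started"
    EVIDENCE_STARTED = "evidence_started"
    EVIDENCE_READY = "evidence_ready"
    SOURCE_SELECTED = "source_selected"
    LLM_STARTED = "llm_started"
    CONTENT_DELTA = "content_delta"
    ANSWER_DONE = "answer_done"
    CITATION = "citation"
    ERROR = "error"
    DONE = "done"


def run_events(
    question: str,
    build_evidence: Callable[[str], dict],
    stream_llm: Callable[[str], Iterable[str]],
) -> Iterator[Tuple[Event, dict]]:
    """Yield (event, payload) pairs for one run; done is always the last event."""
    yield Event.RUN_STARTED, {"question": question}
    try:
        yield Event.EVIDENCE_STARTED, {}
        # Expected keys: grounded_prompt, sources, chunks.
        evidence = build_evidence(question)
        yield Event.EVIDENCE_READY, {"chunks": len(evidence["chunks"])}
        for source in evidence["sources"]:
            yield Event.SOURCE_SELECTED, source
        yield Event.LLM_STARTED, {}
        parts = []
        for token in stream_llm(evidence["grounded_prompt"]):
            parts.append(token)
            yield Event.CONTENT_DELTA, {"text": token}
        yield Event.ANSWER_DONE, {"text": "".join(parts)}
        for index, source in enumerate(evidence["sources"], start=1):
            yield Event.CITATION, {"index": index, **source}
    except Exception as exc:
        # Failures become a structured event rather than a broken stream.
        yield Event.ERROR, {"message": str(exc)}
    yield Event.DONE, {}
```

Because the generator only depends on two callables, it can be exercised with fakes before any Langflow or model endpoint is wired in, which fits the backend-only first implementation.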

Keep the first implementation backend-only. A frontend can come later.

Metadata

Quick Reference

Type: decision

Status: published

Date: 2026-05-31

Retrieval Tags

langflow, streaming, gateway, architecture
