Using the Perplexity-Like Langflow Search Flows
2026-05-31 · guide · 8 min read
Summary
This note documents the current working versions of the Perplexity-like Langflow flows, what each one does, and how to test them. The main upgrade is that the system has moved from simple web-connected answering to a more Perplexity-like pipeline with search planning, source filtering, webpage extraction, reranking, source chunks, and citation-aware answers.
What This Solves
The goal is to build a self-hosted Perplexity-like answer engine in Langflow. The system should search the web through SearXNG, read selected webpages, clean the content, build a source map, pass evidence into a separate LLM node, and return answers with citations.
The latest working version is Perplexity Sonar - Planner Reranker Citations MVP.
This is not yet a full Perplexity-grade search engine. It is a working Perplexity-like answer engine: it can plan searches, use SearXNG, read pages, build evidence chunks, and produce chunk-level citations.
Who This Is For
This guide is for future me or a teammate who wants to test, extend, or wrap these Langflow flows as a Perplexity-like Search API, Sonar API, or Agent API.
Starting assumptions:
- The remote Langflow server is available at https://langflow.yitwah.site.
- The SearXNG instance is available at https://search.yitwah.site.
- The Langflow API key is loaded from ~/.zshrc as LANGFLOW_API_KEY.
- LLM calls should remain visible as separate Langflow nodes, not hidden inside crawler or planner components.
Prerequisites
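- Remote Langflow access at https://langflow.yitwah.site.
- API key in the shell, exported as LANGFLOW_API_KEY.
- SearXNG JSON search endpoint available through https://search.yitwah.site.
- The current main flow: Perplexity Sonar - Planner Reranker Citations MVP.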
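The prerequisites above can be checked from Python before opening the Playground. This is a minimal sketch, not code taken from the flows: it assumes the key is exported as LANGFLOW_API_KEY and that the SearXNG instance has its JSON output format enabled, so that /search with format=json returns results. The helper names are hypothetical.

```python
import os
from urllib.parse import urlencode

SEARXNG_BASE = "https://search.yitwah.site"


def load_api_key(env=None):
    """Return the Langflow API key, or raise if it was not exported."""
    env = os.environ if env is None else env
    key = env.get("LANGFLOW_API_KEY", "").strip()
    if not key:
        raise RuntimeError("LANGFLOW_API_KEY is not set; source ~/.zshrc first")
    return key


def searxng_json_url(query, base=SEARXNG_BASE):
    """Build a SearXNG search URL that asks for JSON output (assumed enabled)."""
    return f"{base}/search?{urlencode({'q': query, 'format': 'json'})}"
```

Fetching the URL from searxng_json_url with any HTTP client and confirming that the body parses as JSON is a quick way to verify the search prerequisite.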
The Workflow
Start with the latest Sonar-style flow
Use Perplexity Sonar - Planner Reranker Citations MVP for the main end-to-end test. This flow is the current best version because it includes search planning, source scoring, webpage reading, source chunking, a separate LLM answer node, and a citation formatter.
Send a plain question
In the Langflow Playground, test with a plain-text question. The expected result is a JSON object with answer, detected_chunk_citations, detected_urls, and citation_warnings.
Send a JSON request with domain filters
Use JSON when you want to test Perplexity-like filtering behavior, for example by restricting the search to official or developer-oriented domains. The expected behavior is that the flow prefers the listed domains and admits when there is not enough sourced evidence.
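Outside the Playground, both request styles can be sent to the Langflow run endpoint. The sketch below only builds the requests. It assumes the standard Langflow run API (POST /api/v1/run/<flow_id> with an x-api-key header) and takes the flow ID from the main flow URL in the Final Checklist. The question text, the key value, and the JSON field names query and domain_allowlist are hypothetical; the flow's real input schema is not shown in this note.

```python
import json
import urllib.request

LANGFLOW_BASE = "https://langflow.yitwah.site"
MAIN_FLOW_ID = "6e66f96b-9365-4997-adef-c55e3352a3d2"  # from the main flow URL


def build_run_request(input_value, api_key, flow_id=MAIN_FLOW_ID):
    """Build (but do not send) a POST to the Langflow run endpoint."""
    if not isinstance(input_value, str):
        # Structured requests travel as a JSON string in the chat input.
        input_value = json.dumps(input_value)
    body = {"input_value": input_value, "input_type": "chat", "output_type": "chat"}
    return urllib.request.Request(
        f"{LANGFLOW_BASE}/api/v1/run/{flow_id}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


# Plain question (hypothetical wording).
plain = build_run_request("What is Langflow?", api_key="sk-example")

# JSON request with hypothetical field names for the domain filter.
filtered = build_run_request(
    {"query": "How do I call a flow over HTTP?", "domain_allowlist": ["docs.langflow.org"]},
    api_key="sk-example",
)
```

Passing either request to urllib.request.urlopen with a generous timeout sends it; a healthy run should come back with HTTP 200.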
Check the citation fields
A healthy response carries all four fields: answer, detected_chunk_citations, detected_urls, and citation_warnings. A non-empty detected_chunk_citations shows that the LLM used source chunks such as [1.1] instead of only appending raw URLs.
Current Flow Inventory
Main recommended flow
- Search Planner
- SearXNG search through https://search.yitwah.site
- Domain filtering
- Rule-based source scoring and reranking
- URL opening and webpage content reading
- Source map and chunk citation IDs like [1.1]
- Separate LLM answer node
- Citation response formatter
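Two of the stages listed above, rule-based reranking and the source map with chunk IDs, can be illustrated as follows. This is a sketch under stated assumptions, not the flow's actual components: the scoring rules, the 800-character chunk size, and the dictionary keys (url, title, snippet, content) are all hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical rule: hosts with these suffixes get a small authority bonus.
PREFERRED_SUFFIXES = (".org", ".dev", ".io")


def score_source(result, query_terms):
    """Score one search result with simple additive rules."""
    text = f"{result.get('title', '')} {result.get('snippet', '')}".lower()
    score = sum(1.0 for term in query_terms if term.lower() in text)
    host = urlparse(result["url"]).hostname or ""
    if host.startswith("docs.") or host.endswith(PREFERRED_SUFFIXES):
        score += 0.5
    return score


def rerank(results, query_terms):
    """Return results sorted by descending rule score."""
    return sorted(results, key=lambda r: score_source(r, query_terms), reverse=True)


def build_source_map(pages, chunk_chars=800):
    """Split each page into chunks labelled [source.chunk], both 1-based."""
    chunks = []
    for s_idx, page in enumerate(pages, start=1):
        text = page["content"]
        for c_idx, start in enumerate(range(0, len(text), chunk_chars), start=1):
            chunks.append({
                "id": f"[{s_idx}.{c_idx}]",
                "url": page["url"],
                "text": text[start:start + chunk_chars],
            })
    return chunks
```

Keeping the source number in the chunk ID means a citation such as [1.2] can always be traced back to its page URL through the source map.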
Search-only API flow
- Single-query search
- Multi-query search
- max_results
- Basic domain allow or deny filtering
- Structured search results
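The allow or deny filtering and the max_results cap listed above might look like the sketch below. It is an assumption about the behavior rather than the flow's code: deny rules are applied first, subdomains match their parent domain, and the function names and result shape are hypothetical.

```python
from urllib.parse import urlparse


def host_matches(host, domain):
    """True if host equals domain or is a subdomain of it."""
    host, domain = host.lower(), domain.lower().lstrip(".")
    return host == domain or host.endswith("." + domain)


def filter_results(results, allow=(), deny=(), max_results=10):
    """Apply deny rules first, then the allowlist, then cap the result count."""
    kept = []
    for result in results:
        host = urlparse(result["url"]).hostname or ""
        if any(host_matches(host, d) for d in deny):
            continue
        if allow and not any(host_matches(host, d) for d in allow):
            continue
        kept.append(result)
    return kept[:max_results]
```

Matching on a leading dot rather than a plain suffix keeps a host such as notlangflow.org from slipping through an allowlist entry for langflow.org.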
Search with content flow
- Search
- URL opening
- Webpage content extraction
- Structured search results with page content
Tool-style flow
- web_search
- open_url
- Tool-shaped outputs for future Agent use
Basic Sonar flow
- Search
- Crawl
- Grounded answer
- Separate LLM node
Agent schema flow
- Agent run style schema
- Basic trace output
- Separate LLM node
Visual crawler flow
- URL safety
- HTTP fetch
- HTML cleaning
- PDF extraction
- Cloudflare detection
- Browser fallback branch
- WebContent builder
- Content truncation
- LLM context builder
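Two of the crawler stages above, HTML cleaning and content truncation, can be sketched with the Python standard library alone. This illustrates the idea and is not the implementation inside the Visual crawler flow; the class name, function names, and the 4000-character default are hypothetical.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping script, style, and noscript bodies."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def clean_html(html):
    """Return the visible text of an HTML document as one string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def truncate(text, max_chars=4000):
    """Cut at the last space before max_chars so words stay whole."""
    if len(text) <= max_chars:
        return text
    cut = text.rfind(" ", 0, max_chars)
    return text[: cut if cut > 0 else max_chars].rstrip()
```

Truncating after cleaning, rather than before, keeps the character budget for readable text instead of markup.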
What We Have Replicated From Perplexity
The current system has replicated the main Perplexity-like product workflow:
| Feature | Perplexity | Our current version | Replication |
|---|---|---|---|
| Real-time search | Own large-scale web index with ranked results | SearXNG metasearch, tested and working | 60% |
| Multi-query | Native support for multiple queries | Search Planner can automatically split queries | 70% |
| Domain filter | Official allowlist and denylist support, up to 20 domains | Basic domain filter is working | 70% |
| Language / region filter | Official language and country/region support | Provider capability mapping is not complete yet | 25% |
| Date / recency filter | Official time and freshness controls | Mostly designed, not fully wired through | 25% |
| Content extraction | Search can control max_tokens_per_page | Crawler, open_url, and HTML cleaning are working | 70% |
| Sonar QA | Search + read + generate + citations | Latest Sonar flow is implemented | 65% |
| Citations | Official citations and search_results outputs | Source/chunk citations exist, for example [1.1] | 60% |
| Claim-level citation validation | Stronger internal quality, though not fully transparent publicly | Currently detects citations but does not verify whether a claim is supported | 30% |
| Ranking / reranking | Own ranking, freshness, and authority signals | Rule reranker only; no embedding, Jina, or LLM rerank yet | 45% |
| Agent API | Tools, models, search controls, and streaming | Agent Runs MVP exists, but it is not a real autonomous loop yet | 40% |
| Streaming | Official support | Not implemented yet | 0%-10% |
| OpenAI-compatible facade | Official Chat Completions compatibility | Langflow run API works, but there is no /v1/* facade yet | 35% |
Example Prompts To Test
Test the full main pipeline
- Search planning
- Search result collection
- Page extraction
- Source chunking
- Chunk citations
Test official-source filtering
- Domain filtering
- Official documentation preference
- Evidence-limited answer behavior
Test chunk citations
- Whether [1.1], [2.1], or similar chunk IDs appear
- Whether citation_warnings stays empty
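The two checks above can also be run locally on an answer string. The sketch below assumes chunk IDs always have the form [number.number]. Its warning rules are hypothetical and may differ from what the flow's citation formatter reports.

```python
import re

CHUNK_CITATION = re.compile(r"\[(\d+)\.(\d+)\]")


def detect_chunk_citations(answer):
    """Return unique chunk IDs such as '[1.1]' in order of first appearance."""
    seen = []
    for match in CHUNK_CITATION.finditer(answer):
        if match.group(0) not in seen:
            seen.append(match.group(0))
    return seen


def citation_warnings(answer, known_ids):
    """List problems: no citations at all, or IDs absent from the source map."""
    cited = detect_chunk_citations(answer)
    if not cited:
        return ["answer contains no chunk citations"]
    return [f"unknown chunk id {cid}" for cid in cited if cid not in known_ids]
```

Note that this only detects citations. It does not verify that the cited chunk actually supports the claim next to it.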
Test evidence humility
- Whether the model admits insufficient evidence
- Whether it avoids inventing unsupported integration details
Test open-source research
- Multi-source research behavior
- Table-style answer quality
- Citation coverage
Test deny filtering
- Domain deny filtering
- Whether low-quality sources are avoided
Test Chinese answer behavior
- Chinese answer quality
- English source citation handling
- Chunk citation behavior
Common Failure Modes
Final Checklist
- Main flow opens at https://langflow.yitwah.site/flow/6e66f96b-9365-4997-adef-c55e3352a3d2.
- A plain prompt returns HTTP 200 from the Langflow run API.
- Output JSON contains answer.
- Output JSON contains non-empty detected_chunk_citations.
- Output JSON contains source URLs in detected_urls.
- citation_warnings is empty or explains exactly what is missing.
- Domain-filtered requests return narrower evidence or admit insufficient evidence.
- LLM call remains a separate Langflow node.
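The output-JSON items in this checklist can be automated once the flow's output has been parsed into a dictionary. This is a minimal sketch that assumes the four fields sit at the top level and that the citation fields are lists; the function name is hypothetical.

```python
def check_response(payload):
    """Return a list of checklist findings for one parsed output JSON."""
    findings = []
    answer = payload.get("answer") or ""
    if not str(answer).strip():
        findings.append("answer is missing or empty")
    if not payload.get("detected_chunk_citations"):
        findings.append("detected_chunk_citations is empty")
    if not payload.get("detected_urls"):
        findings.append("detected_urls is empty")
    if payload.get("citation_warnings"):
        # Warnings are surfaced for review, since they may explain a real gap.
        findings.append(f"citation_warnings: {payload['citation_warnings']}")
    return findings
```

An empty list means the response passed every output check; the remaining checklist items still need a manual look in Langflow.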
What To Remember
The crawler is no longer the main bottleneck. The next quality jump comes from stronger search quality and evidence quality:
- Add a real reranker or embedding-based ranking layer.
- Add claim-level citation validation.
- Add provider capability mapping for language, region, date, and freshness filters.
- Add a real Agent loop with scratchpad, stop condition, and tool trace.
- Add a FastAPI facade for /v1/search, /v1/chat/completions, and /v1/agent/runs.
Metadata
Quick Reference
Type: guide
Tags: langflow · perplexity · search · citations
Related: [[Build a Perplexity-Like Web Search Flow in Langflow]]