Using the Perplexity-Like Langflow Search Flows

Guide · 2026-05-31 · 8 min read

Use the working Perplexity-like Langflow search MVP with citation checks, provider filters, and a clear path to stronger ranking.

Tags: langflow, perplexity, search, citations

Summary

This note documents the current working versions of the Perplexity-like Langflow flows, what each one does, and how to test them. The main upgrade is that the system has moved from simple web-connected answering to a more Perplexity-like pipeline with search planning, source filtering, webpage extraction, reranking, source chunks, and citation-aware answers.

What This Solves

The goal is to build a self-hosted Perplexity-like answer engine in Langflow. The system should search the web through SearXNG, read selected webpages, clean the content, build a source map, pass evidence into a separate LLM node, and return answers with citations. The latest working version is:

Perplexity Sonar - Planner Reranker Citations MVP
https://langflow.example.com/flow/$FLOW_ID

Its run endpoint is:

https://langflow.example.com/api/v1/run/$FLOW_ID

This is not yet a full Perplexity-grade search engine. It is a working Perplexity-like answer engine: it can plan searches, use SearXNG, read pages, build evidence chunks, and produce chunk-level citations.

Who This Is For

This guide is for future me or a teammate who wants to test, extend, or wrap these Langflow flows as a Perplexity-like Search API, Sonar API, or Agent API. Starting assumptions:

The remote Langflow server is available at https://langflow.example.com.
The SearXNG instance is available at https://search.example.com.
The Langflow API key is loaded from ~/.zshrc as LANGFLOW_API_KEY.
LLM calls should remain visible as separate Langflow nodes, not hidden inside crawler or planner components.

Prerequisites

Remote Langflow access at https://langflow.example.com.
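API key loaded in the shell (check that it is set without printing the secret):

source ~/.zshrc >/dev/null 2>&1
[ -n "$LANGFLOW_API_KEY" ] && echo "LANGFLOW_API_KEY is set" || echo "LANGFLOW_API_KEY is missing"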

SearXNG JSON search endpoint available through:

https://search.example.com/search?q=QUERY&format=json

The current main flow ID:

$FLOW_ID
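Before you wire SearXNG into a flow, it helps to check the JSON endpoint on its own. The Python sketch below builds the search URL and trims the response down to the fields a flow would use downstream. It assumes the standard SearXNG JSON shape: a `results` list whose items carry `url`, `title`, `content`, and `engine`. The names `searxng_url` and `normalize_results` are hypothetical helpers, not components of these flows. SearXNG only serves `format=json` when `json` is listed under `search.formats` in its `settings.yml`.

```python
from urllib.parse import urlencode

# Instance URL taken from this guide; swap in your own.
SEARXNG_BASE = "https://search.example.com"


def searxng_url(query: str, base: str = SEARXNG_BASE, **params: str) -> str:
    """Build a SearXNG JSON search URL. Extra params (e.g. language) pass through."""
    qs = {"q": query, "format": "json", **params}
    return f"{base.rstrip('/')}/search?{urlencode(qs)}"


def normalize_results(payload: dict, limit: int = 10) -> list[dict]:
    """Reduce a SearXNG JSON response to the fields the flow needs downstream."""
    return [
        {
            "url": item.get("url", ""),
            "title": item.get("title", ""),
            "snippet": item.get("content", ""),  # SearXNG puts the snippet in "content"
            "engine": item.get("engine", ""),
        }
        for item in payload.get("results", [])[:limit]
    ]
```

Fetching `searxng_url("langflow web search")` with any HTTP client and passing the parsed JSON to `normalize_results` gives a quick sanity check that the instance returns usable results.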

The Workflow

Start with the latest Sonar-style flow

Use Perplexity Sonar - Planner Reranker Citations MVP for the main end-to-end test. This flow is the current best version because it includes search planning, source scoring, webpage reading, source chunking, a separate LLM answer node, and a citation formatter.
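This guide does not document the Search Planner's internals. As a rough illustration of what "search planning" means for the test question, the Python sketch below expands a "Compare A, B, and C for X" question into one sub-query per compared item. It is a hypothetical regex heuristic (`plan_queries` is an illustrative name), not the planner the flow actually uses.

```python
import re

# Matches questions shaped like "Compare A, B, and C for <context>".
COMPARE = re.compile(r"^compare\s+(?P<items>.+?)\s+for\s+(?P<context>.+?)[.?!]?$", re.IGNORECASE)


def plan_queries(question: str, max_queries: int = 4) -> list[str]:
    """Heuristic planner: keep the original question, then add one query per compared item."""
    q = question.strip()
    queries = [q]
    m = COMPARE.match(q)
    if m:
        # Drop a leading article so sub-queries read like search terms.
        context = re.sub(r"^(a|an|the)\s+", "", m.group("context"), flags=re.IGNORECASE)
        for item in re.split(r",\s*(?:and\s+)?|\s+and\s+", m.group("items")):
            if item.strip():
                queries.append(f"{item.strip()} {context}")
    # Deduplicate while preserving order, then cap the fan-out.
    return list(dict.fromkeys(queries))[:max_queries]
```

For the test question, this yields the original question plus `Firecrawl Langflow web search agent`, `Exa Langflow web search agent`, and `SearXNG Langflow web search agent`. Questions that do not match the pattern pass through unchanged.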

Send a plain question

In the Langflow Playground, test with:

Compare Firecrawl, Exa, and SearXNG for a Langflow web search agent

The expected result is a JSON object with answer, detected_chunk_citations, detected_urls, and citation_warnings.

Send a JSON request with domain filters

Use JSON when you want to test Perplexity-like filtering behavior:

{
  "messages": [
    {
      "role": "user",
      "content": "Compare Firecrawl, Exa, and SearXNG for a Langflow web search agent"
    }
  ],
  "search_domain_filter": [
    "docs.firecrawl.dev",
    "docs.exa.ai",
    "docs.searxng.org",
    "github.com"
  ],
  "max_results": 6
}

The expected behavior is that the flow prefers those official or developer-oriented domains and admits when there is not enough sourced evidence.
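`search_domain_filter` is this flow's Perplexity-style request field, so the exact matching rules live inside the flow. The Python sketch below shows one plausible interpretation, under these assumptions:

- Plain entries form an allowlist.
- Entries prefixed with `-` form a denylist (the convention used in the deny-filter test later in this guide).
- A domain also matches its subdomains.

The function names are hypothetical.

```python
from urllib.parse import urlparse


def split_domain_filter(entries: list[str]) -> tuple[set[str], set[str]]:
    """Split ["docs.exa.ai", "-reddit.com"] into (allow, deny) sets."""
    allow, deny = set(), set()
    for raw in entries:
        d = raw.strip().lower()
        if d.startswith("-"):
            deny.add(d[1:].strip())
        elif d:
            allow.add(d)
    return allow, deny


def host_matches(host: str, domain: str) -> bool:
    """Exact match or subdomain match; "notgithub.com" does not match "github.com"."""
    return host == domain or host.endswith("." + domain)


def domain_allowed(url: str, allow: set[str], deny: set[str]) -> bool:
    host = (urlparse(url).hostname or "").lower()
    if any(host_matches(host, d) for d in deny):
        return False  # deny always wins
    if allow:
        return any(host_matches(host, d) for d in allow)
    return True  # no allowlist means everything not denied passes
```

Checking deny before allow means a mixed filter still drops denied subdomains even when a broader allow entry would match.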

Check the citation fields

A healthy response should look like this shape:

{
  "answer": "...",
  "detected_chunk_citations": ["1.1", "2.3"],
  "detected_urls": ["https://..."],
  "citation_warnings": []
}

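detected_chunk_citations shows that the LLM cited source chunks such as [1.1] instead of only appending raw URLs. It does not prove that each claim is supported by the chunk it cites; claim-level validation is not implemented yet.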
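A minimal Python sketch of how fields like `detected_chunk_citations` and `detected_urls` could be derived from the answer text. It assumes citations appear as `[source.chunk]` and that a source map links each source index to its URL. The helper names and the `source_map` shape are hypothetical, not the flow's formatter.

```python
import re

# Matches chunk citations such as [1.1] or [12.3].
CITATION = re.compile(r"\[(\d+)\.(\d+)\]")


def extract_chunk_citations(answer: str) -> list[str]:
    """Return unique chunk IDs in order of first appearance."""
    seen: list[str] = []
    for src, chunk in CITATION.findall(answer):
        cid = f"{src}.{chunk}"
        if cid not in seen:
            seen.append(cid)
    return seen


def cited_urls(citations: list[str], source_map: dict[str, str]) -> list[str]:
    """Map chunk IDs back to source URLs; source_map maps "1" -> URL."""
    urls: list[str] = []
    for cid in citations:
        url = source_map.get(cid.split(".")[0])
        if url and url not in urls:
            urls.append(url)
    return urls
```

This regex does not recognize grouped forms such as `[1.1, 2.3]`. If the model tends to produce those, either widen the pattern or tighten the prompt.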

Run the API test from a terminal

Use the Langflow run API:

source ~/.zshrc >/dev/null 2>&1

curl --compressed -sS \
  -H "x-api-key: $LANGFLOW_API_KEY" \
  -H "content-type: application/json" \
  -X POST \
  --data '{
    "input_value": "Compare Firecrawl, Exa, and SearXNG for a Langflow web search agent",
    "input_type": "chat",
    "output_type": "chat"
  }' \
  "https://langflow.example.com/api/v1/run/$FLOW_ID"

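The same request can be built from Python with only the standard library, which is handy for scripted regression runs. This is a sketch: `build_run_request` and `first_message_text` are hypothetical names. The response path `outputs[0].outputs[0].results.message.text` is an assumption about the Langflow run response, so compare it against a real response from your Langflow version.

```python
import json
import urllib.request


def build_run_request(question: str, flow_id: str, api_key: str,
                      base_url: str = "https://langflow.example.com") -> urllib.request.Request:
    """Build the same POST that the curl command sends to the Langflow run API."""
    body = {"input_value": question, "input_type": "chat", "output_type": "chat"}
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v1/run/{flow_id}",
        data=json.dumps(body).encode("utf-8"),
        headers={"x-api-key": api_key, "content-type": "application/json"},
        method="POST",
    )


def first_message_text(resp: dict) -> str:
    """Pull the chat text out of a run response (assumed shape, see lead-in)."""
    try:
        return resp["outputs"][0]["outputs"][0]["results"]["message"]["text"]
    except (KeyError, IndexError, TypeError):
        return ""
```

To send the request, call `urllib.request.urlopen(req, timeout=120)` with `flow_id` and `api_key` read from `$FLOW_ID` and `$LANGFLOW_API_KEY`, then `json.load` the response. For the JSON-style requests with `search_domain_filter`, one option is to pass `json.dumps(payload)` as `question`. That assumes the flow parses a JSON string in `input_value`, which is worth confirming in the Playground first.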
Current Flow Inventory

Main recommended flow

Perplexity Sonar - Planner Reranker Citations MVP
https://langflow.example.com/flow/$FLOW_ID

Implements:

Search Planner
SearXNG search through https://search.example.com
Domain filtering
Rule-based source scoring and reranking
URL opening and webpage content reading
Source map and chunk citation IDs like [1.1]
Separate LLM answer node
Citation response formatter

This is the current main testing target.
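The source map and `[1.1]`-style IDs can be pictured with a small Python sketch. Each fetched page becomes a numbered source, and each chunk within it gets a second number. The fixed-size character windows, the default sizes, and the page dict shape (`url`, `text`) are assumptions for illustration. The flow's own chunking may split differently.

```python
def build_source_map(pages: list[dict], chunk_chars: int = 800, max_chunks: int = 3) -> list[dict]:
    """Split each page into fixed-size chunks with IDs like "1.1".

    The first number is the 1-based source index; the second is the chunk
    index within that source. Splitting on paragraphs or sentences usually
    gives cleaner evidence than fixed character windows.
    """
    chunks = []
    for s_idx, page in enumerate(pages, start=1):
        text = " ".join(page.get("text", "").split())  # collapse whitespace
        for c_idx in range(max_chunks):
            piece = text[c_idx * chunk_chars:(c_idx + 1) * chunk_chars]
            if not piece:
                break  # page exhausted before max_chunks
            chunks.append({"id": f"{s_idx}.{c_idx + 1}", "url": page["url"], "text": piece})
    return chunks
```

Capping `max_chunks` per source keeps one long page from crowding out the other sources in the LLM context.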

Search-only API flow

Perplexity Search API - MVP
https://langflow.example.com/flow/$FLOW_ID

Implements:

Single-query search
Multi-query search
max_results
Basic domain allow or deny filtering
Structured search results

Use this when testing the search facade separately from answering.
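Multi-query search needs a rule for combining several result lists under one `max_results`. One common approach, sketched here in Python as an assumption rather than the flow's actual behavior, is a round-robin merge deduplicated by URL. That way every sub-query contributes its top hits before any query gets a second slot. `merge_results` is a hypothetical name.

```python
def merge_results(result_lists: list[list[dict]], max_results: int = 6) -> list[dict]:
    """Interleave per-query results round-robin, drop duplicate URLs, cap at max_results."""
    merged: list[dict] = []
    seen: set[str] = set()
    depth = max((len(results) for results in result_lists), default=0)
    for rank in range(depth):
        for results in result_lists:
            if rank >= len(results):
                continue  # this query ran out of results
            item = results[rank]
            if item["url"] in seen:
                continue
            seen.add(item["url"])
            merged.append(item)
            if len(merged) >= max_results:
                return merged
    return merged
```

Exact-URL deduplication misses near-duplicates such as trailing slashes or tracking parameters. Normalizing URLs first would tighten it.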

Search with content flow

Perplexity Search API - With Content
https://langflow.example.com/flow/$FLOW_ID

Implements:

Search
URL opening
Webpage content extraction
Structured search results with page content

Use this when testing whether search results can be read as content.
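Webpage content extraction mostly comes down to dropping markup and boilerplate regions. The Python sketch below uses the standard library's `html.parser` to keep visible text while skipping `script`, `style`, and common navigation elements. The skip list is an assumption, and this is not the flow's HTML cleaner.

```python
from html.parser import HTMLParser

# Elements whose text is rarely useful evidence (assumed list; adjust as needed).
SKIP = {"script", "style", "noscript", "nav", "footer", "header", "svg"}


class TextExtractor(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self.skip_depth = 0  # >0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)
```

A dedicated extractor (for example a readability-style library) usually does better on real pages. This sketch only shows the shape of the step.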

Tool-style flow

Web Search Tools - MVP
https://langflow.example.com/flow/$FLOW_ID

Implements:

web_search
open_url
Tool-shaped outputs for future Agent use

Use this when preparing an Agent that calls search and open-url tools.
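"Tool-shaped outputs" implies each tool has a name, a description, and a JSON Schema for its arguments. The Python sketch below writes `web_search` and `open_url` in the widely used function-calling layout, plus a small dispatcher that rejects unknown tools and missing required arguments. The schema fields and the `dispatch` helper are assumptions, not the flow's definitions.

```python
WEB_SEARCH_TOOL = {
    "type": "function",
    "function": {
        "name": "web_search",
        "description": "Search the web through SearXNG and return ranked results.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string"},
                "max_results": {"type": "integer", "minimum": 1},
                "search_domain_filter": {"type": "array", "items": {"type": "string"}},
            },
            "required": ["query"],
        },
    },
}

OPEN_URL_TOOL = {
    "type": "function",
    "function": {
        "name": "open_url",
        "description": "Fetch a URL and return cleaned page text or a failure reason.",
        "parameters": {
            "type": "object",
            "properties": {"url": {"type": "string"}},
            "required": ["url"],
        },
    },
}

TOOLS = {t["function"]["name"]: t for t in (WEB_SEARCH_TOOL, OPEN_URL_TOOL)}


def dispatch(name: str, args: dict, handlers: dict):
    """Validate a tool call against its schema's required fields, then run the handler."""
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    required = TOOLS[name]["function"]["parameters"]["required"]
    missing = [k for k in required if k not in args]
    if missing:
        raise ValueError(f"{name} missing arguments: {missing}")
    return handlers[name](**args)
```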

Basic Sonar flow

Local Sonar Chat - MVP v2
https://langflow.example.com/flow/$FLOW_ID

Implements:

Search
Crawl
Grounded answer
Separate LLM node

Use this as the simpler fallback when the planner/reranker/citation version is too complex for a quick sanity check.
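A grounded answer depends on what the separate LLM node receives. The Python sketch below assembles chat messages that place labeled evidence before the question and tell the model to cite chunk IDs or admit missing evidence. The wording of `SYSTEM_RULES` and the chunk shape (`id`, `url`, `text`) are hypothetical, not the prompt these flows use.

```python
# Hypothetical instructions for the separate answer LLM node.
SYSTEM_RULES = (
    "Answer only from the evidence below. Cite each factual sentence with chunk IDs "
    "such as [1.1]. If the evidence is insufficient, say so instead of guessing."
)


def build_grounded_prompt(question: str, chunks: list[dict]) -> list[dict]:
    """Return chat messages that put labeled evidence chunks ahead of the question."""
    evidence = "\n\n".join(f"[{c['id']}] {c['url']}\n{c['text']}" for c in chunks)
    return [
        {"role": "system", "content": SYSTEM_RULES},
        {"role": "user", "content": f"Evidence:\n{evidence or '(no evidence retrieved)'}\n\nQuestion: {question}"},
    ]
```

Stating explicitly that no evidence was retrieved, rather than sending an empty evidence block, makes it easier for the model to admit insufficient evidence.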

Agent schema flow

Agent Runs - MVP v2
https://langflow.example.com/flow/$FLOW_ID

Implements:

Agent run style schema
Basic trace output
Separate LLM node

This is not yet a true autonomous planner loop. It is a schema and workflow foundation.
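The gap between an agent-run schema and a real loop is mainly a stop condition and a trace. The Python sketch below shows a bounded loop with a hypothetical `decide` policy that returns either a tool call or a final answer. In these flows that decision would come from the separate LLM node. `AgentRun`, `Step`, and `run_agent` are illustrative names only.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Step:
    """One tool call in the trace."""
    tool: str
    args: dict
    observation: str


@dataclass
class AgentRun:
    """Minimal run record: question, tool trace, final answer, and status."""
    question: str
    steps: list[Step] = field(default_factory=list)
    answer: Optional[str] = None
    status: str = "running"


def run_agent(question: str, decide: Callable[[AgentRun], dict],
              tools: dict, max_steps: int = 4) -> AgentRun:
    """`decide` returns {"tool": name, "args": {...}} or {"final": text}."""
    run = AgentRun(question)
    for _ in range(max_steps):
        action = decide(run)
        if "final" in action:  # stop condition chosen by the policy
            run.answer, run.status = action["final"], "completed"
            return run
        observation = tools[action["tool"]](**action["args"])
        run.steps.append(Step(action["tool"], action["args"], str(observation)))
    run.status = "max_steps_reached"  # hard stop so the loop cannot run forever
    return run
```

The `steps` list doubles as the scratchpad the policy reads on each turn and as the tool trace returned to the caller.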

Visual crawler flow

Onyx Web Crawler - Visual Pipeline V3
https://langflow.example.com/flow/$FLOW_ID

Implements:

URL safety
HTTP fetch
HTML cleaning
PDF extraction
Cloudflare detection
Browser fallback branch
WebContent builder
Content truncation
LLM context builder

Use this when debugging crawler behavior separately from search and answering.
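"URL safety" typically means refusing URLs that could make the crawler reach local or internal services. The Python sketch below rejects non-HTTP schemes, `localhost`, and literal IPs in private, loopback, link-local, reserved, multicast, or unspecified ranges. It is an assumption about what such a check covers, not the crawler's implementation. Hostnames are allowed without resolution, so a hardened version should also resolve DNS and re-check the resulting addresses.

```python
import ipaddress
from urllib.parse import urlparse


def is_safe_url(url: str) -> tuple[bool, str]:
    """Return (allowed, reason) for a URL the crawler is about to fetch."""
    parts = urlparse(url)
    if parts.scheme not in ("http", "https"):
        return False, f"scheme not allowed: {parts.scheme or '(none)'}"
    host = parts.hostname  # already lowercased by urlparse
    if not host:
        return False, "missing host"
    if host == "localhost" or host.endswith(".localhost"):
        return False, "localhost is blocked"
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return True, "ok"  # a hostname: resolve and re-check in a hardened version
    if (ip.is_private or ip.is_loopback or ip.is_link_local
            or ip.is_reserved or ip.is_multicast or ip.is_unspecified):
        return False, f"non-public address: {ip}"
    return True, "ok"
```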

What We Have Replicated From Perplexity

The current system has replicated the main Perplexity-like product workflow:

question
-> search planning
-> web search
-> source filtering
-> source scoring
-> webpage reading
-> evidence chunking
-> grounded LLM answer
-> citation extraction

Current replication level:

| Feature | Perplexity | Our current version | Estimated replication |
|---|---|---|---|
| Real-time search | Own large-scale web index with ranked results | SearXNG metasearch, tested and working | 60% |
| Multi-query | Native support for multiple queries | Search Planner can automatically split queries | 70% |
| Domain filter | Official allowlist and denylist support, up to 20 domains | Basic domain filter is working | 70% |
| Language / region filter | Official language and country/region support | Provider capability mapping is not complete yet | 25% |
| Date / recency filter | Official time and freshness controls | Mostly designed, not fully wired through | 25% |
| Content extraction | Search can control `max_tokens_per_page` | Crawler, `open_url`, and HTML cleaning are working | 70% |
| Sonar QA | Search + read + generate + citations | Latest Sonar flow is implemented | 65% |
| Citations | Official `citations` and `search_results` outputs | Source/chunk citations exist, for example `[1.1]` | 60% |
| Claim-level citation validation | Likely stronger internally, but not publicly documented | Detects citations but does not verify whether a claim is supported | 30% |
| Ranking / reranking | Own ranking, freshness, and authority signals | Rule reranker only; no embedding, Jina, or LLM rerank yet | 45% |
| Agent API | Tools, models, search controls, and streaming | Agent Runs MVP exists, but it is not a real autonomous loop yet | 40% |
| Streaming | Official support | Not implemented yet | 0%-10% |
| OpenAI-compatible facade | Official Chat Completions compatibility | Langflow run API works, but there is no `/v1/*` facade yet | 35% |

The practical milestone is that the system has moved from “can answer with web access” to “can plan a search, choose sources, read them, and answer with source chunks.”

Example Prompts To Test

Test the full main pipeline

Compare Firecrawl, Exa, and SearXNG for a Langflow web search agent.

Checks:

Search planning
Search result collection
Page extraction
Source chunking
Chunk citations

Test official-source filtering

{
  "messages": [
    {
      "role": "user",
      "content": "What are the main differences between LangGraph and Langflow for building search agents?"
    }
  ],
  "search_domain_filter": [
    "docs.langchain.com",
    "docs.langflow.org",
    "github.com"
  ],
  "max_results": 8
}

Checks:

Domain filtering
Official documentation preference
Evidence-limited answer behavior

Test chunk citations

What does Firecrawl provide for web scraping and LLM-ready content extraction? Answer with citations.

Checks:

Whether [1.1], [2.1], or similar chunk IDs appear
Whether citation_warnings stays empty

Test evidence humility

{
  "messages": [
    {
      "role": "user",
      "content": "Compare Exa and SearXNG specifically for Langflow integration. Only use official documentation."
    }
  ],
  "search_domain_filter": [
    "docs.exa.ai",
    "docs.searxng.org",
    "docs.langflow.org"
  ],
  "max_results": 5
}

Checks:

Whether the model admits insufficient evidence
Whether it avoids inventing unsupported integration details

Test open-source research

Research open-source alternatives to Perplexity AI. Compare Perplexica, Scira, Fireplexity, and Open Deep Research. Include citations.

Checks:

Multi-source research behavior
Table-style answer quality
Citation coverage

Test deny filtering

{
  "messages": [
    {
      "role": "user",
      "content": "Find reliable documentation for building web search agents with Langflow."
    }
  ],
  "search_domain_filter": [
    "-reddit.com",
    "-medium.com",
    "-pinterest.com"
  ],
  "max_results": 8
}

Checks:

Domain deny filtering
Whether low-quality sources are avoided

Test Chinese answer behavior

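帮我比较 Firecrawl、Exa、SearXNG 哪个更适合做一个自托管的 Perplexity-like 搜索问答系统，并给出引用。

(English: Compare Firecrawl, Exa, and SearXNG and tell me which is better suited for building a self-hosted Perplexity-like search Q&A system, with citations.)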

Checks:

Chinese answer quality
English source citation handling
Chunk citation behavior

Common Failure Modes

If detected_chunk_citations is empty, the LLM probably did not follow the citation format. Strengthen the prompt or run a second formatter/validator pass.
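A second validator pass can be quite simple and still catch the common failures. The Python sketch below produces `citation_warnings`-style messages in three cases: no citations at all, chunk IDs that do not exist in the source map, and sentences without any citation. `build_citation_warnings` is a hypothetical name. The sentence split is crude and will break on abbreviations such as "e.g.".

```python
import re

CITATION = re.compile(r"\[(\d+\.\d+)\]")


def build_citation_warnings(answer: str, chunk_ids: set[str]) -> list[str]:
    """Return human-readable warnings about the answer's chunk citations."""
    warnings = []
    cited = set(CITATION.findall(answer))
    if not cited:
        warnings.append("answer contains no chunk citations")
    unknown = sorted(cited - chunk_ids)
    if unknown:
        warnings.append(f"citations not in source map: {', '.join(unknown)}")
    # Naive sentence split on ., !, or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    uncited = [s for s in sentences if not CITATION.search(s)]
    if cited and uncited:
        warnings.append(f"{len(uncited)} sentence(s) without a citation")
    return warnings
```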

If the answer cites only one source after a restrictive search_domain_filter, the filter may be too narrow or SearXNG may not return enough matching pages. This is expected behavior; the right response is to admit insufficient evidence.

If a page cannot be read, the crawler should return a failure reason instead of feeding Cloudflare challenge text or empty content into the LLM.
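The Python sketch below returns a failure reason instead of passing unusable content to the LLM, and truncates pages that are usable. It runs on the already-fetched status code and page text. The challenge markers are heuristic strings commonly seen on Cloudflare interstitials, not an exhaustive or official list. The thresholds and the `prepare_page` name are assumptions.

```python
# Heuristic markers of challenge pages; extend from real failures you observe.
CHALLENGE_MARKERS = ("just a moment...", "cf-chl", "checking your browser", "attention required! | cloudflare")


def prepare_page(status: int, text: str, min_chars: int = 200, max_chars: int = 6000) -> dict:
    """Return {"ok": False, "reason": ...} or {"ok": True, "text": ..., "truncated": ...}."""
    lowered = text.lower()
    # Check challenge text first: challenge pages often arrive with 403/503 status.
    if any(marker in lowered for marker in CHALLENGE_MARKERS):
        return {"ok": False, "reason": "cloudflare_challenge"}
    if status >= 400:
        return {"ok": False, "reason": f"http_{status}"}
    cleaned = " ".join(text.split())
    if len(cleaned) < min_chars:
        return {"ok": False, "reason": "empty_or_too_short"}
    return {"ok": True, "text": cleaned[:max_chars], "truncated": len(cleaned) > max_chars}
```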

The current reranker is rule-based. It can boost docs and GitHub sources, but it is not yet a semantic reranker like Jina, an embedding reranker, or a strong LLM listwise reranker.
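To make "rule-based" concrete, here is a small Python reranker sketch. It boosts documentation hosts and GitHub, penalizes the low-quality domains from the deny-filter test, and keeps a small weight on the search engine's original order. All weights and names are hypothetical and would need tuning against the example prompts.

```python
from urllib.parse import urlparse

# Hypothetical weights; tune against your own test prompts.
DOMAIN_WEIGHTS = {"github.com": 2.0, "reddit.com": -1.0, "medium.com": -1.0, "pinterest.com": -3.0}
DOCS_BONUS = 2.0


def rule_score(url: str, rank: int) -> float:
    parts = urlparse(url)
    host = (parts.hostname or "").lower()
    score = 1.0 / (rank + 1)  # keep some weight on the search engine's original order
    if host.startswith("docs.") or parts.path.startswith("/docs"):
        score += DOCS_BONUS
    for domain, weight in DOMAIN_WEIGHTS.items():
        if host == domain or host.endswith("." + domain):
            score += weight
    return score


def rerank(results: list[dict]) -> list[dict]:
    """Sort results by rule score; ties keep the original search order."""
    ordered = sorted(enumerate(results), key=lambda pair: (-rule_score(pair[1]["url"], pair[0]), pair[0]))
    return [item for _, item in ordered]
```

A semantic reranker would replace `rule_score` with a relevance score between the query and each result, while the surrounding sort stays the same.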

Final Checklist

Main flow opens at https://langflow.example.com/flow/$FLOW_ID.
A plain prompt returns HTTP 200 from the Langflow run API.
Output JSON contains answer.
Output JSON contains non-empty detected_chunk_citations.
Output JSON contains source URLs in detected_urls.
citation_warnings is empty or explains exactly what is missing.
Domain-filtered requests return narrower evidence or admit insufficient evidence.
LLM call remains a separate Langflow node.
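The output-JSON items on this checklist can be automated. The Python sketch below assumes the flow's answer has already been parsed into a dict, for example with `json.loads` on the chat text. It returns a list of problems, where an empty list means those items pass. It does not cover the HTTP status, domain-filter behavior, or node layout, which still need a manual check. `check_response` is a hypothetical name.

```python
REQUIRED_LISTS = ("detected_chunk_citations", "detected_urls", "citation_warnings")


def check_response(resp: dict) -> list[str]:
    """Return problems found in a parsed flow response; empty means the JSON items pass."""
    problems = []
    answer = resp.get("answer")
    if not isinstance(answer, str) or not answer.strip():
        problems.append("answer is missing or empty")
    for key in REQUIRED_LISTS:
        if not isinstance(resp.get(key), list):
            problems.append(f"{key} is missing or not a list")
    if resp.get("detected_chunk_citations") == []:
        problems.append("detected_chunk_citations is empty")
    if resp.get("detected_urls") == []:
        problems.append("detected_urls is empty")
    return problems
```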

What To Remember

The crawler is no longer the main bottleneck. The next quality jump comes from stronger search quality and evidence quality:

Add a real reranker or embedding-based ranking layer.
Add claim-level citation validation.
Add provider capability mapping for language, region, date, and freshness filters.
Add a real Agent loop with scratchpad, stop condition, and tool trace.
Add a FastAPI facade for /v1/search, /v1/chat/completions, and /v1/agent/runs.
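As a starting point for claim-level validation, the Python sketch below checks whether a cited sentence shares enough content words with the chunks it cites. Lexical overlap is a weak proxy, used here only as a simple baseline. A stronger validator would ask an NLI model or an LLM judge whether the chunk entails the claim. The stopword list, the threshold, and `claim_support` are all hypothetical.

```python
import re

TOKEN = re.compile(r"[a-z0-9]+")
CITATION = re.compile(r"\[(\d+\.\d+)\]")
STOP = {"the", "a", "an", "and", "or", "of", "to", "is", "are", "for", "in", "on", "with", "it"}


def content_tokens(text: str) -> set[str]:
    return {t for t in TOKEN.findall(text.lower()) if t not in STOP}


def claim_support(sentence: str, chunks: dict[str, str], threshold: float = 0.5) -> dict:
    """Score a cited sentence by the best token overlap with any chunk it cites."""
    ids = CITATION.findall(sentence)
    claim = content_tokens(CITATION.sub("", sentence))
    best = 0.0
    for cid in ids:
        if cid in chunks and claim:
            best = max(best, len(claim & content_tokens(chunks[cid])) / len(claim))
    return {"citations": ids, "overlap": round(best, 2), "supported": best >= threshold}
```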

The current system is best described as:

Perplexity-like answer engine: working MVP
Perplexity-grade search engine: not yet

Metadata

Quick Reference

Type: guide

Status: published

Date: 2026-05-31

Retrieval Tags

langflow, perplexity, search, citations
