Domain-Specific Data Enrichment Benchmark

Comparing Kirha's domain-specific knowledge against standard web search.

Kirha Score: 87/100

Web Search Score: 61/100

Across all 100 tests, Kirha injected 233,920 tokens into LLM context, compared to 4,604,853 for Web Search.

Kirha uses roughly 95% fewer tokens.
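The reduction follows directly from the two token totals reported above. The snippet below is a small sketch of that arithmetic; the function and variable names are illustrative and not part of the benchmark code.

```python
# Token totals reported above for the 100-test run.
KIRHA_TOKENS = 233_920
WEB_SEARCH_TOKENS = 4_604_853
NUM_TESTS = 100


def token_reduction(baseline: int, candidate: int) -> float:
    """Fraction of tokens saved by `candidate` relative to `baseline`."""
    return 1 - candidate / baseline


reduction = token_reduction(WEB_SEARCH_TOKENS, KIRHA_TOKENS)
ratio = WEB_SEARCH_TOKENS / KIRHA_TOKENS

print(f"Reduction: {reduction:.1%}")
print(f"Web Search injects {ratio:.1f}x as many tokens as Kirha")
print(
    f"Average per test: {KIRHA_TOKENS / NUM_TESTS:,.0f} (Kirha) "
    f"vs {WEB_SEARCH_TOKENS / NUM_TESTS:,.0f} (Web Search)"
)
```

The computed reduction is about 94.9%, which rounds to the 95% figure quoted above.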

Methodology

Datasets

This benchmark compares Kirha against standard web search on domain-specific queries where Kirha has specialized knowledge integrations: Company Data, Insurance, and Crypto/Blockchain.

Kirha uses web search as a fallback when it doesn't have domain-specific knowledge. This benchmark tests domains where Kirha's specialized integrations provide structured, accurate, and actionable data compared to generic web results.

Evaluation Process

Each query is executed in parallel against both Kirha and a standard web search. The raw results are then processed through a summarization step using Gemini 2.5 Flash to extract the most relevant information and normalize the output format.
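The flow described above (run both providers concurrently, then summarize each raw result) could be sketched as follows. This is only an illustration under stated assumptions: `query_kirha`, `query_web_search`, and `summarize` are hypothetical stubs, not the real Kirha, web search, or Gemini clients.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class ProviderResult:
    provider: str
    raw: str      # raw output returned by the provider
    summary: str  # normalized text that would be passed to the judge


async def query_kirha(query: str) -> str:
    # Hypothetical stub standing in for a real Kirha call.
    await asyncio.sleep(0)
    return f"[kirha raw result for: {query}]"


async def query_web_search(query: str) -> str:
    # Hypothetical stub standing in for a real web search call.
    await asyncio.sleep(0)
    return f"[web search raw result for: {query}]"


async def summarize(raw: str) -> str:
    # Hypothetical stub; the benchmark uses Gemini 2.5 Flash for this step.
    await asyncio.sleep(0)
    return raw.strip("[]")


async def run_provider(name: str, fetch, query: str) -> ProviderResult:
    raw = await fetch(query)
    return ProviderResult(name, raw, await summarize(raw))


async def run_test(query: str) -> list[ProviderResult]:
    # Both providers are queried concurrently for the same query.
    return await asyncio.gather(
        run_provider("kirha", query_kirha, query),
        run_provider("web_search", query_web_search, query),
    )


results = asyncio.run(run_test("example domain-specific query"))
for result in results:
    print(result.provider, "->", result.summary)
```

Keeping the raw output next to the summary mirrors the fact that both are retained per test, so a reviewer can check what the summarization step dropped.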

The summarized responses are evaluated using an LLM-as-a-Judge approach with Gemini 2.5 Flash and extended thinking enabled. The judge scores each response on five criteria (0-100) and determines a winner based on the total score.

A common best practice with LLM-as-a-Judge is to cross-reference scores against human evaluation and aim for a high correlation.
For this v1 we took a lighter approach: we asked Claude to review all results alongside their judge scores and flag inconsistencies.
Read the full report.
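For reference, the human cross-check mentioned above usually reduces to a correlation coefficient between judge scores and human scores on the same responses. The sketch below computes a Pearson correlation in plain Python; the score lists are made-up illustrative values, not data from this benchmark.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between two equally long score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


# Illustrative values only: judge vs. human scores for four responses.
judge_scores = [80, 60, 90, 70]
human_scores = [75, 65, 85, 70]

r = pearson(judge_scores, human_scores)
print(f"Judge/human correlation: {r:.3f}")
```

A value close to 1 would indicate that the judge ranks responses much as human reviewers do.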

Evaluation Criteria

Relevance — How well does the response address the query?
Accuracy — Is the information correct and verifiable?
Completeness — Does it cover all aspects of the request?
Freshness — Is the data current and up-to-date?
Actionability — Can the user act on this information directly?
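One way to combine the five criteria into a total and pick a winner is sketched below. It assumes each criterion is scored from 0 to 100 and that the total is an unweighted mean, which keeps it on the same 0-100 scale as the headline scores; the actual aggregation used by the judge may differ. The example scores are hypothetical.

```python
CRITERIA = ("relevance", "accuracy", "completeness", "freshness", "actionability")


def total_score(scores: dict[str, float]) -> float:
    """Unweighted mean of the five criteria (assumed aggregation)."""
    missing = [c for c in CRITERIA if c not in scores]
    if missing:
        raise ValueError(f"missing criteria: {missing}")
    for criterion in CRITERIA:
        if not 0 <= scores[criterion] <= 100:
            raise ValueError(f"{criterion} must be within 0-100")
    return sum(scores[c] for c in CRITERIA) / len(CRITERIA)


def pick_winner(kirha: dict[str, float], web_search: dict[str, float]) -> str:
    """Return the provider with the higher total, or 'tie'."""
    kirha_total = total_score(kirha)
    web_total = total_score(web_search)
    if kirha_total > web_total:
        return "kirha"
    if web_total > kirha_total:
        return "web_search"
    return "tie"


# Hypothetical judge output for a single test.
kirha_scores = {"relevance": 90, "accuracy": 80, "completeness": 85,
                "freshness": 95, "actionability": 75}
web_scores = {"relevance": 70, "accuracy": 65, "completeness": 60,
              "freshness": 55, "actionability": 50}

print(total_score(kirha_scores), total_score(web_scores))
print(pick_winner(kirha_scores, web_scores))
```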

Score Comparison

[Charts: Kirha vs. Web Search, shown By Metric and as a Performance Profile]

Test Results (100)
