Dual-Stream Architecture
To reconcile messy, unstructured input with safe, data-dense visualization, Corvus employs a Dual-Stream Ingestion Engine. This design separates narrative reasoning (which allows for some "fuzziness" and interpretation) from critical quantitative metrics (which must be exact and hallucination-free).
The Resilience Cascade
Document ingestion runs through a Resilience Cascade that prefers structure-preserving parsers and falls back to OCR only as a last resort:
- Layout-aware parsing for clinical notes to preserve spatial context.
- Scholarly structure extraction for academic PDFs to capture section hierarchies.
- OCR fallback only for flattened assets (files with no extractable text layer).
Preferring structure-aware parsers preserves semantic boundaries (tables, headers) that standard text extraction typically loses.
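The cascade can be sketched as an ordered list of parsers, each of which either accepts a document or passes it on. This is a minimal illustration, not Corvus's actual code: the `Document` and `ParseResult` types, the `kind` labels, and the parser functions are all hypothetical names, and the parser bodies are placeholders for real layout-aware, scholarly, and OCR back ends.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Document:
    kind: str                  # hypothetical labels: "clinical_note", "academic_pdf"
    text_layer: Optional[str]  # None models a flattened asset with no extractable text


@dataclass
class ParseResult:
    parser: str
    text: str


def layout_aware(doc: Document) -> Optional[ParseResult]:
    # Placeholder: a real parser would keep spatial context (columns, tables).
    if doc.kind == "clinical_note" and doc.text_layer is not None:
        return ParseResult("layout_aware", doc.text_layer)
    return None


def scholarly_structure(doc: Document) -> Optional[ParseResult]:
    # Placeholder: a real parser would recover the section hierarchy.
    if doc.kind == "academic_pdf" and doc.text_layer is not None:
        return ParseResult("scholarly_structure", doc.text_layer)
    return None


def ocr_fallback(doc: Document) -> Optional[ParseResult]:
    # Only flattened assets reach OCR; the empty string stands in for OCR output.
    if doc.text_layer is None:
        return ParseResult("ocr_fallback", "")
    return None


# Order matters: structure-preserving parsers first, OCR last.
CASCADE: List[Callable[[Document], Optional[ParseResult]]] = [
    layout_aware,
    scholarly_structure,
    ocr_fallback,
]


def ingest(doc: Document) -> ParseResult:
    """Return the result of the first parser that accepts the document."""
    for parser in CASCADE:
        result = parser(doc)
        if result is not None:
            return result
    raise ValueError(f"no parser accepted document of kind {doc.kind!r}")
```

Under these assumptions, `ingest(Document("clinical_note", "BP 120/80"))` is handled by the layout-aware parser, while a flattened file of any kind falls through to OCR.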
Stream A: Narrative Synthesis
Goal: Transform unstructured narrative text into coherent summaries and plan suggestions.
- Mechanism: Uses structured clinical templates to produce consistent drafts.
- RAG Integration: When a query needs evidence, the system uses hybrid retrieval (keyword + semantic) to find high-quality sources and return citations.
- Agents: Driven by a clinical reasoning role and a research role.
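The text above does not say how the keyword and semantic result lists are merged. One common option is reciprocal rank fusion (RRF), sketched below as an assumption rather than as the method Corvus uses. The function name, the document IDs, and the constant `k=60` (a conventional default for RRF) are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[List[str]], k: int = 60) -> List[str]:
    """Merge several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) for every list it appears in,
    so documents ranked well by both retrievers rise to the top.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a keyword retriever and a semantic retriever.
keyword_hits = ["a", "b", "c"]
semantic_hits = ["b", "d", "a"]
fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
```

In this example `b` and `a` appear in both lists and therefore outrank `c` and `d`, which each appear in only one. The fused IDs would then be mapped back to their sources to produce citations.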
Stream B: Structured Clinical Signals
Goal: Extract high-yield quantitative markers (labs, vitals) with a 0% hallucination rate.
- Mechanism: Deterministic extractors and schema-constrained pipelines.
- Type Safety: Enforces strict schema compliance so only validated key-value pairs are accepted (e.g., numeric labs/vitals), rejecting malformed outputs.
- Outputs:
  - Normalized labs/vitals.
  - Sentinel flags (e.g., rising lactate).
  - Checklist-ready plan items.
  - Conflict flags (contradictions between narrative and labs).
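As a rough sketch of the type-safety and sentinel-flag ideas above, the example below validates raw key-value pairs against a small schema and checks for a rising trend. Everything here is assumed for illustration: the `SCHEMA` entries, the `Reading` type, the function names, and the numeric ranges and thresholds, which are not clinical guidance.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical schema: analyte name -> (required unit, min, max accepted value).
SCHEMA: Dict[str, Tuple[str, float, float]] = {
    "lactate": ("mmol/L", 0.0, 30.0),
    "heart_rate": ("bpm", 0.0, 350.0),
}


@dataclass(frozen=True)
class Reading:
    name: str
    value: float
    unit: str


def validate(raw: dict) -> Optional[Reading]:
    """Return a Reading if the raw pair matches the schema, else None."""
    name = raw.get("name")
    if not isinstance(name, str) or name not in SCHEMA:
        return None
    unit, low, high = SCHEMA[name]
    value = raw.get("value")
    # Reject non-numeric values; bool is excluded because it subclasses int.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return None
    # The range check also rejects NaN, since comparisons with NaN are False.
    if raw.get("unit") != unit or not (low <= value <= high):
        return None
    return Reading(name, float(value), unit)


def is_rising(readings: List[Reading], name: str, min_delta: float) -> bool:
    """Flag a series that never decreases and climbs by at least min_delta.

    Assumes readings are already in chronological order.
    """
    values = [r.value for r in readings if r.name == name]
    if len(values) < 2:
        return False
    non_decreasing = all(b >= a for a, b in zip(values, values[1:]))
    return non_decreasing and (values[-1] - values[0]) >= min_delta


raw_inputs = [
    {"name": "lactate", "value": 1.8, "unit": "mmol/L"},
    {"name": "lactate", "value": "elevated", "unit": "mmol/L"},  # malformed: rejected
    {"name": "lactate", "value": 2.4, "unit": "mmol/L"},
    {"name": "lactate", "value": 3.1, "unit": "mmol/L"},
]
accepted = [r for r in (validate(x) for x in raw_inputs) if r is not None]
lactate_flag = is_rising(accepted, "lactate", min_delta=1.0)
```

Because rejection is the default, a value that a model paraphrased or invented in free text ("elevated") never enters the structured stream. Only the three well-formed lactate readings are kept, and they trip the rising-lactate flag.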
Visualizing the Dual Streams
This dual approach lets Corvus combine the flexible reasoning of LLMs for text with the rigid safety of traditional software for numbers.