When extraction accuracy is lower than expected, this workflow walks your AI agent through a structured diagnosis — checking the most impactful configuration settings first before suggesting schema changes. Trigger phrases that start this workflow:
- “Why are my extractions wrong?”
- “Confidence scores are low”
- “Field X keeps coming out wrong”
- “Can we improve accuracy?”
Before starting, your AI client must be connected to the Affinda MCP server. See MCP Connector and Plugin for setup instructions.
Steps
Identify the document type
If you have not named the document type causing the issue, the agent lists your document types and asks you to confirm which one to investigate.
Read the document type configuration
The agent reads the full document type configuration, including the field list, field types, transformation_prompt values, and attached validation rules.
Check workspace processing settings
The agent reads the workspace settings. Three settings explain the majority of low-confidence cases:
| Setting | Problem scenario | Fix |
|---|---|---|
| ocr_mode: skip | Document is an image or scanned PDF | Change to always-partial or always-full |
| ocr_mode: always-partial | Document is a scanned image, not a digital PDF | Change to always-full |
| enable_document_classification: false | Multiple document types in one workspace; wrong schema applied to uploads | Enable classification |
| model_memory_strategy: manual | No confirmed examples marked | Switch to auto |
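The settings check in the table above can be expressed as a small rule set. The sketch below is a hypothetical illustration and not Affinda's implementation or API. It assumes three things: the workspace settings arrive as a plain dict keyed by the setting names shown in the table, whether uploads are scanned is already known, and whether several document types share the workspace and whether confirmed examples exist are also already known. `UploadContext` and `check_workspace_settings` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class UploadContext:
    """Hypothetical facts about the workspace's uploads (not an Affinda type)."""
    is_scanned: bool              # images or scanned PDFs rather than digital PDFs
    multiple_doc_types: bool      # several document types share one workspace
    has_confirmed_examples: bool  # at least one confirmed example is marked


def check_workspace_settings(settings: dict, ctx: UploadContext) -> list[str]:
    """Return suggested fixes, one per matching problem row of the settings table.

    `settings` is assumed to be a plain dict keyed by the setting names used
    in the table; the real workspace payload may be shaped differently.
    """
    fixes = []
    ocr_mode = settings.get("ocr_mode")
    # At most one OCR suggestion, since ocr_mode holds a single value
    if ctx.is_scanned and ocr_mode == "skip":
        fixes.append("ocr_mode: change to always-partial or always-full")
    elif ctx.is_scanned and ocr_mode == "always-partial":
        fixes.append("ocr_mode: change to always-full")
    if ctx.multiple_doc_types and not settings.get("enable_document_classification", False):
        fixes.append("enable_document_classification: enable classification")
    if settings.get("model_memory_strategy") == "manual" and not ctx.has_confirmed_examples:
        fixes.append("model_memory_strategy: switch to auto")
    return fixes


# Example: scanned uploads with OCR skipped and manual model memory
settings = {
    "ocr_mode": "skip",
    "enable_document_classification": False,
    "model_memory_strategy": "manual",
}
ctx = UploadContext(is_scanned=True, multiple_doc_types=False, has_confirmed_examples=False)
print(check_workspace_settings(settings, ctx))
```

The function returns suggestions rather than changing anything, which matches the workflow's approach of presenting fixes for you to confirm.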
Sample recent extractions
The agent pulls 3–5 recent confirmed documents and reads their per-field confidence scores. The pattern tells you where to focus:
| Pattern | Diagnosis |
|---|---|
| One field consistently low across documents | That field’s transformation_prompt needs refinement, or the field type is wrong (e.g. text where date is expected) |
| All fields low on specific documents | Those documents likely have an OCR-mode mismatch; check whether they are scans or digital PDFs |
| All fields low across all documents | Upstream issue — check OCR mode first, then classification, then field schema |
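The pattern table above amounts to a simple decision procedure over per-field confidence scores. The sketch below is a hypothetical illustration. It assumes the scores have already been collected into a dict that maps each document ID to {field name: confidence in [0, 1]}. It also uses an arbitrary 0.8 cut-off for "low", which this page does not define. The document IDs and field names other than vendorAddress are made up.

```python
LOW_CONFIDENCE = 0.8  # hypothetical cut-off; not an Affinda-defined threshold


def diagnose(confidences: dict[str, dict[str, float]], low: float = LOW_CONFIDENCE) -> list[str]:
    """Map per-document, per-field confidence scores to the patterns in the table."""
    def all_low(scores: dict[str, float]) -> bool:
        return bool(scores) and all(s < low for s in scores.values())

    docs = list(confidences)
    low_docs = [d for d in docs if all_low(confidences[d])]

    # All fields low across all documents: an upstream issue
    if docs and len(low_docs) == len(docs):
        return ["All fields low across all documents: check OCR mode, then classification, then field schema"]

    # All fields low on specific documents: likely an OCR-mode mismatch
    findings = [f"All fields low on {d}: check whether it is a scan or a digital PDF" for d in low_docs]

    # One field consistently low: ignore documents already explained above
    remaining = [d for d in docs if d not in low_docs]
    fields = sorted({f for d in remaining for f in confidences[d]})
    for field in fields:
        scores = [confidences[d][field] for d in remaining if field in confidences[d]]
        if scores and all(s < low for s in scores):
            findings.append(f"{field} consistently low: refine its transformation_prompt or check its field type")
    return findings


recent = {
    "doc-1": {"invoiceDate": 0.95, "vendorAddress": 0.42, "total": 0.91},
    "doc-2": {"invoiceDate": 0.93, "vendorAddress": 0.51, "total": 0.88},
    "doc-3": {"invoiceDate": 0.30, "vendorAddress": 0.25, "total": 0.35},
}
for finding in diagnose(recent):
    print(finding)
```

With this sample, doc-3 is flagged as a probable scan and vendorAddress as a field-level problem, which mirrors the first two rows of the table. Excluding all-low documents before the per-field check keeps a single bad scan from making every field look "consistently low".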
Check manual annotation patterns
The agent calls list_recent_field_annotations to see which fields users have been correcting most frequently. Repeated manual corrections on the same field are a strong signal that the field’s prompt or type is wrong.
Review recommended changes
Based on the evidence gathered, the agent presents at most three recommendations in order of expected impact:
- OCR mode change: largest single lever; takes effect on the next upload.
- Field-level fix: refine the transformation_prompt on the problematic field, or change its type if the current type is wrong.
- Validation rule: add a rule to flag suspect extractions for human review rather than relying on raw confidence.
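The ranking step can be sketched as follows. This is a hypothetical illustration, not how the agent is implemented, and it rests on three assumptions. First, the output of list_recent_field_annotations has been reduced to a list of dicts with a "field" key; the real response shape may differ. Second, three or more corrections on a field count as "frequent", which is an arbitrary choice. Third, the output is capped at three recommendations in the order listed above. `recommend` and `most_corrected_fields` are illustrative names.

```python
from collections import Counter
from typing import Optional


def most_corrected_fields(annotations: list[dict], min_corrections: int = 3) -> list[str]:
    """Fields corrected at least `min_corrections` times, most corrected first."""
    counts = Counter(a["field"] for a in annotations)  # assumed record shape
    return [field for field, n in counts.most_common() if n >= min_corrections]


def recommend(ocr_fix: Optional[str], weak_fields: list[str], annotations: list[dict]) -> list[str]:
    """Return at most three recommendations, ordered by expected impact."""
    recs = []
    # 1. OCR mode change: the largest single lever
    if ocr_fix:
        recs.append(f"OCR mode change: {ocr_fix}")
    # 2 and 3. Target a field that is both low-confidence and frequently
    # corrected; otherwise fall back to either signal on its own
    corrected = most_corrected_fields(annotations)
    candidates = [f for f in corrected if f in weak_fields] or corrected or weak_fields
    if candidates:
        field = candidates[0]
        recs.append(f"Field-level fix: refine the transformation_prompt on {field}, or change its type")
        recs.append(f"Validation rule: flag suspect {field} values for human review")
    return recs[:3]


annotations = [{"field": "vendorAddress"}] * 4 + [{"field": "total"}] * 3 + [{"field": "invoiceDate"}]
for rec in recommend("change to always-full", ["total"], annotations):
    print(rec)
```

In this example, total is targeted ahead of vendorAddress even though vendorAddress was corrected more often, because total is also a low-confidence field. Weighting the two signals this way is a judgment call, not a rule stated by the workflow.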
vendorAddress”) before proposing changes.
What this workflow does not fix
Genuinely ambiguous or unreadable documents
Some documents are legitimately difficult — low-quality scans, handwritten notes, inconsistent layouts. Confidence will be low on these regardless of configuration. The workflow flags this case rather than suggesting spurious fixes.
Sparse training signal on a new document type
A brand-new document type with fewer than ten confirmed documents will improve substantially over the next ten confirmations regardless of configuration tuning. If you have only just created the document type, the best action is to confirm a batch of documents and let model memory do its work.
OCR provider issues
If always-full OCR is producing unusable or garbled text, the problem is upstream of the schema. The agent will surface this and suggest contacting support@affinda.com rather than continuing to chase field-level fixes.
Related
- Human review queue — if users are manually correcting many fields, run the review queue workflow to catch patterns early.
- Configuration guide: Confidence — how confidence scores are calculated and how to interpret them.
- Configuration guide: OCR — OCR mode options explained.