When extraction accuracy is lower than expected, this workflow walks your AI agent through a structured diagnosis, checking the most impactful configuration settings before suggesting schema changes.

Trigger phrases that start this workflow:
  • “Why are my extractions wrong?”
  • “Confidence scores are low”
  • “Field X keeps coming out wrong”
  • “Can we improve accuracy?”
Before starting, your AI client must be connected to the Affinda MCP server. See MCP Connector and Plugin for setup instructions.

Steps

1. Identify the document type

If you have not named the document type causing the issue, the agent lists your document types and asks you to confirm which one to investigate.

2. Read the document type configuration

The agent reads the full document type configuration including field list, field types, transformation_prompt values, and attached validation rules.

3. Check workspace processing settings

The agent reads the workspace settings. Three settings explain the majority of low-confidence cases:
| Setting | Problem scenario | Fix |
| --- | --- | --- |
| ocr_mode: skip | Document is an image or scanned PDF | Change to always-partial or always-full |
| ocr_mode: always-partial | Document is a scanned image, not a digital PDF | Change to always-full |
| enable_document_classification: false | Multiple document types in one workspace; wrong schema applied to uploads | Enable classification |
| model_memory_strategy: manual | No confirmed examples marked | Switch to auto |

Setting ocr_mode to skip on a workspace that receives any scanned or photographed documents will produce empty extractions, not just low-confidence ones. This is the most common cause of complete extraction failure.
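
The settings table above can be read as a small set of rules. The following is a minimal sketch, assuming the workspace settings have already been fetched into a plain dictionary keyed by the setting names shown in the table. The function name `flag_risky_settings` and its three boolean inputs are hypothetical and not part of the Affinda API.

```python
def flag_risky_settings(settings, receives_scans, multiple_doc_types,
                        has_confirmed_examples):
    """Return (setting, suggested fix) pairs that match the table's scenarios.

    `settings` is an assumed plain dict of workspace settings; the boolean
    arguments describe the workspace and are supplied by the caller.
    """
    findings = []
    ocr_mode = settings.get("ocr_mode")
    if receives_scans and ocr_mode == "skip":
        findings.append(("ocr_mode", "change to always-partial or always-full"))
    elif receives_scans and ocr_mode == "always-partial":
        findings.append(("ocr_mode", "change to always-full"))
    if multiple_doc_types and settings.get("enable_document_classification") is False:
        findings.append(("enable_document_classification", "enable classification"))
    if not has_confirmed_examples and settings.get("model_memory_strategy") == "manual":
        findings.append(("model_memory_strategy", "switch to auto"))
    return findings
```

The checks run in the same order as the table rows, so an OCR finding, if any, is always listed first.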

4. Sample recent extractions

The agent pulls 3–5 recent confirmed documents and reads their per-field confidence scores. The pattern tells you where to focus:
| Pattern | Diagnosis |
| --- | --- |
| One field consistently low across documents | That field’s transformation_prompt needs refinement, or the field type is wrong (e.g. text where date is expected) |
| All fields low on specific documents | Those documents likely have an OCR-mode mismatch; check whether they are scans or digital files |
| All fields low across all documents | Upstream issue: check OCR mode first, then classification, then field schema |
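
The three patterns can be told apart mechanically once the per-field confidence scores are collected. This is a rough sketch, assuming the sampled scores are arranged as a nested dictionary of document ID to field name to confidence. The `diagnose` function and the 0.5 cut-off for "low" are illustrative assumptions, not Affinda defaults.

```python
LOW = 0.5  # assumed cut-off for "low" confidence; not an Affinda default


def diagnose(samples, low=LOW):
    """samples: {document_id: {field_name: confidence}} for a few recent documents."""
    def all_low(fields):
        return bool(fields) and all(c < low for c in fields.values())

    docs_all_low = [doc for doc, fields in samples.items() if all_low(fields)]
    if samples and len(docs_all_low) == len(samples):
        return "upstream issue: check OCR mode, then classification, then field schema"
    if docs_all_low:
        return "possible OCR-mode mismatch on: " + ", ".join(sorted(docs_all_low))

    # No document is low everywhere, so look for fields that are low in every sample.
    field_names = set()
    for fields in samples.values():
        field_names.update(fields)
    weak = sorted(
        name for name in field_names
        if all(fields.get(name, 1.0) < low for fields in samples.values())
    )
    if weak:
        return "refine transformation_prompt or type for: " + ", ".join(weak)
    return "no clear pattern"
```

A field missing from a document is treated as not low, so it only counts as consistently weak when every sampled document reports a low score for it.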

5. Check manual annotation patterns

The agent calls list_recent_field_annotations to see which fields users have been correcting most frequently. Repeated manual corrections on the same field are a strong signal that the field’s prompt or type is wrong.
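
Counting corrections per field is a simple tally. The sketch below assumes the output of list_recent_field_annotations has been reduced to a list of dictionaries with a `field` key; that record shape and the helper name `most_corrected_fields` are assumptions made for illustration, since the actual response format is not described here.

```python
from collections import Counter


def most_corrected_fields(annotations, top_n=3):
    """Return the fields with the most manual corrections, most frequent first.

    `annotations` is an assumed list of dicts, each with a "field" key.
    """
    counts = Counter(a["field"] for a in annotations)
    return counts.most_common(top_n)
```

Fields at the top of this list are the first candidates for a prompt or type review.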

6. Review recommended changes

Based on the evidence gathered, the agent presents at most three recommendations in order of expected impact:
  1. OCR mode change — largest single lever; takes effect on the next upload.
  2. Field-level fix — refine the transformation_prompt on the problematic field, or change its type if the current type is wrong.
  3. Validation rule — add a rule to flag suspect extractions for human review rather than relying on raw confidence.
The agent shows you the supporting evidence (for example: “8 of 10 sampled documents had low confidence on vendorAddress”) before proposing changes.
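
The ordering and the cap of three can be sketched as a sort followed by a slice. The `kind` labels, the record shape, and the function name below are hypothetical; they only mirror the three recommendation types listed above.

```python
# Lower number = higher expected impact, following the order of the list above.
IMPACT_ORDER = {"ocr_mode": 0, "field_fix": 1, "validation_rule": 2}


def top_recommendations(candidates, limit=3):
    """Sort candidate recommendations by expected impact and keep at most `limit`.

    `candidates` is an assumed list of dicts with a "kind" key drawn from IMPACT_ORDER.
    """
    ranked = sorted(candidates, key=lambda c: IMPACT_ORDER[c["kind"]])
    return ranked[:limit]
```

Python's sort is stable, so two field-level fixes keep the order in which they were found.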

7. Approve changes

The agent will not apply any changes — update_workspace, update_field, or create_validation_rule — until you explicitly approve each one. Review the recommendations and confirm which you want applied.
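
The approval gate amounts to filtering recommendations by explicit consent before anything is applied. A minimal sketch, assuming each recommendation carries an `id` and that the caller supplies the function that performs the change; `apply_approved` and its arguments are hypothetical names, not Affinda tools.

```python
def apply_approved(recommendations, approved_ids, apply):
    """Apply only the recommendations the user explicitly approved.

    `apply` is a caller-supplied callable (for example, a wrapper around the
    relevant update tool); anything not in `approved_ids` is left untouched.
    """
    applied = []
    for rec in recommendations:
        if rec["id"] in approved_ids:
            apply(rec)
            applied.append(rec["id"])
    return applied
```

Passing an empty set of approved IDs applies nothing, which matches the default of making no changes without confirmation.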

What this workflow does not fix

Some documents are legitimately difficult — low-quality scans, handwritten notes, inconsistent layouts. Confidence will be low on these regardless of configuration. The workflow flags this case rather than suggesting spurious fixes.
A brand-new document type with fewer than ten confirmed documents will improve substantially over the next ten confirmations regardless of configuration tuning. If you have only just created the document type, the best action is to confirm a batch of documents and let model memory do its work.
If always-full OCR is producing unusable or garbled text, the problem is upstream of the schema. The agent will surface this and suggest contacting support@affinda.com rather than continuing to chase field-level fixes.