Purpose

“Good enough” accuracy (e.g. ~80-90%) is not enough for mission-critical document processing. Affinda bridges the gap to “excellent” (99%+ accuracy) by learning from the documents you validate, which reduces manual work. This guide explains how Affinda’s information extraction models work and walks through the steps you can take to improve the performance of your models. The tutorial covers building high‑quality Model Memory with validated documents, configuring validation settings, adding field descriptions, and troubleshooting low‑performing templates to maximize extraction accuracy.
To follow this tutorial, you need Organization Owner or Admin permissions.
If you have not configured your first model yet, follow the Creating a New Model tutorial first.

Under the Hood: How Affinda Extracts Data

Benefits of Our Approach

  • No Extensive Model Training Required – Unlike traditional ML models that require hundreds of training samples, Affinda learns dynamically and applies corrections in real-time. New, high-performing models can be created in a matter of minutes, not weeks.
  • Higher Accuracy, Less Manual Work – Moving from 95% to 99% accuracy reduces errors by 80%, cutting down the need for human intervention significantly.
  • More Intelligent Than Static LLMs – Unlike generic large language models that rely on fixed prompts and lack continuous learning, Affinda actively applies nuanced learning from past interactions to make better decisions.
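The 80% figure in the second benefit comes from comparing error rates rather than accuracy rates. A minimal sketch of that arithmetic follows; the function name is illustrative and not part of any Affinda tooling.

```python
def relative_error_reduction(accuracy_before: float, accuracy_after: float) -> float:
    """Fraction of errors eliminated when accuracy rises from one level to another."""
    error_before = 1.0 - accuracy_before
    error_after = 1.0 - accuracy_after
    if error_before <= 0:
        raise ValueError("accuracy_before must be below 1.0")
    return (error_before - error_after) / error_before


# 95% accuracy leaves 5 errors per 100 fields; 99% leaves 1 per 100.
reduction = relative_error_reduction(0.95, 0.99)
print(f"{reduction:.0%} fewer errors")
```

Four of every five fields that previously needed a human correction no longer do, which is why a four-point accuracy gain has a large effect on review workload.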

Steps to Improving Accuracy

Visual learner? Follow along with our video tutorial.

1. Upload example documents and validate data to build Model Memory

Affinda’s models use Model Memory to learn from your documents and improve the accuracy of predictions. Affinda intelligently selects a subset of your confirmed documents to use as Model Memory. These documents act as trusted references, helping the model make more accurate predictions on new, incoming documents. A well-curated Model Memory is the foundation of highly accurate, automated document extraction.
Why this matters:
The model will replicate what it learns from model memory. If errors are present, those mistakes will be reproduced in future predictions.

Best Practices for Model Memory

  • Accuracy first: Ensure extractions in model memory are 100% correct before confirming them.
    • Correct errors promptly – Any errors discovered downstream should be corrected in the Affinda app to maintain data integrity.
    • Establish clear validation guidelines – If multiple team members validate documents, create clear annotation standards to prevent inconsistencies, especially in cases where ambiguity exists.
  • Quality over quantity: You don’t need dozens of near-identical examples. Instead, aim for a variety of documents that reflect the range you typically receive.
  • Complete examples: Prioritise documents that include all required fields, rather than ones with missing data.
Following these principles gives your Model Memory a strong foundation for consistent, accurate document processing.
To build your Model Memory:
  1. Upload representative documents.
  2. Check the extraction of each field on the document.
  3. Correct any errors by redrawing the annotation box over the field on the document.
  4. Once the document is completely correct, click ‘Confirm document’.
For more detail on validating extractions, see the Validation - Extraction tutorial.
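Before uploading, it can help to shortlist documents that follow the “quality over quantity” and “complete examples” practices. The sketch below is one way to do that offline. It assumes you keep your own list of candidate files, each tagged with a layout label and the fields it contains. The `Candidate` class, the `layout` label, and `pick_examples` are hypothetical helpers, not part of Affinda.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    filename: str
    layout: str               # hypothetical label, e.g. the supplier or template name
    fields_present: set[str]  # fields you can see on the document


def pick_examples(candidates, required_fields):
    """Pick one document per layout, preferring the one covering the most required fields."""
    best = {}
    for doc in candidates:
        coverage = len(required_fields & doc.fields_present)
        current = best.get(doc.layout)
        if current is None or coverage > len(required_fields & current.fields_present):
            best[doc.layout] = doc
    return list(best.values())


required = {"invoice_number", "total", "due_date"}
candidates = [
    Candidate("acme_01.pdf", "acme", {"invoice_number", "total", "due_date"}),
    Candidate("acme_02.pdf", "acme", {"invoice_number", "total"}),
    Candidate("globex_01.pdf", "globex", {"invoice_number"}),
    Candidate("globex_02.pdf", "globex", {"invoice_number", "total", "due_date"}),
]
chosen = pick_examples(candidates, required)
print([doc.filename for doc in chosen])
```

The result is a small, varied set with one complete example per layout, rather than many near-identical files.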
2. Review Model Memory Settings

Go to Workspace Workflow Settings → ‘Configure Validation’. Here you can choose between three settings for Model Memory:
  1. Auto (Recommended) - Affinda models intelligently select the best documents in your validated set to use as Model Memory. This keeps model memory to a finite set that can be easily audited.
  2. Manual - No documents from the workspace are automatically added to Model Memory, even after validation.
  3. Always - Uses every confirmed document as a model memory reference.
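The difference between the three settings can be pictured as a small decision function. This is only a conceptual sketch. Affinda's actual selection logic for Auto is not described here, so `auto_selector` is a stand-in you supply, and `MemoryMode` and `added_automatically` are hypothetical names.

```python
from enum import Enum


class MemoryMode(Enum):
    AUTO = "auto"
    MANUAL = "manual"
    ALWAYS = "always"


def added_automatically(mode, confirmed_docs, auto_selector):
    """Return the confirmed documents that would join Model Memory without manual action."""
    if mode is MemoryMode.ALWAYS:
        return list(confirmed_docs)          # every confirmed document is a reference
    if mode is MemoryMode.MANUAL:
        return []                            # nothing is added automatically
    return auto_selector(confirmed_docs)     # AUTO: a selected subset (stand-in logic)


confirmed = ["a.pdf", "b.pdf", "c.pdf"]
stand_in_selector = lambda docs: docs[:1]    # placeholder only, not Affinda's algorithm

for mode in MemoryMode:
    print(mode.value, added_automatically(mode, confirmed, stand_in_selector))
```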
3. Review OCR settings

If the model is not reading some text on your documents, your OCR settings might not be correct. Navigate to Workspace Workflow Settings → “Configure Pre-processing” and check that your OCR settings are set to “Partial” or “Always Full” to avoid missed text.
4. Add Field Prompts

If a specific field in your documents is consistently producing inaccurate results, you can improve performance by adding a field prompt. A field prompt allows you to give the model additional instructions for extracting that field. This might include:
  • Clarifying the exact information you want.
  • Providing counterexamples (what not to extract).
  • Explaining the preferred location on the document.
Example: Customer Address
Extract this information from where it is EXPLICITLY stated in the invoice document (e.g. “car stored at”), rather than the address of the person the invoice is addressed to. Do NOT confuse with the address of the supplier. It is better to keep this blank than return an incorrect address.
Field prompts are an advanced aid; the core recommendation remains adding more validated examples to Model Memory for continual improvement.
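If you maintain prompts for many fields, a small template can keep them consistent with the three kinds of guidance listed above. The helper below is a hypothetical sketch that only assembles text. It does not call Affinda, and the resulting string would still be pasted into the field's prompt in the app.

```python
def build_field_prompt(what, location=None, avoid=(), blank_if_unsure=False):
    """Assemble field-prompt text: what to extract, where to look, and what to avoid."""
    sentences = [what.strip().rstrip(".") + "."]
    if location:
        sentences.append(f"Prefer the value found {location.strip().rstrip('.')}.")
    for item in avoid:
        sentences.append(f"Do NOT extract {item.strip().rstrip('.')}.")
    if blank_if_unsure:
        sentences.append("Leave the field blank rather than return an incorrect value.")
    return " ".join(sentences)


prompt = build_field_prompt(
    what="Extract the address where the vehicle is stored",
    location="where it is explicitly stated, e.g. next to 'car stored at'",
    avoid=[
        "the address of the person the invoice is addressed to",
        "the supplier's address",
    ],
    blank_if_unsure=True,
)
print(prompt)
```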

Troubleshooting low-performing documents

1. Check the Model Memory reference

Go to your document and click the three dots in the top right corner. Click “Model Memory Reference”. Review the fields on the Model Memory reference. Incorrect extraction on this document could be responsible for incorrect predictions. Make corrections as needed.
2. Add template example

If a high‑volume document template isn’t performing as expected, upload a representative example to Model Memory to boost accuracy. To do this, upload your example document, validate the extracted data, and click ‘Confirm’. Then reparse existing documents and check that they reference the correct example document.
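To decide which templates deserve a new example first, you can rank them by how often their fields needed correction. The sketch below assumes you keep your own log of validation outcomes as `(template, field_was_correct)` pairs. That log format and the function name are assumptions, not an Affinda export.

```python
from collections import defaultdict


def accuracy_by_template(records):
    """Return (template, accuracy) pairs sorted from lowest to highest accuracy."""
    totals = defaultdict(lambda: [0, 0])  # template -> [correct fields, total fields]
    for template, was_correct in records:
        totals[template][0] += int(was_correct)
        totals[template][1] += 1
    ranked = [(name, correct / total) for name, (correct, total) in totals.items()]
    return sorted(ranked, key=lambda pair: pair[1])


# Hypothetical validation log: True means the field was accepted without correction.
log = [
    ("acme", True), ("acme", True), ("acme", False), ("acme", True),
    ("globex", False), ("globex", True),
]
ranking = accuracy_by_template(log)
print(ranking)
```

Templates at the top of the ranking are the first candidates for a new, fully validated example.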