Schema Design & Field Configuration Best Practices

Overview

When setting up document extraction in Affinda, one of the most important decisions you’ll make is how to configure your schema.

Your schema is the set of fields the system extracts and returns for each document. It refers to not only the list of fields, but also their structure and the formatting applied.

This guide explains how to think about field configuration, what trade-offs to consider, and how to use advanced options carefully.

Why Schema Design Matters

The schema is more than just a list of fields. It determines:

What data the model will try to extract
How easy it is for your team to review and correct that data
How cleanly the extracted data can be passed into your other systems

A poorly designed schema can lead to confusing user experiences, low accuracy, or messy downstream integrations. A well-designed one avoids all that.

What to Consider When Designing Your Schema

Here are the key trade-offs to consider when deciding what fields to include, and how to configure them.

End-Process Requirements

Not every piece of data needs to come from the document.

Some fields might be optional or easier to fill in later.
If a field is nice to have but not essential, consider leaving it out.
Prioritize fields that are critical, hard to get elsewhere, or needed to trigger automation.

Model Accuracy

Some field types are harder for the AI to extract accurately—especially complex, nested, or grouped fields.

Simple fields = better accuracy
Every field adds risk: If the model struggles with a field, it may make errors—even if the rest of the document is simple.
More fields = more training needed: If you want to improve accuracy for complex fields, it may require a custom model or more training data.

User Review & Validation

Your team (or users) will often review documents and confirm or edit extracted data. The easier this is, the faster the process.

Simpler is better: Flat schemas with clear field names are easier to check.
Avoid overcomplication: Deeply nested fields or grouped data can slow reviewers down and increase mistakes.

Integration Complexity

Think about how you plan to use the extracted data.

Is your downstream system expecting a specific format?
Will someone need to clean or restructure the data before using it?

If your schema aligns with your downstream format, you’ll avoid a lot of post-processing work.
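As a rough illustration of that alignment, the sketch below maps a flat extraction result straight onto the columns a downstream system expects. The field names and both dictionaries are hypothetical and do not reflect Affinda's actual response format; the point is only that a schema mirroring the target format needs almost no post-processing.

```python
# Hypothetical column names for an imaginary downstream system.
DOWNSTREAM_COLUMNS = ["invoice_number", "invoice_date", "total_amount"]


def to_downstream_row(extracted: dict) -> dict:
    """Map a flat extraction result onto the columns the downstream system expects."""
    missing = [name for name in DOWNSTREAM_COLUMNS if name not in extracted]
    if missing:
        # A gap here means the schema and the downstream format have drifted apart.
        raise KeyError(f"schema is missing downstream fields: {missing}")
    # Extracted fields the downstream system does not use are simply dropped.
    return {name: extracted[name] for name in DOWNSTREAM_COLUMNS}


# Hypothetical extraction result, flat and already named like the target columns.
extracted = {
    "invoice_number": "INV-001",
    "invoice_date": "2024-03-01",
    "total_amount": "120.50",
    "supplier_note": "Thanks for your business",
}
row = to_downstream_row(extracted)
```

If the schema were nested or used different names, this function would need renaming and restructuring logic, which is the post-processing work a well-aligned schema avoids.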

Higher-Complexity Field Options

The simplest and most common field type is a single, flat value (such as text, a number, or a date) taken from a clearly defined location on the document. These fields are the easiest for the model to extract accurately and for reviewers to check in the UI.

The sections below cover more advanced field configuration options: when each one is useful, and the downsides you need to manage once it is enabled. These options are powerful, but should be used with care. For information on how to configure these settings, see our Configuration Guide.

Allow Multiple Values

Enable this if a field may appear multiple times with distinct values (e.g. multiple invoice numbers). You do not need to enable this if a field appears multiple times with the same value (e.g. customer name).

Use only when needed:

Extracting multiple values makes things harder for the model.
It also increases the risk of messy integrations (e.g. when downstream systems expect a single value).
Can confuse users reviewing the data.

Tip: If most documents only have one value, leave this off.
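To make the integration risk concrete, here is a minimal sketch of the guard a downstream integration might need once a field can return several values. The function name and the assumption that multiple values arrive as a Python list are illustrative only, not Affinda's documented behavior.

```python
def as_single_value(value):
    """Collapse a field that may hold a list into the single value a downstream system expects."""
    if not isinstance(value, list):
        return value  # already a single value
    # Remove repeats while keeping the original order.
    distinct = list(dict.fromkeys(value))
    if not distinct:
        return None  # nothing was extracted
    if len(distinct) == 1:
        return distinct[0]
    # Several genuinely different values: a person has to decide which one to use.
    raise ValueError(f"expected one value, got {len(distinct)}: {distinct}")
```

With the option left off, none of this handling is needed, which is why it is worth enabling only when documents really do carry several distinct values.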

Text Transformations

You can apply basic formatting (like trimming text, formatting dates, or converting currency symbols) to extracted values.

Pros:

Helps the data match what your system expects
Can reduce post-processing effort

Cons:

Adds complexity to your setup
Can hide problems with the raw extraction
May behave unexpectedly with unusual documents

Tip: Only transform text when you’re sure of the format you need. Keep it simple.
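The sketch below shows the general idea of a simple transformation in plain Python, not Affinda's own transformation feature. It trims a date string and converts it to ISO format, while keeping the raw value so extraction problems stay visible. The function name and the assumed day/month/year input format are hypothetical.

```python
from datetime import datetime


def normalize_date(raw: str, input_format: str = "%d/%m/%Y") -> dict:
    """Trim a date string and convert it to ISO 8601, keeping the raw text alongside."""
    cleaned = raw.strip()
    try:
        value = datetime.strptime(cleaned, input_format).date().isoformat()
    except ValueError:
        # Unusual documents may not match the assumed format; flag instead of guessing.
        value = None
    return {"raw": raw, "value": value}


result = normalize_date(" 03/04/2024 ")
unparsed = normalize_date("April 3, 2024")
```

Here `result["value"]` is `"2024-04-03"`, while `unparsed["value"]` is `None` because the text does not match the assumed format. Keeping `raw` next to the transformed value is one way to avoid hiding problems with the underlying extraction.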

Group Fields

Use groups to extract sets of related data (e.g. line items in a table).

But be careful:

Groups are tricky for the model, especially if the document doesn’t visually show a clear grouping.
When groups span multiple pages, the complexity increases.
Groups often need custom configuration to get right.

Recommendation: If you need group fields, get in touch with Affinda’s team—we’ll help you set them up properly.

'No Rectangle' Fields

These are fields where the model does not try to extract visible text directly from the document, meaning it won’t draw a box (rectangle) around any raw text. Instead, the model infers the value based on other content in the document.

When to use:

You want the model to reason or summarize (e.g. determine payslip frequency from date ranges)
The value isn’t written explicitly anywhere in the document
You care more about interpretation than locating specific text

Benefits:

Lets you extract insights or summaries, not just literal text
Enables smarter use cases that go beyond raw OCR

Drawbacks:

Accuracy is harder to validate—since there’s no reference text to compare against
Performance depends heavily on how well the field is described
Harder to troubleshoot or improve if predictions are wrong

Best practice: Write a clear, descriptive field label and description so the model understands exactly what you’re asking it to do. Ambiguous instructions lead to poor results.
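Because an inferred value has no reference text, a small independent check can help when validating it. The sketch below is one hypothetical way to sanity-check the payslip frequency example from the pay period dates; it is not how the model makes its prediction, and the thresholds and labels are assumptions.

```python
from datetime import date


def infer_pay_frequency(period_start: date, period_end: date) -> str:
    """Guess a pay frequency from the length of the pay period (both dates inclusive)."""
    days = (period_end - period_start).days + 1
    if days == 7:
        return "weekly"
    if days == 14:
        return "fortnightly"
    if 28 <= days <= 31:
        # Note: a four-weekly period is 28 days and would also land here.
        return "monthly"
    return "unknown"


frequency = infer_pay_frequency(date(2024, 3, 1), date(2024, 3, 31))
```

Comparing a rule like this against the model's answer on a sample of documents gives reviewers something concrete to check, and the ambiguous cases it exposes are a good prompt for tightening the field description.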

Manual Entry Fields

These are fields that aren’t predicted by the model. They are often used in combination with 'No Rectangle' fields to allow users to specify data not present in the document (e.g. comments, internal codes).

Good for:

One-off user notes or tags
Fields that aren’t extractable but still needed

Don’t overuse:

Too many manual fields turn your extraction tool into a data-entry form.
Better to do heavy manual entry in your destination system, and keep Affinda focused on what’s in the document.

Best practice: Limit manual fields to 1–2 max per schema.

Final Tips for Success

Start simple: Begin with just the essential fields. Add more later if needed.
Review early: Test with real documents and see how the data looks.
Trim the fat: Remove unused or low-accuracy fields.
Talk to us: Affinda can help guide your configuration for optimal results.
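One way to decide which fields to trim is to measure how often reviewers correct each one. The sketch below assumes you can export, for each reviewed document, a mapping of field name to whether the reviewer changed the value. That export format and the field names are hypothetical, not a documented Affinda feature.

```python
from collections import Counter


def correction_rates(reviews: list[dict]) -> dict[str, float]:
    """Return the share of reviewed documents in which each field was corrected."""
    seen = Counter()
    corrected = Counter()
    for review in reviews:
        for field, was_corrected in review.items():
            seen[field] += 1
            if was_corrected:
                corrected[field] += 1
    return {field: corrected[field] / seen[field] for field in seen}


# Hypothetical review log: True means the reviewer had to change the extracted value.
reviews = [
    {"invoice_number": False, "po_number": True},
    {"invoice_number": False, "po_number": True},
    {"invoice_number": True, "po_number": False},
    {"invoice_number": False, "po_number": True},
]
rates = correction_rates(reviews)
```

In this made-up sample, `po_number` is corrected in three of four documents, so it would be a candidate for a clearer description, more training data, or removal.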

Need Help?

If you’re unsure how to configure a field, want advice on best practices, or are working with complex document types, our team is here to help. Reach out any time.
