Types of Data Extraction Models

Affinda’s data extraction models are called ‘Extractors’. These Extractors can be configured in different ways to meet your use case and the way that end customers will interact with the platform.

There are 4 main types of Extractors that customers may use, each with different capabilities.

Extractor: Base Extractors
Self-Learning: No
Description:
- Pre-built to work out of the box for many customers
- No self-learning component, so performance will not improve as documents are validated in your Organisation
- Has a defined standard schema, of which some fields can be disabled via the Collection settings
- e.g., the Invoice or Receipts model

Extractor: Tailored Base Extractors with Standard Fields
Self-Learning: Yes
Description:
- A Tailored Extractor is a self-learning model: it starts from a Base Extractor, then learns and improves on the document formats confirmed through the validation process
- Works out of the box and then improves over time
- Fields can be disabled as with the Base Extractor

Extractor: Tailored Base Extractors with Custom Fields
Self-Learning: Yes
Description:
- Similar to the above, but with added custom fields for data not captured by the standard schema
- Typically, up to a maximum of 5 custom fields per Extractor

Extractor: Custom Extractors for a Bespoke Document Type
Self-Learning: Yes
Description:
- New Extractor created for a document type not currently supported by Affinda’s Base Extractors
- Requires initial setup (annotation + training) before it will predict data
- Has self-learning capability
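
As a rough illustration, the four Extractor types described above can be written down as a small lookup structure with a helper that suggests a type from three yes/no questions. This is only a sketch for reasoning about the options: it does not call the Affinda API, and the class name, field names and the `suggest_extractor` function are hypothetical names made up for this example. It also assumes that a Tailored Extractor with Custom Fields works out of the box in the same way as one with Standard Fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractorType:
    """Illustrative summary of one Extractor type (not an Affinda API object)."""
    name: str
    self_learning: bool      # improves as documents are validated
    works_out_of_box: bool   # predicts data without initial annotation + training


BASE = ExtractorType("Base Extractor", False, True)
TAILORED_STANDARD = ExtractorType(
    "Tailored Base Extractor with Standard Fields", True, True)
TAILORED_CUSTOM = ExtractorType(
    "Tailored Base Extractor with Custom Fields", True, True)
CUSTOM = ExtractorType(
    "Custom Extractor for a Bespoke Document Type", True, False)


def suggest_extractor(supported_by_base: bool,
                      needs_custom_fields: bool,
                      wants_self_learning: bool) -> ExtractorType:
    """Suggest an Extractor type from three yes/no questions (hypothetical helper)."""
    if not supported_by_base:
        # Document type is not covered by a Base Extractor.
        return CUSTOM
    if needs_custom_fields:
        # Standard schema does not capture all the data you need.
        return TAILORED_CUSTOM
    if wants_self_learning:
        return TAILORED_STANDARD
    return BASE


if __name__ == "__main__":
    choice = suggest_extractor(supported_by_base=True,
                               needs_custom_fields=True,
                               wants_self_learning=True)
    print(choice.name)
```

In practice the decision also depends on cost and solution complexity, which this sketch does not model.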

Configuring Extractors

The type of Extractor you choose for the Collections within your Organisation affects the complexity of the solution, the accuracy of data extraction over time, and the cost. Please refer to the guides below to understand how tailored and custom models work and their applicable use cases: