Self-Learning Base Extractors

Self-learning Base Extractors (or 'Tailored Models') are versions of one of Affinda's base extractors (e.g. invoices, receipts, resumes) that have been trained on documents specific to a customer's use case to get even better performance.

The most common use case of a tailored model is within the Accounts Payable space. Often, a customer's supply chain will have a small number of suppliers that encompass a large proportion of their invoice volume. With a tailored model, an initial focus on these suppliers can greatly reduce the amount of processing time required. The 'long tail' of suppliers that are of lower volume will continue to improve in accuracy as they continue to get validated and fed back into the model for training.

How Self-Learning Base Extractors work

Tailored models work by using the base 'off the shelf' extractor as a foundation. The Base Extractor will be trained across a large and varied set of documents and thus will perform well 'out of the box' on most document formats of the same type. A tailored model then adds a layer that is specific to the document formats seen by that user account. After seeing a small number of documents of each document format type, the model can quickly learn and see higher accuracy, reducing (or eliminating) the time users need to review a document.

Tailored models are self-learning because they incorporate human validation in their workflow. This creates a feedback loop between the AI model's predictions and the corrected human validations. Through this loop, the model can continuously learn and improve over time based on user feedback on the platform.

Using Self-Learning Extractors

Each Collection in your Organisation will have an Extractor associated with it. The choice of what type of Extractor to use for Collections within your Organisation will ultimately impact the complexity of the solution, the accuracy of the data extraction over time, and the cost.
Generally, the options for a single standard document type (e.g invoices) are:

  1. Using the Base Extractor (no self-learning)
  2. Using a unique Self-Learning Base Extractor for every/specific end users
  3. Using a single shared Self-Learning Base Extractor for many/all end users

Note, Option #3 is not always applicable. There are two main factors to consider:

Are the field settings consistent across all relevant customers / Collections?

  • Our models require consistent data to continually improve, so if field settings are not consistent across Collections, model regression will occur

Is there a risk of inconsistent validation from different teams?

  • Typically, we recommend a tailored model per validation team
  • This is due to different teams potentially validating documents differently or inaccurately
  • If one end user is validating inaccurately, the accuracy of the Extractor may decrease for another end user using the same Extractor

Configuring your Organisation for tailored models

We recommend that users set up a single Collection that can be mapped 1:1 with a tailored model. This simplifies the setup and configuration process so that the risk of model regression due to data inconsistencies is lower.

To learn how to enable the self-learning capabilities, please refer to this page: Turning on self-learning capability in Affinda Web-app

Setting up tailored models

To discuss further whether tailored models are appropriate for your use case and to understand further how to best configure your account to take advantage of them, contact us to set up a discussion so that we can understand your use case better and provide some recommendations.