Document Splitting

Document Splitting separates a single file containing multiple documents into individual documents for further processing. This is especially useful when working with batch-scanned files.

Once split, each document inherits the Workspace’s classification and extraction rules, simplifying downstream workflows.

Splitting Settings

Configure splitting at the Workspace level. Use the toggle to enable or disable automatic splitting of files, then select the Splitter to use.

General Document Splitter

All customers have access to Affinda's General Document Splitter. This model has been designed to identify specific cues that indicate the start of a new document, including:

  • Change in page numbering sequence (e.g. the numbering restarts at Page 1)
  • Change in key party within the document (e.g. an invoice from a different supplier is identified)
  • Change in key document identifier
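
To make the first cue concrete, here is a minimal sketch of how a restart in printed page numbers could mark a document boundary. It is an illustrative heuristic only, not how the General Document Splitter is implemented. The function name and the input format (one detected page number per page, or None where no number was found) are assumptions made for this example.

```python
from typing import List, Optional


def split_points_from_page_numbers(printed_numbers: List[Optional[int]]) -> List[int]:
    """Return 0-based page indices where a new document appears to start.

    printed_numbers[i] is the page number printed on page i, or None if
    no number was detected on that page (hypothetical input format).
    """
    if not printed_numbers:
        return []
    starts = [0]  # the first page always starts a document
    previous = printed_numbers[0]
    for index in range(1, len(printed_numbers)):
        current = printed_numbers[index]
        if current is None:
            continue  # no number detected; keep the page with the current document
        # A printed number that fails to increase (e.g. 3 -> 1) suggests a restart.
        if previous is not None and current <= previous:
            starts.append(index)
        previous = current
    return starts


# Three documents scanned as one file: pages numbered 1-3, 1-2, and 1.
print(split_points_from_page_numbers([1, 2, 3, 1, 2, 1]))
```

A real splitter would combine several cues, since a page without a printed number gives this heuristic nothing to work with.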

🚧

The General Document Splitter is not self-learning.

Custom splitter

While the General Document Splitter has been designed to work across most use cases, some use cases need additional configuration. Three types of custom splitter can be created:

  1. LLM-based
    Design a prompt that details when a file should be split.
  2. Keyword
    Split when one or more specific keywords are found on a page.
  3. Trained model
    Train a new model on a number of representative documents to help identify when a new document starts within a file.
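
As an illustration of the keyword approach, the sketch below groups pages into documents and starts a new document whenever a page contains one of the configured keywords. It is a simplified model of the idea, assuming page text is already available as plain strings and that matching is case-insensitive; the function name and these matching rules are assumptions, not Affinda's implementation.

```python
from typing import List, Sequence


def split_on_keywords(pages: Sequence[str], keywords: Sequence[str]) -> List[List[int]]:
    """Group 0-based page indices into documents.

    A new document starts at every page whose text contains any keyword
    (case-insensitive). The first page always starts a document.
    """
    lowered = [keyword.lower() for keyword in keywords]
    groups: List[List[int]] = []
    for index, text in enumerate(pages):
        page_text = text.lower()
        starts_new = any(keyword in page_text for keyword in lowered)
        if starts_new or not groups:
            groups.append([index])
        else:
            groups[-1].append(index)
    return groups


pages = ["Tax Invoice #1", "continued", "TAX INVOICE #2", "Statement"]
print(split_on_keywords(pages, ["tax invoice"]))
```

This approach suits files where every document opens with a predictable phrase. If the keyword can also appear on continuation pages, a prompt-based or trained splitter is likely a better fit.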

📘

Speak to the Affinda team if the General Document Splitter is not meeting your needs and you need a custom splitter configured for your use case.

What happens to documents that are split?

If any edits are made to the file, the AI model re-parses the data to give the most accurate predictions. Any field validations already made will be lost.

When a document is split into multiple components, new files are created in your account. Each new file is named with a numeric suffix added to the original file name (e.g. [filename]_1, [filename]_2).
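
If you need to match the new files back to the original in your own system, the naming pattern can be sketched as below. This assumes the suffix is placed before the file extension, which the description above does not confirm; the function name is hypothetical, and the names that actually appear in your account should be treated as authoritative.

```python
from pathlib import PurePath
from typing import List


def expected_split_names(original_name: str, parts: int) -> List[str]:
    """Predict file names for a split, following the [filename]_N pattern."""
    path = PurePath(original_name)
    # Assumption: the numeric suffix goes before the extension, if there is one.
    return [f"{path.stem}_{n}{path.suffix}" for n in range(1, parts + 1)]


print(expected_split_names("batch_scan.pdf", 3))
```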

The API response for the original file includes the identifiers of the newly created files, so that you can retrieve the data from them. The PDF file of each new document is also included in the response, so that the new documents can be added to your platform.
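
A minimal sketch of reading those identifiers from a parsed response follows. The field names used here ("children", "identifier") and the payload shape are placeholders for illustration, not taken from the Affinda API reference, so check the actual response schema before relying on them.

```python
from typing import Any, Dict, List


def new_file_identifiers(response: Dict[str, Any], children_key: str, id_key: str) -> List[str]:
    """Collect identifiers of files created by a split from a parsed API response.

    children_key and id_key are placeholders for whatever field names the
    real response uses; pass in the names from the API reference.
    """
    children = response.get(children_key) or []
    return [str(child[id_key]) for child in children if id_key in child]


# Hypothetical payload shape, for illustration only.
example_response = {
    "identifier": "orig123",
    "children": [{"identifier": "abc"}, {"identifier": "def"}],
}
print(new_file_identifiers(example_response, "children", "identifier"))
```

Each identifier returned can then be used to request the data for that new file.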

Editing the split

While the document splitter splits a document automatically, users can still manually split or combine documents through the 'Edit Pages' option within the document validation UI. This gives users full control over their documents, even after the initial split.

See Splitting & Page Editor for more information.