Automatically split single files into multiple documents.
Document Splitting separates a single file that contains multiple documents into individual documents for further processing. This is especially useful when working with batch-scanned files. Once split, each document inherits the Workspace’s classification and extraction rules, which simplifies downstream workflows.
Configure splitting settings at the Workspace level by navigating to Workflow Settings. Click the toggle to enable automatic splitting of files, then select the splitter you would like to use.
All customers have access to Affinda’s General Document Splitter. This model is designed to identify cues that indicate the start of a new document, including:
Change in page numbering sequence (e.g. the numbering restarts at Page 1)
Change in key party within the document (e.g. an invoice from a different supplier is identified)
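To make the page-numbering cue concrete, here is a minimal Python sketch of a heuristic that flags a new document whenever the printed page number stops increasing. It is an illustration only and not how the General Document Splitter is implemented; the function names and the "Page N" label pattern are assumptions made for this example.

```python
import re
from typing import List, Optional

# Assumed label format for this sketch: "Page 3", "page 3 of 10", etc.
PAGE_LABEL = re.compile(r"\bpage\s+(\d+)\b", re.IGNORECASE)


def printed_page_number(text: str) -> Optional[int]:
    """Return the first printed page number found in the page text, if any."""
    match = PAGE_LABEL.search(text)
    return int(match.group(1)) if match else None


def split_points_from_page_numbers(pages: List[str]) -> List[int]:
    """Return 0-based indices of pages that appear to start a new document."""
    if not pages:
        return []
    starts = [0]  # the first page always starts a document
    previous = printed_page_number(pages[0])
    for index in range(1, len(pages)):
        current = printed_page_number(pages[index])
        # A number that fails to increase suggests the sequence restarted.
        if current is not None and previous is not None and current <= previous:
            starts.append(index)
        if current is not None:
            previous = current
    return starts


pages = ["Invoice Page 1", "Page 2", "Statement Page 1", "Page 2", "Page 3"]
print(split_points_from_page_numbers(pages))
```

A single cue like this is fragile on its own (many documents carry no page labels), which is one reason a model would combine it with other signals such as a change in key party.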
While the General Document Splitter is designed to work across most use cases, some use cases need additional configuration. Three different types of custom splitters can be created:
LLM-based: Design a prompt that details when a file should be split
Keyword: Split when one or more specific keywords are found on a page
Trained model: Train a new model using a number of representative documents to help identify when a new document occurs within a file
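The keyword approach is the simplest of the three to reason about. The Python sketch below shows one plausible reading of it: a page that contains any of the configured keywords starts a new document, and every other page is attached to the document before it. The function name, the case-insensitive matching, and the choice to treat the keyword page as the first page of the new document are assumptions for illustration, not a description of Affinda's configuration.

```python
from typing import List, Sequence


def split_on_keywords(pages: Sequence[str], keywords: Sequence[str]) -> List[List[int]]:
    """Group 0-based page indices into documents, starting a new group
    on every page whose text contains one of the keywords."""
    lowered = [keyword.lower() for keyword in keywords]
    groups: List[List[int]] = []
    for index, text in enumerate(pages):
        page_text = text.lower()
        has_keyword = any(keyword in page_text for keyword in lowered)
        # The very first page always opens a group, keyword or not.
        if has_keyword or not groups:
            groups.append([index])
        else:
            groups[-1].append(index)
    return groups


pages = ["TAX INVOICE A", "line items", "Tax Invoice B", "totals", "notes"]
print(split_on_keywords(pages, ["tax invoice"]))
```

A keyword that also appears on continuation pages (for example in a repeated header) would cause over-splitting under this rule, so the keyword should be one that only appears on the first page of each document.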
Reach out to the Affinda team if the General Document Splitter is not meeting your use case, and you want to discuss a custom splitting model.
When a document is split into multiple components, new files are created in your account. These new files are created with a suffix added to the file name (e.g. [filename]_1, [filename]_2, etc.). The API response for the original file also contains the identifiers of the newly created files, so that you can request the data extracted from them. The PDF files of the new documents are also included in the response, so that the new documents can be added to your platform.
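As a rough sketch of how a client might work with the naming convention and the identifiers described above, the Python below builds the expected names of the split files and collects the identifiers of the new files from a response payload. The key names "children" and "identifier" are hypothetical placeholders and not Affinda's actual response schema, and placing the suffix before the file extension is also an assumption; check the API reference for the real field names and naming behaviour.

```python
from pathlib import PurePath
from typing import Any, Dict, List


def split_file_names(original_name: str, count: int) -> List[str]:
    """Build names of the form [filename]_1, [filename]_2, ...
    Assumption: the suffix is inserted before the file extension."""
    path = PurePath(original_name)
    return [f"{path.stem}_{n}{path.suffix}" for n in range(1, count + 1)]


def child_identifiers(original_document: Dict[str, Any]) -> List[str]:
    """Collect identifiers of the files created by splitting.
    "children" and "identifier" are hypothetical key names."""
    return [child["identifier"] for child in original_document.get("children", [])]


# Hypothetical payload shaped only for this example.
response = {
    "identifier": "orig123",
    "children": [{"identifier": "abc1"}, {"identifier": "abc2"}],
}
print(split_file_names("batch_scan.pdf", 2))
print(child_identifiers(response))
```

Each collected identifier could then be used in a follow-up request to retrieve the extracted data for that split document.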
Although the document splitter splits files automatically, users can still manually edit the splitting or recombine documents through the ‘Edit Pages’ option within the document validation UI. This gives users full control over their documents. If any edits are made to the file, the AI model will re-parse the affected documents to give the most accurate predictions. Any field validations made will be lost. See the Tutorial: Reviewing splitting and classification for step-by-step instructions.