Pre-Processing Overview

Pre-processing involves the automated cleaning, organizing, and structuring of uploaded files to prepare the documents for data extraction.

Importance of Pre-Processing

By ensuring documents are well prepared before extraction, pre-processing:

Reduces the likelihood of errors in later stages
Improves the speed and accuracy of data extraction
Enables the seamless handling of various document types and formats

Key Pre-processing Actions:

Invalid File Handling: Identify issues that mean a document cannot be processed, such as insufficient text in the document, unsupported file types, corrupted files, or password-protected documents.
File Format Conversion: Convert uploaded files (e.g., images, PDFs with embedded data) into a PDF format suitable for processing.
Remove Duplicates: Identify and reject documents that have already been processed. This is configured per workspace in Workflow Settings; see Remove Duplicates for more information.
OCR (Optical Character Recognition): Extract text from scanned or image-based documents using OCR technology, supporting accurate extraction across document types and formats. See OCR and Text Extraction for more information.
Language Detection: Automatically identify the language of the document to support high-accuracy extraction.
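
As a rough illustration of the invalid-file and duplicate checks described above, the following Python sketch rejects unsupported or empty files and flags byte-identical re-uploads by content hash. It is not the platform's implementation: the function name `check_file`, the `SUPPORTED_SUFFIXES` allow-list, and the rejection messages are all hypothetical, and the platform's actual criteria for supported types and duplicates are not specified here.

```python
import hashlib
from pathlib import Path

# Hypothetical allow-list; the platform's real list of supported types may differ.
SUPPORTED_SUFFIXES = {".pdf", ".png", ".jpg", ".jpeg", ".tiff"}


def check_file(name: str, data: bytes, seen_hashes: set[str]) -> str:
    """Return "ok" or a short rejection reason for one uploaded file."""
    # Reject files whose extension is not on the allow-list.
    if Path(name).suffix.lower() not in SUPPORTED_SUFFIXES:
        return "unsupported file type"
    # An empty payload cannot be processed.
    if not data:
        return "empty or corrupted file"
    # Byte-identical content yields the same SHA-256 digest.
    digest = hashlib.sha256(data).hexdigest()
    if digest in seen_hashes:
        return "duplicate"
    seen_hashes.add(digest)
    return "ok"
```

A caller would keep one `seen_hashes` set per workspace and pass each upload through `check_file` before any conversion or OCR. Hashing raw bytes only catches exact re-uploads; a rescanned copy of the same document would not match under this assumption.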

Pre-processing settings can be found in your Workspace Workflow Settings.

Advanced Pre-processing Settings:

Reading Order Model: The Affinda Platform uses our proprietary reading order algorithms by default to capture word sequences in visually rich documents in a way that aligns with human comprehension. This ensures that text is processed in the same order a human would read it, leading to more accurate extractions.
Split Words: Ensures words that are incorrectly combined are separated for extraction. Default is on.
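
To make the idea of reading order concrete, here is a deliberately naive Python sketch that orders OCR words top to bottom, then left to right. It is not Affinda's proprietary algorithm; the `Word` type, the `naive_reading_order` function, and the `line_tolerance` parameter are hypothetical names used only for illustration.

```python
from typing import NamedTuple


class Word(NamedTuple):
    text: str
    x: float  # left edge of the word's bounding box
    y: float  # top edge of the word's bounding box


def naive_reading_order(words: list[Word], line_tolerance: float = 5.0) -> list[str]:
    """Group words into lines by vertical position, then read each line left to right."""
    lines: list[list[Word]] = []
    for word in sorted(words, key=lambda w: w.y):
        # A word joins the current line if it sits close to that line's first word.
        if lines and abs(word.y - lines[-1][0].y) <= line_tolerance:
            lines[-1].append(word)
        else:
            lines.append([word])
    return [w.text for line in lines for w in sorted(line, key=lambda w: w.x)]
```

A simple heuristic like this breaks down on multi-column pages, tables, and sidebars, where it interleaves text from unrelated regions. Those visually rich layouts are the cases a dedicated reading order model is meant to handle.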
