Export a redacted version of the document with sensitive data removed.
While the most common export from the Affinda platform is structured data to be ingested into a downstream system, Affinda also offers document redaction capabilities on any document type. With these redaction capabilities, we edit the PDF so that the original text is completely removed and not just masked by an overlay.
The steps to create a new Document Type that is suitable for redaction are very similar to those required for the typical extraction of structured data.
Edit every field to ensure ‘Allow Multiple Values’ (found in Advanced Settings) is enabled (this ensures that if a field is repeated within the document, every occurrence is redacted)
Upload documents to view fields to be redacted in the document validation interface
(Optional) Edit and update model predictions
Use the Get Redacted Document endpoint to return a redacted PDF version of the original document
Get in contact with the Affinda team to discuss your redaction use case and to enable a ‘redaction’ setting on your document type that will optimise for this output
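The final download step above can be sketched in Python using only the standard library. This is a minimal sketch under stated assumptions, not a reference client: the base URL, the `/v3/documents/{identifier}/redacted` path, and bearer-token authentication are assumptions that this article does not confirm, and the function names are hypothetical. Check the Affinda API reference for the actual Get Redacted Document request before using it.

```python
import urllib.parse
import urllib.request

# Assumptions (not confirmed by this article): base URL, path layout, and
# bearer-token auth. Verify all three against the Affinda API reference.
ASSUMED_BASE_URL = "https://api.affinda.com"


def build_redacted_request(identifier: str, api_key: str,
                           base_url: str = ASSUMED_BASE_URL) -> urllib.request.Request:
    """Build a GET request for the redacted PDF of one document."""
    # Percent-encode the identifier so it is safe to place in the URL path.
    safe_id = urllib.parse.quote(identifier, safe="")
    url = f"{base_url.rstrip('/')}/v3/documents/{safe_id}/redacted"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def download_redacted_pdf(identifier: str, api_key: str, out_path: str) -> int:
    """Fetch the redacted PDF and write it to out_path; returns bytes written."""
    request = build_redacted_request(identifier, api_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        data = response.read()
    with open(out_path, "wb") as handle:
        return handle.write(data)
```

Calling `download_redacted_pdf("<document identifier>", "<API key>", "redacted.pdf")` would save the file locally. Building the request separately from sending it makes the URL and headers easy to inspect without contacting the API.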
What if the model only redacted the first instance of a field?
To ensure the model redacts every occurrence of the field in your document, you need to enable “Allow Multiple Values” in the field's configuration. To do this, go to the Configure Document Type interface > locate your field > navigate to Advanced Settings > enable “Allow Multiple Values”.
For redaction, we recommend setting all fields to “Allow Multiple Values” to avoid this issue.
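The effect of the setting can be illustrated with a toy example on plain text. This is only an analogy, assuming simple string matching: it is not how Affinda processes PDFs, and the `redact` function and its `allow_multiple` parameter are hypothetical names chosen for illustration.

```python
def redact(text: str, value: str, allow_multiple: bool,
           mask: str = "[REDACTED]") -> str:
    """Toy string-level redaction; not how Affinda edits PDFs."""
    # A count of -1 replaces every occurrence; a count of 1 only the first.
    count = -1 if allow_multiple else 1
    return text.replace(value, mask, count)


sample = "Patient: Jane Doe. Signed by Jane Doe."
first_only = redact(sample, "Jane Doe", allow_multiple=False)
every_match = redact(sample, "Jane Doe", allow_multiple=True)

print(first_only)   # Patient: [REDACTED]. Signed by Jane Doe.
print(every_match)  # Patient: [REDACTED]. Signed by [REDACTED].
```

With the single-value behaviour, the second mention of the name survives in the output, which is the symptom described in the question above.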
How do I improve the performance of my redaction?
The performance of the redaction is determined by the performance of the underlying extraction model. To improve it, we recommend adding more example documents and validating correct example documents to build Model Memory. Follow the Improving Model Accuracy Tutorial for step-by-step instructions.