In addition to the data extracted from your documents, the API response includes field-level and document-level metadata that you can use when processing them.

Field Level Metadata

Metadata	Description
id	Identifier associated with the specific data point
rectangle(s)	x/y coordinates for the rectangular bounding box containing the data
pageIndex	The page that the data is found on
raw	Raw data extracted before any processing or formatting
confidence	Overall confidence that the extracted data is correct. Takes into account both the classification and text extraction confidence scores
classificationConfidence	The model's confidence that the data returned is correct
textExtractionConfidence	Confidence that the text extracted from the document is correct (relevant for scanned documents)
isVerified	Indicates whether the data has been validated, either by a human using our validation tool or through auto-validation rules
isClientVerified	Indicates whether the data has been validated by a human
isAutoVerified	Indicates whether the data has been auto-validated
parsed	Parsed data extracted after post-processing steps, including reformatting or mapping to a defined taxonomy
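
The field-level flags above can be combined to decide which data points still need a human check. The following is a minimal sketch, not an official recipe: it assumes each field arrives as a plain dictionary keyed by the metadata names in the table, and the `needs_review` helper, the 0.8 threshold, and the sample values are all hypothetical.

```python
from typing import Any, Mapping

# Hypothetical cut-off; choose a value that suits your own accuracy needs.
REVIEW_THRESHOLD = 0.8


def needs_review(field: Mapping[str, Any], threshold: float = REVIEW_THRESHOLD) -> bool:
    """Return True if a field should be checked by a person."""
    # Anything already validated (by a human or by auto-validation) is skipped.
    if field.get("isVerified"):
        return False
    confidence = field.get("confidence")
    # A missing confidence score is treated as needing review.
    return confidence is None or confidence < threshold


# Made-up sample fields, shaped after the metadata names in the table above.
sample_fields = [
    {"id": 1, "raw": "12/03/2021", "parsed": "2021-03-12", "confidence": 0.97, "isVerified": False},
    {"id": 2, "raw": "ACME Pty", "parsed": "ACME Pty", "confidence": 0.55, "isVerified": False},
    {"id": 3, "raw": "1,200.00", "parsed": 1200.0, "confidence": 0.40, "isVerified": True},
]

to_review = [f["id"] for f in sample_fields if needs_review(f)]
print(to_review)  # [2]
```

Only the second field is flagged: the first is above the threshold, and the third is already verified despite its low confidence.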

Document Level Metadata

Metadata	Description
identifier	The unique identifier associated with the document. Can be specified on upload, or else will be randomly generated by Affinda
fileName	An optional filename of the file
ready	If true, the document has finished processing. Particularly useful if an endpoint request specified wait=False: when polling, use this value to determine when to stop
readyDt	The date-time when the document was ready
failed	If true, some exception was raised during processing. Check the 'error' field of the main return object
expiryTime	The date/time in ISO-8601 format when the document will be automatically deleted. Defaults to no expiry
language	The document's language
pdf	The URL to the document's PDF (if the uploaded document is not already a PDF, it is converted to PDF as part of the parsing process)
parentDocument.identifier	If this document is part of a split document, this attribute points to the original document that this document was split from
childDocuments.identifier	If this document has been split into a number of child documents, this attribute points to those child documents
pages	The number of pages in the document
isOcrd	Boolean indicating whether the document has had OCR applied to extract text (if false, the data was extracted from an existing text layer on the document)
ocrConfidence	Overall confidence in the accuracy of text extracted from the document by OCR
reviewUrl	A signed URL, valid for 60 minutes, that can be used to review and validate the data extracted by the model. Learn more in Embedded Mode.
extractor	The Extractor that is associated with the Collection. An Extractor is an AI model used to extract data from documents
confirmedDt	The date-time when the document was confirmed
isConfirmed	Boolean to show if the document has been confirmed
rejectedDt	The date-time when the document was rejected
isRejected	Boolean to show if the document has been rejected
createdDt	The date-time when the document was created in Affinda
errorCode	The error code returned if document processing fails
errorDetail	A description of the error identified if document processing fails
file	URL to view the file
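
The ready and failed flags above lend themselves to a simple polling loop when a request was made with wait=False. The sketch below is illustrative only: `wait_until_ready` and `fetch_meta` are hypothetical names, `fetch_meta` stands in for whatever call you use to retrieve the document, and the metadata is assumed to arrive as a flat dictionary. The stubbed responses are made up so that the example runs without network access.

```python
import time
from typing import Any, Callable, Dict


def wait_until_ready(
    fetch_meta: Callable[[], Dict[str, Any]],
    interval: float = 2.0,
    timeout: float = 120.0,
) -> Dict[str, Any]:
    """Poll fetch_meta() until the document is ready, has failed, or times out."""
    deadline = time.monotonic() + timeout
    while True:
        meta = fetch_meta()
        # Check for failure first so that errors are not mistaken for success.
        if meta.get("failed"):
            raise RuntimeError(
                f"Processing failed: {meta.get('errorCode')} {meta.get('errorDetail')}"
            )
        if meta.get("ready"):
            return meta
        if time.monotonic() >= deadline:
            raise TimeoutError("Document was not ready before the timeout")
        time.sleep(interval)


# Stub standing in for a real API call: two "not ready" responses, then one "ready".
_responses = iter([
    {"identifier": "abc123", "ready": False, "failed": False},
    {"identifier": "abc123", "ready": False, "failed": False},
    {"identifier": "abc123", "ready": True, "failed": False, "pages": 2},
])

result = wait_until_ready(lambda: next(_responses), interval=0.01)
print(result["identifier"], result["pages"])
```

Passing the fetch function in as an argument keeps the loop independent of any particular client library, so the same helper can be exercised with a stub in tests.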