In addition to the specific data extracted from the documents, the API response includes field-level and document-level metadata that can help you process your documents.

Field Level Metadata

| Metadata | Description |
| --- | --- |
| id | Identifier associated with the specific data point |
| rectangle(s) | x/y coordinates of the rectangular bounding box containing the data |
| pageIndex | The page that the data is found on |
| raw | Raw data extracted before any processing and formatting |
| confidence | Overall confidence that the extracted data is correct. This considers both the classification and text extraction confidence scores |
| classificationConfidence | The model's confidence that the data returned is correct |
| textExtractionConfidence | Confidence that the text extracted from the document is correct (relevant for scanned documents) |
| isVerified | Indicates whether the data has been validated, either by a human using our validation tool or through auto-validation rules |
| isClientVerified | Indicates whether the data has been validated by a human |
| isAutoVerified | Indicates whether the data has been auto-validated |
| dataPoint | A unique identifier associated with that data field |
| contentType | Type of data. Options include text, date, date-time, enum, location, float, and decimal |
| parsed | Parsed data extracted after post-processing steps, including reformatting or mapping to a defined taxonomy |
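
As an illustration of how the field-level metadata above might be used, the sketch below routes fields to human review based on `confidence` and `isVerified`. It is a minimal example under stated assumptions, not an official client: the sample records are hypothetical, fields are assumed to arrive as plain dictionaries keyed by the names in the table, `confidence` is assumed to be a number between 0 and 1, and the 0.8 threshold is an arbitrary choice.

```python
from typing import Any


def needs_review(field: dict[str, Any], threshold: float = 0.8) -> bool:
    """Return True if a field should be routed to a human reviewer."""
    # Already validated (by a human or by auto-validation rules): skip it.
    if field.get("isVerified"):
        return False
    confidence = field.get("confidence")
    # A missing confidence score is treated conservatively as "needs review".
    return confidence is None or confidence < threshold


# Hypothetical sample records, shaped after the field-level metadata table.
fields = [
    {"id": 1, "raw": "12/03/2024", "parsed": "2024-03-12",
     "confidence": 0.97, "isVerified": False, "pageIndex": 0},
    {"id": 2, "raw": "ACME Ltd", "parsed": "ACME Ltd",
     "confidence": 0.55, "isVerified": False, "pageIndex": 0},
    {"id": 3, "raw": "1O0.00", "parsed": 100.0,
     "confidence": 0.40, "isVerified": True, "pageIndex": 1},
]

# Collect the ids of fields that are unverified and below the threshold.
to_review = [f["id"] for f in fields if needs_review(f)]
print(to_review)
```

Checking `isVerified` first means a low-confidence value that has already been validated is not sent for review a second time.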

Document Level Metadata

| Metadata | Description |
| --- | --- |
| identifier | A unique identifier associated with the document. Can be specified on upload; otherwise it is randomly generated by Affinda |
| fileName | Optional filename of the file |
| ready | If true, the document has finished processing. Particularly useful if an endpoint request specified wait=False: when polling, use this field to determine when to stop |
| readyDt | The date-time when the document was ready |
| failed | If true, an exception was raised during processing. Check the 'error' field of the main return object |
| expiryTime | The date-time, in ISO-8601 format, when the document will be automatically deleted. Defaults to no expiry |
| language | The document's language |
| pdf | The URL to the document's PDF (if the uploaded document is not already a PDF, it is converted to PDF as part of the parsing process) |
| parentDocument.identifier | If this document is part of a split document, this attribute points to the original document that this document was split from |
| childDocuments.identifier | If this document has been split into a number of child documents, this attribute points to those child documents |
| pages | The number of pages in the document |
| isOcrd | Boolean indicating whether OCR was applied to extract text (if false, the data was extracted from an existing text layer on the document) |
| ocrConfidence | Overall confidence in the accuracy of text extracted from the document by OCR |
| reviewUrl | A signed URL, valid for 60 minutes, that can be used to review and validate the data extracted by the model. Learn more in Embedded Mode |
| collection | The Collection that the document is within |
| extractor | The Extractor that is associated with the Collection. An Extractor is an AI model used to extract data from documents |
| workspace | The Workspace that the Collection and document are within |
| archivedDt | The date-time when the document was archived |
| isArchived | Boolean indicating whether the document has been archived |
| confirmedDt | The date-time when the document was confirmed |
| isConfirmed | Boolean indicating whether the document has been confirmed |
| rejectedDt | The date-time when the document was rejected |
| isRejected | Boolean indicating whether the document has been rejected |
| createdDt | The date-time when the document was created in Affinda |
| errorCode | If document processing fails, the error code |
| errorDetail | If document processing fails, details of the error identified |
| file | URL to view the file |
| tags | Tags applied to documents to enable filtering and searching |
| confirmedBy | Details of the user that last confirmed the document |
| sourceEmail | If the document was created via email ingestion, the URL of the email file |
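
The `ready` and `failed` fields above lend themselves to a simple polling loop when a request was made with wait=False. The sketch below is one possible shape for such a loop, not the Affinda client's own API: `fetch_meta` is a hypothetical callable standing in for whatever call retrieves the document's metadata as a dictionary, and the interval and timeout values are arbitrary.

```python
import time
from typing import Any, Callable


def wait_until_ready(
    fetch_meta: Callable[[], dict[str, Any]],  # hypothetical metadata fetcher
    interval_s: float = 2.0,
    timeout_s: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Poll document metadata until `ready` is true, or raise on failure/timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        meta = fetch_meta()
        # Check `failed` first so a failed document is never treated as usable.
        if meta.get("failed"):
            raise RuntimeError(
                f"processing failed: {meta.get('errorCode')}: {meta.get('errorDetail')}"
            )
        if meta.get("ready"):
            return meta
        if time.monotonic() >= deadline:
            raise TimeoutError("document not ready before timeout")
        sleep(interval_s)


# Usage with a stubbed fetcher that becomes ready on the third poll.
responses = iter([
    {"ready": False, "failed": False},
    {"ready": False, "failed": False},
    {"ready": True, "failed": False, "pages": 3},
])
meta = wait_until_ready(lambda: next(responses), sleep=lambda s: None)
print(meta["pages"])
```

Passing `sleep` as a parameter keeps the loop easy to test without real delays; in practice the default `time.sleep` would be used.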