Metadata
In addition to the specific data extracted from the documents, the API response includes a range of field and document-level metadata that can be used to help process your documents.
Field Level Metadata
Metadata | Description |
---|---|
id | Identifier associated with the specific data point |
rectangle(s) | x/y coordinates for the rectangular bounding box containing the data |
pageIndex | The page that the data is found on |
raw | Raw data extracted before any processing or formatting |
confidence | Overall likelihood that the extracted data is correct, taking into account both the classification and text extraction confidence scores |
classificationConfidence | A value indicating the model's confidence that the data returned is correct |
textExtractionConfidence | A value indicating the confidence that the text extracted from the document is correct (relevant for scanned documents) |
isVerified | Indicates whether the data has been validated, either by a human using our validation tool or through auto-validation rules |
isClientVerified | Indicates whether the data has been validated by a human |
isAutoVerified | Indicates whether the data has been auto-validated |
parsed | Parsed data extracted after post-processing steps, including reformatting or mapping to a defined taxonomy |
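As a minimal sketch of how the field-level metadata might be used, the snippet below routes unverified, low-confidence fields to manual review. It assumes each field's metadata is available as a plain dictionary keyed by the names in the table above; the `needs_review` helper, the 0.9 threshold, and the sample values are hypothetical and are not part of the API.

```python
from typing import Any, Mapping

# Hypothetical cut-off; tune it against your own documents.
REVIEW_THRESHOLD = 0.9


def needs_review(field: Mapping[str, Any], threshold: float = REVIEW_THRESHOLD) -> bool:
    """Return True if a field should be sent to a human reviewer."""
    # Fields already validated (by a human or by auto-validation) are skipped.
    if field.get("isVerified"):
        return False
    confidence = field.get("confidence")
    # A missing confidence score is treated as uncertain.
    return confidence is None or confidence < threshold


# Illustrative field metadata only, not real API output.
fields = [
    {"id": 1, "raw": "12/03/2024", "parsed": "2024-03-12", "confidence": 0.98, "isVerified": False},
    {"id": 2, "raw": "ACME Pty", "parsed": "ACME Pty", "confidence": 0.62, "isVerified": False},
    {"id": 3, "raw": "$1,200", "parsed": 1200.0, "confidence": 0.55, "isVerified": True},
]

to_review = [f["id"] for f in fields if needs_review(f)]
print(to_review)  # [2]
```

Only field 2 is flagged: field 1 is above the threshold, and field 3 is already verified despite its low confidence.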
Document Level Metadata
Metadata | Description |
---|---|
identifier | The unique identifier associated with the document. Can be specified on upload; otherwise it is randomly generated by Affinda |
fileName | Optional name of the file |
ready | If true, the document has finished processing. Particularly useful if an endpoint request specified wait=False: when polling, use this variable to determine when to stop |
readyDt | The date-time when the document was ready |
failed | If true, some exception was raised during processing. Check the 'error' field of the main return object |
expiryTime | The date/time in ISO-8601 format when the document will be automatically deleted. Defaults to no expiry |
language | The document's language |
The URL to the document's PDF (if the uploaded document is not already a PDF, it is converted to PDF as part of the parsing process) | |
parentDocument.identifier | If this document is part of a split document, this attribute points to the original document that this document is split from |
childDocuments.identifier | If this document has been split into a number of child documents, this attribute points to those child documents |
pages | The number of pages in the document |
isOcrd | Boolean indicating whether the document has had OCR applied to extract text (if false, the data was extracted from an existing text layer on the document) |
ocrConfidence | Overall confidence in the accuracy of text extracted from the document by OCR |
reviewUrl | A signed URL, valid for 60 minutes, that can be used to review and validate the data extracted by the model. Learn more in Embedded Mode. |
extractor | The Extractor that is associated with the Collection. An Extractor is an AI model used to extract data from documents |
confirmedDt | The date-time when the document was confirmed |
isConfirmed | Boolean to show if the document has been confirmed |
rejectedDt | The date-time when the document was rejected |
isRejected | Boolean to show if the document has been rejected |
createdDt | The date-time when the document was created in Affinda |
errorCode | If document processing fails, the error code for the failure |
errorDetail | If document processing fails, a description of the error identified |
file | URL to view the file |
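The `ready` and `failed` flags described above lend themselves to a simple polling loop. The sketch below is one possible approach, not the official client: `fetch_meta` is a hypothetical callable that you would implement to return the document-level metadata as a dictionary, and the interval and timeout values are arbitrary assumptions.

```python
import time
from typing import Any, Callable, Mapping


def wait_until_ready(
    fetch_meta: Callable[[], Mapping[str, Any]],  # hypothetical: returns document-level metadata
    interval: float = 2.0,    # seconds between polls (assumed value)
    timeout: float = 120.0,   # give up after this many seconds (assumed value)
) -> Mapping[str, Any]:
    """Poll until the document is ready, has failed, or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        meta = fetch_meta()
        # Stop immediately if processing failed and surface the error metadata.
        if meta.get("failed"):
            raise RuntimeError(
                f"Processing failed: {meta.get('errorCode')}: {meta.get('errorDetail')}"
            )
        if meta.get("ready"):
            return meta
        if time.monotonic() >= deadline:
            raise TimeoutError("Document was not ready before the timeout")
        time.sleep(interval)


# Stand-in for real API calls: three canned responses, the last one ready.
_responses = iter([
    {"identifier": "abc123", "ready": False, "failed": False},
    {"identifier": "abc123", "ready": False, "failed": False},
    {"identifier": "abc123", "ready": True, "failed": False, "pages": 2},
])

meta = wait_until_ready(lambda: next(_responses), interval=0.0)
print(meta["pages"])  # 2
```

Checking `failed` before `ready` is a defensive choice, so that a failed document is never treated as successfully processed regardless of how the two flags are combined in a given response.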