Uploading Documents via the API

While a range of API endpoints drive the Affinda solution, uploading documents for processing (and receiving the response) is core to the product.

Submission and Retrieval Options

When integrating document data extraction (parsing), there are three primary ways to submit documents and retrieve the parsed data:

1. Synchronous Parsing (wait=true)

Explanation:

  • Users submit a document, and the API processes it immediately
  • The user application waits, and the request stays open (wait=true), until parsing is complete
  • When finished, the API directly returns the parsed document data

Pros:

  • Simplest integration
  • Immediate retrieval of results after completion
  • Suitable for documents that need to be processed quickly

Cons:

  • Not suitable for large documents or high-volume scenarios
  • Can lead to timeouts for documents requiring lengthy processing

Ideal Use Case:

  • Interactive apps where quick, synchronous response times are critical
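
The synchronous flow above can be sketched with the Python standard library. This is a minimal illustration, not a reference client: the endpoint URL, the bearer-token header, and the 120-second timeout are assumptions to check against your own API reference, and build_multipart and parse_synchronously are hypothetical helper names. The form field names (file, workspace, wait) come from the parameter list in this guide.

```python
import json
import urllib.request
import uuid

# Assumption: endpoint URL and bearer-token auth are placeholders, not confirmed by this guide.
API_URL = "https://api.affinda.com/v3/documents"


def build_multipart(fields, file_name, file_bytes):
    """Encode text fields plus one file part as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    file_header = (
        f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
        f'filename="{file_name}"\r\nContent-Type: application/octet-stream\r\n\r\n'
    )
    parts.append(file_header.encode() + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def parse_synchronously(api_key, workspace, file_name, file_bytes, timeout=120):
    """POST a document with wait=true and block until the parsed data is returned."""
    body, content_type = build_multipart(
        {"workspace": workspace, "wait": "true"}, file_name, file_bytes
    )
    request = urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": content_type},
    )
    # The connection stays open while parsing runs, so the client timeout must be generous.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

If the timeout is reached before parsing completes, urlopen raises an exception, which is the failure mode listed under Cons; for documents that regularly approach the limit, one of the asynchronous options is likely a better fit.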

2. Asynchronous Parsing with Polling (wait=false)

Explanation:

  • Users submit a document and receive an immediate acknowledgment (document ID)
  • User application periodically checks (polls) the API endpoint using repeated GET requests to determine when parsing is complete
  • Once processing finishes, the user application retrieves the parsed data

Pros:

  • Avoids connection timeouts; ideal for longer or variable processing times
  • Better suited for handling multiple simultaneous document submissions

Cons:

  • Additional complexity with polling logic
  • Generates higher API call volume (frequent polling checks)
  • Slight delay between actual completion and data retrieval, depending on polling interval

Ideal Use Case:

  • High-volume scenarios, large documents, or batch processing jobs where exact completion timing isn't critical.
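
A polling loop along these lines could look like the sketch below. It deliberately leaves the HTTP details out: fetch_document and is_ready are hypothetical callables that you supply (for example, a GET request for the document and a check of whatever readiness flag the response carries), and the interval, back-off cap, and maximum wait are illustrative values rather than recommended settings.

```python
import time


def poll_until_ready(fetch_document, document_id, is_ready,
                     interval=2.0, max_wait=300.0, sleep=time.sleep):
    """Call fetch_document(document_id) until is_ready(result) is true.

    Raises TimeoutError once the accumulated sleep time reaches max_wait.
    """
    waited = 0.0
    while True:
        document = fetch_document(document_id)
        if is_ready(document):
            return document
        if waited >= max_wait:
            raise TimeoutError(f"document {document_id} not ready after {max_wait}s")
        sleep(interval)
        waited += interval
        # Double the interval (capped at 30 s) to keep API call volume down.
        interval = min(interval * 2, 30.0)
```

The back-off is one way to trade the two cons against each other: longer intervals reduce API call volume but increase the delay between completion and retrieval.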

3. Asynchronous Parsing with Webhooks (resthook)

Explanation:

  • Users submit a document, receiving an immediate acknowledgment
  • Rather than polling, your application receives a webhook (callback notification) directly when the document is ready for export
    • In some cases this will be when the document is finished parsing, but in other use cases this may be when the document has been validated
  • After receiving the webhook, your application retrieves the parsed data with a GET request.

Pros:

  • Most efficient asynchronous method; removes unnecessary polling
  • Lower overall API usage
  • Provides real-time notifications upon completion

Cons:

  • Slightly higher setup complexity (webhook listener infrastructure required)

Ideal Use Case:

  • Real-time workflows or event-driven architectures where timely data retrieval is essential, or API usage optimization is needed
  • Scenarios where the application must wait until the document has been fully validated before receiving results

See Webhooks for more information.
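
As a rough sketch of the receiving side, the listener can acknowledge the notification quickly and defer the GET request to a worker. The payload shape here is hypothetical: the key name documentId is an assumption made for illustration, so check the Webhooks documentation for the real structure. The helper names handle_webhook and drain are also illustrative.

```python
import json
import queue

# Hypothetical payload key: the real webhook body may name or nest this differently.
DOCUMENT_ID_KEY = "documentId"

pending_documents = queue.Queue()


def handle_webhook(raw_body):
    """Parse a webhook body, queue the document ID, and return an HTTP status code."""
    try:
        payload = json.loads(raw_body)
    except ValueError:  # covers malformed JSON and undecodable bytes
        return 400
    document_id = payload.get(DOCUMENT_ID_KEY) if isinstance(payload, dict) else None
    if not document_id:
        return 400
    pending_documents.put(document_id)
    return 200


def drain(fetch_document):
    """Retrieve parsed data for every queued document via fetch_document(document_id)."""
    results = []
    while not pending_documents.empty():
        results.append(fetch_document(pending_documents.get()))
    return results
```

Separating the acknowledgment from the retrieval keeps the listener responsive even when many notifications arrive at once.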

Request Body

The following parameters may be included in the API POST request to Affinda using the Upload a document for parsing endpoint.

Note that all individual parameters are optional; however, one of the following must be specified:

  • File or URL

Documents can be uploaded to one of the following:

  • Workspace (specify Workspace only)
  • Collection (earlier version of the app; specify Collection only)
  • Document Type (specify both Workspace and Document Type)
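
These combination rules can be checked client-side before a request is sent. The function below is a hypothetical helper, not part of any Affinda library, and it encodes only the rules stated in this section; the API may enforce further constraints.

```python
def validate_upload(file=None, url=None, workspace=None, collection=None, document_type=None):
    """Return a list of problems; an empty list means the combination is allowed."""
    problems = []
    if file is None and url is None:
        problems.append("one of file or url must be specified")
    if document_type is not None and workspace is None:
        problems.append("documentType requires workspace")
    if collection is not None and (workspace is not None or document_type is not None):
        problems.append("collection must be specified on its own")
    return problems
```

Returning a list rather than raising on the first problem lets a caller report every issue with a request at once.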

Parameter | Description
file | File as a binary data blob. Supported formats: PDF, DOC, DOCX, XLS, XLSX, ODT, RTF, TXT, HTML, PNG, JPG, TIFF, JPEG
url | URL of a document to download and process
collection | Unique identifier for the Collection to upload the document to. The Collection identifier can be found by using the Get list of all collections endpoint. If Collection is specified, the document will not be classified to a different Collection if it is the wrong type (however, it may be rejected depending on settings).
documentType | Unique identifier for the Document Type to upload the document to. If specified, Workspace must also be specified.
workspace | Unique identifier for the Workspace to upload the document to. The Workspace identifier can be found either by using the Get list of all workspaces endpoint or through the app. If only Workspace is specified, Affinda will attempt to classify the document into a relevant Collection based on document type.
wait | If "true" (default), a response is returned only after processing has completed. If "false", the response contains an empty data object, and the document can be polled at the GET endpoint until processing is complete.
customIdentifier | A custom identifier for the document.
fileName | Optional filename of the file.
expiryTime | The date/time, in ISO-8601 format, when the document will be automatically deleted. Defaults to no expiry.
language | Language code in ISO 639-1 format. Must specify zh-cn or zh-tw for Chinese.
rejectDuplicates | If "true", parsing will fail when the uploaded document is a duplicate of an existing document, and no credits will be consumed. If "false", the document will be parsed normally whether or not it is a duplicate. If not provided, falls back to the workspace settings.
regionBias | A JSON representation of the RegionBias object. Influences geocoding and other results.
lowPriority | Explicitly marks this document as low priority.
compact | If true, the returned parse result (assuming wait is also true) will be a compact version of the full result.
deleteAfterParse | If true, no data will be stored after parsing. Only compatible with requests where wait is true.
enableValidationTool | If true, the document will be viewable in the Affinda Validation Tool. Set to false to optimize parsing speed.
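
When these parameters are sent as form fields, Python values need to be converted to strings first. The helper below is a hypothetical sketch of one way to do that: it assumes booleans are sent as lowercase "true"/"false" (matching the quoted values in the wait description), serializes datetimes to ISO-8601 and dictionaries to JSON, and enforces the stated rule that deleteAfterParse requires wait to be true.

```python
import json
from datetime import datetime, timezone


def encode_fields(**params):
    """Convert Python values to string form fields, skipping parameters set to None."""
    fields = {}
    for name, value in params.items():
        if value is None:
            continue  # optional parameter left unset
        if isinstance(value, bool):
            fields[name] = "true" if value else "false"
        elif isinstance(value, datetime):
            fields[name] = value.isoformat()  # ISO-8601, e.g. for expiryTime
        elif isinstance(value, dict):
            fields[name] = json.dumps(value)  # e.g. for regionBias
        else:
            fields[name] = str(value)
    # wait defaults to true when omitted, so only an explicit false conflicts.
    if fields.get("deleteAfterParse") == "true" and fields.get("wait", "true") != "true":
        raise ValueError("deleteAfterParse is only compatible with wait=true")
    return fields
```

For example, encode_fields(wait=False, expiryTime=datetime(2030, 1, 1, tzinfo=timezone.utc)) yields string values for both fields, ready to be placed in a form-encoded request body.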