Data Mapping
Data mapping improves the quality and accuracy of data extracted from documents. With data mapping configured, extracted values are matched against a list of known options from the customer's own data to validate that they meet expected values. This gives greater confidence in the data extracted by the model and simplifies integration with downstream systems.
Affinda's data mapping capability has been designed to be highly flexible and configurable, ensuring it meets simple requirements where a single raw value from the document is mapped to a list, as well as complex use cases incorporating multiple fields from the document and variations in how these are presented.
Mapping Data Sources
Mapping Data is the list of acceptable values that the data extracted from the document can be mapped against. Mapping Data Sources can be created and updated in two ways:
- Via the Web Application - recommended for very simple use cases only as functionality is limited
- Programmatically via the API - recommended where additional control and configuration is required
Adding Data Mapping via the Web Application
Mapping Data Sources via Web Application for simple use cases only
Creation of Mapping Data Sources via Web Application is only suitable when creating a simple fuzzy string lookup of a data field in the document to a list of acceptable values.
More advanced data mapping configurations can be configured via the API or by discussing with the Affinda team. See below for more information about configuring via API.
Mapping Data Sources can be added via the Affinda app by following the steps below:
- Navigate to Collection settings
- Find the relevant field you would like to add the Mapping Data Source to and click 'Configure Field'
- Change the Data Type to 'Dropdown'
- Type in the list of acceptable values under Dropdown Options
Adding Data Mapping via API
We recommend speaking to our team of experts for assistance with configuring complex requirements.
Get in touch with your Account Representative or email [email protected].
Follow these steps to set up a new Data Mapping for your supplier list:
1. Create a new Mapping Data Source
- New Mapping Data Sources can be created using the Create a mapping data source endpoint, with the following parameters:
Parameter | Description |
---|---|
name | |
organization | The Organization identifier that this mapping data source belongs to. Populate one of Organization or Workspace |
workspace | The Workspace identifier that this mapping data source belongs to. Populate one of Organization or Workspace |
keyProperty | Attribute in the schema which uniquely identifies the value. Each value must be unique within the Mapping Data Source |
displayProperty | Attribute in the schema which is used to display the value in the Affinda validation interface |
value | (optional) The list of values to populate the Mapping Data Source with. These may also be populated later (see step 2 below). |
schema | (optional) JSON schema specifying the format of the values indexed in a mapping data source. Inserted data will be validated against this schema. All 'properties' within the schema will be populated within the API parsed response. |
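As a rough illustration of how these parameters fit together, the sketch below assembles a request body from the parameter names in the table. The helper name `build_data_source_payload` and the identifier `ws_example` are hypothetical, and reading "populate one of Organization or Workspace" as "exactly one" is an assumption; the endpoint URL and authentication are deliberately left out, so check the API reference before sending anything.

```python
from typing import Any, Dict, List, Optional


def build_data_source_payload(
    name: str,
    key_property: str,
    display_property: str,
    organization: Optional[str] = None,
    workspace: Optional[str] = None,
    values: Optional[List[Dict[str, Any]]] = None,
    schema: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Assemble a request body using the parameter names from the table."""
    # Assumption: exactly one of organization / workspace should be set.
    if (organization is None) == (workspace is None):
        raise ValueError("Populate exactly one of organization or workspace")

    payload: Dict[str, Any] = {
        "name": name,
        "keyProperty": key_property,
        "displayProperty": display_property,
    }
    if organization is not None:
        payload["organization"] = organization
    else:
        payload["workspace"] = workspace

    # Optional parameters are omitted entirely when not supplied.
    if values is not None:
        payload["value"] = values  # parameter name as written in the table
    if schema is not None:
        payload["schema"] = schema
    return payload


# "ws_example" is a placeholder, not a real Workspace identifier.
payload = build_data_source_payload(
    name="Suppliers",
    key_property="value",
    display_property="label",
    workspace="ws_example",
)
```

Leaving `values` unset here matches the option of populating the data source later in step 2.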
The default schema if unspecified will be:
{
  "type": "object",
  "$schema": "http://json-schema.org/draft-07/schema#",
  "required": [
    "value"
  ],
  "properties": {
    "label": {
      "type": "string"
    },
    "value": {
      "type": "string"
    },
    "description": {
      "type": "string"
    }
  }
}
However, this schema is highly flexible and can include different fields and structures. A common addition to the schema is 'synonyms' or the different ways that a particular field might be represented on a document (e.g. entities within a single parent organisation might be represented in this way).
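To make the 'synonyms' idea concrete, the sketch below extends the default schema with an array of strings and runs a very small sanity check on a sample value. The property name `synonyms`, its array-of-strings shape, the helper names, and the sample supplier are all assumptions for illustration; the check is a hand-rolled subset, not full JSON Schema validation and not Affinda's own validator.

```python
import copy

DEFAULT_SCHEMA = {
    "type": "object",
    "$schema": "http://json-schema.org/draft-07/schema#",
    "required": ["value"],
    "properties": {
        "label": {"type": "string"},
        "value": {"type": "string"},
        "description": {"type": "string"},
    },
}


def with_synonyms(schema):
    """Return a copy of the schema with a hypothetical 'synonyms' property."""
    extended = copy.deepcopy(schema)
    extended["properties"]["synonyms"] = {
        "type": "array",
        "items": {"type": "string"},
    }
    return extended


def check_value(value, schema):
    """Return a list of problems found by a minimal required/type check."""
    problems = []
    for field in schema.get("required", []):
        if field not in value:
            problems.append(f"missing required field: {field}")
    python_types = {"string": str, "array": list}
    for name, spec in schema.get("properties", {}).items():
        expected = python_types.get(spec.get("type"))
        if name in value and expected and not isinstance(value[name], expected):
            problems.append(f"{name} should be of type {spec['type']}")
    return problems


schema = with_synonyms(DEFAULT_SCHEMA)
supplier = {
    "value": "SUP-001",
    "label": "Acme Pty Ltd",
    "synonyms": ["Acme", "Acme Australia"],
}
problems = check_value(supplier, schema)
```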
2. Populate the Data Source
If the values were not supplied in step 1, the Mapping Data Source now needs to be populated. This can be done one value at a time or in bulk.
- Add data one at a time: Add a Mapping Data Source Value
- Load data in bulk: Replace Mapping Data Source Values
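Because each value must be unique on its keyProperty, it can help to check a list locally before loading it in bulk. The following is a minimal sketch, assuming the values are plain dictionaries and that the keyProperty is `value` as in the default schema; the helper name and sample records are hypothetical.

```python
from collections import Counter


def duplicate_keys(values, key_property="value"):
    """Return the keyProperty values that appear more than once, sorted."""
    counts = Counter(record[key_property] for record in values)
    return sorted(key for key, count in counts.items() if count > 1)


suppliers = [
    {"value": "SUP-001", "label": "Acme Pty Ltd"},
    {"value": "SUP-002", "label": "Globex Corporation"},
    {"value": "SUP-001", "label": "Acme Australia"},
]
dupes = duplicate_keys(suppliers)  # ["SUP-001"]
```

An empty result means the list is safe to send as far as key uniqueness is concerned; it says nothing about schema validity.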
3. Create Mapping (optional)
'Mappings' influence how the Mapping Data Source is matched against the data extracted from the document. If no Mapping is specified, Affinda applies fuzzy string matching to the extracted data for that field, matching against both keyProperty and displayProperty. The highest-scoring match is returned, even if its score is relatively low.
However, this default behaviour can be updated by creating a Mapping using Create a mapping.
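The default behaviour can be pictured with a small local sketch. It uses Python's standard-library `difflib.SequenceMatcher` as a stand-in scorer; Affinda's actual fuzzy-matching algorithm and score scale are not described here, so the scores below are illustrative only. The function name and sample records are hypothetical.

```python
from difflib import SequenceMatcher


def best_match(extracted, values, key_property="value", display_property="label"):
    """Return (record, score) for the highest-scoring record.

    Each record is scored against both its key and display properties,
    and the best record is returned however low its score is.
    """
    def score(a, b):
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()

    best_record, best_score = None, -1.0
    for record in values:
        candidates = [
            str(record[prop])
            for prop in (key_property, display_property)
            if prop in record
        ]
        record_score = max((score(extracted, c) for c in candidates), default=0.0)
        if record_score > best_score:
            best_record, best_score = record, record_score
    return best_record, best_score


suppliers = [
    {"value": "SUP-001", "label": "Acme Pty Ltd"},
    {"value": "SUP-002", "label": "Globex Corporation"},
]
record, match_score = best_match("ACME PTY LTD.", suppliers)
```

Note that a record is returned even for an input that resembles nothing in the list, which is the situation a score cutoff is meant to address.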
The following settings can be updated within a Mapping:
- Score Cutoff - If populated, we will only return records with a score above this value. Allows customers to determine when an acceptable match is found, or when to force a user to manually select the value.
- Order By - Specify the field to order results by. Use a minus sign for descending order.
- Lookup Type - Defines which extracted fields map against which fields within the Mapping Data Source and the type of match.
  - Configuring this is required where a customer needs to match against multiple extracted data fields (e.g. Supplier Name and Supplier Address) and/or multiple fields within the Mapping Data Source (e.g. multiple 'synonyms' of Supplier Name)
  - The match type influences the type of matching used for these lookups (exact, fuzzy, query string, etc.)
Lookup Type not currently configurable by customers
Get in touch with Affinda to understand how we can configure this for your use case
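The effect of Score Cutoff and Order By can be sketched locally as a filter followed by a sort. This is an illustration under assumptions, not Affinda's implementation: scores are taken to be on a 0 to 1 scale, Order By is read as naming a field of the record, and the function name and sample data are hypothetical.

```python
def apply_mapping_settings(scored, score_cutoff=None, order_by=None):
    """Filter and order a list of (record, score) pairs.

    score_cutoff: keep only pairs whose score is above this value.
    order_by: record field to sort on; a leading minus sign means descending.
    """
    results = [
        (record, score)
        for record, score in scored
        if score_cutoff is None or score > score_cutoff
    ]
    if order_by:
        descending = order_by.startswith("-")
        field = order_by[1:] if descending else order_by
        results.sort(key=lambda pair: pair[0][field], reverse=descending)
    return results


scored = [
    ({"value": "A", "label": "Alpha"}, 0.9),
    ({"value": "B", "label": "Beta"}, 0.4),
    ({"value": "C", "label": "Gamma"}, 0.7),
]
kept = apply_mapping_settings(scored, score_cutoff=0.5, order_by="-label")
labels = [record["label"] for record, _ in kept]  # ["Gamma", "Alpha"]
```

When the cutoff removes every candidate, the result is empty, which corresponds to the case where a user has to select the value manually.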
4. Update Collection settings
Update the Field Configuration for the relevant Collections and Fields for the Data Mapping using Update a collection. For the specific field, update the following options:
- field_type to enum
- show_dropdown to true
- data_source to the identifier for the relevant Mapping Data Source
- mapping to the identifier for the relevant Mapping
- display_enum_value to true if the keyProperty should be shown in the validation UI (otherwise, leave as false)
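The options listed above can be gathered into a single dictionary, as in the sketch below. Only the option names come from this page; the helper name, the placeholder identifiers, and the decision to omit `mapping` when no Mapping was created in step 3 are assumptions. Where this dictionary sits inside the Update a collection request body is not covered here, so consult the API reference for the surrounding structure.

```python
def data_mapping_field_options(data_source_id, mapping_id=None, show_key=False):
    """Build the field options that enable data mapping on a field."""
    options = {
        "field_type": "enum",
        "show_dropdown": True,
        "data_source": data_source_id,
        "display_enum_value": show_key,
    }
    # Step 3 is optional, so only reference a Mapping if one was created.
    if mapping_id is not None:
        options["mapping"] = mapping_id
    return options


# Placeholder identifiers for illustration only.
options = data_mapping_field_options("ds_example", mapping_id="map_example")
```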