NextGen Resume Parser Product Guide
This page covers the following topics for the NextGen Resume Parser:
- Getting started with NextGen Resume Parser
- Key features of the NextGen Resume Parser
- Using the NextGen Parser
- Information Security
Getting started with NextGen Resume Parser
Affinda's Resume Parser is the flagship product within our Recruitment Product Suite. It accurately returns all relevant data found in candidate resumes in seconds.
Key features of the NextGen Parser
The current version of our Resume Parser, “NextGen”, was released in April 2024 and is our fourth iteration. This version benefits from:
- Significantly improved accuracy
- Expanded data coverage, supported by field relationships
- Configurable taxonomies and data mapping to enhance downstream data quality
The previous version, our Legacy Resume Parser, remains available to existing customers. Customers migrating from the Legacy Parser to the NextGen Parser should refer to the migration guide for more information.
Data extracted
Standard Fields
The full list of standard fields extracted by the Affinda Resume Parser is available here.
Custom Fields
Affinda's NextGen Parser offers customisable resume extraction tailored to specific industries or templates. We can add custom fields to meet unique requirements, such as "driver licence number" when processing drivers' resumes.
To explore whether custom fields suit your needs, please get in touch.
Taxonomy mapping
Affinda's NextGen Resume Parser uses pre-built taxonomies to standardise key fields such as skills, making data more accurate and consistent across your systems. Because skills are mapped to a single, shared framework, this works with resumes in any supported language. It removes the need to clean and normalise data yourself, making the data easier to analyse, report on, and integrate into other processes. Here's an overview:
Configurable Taxonomies
Skills
By default, the NextGen Resume Parser uses the Lightcast taxonomy. To change this, navigate to a workspace containing a NextGen Collection and open Collection Settings by clicking the gear icon. Then edit the field configuration for Skills and select the preferred taxonomy from the dropdown menu (e.g. ESCO). Alternatively, change the Data Type to Text to capture raw text only.
The Lightcast skills taxonomy, a best-in-class standard, works across all 50+ supported languages to provide standardised data. By default, mapped skills are returned in English; they can instead be returned in the original resume's language. To enable this, please contact your Affinda account manager or submit a request via our contact form.
Job titles
By default, the NextGen Resume Parser does not map job titles to any taxonomy and returns only the extracted raw string. To change this, update the Job Titles field configuration in the Collection settings by setting the field type to 'Dropdown' and selecting the desired taxonomy from the list.
Occupation Classification
All three Occupation Classifications are available in the API response automatically.
Custom taxonomies
Customers can also add taxonomies they use internally that are not available as pre-configured options, ensuring they receive data in the format they need. Custom taxonomies can be defined for fields that already have pre-configured options (e.g. skills, job titles), as well as for any other text field (e.g. work organisations).
For example, customers may add their own internal skills taxonomy to use in place of Lightcast or ESCO, or map against a defined list of universities that are relevant to them.
For more information on specifying a custom taxonomy, please contact your Affinda Sales Team member or submit a request via our contact form.
Using the NextGen Parser
Parse documents via the Affinda application
Document upload
After creating your free trial account at https://app.affinda.com/, you will be prompted to create a Workspace. A Workspace is where documents are processed, and it can contain one or more collections of documents. Select the relevant document type for your use case: NextGen Resume Parser.
Once the collection is created, you can upload files by dragging and dropping them from your computer or forwarding documents to the collection's email address (found under "Copy Email Address") to view extraction results.
Understanding the Affinda document interface
Affinda's document interface provides a simple tool to visualise all of the outputs from the model. This means that customers can quickly assess the accuracy of the solution and all of the data that has been extracted. Customers can observe the raw values that have been extracted from the document, as well as the final 'parsed' values that have been formatted or mapped into standardised values that can be more easily used in downstream processing. To view more details, users can click on the document, select the export option next to "Confirm Document," and export it as a JSON file.
Updating visible fields
By default, when you first create a new Collection with the NextGen Resume Parser, all fields will be visible. However, often customers will only care about a subset of fields that Affinda extracts from resumes. To enhance the testing process, customers can 'disable' certain fields in our document interface. These fields are then no longer visible in the user interface, so testing is restricted just to the fields that matter.
Note: disabled fields are still present in the API response.
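Because disabled fields still arrive in the API response, an integration that only consumes a subset of fields may want to filter the response on its own side. The sketch below assumes the parsed fields arrive as a flat mapping of field name to value; the helper name and the field names in the example (candidateName, email, hobby) are hypothetical, so check them against your own JSON export.

```python
from typing import Any, Iterable


def keep_fields(data: dict[str, Any], wanted: Iterable[str]) -> dict[str, Any]:
    """Drop fields the integration does not use.

    Fields disabled in the Affinda app are hidden in the UI only; they still
    arrive in the API response, so the filtering happens client-side.
    """
    wanted_set = set(wanted)
    return {name: value for name, value in data.items() if name in wanted_set}
```

For example, `keep_fields(response_data, {"candidateName", "email"})` keeps just those two entries and discards everything else.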
Relevant API end-points
The NextGen Resume Parser is not supported by API V2.
Customers need to use API V3 to take advantage of its new capabilities.
- Document Upload via API: Documents can be uploaded to Affinda by sending a request to the POST documents endpoint. Each new document upload consumes one credit. On successful upload, the API response includes a unique document identifier (e.g. "document": "EfHgjcsD"). Customers need to make several decisions when uploading a document via the API:
Area | Description |
---|---|
Document | Either a file or a URL must be included in the POST request. Supported formats: PDF, ZIP, DOC, DOCX, XLS, XLSX, ODT, RTF, TXT, HTML, PNG, JPG, JPEG, TIFF. Volume limits: there is no limit to the number of documents you can submit to the Affinda API; however, only 30 documents per minute will be processed by our high priority queue. To choose the queue explicitly, set the lowPriority parameter during document submission. Note that if you explicitly set lowPriority to false and have exceeded the high priority queue rate limit, you will receive a 429 (Too Many Requests) response. Page limits: the default page limit for all customers is 10 pages. This limit may be increased on a case-by-case basis; please contact your Affinda account manager for details or submit a request via our contact form. |
Workspace or Collection | Specify the Collection ID instead of the Workspace ID. Customers using our Resume or Job Description Parser typically know what type of document has been received, so we recommend specifying the relevant Collection identifier (rather than the Workspace identifier, which may result in document classification being used to route the document to the right model). Find Workspace ID: log in to the Affinda application and go to Workspace > Settings. Important: each organisation has a maximum limit of 100 workspaces; if you anticipate exceeding this limit, please contact your Affinda account manager or submit a request through our contact form. Find Collection ID: hover over the collection and click the gear icon to open the collection settings, where the Collection ID is displayed. |
Synchronous or Asynchronous Responses | Within the request, set wait to true or false depending on whether the parsing response needs to be returned synchronously. For bulk uploads, we recommend setting wait to false. Synchronous: if "true" (default), a response is returned only after processing has completed. Asynchronous: if "false", the document identifier and other metadata are returned alongside an empty data object. The data can be retrieved later by either polling the GET endpoint until processing is complete, or setting up webhooks to be notified when parsing has completed and then using the GET endpoint to pull the extracted data. |
Parsing Time | For customers who need real-time responses where seconds count, set the following parameters when submitting a document: enableValidationTool: false; deleteAfterParse: true; compact: true. With these settings, Affinda can skip saving any data to our database, which eliminates unnecessary processing time and reduces the overall time taken to return results. Note, however, that the document cannot be viewed in the Affinda app (e.g. for 'human in the loop' validation); the document is not retained in our system, so responses cannot be fetched at a later date; and field metadata is not returned, only the 'parsed' value. |
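The upload decisions in the table above can be sketched with Python's standard library. This is a minimal, hypothetical sketch rather than an official Affinda client: it assumes the v3 upload endpoint is https://api.affinda.com/v3/documents, that the API key is sent as a Bearer token, and that the multipart form field names (file, collection, wait, lowPriority, enableValidationTool, deleteAfterParse, compact) match the parameter names described above. The helper names are illustrative, and the request is built but not sent.

```python
import urllib.request
import uuid
from typing import Optional

API_URL = "https://api.affinda.com/v3/documents"  # assumed v3 upload endpoint


def build_upload_fields(collection_id: str, *, wait: bool = True,
                        low_priority: Optional[bool] = None,
                        fast: bool = False) -> dict[str, str]:
    """Assemble the multipart form fields for a resume upload."""
    if fast and not wait:
        # With deleteAfterParse=true the document is not retained, so an
        # asynchronous upload would leave nothing to fetch later.
        raise ValueError("fast mode requires wait=True")
    # Target the Collection directly so no document classification is needed.
    fields = {"collection": collection_id, "wait": "true" if wait else "false"}
    if low_priority is not None:
        # Only send lowPriority when the caller wants to pick the queue explicitly.
        fields["lowPriority"] = "true" if low_priority else "false"
    if fast:
        # Real-time settings: skip the validation tool and database storage.
        fields.update(enableValidationTool="false", deleteAfterParse="true",
                      compact="true")
    return fields


def encode_multipart(fields: dict[str, str], filename: str,
                     content: bytes) -> tuple[bytes, str]:
    """Encode form fields plus one file part as multipart/form-data."""
    boundary = uuid.uuid4().hex
    chunks = []
    for name, value in fields.items():
        chunks.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode()
        )
    chunks.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
        f'filename="{filename}"\r\nContent-Type: application/octet-stream\r\n\r\n'.encode()
        + content + b"\r\n"
    )
    chunks.append(f"--{boundary}--\r\n".encode())
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def build_upload_request(api_key: str, collection_id: str, filename: str,
                         content: bytes, **options) -> urllib.request.Request:
    """Build (but do not send) a POST request to the documents endpoint."""
    fields = build_upload_fields(collection_id, **options)
    body, content_type = encode_multipart(fields, filename, content)
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": content_type},
    )
```

To send the request, pass it to `urllib.request.urlopen`; if you explicitly set `low_priority=False` and exceed the high priority rate limit, expect a `urllib.error.HTTPError` with code 429. Rejecting `fast=True` together with `wait=False` is a design choice of this sketch, following the note that such documents cannot be fetched later.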
- Retrieve document after initial parsing: To retrieve the results of a previously processed document, clients can use the Get Document endpoint. Retrieving results from this endpoint does not consume any credits.
Area | Description |
---|---|
identifier | Relates to the document unique identifier received via the response from the POST documents endpoint |
Format | File format to be returned. Options include JSON, XML and hr-xml. |
compact | If "true", the response is compacted to only the annotations' parsed data; annotation metadata is excluded. Defaults to "false" if not specified. |
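For asynchronous uploads (wait set to false), polling the Get Document endpoint can be sketched as follows. This assumes the endpoint is GET https://api.affinda.com/v3/documents/{identifier} with Bearer authentication, and that the returned JSON carries a meta object with boolean ready and failed flags; those names and the helper names are assumptions to verify against the API reference. The fetch and sleep functions are injectable so the polling loop can be exercised without network access.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Any, Callable

API_URL = "https://api.affinda.com/v3/documents"  # assumed v3 endpoint


def fetch_document(api_key: str, identifier: str, compact: bool = False) -> dict[str, Any]:
    """GET a previously uploaded document (retrieval does not consume credits)."""
    query = urllib.parse.urlencode({"compact": "true" if compact else "false"})
    url = f"{API_URL}/{urllib.parse.quote(identifier, safe='')}?{query}"
    request = urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_until_ready(
    fetch: Callable[[], dict[str, Any]],
    *,
    interval: float = 2.0,
    timeout: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Poll until the document reports ready, fails, or the timeout elapses."""
    waited = 0.0
    while True:
        document = fetch()
        meta = document.get("meta", {})  # assumed location of the status flags
        if meta.get("failed"):
            raise RuntimeError("document processing failed")
        if meta.get("ready"):
            return document
        if waited >= timeout:
            raise TimeoutError(f"document not ready after {waited:.0f}s")
        sleep(interval)
        waited += interval
```

Usage might look like `doc = wait_until_ready(lambda: fetch_document(API_KEY, "EfHgjcsD"))`. For large bulk uploads, webhooks avoid the repeated requests that polling makes.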
- Update a document: To update a document's file name or expiry time, or to move it to another collection, use the Update a Document API endpoint. Common decisions when updating a document include:
Area | Description |
---|---|
New file name | Specify a new file name to replace the existing file name |
Collection | Move the document into the specified collection. To find the Collection ID, hover over the collection and click the gear icon to open the collection settings, where the Collection ID is displayed. |
Expiry date | By default, documents are not deleted from Affinda. Specify an expiry date to have the document automatically deleted from the Affinda database. Provide expiryTime in ISO 8601 format; for example, September 27, 2022 at 6 PM is written as 2022-09-27T18:00:00.000. |
skipParse | By default, actions such as updating a document or moving it to another collection trigger a reparse, which consumes a credit. To prevent automatic reparsing when updating a document, set skipParse to true. |
Language | Update the parsed value to a specific language, given as an ISO 639-1 language code (e.g. es for Spanish). For Chinese, you must specify zh-cn or zh-tw. |
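An update request combining these options might look like the following sketch. It assumes a PATCH to https://api.affinda.com/v3/documents/{identifier} with a JSON body whose keys (fileName, collection, expiryTime, skipParse, language) mirror the parameters above; those key names and the helper names are assumptions. The helper defaults skipParse to true so that a rename or move does not consume a credit, and it formats timezone-aware datetimes with Python's `isoformat`, which produces the T-separated ISO 8601 form with a UTC offset.

```python
import json
import urllib.parse
import urllib.request
from datetime import datetime, timedelta, timezone
from typing import Any, Optional

API_URL = "https://api.affinda.com/v3/documents"  # assumed v3 endpoint


def expiry_in(days: int) -> datetime:
    """Timezone-aware expiry timestamp `days` from now, in UTC."""
    return datetime.now(timezone.utc) + timedelta(days=days)


def build_update_body(
    *,
    file_name: Optional[str] = None,
    collection: Optional[str] = None,
    expiry_time: Optional[datetime] = None,
    language: Optional[str] = None,
    skip_parse: bool = True,
) -> dict[str, Any]:
    """Assemble a JSON body for the update call; key names are assumed."""
    body: dict[str, Any] = {"skipParse": skip_parse}  # avoid a credit-consuming reparse
    if file_name is not None:
        body["fileName"] = file_name
    if collection is not None:
        body["collection"] = collection
    if expiry_time is not None:
        if expiry_time.tzinfo is None:
            raise ValueError("expiry_time must be timezone-aware")
        # ISO 8601 with milliseconds, e.g. 2022-09-27T18:00:00.000+00:00
        body["expiryTime"] = expiry_time.isoformat(timespec="milliseconds")
    if language is not None:
        if language.lower() == "zh":
            raise ValueError("use zh-cn or zh-tw for Chinese")
        body["language"] = language  # ISO 639-1 code, e.g. "es"
    return body


def build_update_request(api_key: str, identifier: str,
                         body: dict[str, Any]) -> urllib.request.Request:
    """Build (but do not send) a PATCH request for one document."""
    return urllib.request.Request(
        f"{API_URL}/{urllib.parse.quote(identifier, safe='')}",
        data=json.dumps(body).encode(),
        method="PATCH",
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )
```

For example, `build_update_body(expiry_time=expiry_in(30))` asks for the document to be deleted 30 days from now without triggering a reparse; pass `skip_parse=False` if you do want the document reparsed.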
- Patching and Updating Custom Fields: Some customers may require custom fields to be consistently set to a specific value for system compatibility. For detailed instructions, refer to the Patching and Updating Custom Fields guide.
Information Security
Overview
Affinda’s parsers convert unstructured resumes and job descriptions into structured data, enabling faster, more informed decisions for HR and recruiters. Unlike generative AI services, Affinda is an extractive AI solution. Our proprietary technology extracts data from documents, reducing risks commonly associated with generative AI, such as copyright infringement and bias risks. We do not use large language models like GPT and ensure data privacy by not using client documents to train our models.
Furthermore, Affinda is ISO27001 certified, reflecting our commitment to rigorous, best-in-class information security practices and robust risk management protocols. All our products are designed with privacy, security, and compliance as core principles.
On Premise Deployment
Most customers use Affinda's technology through our hosted solution. However, some may require a locally deployed solution for specific needs. To support these users, we’ve published the Affinda Self-Hosted Deployment Guide.
If you’re interested in a local deployment, please contact your Affinda account manager for further details or submit a request via our contact form.
Document Lifecycle
The Affinda API provides customers full control over the lifecycle of their submitted documents.
Lifecycle options can be set per document and include:
Deletion
When a document is deleted, the document and all associated files are immediately removed from our servers. All access to the document will be lost. Document metadata, which may include file names but does not include the file content, may remain in Affinda’s database or backups of Affinda’s database for some time.
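Deletion can also be requested per document through the API. The following sketch assumes a DELETE request to https://api.affinda.com/v3/documents/{identifier} with Bearer authentication; the endpoint shape and the helper name are assumptions, and the request is built but not sent.

```python
import urllib.parse
import urllib.request

API_URL = "https://api.affinda.com/v3/documents"  # assumed v3 endpoint


def build_delete_request(api_key: str, identifier: str) -> urllib.request.Request:
    """Build (but do not send) a DELETE request for one document."""
    # Quote the identifier so unexpected characters cannot alter the path.
    url = f"{API_URL}/{urllib.parse.quote(identifier, safe='')}"
    return urllib.request.Request(
        url, method="DELETE", headers={"Authorization": f"Bearer {api_key}"}
    )
```

Sending it with `urllib.request.urlopen(build_delete_request(API_KEY, "EfHgjcsD"))` would remove the document and its files; as noted above, some metadata such as the file name may persist in backups for a time.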
Expiration
A typical scenario when incorporating the API into a web app is to enable a customer’s end users to perform document parses on demand. In such cases, it is not necessary or desirable to store the result indefinitely. To facilitate this, the API allows a customer to specify an expiry time when they submit the document. When a document has an expiration set, it will be deleted automatically at the expiration date.
Offboarding
- User Offboarding: When a user is removed from an organisation, they are automatically offboarded and no longer have access to any data within the organisation.
- Delete Organisation: An organisation can be deleted either via the API or through the Affinda app by navigating to Organization > Settings > "Delete [Organization Name]".