Transitioning to NextGen Parser
This page contains the information customers need to transition from the Legacy Resume Parser to the NextGen Resume Parser:
- NextGen vs Legacy Parser Fields
- Upgrading to API V3
- Compatibility with Search and Match
- Upgrading to a NextGen self-hosted setup
NextGen vs Legacy Parser fields
Additional fields covered by the NextGen Resume Parser
The full list of standard fields extracted by the Affinda Resume Parser is available HERE. Below are the new fields introduced by the NextGen Resume Parser that are not present in the legacy parser.
NextGen schema
The NextGen Resume Parser also has a new schema with far wider data coverage, new field relationships and structures, and additional metadata. As a result, any customers using our Legacy Resume Parser will need to update the field mappings in their platform to make use of the new schema and the new capabilities provided through the NextGen Resume Parser.
The link below provides a mapping of the Legacy Resume Parser fields to the NextGen Resume Parser fields, including the option to use our compact view.
NextGen Resume Parser - Field Mappings.xlsx
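As a loose illustration of what updating field mappings in a client platform could involve, the sketch below reads values from a NextGen response using legacy-style field paths via a translation table. All field paths shown are hypothetical placeholders; the authoritative mappings are in the spreadsheet above.

```python
# Illustrative sketch only: reading values from a NextGen response using
# legacy-style field paths. The field paths below are hypothetical placeholders;
# use the mappings in the spreadsheet above for the real schema.
from typing import Any

# Hypothetical mapping of legacy field paths to NextGen field paths.
LEGACY_TO_NEXTGEN = {
    "name.raw": "candidateName.raw",
    "emails": "email",
    "phoneNumbers": "phoneNumber",
}

def get_by_path(data: dict[str, Any], path: str) -> Any:
    """Walk a dot-separated path through nested dictionaries."""
    value: Any = data
    for key in path.split("."):
        if not isinstance(value, dict):
            return None
        value = value.get(key)
    return value

def get_legacy_field(nextgen_data: dict[str, Any], legacy_path: str) -> Any:
    """Read a value from a NextGen response using a legacy-style field path."""
    nextgen_path = LEGACY_TO_NEXTGEN.get(legacy_path, legacy_path)
    return get_by_path(nextgen_data, nextgen_path)
```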
Additional metadata
| Metadata | Description |
| --- | --- |
| id | Identifier associated with the specific data point |
| rectangle(s) | x/y coordinates of the rectangular bounding box containing the data |
| pageIndex | The page on which the data is found |
| raw | Raw data extracted before any processing or formatting |
| confidence | Overall confidence that the extracted data is correct, taking into account both the classification and text extraction confidence scores |
| classificationConfidence | A value indicating the model's confidence that the data returned is correct |
| textExtractionConfidence | A value indicating the confidence that the text extracted from the document is correct (relevant for scanned documents) |
| isVerified | Indicates whether the data has been validated, either by a human using our validation tool or through auto-validation rules |
| isClientVerified | Indicates whether the data has been validated by a human |
| isAutoVerified | Indicates whether the data has been auto-validated |
| dataPoint | A unique identifier associated with that data field |
| contentType | Type of data. Options include text, date, date-time, enum, location, float, and decimal |
| parsed | Parsed data extracted after post-processing steps, including reformatting or mapping to a defined taxonomy |
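To illustrate how this metadata appears alongside an extracted value, the sketch below shows the rough shape of a single data point in a NextGen response, expressed as a Python dictionary. The enclosing field name, example values, and coordinate keys are hypothetical; only the metadata keys listed in the table above are taken from the schema.

```python
# Illustrative sketch only: the shape of one extracted data point and its metadata,
# based on the fields described in the table above. The surrounding field name and
# the rectangle key names are hypothetical examples, not the exact schema.
candidate_name = {
    "id": 123456,                      # identifier for this specific data point
    "raw": "Jane  Citizen",            # text as it appeared in the document
    "parsed": "Jane Citizen",          # value after post-processing and formatting
    "confidence": 0.97,                # combined classification + text extraction confidence
    "classificationConfidence": 0.99,  # confidence the correct field was identified
    "textExtractionConfidence": 0.98,  # confidence the extracted text is correct
    "isVerified": False,               # True once validated by a human or auto-validation rules
    "isClientVerified": False,         # True only after human validation
    "isAutoVerified": False,           # True if auto-validation rules passed
    "pageIndex": 0,                    # page the data was found on
    "dataPoint": "candidate_name",     # identifier of the data field (hypothetical value)
    "contentType": "text",             # text, date, date-time, enum, location, float, or decimal
    "rectangles": [                    # bounding box coordinates on the page (key names may vary)
        {"x0": 72.0, "y0": 40.5, "x1": 210.0, "y1": 58.0}
    ],
}
```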
Upgrading to API V3
Affinda will continue to support API v2 for the legacy parser, with no plans to retire it. However, the NextGen Resume Parser is only supported on API v3. Clients wishing to use the NextGen Resume Parser must migrate to API v3.
Main differences between API V2 and V3
1. Document-agnostic endpoints
- V2: Separate endpoints were available for each supported document type, which meant customers had to upload documents to specific endpoints for resumes or job descriptions.
- V3: Documents are uploaded via a single 'documents' endpoint, and the response received depends on the document type. While this is a different endpoint, it works largely the same way as in V2 (e.g. same parameters); see the upload sketch after this list.
2. Document organisation
- V2: No ability to organise documents of the same type into logical groups (e.g. all resumes are located in the same queue).
- V3: Workspaces and collections help organize documents into logical groups, allowing for the application of different settings to each group (e.g., creating separate workspaces for different clients, each with tailored fields). When uploading documents, users must specify the Workspace or Collection to which the document belongs.
3. Capabilities
- V2: The set of capabilities is largely restricted to uploading documents and receiving the results.
- V3: A far wider range of features is available, such as adding tags, data mapping, validation rules and managing user access.
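To make the first two differences concrete, the sketch below uploads a resume to the single V3 'documents' endpoint with Python's requests library, routing it to a specific collection. The API key, collection identifier, and file name are placeholders, and parameter names should be confirmed against the current API reference.

```python
# Minimal sketch of an API V3 upload via the single "documents" endpoint described
# above. Replace the placeholder API key and collection identifier with your own
# values, and check parameter names against the current API reference.
import requests

API_KEY = "YOUR_API_KEY"                      # placeholder
COLLECTION_ID = "YOUR_NEXTGEN_COLLECTION_ID"  # placeholder: identifier of the target collection

with open("resume.pdf", "rb") as f:
    response = requests.post(
        "https://api.affinda.com/v3/documents",
        headers={"Authorization": f"Bearer {API_KEY}"},
        files={"file": f},
        data={
            "collection": COLLECTION_ID,  # route the document to a specific collection
            "wait": "true",               # wait for parsing to complete before responding
        },
    )

response.raise_for_status()
document = response.json()
print(document.get("meta", {}).get("identifier"))  # document identifier (response shape may vary)
```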
Compatibility with Search and Match
A key advantage of upgrading is that resumes processed with the NextGen Parser will have more accurate and comprehensive information extracted (e.g., fewer missing skills), enhancing the matching process. If you're an existing Search and Match customer using the legacy parser and want to upgrade to the NextGen Parser, there's no need to reparse previously processed resumes. You can continue using the same indices for candidates parsed with the legacy parser.
Upgrading to a NextGen self-hosted setup
To upgrade your legacy self-hosted parser to the NextGen self-hosted setup, follow these steps:
- Back up your data: Before starting the database migration, create a backup of your data by duplicating your existing environment, including the back-end, to ensure you have a safe copy to restore if needed.
- Access the latest self-hosted version: Obtain the latest version under "releases" from https://github.com/affinda/self-hosted . Depending on your infrastructure, access is provided through either Docker or an Amazon IAM user.
- Run migration in staging environment: Conduct the migration in a staging environment to minimise the risk of production disruptions. Ensure the staging environment closely mirrors the production setup. Since the database migration is irreversible, testing in a separate instance ensures a safer process.
- Set up and use the NextGen collection: After upgrading to the latest version, create a Resumes (NextGen) collection (see screenshot below) and parse resumes using it. Ensure you are using API V3 to access this NextGen collection; any new resumes parsed into it will adopt the NextGen data structure. A sketch of confirming the new collection via the API follows this list.
- Switch to production: Once all tests in the staging environment are successful, switch over the production environment so that newly parsed resumes use the NextGen Parser.
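As a rough example of the kind of check that can be run in the staging environment, the sketch below lists the collections in a workspace via API V3 to confirm the Resumes (NextGen) collection exists and to retrieve its identifier for uploads. The base URL, API key, and workspace identifier are placeholders; for a self-hosted setup the base URL would point at your own instance, and the response shape should be checked against the API reference.

```python
# Sketch of a post-upgrade check: list the collections in a workspace and confirm
# the newly created Resumes (NextGen) collection is present. The base URL, API key,
# and workspace identifier are placeholders.
import requests

BASE_URL = "https://api.affinda.com/v3"  # replace with your self-hosted API URL if applicable
API_KEY = "YOUR_API_KEY"                 # placeholder
WORKSPACE_ID = "YOUR_WORKSPACE_ID"       # placeholder

response = requests.get(
    f"{BASE_URL}/collections",
    headers={"Authorization": f"Bearer {API_KEY}"},
    params={"workspace": WORKSPACE_ID},
)
response.raise_for_status()

collections = response.json()
if isinstance(collections, dict):
    # Some API versions may wrap results in an object; adjust to the actual response shape.
    collections = collections.get("results", [])

for collection in collections:
    # Print each collection's name and identifier so the NextGen collection's
    # identifier can be used in subsequent uploads (field names may vary).
    print(collection.get("name"), collection.get("identifier"))
```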