On the dataset file list, when a file needs to be rescanned, Textual now displays a small warning icon in front of the column name and adds a clickable rescan icon to the options column.
Entities catalog for dataset entity values - Replaced the entity type value count and value modal with a new Entities catalog page for the dataset. The Entities catalog lists all of the detected entity values in the file. The list includes the context, transformed value, the Textual confidence in the detection, and the file where it was detected.
On the Entity Settings page for a dataset, restored the option to select the generator to use to generate synthesized values for custom entity types.
The dataset files list now displays the OCR method used for PDFs and images.
The dataset files list indicates when each file needs to be rescanned.
Fixed an issue where the preview for a dataset PDF file did not allow users to change the entity type handling.
On a self-hosted instance, you can now select the OCR engine to use for individual datasets. The setting is in the PDF settings section of the Project Settings page for the dataset.
Fixed an issue that caused an error when changing resource editing permissions when deprecated pipeline permissions were present.
Dataset details redesign - On the new dataset details page, the left panel contains a menu of options to display the list of dataset files, the entity type settings, and the project settings. Added more information to the file list. Added filters to the entity types list and changed the handling option from a toggle to a dropdown list. Moved the option to enable and disable custom entity types to a separate side panel.
For dataset .docx and .xlsx files, when the URL entity type is set to Redact or Synthesis, Textual now automatically replaces hyperlink destinations with google.com. If the URL entity type is set to Off, then Textual does not replace the hyperlinks.
Improved the grouping model to enable disambiguation of distinct entities that have identical names.
Replaced the Getting Started navigation link and modal with panels on the Home page that contain links to perform actions related to the API, datasets, guided redaction, and custom entity types.
On the Home page, when you upload a file that the redaction preview tool doesn't support (such as a PDF file), Textual now prompts you to create a dataset that contains the file.
Improved detection for date of birth values.
Added a new built-in entity type, US_ROUTING_TRANSIT_NUMBER, to identify the routing number of a bank in the United States.
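For context on what a routing number looks like, the following is a minimal sketch of the standard ABA checksum that nine-digit routing transit numbers satisfy. It is not a description of how Textual detects US_ROUTING_TRANSIT_NUMBER values; the function name is hypothetical and the check is only an illustration of the value format.

```python
def is_valid_aba_routing_number(candidate: str) -> bool:
    """Return True if candidate is nine ASCII digits that pass the ABA checksum.

    Hypothetical helper for illustration only; not part of Textual.
    """
    if len(candidate) != 9 or not (candidate.isascii() and candidate.isdigit()):
        return False
    digits = [int(ch) for ch in candidate]
    # ABA checksum weights repeat as 3, 7, 1 across the nine digits.
    weights = [3, 7, 1, 3, 7, 1, 3, 7, 1]
    total = sum(d * w for d, w in zip(digits, weights))
    return total % 10 == 0


if __name__ == "__main__":
    print(is_valid_aba_routing_number("021000021"))  # True
    print(is_valid_aba_routing_number("123456789"))  # False
```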
Removed the pipeline functionality from the API and the SDK.
For guided redaction, added guided redaction permissions to control whether users can:
On the Home page, for redacted values, the entity type is now followed by a token to distinguish the value. The same value always gets the same token.
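One way to get the "same value, same token" behavior is to derive the token from the value itself. The sketch below assumes a hash-based token and a bracketed label format; both are illustrative assumptions, not the format or method that Textual uses. The function name and the example entity type label are also hypothetical.

```python
import hashlib


def redaction_label(entity_type: str, value: str, token_length: int = 5) -> str:
    """Build a label of the form [ENTITY_TYPE_token] for a redacted value.

    Assumption: the token is the first token_length hex characters of the
    SHA-256 digest of the value, so equal values always map to equal tokens.
    """
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
    return f"[{entity_type}_{digest[:token_length]}]"


if __name__ == "__main__":
    first = redaction_label("NAME_GIVEN", "Alice")
    second = redaction_label("NAME_GIVEN", "Alice")
    print(first == second)  # True: the same value yields the same token
```

A short token keeps the redacted text readable, at the cost of a small chance that two different values share a token.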
When you preview a file, if an entity is flagged with both numeric_pii and a more specific entity type, the preview now uses the more specific entity type.
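The preference rule can be sketched as a small overlap filter. This is an illustration under stated assumptions, not Textual's implementation: the `Detection` class, the `prefer_specific` function, and the character-offset span model are all hypothetical.

```python
from dataclasses import dataclass

# The generic label, spelled as it appears in the note above.
GENERIC_LABEL = "numeric_pii"


@dataclass(frozen=True)
class Detection:
    """Hypothetical detection: a character span plus an entity type label."""
    start: int
    end: int
    label: str


def prefer_specific(detections: list[Detection]) -> list[Detection]:
    """Drop generic detections that overlap a more specific detection."""
    specific = [d for d in detections if d.label != GENERIC_LABEL]
    kept = list(specific)
    for d in detections:
        if d.label != GENERIC_LABEL:
            continue
        # Two half-open spans overlap when each starts before the other ends.
        overlaps = any(d.start < s.end and s.start < d.end for s in specific)
        if not overlaps:
            kept.append(d)
    return sorted(kept, key=lambda d: d.start)


if __name__ == "__main__":
    found = [
        Detection(0, 9, "numeric_pii"),
        Detection(0, 9, "US_ROUTING_TRANSIT_NUMBER"),
        Detection(20, 24, "numeric_pii"),
    ]
    for d in prefer_specific(found):
        print(d.start, d.end, d.label)
```

In this sketch, a generic detection survives only when no specific detection covers any part of the same span.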
Model-based custom entity types now use a larger context to include prediction errors and provide better guideline suggestions.
On the dataset details page, standardized how regex-based and model-based custom entity types are displayed.
Added a user profile setting to specify the name of the organization or team that the user belongs to.
Guided redaction (beta release) - The new guided redaction feature, currently in beta, produces redacted files for use cases such as Freedom of Information Act requests. Guided redaction supports a process that includes redaction and review.
Before you create a guided redaction project, you set up reference codes to identify specific types of values, and can map those codes to Textual entity types. Projects and files are assigned statuses to indicate where they are in the process. Textual provides a basic set of status values that you can adjust.
For the redaction phase, Textual performs an initial scan to detect sensitive values. You can then add and remove values and adjust the assigned reference codes.
The review process verifies that the redaction is correct and complete.
You can preview each file to verify the redactions before you download the output. All output files are PDFs, regardless of the original format. You can choose the color of the box that hides the redacted values, and whether to include the reference codes.
Model-based custom entity types - Textual now supports a new way to create a custom entity type. Previously, custom entity types always looked for values that matched regular expressions. Those are now called regex-based custom entity types, and are useful for entity types that have a smaller set of values or have values that use a standard format. For the new model-based custom entity types, you define and train models to identify entity values for an entity type. Model-based custom entity types are useful when there is a wide range of values that need to be identified more by context than by format.
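To illustrate the kind of value that suits a regex-based custom entity type, here is a minimal sketch that matches a made-up ticket identifier format (`TKT-` followed by six digits). The pattern, the function name, and the identifier format are all hypothetical, and the code uses only the Python standard library rather than any Textual API.

```python
import re

# Hypothetical fixed format: "TKT-" followed by exactly six digits.
TICKET_ID_PATTERN = re.compile(r"\bTKT-\d{6}\b")


def find_ticket_ids(text: str) -> list[tuple[int, int, str]]:
    """Return (start, end, value) for each ticket identifier found in text."""
    return [(m.start(), m.end(), m.group()) for m in TICKET_ID_PATTERN.finditer(text)]


if __name__ == "__main__":
    for start, end, value in find_ticket_ids("See TKT-004213 and TKT-998877."):
        print(start, end, value)
```

A value such as a person's job title or a medical condition has no fixed shape like this, which is the case where the context-driven, model-based approach is the better fit.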