Fixed a rare issue where Azure OCR returned a 400 response when the file upload stream contained corrupted data.
Improved synthesis of days of the week and ordinal numbers that are flagged as DATE_TIME.
Textual now only disables a numeric span when it overlaps one of the following disabled types: DATE_TIME, DOB, LOCATION, LOCATION_ADDRESS, LOCATION_ZIP, MONEY, CREDIT_CARD, PHONE_NUMBER.
Improved the Textual NER model throughput on long strings that contain a large number of detected entities.
Added support for storing dataset files in a specified S3 bucket instead of in the Textual application database.
When Textual replaces first name values, it now attempts to use a name with the same gender.
For the DOB (date of birth) entity type, you can now configure synthesis options. You can set how to shift the date.
You can now configure the entity type handling for a dataset before you upload the dataset files.
You can now provide added and excluded entity values when you use the SDK to redact individual strings and files.
Added a new method to the SDK. redact_xml works similarly to redact_json. To produce a redacted output, you pass in an XML string.
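The numeric span rule described above can be illustrated with a small sketch. It is not Textual's implementation. It assumes that spans are character offsets with an exclusive end, that "disabled" means the entity type is turned off in the dataset configuration, and that a numeric span is suppressed only when it overlaps a detected span whose type is both in the listed set and disabled. The names Span, BLOCKING_TYPES, and numeric_span_is_disabled are hypothetical and exist only for this example.

```python
from typing import NamedTuple

# Entity types named in the release note. Only these can suppress a numeric span.
BLOCKING_TYPES = {
    "DATE_TIME", "DOB", "LOCATION", "LOCATION_ADDRESS",
    "LOCATION_ZIP", "MONEY", "CREDIT_CARD", "PHONE_NUMBER",
}


class Span(NamedTuple):
    """Hypothetical detected span: [start, end) character offsets plus a label."""
    start: int
    end: int
    label: str


def overlaps(a: Span, b: Span) -> bool:
    """Two half-open intervals overlap when each starts before the other ends."""
    return a.start < b.end and b.start < a.end


def numeric_span_is_disabled(
    numeric: Span, detected: list[Span], disabled_labels: set[str]
) -> bool:
    """Assumed rule: suppress the numeric span only if it overlaps a span whose
    type is in BLOCKING_TYPES and has been disabled by the user."""
    return any(
        s.label in BLOCKING_TYPES
        and s.label in disabled_labels
        and overlaps(numeric, s)
        for s in detected
    )
```

Under these assumptions, a number inside a disabled DATE_TIME span is suppressed, while the same number overlapping a disabled type outside the list (for example ORGANIZATION) is still processed.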