Tonic Structural release notes

v541 - v551

August 12, 2022

Added a new Max Categorical Dimension parameter to the AI Synthesizer configuration. This parameter controls the dimension of each column that has categorical or location encoding. If a column contains more distinct categories than this parameter, the most frequent categories are embedded as distinct one-hot vectors. The remaining categories are combined into a single one-hot vector.
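As a rough illustration of the capping behavior described above, the following Python sketch one-hot encodes a column under a maximum dimension. It is not Tonic's implementation. The function names are hypothetical, and it assumes that the combined "other" vector counts toward the maximum, so that the top max_dim - 1 categories keep their own vectors.

```python
from collections import Counter


def build_category_index(values, max_dim):
    """Map categories to one-hot slots, using at most max_dim slots.

    Assumption (not confirmed by the release notes): the combined
    "other" slot counts toward max_dim.
    """
    if max_dim < 1:
        raise ValueError("max_dim must be at least 1")
    counts = Counter(values)
    if len(counts) <= max_dim:
        # Few enough categories: every category gets its own slot.
        kept = [cat for cat, _ in counts.most_common()]
        return {cat: i for i, cat in enumerate(kept)}, None
    # Too many categories: keep the most frequent, reserve the last slot.
    kept = [cat for cat, _ in counts.most_common(max_dim - 1)]
    return {cat: i for i, cat in enumerate(kept)}, max_dim - 1


def encode_column(values, max_dim):
    """Return one one-hot vector (list of 0/1) per input value."""
    index, other_slot = build_category_index(values, max_dim)
    width = len(index) + (1 if other_slot is not None else 0)
    encoded = []
    for value in values:
        vector = [0] * width
        # Infrequent categories all share the reserved "other" slot.
        vector[index.get(value, other_slot)] = 1
        encoded.append(vector)
    return encoded


column = ["a", "a", "a", "b", "b", "c", "d"]
vectors = encode_column(column, max_dim=3)
```

In this example, "a" and "b" keep their own vectors, while "c" and "d" share the third slot.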

Improved error identification for an invalid WHERE clause in subsetting configuration.

On the subsetting page, for tables that were not previously in the subset, the row count is now correctly represented as unknown instead of 0.

Fixed an issue with the Tonic update option in the Tonic application.

Configuring a subset target table to include 100% of the records no longer causes an error during data generation.

Removed connection pooling from Tonic workers to address database connection issues during data generation.

MongoDB

Added support for partial indexes.
Fixed an issue where the configured generators were not applied.
Indexes that use the collation option are now properly recreated in the destination database.
The subsetting user interface now uses the correct terminology for MongoDB.
Improved performance for MongoDB subsetting when handling downstream tables.
Fixed percentage-based subsetting for MongoDB versions before 4.4.2.

MySQL

Added support for HASH partition parallelization.
Running data generation on masked and passthrough tables with ranged sub-partitions no longer results in duplicated data.
Added support for parallel uploads with ordering.
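To illustrate the general idea behind HASH partition parallelization, the Python sketch below groups rows by partition and processes each partition concurrently. It is a conceptual sketch only, not how Tonic implements the feature. The helper names and the worker function are hypothetical. It relies on the documented MySQL rule that a table declared with PARTITION BY HASH(expr) PARTITIONS n stores a row in partition MOD(expr, n), and it assumes non-negative integer keys.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partition_of(key, num_partitions):
    # MySQL HASH partitioning: partition number is MOD(expr, num_partitions).
    # This sketch assumes key is a non-negative integer.
    return key % num_partitions


def group_by_partition(rows, key_column, num_partitions):
    """Bucket rows by the partition that MySQL would place them in."""
    groups = defaultdict(list)
    for row in rows:
        groups[partition_of(row[key_column], num_partitions)].append(row)
    return dict(groups)


def process_in_parallel(groups, worker, max_workers=4):
    """Run worker once per partition, with partitions handled concurrently."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {p: pool.submit(worker, rows) for p, rows in groups.items()}
        return {p: future.result() for p, future in futures.items()}


rows = [{"id": i} for i in range(10)]
groups = group_by_partition(rows, "id", num_partitions=4)
# Hypothetical worker: count the rows in each partition.
row_counts = process_in_parallel(groups, worker=len)
```

Because each partition holds a disjoint set of rows, the workers never touch the same row twice.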

Oracle

Added new environment variables (ORACLE_TRACE_LEVEL, ORACLE_TRACE_FILE_LOCATION, ORACLE_TRACE_FILE_MAX_SIZE, and ORACLE_TRACE_OPTION) to enable Oracle tracing.
Added support for datetime components in composite keys for subsetting.
Unique indexes are now detected, and users cannot apply generators that might violate the uniqueness constraint.
Improved performance for data generation.
Improved handling of schema names during data generation.
Users are no longer incorrectly removed from the destination database.

PostgreSQL

Upgraded Npgsql to address an issue with cross-schema types.

Spark

Added support for Kerberos authentication for HDFS with Spark / Livy.
Added support for repartition and coalesce options for Spark EMR and Livy.
Repartition and coalesce options can now be saved on Databricks.
Added support in Hive for varchar and char fields that have lengths.