DataXID

Changelog

Release history and migration guides.

0.2.0 (2026-04-13)

Multi-table synthesis, API redesign, and developer experience improvements.

Multi-Table Synthesis

  • Table class — typed table definition with primary_key, foreign_keys, sequential, and sequence_by parameters
  • synthesize_tables() — synthesize related tables with automatic dependency ordering, PK auto-assignment, and FK remapping
  • Sequential generation — child tables are generated conditioned on the parent, preserving per-entity patterns (transaction counts, temporal ordering, value distributions)
  • Fan-out support — multiple child tables referencing the same parent (e.g. accounts → transactions + loans)
  • N-parent tables: sequence_by for disambiguation when a table has multiple foreign keys
  • Flat remap: sequential=False for independent generation with FK integrity only
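
To make "automatic dependency ordering" and "FK remapping" concrete, here is a minimal standard-library sketch of the general idea. It is not the dataxid implementation: the schema dictionary, the helper names generation_order and remap_foreign_keys, and the sample keys are all assumptions made for illustration. It requires Python 3.9+ for graphlib.

```python
from graphlib import TopologicalSorter


def generation_order(foreign_keys):
    """Return table names so that every parent precedes its children.

    foreign_keys maps table name -> {fk_column: parent_table_name}.
    (Hypothetical input shape, chosen for this sketch only.)
    """
    graph = {name: set(fks.values()) for name, fks in foreign_keys.items()}
    return list(TopologicalSorter(graph).static_order())


def remap_foreign_keys(child_rows, fk_column, key_map):
    """Rewrite fk_column in each child row using an old-key -> new-key map."""
    return [{**row, fk_column: key_map[row[fk_column]]} for row in child_rows]


# Hypothetical fan-out schema: accounts -> transactions + loans.
schema = {
    "accounts": {},
    "transactions": {"account_id": "accounts"},
    "loans": {"account_id": "accounts"},
}
order = generation_order(schema)  # "accounts" comes first

# Hypothetical remap: original parent keys replaced by newly assigned PKs.
key_map = {"A-17": 1, "A-42": 2}
transactions = [
    {"account_id": "A-42", "amount": 10.0},
    {"account_id": "A-17", "amount": 3.5},
]
remapped = remap_foreign_keys(transactions, "account_id", key_map)
```

A cycle in the schema would make static_order() raise graphlib.CycleError, which is one reasonable way to reject circular foreign keys early.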

API Redesign

  • foreign_keys parameter accepts Table objects instead of string references — IDE autocomplete, type-safe, typo → immediate error
  • context_key removed — sequential generation is automatic when foreign_keys is set
  • Model.create(): foreign_key parameter (renamed from group_by); parent_key is inferred from the FK column name
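
The "typo → immediate error" point can be illustrated with a small stand-in class. This is only a sketch: TableSketch is a hypothetical name and is not the dataxid Table class, and the validation shown is an assumption about how object references might be checked.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class TableSketch:
    """Hypothetical stand-in for a table definition (not dataxid.Table)."""

    name: str
    primary_key: Optional[str] = None
    foreign_keys: Dict[str, "TableSketch"] = field(default_factory=dict)

    def __post_init__(self):
        # Reject string references at definition time instead of at run time.
        for column, parent in self.foreign_keys.items():
            if not isinstance(parent, TableSketch):
                raise TypeError(
                    f"foreign key {column!r} must reference a table object, "
                    f"got {type(parent).__name__}"
                )


accounts = TableSketch("accounts", primary_key="account_id")
transactions = TableSketch("transactions", foreign_keys={"account_id": accounts})

try:
    # A misspelled string reference fails as soon as the table is defined.
    TableSketch("loans", foreign_keys={"account_id": "acounts"})
except TypeError as exc:
    error = str(exc)
```

With object references, misspelling the variable name itself (for example acounts instead of accounts) would raise NameError even earlier, before any table is constructed.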

Developer Experience

  • Logging: dataxid.enable_logging() / dataxid.disable_logging() / DATAXID_LOG environment variable
  • ModelConfig — typed dataclass with IDE-discoverable fields, replaces untyped config dict (dict still works)
  • Datetime auto-detection — string columns with datetime-like names are automatically encoded as datetime
  • TrainingTimeoutError — raised when server-side training exceeds timeout
  • TrainingError — raised when training fails on the server
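
One common way to introduce a typed config while keeping dict inputs working is to coerce both into the same dataclass. The sketch below shows that pattern only. ConfigSketch, its fields (epochs, batch_size), and coerce_config are hypothetical and do not reflect the actual ModelConfig fields.

```python
from dataclasses import dataclass, fields
from typing import Optional, Union


@dataclass
class ConfigSketch:
    """Hypothetical config; field names are illustrative, not dataxid's."""

    epochs: int = 100
    batch_size: int = 256


def coerce_config(config: Optional[Union[ConfigSketch, dict]]) -> ConfigSketch:
    """Accept a typed config, a plain dict, or None and return a ConfigSketch."""
    if config is None:
        return ConfigSketch()
    if isinstance(config, ConfigSketch):
        return config
    known = {f.name for f in fields(ConfigSketch)}
    unknown = set(config) - known
    if unknown:
        # Surface misspelled keys instead of silently ignoring them.
        raise TypeError(f"unknown config keys: {sorted(unknown)}")
    return ConfigSketch(**config)


typed = coerce_config(ConfigSketch(epochs=50))
legacy = coerce_config({"epochs": 50})  # dict input still accepted
```

Both calls produce equal ConfigSketch values, so downstream code only ever has to handle the typed form.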

Migration from 0.1.0

No breaking changes. All 0.1.0 code works unchanged.

import dataxid
from dataxid import Table

# 0.1.0: still works
synthetic = dataxid.synthesize(data=df, n_samples=1000)

# 0.2.0: new capabilities
# Define the parent table once so the child can reference the Table object.
accounts_tbl = Table(accounts_df, primary_key="account_id")
synthetic = dataxid.synthesize_tables({
    "accounts": accounts_tbl,
    "transactions": Table(transactions_df, foreign_keys={"account_id": accounts_tbl}),
})

0.1.0 (2026-03-19)

Initial release.

  • dataxid.synthesize() — single-table synthetic data generation in one call
  • dataxid.Model.create() / model.generate() — step-by-step control for large datasets and custom config
  • Privacy by architecture — raw data never leaves your machine, only embeddings (64 floats/row) cross the API boundary
  • Error handling — typed exception hierarchy (AuthenticationError, RateLimitError, QuotaExceededError, etc.)
  • Automatic retries — exponential backoff for 5xx errors and rate limits
  • DATAXID_API_KEY environment variable support
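
For readers unfamiliar with the retry behavior mentioned above, here is a generic sketch of exponential backoff for 5xx responses and rate limits. It is not the client's actual retry code: HTTPStatusError, call_with_retries, the delay values, and the jitter are all assumptions for illustration.

```python
import random
import time


class HTTPStatusError(Exception):
    """Hypothetical error carrying an HTTP status code."""

    def __init__(self, status):
        super().__init__(f"HTTP {status}")
        self.status = status


def is_retryable(status):
    # Rate limits (429) and server errors (5xx) are worth retrying.
    return status == 429 or 500 <= status < 600


def call_with_retries(request, max_attempts=5, base_delay=0.5,
                      max_delay=8.0, sleep=time.sleep):
    """Call request(), backing off exponentially on retryable failures."""
    for attempt in range(max_attempts):
        try:
            return request()
        except HTTPStatusError as exc:
            last_attempt = attempt == max_attempts - 1
            if not is_retryable(exc.status) or last_attempt:
                raise
            # Delay doubles each attempt, capped, plus up to 50% jitter.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))


# Simulated endpoint: fails twice with retryable statuses, then succeeds.
responses = iter([503, 429, 200])


def flaky_request():
    status = next(responses)
    if status != 200:
        raise HTTPStatusError(status)
    return "ok"


delays = []  # record delays instead of actually sleeping
result = call_with_retries(flaky_request, sleep=delays.append)
```

Passing sleep as a parameter keeps the sketch testable without real waiting. Non-retryable statuses such as 401 are re-raised immediately.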
