Multi-Table Synthesis
Synthesize related tables with referential integrity — foreign keys, sequential generation, and full database synthesis.
Dataxid can synthesize entire databases — not just single tables.
Define your schema with Table objects, and synthesize_tables() handles
dependency ordering, training, primary key assignment, and foreign key remapping automatically.
```python
import dataxid
from dataxid import Table

# accounts_df and transactions_df are pandas DataFrames you have already loaded
accounts = Table(accounts_df, primary_key="account_id")
transactions = Table(transactions_df, foreign_keys={"account_id": accounts})

synthetic = dataxid.synthesize_tables({
    "accounts": accounts,
    "transactions": transactions,
})
```
Child tables are generated sequentially by default, which preserves per-entity patterns such as transaction counts, ordering, and value distributions.
Table Class
Table wraps a DataFrame with schema information for multi-table synthesis.
```python
from dataxid import Table

Table(
    data=df,                          # Training DataFrame
    primary_key="id",                 # Excluded from training, auto-assigned after generation
    foreign_keys={"fk_col": parent},  # FK column → parent Table object
    sequential=True,                  # Sequential generation (default: True)
    sequence_by="fk_col",             # Which FK to use for sequential context (N-parent only)
)
```
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| data | DataFrame | required | Training data for this table |
| primary_key | str \| None | None | PK column. Excluded from training and auto-assigned as a 1-based integer after generation |
| foreign_keys | dict[str, Table] | {} | Maps each FK column name to its parent Table object. Because parents are passed as objects rather than strings, you get IDE autocomplete and type checking, and a typo raises an error immediately |
| sequential | bool | True | When True and foreign_keys is set, child rows are generated conditioned on the parent, preserving parent-child correlations. Set to False for independent generation with FK remapping only |
| sequence_by | str \| None | None | Required when a table has multiple foreign keys and sequential=True. Specifies which FK relationship drives sequential generation |
Two Tables
The simplest multi-table case: a parent and a child with a foreign key.
```python
import dataxid
import pandas as pd
from dataxid import Table

dataxid.api_key = "dx_..."

accounts = pd.read_csv("accounts.csv")
transactions = pd.read_csv("transactions.csv")

accounts_tbl = Table(accounts, primary_key="account_id")
transactions_tbl = Table(transactions, foreign_keys={"account_id": accounts_tbl})

synthetic = dataxid.synthesize_tables({
    "accounts": accounts_tbl,
    "transactions": transactions_tbl,
})

synthetic["accounts"]      # synthetic accounts with auto-assigned PKs
synthetic["transactions"]  # synthetic transactions with valid FK references
```
What happens behind the scenes:
- Tables are sorted in dependency order (parents first)
- Accounts are trained and generated as a flat table
- account_id is excluded from training and auto-assigned (1, 2, 3, ...)
- Transactions are trained with accounts as context, so the model learns per-account patterns
- Transactions are generated conditioned on synthetic accounts
- FK values in transactions reference valid synthetic account PKs
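A quick way to see whether per-account patterns carried over is to compare how many transactions each account has in the real and synthetic tables. This is a plain pandas sketch, not an SDK feature: the toy DataFrames stand in for `transactions` and `synthetic["transactions"]`, and `rows_per_parent` is a hypothetical helper name.

```python
import pandas as pd


def rows_per_parent(child: pd.DataFrame, fk: str) -> pd.Series:
    """Summary statistics of how many child rows each parent key has."""
    counts = child.groupby(fk).size()  # one count per distinct FK value
    return counts.describe()


# Toy stand-ins for the real and synthetic transactions tables
real_tx = pd.DataFrame({"account_id": [10, 10, 10, 20, 30, 30]})
syn_tx = pd.DataFrame({"account_id": [1, 1, 2, 2, 2, 3]})

comparison = pd.DataFrame({
    "real": rows_per_parent(real_tx, "account_id"),
    "synthetic": rows_per_parent(syn_tx, "account_id"),
})
print(comparison)
```

Accounts with no transactions do not appear in `groupby(...).size()`. If those matter, reindex the counts against the parent's PK column with `fill_value=0` before calling `describe()`.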
Referential integrity
All generated FK values reference valid parent PKs.
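To confirm this on your own output, you can look for child rows whose FK has no matching parent PK. A minimal pandas sketch, assuming the tables are the DataFrames returned by `synthesize_tables()`; the toy data deliberately includes one orphan so the check has something to find, and `orphaned_fk_rows` is a hypothetical helper.

```python
import pandas as pd


def orphaned_fk_rows(child: pd.DataFrame, fk: str,
                     parent: pd.DataFrame, pk: str) -> pd.DataFrame:
    """Return child rows whose foreign key has no matching parent primary key."""
    return child[~child[fk].isin(parent[pk])]


# Toy stand-ins for synthetic["accounts"] and synthetic["transactions"]
accounts = pd.DataFrame({"account_id": [1, 2, 3]})
transactions = pd.DataFrame({
    "account_id": [1, 1, 3, 4],
    "amount": [5.0, 7.5, 2.0, 9.9],
})

orphans = orphaned_fk_rows(transactions, "account_id", accounts, "account_id")
print(orphans)  # the row with account_id 4 has no parent
```

On output from `synthesize_tables()` you would expect an empty result. Missing FK values (NaN) do not match any integer PK, so they are reported as orphans too.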
Three or More Tables (Fan-Out)
Multiple child tables can reference the same parent. Each child is trained and generated independently, all referencing the same synthetic parent.
```python
accounts = Table(accounts_df, primary_key="account_id")

synthetic = dataxid.synthesize_tables({
    "accounts": accounts,
    "transactions": Table(transactions_df,
                          foreign_keys={"account_id": accounts}),
    "loans": Table(loans_df, primary_key="loan_id",
                   foreign_keys={"account_id": accounts}),
})

synthetic["accounts"]      # synthetic parent table
synthetic["transactions"]  # sequential child, correlated with accounts
synthetic["loans"]         # sequential child, correlated with accounts
```
Generation order is determined automatically by a topological sort of the foreign-key graph. Circular dependencies are detected and rejected at validation time.
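The ordering step can be illustrated with Python's standard-library `graphlib`. This is a sketch of topological sorting under the assumption that each table's parents are known by name; it is not the SDK's internal implementation, and `generation_order` is a hypothetical helper.

```python
from graphlib import CycleError, TopologicalSorter


def generation_order(parents: dict[str, set[str]]) -> list[str]:
    """Order table names so every parent comes before its children.

    `parents` maps each table name to the names of the tables it references.
    Raises graphlib.CycleError on circular dependencies.
    """
    return list(TopologicalSorter(parents).static_order())


schema = {
    "accounts": set(),
    "transactions": {"accounts"},
    "loans": {"accounts"},
}
order = generation_order(schema)
print(order)  # "accounts" comes first; the two children follow in some order

try:
    generation_order({"a": {"b"}, "b": {"a"}})
except CycleError as exc:
    print("rejected:", exc.args[0])
```

`TopologicalSorter` takes a mapping from each node to its predecessors, which here are the parent tables, so parents always precede their children. Siblings such as `transactions` and `loans` have no ordering constraint between them.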
N-Parent Tables
When a table has foreign keys to multiple parents, use sequence_by to specify
which relationship drives the sequential generation. The other FK is remapped for
referential integrity but does not influence the generation pattern.
```python
customers = Table(customers_df, primary_key="customer_id")
products = Table(products_df, primary_key="product_id")

orders = Table(
    orders_df,
    foreign_keys={"customer_id": customers, "product_id": products},
    sequence_by="customer_id",  # generate order sequences per customer
)

synthetic = dataxid.synthesize_tables({
    "customers": customers,
    "products": products,
    "orders": orders,
})
```
- customer_id → sequential context (order patterns per customer are preserved)
- product_id → FK remapped to valid synthetic product PKs
sequence_by is required for multiple FKs
If a table has more than one foreign key and sequential=True, you must specify sequence_by.
If it is omitted, the SDK raises an error that lists the available options.
Flat Remap (No Correlation)
By default, foreign keys trigger sequential generation — the child table learns
per-entity patterns from the parent. If you only need referential integrity
without correlation (rare), set sequential=False:
```python
# categories is a parent Table defined with a primary_key
products = Table(
    products_df,
    foreign_keys={"category_id": categories},
    sequential=False,
)
```
The table is generated independently. FK values are remapped to valid parent PKs after generation, but the model does not learn category-level patterns.
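One simple remapping strategy is to draw each child row's FK uniformly from the synthetic parent PKs. The sketch below shows that strategy with NumPy and pandas; the SDK's actual remapping rule is not documented here and may differ, and `remap_foreign_key` is a hypothetical helper.

```python
import numpy as np
import numpy.typing as npt
import pandas as pd


def remap_foreign_key(child: pd.DataFrame, fk: str,
                      parent_pks: npt.ArrayLike, seed: int = 0) -> pd.DataFrame:
    """Return a copy of child with fk replaced by PKs drawn uniformly from the parent."""
    rng = np.random.default_rng(seed)
    out = child.copy()  # leave the caller's DataFrame untouched
    out[fk] = rng.choice(np.asarray(parent_pks), size=len(out))
    return out


# Toy stand-ins: synthetic categories (PKs 1..3) and freshly generated products
category_pks = [1, 2, 3]
products = pd.DataFrame({
    "category_id": [57, 57, 9, 12],  # values that match no synthetic category
    "price": [3.0, 4.5, 1.2, 8.0],
})

remapped = remap_foreign_key(products, "category_id", category_pks)
print(remapped)
```

Uniform sampling guarantees referential integrity but, as noted above, carries no category-level pattern: which category a product lands in is independent of its other columns.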
Low-Level API
synthesize_tables() is a convenience wrapper. For fine-grained control over
individual tables (custom config per table, reusing a trained model for multiple
generations), use Model.create() directly. For example, this high-level call:
```python
accounts = Table(accounts_df, primary_key="account_id")

synthetic = dataxid.synthesize_tables({
    "accounts": accounts,
    "transactions": Table(transactions_df,
                          foreign_keys={"account_id": accounts}),
})
```
is roughly equivalent to these two explicit Model.create() steps:
```python
import dataxid

# Step 1: Train and generate accounts (flat)
# The PK is dropped before training, then assigned as 1-based integers
acct_model = dataxid.Model.create(data=accounts_df.drop(columns=["account_id"]))
syn_accounts = acct_model.generate(n_samples=len(accounts_df))
syn_accounts.insert(0, "account_id", range(1, len(syn_accounts) + 1))
acct_model.delete()  # delete the model once generation is done

# Step 2: Train transactions with accounts as parent (sequential)
tx_model = dataxid.Model.create(
    data=transactions_df,
    parent=accounts_df,        # real parent rows, including their account_id PK
    foreign_key="account_id",
)
syn_transactions = tx_model.generate(parent=syn_accounts)  # condition on synthetic parents
tx_model.delete()
```
Model.create() Parameters for Sequential
| Parameter | Type | Description |
|---|---|---|
| parent | DataFrame | Parent table used as context for generation |
| foreign_key | str | FK column in data that links each row to a parent row. Setting it enables sequential mode |
| parent_key | str \| None | PK column in parent. Inferred from foreign_key when the column names match |
| parent_encoding_types | dict \| None | Encoding overrides for parent columns |
Validation Rules
The SDK validates your schema before training starts. All errors are raised as
dataxid.InvalidRequestError with a descriptive message and the offending parameter.
| Rule | Error |
|---|---|
| foreign_keys value is not a Table | foreign_keys values must be Table instances |
| FK column not in DataFrame | Column 'X' not found in DataFrame columns |
| Parent Table has no primary_key | Referenced table must have a primary_key defined |
| Circular dependency | Circular dependency detected |
| Multiple FKs + no sequence_by | Table has N foreign keys. Use sequence_by to specify which relationship to use |
| sequential=False + sequence_by | sequence_by and sequential=False are mutually exclusive |
| FK target not in tables dict | references a Table object that is not in the tables dict |
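The rules above can be mirrored in a small pre-flight check. This sketch uses a hypothetical `TableSpec` dataclass in place of `dataxid.Table`, raises plain `ValueError` rather than `dataxid.InvalidRequestError`, and paraphrases the messages. The circular-dependency rule is left to a topological sort (for example `graphlib`, which raises `CycleError`).

```python
from __future__ import annotations

from dataclasses import dataclass, field

import pandas as pd


@dataclass(eq=False)
class TableSpec:
    """Hypothetical stand-in for dataxid.Table, used only in this sketch."""
    data: pd.DataFrame
    primary_key: str | None = None
    foreign_keys: dict[str, TableSpec] = field(default_factory=dict)
    sequential: bool = True
    sequence_by: str | None = None


def validate_schema(tables: dict[str, TableSpec]) -> None:
    """Raise ValueError on the first rule violation (cycle detection not included)."""
    members = {id(t) for t in tables.values()}  # compare by object identity, not by name
    for name, table in tables.items():
        for fk_col, parent in table.foreign_keys.items():
            if not isinstance(parent, TableSpec):
                raise ValueError(f"{name}: foreign_keys values must be TableSpec instances")
            if fk_col not in table.data.columns:
                raise ValueError(f"{name}: column {fk_col!r} not found in DataFrame columns")
            if parent.primary_key is None:
                raise ValueError(f"{name}: referenced table must have a primary_key defined")
            if id(parent) not in members:
                raise ValueError(f"{name}: {fk_col!r} references a table not in the tables dict")
        if not table.sequential and table.sequence_by is not None:
            raise ValueError(f"{name}: sequence_by and sequential=False are mutually exclusive")
        if table.sequential and len(table.foreign_keys) > 1 and table.sequence_by is None:
            options = ", ".join(table.foreign_keys)
            raise ValueError(
                f"{name}: {len(table.foreign_keys)} foreign keys; "
                f"set sequence_by to one of: {options}"
            )


# Usage: a table with two parents and no sequence_by fails validation
accounts = TableSpec(pd.DataFrame({"account_id": [1, 2]}), primary_key="account_id")
customers = TableSpec(pd.DataFrame({"customer_id": [1]}), primary_key="customer_id")
orders = TableSpec(
    pd.DataFrame({"account_id": [1], "customer_id": [1]}),
    foreign_keys={"account_id": accounts, "customer_id": customers},
)
try:
    validate_schema({"accounts": accounts, "customers": customers, "orders": orders})
except ValueError as exc:
    print(exc)  # orders: 2 foreign keys; set sequence_by to one of: account_id, customer_id
```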
How It Works
- Topological sort — Tables are ordered so parents are always processed before children
- PK exclusion — Primary key columns are dropped before training (the model doesn't learn ID patterns)
- Sequential training — Child tables are trained with parent data as context, learning per-entity distributions (transaction counts, value ranges, temporal patterns)
- Generation — Parents first, then children conditioned on synthetic parents
- PK auto-assignment — 1-based auto-increment integers assigned after generation
- FK remapping — Foreign keys in child tables are mapped to valid synthetic parent PKs