Quickstart
Generate your first synthetic dataset in 5 minutes.
1. Install the SDK
pip install dataxid
The SDK encodes data locally before sending to the API.
2. Set Your API Key
import dataxid
dataxid.api_key = "dx_test_..."
Or set the environment variable (useful for CI/CD):
export DATAXID_API_KEY="dx_test_..."
API Keys
Sign up at app.dataxid.com to get your API key.
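For scripts that run both locally and in CI, one option is to read the key from the environment yourself and fail early when it is missing. The sketch below uses only the Python standard library. `load_api_key` is a hypothetical helper for illustration, not part of the SDK; the commented-out usage reuses the `dataxid.api_key` attribute shown above.

```python
import os


def load_api_key(env_var: str = "DATAXID_API_KEY") -> str:
    """Return the API key stored in env_var, or raise if it is unset or blank.

    Hypothetical helper for illustration; not part of the dataxid SDK.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"{env_var} is not set; export it before running")
    return key


# Usage (assumes the variable was exported as shown above):
# import dataxid
# dataxid.api_key = load_api_key()
```

Failing at startup gives a clearer error than a rejected request later in the run.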
3. Generate Synthetic Data
One-liner (small datasets)
import dataxid
import pandas as pd
dataxid.api_key = "dx_test_..."
df = pd.read_csv("customers.csv")
synthetic = dataxid.synthesize(data=df, n_samples=1000)
print(synthetic.head())
That's it. Behind the scenes:
- SDK processes your data locally → abstract embeddings
- Embeddings are sent to the API → model trains on the statistical structure
- Synthetic data is generated and returned as a DataFrame
Only embeddings cross the wire — not raw data.
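Before using the returned DataFrame, it can help to run a quick shape check. The sketch below is one possible approach: `check_synthetic_shape` is a hypothetical helper, not an SDK function. It assumes the synthetic output is expected to carry the same columns as the input, which this page does not state explicitly.

```python
def check_synthetic_shape(real, synthetic, n_samples):
    """Return a list of problems found; an empty list means the checks passed.

    Works with any objects that support len() and expose .columns,
    such as pandas DataFrames. Hypothetical helper, not part of the SDK.
    """
    problems = []
    # The row count should match what was requested via n_samples.
    if len(synthetic) != n_samples:
        problems.append(f"expected {n_samples} rows, got {len(synthetic)}")
    # Assumption: the synthetic table keeps the input's column names.
    synthetic_columns = set(synthetic.columns)
    missing = [c for c in real.columns if c not in synthetic_columns]
    if missing:
        problems.append(f"missing columns: {missing}")
    return problems


# Usage with the one-liner example above:
# problems = check_synthetic_shape(df, synthetic, n_samples=1000)
# assert not problems, problems
```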
Step-by-step (large datasets, custom config)
For more control over training and generation:
import dataxid
import pandas as pd
dataxid.api_key = "dx_test_..."
df = pd.read_csv("transactions.csv")
model = dataxid.Model.create(
    data=df,
    config=dataxid.ModelConfig(
        model_size="large",
        max_epochs=200,
    ),
)
synthetic_1k = model.generate(n_samples=1000)
synthetic_10k = model.generate(n_samples=10000)
model.delete()
A plain dict also works for quick experiments:
model = dataxid.Model.create(
    data=df,
    config={"model_size": "large", "max_epochs": 200},
)
4. Synthesize Related Tables
Have multiple tables with foreign keys? synthesize_tables() generates them
with valid foreign key references:
import dataxid
from dataxid import Table
# accounts_df and transactions_df are assumed to be pandas DataFrames loaded earlier
accounts = Table(accounts_df, primary_key="account_id")
transactions = Table(transactions_df, foreign_keys={"account_id": accounts})
synthetic = dataxid.synthesize_tables({
    "accounts": accounts,
    "transactions": transactions,
})
synthetic["accounts"] # synthetic accounts with auto-assigned PKs
synthetic["transactions"] # synthetic transactions, per-account patterns preserved
Child tables are generated sequentially by default: the model learns per-entity patterns (transaction counts, temporal ordering, value distributions).
See Multi-Table Synthesis for fan-out schemas, N-parent
tables, and the low-level Model.create() API.
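To confirm that the generated foreign keys actually resolve, you can compare the child table's key column against the parent's. The helper below is a minimal sketch: `orphaned_keys` is a hypothetical name, not an SDK function. It accepts any iterable of key values, so it should also work on DataFrame columns, assuming the returned tables are DataFrames that keep the `account_id` column.

```python
def orphaned_keys(child_keys, parent_keys):
    """Return the sorted child key values that have no matching parent key.

    Hypothetical helper for illustration; not part of the dataxid SDK.
    """
    parents = set(parent_keys)
    return sorted({k for k in child_keys if k not in parents})


# Small demonstration with plain lists:
account_ids = [1, 2, 3]
transaction_account_ids = [1, 1, 3, 4]
print(orphaned_keys(transaction_account_ids, account_ids))  # prints [4]

# With the result of synthesize_tables() above (column name assumed):
# orphans = orphaned_keys(
#     synthetic["transactions"]["account_id"],
#     synthetic["accounts"]["account_id"],
# )
# assert not orphans
```

An empty result means every synthetic transaction points at an existing synthetic account.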
5. Enable Logging
See what the SDK is doing during training:
dataxid.enable_logging("info") # training progress, epoch stats
dataxid.enable_logging("debug") # verbose — includes HTTP requests
dataxid.disable_logging() # turn off (default state)
Or via environment variable:
DATAXID_LOG=info python my_script.py
Next Steps
- Multi-Table Synthesis — Related tables, foreign keys, full database generation
- Configuration — Model config, privacy settings, advanced tuning
- API Reference — Full endpoint documentation
- Error Codes — Troubleshoot API errors