Quickstart
Generate your first synthetic dataset in 5 minutes.
1. Install the SDK
pip install dataxid
Raw data never leaves your machine: the SDK processes everything locally.
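To confirm the install from Python, you can ask the standard library's `importlib.metadata` for the installed version. This is a minimal sketch. `installed_version` is our own helper name, not an SDK function.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "dataxid") -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints something like a version string, or a hint to run pip install.
    print(installed_version() or "dataxid is not installed; run: pip install dataxid")
```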
2. Set Your API Key
import dataxid
dataxid.api_key = "dx_test_..."
Or set the environment variable (useful for CI/CD):
export DATAXID_API_KEY="dx_test_..."
API Keys
Sign up at app.dataxid.com to get your API key.
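In CI it can help to fail fast when the key is missing, so a job does not get partway through before stopping. The sketch below reads `DATAXID_API_KEY` and raises a clear error if it is empty or unset. `require_api_key` is a hypothetical helper, not part of the dataxid SDK.

```python
import os
from typing import Mapping, Optional


def require_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the key from DATAXID_API_KEY, or raise if it is missing.

    Hypothetical helper (not part of the dataxid SDK). It stops a CI job
    with a clear message before any API call is made.
    """
    source = os.environ if env is None else env
    # Strip whitespace so a key pasted with a trailing newline still works.
    key = source.get("DATAXID_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "DATAXID_API_KEY is not set; export it or add it to your CI secrets"
        )
    return key
```

In a script you would then write `dataxid.api_key = require_api_key()`. If the SDK already picks up `DATAXID_API_KEY` on its own, the helper only adds the early, explicit failure.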
3. Generate Synthetic Data
One-liner (small datasets)
import dataxid
import pandas as pd
dataxid.api_key = "dx_test_..."
df = pd.read_csv("customers.csv")
synthetic = dataxid.synthesize(data=df, n_samples=1000)
print(synthetic.head())
That's it. Behind the scenes:
- The SDK processes your data locally into abstract embeddings
- Only the embeddings are sent to the API, where a model trains on the data's statistical structure
- Synthetic data is generated and returned to you as a DataFrame
Your raw data never crosses the wire.
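After the one-liner returns, a quick sanity check can confirm you got what you asked for. This sketch assumes the synthetic DataFrame has exactly `n_samples` rows and the same column names as the input. `check_synthetic` is a hypothetical helper, not part of the SDK, and it does not measure statistical fidelity.

```python
import pandas as pd


def check_synthetic(real: pd.DataFrame, synthetic: pd.DataFrame, n_samples: int) -> list[str]:
    """Return a list of problems found; an empty list means the basic checks passed.

    Hypothetical helper: it assumes the synthetic frame should mirror the
    input's column names and contain exactly `n_samples` rows.
    """
    problems = []
    if len(synthetic) != n_samples:
        problems.append(f"expected {n_samples} rows, got {len(synthetic)}")
    # Compare column names in both directions.
    missing = [c for c in real.columns if c not in synthetic.columns]
    extra = [c for c in synthetic.columns if c not in real.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
    if extra:
        problems.append(f"unexpected columns: {extra}")
    return problems
```

For example, `problems = check_synthetic(df, synthetic, n_samples=1000)` followed by `assert not problems, problems`.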
Step-by-step (large datasets, custom config)
For more control over training and generation:
import dataxid
import pandas as pd
dataxid.api_key = "dx_test_..."
df = pd.read_csv("transactions.csv")
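# Train a model once with a custom configuration
model = dataxid.Model.create(
    data=df,
    config=dataxid.ModelConfig(
        model_size="large",
        max_epochs=200,
    ),
)
# Generate as many samples as you need from the same trained model
synthetic_1k = model.generate(n_samples=1000)
synthetic_10k = model.generate(n_samples=10000)
# Delete the model once you no longer need it
model.delete()
A plain dict also works for quick experiments: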
model = dataxid.Model.create(
    data=df,
    config={"model_size": "large", "max_epochs": 200},
)
Next Steps
- API Reference — Full endpoint documentation
- Error Codes — Troubleshoot API errors