DataXID

Quickstart

Generate your first synthetic dataset in 5 minutes.

1. Install the SDK

pip install dataxid

Raw data never leaves your machine. The SDK encodes it locally, and only the resulting embeddings are sent to the API.

2. Set Your API Key

import dataxid

dataxid.api_key = "dx_test_..."

Or set the environment variable (useful for CI/CD):

export DATAXID_API_KEY="dx_test_..."
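In CI it can help to fail early with a clear message when the key is missing. The following is a minimal sketch using only the standard library. `resolve_api_key` is a hypothetical helper, not part of the SDK, and the sketch assumes nothing about how the SDK itself reads `DATAXID_API_KEY`.

```python
import os


def resolve_api_key(explicit=None, env=None):
    """Return the explicit key if given, else DATAXID_API_KEY from the environment.

    Hypothetical helper for illustration; not part of the dataxid SDK.
    """
    env = os.environ if env is None else env
    key = explicit or env.get("DATAXID_API_KEY")
    if not key:
        raise RuntimeError(
            "No API key found: pass one explicitly or set DATAXID_API_KEY"
        )
    return key


# Usage (commented out so the sketch runs without the SDK installed):
# import dataxid
# dataxid.api_key = resolve_api_key()
```

An explicit argument takes precedence over the environment, which keeps local experiments and CI runs on the same code path.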

API Keys

Sign up at app.dataxid.com to get your API key.

3. Generate Synthetic Data

One-liner (small datasets)

import dataxid
import pandas as pd

dataxid.api_key = "dx_test_..."
df = pd.read_csv("customers.csv")

synthetic = dataxid.synthesize(data=df, n_samples=1000)
print(synthetic.head())

That's it. Behind the scenes:

  1. SDK processes your data locally → abstract embeddings
  2. Embeddings are sent to the API → model trains on the statistical structure
  3. Synthetic data is generated and returned as a DataFrame

Your raw data never crosses the wire.

Step-by-step (large datasets, custom config)

For more control over training and generation:

import dataxid
import pandas as pd

dataxid.api_key = "dx_test_..."
df = pd.read_csv("transactions.csv")

model = dataxid.Model.create(
    data=df,
    config=dataxid.ModelConfig(
        model_size="large",
        max_epochs=200,
    ),
)

# Train once, then generate as many samples as needed from the same model.
synthetic_1k = model.generate(n_samples=1000)
synthetic_10k = model.generate(n_samples=10000)

model.delete()
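If generation raises an exception, the `model.delete()` call above is never reached. One way to guarantee cleanup is a small context manager, sketched below. `deleting` is a hypothetical helper that works with any object exposing a `delete()` method; whether `dataxid.Model` supports the `with` statement natively is not covered here. `FakeModel` is a stand-in so the sketch runs without the SDK.

```python
from contextlib import contextmanager


@contextmanager
def deleting(model):
    """Yield the model, then call model.delete() even if the body raises."""
    try:
        yield model
    finally:
        model.delete()


class FakeModel:
    """Stand-in for a trained model (illustration only, not the SDK class)."""

    def __init__(self):
        self.deleted = False

    def generate(self, n_samples):
        return [{"row": i} for i in range(n_samples)]

    def delete(self):
        self.deleted = True


fake = FakeModel()
with deleting(fake) as m:
    rows = m.generate(n_samples=3)
# At this point fake.delete() has been called.
```

With the real SDK, the same pattern would wrap the object returned by `dataxid.Model.create(...)`.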

A plain dict also works for quick experiments:

model = dataxid.Model.create(
    data=df,
    config={"model_size": "large", "max_epochs": 200},
)
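Because the config is an ordinary dict, variations are easy to build programmatically. The sketch below creates one config per `max_epochs` value. `epoch_sweep` is a hypothetical helper, the epoch values are arbitrary examples, and only the two keys shown on this page are used.

```python
def epoch_sweep(base, epochs):
    """Return one config dict per max_epochs value, leaving base unchanged."""
    return [{**base, "max_epochs": e} for e in epochs]


base_config = {"model_size": "large", "max_epochs": 200}
configs = epoch_sweep(base_config, [50, 100, 200])

for cfg in configs:
    print(cfg)
    # With the SDK installed, each config could be passed as shown above:
    # model = dataxid.Model.create(data=df, config=cfg)
```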