API Reference

Base URL: https://api.dataxid.com

Authentication

All /v1/* endpoints require a Bearer token:

Authorization: Bearer dx_test_...

Key prefix	Environment
`dx_test_`	Sandbox (free, no billing)
`dx_live_`	Production (metered)

Requests without a valid key receive 401 with error code api_key_missing or api_key_invalid.
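As a minimal sketch of the prefix convention above, the helper below infers the environment from a key and builds the `Authorization` header. The function names `key_environment` and `auth_headers` are hypothetical and not part of the SDK. The prefix check is only a client-side sanity check; the server remains the authority on whether a key is valid.

```python
from __future__ import annotations


def key_environment(api_key: str) -> str:
    """Return the environment implied by the key prefix (hypothetical helper)."""
    if api_key.startswith("dx_test_"):
        return "sandbox"
    if api_key.startswith("dx_live_"):
        return "production"
    raise ValueError("unrecognized API key prefix")


def auth_headers(api_key: str) -> dict[str, str]:
    """Build the Authorization header expected by /v1/* endpoints."""
    key_environment(api_key)  # fail early on an obviously malformed key
    return {"Authorization": f"Bearer {api_key}"}
```

Checking the prefix before sending a request can catch a production key used in a test suite, or the reverse, before any metered call is made.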

Response Envelope

Every successful response follows the same structure:

{
  "object": "<type>",
  "data": { ... },
  "metadata": {
    "timestamp": "2026-02-20T10:30:00Z"
  }
}

List endpoints include pagination metadata:

{
  "object": "list",
  "data": [ ... ],
  "metadata": {
    "timestamp": "2026-02-20T10:30:00Z",
    "has_more": true
  }
}

Error responses use a separate envelope — see Error Codes.
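The two success shapes above can be handled by one small function. This is a sketch that assumes the envelope has already been parsed from JSON into a dict; `unwrap` is a hypothetical name, and error envelopes are deliberately not handled because their shape is documented elsewhere.

```python
from __future__ import annotations

from typing import Any


def unwrap(envelope: dict[str, Any]) -> tuple[Any, bool]:
    """Return (data, has_more) from a success envelope.

    has_more is False for single-object responses, which carry no
    pagination metadata.
    """
    if "object" not in envelope or "data" not in envelope:
        raise ValueError("not a success envelope")
    has_more = bool(envelope.get("metadata", {}).get("has_more", False))
    return envelope["data"], has_more
```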

Common Headers

Request Headers

Header	Required	Description
`Authorization`	Yes	`Bearer <api_key>`
`Content-Type`	Yes (POST)	`application/json`
`Idempotency-Key`	Recommended (POST)	Unique key for safe retries

Response Headers

Header	Description
`X-Request-Id`	Unique request identifier (include in support requests)
`X-RateLimit-Limit`	Requests allowed per window
`X-RateLimit-Remaining`	Requests remaining in current window
`X-RateLimit-Reset`	Unix epoch when the window resets
`Retry-After`	Seconds to wait (only on `429`)

Rate Limiting

Requests are rate-limited per organization using a sliding window counter. Default: 60 requests per minute.

When the limit is exceeded, the API returns 429 Too Many Requests with a Retry-After header. See rate_limit_exceeded.
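One way a client might decide how long to back off after a `429` is sketched below. It prefers `Retry-After` and falls back to `X-RateLimit-Reset`. The function name, the one-second fallback, and the plain-dict header lookup are assumptions for illustration; real HTTP clients usually expose case-insensitive header mappings.

```python
from __future__ import annotations

import time
from typing import Mapping, Optional


def seconds_until_retry(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Estimate how long to wait after a 429 response (hypothetical helper)."""
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        # Documented as a number of seconds on 429 responses.
        return max(0.0, float(retry_after))
    reset = headers.get("X-RateLimit-Reset")
    if reset is not None:
        # Documented as the Unix epoch at which the window resets.
        current = time.time() if now is None else now
        return max(0.0, float(reset) - current)
    return 1.0  # assumed fallback when neither header is present
```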

Idempotency

All POST endpoints support the Idempotency-Key header. Repeating a request with the same key returns the cached response from the first attempt without re-executing it, which makes retries safe after network failures.

Keys expire after 24 hours. Concurrent requests with the same key receive 409 Conflict.
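The retry pattern this enables can be sketched as follows: generate one key per logical request and reuse it on every attempt. The `send` callable is a hypothetical stand-in for whatever performs the HTTP POST (it would also add the `Authorization` header), and treating `ConnectionError` as the retryable failure is an assumption.

```python
from __future__ import annotations

import uuid
from typing import Any, Callable, Optional


def post_with_retries(
    send: Callable[[dict[str, str], dict[str, Any]], Any],
    body: dict[str, Any],
    max_attempts: int = 3,
) -> Any:
    """Call send(headers, body), reusing one Idempotency-Key across retries."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    headers = {
        "Content-Type": "application/json",
        # Generated once, so every retry is recognized as the same request.
        "Idempotency-Key": str(uuid.uuid4()),
    }
    last_error: Optional[ConnectionError] = None
    for _ in range(max_attempts):
        try:
            return send(headers, body)
        except ConnectionError as exc:
            last_error = exc  # network failure: retry with the same key
    assert last_error is not None
    raise last_error
```

Generating the key outside the loop is the important design choice. A fresh key per attempt would make each retry look like a new request and defeat the cache.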

Endpoints

The training flow (model creation, training, and generation) requires the Python SDK. The SDK handles data encoding, metadata extraction, and training automatically. Endpoints like Get Model, List Models, and Delete Model can be called directly via HTTP.

For SDK usage guides, see Quickstart, Multi-Table Synthesis, and Configuration.

Create Model

Registers a model for training. The SDK handles metadata extraction and the full training flow automatically.

POST /v1/models

Request Body

Field	Type	Required	Description
`metadata`	object	Yes	Generated by SDK — column statistics for model training
`metadata.cardinalities`	`object`	Yes	Vocabulary sizes per feature
`metadata.features`	`string[]`	Yes	Column names (min 1)
`metadata.column_stats`	`object`	No	Column statistics per feature
`metadata.value_mappings`	`object`	No	Value-to-code mappings per feature
`metadata.empirical_probs`	`object`	Yes	Probability distributions per feature
`config`	object	No	Training configuration (defaults below)
`config.model_size`	`string`	No	`"small"`, `"medium"`, or `"large"`. Default: `"medium"`. Also accepts `"S"`, `"M"`, `"L"`
`config.embedding_dim`	`integer`	No	Embedding dimension. Default: `64`
`config.batch_size`	`integer`	No	Training batch size. Default: `256`
`config.learning_rate`	`float`	No	Initial learning rate. Default: `0.001`
`config.max_epochs`	`integer`	No	Maximum training epochs. Default: `100`
`config.max_training_time`	`float`	No	Time limit in seconds. Default: `14400` (4h)
`config.early_stop_patience`	`integer`	No	Epochs without improvement before stopping. Default: `4`
`config.privacy_enabled`	`boolean`	No	Add noise to embeddings for privacy. Default: `false`
`config.privacy_noise`	`float`	No	Noise scale (Gaussian std). Default: `0.1`

import dataxid
import pandas as pd

dataxid.api_key = "dx_test_..."
df = pd.read_csv("data.csv")

model = dataxid.Model.create(data=df)
print(model.id)       # "mdl_a1b2c3d4e5f6"
print(model.status)   # "training" → "ready"
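To make the `config` defaults from the request-body table concrete, the sketch below merges user overrides onto them and normalizes the short size aliases. `build_config` is a hypothetical helper, not an SDK function, and the mapping of `"S"`, `"M"`, `"L"` to `"small"`, `"medium"`, `"large"` is an assumption based on the names. How the SDK accepts configuration is covered in the Configuration guide.

```python
from __future__ import annotations

from typing import Any

# Defaults copied from the request-body table.
DEFAULT_CONFIG: dict[str, Any] = {
    "model_size": "medium",
    "embedding_dim": 64,
    "batch_size": 256,
    "learning_rate": 0.001,
    "max_epochs": 100,
    "max_training_time": 14400.0,
    "early_stop_patience": 4,
    "privacy_enabled": False,
    "privacy_noise": 0.1,
}

# Assumed correspondence between the short aliases and the long names.
SIZE_ALIASES = {"S": "small", "M": "medium", "L": "large"}


def build_config(**overrides: Any) -> dict[str, Any]:
    """Merge overrides onto the documented defaults (hypothetical helper)."""
    unknown = set(overrides) - set(DEFAULT_CONFIG)
    if unknown:
        raise ValueError(f"unknown config fields: {sorted(unknown)}")
    config = {**DEFAULT_CONFIG, **overrides}
    size = SIZE_ALIASES.get(config["model_size"], config["model_size"])
    if size not in {"small", "medium", "large"}:
        raise ValueError(f"invalid model_size: {size!r}")
    config["model_size"] = size
    return config
```

Because every `config` field is optional, sending only the overrides should be equivalent; the merge here just shows the effective values.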

Response `201 Created`

{
  "object": "model",
  "data": {
    "id": "mdl_a1b2c3d4e5f6",
    "status": "training",
    "config": {
      "embedding_dim": 64,
      "model_size": "medium",
      "batch_size": 256,
      "learning_rate": 0.001,
      "max_epochs": 100,
      "max_training_time": 14400.0,
      "early_stop_patience": 4,
      "device": "cpu"
    },
    "current_epoch": 0,
    "train_loss": null,
    "val_loss": null,
    "created_at": "2026-02-20T10:30:00Z",
    "updated_at": "2026-02-20T10:30:00Z",
    "error": null
  },
  "metadata": {
    "timestamp": "2026-02-20T10:30:00Z"
  }
}

Errors

Code	Status	Cause
`parameter_invalid`	400	Invalid metadata or config
`api_key_missing`	401	No auth header
`api_key_invalid`	401	Invalid API key
`quota_exceeded`	402	Monthly usage quota exceeded
`rate_limit_exceeded`	429	Too many requests

Get Model

Retrieve a model's current status and training progress.

GET /v1/models/{model_id}

Path Parameters

Parameter	Type	Description
`model_id`	`string`	Model ID (e.g. `mdl_a1b2c3d4e5f6`)

curl https://api.dataxid.com/v1/models/mdl_a1b2c3d4e5f6 \
  -H "Authorization: Bearer dx_test_..."

status = model.refresh()
print(status["status"])       # "created" | "building" | "training" | "ready" | "failed"
print(status["current_epoch"])
print(status["val_loss"])

Response `200 OK`

{
  "object": "model",
  "data": {
    "id": "mdl_a1b2c3d4e5f6",
    "status": "ready",
    "config": { ... },
    "current_epoch": 42,
    "train_loss": 0.234,
    "val_loss": 0.256,
    "created_at": "2026-02-20T10:30:00Z",
    "updated_at": "2026-02-20T10:35:00Z",
    "error": null
  },
  "metadata": {
    "timestamp": "2026-02-20T10:35:00Z"
  }
}

Model Status Values

Status	Description
`created`	Model created, initializing
`building`	Model is being prepared
`training`	Model is training
`ready`	Training complete, model can generate
`failed`	Training failed — see `error` field
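Since `ready` and `failed` are the only statuses that end the lifecycle described above, a client can poll Get Model until one of them appears. The sketch below assumes a hypothetical `fetch` callable that returns the `data` object of a Get Model response; the poll interval and timeout defaults are illustrative choices, not documented values.

```python
from __future__ import annotations

import time
from typing import Any, Callable

TERMINAL_STATUSES = {"ready", "failed"}


def wait_until_done(
    fetch: Callable[[], dict[str, Any]],
    poll_interval: float = 5.0,
    timeout: float = 14400.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Poll fetch() until the model reaches a terminal status."""
    deadline = time.monotonic() + timeout
    while True:
        model = fetch()
        if model["status"] in TERMINAL_STATUSES:
            return model  # caller inspects "status" and, on failure, "error"
        if time.monotonic() >= deadline:
            raise TimeoutError("model did not reach a terminal status in time")
        sleep(poll_interval)
```

With the default rate limit of 60 requests per minute, a poll interval of a few seconds leaves headroom for other calls from the same organization.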

Errors

Code	Status	Cause
`model_not_found`	404	Model does not exist or belongs to another org

List Models

List the models in your organization, using cursor-based pagination.

GET /v1/models

Query Parameters

Parameter	Type	Default	Description
`limit`	`integer`	`20`	Max results per page (1–100)
`starting_after`	`string`	—	Cursor: model ID to start after

# First page
curl "https://api.dataxid.com/v1/models?limit=10" \
  -H "Authorization: Bearer dx_test_..."

# Next page
curl "https://api.dataxid.com/v1/models?limit=10&starting_after=mdl_a1b2c3d4e5f6" \
  -H "Authorization: Bearer dx_test_..."

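import httpx

resp = httpx.get(
    "https://api.dataxid.com/v1/models",
    params={"limit": 10},
    headers={"Authorization": "Bearer dx_test_..."},
)
resp.raise_for_status()  # surface 4xx/5xx responses instead of parsing an error body

data = resp.json()
models = data["data"]                    # list of model objects
has_more = data["metadata"]["has_more"]  # True if another page exists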

Response `200 OK`

{
  "object": "list",
  "data": [
    {
      "id": "mdl_a1b2c3d4e5f6",
      "status": "ready",
      "config": { ... },
      "current_epoch": 42,
      "train_loss": 0.234,
      "val_loss": 0.256,
      "created_at": "2026-02-20T10:30:00Z",
      "updated_at": "2026-02-20T10:35:00Z",
      "error": null
    }
  ],
  "metadata": {
    "timestamp": "2026-02-20T10:35:00Z",
    "has_more": false
  }
}

Results are ordered by creation time (newest first). Use the last item's id as starting_after to fetch the next page.
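The cursor rule above can be sketched as a generator that walks every page. `fetch_page` is a hypothetical callable that takes the current `starting_after` value (or `None` for the first page) and returns the parsed response envelope; the HTTP call itself is left to the caller.

```python
from __future__ import annotations

from typing import Any, Callable, Iterator, Optional


def iter_models(
    fetch_page: Callable[[Optional[str]], dict[str, Any]],
) -> Iterator[dict[str, Any]]:
    """Yield every model across pages, newest first."""
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)
        items = page["data"]
        yield from items
        # Stop when the server reports no more pages, or on an empty page,
        # which would otherwise leave the cursor unchanged.
        if not page["metadata"]["has_more"] or not items:
            return
        cursor = items[-1]["id"]  # last item's id becomes starting_after
```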

Generate Synthetic Data

Generate synthetic data from a trained model. The SDK handles embedding generation and response decoding automatically.

POST /v1/models/{model_id}/generate

Path Parameters

Parameter	Type	Description
`model_id`	`string`	Model ID (must be in `training` or `ready` status)

Request Body

Field	Type	Required	Description
`embedding`	`array` or `object`	Yes	Generated by SDK — abstract representation of seed data
`temperature`	`float`	No	Sampling temperature. Default: `1.0`
`top_p`	`float`	No	Nucleus sampling threshold (0–1). Default: `null`

synthetic = model.generate(n_samples=1000)
print(synthetic.head())
#    age  income
# 0   34   52000
# 1   28   41000
# 2   45   67000

The SDK decodes the API response into a pandas DataFrame automatically.

Errors

Code	Status	Cause
`model_not_found`	404	Model does not exist
`model_not_ready`	400	Model in `created`, `building`, or `failed` state
`parameter_invalid`	400	Invalid request parameters
`quota_exceeded`	402	Monthly usage quota exceeded
`training_service_unavailable`	503	Training infrastructure temporarily unavailable
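A client may want to tell transient failures from ones that need a change to the request. The sketch below collects the error codes and statuses listed in this reference and treats `429` and `503` as retryable. That classification is an assumption based on the descriptions ("too many requests", "temporarily unavailable"), not a documented guarantee, and `should_retry` is a hypothetical name.

```python
from __future__ import annotations

# Error codes and HTTP statuses as listed in the Errors tables of this reference.
ERROR_STATUS: dict[str, int] = {
    "parameter_invalid": 400,
    "model_not_ready": 400,
    "api_key_missing": 401,
    "api_key_invalid": 401,
    "quota_exceeded": 402,
    "model_not_found": 404,
    "rate_limit_exceeded": 429,
    "training_service_unavailable": 503,
}

# Assumed to be transient; everything else needs a changed request or account.
RETRYABLE_STATUSES = {429, 503}


def should_retry(error_code: str) -> bool:
    """Return True if the documented error code looks transient."""
    return ERROR_STATUS.get(error_code) in RETRYABLE_STATUSES
```

A `model_not_ready` error for a model still in `created` or `building` may also clear up later, but that calls for waiting on the model's status rather than resending the same request immediately.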

Delete Model

Delete a model and free server resources.

DELETE /v1/models/{model_id}

Path Parameters

Parameter	Type	Description
`model_id`	`string`	Model ID

curl -X DELETE https://api.dataxid.com/v1/models/mdl_a1b2c3d4e5f6 \
  -H "Authorization: Bearer dx_test_..."

model.delete()

Response `204 No Content`

Empty response body.

Errors

Code	Status	Cause
`model_not_found`	404	Model does not exist or belongs to another org
