API Reference
Full endpoint documentation for the DataXID API.
Base URL: https://api.dataxid.com
Authentication
All /v1/* endpoints require a Bearer token:
Authorization: Bearer dx_test_...
| Key prefix | Environment |
|---|---|
| dx_test_ | Sandbox (free, no billing) |
| dx_live_ | Production (metered) |
Requests without a valid key receive 401 with error code
api_key_missing or api_key_invalid.
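A minimal sketch of building the header on the client side, assuming you want to catch an unrecognized key prefix before sending a request. The helper names `environment_for_key` and `auth_headers` are illustrative and not part of the SDK.

```python
def environment_for_key(api_key: str) -> str:
    """Return the environment implied by the documented key prefixes."""
    if api_key.startswith("dx_test_"):
        return "sandbox"
    if api_key.startswith("dx_live_"):
        return "production"
    raise ValueError("API key must start with dx_test_ or dx_live_")


def auth_headers(api_key: str) -> dict:
    """Build the Authorization header expected by /v1/* endpoints."""
    environment_for_key(api_key)  # fail fast on an unrecognized prefix
    return {"Authorization": f"Bearer {api_key}"}
```

A local prefix check only catches typos; the server still decides whether a key is valid and answers 401 otherwise.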
Response Envelope
Every successful response follows the same structure:
{
"object": "<type>",
"data": { ... },
"metadata": {
"timestamp": "2026-02-20T10:30:00Z"
}
}
List endpoints include pagination metadata:
{
"object": "list",
"data": [ ... ],
"metadata": {
"timestamp": "2026-02-20T10:30:00Z",
"has_more": true
}
}
Error responses use a separate envelope; see Error Codes.
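As an illustration of working with the success envelope, the sketch below splits a decoded JSON body into its parts. The function name `unwrap` is hypothetical, and the code assumes that `has_more` appears only on list responses, as in the examples above.

```python
from typing import Any, Tuple


def unwrap(envelope: dict) -> Tuple[str, Any, bool]:
    """Split a success envelope into (object type, data, has_more)."""
    metadata = envelope.get("metadata", {})
    # has_more is shown only for list responses; treat it as False elsewhere.
    has_more = bool(metadata.get("has_more", False))
    return envelope["object"], envelope["data"], has_more


single = {
    "object": "model",
    "data": {"id": "mdl_a1b2c3d4e5f6"},
    "metadata": {"timestamp": "2026-02-20T10:30:00Z"},
}
listing = {
    "object": "list",
    "data": [{"id": "mdl_a1b2c3d4e5f6"}],
    "metadata": {"timestamp": "2026-02-20T10:30:00Z", "has_more": True},
}

kind, data, has_more = unwrap(listing)
```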
Common Headers
Request Headers
| Header | Required | Description |
|---|---|---|
| Authorization | Yes | Bearer <api_key> |
| Content-Type | Yes (POST) | application/json |
| Idempotency-Key | Recommended (POST) | Unique key for safe retries |
Response Headers
| Header | Description |
|---|---|
| X-Request-Id | Unique request identifier (include in support requests) |
| X-RateLimit-Limit | Requests allowed per window |
| X-RateLimit-Remaining | Requests remaining in current window |
| X-RateLimit-Reset | Unix epoch when the window resets |
| Retry-After | Seconds to wait (only on 429) |
Rate Limiting
Requests are rate-limited per organization using a sliding window counter. Default: 60 requests per minute.
When the limit is exceeded, the API returns 429 Too Many Requests with a
Retry-After header. See rate_limit_exceeded.
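One way a client might turn these headers into a back-off delay is sketched below. The function name `seconds_to_wait` and the one-second fallback are assumptions for illustration; only the header names and their meanings come from the tables above.

```python
import time
from typing import Mapping, Optional


def seconds_to_wait(
    status_code: int,
    headers: Mapping[str, str],
    now: Optional[float] = None,
) -> float:
    """Return how long to pause before retrying; 0.0 if not rate-limited."""
    if status_code != 429:
        return 0.0
    # Prefer Retry-After, which is documented as a number of seconds.
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        return max(0.0, float(retry_after))
    # Otherwise fall back to the window reset time (Unix epoch seconds).
    reset = headers.get("X-RateLimit-Reset")
    if reset is not None:
        current = time.time() if now is None else now
        return max(0.0, float(reset) - current)
    return 1.0  # assumed default when neither header is present
```

Header lookups here are case-sensitive because the example uses a plain mapping; HTTP client libraries usually expose a case-insensitive headers object, which works with the same `.get` calls.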
Idempotency
All POST endpoints support the Idempotency-Key header. Resending a request with
a key that has already been used returns the cached response without re-executing
the request, which makes retries safe after network failures.
Keys expire after 24 hours. Concurrent requests with the same key receive 409 Conflict.
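The retry pattern this enables can be sketched as follows: generate one key per logical request and reuse it on every attempt. The `send` callable is a hypothetical stand-in for whatever performs the POST, and `ConnectionError` is an assumed placeholder for the network exception type of your HTTP client.

```python
import uuid
from typing import Callable


def post_with_retries(send: Callable[[dict], int], max_attempts: int = 3) -> int:
    """Call send(headers) until it succeeds, reusing one Idempotency-Key.

    send performs the POST and returns the HTTP status code, raising
    ConnectionError on a network failure (assumed for this sketch).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    # One key for the whole logical request, not one per attempt.
    headers = {"Idempotency-Key": str(uuid.uuid4())}
    last_error = None
    for _ in range(max_attempts):
        try:
            return send(headers)
        except ConnectionError as exc:
            last_error = exc  # safe to retry: the same key is sent again
    raise last_error
```

Generating a fresh key per attempt would defeat the purpose, since the server would treat each retry as a new request.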
Endpoints
SDK-first API
The training flow (model creation, training, and generation) requires the Python SDK. The SDK handles data encoding, metadata extraction, and training automatically. Endpoints like Get Model, List Models, and Delete Model can be called directly via HTTP.
Create Model
Registers a model for training. The SDK handles metadata extraction and the full training flow automatically.
POST /v1/models
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| metadata | object | Yes | Generated by the SDK: column statistics for model training |
| metadata.cardinalities | object | Yes | Vocabulary sizes per feature |
| metadata.features | string[] | Yes | Column names (min 1) |
| metadata.column_stats | object | No | Column statistics per feature |
| metadata.value_mappings | object | No | Value-to-code mappings per feature |
| metadata.empirical_probs | object | Yes | Probability distributions per feature |
| config | object | No | Training configuration (defaults below) |
| config.model_size | string | No | "small", "medium", or "large". Default: "medium". Also accepts "S", "M", "L" |
| config.embedding_dim | integer | No | Embedding dimension. Default: 64 |
| config.batch_size | integer | No | Training batch size. Default: 256 |
| config.learning_rate | float | No | Initial learning rate. Default: 0.001 |
| config.max_epochs | integer | No | Maximum training epochs. Default: 100 |
| config.max_training_time | float | No | Time limit in seconds. Default: 14400 (4h) |
| config.early_stop_patience | integer | No | Epochs without improvement before stopping. Default: 4 |
| config.privacy_enabled | boolean | No | Add noise to embeddings for privacy. Default: false |
| config.privacy_noise | float | No | Noise scale (Gaussian std). Default: 0.1 |
import dataxid
import pandas as pd
dataxid.api_key = "dx_test_..."
df = pd.read_csv("data.csv")
model = dataxid.Model.create(data=df)
print(model.id) # "mdl_a1b2c3d4e5f6"
print(model.status) # "training" → "ready"
Response 201 Created
{
"object": "model",
"data": {
"id": "mdl_a1b2c3d4e5f6",
"status": "training",
"config": {
"embedding_dim": 64,
"model_size": "medium",
"batch_size": 256,
"learning_rate": 0.001,
"max_epochs": 100,
"max_training_time": 14400.0,
"early_stop_patience": 4,
"device": "cpu"
},
"current_epoch": 0,
"train_loss": null,
"val_loss": null,
"created_at": "2026-02-20T10:30:00Z",
"updated_at": "2026-02-20T10:30:00Z",
"error": null
},
"metadata": {
"timestamp": "2026-02-20T10:30:00Z"
}
}
Errors
| Code | Status | Cause |
|---|---|---|
| parameter_invalid | 400 | Invalid metadata or config |
| api_key_missing | 401 | No auth header |
| api_key_invalid | 401 | Invalid API key |
| quota_exceeded | 402 | Monthly usage quota exceeded |
| rate_limit_exceeded | 429 | Too many requests |
Get Model
Retrieve a model's current status and training progress.
GET /v1/models/{model_id}
Path Parameters
| Parameter | Type | Description |
|---|---|---|
| model_id | string | Model ID (e.g. mdl_a1b2c3d4e5f6) |
curl https://api.dataxid.com/v1/models/mdl_a1b2c3d4e5f6 \
-H "Authorization: Bearer dx_test_..."
status = model.refresh()
print(status["status"]) # "created" | "building" | "training" | "ready" | "failed"
print(status["current_epoch"])
print(status["val_loss"])
Response 200 OK
{
"object": "model",
"data": {
"id": "mdl_a1b2c3d4e5f6",
"status": "ready",
"config": { ... },
"current_epoch": 42,
"train_loss": 0.234,
"val_loss": 0.256,
"created_at": "2026-02-20T10:30:00Z",
"updated_at": "2026-02-20T10:35:00Z",
"error": null
},
"metadata": {
"timestamp": "2026-02-20T10:35:00Z"
}
}
Model Status Values
| Status | Description |
|---|---|
| created | Model created, initializing |
| building | Model is being prepared |
| training | Model is training |
| ready | Training complete, model can generate |
| failed | Training failed; see the error field |
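Because `ready` and `failed` are the only terminal states in this table, a client can poll until one of them appears. The sketch below assumes a caller-supplied `fetch_status` callable that returns the model's `data` object as a dict; the function name, polling interval, and timeout default are illustrative choices, not SDK behavior.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"ready", "failed"}


def wait_until_done(
    fetch_status: Callable[[], dict],
    interval: float = 5.0,
    timeout: float = 14400.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll fetch_status() until the model is ready or failed."""
    deadline = time.monotonic() + timeout
    while True:
        data = fetch_status()
        if data["status"] in TERMINAL_STATUSES:
            return data
        if time.monotonic() >= deadline:
            raise TimeoutError(f"model still {data['status']!r} after {timeout}s")
        sleep(interval)  # created, building, and training are all in-progress
```

With the SDK, `wait_until_done(model.refresh)` would fit this shape if `refresh()` returns the status dict as in the example above. Keep the interval well above one second so polling stays inside the rate limit.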
Errors
| Code | Status | Cause |
|---|---|---|
| model_not_found | 404 | Model does not exist or belongs to another org |
List Models
List the models in your organization. Results use cursor-based pagination.
GET /v1/models
Query Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| limit | integer | 20 | Max results per page (1–100) |
| starting_after | string | none | Cursor: model ID to start after |
# First page
curl "https://api.dataxid.com/v1/models?limit=10" \
-H "Authorization: Bearer dx_test_..."
# Next page
curl "https://api.dataxid.com/v1/models?limit=10&starting_after=mdl_a1b2c3d4e5f6" \
-H "Authorization: Bearer dx_test_..."
import httpx
resp = httpx.get(
"https://api.dataxid.com/v1/models",
params={"limit": 10},
headers={"Authorization": "Bearer dx_test_..."},
)
data = resp.json()
models = data["data"]
has_more = data["metadata"]["has_more"]
Response 200 OK
{
"object": "list",
"data": [
{
"id": "mdl_a1b2c3d4e5f6",
"status": "ready",
"config": { ... },
"current_epoch": 42,
"train_loss": 0.234,
"val_loss": 0.256,
"created_at": "2026-02-20T10:30:00Z",
"updated_at": "2026-02-20T10:35:00Z",
"error": null
}
],
"metadata": {
"timestamp": "2026-02-20T10:35:00Z",
"has_more": false
}
}
Results are ordered by creation time (newest first). Use the last item's id as starting_after to fetch the next page.
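The cursor loop described here can be sketched as a generator. `fetch_page` is a hypothetical callable that takes the current cursor (or `None` for the first page) and returns the decoded list envelope; it is injected so the loop stays independent of any particular HTTP client.

```python
from typing import Callable, Iterator, Optional


def iter_models(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every model across pages, following starting_after cursors."""
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)
        items = page["data"]
        yield from items
        # Stop when the API reports no further pages, or on an empty page
        # (a defensive guard so a missing last item cannot cause a loop).
        if not page["metadata"].get("has_more") or not items:
            return
        cursor = items[-1]["id"]  # last item's id becomes starting_after
```

In practice `fetch_page` would wrap the GET request shown earlier, adding `starting_after` to the query parameters only when the cursor is not `None`.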
Generate Synthetic Data
Generate synthetic data from a trained model. The SDK handles embedding generation and response decoding automatically.
POST /v1/models/{model_id}/generate
Path Parameters
| Parameter | Type | Description |
|---|---|---|
| model_id | string | Model ID (must be in training or ready status) |
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| embedding | array or object | Yes | Generated by the SDK: abstract representation of seed data |
| temperature | float | No | Sampling temperature. Default: 1.0 |
| top_p | float | No | Nucleus sampling threshold (0–1). Default: null |
synthetic = model.generate(n_samples=1000)
print(synthetic.head())
# age income
# 0 34 52000
# 1 28 41000
# 2 45 67000
The SDK decodes the API response into a pandas DataFrame automatically.
Errors
| Code | Status | Cause |
|---|---|---|
| model_not_found | 404 | Model does not exist |
| model_not_ready | 400 | Model in created, building, or failed state |
| parameter_invalid | 400 | Invalid request parameters |
| quota_exceeded | 402 | Monthly usage quota exceeded |
| training_service_unavailable | 503 | Training infrastructure temporarily unavailable |
Delete Model
Delete a model and free server resources.
DELETE /v1/models/{model_id}
Path Parameters
| Parameter | Type | Description |
|---|---|---|
| model_id | string | Model ID |
curl -X DELETE https://api.dataxid.com/v1/models/mdl_a1b2c3d4e5f6 \
-H "Authorization: Bearer dx_test_..."
model.delete()
Response 204 No Content
Empty response body.
Errors
| Code | Status | Cause |
|---|---|---|
| model_not_found | 404 | Model does not exist or belongs to another org |