DataXID

API Reference

Full endpoint documentation for the DataXID API.

Base URL: https://api.dataxid.com


Authentication

All /v1/* endpoints require a Bearer token:

Authorization: Bearer dx_test_...

| Key prefix | Environment |
| --- | --- |
| dx_test_ | Sandbox (free, no billing) |
| dx_live_ | Production (metered) |

Requests without a valid key receive 401 with error code api_key_missing or api_key_invalid.
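
As a rough illustration of the key-prefix convention above, a client could reject an obviously malformed key before sending anything. The helper names `environment_for_key` and `auth_headers` are hypothetical and not part of the DataXID SDK; this is a sketch, not the SDK's own validation.

```python
def environment_for_key(api_key: str) -> str:
    """Map a key prefix to its environment, following the table above."""
    if api_key.startswith("dx_test_"):
        return "sandbox"
    if api_key.startswith("dx_live_"):
        return "production"
    raise ValueError("API key must start with dx_test_ or dx_live_")


def auth_headers(api_key: str) -> dict[str, str]:
    """Build the Authorization header expected by /v1/* endpoints."""
    environment_for_key(api_key)  # fail early on an unrecognised prefix
    return {"Authorization": f"Bearer {api_key}"}
```

A local prefix check only catches typos; the server remains the authority and still answers 401 for a well-formed but invalid key.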


Response Envelope

Every successful response follows the same structure:

{
  "object": "<type>",
  "data": { ... },
  "metadata": {
    "timestamp": "2026-02-20T10:30:00Z"
  }
}

List endpoints include pagination metadata:

{
  "object": "list",
  "data": [ ... ],
  "metadata": {
    "timestamp": "2026-02-20T10:30:00Z",
    "has_more": true
  }
}

Error responses use a separate envelope — see Error Codes.
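
Because every successful response shares this shape, client code can unwrap it in one place. The sketch below assumes the payload has already been parsed from JSON; the function name `unwrap` is illustrative only.

```python
from typing import Any


def unwrap(payload: dict[str, Any]) -> tuple[Any, dict[str, Any]]:
    """Split a success envelope into (data, metadata)."""
    for key in ("object", "data", "metadata"):
        if key not in payload:
            raise ValueError(f"not a success envelope: missing {key!r}")
    return payload["data"], payload["metadata"]


# Sample list envelope, shaped like the one shown above.
envelope = {
    "object": "list",
    "data": [{"id": "mdl_a1b2c3d4e5f6"}],
    "metadata": {"timestamp": "2026-02-20T10:30:00Z", "has_more": True},
}
data, metadata = unwrap(envelope)
# has_more is only present on list responses, so read it with a default.
print(len(data), metadata.get("has_more", False))  # 1 True
```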


Common Headers

Request Headers

| Header | Required | Description |
| --- | --- | --- |
| Authorization | Yes | Bearer <api_key> |
| Content-Type | Yes (POST) | application/json |
| Idempotency-Key | Recommended (POST) | Unique key for safe retries |

Response Headers

| Header | Description |
| --- | --- |
| X-Request-Id | Unique request identifier (include in support requests) |
| X-RateLimit-Limit | Requests allowed per window |
| X-RateLimit-Remaining | Requests remaining in current window |
| X-RateLimit-Reset | Unix epoch when the window resets |
| Retry-After | Seconds to wait (only on 429) |

Rate Limiting

Requests are rate-limited per organization using a sliding window counter. Default: 60 requests per minute.

When the limit is exceeded, the API returns 429 Too Many Requests with a Retry-After header. See rate_limit_exceeded.
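
One way a client might honour these headers is to compute a wait time from the response before retrying. This is a minimal sketch: `retry_delay` is a hypothetical helper, and both the fallback to X-RateLimit-Reset and the one-second default are assumptions rather than documented behaviour.

```python
import time
from typing import Mapping, Optional


def retry_delay(status_code: int, headers: Mapping[str, str],
                now: Optional[float] = None) -> Optional[float]:
    """Return seconds to wait before retrying, or None if not rate-limited."""
    if status_code != 429:
        return None
    if "Retry-After" in headers:
        return max(0.0, float(headers["Retry-After"]))
    # Fallback (assumption): derive the wait from the window reset time.
    if "X-RateLimit-Reset" in headers:
        now = time.time() if now is None else now
        return max(0.0, float(headers["X-RateLimit-Reset"]) - now)
    return 1.0  # arbitrary default when the response carries no hint
```

The lookup is case-sensitive when given a plain dict. HTTP clients such as httpx expose case-insensitive header mappings, which can be passed in directly.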


Idempotency

All POST endpoints support the Idempotency-Key header. Resending a request with the same key returns the cached response from the first attempt without re-executing it, which makes retries safe after network failures.

Keys expire after 24 hours. Concurrent requests with the same key receive 409 Conflict.
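
The important detail is that the key is generated once per logical operation and reused on every retry. The sketch below illustrates that pattern; `send` is a hypothetical callable that performs the POST and returns the HTTP status code, so the example stays independent of any particular HTTP client.

```python
import uuid
from typing import Callable, Mapping


def post_with_retries(send: Callable[[Mapping[str, str]], int],
                      max_attempts: int = 3) -> int:
    """Call send(headers) until it stops raising ConnectionError."""
    # One key per logical operation, created before the first attempt.
    headers = {
        "Content-Type": "application/json",
        "Idempotency-Key": str(uuid.uuid4()),
    }
    for attempt in range(1, max_attempts + 1):
        try:
            return send(headers)  # same headers, hence same key, every time
        except ConnectionError:
            if attempt == max_attempts:
                raise
    raise ValueError("max_attempts must be at least 1")
```

Generating a fresh key inside the loop would defeat the purpose: each retry would look like a new request and could be executed again.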


Endpoints

SDK-first API

The training flow (model creation, training, and generation) requires the Python SDK. The SDK handles data encoding, metadata extraction, and training automatically. Endpoints like Get Model, List Models, and Delete Model can be called directly via HTTP.

Create Model

Registers a model for training. The SDK handles metadata extraction and the full training flow automatically.

POST /v1/models

Request Body

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| metadata | object | Yes | Generated by the SDK: column statistics for model training |
| metadata.cardinalities | object | Yes | Vocabulary sizes per feature |
| metadata.features | string[] | Yes | Column names (min 1) |
| metadata.column_stats | object | No | Column statistics per feature |
| metadata.value_mappings | object | No | Value-to-code mappings per feature |
| metadata.empirical_probs | object | Yes | Probability distributions per feature |
| config | object | No | Training configuration (defaults below) |
| config.model_size | string | No | "small", "medium", or "large". Default: "medium". Also accepts "S", "M", "L" |
| config.embedding_dim | integer | No | Embedding dimension. Default: 64 |
| config.batch_size | integer | No | Training batch size. Default: 256 |
| config.learning_rate | float | No | Initial learning rate. Default: 0.001 |
| config.max_epochs | integer | No | Maximum training epochs. Default: 100 |
| config.max_training_time | float | No | Time limit in seconds. Default: 14400 (4h) |
| config.early_stop_patience | integer | No | Epochs without improvement before stopping. Default: 4 |
| config.privacy_enabled | boolean | No | Add noise to embeddings for privacy. Default: false |
| config.privacy_noise | float | No | Noise scale (Gaussian std). Default: 0.1 |

import dataxid
import pandas as pd

dataxid.api_key = "dx_test_..."
df = pd.read_csv("data.csv")

model = dataxid.Model.create(data=df)
print(model.id)       # "mdl_a1b2c3d4e5f6"
print(model.status)   # "training" → "ready"
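
To make the config defaults from the table concrete, the sketch below merges caller overrides into them and normalises the size shorthand. It only builds a plain dictionary: how the SDK accepts configuration is not described here, `build_config` is a hypothetical helper, and treating "S", "M", "L" as aliases for the full names is an assumption.

```python
# Defaults copied from the request body table above.
DEFAULT_CONFIG = {
    "model_size": "medium",
    "embedding_dim": 64,
    "batch_size": 256,
    "learning_rate": 0.001,
    "max_epochs": 100,
    "max_training_time": 14400.0,
    "early_stop_patience": 4,
    "privacy_enabled": False,
    "privacy_noise": 0.1,
}

# Assumption: the short forms are shorthand for the three full size names.
SIZE_ALIASES = {"S": "small", "M": "medium", "L": "large"}


def build_config(**overrides) -> dict:
    """Merge overrides into the documented defaults, rejecting unknown fields."""
    unknown = set(overrides) - set(DEFAULT_CONFIG)
    if unknown:
        raise ValueError(f"unknown config fields: {sorted(unknown)}")
    config = {**DEFAULT_CONFIG, **overrides}
    size = SIZE_ALIASES.get(config["model_size"], config["model_size"])
    if size not in {"small", "medium", "large"}:
        raise ValueError(f"invalid model_size: {size!r}")
    config["model_size"] = size
    return config
```

Rejecting unknown fields locally surfaces typos such as `max_epoch` before the request reaches the API and comes back as parameter_invalid.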

Response 201 Created

{
  "object": "model",
  "data": {
    "id": "mdl_a1b2c3d4e5f6",
    "status": "training",
    "config": {
      "embedding_dim": 64,
      "model_size": "medium",
      "batch_size": 256,
      "learning_rate": 0.001,
      "max_epochs": 100,
      "max_training_time": 14400.0,
      "early_stop_patience": 4,
      "device": "cpu"
    },
    "current_epoch": 0,
    "train_loss": null,
    "val_loss": null,
    "created_at": "2026-02-20T10:30:00Z",
    "updated_at": "2026-02-20T10:30:00Z",
    "error": null
  },
  "metadata": {
    "timestamp": "2026-02-20T10:30:00Z"
  }
}

Errors

| Code | Status | Cause |
| --- | --- | --- |
| parameter_invalid | 400 | Invalid metadata or config |
| api_key_missing | 401 | No auth header |
| api_key_invalid | 401 | Invalid API key |
| quota_exceeded | 402 | Monthly usage quota exceeded |
| rate_limit_exceeded | 429 | Too many requests |

Get Model

Retrieve a model's current status and training progress.

GET /v1/models/{model_id}

Path Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| model_id | string | Model ID (e.g. mdl_a1b2c3d4e5f6) |

curl https://api.dataxid.com/v1/models/mdl_a1b2c3d4e5f6 \
  -H "Authorization: Bearer dx_test_..."

status = model.refresh()
print(status["status"])       # "created" | "building" | "training" | "ready" | "failed"
print(status["current_epoch"])
print(status["val_loss"])

Response 200 OK

{
  "object": "model",
  "data": {
    "id": "mdl_a1b2c3d4e5f6",
    "status": "ready",
    "config": { ... },
    "current_epoch": 42,
    "train_loss": 0.234,
    "val_loss": 0.256,
    "created_at": "2026-02-20T10:30:00Z",
    "updated_at": "2026-02-20T10:35:00Z",
    "error": null
  },
  "metadata": {
    "timestamp": "2026-02-20T10:35:00Z"
  }
}

Model Status Values

| Status | Description |
| --- | --- |
| created | Model created, initializing |
| building | Model is being prepared |
| training | Model is training |
| ready | Training complete, model can generate |
| failed | Training failed; see the error field |
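
Since "ready" and "failed" are the only states a model does not leave, a client can poll until one of them is reached. The sketch below is one possible loop, not SDK behaviour: `wait_for_model` is a hypothetical helper, and the five-second interval is an arbitrary choice. `get_status` is any callable returning a dict with a "status" key.

```python
import time
from typing import Callable

TERMINAL = {"ready", "failed"}


def wait_for_model(get_status: Callable[[], dict], interval: float = 5.0,
                   timeout: float = 14400.0,
                   sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until the model is "ready" or "failed", or the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        state = get_status()
        if state["status"] in TERMINAL:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"model still {state['status']!r} after {timeout}s")
        sleep(interval)
```

With the SDK this could be called as `wait_for_model(model.refresh)`, given that `model.refresh()` returns the status dictionary shown in the example above. At the default rate limit of 60 requests per minute, a five-second interval uses about 12 of them.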

Errors

| Code | Status | Cause |
| --- | --- | --- |
| model_not_found | 404 | Model does not exist or belongs to another org |

List Models

Lists the models in your organization, using cursor-based pagination.

GET /v1/models

Query Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| limit | integer | 20 | Max results per page (1–100) |
| starting_after | string |  | Cursor: model ID to start after |

# First page
curl "https://api.dataxid.com/v1/models?limit=10" \
  -H "Authorization: Bearer dx_test_..."

# Next page
curl "https://api.dataxid.com/v1/models?limit=10&starting_after=mdl_a1b2c3d4e5f6" \
  -H "Authorization: Bearer dx_test_..."

import httpx

resp = httpx.get(
    "https://api.dataxid.com/v1/models",
    params={"limit": 10},
    headers={"Authorization": "Bearer dx_test_..."},
)
data = resp.json()
models = data["data"]
has_more = data["metadata"]["has_more"]

Response 200 OK

{
  "object": "list",
  "data": [
    {
      "id": "mdl_a1b2c3d4e5f6",
      "status": "ready",
      "config": { ... },
      "current_epoch": 42,
      "train_loss": 0.234,
      "val_loss": 0.256,
      "created_at": "2026-02-20T10:30:00Z",
      "updated_at": "2026-02-20T10:35:00Z",
      "error": null
    }
  ],
  "metadata": {
    "timestamp": "2026-02-20T10:35:00Z",
    "has_more": false
  }
}

Results are ordered by creation time (newest first). Use the last item's id as starting_after to fetch the next page.
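
The cursor rule above can be sketched as a generator that keeps requesting pages until has_more is false. `iter_models` and `fetch_page` are hypothetical names: `fetch_page` stands in for whatever performs the GET request and returns the parsed response envelope.

```python
from typing import Callable, Iterator, Optional


def iter_models(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every model, following the starting_after cursor."""
    cursor: Optional[str] = None  # None means "first page"
    while True:
        page = fetch_page(cursor)
        items = page["data"]
        yield from items
        # Stop when the API reports no more pages, or on an empty page.
        if not page["metadata"].get("has_more") or not items:
            return
        cursor = items[-1]["id"]  # last item's id becomes starting_after
```

With httpx, `fetch_page` could wrap the GET request from the example above, adding `starting_after` to `params` only when the cursor is not None.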


Generate Synthetic Data

Generate synthetic data from a trained model. The SDK handles embedding generation and response decoding automatically.

POST /v1/models/{model_id}/generate

Path Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| model_id | string | Model ID (must be in training or ready status) |

Request Body

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| embedding | array or object | Yes | Generated by the SDK: abstract representation of seed data |
| temperature | float | No | Sampling temperature. Default: 1.0 |
| top_p | float | No | Nucleus sampling threshold (0–1). Default: null |

synthetic = model.generate(n_samples=1000)
print(synthetic.head())
#    age  income
# 0   34   52000
# 1   28   41000
# 2   45   67000

The SDK decodes the API response into a pandas DataFrame automatically.

Errors

| Code | Status | Cause |
| --- | --- | --- |
| model_not_found | 404 | Model does not exist |
| model_not_ready | 400 | Model in created, building, or failed state |
| parameter_invalid | 400 | Invalid request parameters |
| quota_exceeded | 402 | Monthly usage quota exceeded |
| training_service_unavailable | 503 | Training infrastructure temporarily unavailable |
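
When handling these errors, it can help to separate transient failures from ones that need a change on the caller's side. The sketch below collects the codes listed on this page and applies one possible client-side policy. The choice of which codes count as retryable is an assumption, and `should_retry` is a hypothetical helper.

```python
# Error codes and HTTP statuses as listed in the tables on this page.
ERROR_STATUS = {
    "parameter_invalid": 400,
    "model_not_ready": 400,
    "api_key_missing": 401,
    "api_key_invalid": 401,
    "quota_exceeded": 402,
    "model_not_found": 404,
    "rate_limit_exceeded": 429,
    "training_service_unavailable": 503,
}

# Assumption: only these are worth retrying without changing the request.
RETRYABLE = {"rate_limit_exceeded", "training_service_unavailable"}


def should_retry(error_code: str) -> bool:
    """Client-side policy: retry transient failures, surface everything else."""
    return error_code in RETRYABLE
```

model_not_ready is left out deliberately: a model in created or building state may become usable later, but a failed one will not, so that case is better handled by checking the model status than by blind retries.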

Delete Model

Delete a model and free server resources.

DELETE /v1/models/{model_id}

Path Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| model_id | string | Model ID |

curl -X DELETE https://api.dataxid.com/v1/models/mdl_a1b2c3d4e5f6 \
  -H "Authorization: Bearer dx_test_..."

model.delete()

Response 204 No Content

Empty response body.

Errors

| Code | Status | Cause |
| --- | --- | --- |
| model_not_found | 404 | Model does not exist or belongs to another org |
