# dTelecom x402 Gateway — API Reference

## Base URL

```
https://x402.dtelecom.org
```

## Why This Gateway

Most real-time communication APIs require accounts, API keys, and complex billing setups. This gateway is different:

- **Agent-native** — No API keys, no accounts. Any AI agent with a USDC wallet can discover, pay, and use services autonomously via x402 or MPP.
- **Three services, one gateway** — WebRTC, STT, and TTS through a single payment and session system. Bundle them with Agent Sessions.
- **Usage-based billing** — Pay only for what you use. Unused credits are refunded when sessions end.
- **Decentralized infrastructure** — Built on dTelecom DePIN, a decentralized real-time communication network.

---

## Quick Start

```bash
npm install @dtelecom/x402-client viem
```

```typescript
import { createGateway } from "@dtelecom/x402-client";
import { privateKeyToAccount } from "viem/accounts";

const wallet = privateKeyToAccount("0x...");
const gw = createGateway({ wallet, chain: "base" });

// Purchase credits (x402 payment handled automatically)
await gw.purchaseCredits({ amountUsd: 0.10 });

// One call → WebRTC + STT + TTS
const session = await gw.createAgentSession({
  durationMinutes: 5,
  clientIdentity: "user-42",
});

// session.webrtc.agent  — agent WebRTC token + ws_url
// session.webrtc.client — client WebRTC token + ws_url
// session.stt           — STT token + server_url
// session.tts           — TTS token + server_url
```

Build voice agents with [`@dtelecom/agents-js`](https://www.npmjs.com/package/@dtelecom/agents-js) — includes DtelecomSTT and DtelecomTTS providers that connect directly to gateway sessions.

---

## SDKs & Libraries

| Package | Description | Links |
|---------|-------------|-------|
| `@dtelecom/x402-client` | Gateway API client — purchase credits, create sessions, x402 payment | [npm](https://www.npmjs.com/package/@dtelecom/x402-client) · [GitHub](https://github.com/dTelecom/x402-client) |
| `@dtelecom/agents-js` | Voice agent framework — DtelecomSTT/DtelecomTTS providers, pipeline orchestration | [npm](https://www.npmjs.com/package/@dtelecom/agents-js) · [GitHub](https://github.com/dTelecom/agents-js) |

---

## Example: AI Language Tutor

A full-stack voice AI demo built on this gateway.

- **Live demo:** [ai-tutor-demo.dtelecom.org](https://ai-tutor-demo.dtelecom.org/)
- **Source:** [github.com/dTelecom/ai-tutor-demo](https://github.com/dTelecom/ai-tutor-demo)

Architecture: Next.js server calls `createAgentSession()` with `clientIdentity`, spawns an agent process with the returned tokens (WebRTC + STT + TTS). The client connects via the client WebRTC token.

---

## Authentication

All `/v1/*` endpoints (except `/v1/pricing` and `/v1/servers/status`) require wallet signature authentication.

### Headers

| Header | Description |
|--------|-------------|
| `Authorization` | `<chain>:<signature>` (e.g. `solana:<base64sig>`) |
| `X-Wallet-Address` | Your wallet address |
| `X-Wallet-Chain` | `solana` or `evm` |
| `X-Timestamp` | Unix epoch seconds |

### Signed Message Format

```
METHOD\nPATH\nTIMESTAMP
```

Example: `POST\n/v1/webrtc/token\n1700000000`

Timestamp must be within ±30 seconds of server time.
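
The signing scheme can be sketched in TypeScript as follows. This is a minimal sketch, not the `@dtelecom/x402-client` implementation. The `Signer` callback is hypothetical, so plug in your wallet's message-signing function. The signature encoding for EVM wallets is assumed, and this reference does not say whether `PATH` must include a query string.

```typescript
// Hypothetical signer callback: signs the message and returns the encoded
// signature (base64 for Solana per the example above; the EVM encoding is an assumption).
type Signer = (message: string) => Promise<string>;
type WalletChain = "solana" | "evm";

// Documented tolerance between X-Timestamp and server time.
const MAX_SKEW_SECONDS = 30;

// Canonical string to sign: METHOD\nPATH\nTIMESTAMP.
function buildSignedMessage(method: string, path: string, timestamp: number): string {
  return `${method.toUpperCase()}\n${path}\n${timestamp}`;
}

// True if `timestamp` is within ±30 s of `nowSeconds` (inclusive bound assumed).
function isWithinSkew(timestamp: number, nowSeconds: number): boolean {
  return Math.abs(nowSeconds - timestamp) <= MAX_SKEW_SECONDS;
}

// Assembles the four authentication headers for a single request.
async function buildAuthHeaders(
  method: string,
  path: string,
  walletAddress: string,
  chain: WalletChain,
  sign: Signer,
  nowSeconds: number = Math.floor(Date.now() / 1000),
): Promise<Record<string, string>> {
  const signature = await sign(buildSignedMessage(method, path, nowSeconds));
  return {
    Authorization: `${chain}:${signature}`,
    "X-Wallet-Address": walletAddress,
    "X-Wallet-Chain": chain,
    "X-Timestamp": String(nowSeconds),
  };
}
```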

---

## Pricing

| Service | Rate | Microcredits | USD |
|---------|------|-------------|-----|
| WebRTC | per audio participant minute | 1,000 | $0.001 |
| STT | per minute | 6,000 | $0.006 |
| TTS | per 1K chars | 8,000 | $0.008 |

1 USD = 1,000,000 microcredits. Minimum purchase: $0.10. Payment in USDC on Solana, Base (EVM), or Tempo (MPP), or in USDT on TRON.
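
Here is a minimal cost estimator built from the rates above. It assumes billing is linear in the units shown. It does not model any per-minute or per-character rounding the gateway may apply, and the helper names are illustrative.

```typescript
// Rates from the pricing table, in microcredits (1 USD = 1,000,000 microcredits).
const MICROCREDITS_PER_USD = 1_000_000;
const WEBRTC_PER_PARTICIPANT_MINUTE = 1_000;
const STT_PER_MINUTE = 6_000;
const TTS_PER_1K_CHARS = 8_000;

interface Usage {
  webrtcParticipantMinutes: number; // minutes summed over all audio participants
  sttMinutes: number;
  ttsCharacters: number;
}

// Linear estimate; rounding to billing increments is not modeled.
function estimateMicrocredits(u: Usage): number {
  return (
    u.webrtcParticipantMinutes * WEBRTC_PER_PARTICIPANT_MINUTE +
    u.sttMinutes * STT_PER_MINUTE +
    (u.ttsCharacters / 1_000) * TTS_PER_1K_CHARS
  );
}

function microcreditsToUsd(microcredits: number): number {
  return microcredits / MICROCREDITS_PER_USD;
}

// Example: a 10-minute, two-participant call with STT and 5,000 TTS characters.
const cost = estimateMicrocredits({ webrtcParticipantMinutes: 20, sttMinutes: 10, ttsCharacters: 5_000 });
console.log(cost, microcreditsToUsd(cost)); // 120000 0.12
```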

### Full-Stack Voice Agent Comparison

| Voice Agent Stack | x402 Gateway | DIY (assemble yourself) |
|-------------------|-------------|------------------------|
| WebRTC audio (2 participants) | $0.002/min pay-as-you-go | Agora $0.002/min |
| ↳ Bundle option | from $0.0001/min via [cloud.dtelecom.org](https://cloud.dtelecom.org) | $400/mo, unlimited minutes |
| STT | $0.006/min | Deepgram $0.0077/min |
| TTS (~500 chars/min) | ~$0.004/min | ElevenLabs ~$0.15/min |
| **Total** | **~$0.012/min** (or ~$0.010 w/ bundle) | **~$0.16+/min** |
| Integration | 1 API call, 1 bill, crypto | 3 accounts, 3 APIs |

---

## Public Endpoints

### GET /v1/pricing

Returns current pricing and limits.

**Response:**
```json
{
  "microcredits_per_usd": 1000000,
  "rates": {
    "webrtc": { "unit": "per_minute", "microcredits": 1000, "usd": 0.001 },
    "stt": { "unit": "per_minute", "microcredits": 6000, "usd": 0.006 },
    "tts": { "unit": "per_1k_characters", "microcredits": 8000, "usd": 0.008 }
  }
}
```

### GET /v1/servers/status

Returns aggregated server availability by service type.

---

## Credits

### POST /v1/credits/purchase

Purchase credits via x402 payment (USDC on Solana or Base). The `X-Payment` header must contain a valid x402 proof.

**Body:**
```json
{
  "wallet_address": "string",
  "wallet_chain": "solana" | "evm",
  "amount_usd": number
}
```

**Response:**
```json
{
  "account_id": "uuid",
  "credited_microcredits": "string",
  "amount_usd": number
}
```
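
Here is a sketch of building the purchase body and parsing the response in TypeScript. The helper names are hypothetical. The sketch does not build the `X-Payment` proof itself; that would come from an x402 client such as `@dtelecom/x402-client`. The response returns `credited_microcredits` as a string, presumably to avoid precision loss. The sketch converts it and rejects values outside the safe-integer range.

```typescript
type WalletChain = "solana" | "evm";

interface PurchaseRequest {
  wallet_address: string;
  wallet_chain: WalletChain;
  amount_usd: number;
}

interface PurchaseResult {
  account_id: string;
  credited_microcredits: number;
  amount_usd: number;
}

const MIN_PURCHASE_USD = 0.1; // documented minimum purchase

// Builds the JSON body; the X-Payment proof is produced separately by an x402 client.
function buildPurchaseRequest(walletAddress: string, chain: WalletChain, amountUsd: number): PurchaseRequest {
  if (!Number.isFinite(amountUsd) || amountUsd < MIN_PURCHASE_USD) {
    throw new RangeError(`amount_usd must be at least ${MIN_PURCHASE_USD}`);
  }
  return { wallet_address: walletAddress, wallet_chain: chain, amount_usd: amountUsd };
}

// credited_microcredits arrives as a string; convert it and refuse values
// that a JS number cannot represent exactly.
function parsePurchaseResponse(json: string): PurchaseResult {
  const raw = JSON.parse(json) as { account_id: string; credited_microcredits: string; amount_usd: number };
  const credited = Number(raw.credited_microcredits);
  if (!Number.isSafeInteger(credited)) {
    throw new RangeError(`unexpected credited_microcredits: ${raw.credited_microcredits}`);
  }
  return { account_id: raw.account_id, credited_microcredits: credited, amount_usd: raw.amount_usd };
}
```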

### POST /v1/credits/purchase/mpp

Purchase credits via MPP payment (USDC on Tempo). Returns a 402 challenge if no valid credential is provided. Use the `mppx` client SDK for automatic payment handling.

**Body:**
```json
{
  "wallet_address": "string",
  "wallet_chain": "evm" | "tempo",
  "amount_usd": number
}
```

**Response:** Same as `/v1/credits/purchase`.

### POST /v1/credits/purchase/tron

Purchase credits via TRON USDT (TRC-20) through the [Bank of AI](https://docs.bankofai.io) x402 facilitator. Returns a 402 challenge with `PAYMENT-REQUIRED` header if no valid `PAYMENT-SIGNATURE` is provided. The buyer wallet is taken from the signed payment payload (not the request body) — credits go to whoever actually paid. Use the `bankofai-x402` Python SDK or compatible client for automatic payment handling.

**Scheme:** `exact_permit` on `tron:mainnet` (USDT contract `TR7NHqjeKQxGTCi8q8ZY4pL8otSzgjLj6t`).

**Body:**
```json
{
  "amount_usd": number
}
```

**Response:** Same as `/v1/credits/purchase`, plus `tron_tx_hash` (the on-chain settlement tx).

---

## Account

### GET /v1/account

Returns account details and balances.

### GET /v1/account/transactions?limit=50&offset=0

Returns paginated credit transaction history.

### GET /v1/account/sessions?limit=50&offset=0&status=active

Returns paginated session list. Optional `status` filter.
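
Here is a small sketch of building the paginated request paths. The `nextOffset` stop rule, which treats a short page as the last one, is an assumption: this reference does not document the response shape.

```typescript
type QueryValue = string | number | undefined;

// Encodes defined params in insertion order; undefined values are skipped.
function buildQuery(params: Record<string, QueryValue>): string {
  const parts: string[] = [];
  for (const key of Object.keys(params)) {
    const value = params[key];
    if (value !== undefined) {
      parts.push(`${encodeURIComponent(key)}=${encodeURIComponent(String(value))}`);
    }
  }
  return parts.join("&");
}

function transactionsPath(limit = 50, offset = 0): string {
  return `/v1/account/transactions?${buildQuery({ limit, offset })}`;
}

// `status` is optional; "active" is the only value shown in this reference.
function sessionsPath(limit = 50, offset = 0, status?: string): string {
  return `/v1/account/sessions?${buildQuery({ limit, offset, status })}`;
}

// Hypothetical stop rule: a page shorter than `limit` is treated as the last one.
function nextOffset(offset: number, limit: number, itemsReturned: number): number | null {
  return itemsReturned < limit ? null : offset + limit;
}
```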

---

## WebRTC — dTelecom DePIN SFU

dTelecom is a decentralized WebRTC platform — a **LiveKit fork** with Solana-based node discovery and Ed25519 JWT signing. Instead of centralized servers, decentralized SFU nodes register on a Solana registry for discovery. Horizontally scalable with P2P broadcast for write operations.

- **Decentralized SFU** — Nodes discovered via Solana registry, not centralized
- **SVC/simulcast** — Scalable Video Coding with adaptive bitrate
- **E2EE** — End-to-end encryption support
- **Ed25519 JWT signing** — Cryptographic authentication
- **SDKs** — JavaScript/TypeScript, React (pre-built components: VideoConference, Chat, GridLayout)
- **Features** — Speaker detection, selective subscription, moderation, data messages
- **Webhooks** — Room lifecycle (created, finished), participant events (joined, left)
- **Room management** — Via server SDK, multi-participant rooms with metadata

### POST /v1/webrtc/token

Create a WebRTC session.

**Body:**
```json
{
  "room_name": "string",
  "participant_identity": "string",
  "duration_minutes": number,
  "metadata": "string (optional)",
  "client_ip": "string (optional)"
}
```

**`client_ip` — geo-routing for `ws_url`:**
The returned `ws_url` points to the nearest SFU node. By default, the gateway uses the requester's IP (from `X-Forwarded-For`) to determine location. When your **server** creates tokens on behalf of **end users**, pass the user's IP as `client_ip` so the SFU is chosen near the user, not your server.

- Server creating a token for itself (e.g. AI agent) → omit `client_ip` (server IP is correct)
- Server creating a token for a remote user → set `client_ip` to the user's IP (from your own `X-Forwarded-For`)
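
Here is a sketch of the server-side case, deriving `client_ip` from the end user's `X-Forwarded-For` header. The helper names are hypothetical. Taking the left-most address assumes your proxies append to the header in the usual way, so only rely on it when your own load balancer controls the header.

```typescript
// Left-most entry of X-Forwarded-For, i.e. the original client when each proxy
// appends to the header. Only trust it if your own load balancer sets or
// sanitizes the header; clients can send arbitrary values.
function clientIpFromForwardedFor(header: string | undefined): string | undefined {
  if (!header) return undefined;
  const first = (header.split(",")[0] ?? "").trim();
  return first.length > 0 ? first : undefined;
}

interface WebrtcTokenRequest {
  room_name: string;
  participant_identity: string;
  duration_minutes: number;
  metadata?: string;
  client_ip?: string;
}

// For a remote user, pass the incoming request's X-Forwarded-For; for the
// server's own participant (e.g. an AI agent), omit it so client_ip is left out.
function webrtcTokenBody(
  roomName: string,
  identity: string,
  durationMinutes: number,
  forwardedFor?: string,
): WebrtcTokenRequest {
  const body: WebrtcTokenRequest = {
    room_name: roomName,
    participant_identity: identity,
    duration_minutes: durationMinutes,
  };
  const ip = clientIpFromForwardedFor(forwardedFor);
  if (ip !== undefined) body.client_ip = ip;
  return body;
}
```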

**Response:**
```json
{
  "session_id": "uuid",
  "token": "jwt",
  "ws_url": "wss://...",
  "expires_at": "ISO8601"
}
```

### POST /v1/webrtc/token/extend

**Body:** `{ "session_id": "uuid", "additional_minutes": number }`

---

## Speech-to-Text

### Capabilities

**Dual-engine architecture:**
- **Parakeet-TDT 0.6B** — 25 European languages, 3-4x faster than Whisper, native auto-detect
- **Whisper large-v3-turbo** — 99+ languages, contextual prompting, 809M parameters

**Smart Routing Logic:**
1. `"model": "whisper"` in the config message → always Whisper
2. `language=auto` → Parakeet preferred (native auto-detect); Whisper if no Parakeet available
3. Language not in Parakeet set → Whisper
4. Parakeet available → Parakeet (preferred, 3-4x faster)
5. Parakeet busy, Whisper available → Whisper fallback
6. Both busy → queue to Parakeet (faster inference = shorter wait)
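
The routing rules can be expressed as one TypeScript function. This is an illustrative sketch of the decision order described above, not the server's code. The availability flags are hypothetical inputs, and rules 2, 4, and 5 collapse into the same checks. The documentation does not say what happens to a non-Parakeet language when Whisper is busy, so the sketch simply returns `"whisper"`.

```typescript
type SttDecision = "parakeet" | "whisper" | "queue_parakeet";

interface RouteInput {
  forceWhisper: boolean;      // config asked for Whisper explicitly
  language: string;           // language code or "auto"
  parakeetAvailable: boolean; // a Parakeet worker is free now
  whisperAvailable: boolean;  // a Whisper worker is free now
}

// The 25 Parakeet-TDT languages listed in the WER table of this section.
const PARAKEET_LANGUAGES = new Set<string>([
  "it", "es", "pt", "en", "de", "fr", "ru", "uk", "pl", "nl", "sk", "cs", "ro",
  "hr", "bg", "fi", "sv", "hu", "et", "da", "lt", "mt", "el", "lv", "sl",
]);

function routeStt(r: RouteInput): SttDecision {
  if (r.forceWhisper) return "whisper"; // rule 1
  if (r.language !== "auto" && !PARAKEET_LANGUAGES.has(r.language)) {
    return "whisper"; // rule 3
  }
  if (r.parakeetAvailable) return "parakeet"; // rules 2 and 4
  if (r.whisperAvailable) return "whisper"; // rules 2 and 5
  return "queue_parakeet"; // rule 6
}
```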

**Audio intelligence pipeline:**
1. Silero VAD (per-client, Apple Neural Engine) — speech threshold 0.45, 250ms min speech, adaptive silence
2. GTCRN noise reduction — 48K parameter neural denoiser, ONNX on CPU
3. Speech validation — reject non-speech audio (VAD confidence, speech ratio, RMS checks)
4. Intelligent trimming — keep speech region + 300ms padding
5. Hallucination filter — low-energy phrase check, text-to-audio ratio, known patterns, repetition detection, de-looping

**Clock-pause billing:**
- Billing clock pauses on WebSocket disconnect
- Reconnect with same `session_key` before expiry to resume without losing paid time
- `session_expiring` warnings sent at 60s and 10s before expiry

**Parakeet-TDT language performance (WER, FLEURS benchmark):**

| Language | Code | WER | Tier |
|----------|------|-----|------|
| Italian | it | 3.0% | Excellent |
| Spanish | es | 3.5% | Excellent |
| Portuguese | pt | 4.8% | Excellent |
| English | en | 4.9% | Excellent |
| German | de | 5.0% | Excellent |
| French | fr | 5.2% | Excellent |
| Russian | ru | 5.5% | Excellent |
| Ukrainian | uk | 6.8% | Very Good |
| Polish | pl | 7.3% | Very Good |
| Dutch | nl | 7.5% | Very Good |
| Slovak | sk | 8.8% | Good |
| Czech | cs | 11.0% | Good |
| Romanian | ro | 12.4% | Good |
| Croatian | hr | 12.5% | Good |
| Bulgarian | bg | 12.6% | Good |
| Finnish | fi | 13.2% | Good |
| Swedish | sv | 15.1% | Fair |
| Hungarian | hu | 15.7% | Fair |
| Estonian | et | 17.7% | Fair |
| Danish | da | 18.4% | Fair |
| Lithuanian | lt | 20.4% | Fair |
| Maltese | mt | 20.5% | Fair |
| Greek | el | 20.7% | Fair |
| Latvian | lv | 22.8% | Fair |
| Slovenian | sl | 24.0% | Fair |

Average: 11.97% across all 25 languages. Top 10 average: 5.4%.

**Whisper language tiers:**
- High-resource (3-8% WER): en, es, fr, de, it, pt, zh, ja, ko, nl
- Medium-resource (8-15% WER): ru, pl, cs, tr, ar, hi, sv, id, vi, uk, ro, hu, fi, da, no, el, he, th, bg, hr, sk, ca, sl, lt, lv, et, sr, ms, gl, eu
- Long-tail: 60+ additional languages (af, sq, am, hy, az, bn, be, bs, ka, gu, ha, is, kn, kk, km, lo, mk, ml, mi, mr, mn, ne, fa, pa, so, sw, tl, ta, te, uz, cy, yi, yo, and more)

**Audio format:**

| Parameter | Value |
|-----------|-------|
| Format | PCM16 (signed 16-bit integer) |
| Byte order | Little-endian |
| Sample rate | 16000 Hz |
| Channels | 1 (mono) |
| Recommended chunk | 20 ms (320 samples, 640 bytes) to 100 ms (1600 samples, 3200 bytes) |

No server-side resampling — client must send exactly 16kHz mono PCM16.

Convert from other formats: `ffmpeg -i input.wav -f s16le -ar 16000 -ac 1 output.pcm`
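
Web Audio and many other sources produce float samples. For such input, conversion and framing could look like the following sketch. It does not resample, so the input must already be 16 kHz mono. The function names are illustrative.

```typescript
const SAMPLE_RATE_HZ = 16_000;
const BYTES_PER_SAMPLE = 2;

// Mono float samples in [-1, 1] (already at 16 kHz; this does not resample)
// -> signed 16-bit little-endian PCM.
function floatToPcm16LE(samples: Float32Array): Uint8Array {
  const out = new Uint8Array(samples.length * BYTES_PER_SAMPLE);
  const view = new DataView(out.buffer);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i] ?? 0)); // clamp out-of-range input
    view.setInt16(i * BYTES_PER_SAMPLE, Math.round(s < 0 ? s * 0x8000 : s * 0x7fff), true);
  }
  return out;
}

// Splits PCM into frames of `ms` milliseconds: 20 ms = 320 samples = 640 bytes.
// The final frame may be shorter.
function chunkPcm(pcm: Uint8Array, ms = 20): Uint8Array[] {
  const frameBytes = ((SAMPLE_RATE_HZ * ms) / 1000) * BYTES_PER_SAMPLE;
  const frames: Uint8Array[] = [];
  for (let off = 0; off < pcm.length; off += frameBytes) {
    frames.push(pcm.subarray(off, off + frameBytes));
  }
  return frames;
}
```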

### WebSocket Protocol

Connect to the `server_url` returned in the session response.

**Config (first message, mandatory within 10s):**
```json
{"type": "config", "language": "en", "session_key": "eyJ..."}
```
`session_key` is required on `/v1/stream` (the authenticated endpoint). `language` defaults to `"en"`. Set `"model": "whisper"` to force Whisper.

**Mid-session reconfigure:**
```json
{"type": "config", "language": "es"}
```
Server responds with `{"type": "ready", ...}` and clears audio buffers.

**Client → Server messages:**
- Binary frames: raw PCM16 audio chunks
- `{"type": "flush"}` — force-process remaining audio
- `{"type": "reset"}` — clear audio buffers
- `{"type": "ping"}` — keepalive
- `{"type": "extend", "session_key": "eyJ..."}` — extend session

**Server → Client messages:**
```json
{"type": "ready", "client_id": "12345", "language": "en"}
{"type": "vad_event", "event": "speech_start", "timestamp": 1234567890.123}
{"type": "vad_event", "event": "speech_end", "timestamp": 1234567890.456}
{"type": "transcription", "text": "hello world", "language": "en", "is_final": true, "latency_ms": 1250.5}
{"type": "pong", "timestamp": 1234567890.123}
{"type": "session_expiring", "remaining_seconds": 60}
{"type": "session_extended", "remaining_seconds": 300}
{"type": "error", "error": "server_full", "message": "Server at capacity"}
```
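
Here is a sketch of the client side of this protocol in TypeScript: message builders plus a small state holder for the server's text frames. The class and function names are hypothetical. The sketch also handles non-final `transcription` messages, in case the server sends them. Obtaining a fresh `session_key` for `extend`, presumably from `POST /v1/stt/session/extend`, is left to the caller.

```typescript
type SttServerMessage =
  | { type: "ready"; client_id: string; language: string }
  | { type: "vad_event"; event: "speech_start" | "speech_end"; timestamp: number }
  | { type: "transcription"; text: string; language: string; is_final: boolean; latency_ms: number }
  | { type: "pong"; timestamp: number }
  | { type: "session_expiring"; remaining_seconds: number }
  | { type: "session_extended"; remaining_seconds: number }
  | { type: "error"; error: string; message: string };

// First message after connecting; must arrive within 10 s.
function sttConfigMessage(sessionKey: string, language = "en", forceWhisper = false): string {
  const msg: Record<string, string> = { type: "config", language, session_key: sessionKey };
  if (forceWhisper) msg["model"] = "whisper";
  return JSON.stringify(msg);
}

function sttExtendMessage(newSessionKey: string): string {
  return JSON.stringify({ type: "extend", session_key: newSessionKey });
}

// Tracks state from the server's text frames (binary frames are not handled here).
class SttState {
  readonly finals: string[] = [];
  partial = "";        // latest non-final text, if the server sends any
  needsExtend = false; // set on session_expiring, cleared on session_extended
  lastError: string | undefined = undefined;

  handle(raw: string): void {
    const msg = JSON.parse(raw) as SttServerMessage;
    switch (msg.type) {
      case "transcription":
        if (msg.is_final) {
          this.finals.push(msg.text);
          this.partial = "";
        } else {
          this.partial = msg.text;
        }
        break;
      case "session_expiring":
        this.needsExtend = true;
        break;
      case "session_extended":
        this.needsExtend = false;
        break;
      case "error":
        this.lastError = `${msg.error}: ${msg.message}`;
        break;
      default:
        break; // ready, vad_event, pong: nothing to track here
    }
  }
}
```

In a WebSocket client you would send `sttConfigMessage(token)` on open, assuming the session `token` is the `session_key` the server expects. After that, stream PCM frames as binary messages and pass string frames to `handle`. Once `needsExtend` is set, send `sttExtendMessage(newKey)`.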

### POST /v1/stt/session

**Body:**
```json
{
  "duration_minutes": number,
  "language": "string (optional)"
}
```

**Response:**
```json
{
  "session_id": "uuid",
  "token": "jwt",
  "server_url": "https://...",
  "expires_at": "ISO8601"
}
```

### POST /v1/stt/session/extend

**Body:** `{ "session_id": "uuid", "additional_minutes": number }`

---

## Text-to-Speech

### Capabilities

**Model:** Kokoro 82M, a lightweight neural TTS model running on MLX (optimized for Apple Silicon), with fast inference and natural-sounding output. Each text message can override the session's voice, speed, and language.

**54 voices across 9 languages:**

| Language | Code | Voices |
|----------|------|--------|
| American English | `"a"` | af_alloy, af_aoede, af_bella, af_heart *(default)*, af_jessica, af_kore, af_nicole, af_nova, af_river, af_sarah, af_sky, am_adam, am_echo, am_eric, am_fenrir, am_liam, am_michael, am_onyx, am_puck, am_santa |
| British English | `"b"` | bf_alice, bf_emma, bf_isabella, bf_lily, bm_daniel, bm_fable, bm_george, bm_lewis |
| Spanish | `"e"` | ef_dora, em_alex, em_santa |
| French | `"f"` | ff_siwis |
| Hindi | `"h"` | hf_alpha, hf_beta, hm_omega, hm_psi |
| Italian | `"i"` | if_sara, im_nicola |
| Japanese | `"j"` | jf_alpha, jf_gongitsune, jf_nezumi, jf_tebukuro, jm_kumo |
| Portuguese (BR) | `"p"` | pf_dora, pm_alex, pm_santa |
| Mandarin Chinese | `"z"` | zf_xiaobei, zf_xiaoni, zf_xiaoxiao, zf_xiaoyi, zm_yunjian, zm_yunxi, zm_yunxia, zm_yunyang |

**Language code aliases:** `"a"`=en-us, `"b"`=en-gb, `"e"`=es, `"f"`=fr, `"h"`=hi, `"i"`=it, `"j"`=ja, `"p"`=pt-br, `"z"`=zh

**Voice blending:** Pass comma-separated voice IDs to blend them. Their voice tensors are averaged, e.g. `"af_heart,af_bella"`.

**Speed control:** Float multiplier (0.5–2.0, default 1.0). Affects phoneme duration, no pitch shift.
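
Here is a sketch of assembling a TTS config with voice blending and the documented speed range. Inferring `lang_code` from the first letter of the voice ID follows the naming pattern in the voice table (e.g. `af_heart` is listed under `"a"`). That is a convenience assumption, not a documented rule, and the sketch does not check voice IDs against the table.

```typescript
type LangCode = "a" | "b" | "e" | "f" | "h" | "i" | "j" | "p" | "z";

interface TtsConfig {
  voice: string;       // one voice ID, or several joined with commas for blending
  lang_code: LangCode;
  speed: number;       // 0.5 to 2.0
}

// Builds a session config. If langCode is omitted it is inferred from the first
// voice ID's leading letter, following the naming pattern in the voice table.
function ttsConfig(voices: string[], speed = 1.0, langCode?: LangCode): TtsConfig {
  const first = voices[0];
  if (first === undefined) throw new Error("at least one voice is required");
  if (!(speed >= 0.5 && speed <= 2.0)) throw new RangeError("speed must be in [0.5, 2.0]");
  return { voice: voices.join(","), lang_code: langCode ?? (first.charAt(0) as LangCode), speed };
}

// First message on /v1/stream: session key plus initial config.
function ttsAuthMessage(sessionKey: string, config: TtsConfig): string {
  return JSON.stringify({ session_key: sessionKey, config });
}
```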

**Output format:**

| Parameter | Value |
|-----------|-------|
| Format | PCM16 (signed 16-bit, little-endian, mono) |
| Sample rate | 48000 Hz (resampled from native 24000 Hz) |
| Chunk size | 960 samples = 1920 bytes = 20ms |
| Delivery | Binary WebSocket frames at real-time rate (50 chunks/sec) |
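
Decoding the binary frames for playback can be sketched in TypeScript as follows (the names are illustrative). At 48 kHz, one 1920-byte frame holds 20 ms of audio.

```typescript
const TTS_SAMPLE_RATE_HZ = 48_000;

// Decodes one binary frame of signed 16-bit little-endian mono PCM to floats in [-1, 1).
function pcm16LEToFloat32(frame: Uint8Array): Float32Array {
  const view = new DataView(frame.buffer, frame.byteOffset, frame.byteLength);
  const out = new Float32Array(Math.floor(frame.byteLength / 2));
  for (let i = 0; i < out.length; i++) {
    out[i] = view.getInt16(i * 2, true) / 32768;
  }
  return out;
}

// Playback length of a PCM16 mono buffer at 48 kHz, in milliseconds.
function pcmDurationMs(byteLength: number): number {
  return (byteLength * 1000) / (2 * TTS_SAMPLE_RATE_HZ);
}
```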

**Chunking:** Long English text is split at sentence boundaries when its phoneme sequence exceeds 510 characters. Non-English text is split on sentence boundaries at roughly 400 characters.

### WebSocket Protocol

Connect to the `server_url` returned in the session response.

**Auth (first message on /v1/stream, mandatory):**
```json
{"session_key": "eyJ...", "config": {"voice": "af_heart", "lang_code": "a", "speed": 1.0}}
```

**Send text:**
```json
{"text": "Hello, how are you?"}
{"text": "Hola", "voice": "ef_dora", "lang_code": "e", "speed": 1.2}
```

**Session config (persists for subsequent texts):**
```json
{"config": {"voice": "af_heart", "lang_code": "a", "speed": 1.0}}
```

**Barge-in:**
```json
{"type": "clear"}
```

**Extend session:**
```json
{"type": "extend", "session_key": "eyJ..."}
```

**Server → Client messages:**
```json
{"type": "generating", "text": "..."}
{"type": "done"}
{"type": "cleared"}
{"type": "error", "message": "..."}
{"type": "extended", "expires_in": 120}
```
Plus binary PCM16 audio chunks between `generating` and `done`.

Multiple texts can be queued and are processed sequentially. Barge-in cancels all queued texts and clears the audio buffer.
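
Here is a sketch of the client-side bookkeeping for text frames. It provides message builders for text and barge-in, plus a state holder that tracks the current utterance. The names are hypothetical, and the sketch assumes one `generating`/`done` pair per queued text.

```typescript
type TtsServerMessage =
  | { type: "generating"; text: string }
  | { type: "done" }
  | { type: "cleared" }
  | { type: "error"; message: string }
  | { type: "extended"; expires_in: number };

interface TtsOverride { voice?: string; lang_code?: string; speed?: number }

const ttsText = (text: string, override?: TtsOverride): string => JSON.stringify({ text, ...override });
const ttsClear = (): string => JSON.stringify({ type: "clear" });

// Tracks whether an utterance is in progress and how much audio it produced.
class TtsStreamState {
  speaking = false;
  currentText = "";
  audioBytes = 0;
  lastError: string | undefined = undefined;

  onText(raw: string): void {
    const msg = JSON.parse(raw) as TtsServerMessage;
    switch (msg.type) {
      case "generating":
        this.speaking = true;
        this.currentText = msg.text;
        this.audioBytes = 0;
        break;
      case "done":
        this.speaking = false;
        break;
      case "cleared":
        // Barge-in acknowledged: also drop any locally buffered playback.
        this.speaking = false;
        this.audioBytes = 0;
        break;
      case "error":
        this.lastError = msg.message;
        break;
      case "extended":
        break;
    }
  }

  onBinary(frame: Uint8Array): void {
    if (this.speaking) this.audioBytes += frame.byteLength;
  }
}
```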

### POST /v1/tts/session

**Body:**
```json
{
  "max_characters": number,
  "language": "string (optional)"
}
```

### POST /v1/tts/session/extend

**Body:** `{ "session_id": "uuid", "additional_characters": number }`

---

## Agent Session (Bundle)

Creates WebRTC + STT + TTS sessions as a single bundle. Ideal for AI agents that need voice interaction — request all three services in one call.

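**Bundle cost: roughly $0.012 to $0.016/min.** Both participants share one WebRTC room, so WebRTC costs 2 × $0.001 = $0.002/min. STT adds $0.006/min. TTS at $0.008 per 1K characters adds ~$0.004/min at ~500 spoken characters per minute (~$0.008/min at ~1K). A $0.10 minimum purchase therefore buys roughly 6 to 8 minutes of bundled agent voice.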
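
The per-minute arithmetic can be sketched in TypeScript. The TTS share depends on how many characters the agent speaks per minute, an assumption you supply; the figures below use 500 and 1,000 characters per minute. The function names are illustrative.

```typescript
// Prices in microcredits, from the pricing table (1 USD = 1,000,000).
const WEBRTC_PER_PARTICIPANT_MINUTE = 1_000;
const STT_PER_MINUTE = 6_000;
const TTS_PER_1K_CHARS = 8_000;

// Per-minute bundle cost. TTS is billed per character, so it is expressed
// through an assumed speaking rate (characters per minute).
function bundleMicrocreditsPerMinute(participants: number, ttsCharsPerMinute: number): number {
  return (
    participants * WEBRTC_PER_PARTICIPANT_MINUTE +
    STT_PER_MINUTE +
    (ttsCharsPerMinute / 1_000) * TTS_PER_1K_CHARS
  );
}

function minutesForBudget(budgetUsd: number, participants: number, ttsCharsPerMinute: number): number {
  const budgetMicrocredits = Math.round(budgetUsd * 1_000_000);
  return budgetMicrocredits / bundleMicrocreditsPerMinute(participants, ttsCharsPerMinute);
}

console.log(bundleMicrocreditsPerMinute(2, 500)); // 12000, i.e. $0.012/min
console.log(minutesForBudget(0.1, 2, 500).toFixed(1)); // 8.3
console.log(minutesForBudget(0.1, 2, 1_000).toFixed(2)); // 6.25
```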

### POST /v1/agent-session

**Body:**
```json
{
  "room_name": "string",
  "participant_identity": "string",
  "duration_minutes": number,
  "language": "string (optional)",
  "tts_max_characters": number (optional, default: 10000),
  "metadata": "string (optional)",
  "client_ip": "string (optional)",
  "client_identity": "string (optional)"
}
```

**`client_ip`:** Same as in `/v1/webrtc/token` — pass the end user's IP when your server creates a session on behalf of a remote user, so the WebRTC `ws_url` routes to the SFU nearest the user.

**`client_identity`:** When provided, the gateway generates **two** WebRTC tokens — one for the agent (`participant_identity`) and one for the client (`client_identity`). The agent token is geo-routed by the server's IP (from `X-Forwarded-For`), and the client token is geo-routed by `client_ip`. This is the recommended approach when both agent and client need to join the same room. Without `client_identity`, a single WebRTC token is returned (backwards compatible).

**Response (with `client_identity` — dual WebRTC tokens):**
```json
{
  "bundle_id": "uuid",
  "webrtc": {
    "agent": { "session_id": "uuid", "token": "jwt", "ws_url": "wss://..." },
    "client": { "session_id": "uuid", "token": "jwt", "ws_url": "wss://..." }
  },
  "stt": { "session_id": "uuid", "token": "jwt", "server_url": "https://..." },
  "tts": { "session_id": "uuid", "token": "jwt", "server_url": "https://..." },
  "expires_at": "ISO8601"
}
```

**Response (without `client_identity` — single WebRTC token, backwards compatible):**
```json
{
  "bundle_id": "uuid",
  "webrtc": { "session_id": "uuid", "token": "jwt", "ws_url": "wss://..." },
  "stt": { "session_id": "uuid", "token": "jwt", "server_url": "https://..." },
  "tts": { "session_id": "uuid", "token": "jwt", "server_url": "https://..." },
  "expires_at": "ISO8601"
}
```
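
The `webrtc` field comes in two shapes, so client code can normalize it. The sketch below uses illustrative type names. Treating the single token as the agent's assumes it belongs to `participant_identity`.

```typescript
interface WebrtcGrant { session_id: string; token: string; ws_url: string }
interface ServiceGrant { session_id: string; token: string; server_url: string }
interface DualWebrtc { agent: WebrtcGrant; client: WebrtcGrant }

interface AgentSessionResponse {
  bundle_id: string;
  webrtc: WebrtcGrant | DualWebrtc; // dual only when client_identity was sent
  stt: ServiceGrant;
  tts: ServiceGrant;
  expires_at: string;
}

function isDual(webrtc: WebrtcGrant | DualWebrtc): webrtc is DualWebrtc {
  return "agent" in webrtc && "client" in webrtc;
}

// Normalizes both shapes. In the single-token shape the grant is taken to be
// the agent's (participant_identity); client is then undefined.
function webrtcGrants(resp: AgentSessionResponse): { agent: WebrtcGrant; client: WebrtcGrant | undefined } {
  return isDual(resp.webrtc)
    ? { agent: resp.webrtc.agent, client: resp.webrtc.client }
    : { agent: resp.webrtc, client: undefined };
}
```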

### POST /v1/agent-session/extend

**Body:** `{ "bundle_id": "uuid", "additional_minutes": number, "additional_tts_characters": number (optional) }`

**Response (dual WebRTC — when the bundle has 2 WebRTC sessions):**
```json
{
  "webrtc": {
    "agent": { "session_id": "uuid", "token": "jwt", "ws_url": "wss://...", "new_expires_at": "ISO8601" },
    "client": { "session_id": "uuid", "token": "jwt", "ws_url": "wss://...", "new_expires_at": "ISO8601" }
  },
  "stt": { "token": "jwt", "new_expires_at": "ISO8601" },
  "tts": { "token": "jwt", "new_expires_at": "ISO8601" }
}
```

**Response (single WebRTC — backwards compatible):**
```json
{
  "webrtc": { "token": "jwt", "new_expires_at": "ISO8601" },
  "stt": { "token": "jwt", "new_expires_at": "ISO8601" },
  "tts": { "token": "jwt", "new_expires_at": "ISO8601" }
}
```

---

## Webhooks

### POST /v1/webhooks/sfu

Receives dTelecom SFU callbacks (`participant_joined`, `room_finished`).

---

## Internal

### POST /internal/events

Receives session completion events from STT/TTS servers.

**Body:** `{ "session_id": "uuid", "event": "complete" | "usage", "usage": { "duration_seconds": number, "used_characters": number } }`

### GET /internal/health

Health check. Returns `{ "status": "ok" }`.

---

## Error Codes

| Code | Meaning |
|------|---------|
| 400 | Bad request / validation error |
| 401 | Invalid or missing authentication |
| 402 | Insufficient credits |
| 404 | Resource not found |
| 429 | Rate limit exceeded / concurrent session limit |
| 503 | No servers available for requested service |
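
One way a client might map these codes to actions is sketched below in TypeScript. The action names and backoff parameters are illustrative. A `402` from a purchase endpoint is a payment challenge, not an insufficient-credit error. Retrying a `429` caused by the concurrent session limit only helps once other sessions end.

```typescript
type ErrorAction = "fix_request" | "reauthenticate" | "buy_credits" | "not_found" | "retry_later";

function actionForStatus(status: number): ErrorAction | undefined {
  switch (status) {
    case 400: return "fix_request";
    case 401: return "reauthenticate"; // e.g. re-sign with a fresh X-Timestamp
    case 402: return "buy_credits";
    case 404: return "not_found";
    case 429:
    case 503: return "retry_later";
    default: return undefined;
  }
}

// Capped exponential backoff for retry_later.
function backoffMs(attempt: number, baseMs = 500, capMs = 10_000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}
```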
