If you speak OpenAI, you speak Sylica.
One base URL. Same request and response shapes, including streaming and tool use.
Drop-in SDK compatibility
Keep your existing OpenAI-compatible client code and switch only your base URL and API key. Sylica preserves request/response semantics so migration takes hours, not weeks.
Integration flow
1. Set baseURL to https://api.sylicaai.com/v1
2. Use your existing chat.completions request shape
3. Turn on stream=true for low-latency UX
4. Read the x-sylica-request-id header for debugging and support
```ts
import OpenAI from "openai";

const sylica = new OpenAI({
  baseURL: "https://api.sylicaai.com/v1",
  apiKey: process.env.SYLICA_API_KEY,
});

const stream = await sylica.chat.completions.create({
  model: "sylica/auto",
  messages: [{ role: "user", content: "Write me a haiku about routing." }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}
```

```bash
curl https://api.sylicaai.com/v1/chat/completions \
  -H "Authorization: Bearer $SYLICA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-opus-4.5",
    "stream": true,
    "messages": [{ "role": "user", "content": "hello" }]
  }'
```

Five stages between your SDK and the model.
Auth, rate limiting, routing, adapters, and credits -- with p50 routing overhead under 100 ms.
Unified schema
OpenAI chat completions in, OpenAI chat completions out. Adapters normalize every provider.
SSE, end to end
No buffering. Tokens hit your socket the moment the upstream emits them.
Meta-models
Ask for sylica/auto, sylica/cheap, or sylica/fast. The router picks a concrete model per request.
Per-token billing
Debits are atomic against your org's credit balance. 429 when empty -- never a surprise invoice.
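The debit rule can be sketched in a few lines. This is an illustrative in-memory version with hypothetical names; in production the same check-and-subtract would be a single conditional database update so concurrent requests can never drive a balance negative:

```ts
// Sketch of atomic per-token debiting against an org's credit balance.
// All names here are illustrative, not Sylica internals.
const balances = new Map<string, number>(); // orgId -> remaining credits

function debit(orgId: string, amount: number): { ok: true } | { ok: false; status: 429 } {
  const balance = balances.get(orgId) ?? 0;
  if (balance < amount) return { ok: false, status: 429 }; // empty -> 429, never overdraft
  balances.set(orgId, balance - amount);
  return { ok: true };
}
```

The key property is that a request either debits in full or is rejected with 429; there is no path that serves tokens on an empty balance.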
Rate limits + keys
Token-bucket per key, per model. Scoped keys with deny-lists for production safety.
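The per-key, per-model limiter described above is a classic token bucket. A minimal sketch, with illustrative class and parameter names rather than Sylica's actual internals:

```ts
// Token bucket: a fixed-capacity bucket refills at a steady rate;
// each request removes a token, and an empty bucket means 429.
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private capacity: number,     // burst size
    private refillPerSec: number, // sustained rate
    now: number = Date.now(),
  ) {
    this.tokens = capacity;
    this.lastRefill = now;
  }

  // Returns true if the request is admitted, false if it should be rejected.
  tryRemove(cost = 1, now: number = Date.now()): boolean {
    const elapsed = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillPerSec);
    this.lastRefill = now;
    if (this.tokens < cost) return false;
    this.tokens -= cost;
    return true;
  }
}

// One bucket per (key, model) pair.
const buckets = new Map<string, TokenBucket>();
function admit(apiKey: string, model: string): boolean {
  const id = `${apiKey}:${model}`;
  let b = buckets.get(id);
  if (!b) buckets.set(id, (b = new TokenBucket(10, 2)));
  return b.tryRemove();
}
```

Scoping one bucket to each (key, model) pair means a burst against one model cannot starve a key's traffic to every other model.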
OpenTelemetry
Every request emits spans with provider, TTFB, and cost. Export to your own backend.
The router is a small, legible scoring function.
Every eligible model gets a score from cost, latency, and live health. The top score wins. If it fails before the first byte, the next score wins -- automatically, on the same stream.
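The scoring idea might look like this in TypeScript. The weights and the stats shape are assumptions for illustration, not the production function:

```ts
// Hypothetical per-model stats fed into the router.
interface ModelStats {
  name: string;
  costPer1M: number;   // blended $ per 1M tokens
  p50LatencyMs: number;
  healthy: boolean;    // live health-check result
}

// Cheaper and faster score higher; the weights here are illustrative.
function score(m: ModelStats): number {
  return 1 / (m.costPer1M * 0.5 + m.p50LatencyMs * 0.01);
}

// Returns eligible models best-first, so a failure before the first
// byte can fall through to the next candidate on the same stream.
function rank(models: ModelStats[]): ModelStats[] {
  return models
    .filter((m) => m.healthy)
    .sort((a, b) => score(b) - score(a));
}
```

Returning the full ranked list rather than a single winner is what makes the automatic failover cheap: the fallback order is already decided before the first attempt is made.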
- Routing: 32 ms
- TTFB: 180 ms
- Total: 2.4 s
Every token is accounted for.
Per-key usage, per-model breakdowns, live p50/p95 latency, and spend -- as a dashboard, an API, or OTLP spans.
- Request log: Every call is stored for 30 days with model, provider, tokens, cost, and latency.
- Usage API: GET /v1/usage returns JSON aggregates. Stream to your data warehouse on a cron.
- OpenTelemetry: Enable OTEL_EXPORTER_OTLP_ENDPOINT and Sylica will emit traces and metrics alongside your own.
- Alerting: Optional webhooks on spend thresholds, error spikes, or per-key rate-limit hits.
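A cron job against the usage API might look like the sketch below. The row shape (model, tokens, costUsd) is an assumption for illustration, not the documented schema:

```ts
// Hypothetical shape of one row from GET /v1/usage.
interface UsageRow {
  model: string;
  tokens: number;
  costUsd: number;
}

// Roll rows up into spend per model before shipping them to a warehouse.
function spendByModel(rows: UsageRow[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const r of rows) {
    totals.set(r.model, (totals.get(r.model) ?? 0) + r.costUsd);
  }
  return totals;
}

// Pull aggregates on a schedule; uses the global fetch in Node 18+.
async function pullUsage(apiKey: string): Promise<UsageRow[]> {
  const res = await fetch("https://api.sylicaai.com/v1/usage", {
    headers: { Authorization: `Bearer ${apiKey}` },
  });
  return (await res.json()) as UsageRow[];
}
```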
BYOK, with real encryption -- not just a claim.
Provider keys are sealed with AES-256-GCM using a master key held outside Postgres, decrypted only on the request path, and never logged.
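The sealing scheme can be sketched with Node's built-in crypto module. Key handling is simplified here: the real master key would come from a KMS or environment outside the database, not be generated inline, and this layout (iv | tag | ciphertext) is one common convention, not necessarily Sylica's:

```ts
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// 256-bit master key -- generated inline only for this sketch.
const masterKey = randomBytes(32);

// Encrypt a provider key with AES-256-GCM; GCM authenticates as well
// as encrypts, so tampering with the stored blob is detected on open.
function seal(plaintext: string): Buffer {
  const iv = randomBytes(12); // unique nonce per encryption
  const cipher = createCipheriv("aes-256-gcm", masterKey, iv);
  const ct = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return Buffer.concat([iv, cipher.getAuthTag(), ct]); // iv | tag | ciphertext
}

// Decrypt on the request path only; the plaintext is never persisted or logged.
function open(sealed: Buffer): string {
  const iv = sealed.subarray(0, 12);
  const tag = sealed.subarray(12, 28);
  const ct = sealed.subarray(28);
  const decipher = createDecipheriv("aes-256-gcm", masterKey, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ct), decipher.final()]).toString("utf8");
}
```

Because the master key never touches Postgres, a database dump alone yields only opaque blobs.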
Ten of the 30+ models we serve today.
Prices are per 1M tokens. The full list -- including context windows, tool-use, and vision flags -- lives in the dashboard.
| Model | Context | Input / 1M | Output / 1M | Traits |
|---|---|---|---|---|
| openai/gpt-5 | 400k | $2.50 | $10.00 | reasoning, tools, vision |
| openai/gpt-4.1 | 1,047,576 | $2.00 | $8.00 | tools, vision |
| openai/o3 | 200k | $2.00 | $8.00 | reasoning, tools, vision |
| anthropic/claude-opus-4.5 | 200k | $5.00 | $25.00 | reasoning, tools, vision |
| anthropic/claude-sonnet-4.5 | 200k | $3.00 | $15.00 | reasoning, tools, vision |
| anthropic/claude-haiku-4.5 | 200k | $1.00 | $5.00 | tools, vision |
| xai/grok-4 | 256k | $3.00 | $15.00 | reasoning, tools, vision |
| xai/grok-4-fast | 2,000k | $0.20 | $0.50 | tools, vision |
| google/gemini-2.5-pro | 1,048,576 | $1.25 | $10.00 | reasoning, tools, vision |
| google/gemini-2.5-flash | 1,048,576 | $0.30 | $2.50 | reasoning, tools, vision |
Two ways to pay. Neither involves a sales call.
Prepaid credits settle at published per-token rates. BYOK passes through at 0%.