# Responses
Source: https://zerogpu.mintlify.app/api-reference/endpoint/responses
Send input to an AI model and receive a response.
## `POST /v1/responses`
Send a list of input messages to an AI model and receive a generated response.
### Request headers
| Header | Type | Required | Description |
| -------------- | ------ | -------- | -------------------------- |
| `x-api-key` | string | Yes | Your ZeroGPU API key |
| `x-project-id` | string | Yes | Your project UUID |
| `content-type` | string | Yes | Must be `application/json` |
### Request body
| Parameter | Type | Required | Description |
| --------- | ------ | -------- | ---------------------------------------------------- |
| `model` | string | Yes | The model identifier (available from your dashboard) |
| `input` | array | Yes | Array of input message objects |
| `text` | object | No | Response format configuration |
#### Input message object
| Field | Type | Description |
| --------- | ------ | -------------------------------------------------- |
| `role` | string | The role of the message author: `user` or `system` |
| `content` | string | The content of the message |
#### Text format object
| Field | Type | Description |
| ------------------ | ------ | ----------------------------------- |
| `text.format.type` | string | Response format type (e.g., `text`) |
### Example request
```bash cURL theme={null}
curl --location 'https://api.zerogpu.ai/v1/responses' \
  --header 'content-type: application/json' \
  --header 'x-api-key: YOUR_API_KEY' \
  --header 'x-project-id: YOUR_PROJECT_ID' \
  --data '{
    "model": "YOUR_MODEL",
    "input": [
      {
        "role": "user",
        "content": "Your input text here..."
      }
    ],
    "text": {
      "format": {
        "type": "text"
      }
    }
  }'
```
```python Python theme={null}
import requests

url = "https://api.zerogpu.ai/v1/responses"

headers = {
    "content-type": "application/json",
    "x-api-key": "YOUR_API_KEY",
    "x-project-id": "YOUR_PROJECT_ID",
}

payload = {
    "model": "YOUR_MODEL",
    "input": [
        {
            "role": "user",
            "content": "Your input text here...",
        }
    ],
    "text": {
        "format": {
            "type": "text"
        }
    },
}

response = requests.post(url, headers=headers, json=payload)
print(response.json())
```
```javascript JavaScript theme={null}
const url = 'https://api.zerogpu.ai/v1/responses';

const headers = {
  'content-type': 'application/json',
  'x-api-key': 'YOUR_API_KEY',
  'x-project-id': 'YOUR_PROJECT_ID'
};

const payload = {
  model: 'YOUR_MODEL',
  input: [
    {
      role: 'user',
      content: 'Your input text here...'
    }
  ],
  text: {
    format: {
      type: 'text'
    }
  }
};

fetch(url, {
  method: 'POST',
  headers,
  body: JSON.stringify(payload)
})
  .then(response => response.json())
  .then(data => console.log(data))
  .catch(error => console.error('Error:', error));
```
### Example response
```json theme={null}
{
  "id": "resp_abc123",
  "object": "response",
  "created": 1710000000,
  "model": "your-selected-model",
  "output": [
    {
      "type": "message",
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          "text": "The generated response from the model..."
        }
      ]
    }
  ],
  "usage": {
    "input_tokens": 24,
    "output_tokens": 32,
    "total_tokens": 56
  }
}
```
### Response fields
| Field | Type | Description |
| ------------------------- | ------- | ----------------------------------------------- |
| `id` | string | Unique identifier for the response |
| `object` | string | Object type (`response`) |
| `created` | integer | Unix timestamp of when the response was created |
| `model` | string | The model used for inference |
| `output` | array | Array of output message objects |
| `output[].role` | string | Always `assistant` |
| `output[].content[].text` | string | The generated text response |
| `usage` | object | Token usage statistics |
| `usage.input_tokens` | integer | Number of tokens in the input |
| `usage.output_tokens` | integer | Number of tokens generated |
| `usage.total_tokens` | integer | Total tokens consumed |
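In client code, the nested `output` array usually needs unwrapping before you can use the text. A minimal sketch (the `extract_text` helper name is ours; the field names follow the response shape documented above):

```python
def extract_text(response_body: dict) -> str:
    """Concatenate every output_text part from a /v1/responses body."""
    parts = []
    for message in response_body.get("output", []):
        for item in message.get("content", []):
            if item.get("type") == "output_text":
                parts.append(item.get("text", ""))
    return "".join(parts)
```

The `.get(..., [])` defaults make the helper return an empty string for error bodies instead of raising.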
### Context length
Each model has a maximum input token limit. If your input exceeds it:
* The API may return **`420`** with `error.code` **`context_length_exceeded`** when the model is configured to reject over-length input.
* Otherwise the input may be truncated to the limit and the response will include usage for the truncated input.
Keep requests within the model's token limit or handle `420` and truncation in your client.
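Handling both failure modes in a client might look like the sketch below. This is illustrative: `send_with_truncation` and the character budget are our own names, and truncating by characters is only a rough proxy for tokens, so check your model's actual limit in the dashboard.

```python
import requests

API_URL = "https://api.zerogpu.ai/v1/responses"

def is_context_length_error(status_code: int, body: dict) -> bool:
    """True when the API rejected the request for over-length input."""
    return status_code == 420 and body.get("error", {}).get("code") == "context_length_exceeded"

def send_with_truncation(headers: dict, model: str, content: str, max_chars: int = 8000) -> dict:
    """Send a request; on a context-length rejection, retry once with shorter input.

    max_chars is an illustrative character budget, not an official limit.
    """
    body: dict = {}
    for attempt_content in (content, content[:max_chars]):
        resp = requests.post(API_URL, headers=headers, json={
            "model": model,
            "input": [{"role": "user", "content": attempt_content}],
            "text": {"format": {"type": "text"}},
        })
        try:
            body = resp.json()
        except ValueError:  # non-JSON error body
            body = {}
        if not is_context_length_error(resp.status_code, body):
            return body
    return body
```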
# API Reference
Source: https://zerogpu.mintlify.app/api-reference/introduction
ZeroGPU Responses API documentation.
ZeroGPU provides a REST API for running AI model inference. Send text inputs to models and receive structured responses.
## Base URL
```
https://api.zerogpu.ai/v1
```
## Authentication
All requests require two headers:
| Header | Description |
| -------------- | --------------------------------- |
| `x-api-key` | Your ZeroGPU API key (`zgpu-...`) |
| `x-project-id` | Your project identifier (UUID) |
| `content-type` | Must be `application/json` |
See [Authentication](/platform/authentication) for details.
## Available endpoints
`POST /v1/responses` — Send input to an AI model and receive a response.
## Available models
| Model | Use case |
| --------------------------- | -------------------------- |
| `zlm-v1-summary-cloud` | Text summarization |
| `zlm-v1-iab-classify-cloud` | IAB content classification |
Select the model in your [dashboard](https://zerogpu.ai) and pass its identifier in the `model` field of your request.
## Error codes
| Status | Meaning |
| ------ | --------------------------------------------------------------------------------------------------------- |
| `200` | Success |
| `400` | Bad request — check your request body |
| `401` | Unauthorized — invalid or missing API key |
| `403` | Forbidden — invalid project ID or insufficient permissions |
| `420` | Input exceeds the model's token limit — see [Responses](/api-reference/endpoint/responses#context-length) |
| `500` | Internal server error — retry with exponential backoff |
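The backoff advice for `500` can be sketched as follows; function names and delay values are illustrative, not part of the API.

```python
import random
import time
import requests

def backoff_delay(attempt: int) -> float:
    """Exponential delay before the next retry: 1s, 2s, 4s, 8s, ..."""
    return float(2 ** attempt)

def post_with_backoff(url: str, headers: dict, payload: dict, max_retries: int = 4):
    """POST, retrying only on 5xx responses per the error table above.

    4xx responses (including 420) are returned immediately so the caller
    can fix the request instead of hammering the API.
    """
    resp = None
    for attempt in range(max_retries + 1):
        resp = requests.post(url, headers=headers, json=payload)
        if resp.status_code < 500 or attempt == max_retries:
            return resp
        time.sleep(backoff_delay(attempt) + random.random())  # jitter avoids thundering herd
    return resp
```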
## SDK examples
Integration examples are available for Python, JavaScript, Rust, Go, and Ruby; see the request examples on the [Responses](/api-reference/endpoint/responses) page.
# Distributed Inference
Source: https://zerogpu.mintlify.app/concepts/distributed-inference
Run AI on idle devices at the edge instead of GPU data centers.
Traditional inference: every request goes to a GPU data center, regardless of task complexity or user location. You pay for reserved capacity even when it's idle.
ZeroGPU: requests run on a distributed network of edge devices — laptops, phones, servers, browsers — using Nano Language Models that don't need GPUs.
## Why centralized inference is expensive
| Problem | Cost |
| -------------------- | ------------------------------------------------------------- |
| **Traffic spikes** | Over-provision GPUs or accept latency spikes |
| **Oversized models** | LLMs consume GPU resources for tasks that don't need them |
| **Regional egress** | Data round-trips to distant data centers add latency and fees |
| **Idle capacity** | Reserved instances cost money 24/7 |
## How ZeroGPU distributes it
1. **Your app** sends a request to ZeroGPU
2. **Router** picks the best edge node by location, capacity, and model availability
3. **Edge device** runs the NLM and returns the result
4. **Cloud fallback** catches requests when no edge node is available
## What you get
* **Scale horizontally** — more devices = more capacity, no GPU procurement
* **Pay for usage** — not reserved instances sitting idle
* **Lower latency** — inference runs near the user, not across the country
* **Resilience** — no single point of failure; traffic reroutes automatically
**Trade-off:** Edge inference depends on device availability. ZeroGPU mitigates this with automatic cloud fallback — your app never notices the difference.
* [Geo-Aware Routing](/concepts/geo-aware-routing) — how the router picks the optimal node.
* [Quickstart](/quickstart) — send your first distributed inference request.
# Geo-Aware Routing
Source: https://zerogpu.mintlify.app/concepts/geo-aware-routing
Every request goes to the nearest capable device. Fallback to cloud if needed.
The router decides which edge device handles each request. It evaluates four signals in milliseconds:
| Signal | What it optimizes |
| ------------------------ | ------------------------------------------- |
| **Geographic proximity** | Lowest network latency |
| **Device capability** | Enough compute for the requested model |
| **Current load** | Avoids overloaded nodes |
| **Model availability** | Routes to nodes with the NLM already cached |
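The router's internals are not public, but combining the four signals might look like this illustrative sketch; the field names, weights, and scoring rule are invented for illustration only.

```python
def pick_node(nodes: list[dict], model: str):
    """Illustrative only: score candidate edge nodes on the four signals.

    A node must have the model cached and spare capacity; among the
    remaining candidates, lower latency and lower load win.
    """
    def score(node: dict) -> float:
        if model not in node["cached_models"] or node["load"] >= 1.0:
            return float("-inf")  # cannot serve this request
        return -node["latency_ms"] - 50 * node["load"]

    best = max(nodes, key=score, default=None)
    if best is None or score(best) == float("-inf"):
        return "cloud-fallback"  # no capable edge node: use cloud replicas
    return best["id"]
```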
## Request flow
1. The API gateway extracts the model identifier, payload size, and origin IP.
2. The router queries the network topology; the best node that satisfies all constraints wins.
3. The request is forwarded to an edge device, where the NLM processes the input and returns the result.
4. If no edge node responds in time, the request goes to cloud infrastructure. Same response format, so your app doesn't know the difference.
## Cloud fallback
Availability guarantee. If the edge network can't serve a request — capacity, model, or device constraints — cloud-hosted replicas handle it transparently.
**Trade-off:** Cloud fallback may have slightly higher latency than edge, but it ensures 100% availability. Your integration code doesn't change either way.
* [Distributed Inference](/concepts/distributed-inference) — the full architecture behind edge compute.
* [Responses API](/api-reference/endpoint/responses) — endpoint spec for `/v1/responses`.
# Nano Language Models
Source: https://zerogpu.mintlify.app/concepts/nano-language-models
Sub-1B parameter models that run on CPUs and cost a fraction of LLMs.
Most production AI traffic is classification, extraction, routing, and moderation — not creative writing or multi-step reasoning. These tasks don't need 70B parameters. They need something fast, cheap, and predictable.
That's what Nano Language Models (NLMs) are built for.
## NLMs vs LLMs
| | LLMs | NLMs |
| -------------- | --------------------- | ----------------------------------- |
| **Parameters** | 7B – 400B+ | Sub-1B |
| **Runs on** | GPU clusters | CPU, mobile, browser |
| **Output** | Variable | Predictable, task-specific |
| **Cost** | High | Low |
| **Latency** | 100ms – seconds | Single-digit milliseconds |
| **Best for** | Open-ended generation | Classification, extraction, routing |
## What NLMs handle well
* **Content classification** — categorize into taxonomies at scale
* **Intent routing** — map user queries to the right handler
* **Entity extraction** — pull names, dates, amounts from unstructured text
* **Content moderation** — flag violations in real time
* **Summarization** — condense documents and conversations
* **Sentiment analysis** — positive/negative/neutral at high throughput
## "Why not just use a small LLM?"
Different architecture, different goals:
1. **Single-task fine-tuning** — every parameter optimized for one job
2. **CPU-native** — quantized and compiled for edge hardware, not adapted from GPU-first designs
3. **Deterministic output** — consistent results production systems can rely on
**Trade-off:** NLMs can't do open-ended generation or complex reasoning. Use LLMs for that. Use NLMs for the high-volume, well-defined tasks that make up 80%+ of production AI traffic.
## Available models
| Model | Use case |
| --------------------------- | -------------------------- |
| `zlm-v1-summary-cloud` | Text summarization |
| `zlm-v1-iab-classify-cloud` | IAB content classification |
Choose the model in your [dashboard](https://zerogpu.ai) and use its identifier in the `model` field when calling the API.
Send requests to NLMs via `/v1/responses`.
# Batch requests
Source: https://zerogpu.mintlify.app/cookbook/batch-requests
Send multiple API requests efficiently with parallelism and error handling.
When you need to process many items (e.g. summarize 100 articles or classify a list of snippets), send requests in parallel and handle errors so one failure doesn't block the rest.
## Pattern: parallel requests with a pool
```python Python theme={null}
import requests
from concurrent.futures import ThreadPoolExecutor, as_completed

url = "https://api.zerogpu.ai/v1/responses"

headers = {
    "content-type": "application/json",
    "x-api-key": "YOUR_API_KEY",
    "x-project-id": "YOUR_PROJECT_ID",
}

def one_request(content: str, model: str):
    payload = {
        "model": model,
        "input": [{"role": "user", "content": content}],
        "text": {"format": {"type": "text"}},
    }
    r = requests.post(url, headers=headers, json=payload)
    r.raise_for_status()
    return r.json()

texts = ["First text...", "Second text..."]  # your inputs
model = "zlm-v1-summary-cloud"  # or zlm-v1-iab-classify-cloud

results = []
with ThreadPoolExecutor(max_workers=5) as executor:
    futures = {executor.submit(one_request, t, model): t for t in texts}
    for future in as_completed(futures):
        try:
            results.append(future.result())
        except requests.RequestException as e:
            # log and skip or retry
            print(f"Failed: {e}")
```
## Tips
* **Concurrency:** Start with a small `max_workers` (e.g. 5) and tune it against the error rates and latency you see in the dashboard.
* **Errors:** Check status codes and response body; retry with backoff on 5xx.
* **Credentials:** Use env vars for `x-api-key` and `x-project-id`; see [Security](/platform/security).
For a single request shape, see [Summarize text](/cookbook/summarize-text) or [IAB classification](/cookbook/iab-classification). For a runnable Node.js batch demo, see the [Batch requests demo](/cookbook/demos#batch-requests-nodejs) in the cookbook.
# Runnable demos
Source: https://zerogpu.mintlify.app/cookbook/demos
Clone and run full apps that use the ZeroGPU API.
The [zerogpu/cookbook](https://github.com/zerogpu/cookbook) repo contains runnable demos: small apps you can clone, install, and run locally. Each demo lives in a dedicated folder under **`demos/`**.
## Repository structure
```
cookbook/
├── README.md        # Overview and how to add demos
├── .gitignore       # Ignores .env, node_modules, dist
└── demos/
    ├── summarize-react/             # React summarization app
    ├── iab-classification-react/    # React IAB classification app
    ├── batch-requests-node/         # Node.js parallel batch requests
    ├── quickstart-python/           # Python one-shot request
    └── ...
```
**Naming:** Demos use descriptive folder names (e.g. `summarize-react`, `batch-requests-node`). See the repo [README](https://github.com/zerogpu/cookbook#structure) for the full layout and how to add a new demo.
## Available demos
### Summarize (React)
A minimal React app that summarizes text with the ZeroGPU API using the `zlm-v1-summary-cloud` model.
* **Stack:** React, Vite, TypeScript
* **Repo path:** [demos/summarize-react](https://github.com/zerogpu/cookbook/tree/main/demos/summarize-react)
* **Features:** Enter API key and project ID in the UI (stored in the browser only); paste or type text and click **Summarize** to see the result and token usage.
**Run it:**
```bash theme={null}
git clone https://github.com/zerogpu/cookbook.git
cd cookbook/demos/summarize-react
npm install
npm run dev
```
See the demo [README](https://github.com/zerogpu/cookbook/blob/main/demos/summarize-react/README.md).
### IAB classification (React)
Classify content into IAB categories with `zlm-v1-iab-classify-cloud`. Same credential-in-UI pattern; paste content and click **Classify**. The app parses the JSON response and shows **Audience**, **Content** (IAB 1.0 / 2.2), **Topics**, **Keywords**, and **User intent** with a **Copy result** button. Works best with a paragraph or more of content.
* **Repo path:** [demos/iab-classification-react](https://github.com/zerogpu/cookbook/tree/main/demos/iab-classification-react)
* **Run it:** `cd cookbook/demos/iab-classification-react` → `npm install` → `npm run dev`. See the demo [README](https://github.com/zerogpu/cookbook/blob/main/demos/iab-classification-react/README.md).
### Batch requests (Node.js)
Node.js script that sends multiple summarization requests in parallel. Set `ZEROGPU_API_KEY` and `ZEROGPU_PROJECT_ID` in the environment; run `npm start` or `node run.js`.
* **Repo path:** [demos/batch-requests-node](https://github.com/zerogpu/cookbook/tree/main/demos/batch-requests-node)
* **Run it:** `cd cookbook/demos/batch-requests-node` → set env vars → `npm start`. See the demo [README](https://github.com/zerogpu/cookbook/blob/main/demos/batch-requests-node/README.md).
### Quickstart (Python)
Minimal Python script: one request to the API, print response and token usage. Env vars for credentials.
* **Repo path:** [demos/quickstart-python](https://github.com/zerogpu/cookbook/tree/main/demos/quickstart-python)
* **Run it:** `cd cookbook/demos/quickstart-python` → `pip install -r requirements.txt` → set env vars → `python run.py`. See the demo [README](https://github.com/zerogpu/cookbook/blob/main/demos/quickstart-python/README.md).
## Security
* Demos never commit `.env` or real API keys; the repo `.gitignore` excludes them.
* Credentials entered in the browser (e.g. in the React demo) are for local try-it use only. For production, call the ZeroGPU API from a backend so your API key is not exposed in client code.
## More recipes
For copy-paste **code snippets** (cURL, Python, etc.) rather than full apps, see the other cookbook pages:
* [Summarize text](/cookbook/summarize-text) — request/response for summarization
* [IAB content classification](/cookbook/iab-classification) — classify content into IAB categories
* [Batch requests](/cookbook/batch-requests) — parallel requests and error handling
# IAB content classification
Source: https://zerogpu.mintlify.app/cookbook/iab-classification
Classify content into IAB categories with the classification model.
Use the `zlm-v1-iab-classify-cloud` model to classify content into IAB (Interactive Advertising Bureau) categories.
## Request
```bash cURL theme={null}
curl --location 'https://api.zerogpu.ai/v1/responses' \
  --header 'content-type: application/json' \
  --header 'x-api-key: YOUR_API_KEY' \
  --header 'x-project-id: YOUR_PROJECT_ID' \
  --data '{
    "model": "zlm-v1-iab-classify-cloud",
    "input": [
      {
        "role": "user",
        "content": "Article or content to classify..."
      }
    ],
    "text": {
      "format": { "type": "text" }
    }
  }'
```
```python Python theme={null}
import requests

url = "https://api.zerogpu.ai/v1/responses"

headers = {
    "content-type": "application/json",
    "x-api-key": "YOUR_API_KEY",
    "x-project-id": "YOUR_PROJECT_ID",
}

payload = {
    "model": "zlm-v1-iab-classify-cloud",
    "input": [{"role": "user", "content": "Article or content to classify..."}],
    "text": {"format": {"type": "text"}},
}

response = requests.post(url, headers=headers, json=payload)
print(response.json())
```
## Response
The API returns the classification in the standard [Responses API](/api-reference/endpoint/responses) format. The model output is **JSON** with:
| Field | Description |
| ------------- | ----------------------------------------------------------------------------------------- |
| `audience` | Audience segments (name, tier, score). |
| `content` | Taxonomy mappings, e.g. `iab_1_0`, `iab_2_2` (IAB 1.0 and 2.2 codes with name and score). |
| `topics` | Topics with scores. |
| `keywords` | Extracted keywords. |
| `user_intent` | Intent name, category (e.g. informational), and score. |
Parse the `output[].content[].text` from the response body; that string is the JSON. For a runnable app that formats this output, see the [IAB classification demo](/cookbook/demos#iab-classification-react) in the cookbook.
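Since the classification arrives as a JSON string inside `output[].content[].text`, a small parser helps. A sketch (the helper name is ours; the field names follow the table above, and the exact nesting may vary by model version):

```python
import json

def parse_classification(response_body: dict) -> dict:
    """Decode the JSON classification embedded in output[].content[].text."""
    for message in response_body.get("output", []):
        for item in message.get("content", []):
            if item.get("type") == "output_text":
                return json.loads(item["text"])
    raise ValueError("no output_text in response")
```

The returned dict then exposes the documented fields, e.g. `result["topics"]`, `result["keywords"]`, `result["user_intent"]`.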
## Tips
* Best for single-document or single-snippet classification; for many items, call in a loop or use [batch patterns](/cookbook/batch-requests).
* Classification works best with a **paragraph or more** of content; very short text often yields generic categories.
* Store API key and project ID in environment variables; see [Security](/platform/security).
# Cookbook
Source: https://zerogpu.mintlify.app/cookbook/index
Ready-to-use recipes for common ZeroGPU API tasks.
Copy, adapt, and run. These recipes show how to accomplish specific tasks with the ZeroGPU API.
* [Runnable demos](/cookbook/demos) — clone and run full apps from the cookbook repo (e.g. the React summarization demo).
* [Summarize text](/cookbook/summarize-text) — use the summarization model to shorten long text.
* [IAB content classification](/cookbook/iab-classification) — classify content into IAB categories with the classification model.
* [Batch requests](/cookbook/batch-requests) — send multiple requests efficiently (parallelism, error handling).
Use your own **API key**, **project ID**, and **model** from the [dashboard](https://zerogpu.ai). See [Authentication](/platform/authentication) and [Quickstart](/quickstart) if you haven't set up yet.
# Summarize text
Source: https://zerogpu.mintlify.app/cookbook/summarize-text
Use the summarization model to shorten long content.
Use the `zlm-v1-summary-cloud` model to produce short summaries from longer text.
## Request
```bash cURL theme={null}
curl --location 'https://api.zerogpu.ai/v1/responses' \
  --header 'content-type: application/json' \
  --header 'x-api-key: YOUR_API_KEY' \
  --header 'x-project-id: YOUR_PROJECT_ID' \
  --data '{
    "model": "zlm-v1-summary-cloud",
    "input": [
      {
        "role": "user",
        "content": "Your long text to summarize here..."
      }
    ],
    "text": {
      "format": { "type": "text" }
    }
  }'
```
```python Python theme={null}
import requests

url = "https://api.zerogpu.ai/v1/responses"

headers = {
    "content-type": "application/json",
    "x-api-key": "YOUR_API_KEY",
    "x-project-id": "YOUR_PROJECT_ID",
}

payload = {
    "model": "zlm-v1-summary-cloud",
    "input": [{"role": "user", "content": "Your long text to summarize here..."}],
    "text": {"format": {"type": "text"}},
}

response = requests.post(url, headers=headers, json=payload)
print(response.json())
```
## Response
The response body includes the model output in the structure returned by the [Responses API](/api-reference/endpoint/responses). Extract the summary text from the response and handle errors (e.g. 4xx/5xx) in your code.
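Extracting the summary and surfacing errors might look like this sketch; the helper names are ours, and the response shape follows the Responses API reference.

```python
import requests

API_URL = "https://api.zerogpu.ai/v1/responses"

def first_output_text(body: dict) -> str:
    """Pull the first output_text string out of a response body."""
    return body["output"][0]["content"][0]["text"]

def summarize(text: str, headers: dict) -> str:
    """Send one summarization request and return the summary string."""
    resp = requests.post(API_URL, headers=headers, json={
        "model": "zlm-v1-summary-cloud",
        "input": [{"role": "user", "content": text}],
        "text": {"format": {"type": "text"}},
    })
    resp.raise_for_status()  # raise on 4xx/5xx so callers can retry or log
    return first_output_text(resp.json())
```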
## Tips
* Keep input length within the model's limits; for very long documents, consider chunking and summarizing in steps.
* Use [Logs](/platform/logs) and [Usage](/platform/usage-analytics) in the dashboard to debug and monitor usage.
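The chunk-and-summarize approach from the tips above can be sketched as follows. The paragraph-based splitter and the character budget are illustrative (characters are a rough stand-in for tokens, and a single oversized paragraph is kept whole in this sketch), and `summarize` stands for any helper that wraps a single `/v1/responses` call.

```python
def chunk_text(text: str, max_chars: int = 4000) -> list[str]:
    """Split on paragraph boundaries, keeping each chunk under max_chars.

    Pick a budget well under your model's limit (see the dashboard).
    """
    chunks, current = [], ""
    for para in text.split("\n\n"):
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks

# Map-reduce style summarization, with summarize(text) as a hypothetical
# single-request helper using zlm-v1-summary-cloud:
#   partials = [summarize(chunk) for chunk in chunk_text(long_document)]
#   final = summarize("\n\n".join(partials))
```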
For a runnable React app, see the [Summarize demo](/cookbook/demos#summarize-react) in the cookbook.
# FAQ
Source: https://zerogpu.mintlify.app/faq
Quick answers to common questions.
**What is ZeroGPU?**
An API for AI model inference. You send requests, get responses. No GPUs to manage — ZeroGPU handles model hosting, scaling, and routing.

**How is it different from traditional inference?**
Inference runs on distributed edge devices using Nano Language Models (sub-1B parameters) instead of GPU clusters. You pay per request, not for reserved capacity.

**What tasks is it good for?**
Summarization, classification, entity extraction, content moderation, intent routing, sentiment analysis — the high-volume tasks that make up most production AI traffic.

**Which models are available?**
Two models: **zlm-v1-summary-cloud** (summarization) and **zlm-v1-iab-classify-cloud** (IAB content classification). Select one in your [dashboard](https://zerogpu.ai) and pass it in the `model` field.

**How do I call the API?**
One endpoint: `POST https://api.zerogpu.ai/v1/responses`. Add your API key and project ID as headers. SDKs available for Python, JavaScript, Rust, Go, Ruby. See the [Quickstart](/quickstart).

**What latency should I expect?**
Depends on the model and workload. Monitor real-time latency in [Usage Analytics](/platform/usage-analytics).

**Is it production-ready?**
Yes. Project isolation, API key management, request logging, usage analytics, and automatic cloud fallback for availability.

**What if I lose my API key?**
Keys are shown once at creation. Revoke the lost key in [API Keys](/platform/api-keys) and create a new one.

**Can I separate dev and production?**
Yes. Create separate projects in your organization. Each gets its own keys, logs, and analytics. See [Organizations & Projects](/platform/organizations-and-projects).

**How do I monitor usage?**
[Usage Analytics](/platform/usage-analytics) for trends (tokens, requests, latency). [Logs](/platform/logs) for individual request detail.
# Introduction
Source: https://zerogpu.mintlify.app/index
Run AI model inference through an API — without managing GPU infrastructure.
Send a POST request, get an AI response. No GPUs to provision, no infrastructure to manage.
ZeroGPU handles model hosting, scaling, and routing. You get an API key and start building.
The [Quickstart](/quickstart) takes three steps: get credentials, send a request, see the response.
## What you get
* `POST /v1/responses` — send text in, get AI output back. Six SDKs included.
* Track every request: token counts, latency, volume, model distribution.
* Separate dev, staging, and production with independent keys, logs, and analytics.
## How it works
1. One organization, multiple projects. Each gets its own API key and dashboard.
2. POST to `/v1/responses` with your key, project ID, and model. The response comes back as structured JSON.
3. Token usage, request volume, latency, and error rates are all visible in the dashboard. Debug individual requests in Logs.
## Go deeper
* [Quickstart](/quickstart) — credentials → first request → response. Under 5 minutes.
* [Responses API](/api-reference/endpoint/responses) — full endpoint spec with request/response schemas.
* SDK examples — copy-paste examples for Python, JavaScript, Rust, Go, Ruby.
* [Dashboard](/platform/dashboard) — your API key, project ID, and ready-to-use code snippet.
# API Keys
Source: https://zerogpu.mintlify.app/platform/api-keys
Create, rotate, revoke. Keys are shown once — copy immediately.
API keys authenticate your requests. Create them from the dashboard, store them securely, revoke when compromised.
## Lifecycle
```
Created (shown once) → Active → Revoked (permanent)
```
**The full key is only displayed at creation.** Copy it immediately. If you lose it, revoke and create a new one.
## Key format
```
zgpu-xxxx...xxxx
```
Partially masked in the dashboard after creation.
## Actions
| Action | What happens |
| ---------- | ------------------------------------------ |
| **Create** | New key generated, shown once in full |
| **View** | See masked keys and creation dates |
| **Revoke** | Key disabled immediately, cannot be undone |
## Rules of thumb
* **One key per environment** — dev, staging, production each get their own
* **Rotate on a schedule** — create new key, update your app, revoke old key
* **Revoke immediately if exposed** — don't wait
Store credentials in environment variables:

```bash theme={null}
export ZEROGPU_API_KEY="YOUR_API_KEY"
export ZEROGPU_PROJECT_ID="YOUR_PROJECT_ID"
```
Lost your key? You can't recover it. Revoke and create a new one.
# Authentication
Source: https://zerogpu.mintlify.app/platform/authentication
Two headers on every request. That's it.
Every API request needs two headers: your API key and your project ID. No OAuth, no tokens to refresh.
## Required headers
| Header | Value |
| -------------- | ------------------------------------ |
| `x-api-key` | Your API key from the dashboard |
| `x-project-id` | Your project UUID from the dashboard |
| `content-type` | `application/json` |
## Example
```bash theme={null}
curl --location 'https://api.zerogpu.ai/v1/responses' \
  --header 'content-type: application/json' \
  --header 'x-api-key: YOUR_API_KEY' \
  --header 'x-project-id: YOUR_PROJECT_ID' \
  --data '{ ... }'
```
## Where to find credentials
1. Log in at [zerogpu.ai](https://zerogpu.ai)
2. Select your project
3. API Key and Project ID are on the dashboard
## When it goes wrong
| Status | Cause | Fix |
| ------ | --------------------------------------- | ---------------------------------------------- |
| `401` | Missing or invalid API key | Check `x-api-key` — did you copy the full key? |
| `403` | Invalid project ID or wrong permissions | Verify `x-project-id` matches the dashboard |
API keys go in server-side code only. Never in frontend JavaScript, mobile apps, or git repos. See [Security](/platform/security).
# Billing
Source: https://zerogpu.mintlify.app/platform/billing
Pay as you go, credit balance, auto recharge, and billing history.
Manage payment for ZeroGPU from the **Billing** page in the [dashboard](https://zerogpu.ai) (under **MANAGE** in the sidebar).
## Plan
ZeroGPU uses a **pay-as-you-go** plan: you're billed based on usage. Pricing is per input and output token and varies by model; rates are shown in the dashboard and in the [cost calculator](https://zerogpu-calculator.vercel.app).
## Credit balance
The Billing page shows your **credit balance** (e.g. **\$5.00**). API usage is paid from this balance.
* **+ Add to credit balance** opens a modal where you add funds.
### Add to credit balance (modal)
* **Amount (USD)** — Enter the amount to add. Preset buttons are available (e.g. \$10, \$25, \$50, \$100).
* **Payment method** — You can add or select a payment method (e.g. **+ Add new payment method** with a dropdown to choose an existing one).
* **Card details** (when adding a new card):
* **Card number**
* **Expiration date** (MM / YY)
* **Security code** (CVC)
* **Country** (billing country dropdown)
* Checkout is secured (e.g. “Secure, fast checkout with Link” in the UI). Complete the form and confirm to add credit; the amount is then applied to your balance.
## Auto recharge
On the Billing page, **Auto recharge** has a toggle and a short description (e.g. “Enable to automatically top up your credit balance”). When the toggle is **on**, your balance can be topped up automatically according to your account settings. When it is **off**, you add credit manually only.
## Payment methods
The Billing page has a **Payment methods** card (e.g. “Add or change your payment method”). Use it to add, change, or manage the payment methods used for adding credit and for auto recharge. The exact options (add, set default, remove) are shown in the dashboard.
## Billing history
A **Billing history** card on the Billing page (e.g. “View past and current invoices”) opens a **Billing History** view.
* A **Back to Billing** link returns you to the main Billing page.
* A table lists billing entries with these columns:
| Column | Description |
| --------------- | -------------------------------------------------------------- |
| **Date** | Date and time of the transaction (e.g. 3/11/2026, 3:12:47 PM). |
| **Description** | Type of entry (e.g. **Credit** for an add-credit transaction). |
| **Amount** | Amount with sign (e.g. **+\$5.00** for credit added). |
| **Status** | Status of the entry (e.g. **Credit** shown in a pill/badge). |
Use this view to see when credit was added and the status of past transactions. For export, download, or invoice format, use the options provided in the dashboard.
Use [Usage Analytics](/platform/usage-analytics) to see token usage over time. Use the [cost calculator](https://zerogpu-calculator.vercel.app) to compare OpenAI vs ZeroGPU for your expected volume.
# Dashboard
Source: https://zerogpu.mintlify.app/platform/dashboard
Your API key, project ID, and a ready-to-copy code snippet — all in one place.
The dashboard shows everything you need to start making requests: your organization, project, API key, project ID, and a working code snippet.
## At a glance
| Field | What it is |
| ---------------- | ------------------------------------------ |
| **Organization** | Top-level container (your team or company) |
| **Project** | The active workload you're working in |
| **API Key** | Authentication credential for requests |
| **Project ID** | UUID that scopes requests to this project |
| **Code Snippet** | Copy-paste cURL command, ready to run |
## Ready-to-use snippet
The dashboard generates this for you — copy it, replace the content, run it:
```bash theme={null}
curl --location 'https://api.zerogpu.ai/v1/responses' \
  --header 'content-type: application/json' \
  --header 'x-api-key: YOUR_API_KEY' \
  --header 'x-project-id: YOUR_PROJECT_ID' \
  --data '{
    "model": "YOUR_MODEL",
    "input": [
      {
        "role": "user",
        "content": "Your input text here..."
      }
    ],
    "text": {
      "format": {
        "type": "text"
      }
    }
  }'
```
## From here
Token counts, request volume, latency trends.
Every request: model, status, latency, timestamp.
Create, revoke, rotate credentials.
Switch projects or create new ones.
# Logs
Source: https://zerogpu.mintlify.app/platform/logs
Every API request, searchable: model, status, latency, timestamp.
Full request history for your project. Find failed requests, track latency, audit which models are being called.
## What each entry shows
| Field | Example |
| ------------- | ---------------------------------------- |
| **Model** | The model that handled the request |
| **Status** | `200`, `401`, `403`, `400`, `420`, `500` |
| **Latency** | Time from request to response |
| **Timestamp** | When the request was made |
## Three ways to use logs
**Debug a failure** — Filter by 4xx/5xx status codes. Check the request payload for malformed input or missing headers.
**Investigate latency** — Sort by latency. If slow requests correlate with high volume, you may need to adjust your request patterns.
**Audit usage** — See which models are called, how often, and from which API keys. Useful for cost tracking across teams.
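The first two workflows are simple filters over the entry fields. A minimal Python sketch, using made-up records shaped like the table above (the dashboard is the real source; these lists are illustrative only):

```python
# Made-up entries mirroring the dashboard fields: model, status, latency, timestamp.
entries = [
    {"model": "model-a", "status": 200, "latency_ms": 120, "timestamp": "2025-01-01T00:00:00Z"},
    {"model": "model-a", "status": 401, "latency_ms": 15, "timestamp": "2025-01-01T00:01:00Z"},
    {"model": "model-b", "status": 200, "latency_ms": 980, "timestamp": "2025-01-01T00:02:00Z"},
    {"model": "model-b", "status": 500, "latency_ms": 30, "timestamp": "2025-01-01T00:03:00Z"},
]

# Debug a failure: keep only 4xx/5xx entries.
failures = [e for e in entries if e["status"] >= 400]

# Investigate latency: successful requests, slowest first.
slowest = sorted(
    (e for e in entries if e["status"] == 200),
    key=lambda e: e["latency_ms"],
    reverse=True,
)

print([e["status"] for e in failures])  # [401, 500]
print(slowest[0]["model"])              # model-b, the 980 ms request
```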
Logs show individual requests. [Usage Analytics](/platform/usage-analytics) shows the trends. Use both.
# Organizations & Projects
Source: https://zerogpu.mintlify.app/platform/organizations-and-projects
Separate dev from production. Give each team its own keys and logs.
Two levels: **organizations** contain **projects**. Each project gets its own API keys, analytics, and logs.
## Organizations
Your top-level container — one per team or company. Holds multiple projects.
## Projects
Where the work happens. Each project is fully isolated:
* **Own API keys** — revoke dev keys without touching production
* **Own analytics** — see token usage per environment
* **Own logs** — debug staging without production noise
## Recommended setup
| Organization | Project | Use |
| ------------ | ------------- | ------------------- |
| `my-company` | `production` | Live traffic |
| `my-company` | `staging` | Pre-release testing |
| `my-company` | `development` | Local experiments |
Always separate dev and production. One leaked dev key shouldn't compromise your production environment.
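Applied in code, the same split means credentials are looked up per environment. A sketch, assuming per-environment variable names like `ZEROGPU_API_KEY_PRODUCTION` (an illustrative convention, not something the API requires):

```python
import os

def credentials(environment: str) -> dict[str, str]:
    """Return the auth headers for one environment (development/staging/production).

    Assumes variables like ZEROGPU_API_KEY_PRODUCTION and
    ZEROGPU_PROJECT_ID_PRODUCTION were exported when each project's key
    was created; the naming scheme here is illustrative.
    """
    suffix = environment.upper()
    return {
        "x-api-key": os.environ[f"ZEROGPU_API_KEY_{suffix}"],
        "x-project-id": os.environ[f"ZEROGPU_PROJECT_ID_{suffix}"],
    }
```

Because each project has its own keys, `credentials("development")` can never return a credential that works against the `production` project.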
## Managing projects
From the top navigation dropdown:
* **Switch** between projects
* **Create** new projects
* **Create** new organizations
See [API Keys](/platform/api-keys) (one key per project, per environment) and [Authentication](/platform/authentication) (two headers, no OAuth).
# Security
Source: https://zerogpu.mintlify.app/platform/security
Five rules for keeping your API credentials safe.
## The rules
1. **Server-side only** — API keys never go in frontend code, mobile apps, or anything the user's browser can access.
2. **Environment variables** — not hardcoded strings.
```bash theme={null}
export ZEROGPU_API_KEY="YOUR_API_KEY"
export ZEROGPU_PROJECT_ID="YOUR_PROJECT_ID"
```
3. **Separate keys per environment** — dev key leak shouldn't compromise production.
4. **Rotate regularly** — create a new key, update your app, revoke the old one.
5. **Revoke idle keys** — if it's not in use, it shouldn't exist.
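Rules 1 and 2 combine naturally into a startup check: read both variables once, and refuse to start if either is missing. A minimal sketch; the `load_credentials` helper is ours, not a ZeroGPU SDK function:

```python
import os
import sys

REQUIRED = ("ZEROGPU_API_KEY", "ZEROGPU_PROJECT_ID")

def load_credentials() -> dict[str, str]:
    """Fail fast at startup if a credential is missing, instead of at request time."""
    missing = [name for name in REQUIRED if not os.environ.get(name)]
    if missing:
        sys.exit(f"Missing environment variables: {', '.join(missing)}")
    return {name: os.environ[name] for name in REQUIRED}
```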
## If a key is compromised
1. **Revoke** the key from [API Keys](/platform/api-keys) — takes effect immediately
2. **Create** a new key and deploy it
3. **Check [Logs](/platform/logs)** for unauthorized requests
4. **Check [Usage Analytics](/platform/usage-analytics)** for unexpected token consumption
## Checklist
| Practice | Done? |
| ------------------------------------ | ----- |
| Keys in environment variables | |
| No keys in client-side code | |
| No keys in version control | |
| Separate keys for dev and production | |
| Unused keys revoked | |
| Rotation schedule in place | |
# Usage Analytics
Source: https://zerogpu.mintlify.app/platform/usage-analytics
See exactly how your API is being used: tokens, latency, volume.
Real-time metrics for the current project. Spot cost spikes, track latency, understand which models get the most traffic.
## Metrics
| Metric | What it tells you |
| ------------------------- | ------------------------------- |
| **Total Requests** | How many API calls you've made |
| **Average Response Time** | How fast the API is responding |
| **Input Tokens** | Tokens you're sending to models |
| **Output Tokens** | Tokens models are generating |
## Charts
* **Token usage over time** — spot unexpected spikes before they hit your bill
* **Request volume** — requests per hour/day
* **Model distribution** — which models handle the most traffic
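The spike-spotting in the first chart can also be done on exported numbers. A sketch, assuming you have daily token totals as a plain list; the 3x-median threshold is an arbitrary illustrative choice:

```python
# Hypothetical daily token totals, as read off the token-usage chart.
daily_tokens = [12_000, 11_500, 12_400, 11_900, 48_000, 12_100, 11_800]

def spikes(series, factor=3.0):
    """Return indices of days whose usage exceeds `factor` x the median."""
    ordered = sorted(series)
    median = ordered[len(ordered) // 2]
    return [i for i, value in enumerate(series) if value > factor * median]

print(spikes(daily_tokens))  # [4]: day 4 stands out
```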
## Time filters
| Range | Use case |
| ----------- | ---------------------------------- |
| **1 day** | Debug something that just happened |
| **7 days** | Spot weekly patterns |
| **30 days** | Plan capacity and budget |
| **Custom** | Investigate a specific incident |
See a spike? Jump to [Logs](/platform/logs) to inspect the individual requests behind it.
# Quickstart
Source: https://zerogpu.mintlify.app/quickstart
First API call in under 5 minutes.
Three things from your [dashboard](https://zerogpu.ai), one API call, done.
## 1. Grab your credentials
Log into the [ZeroGPU dashboard](https://zerogpu.ai). Copy these three values:
| Credential | Where to find it |
| -------------- | ---------------------------- |
| **API Key** | Dashboard → API Keys |
| **Project ID** | Dashboard → Project Settings |
| **Model** | Dashboard → Model selector |
## 2. Send a request
```bash cURL theme={null}
curl --location 'https://api.zerogpu.ai/v1/responses' \
--header 'content-type: application/json' \
--header 'x-api-key: YOUR_API_KEY' \
--header 'x-project-id: YOUR_PROJECT_ID' \
--data '{
"model": "YOUR_MODEL",
"input": [
{
"role": "user",
"content": "Your input text here..."
}
],
"text": {
"format": {
"type": "text"
}
}
}'
```
```python Python theme={null}
import requests
url = "https://api.zerogpu.ai/v1/responses"
headers = {
"content-type": "application/json",
"x-api-key": "YOUR_API_KEY",
"x-project-id": "YOUR_PROJECT_ID",
}
payload = {
"model": "YOUR_MODEL",
"input": [
{
"role": "user",
"content": "Your input text here...",
}
],
"text": {
"format": {
"type": "text"
}
},
}
response = requests.post(url, headers=headers, json=payload)
print(response.json())
```
```javascript JavaScript theme={null}
const url = 'https://api.zerogpu.ai/v1/responses';
const headers = {
  'content-type': 'application/json',
  'x-api-key': 'YOUR_API_KEY',
  'x-project-id': 'YOUR_PROJECT_ID'
};
const payload = {
  model: 'YOUR_MODEL',
  input: [
    {
      role: 'user',
      content: 'Your input text here...'
    }
  ],
  text: {
    format: {
      type: 'text'
    }
  }
};
fetch(url, {
  method: 'POST',
  headers,
  body: JSON.stringify(payload)
})
  .then(response => response.json())
  .then(data => console.log(data))
  .catch(error => console.error('Error:', error));
```
```rust Rust theme={null}
use reqwest::Client;
use serde_json::json;
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = Client::new();
    let payload = json!({
        "model": "YOUR_MODEL",
        "input": [
            {
                "role": "user",
                "content": "Your input text here..."
            }
        ],
        "text": {
            "format": {
                "type": "text"
            }
        }
    });
    let response = client
        .post("https://api.zerogpu.ai/v1/responses")
        .header("content-type", "application/json")
        .header("x-api-key", "YOUR_API_KEY")
        .header("x-project-id", "YOUR_PROJECT_ID")
        .json(&payload)
        .send()
        .await?;
    let body = response.text().await?;
    println!("{}", body);
    Ok(())
}
```
```go Go theme={null}
package main
import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

func main() {
	url := "https://api.zerogpu.ai/v1/responses"
	payload := map[string]any{
		"model": "YOUR_MODEL",
		"input": []map[string]string{
			{
				"role":    "user",
				"content": "Your input text here...",
			},
		},
		"text": map[string]any{
			"format": map[string]string{
				"type": "text",
			},
		},
	}
	payloadBytes, err := json.Marshal(payload)
	if err != nil {
		panic(err)
	}
	req, err := http.NewRequest("POST", url, bytes.NewBuffer(payloadBytes))
	if err != nil {
		panic(err)
	}
	req.Header.Set("content-type", "application/json")
	req.Header.Set("x-api-key", "YOUR_API_KEY")
	req.Header.Set("x-project-id", "YOUR_PROJECT_ID")
	client := &http.Client{}
	res, err := client.Do(req)
	if err != nil {
		panic(err)
	}
	defer res.Body.Close()
	body, _ := io.ReadAll(res.Body)
	fmt.Println(string(body))
}
```
```ruby Ruby theme={null}
require 'net/http'
require 'uri'
require 'json'
uri = URI.parse('https://api.zerogpu.ai/v1/responses')
http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = (uri.scheme == 'https')
request = Net::HTTP::Post.new(uri.request_uri)
request['content-type'] = 'application/json'
request['x-api-key'] = 'YOUR_API_KEY'
request['x-project-id'] = 'YOUR_PROJECT_ID'
request.body = {
  model: 'YOUR_MODEL',
  input: [
    {
      role: 'user',
      content: 'Your input text here...'
    }
  ],
  text: {
    format: {
      type: 'text'
    }
  }
}.to_json
response = http.request(request)
puts response.body
```
Store your API key in environment variables. Never commit it to version control or expose it in client-side code.
## 3. Verify it worked
Check the [dashboard](https://zerogpu.ai):
* **Logs** — your request appears with model, status, and latency
* **Usage** — token counts update in real time
If you got a `401`, your API key is wrong. `403` means bad project ID. See [Authentication](/platform/authentication) for details.
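Those two checks can be encoded directly; a minimal Python sketch (the `FIXES` map and `explain` helper are ours, not part of the API):

```python
# Status-to-fix map for a first call, taken from the error note above.
FIXES = {
    401: "Invalid API key: check the x-api-key header.",
    403: "Bad project ID: check the x-project-id header.",
}

def explain(status_code: int) -> str:
    return FIXES.get(status_code, f"Unexpected status {status_code}: see Logs.")

print(explain(401))
```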
## Next steps
* **[API Reference](/api-reference/endpoint/responses)** — full request/response spec for `/v1/responses`
* **SDKs** — error handling, env vars, and production patterns: [Python](/sdks/python), [JavaScript](/sdks/javascript), [Go](/sdks/go), [Ruby](/sdks/ruby), [Rust](/sdks/rust)
# Connect AI tools
Source: https://zerogpu.mintlify.app/resources/ai-and-mcp
Connect Claude Code or Cursor to ZeroGPU docs via MCP for live, queryable access to documentation.
Give your AI coding tool live access to ZeroGPU documentation. Once connected, it can search the docs in real time instead of using a snapshot or cached file.
## What is MCP?
The **Model Context Protocol (MCP)** is an open standard that lets AI applications connect to external data sources and tools in a consistent way. When you connect to ZeroGPU docs via MCP, your tool gets real-time search over the full documentation — the same content as the site, updated automatically when the docs change.
The MCP server for ZeroGPU docs is hosted at:
```
https://zerogpu.mintlify.app/mcp
```
### Claude Code

Run in your terminal:
```bash theme={null}
claude mcp add --transport http zerogpu-docs https://zerogpu.mintlify.app/mcp
```
List configured servers:
```bash theme={null}
claude mcp list
```
You should see `zerogpu-docs`. In a Claude Code session, ask a question about ZeroGPU (for example: *How do I call the Responses API?*). The answer will be grounded in the docs.
### Cursor

Press **Cmd+Shift+P** (macOS) or **Ctrl+Shift+P** (Windows), then run **Open MCP settings**.
Click **Add custom MCP**. In the opened `mcp.json`, add or merge:
```json theme={null}
{
"mcpServers": {
"zerogpu-docs": {
"url": "https://zerogpu.mintlify.app/mcp"
}
}
}
```
Save the file and restart Cursor.
In a Cursor chat, ask something about ZeroGPU (for example: *How do I authenticate API requests?*). Cursor will use the MCP server and return answers from the docs.
Some Mintlify themes show an AI or connect icon on the docs site. If you see it on [ZeroGPU docs](https://zerogpu.mintlify.app), you can connect to Cursor from there without editing `mcp.json`. Use that path if your theme supports it.
## Scripts and llms.txt
If you are building a script or pipeline and do not use an MCP client:
| URL | Use |
| -------------------------------------------- | --------------------------------------------- |
| `https://zerogpu.mintlify.app/llms.txt` | Sitemap-style index of pages and descriptions |
| `https://zerogpu.mintlify.app/llms-full.txt` | Full documentation in one Markdown file |
Mintlify generates both when the site is deployed and keeps them in sync with the live docs.
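If you consume `llms.txt` from a script, the index is plain Markdown link lines. A sketch that pulls out `(title, url)` pairs; the `parse_index` helper and the sample text are illustrative, and fetching the real file (for example with `requests.get`) is left to the caller:

```python
import re

# Matches Markdown links of the form [Title](https://...).
LINK = re.compile(r"\[([^\]]+)\]\((https?://[^)]+)\)")

def parse_index(markdown: str) -> list[tuple[str, str]]:
    """Extract (title, url) pairs from an llms.txt-style Markdown index."""
    return LINK.findall(markdown)

sample = """# ZeroGPU
- [Quickstart](https://zerogpu.mintlify.app/quickstart): First API call
- [Responses](https://zerogpu.mintlify.app/api-reference/endpoint/responses): Send input
"""
print(parse_index(sample))
```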
Next: the [Quickstart](/quickstart), your first API call with your API key and project ID.
# Go
Source: https://zerogpu.mintlify.app/sdks/go
Integrate ZeroGPU into your Go application.
## Basic usage
```go theme={null}
package main
import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

func main() {
	url := "https://api.zerogpu.ai/v1/responses"
	payload := map[string]any{
		"model": "YOUR_MODEL",
		"input": []map[string]string{
			{
				"role":    "user",
				"content": "Your input text here...",
			},
		},
		"text": map[string]any{
			"format": map[string]string{
				"type": "text",
			},
		},
	}
	payloadBytes, err := json.Marshal(payload)
	if err != nil {
		panic(err)
	}
	req, err := http.NewRequest("POST", url, bytes.NewBuffer(payloadBytes))
	if err != nil {
		panic(err)
	}
	req.Header.Set("content-type", "application/json")
	req.Header.Set("x-api-key", "YOUR_API_KEY")
	req.Header.Set("x-project-id", "YOUR_PROJECT_ID")
	client := &http.Client{}
	res, err := client.Do(req)
	if err != nil {
		panic(err)
	}
	defer res.Body.Close()
	body, _ := io.ReadAll(res.Body)
	fmt.Println(string(body))
}
```
## Using environment variables
```go theme={null}
import "os"
apiKey := os.Getenv("ZEROGPU_API_KEY")
projectID := os.Getenv("ZEROGPU_PROJECT_ID")
req.Header.Set("x-api-key", apiKey)
req.Header.Set("x-project-id", projectID)
```
## Error handling
```go theme={null}
res, err := client.Do(req)
if err != nil {
	log.Fatalf("Request failed: %v", err)
}
defer res.Body.Close()
if res.StatusCode != http.StatusOK {
	body, _ := io.ReadAll(res.Body)
	log.Fatalf("API error %d: %s", res.StatusCode, string(body))
}
```
# JavaScript
Source: https://zerogpu.mintlify.app/sdks/javascript
Integrate ZeroGPU into your JavaScript or Node.js application.
## Basic usage
```javascript theme={null}
const url = 'https://api.zerogpu.ai/v1/responses';
const headers = {
  'content-type': 'application/json',
  'x-api-key': 'YOUR_API_KEY',
  'x-project-id': 'YOUR_PROJECT_ID'
};
const payload = {
  model: 'YOUR_MODEL',
  input: [
    {
      role: 'user',
      content: 'Your input text here...'
    }
  ],
  text: {
    format: {
      type: 'text'
    }
  }
};
fetch(url, {
  method: 'POST',
  headers,
  body: JSON.stringify(payload)
})
  .then(response => response.json())
  .then(data => console.log(data))
  .catch(error => console.error('Error:', error));
```
## Using environment variables (Node.js)
```javascript theme={null}
const url = 'https://api.zerogpu.ai/v1/responses';
const headers = {
  'content-type': 'application/json',
  'x-api-key': process.env.ZEROGPU_API_KEY,
  'x-project-id': process.env.ZEROGPU_PROJECT_ID
};
const payload = {
  model: 'YOUR_MODEL',
  input: [
    {
      role: 'user',
      content: 'Your input text here...'
    }
  ],
  text: {
    format: {
      type: 'text'
    }
  }
};
const response = await fetch(url, {
  method: 'POST',
  headers,
  body: JSON.stringify(payload)
});
const data = await response.json();
console.log(data.output[0].content[0].text);
```
## Error handling
```javascript theme={null}
try {
  const response = await fetch(url, {
    method: 'POST',
    headers,
    body: JSON.stringify(payload)
  });
  if (!response.ok) {
    throw new Error(`HTTP ${response.status}: ${response.statusText}`);
  }
  const data = await response.json();
  console.log(data.output[0].content[0].text);
} catch (error) {
  console.error('Request failed:', error.message);
}
```
# Python
Source: https://zerogpu.mintlify.app/sdks/python
Integrate ZeroGPU into your Python application.
## Basic usage
```python theme={null}
import requests
url = "https://api.zerogpu.ai/v1/responses"
headers = {
"content-type": "application/json",
"x-api-key": "YOUR_API_KEY",
"x-project-id": "YOUR_PROJECT_ID",
}
payload = {
"model": "YOUR_MODEL",
"input": [
{
"role": "user",
"content": "Your input text here...",
}
],
"text": {
"format": {
"type": "text"
}
},
}
response = requests.post(url, headers=headers, json=payload)
print(response.json())
```
## Using environment variables
```python theme={null}
import os
import requests
url = "https://api.zerogpu.ai/v1/responses"
headers = {
"content-type": "application/json",
"x-api-key": os.environ["ZEROGPU_API_KEY"],
"x-project-id": os.environ["ZEROGPU_PROJECT_ID"],
}
payload = {
"model": "YOUR_MODEL",
"input": [
{
"role": "user",
"content": "Your input text here...",
}
],
"text": {
"format": {
"type": "text"
}
},
}
response = requests.post(url, headers=headers, json=payload)
result = response.json()
print(result["output"][0]["content"][0]["text"])
```
## Error handling
```python theme={null}
import requests
try:
    response = requests.post(url, headers=headers, json=payload)
    response.raise_for_status()
    result = response.json()
    print(result["output"][0]["content"][0]["text"])
except requests.exceptions.HTTPError as e:
    if e.response.status_code == 401:
        print("Invalid API key")
    else:
        print(f"Error: {e.response.status_code}")
except requests.exceptions.RequestException as e:
    print(f"Request failed: {e}")
```
## Install dependencies
```bash theme={null}
pip install requests
```
# Ruby
Source: https://zerogpu.mintlify.app/sdks/ruby
Integrate ZeroGPU into your Ruby application.
## Basic usage
```ruby theme={null}
require 'net/http'
require 'uri'
require 'json'
uri = URI.parse('https://api.zerogpu.ai/v1/responses')
http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = (uri.scheme == 'https')
request = Net::HTTP::Post.new(uri.request_uri)
request['content-type'] = 'application/json'
request['x-api-key'] = 'YOUR_API_KEY'
request['x-project-id'] = 'YOUR_PROJECT_ID'
request.body = {
  model: 'YOUR_MODEL',
  input: [
    {
      role: 'user',
      content: 'Your input text here...'
    }
  ],
  text: {
    format: {
      type: 'text'
    }
  }
}.to_json
response = http.request(request)
puts response.body
```
## Using environment variables
```ruby theme={null}
request['x-api-key'] = ENV['ZEROGPU_API_KEY']
request['x-project-id'] = ENV['ZEROGPU_PROJECT_ID']
```
## Error handling
```ruby theme={null}
response = http.request(request)
case response.code.to_i
when 200
  result = JSON.parse(response.body)
  puts result['output'][0]['content'][0]['text']
when 401
  puts 'Invalid API key'
else
  puts "Error #{response.code}: #{response.body}"
end
```
# Rust
Source: https://zerogpu.mintlify.app/sdks/rust
Integrate ZeroGPU into your Rust application.
## Basic usage
```rust theme={null}
use reqwest::Client;
use serde_json::json;
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = Client::new();
    let payload = json!({
        "model": "YOUR_MODEL",
        "input": [
            {
                "role": "user",
                "content": "Your input text here..."
            }
        ],
        "text": {
            "format": {
                "type": "text"
            }
        }
    });
    let response = client
        .post("https://api.zerogpu.ai/v1/responses")
        .header("content-type", "application/json")
        .header("x-api-key", "YOUR_API_KEY")
        .header("x-project-id", "YOUR_PROJECT_ID")
        .json(&payload)
        .send()
        .await?;
    let body = response.text().await?;
    println!("{}", body);
    Ok(())
}
```
## Dependencies (Cargo.toml)
```toml theme={null}
[dependencies]
reqwest = { version = "0.12", features = ["json"] }
serde_json = "1"
tokio = { version = "1", features = ["full"] }
```
## Using environment variables
```rust theme={null}
use std::env;
let api_key = env::var("ZEROGPU_API_KEY").expect("ZEROGPU_API_KEY not set");
let project_id = env::var("ZEROGPU_PROJECT_ID").expect("ZEROGPU_PROJECT_ID not set");
let response = client
.post("https://api.zerogpu.ai/v1/responses")
.header("content-type", "application/json")
.header("x-api-key", &api_key)
.header("x-project-id", &project_id)
.json(&payload)
.send()
.await?;
```