Valiqor automatically traces Ollama API calls by intercepting HTTP requests to the Ollama server. Both `/api/chat` and `/api/generate` endpoints are captured with model name, token usage, and response content.
Ollama tracing works by detecting HTTP requests to the Ollama API server. No additional install extras are needed; Ollama support is built into the core `valiqor` package.

## Install

Ollama tracing uses the `requests` library, which is already a core dependency. Make sure you have Ollama installed and running locally with a model pulled, for example via `ollama pull llama3.2`.
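To confirm the server is reachable, you can list the locally available models through Ollama's `/api/tags` endpoint (the examples below assume the default address `http://localhost:11434`):

```python
import requests

# List the models available to the local Ollama server; raises a
# requests.exceptions.ConnectionError if Ollama is not running.
tags = requests.get("http://localhost:11434/api/tags", timeout=5).json()
print([m["name"] for m in tags.get("models", [])])
```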
## Zero-Config (Recommended)

Add a single import at the top of your app, and all Ollama calls are automatically traced.
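A minimal sketch of that setup; the `valiqor.auto` module path is an assumption, so use whatever auto-instrumentation import the package actually documents:

```python
import valiqor.auto  # assumed auto-instrumentation import; patches requests at import time

import requests

# Any Ollama call made through requests after the import above is traced automatically.
reply = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "llama3.2",
        "messages": [{"role": "user", "content": "Hello!"}],
        "stream": False,  # keep streaming off so the call is captured (see Limitations)
    },
)
print(reply.json()["message"]["content"])
```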
## Selective Instrumentation

If you only want Ollama tracing, you can enable just the Ollama integration.
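A sketch of what that could look like; the `instrument_ollama` function name is hypothetical:

```python
import valiqor

# Hypothetical per-integration hook: patch only the Ollama integration and
# leave other providers uninstrumented.
valiqor.instrument_ollama()
```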
## Chat Endpoint

The `/api/chat` endpoint uses the messages format.
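For example, a plain non-streaming `requests` call against the default local server is picked up by the tracer:

```python
import requests

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "llama3.2",
        "messages": [
            {"role": "system", "content": "You are a concise assistant."},
            {"role": "user", "content": "Why is the sky blue?"},
        ],
        "stream": False,  # required for the call to be traced
    },
)
data = resp.json()
print(data["message"]["content"])                     # response text
print(data["prompt_eval_count"], data["eval_count"])  # token counts recorded in the trace
```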
## Generate Endpoint

The `/api/generate` endpoint uses a prompt string.
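The equivalent call passes a single prompt instead of a message list:

```python
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "mistral",
        "prompt": "Write a haiku about tracing.",
        "stream": False,  # required for the call to be traced
    },
)
print(resp.json()["response"])  # generated text, recorded as the trace's response field
```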
## What Gets Captured

Each traced Ollama call records:

| Field | Description |
|---|---|
| `model` | Model name (e.g. `llama3.2`, `mistral`) |
| `endpoint` | API endpoint (`chat` or `generate`) |
| `prompt_tokens` | Input token count (from `prompt_eval_count`) |
| `completion_tokens` | Output token count (from `eval_count`) |
| `messages` | User prompt or messages |
| `response` | Model response text |
| `duration_ms` | Call latency |
| `status` | Success or error |
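Put together, one captured call can be pictured roughly as the following record (illustrative only; the actual trace schema may differ):

```python
# Hypothetical shape of one traced call, using the field names from the table above.
traced_call = {
    "model": "llama3.2",
    "endpoint": "chat",
    "prompt_tokens": 27,        # from prompt_eval_count
    "completion_tokens": 214,   # from eval_count
    "messages": [{"role": "user", "content": "Why is the sky blue?"}],
    "response": "The sky appears blue because of Rayleigh scattering ...",
    "duration_ms": 1834,
    "status": "success",
}
```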
## With Workflows

Group multiple Ollama calls into a single trace.
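A sketch of that grouping, assuming a `valiqor.workflow` context manager (the name is an assumption; see the Tracing Guide for the real workflow API):

```python
import requests
import valiqor

# Hypothetical workflow grouping: both Ollama calls below land in one trace.
with valiqor.workflow("summarize-then-translate"):
    summary = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "llama3.2", "prompt": "Summarize the release notes.", "stream": False},
    ).json()["response"]

    requests.post(
        "http://localhost:11434/api/chat",
        json={
            "model": "llama3.2",
            "messages": [{"role": "user", "content": f"Translate to French: {summary}"}],
            "stream": False,
        },
    )
```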
## Disabling

## Limitations

- **Sync only**: only synchronous `requests.post` calls are traced. Async HTTP clients (e.g. `httpx`, `aiohttp`) are not intercepted.
- **Streaming is not instrumented**: set `"stream": False` in your requests for traces to be captured.
- **Ollama Python library**: if you use the `ollama` Python package instead of raw `requests`, it works as long as it uses `requests.post` internally.
## Next Steps

- **Tracing Guide**: Learn about traces, spans, workflows, and exporters.
- **Failure Analysis**: Run failure analysis on your traced Ollama calls.