Valiqor automatically traces Ollama API calls by intercepting HTTP
requests to the Ollama server. Both the `/api/chat` and `/api/generate`
endpoints are captured, along with the model name, token usage, and
response content. No additional install extras are needed; Ollama support
is built into the core `valiqor` package.
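Conceptually, this interception amounts to wrapping the HTTP client's `post` function so that calls to Ollama endpoints are recorded before the response is returned. The sketch below is illustrative only — it is not valiqor's actual implementation, and the `instrument` helper and record layout are invented for this example (the real tracer records the fields listed under "What Gets Captured" below):

```python
def instrument(post_fn, sink):
    """Wrap an HTTP post function so Ollama API calls are recorded.

    Illustrative sketch only, not valiqor's internal code.
    """
    def wrapper(url, **kwargs):
        response = post_fn(url, **kwargs)
        # Only record calls that target the Ollama API endpoints
        if "/api/chat" in url or "/api/generate" in url:
            payload = kwargs.get("json", {})
            sink.append({"url": url, "model": payload.get("model")})
        return response
    return wrapper

# Demo with a stub in place of a live Ollama server
traces = []
fake_post = instrument(lambda url, **kw: {"ok": True}, traces)
fake_post("http://localhost:11434/api/chat",
          json={"model": "llama3.2", "messages": []})
```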
## Install

No extra dependencies are needed. Ollama tracing uses the `requests`
library, which is already a core dependency. Make sure you have Ollama
installed and running locally with a model pulled (e.g. `ollama pull llama3.2`).
## Zero-Config (Recommended)

Add a single import at the top of your app — all Ollama calls are
automatically traced:

```python
import valiqor.auto  # ← Add this line
import requests

# Chat endpoint
response = requests.post("http://localhost:11434/api/chat", json={
    "model": "llama3.2",
    "messages": [{"role": "user", "content": "Explain quantum computing"}],
    "stream": False
})
print(response.json()["message"]["content"])
```
## Selective Instrumentation

If you only want Ollama tracing:

```python
from valiqor.trace import ollama_autolog

ollama_autolog()
```
## Chat Endpoint

The `/api/chat` endpoint uses the messages format:

```python
import valiqor.auto
import requests

response = requests.post("http://localhost:11434/api/chat", json={
    "model": "llama3.2",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "What is Python?"}
    ],
    "stream": False
})
result = response.json()
print(result["message"]["content"])
```
## Generate Endpoint

The `/api/generate` endpoint uses a prompt string:

```python
import valiqor.auto
import requests

response = requests.post("http://localhost:11434/api/generate", json={
    "model": "llama3.2",
    "prompt": "Explain machine learning in simple terms",
    "stream": False
})
result = response.json()
print(result["response"])
```
## What Gets Captured

Each traced Ollama call records:

| Field | Description |
|---|---|
| `model` | Model name (e.g. `llama3.2`, `mistral`) |
| `endpoint` | API endpoint (`chat` or `generate`) |
| `prompt_tokens` | Input token count (from `prompt_eval_count`) |
| `completion_tokens` | Output token count (from `eval_count`) |
| `messages` | User prompt or messages |
| `response` | Model response text |
| `duration_ms` | Call latency |
| `status` | Success or error |
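To make the mapping concrete, here is a hedged sketch of how these fields could be derived from a raw Ollama response body. The Ollama keys (`prompt_eval_count`, `eval_count`, `total_duration`, `done`) are real Ollama response fields; the `to_trace_record` helper and the record layout are invented for illustration and are not valiqor's internal API:

```python
def to_trace_record(endpoint, request, response):
    """Illustrative mapping from an Ollama response to the captured fields."""
    return {
        "model": request["model"],
        "endpoint": endpoint,
        "prompt_tokens": response.get("prompt_eval_count", 0),
        "completion_tokens": response.get("eval_count", 0),
        "messages": request.get("messages") or request.get("prompt"),
        # /api/chat nests text under "message"; /api/generate uses "response"
        "response": (response.get("message", {}).get("content")
                     or response.get("response")),
        "duration_ms": response.get("total_duration", 0) / 1_000_000,  # ns → ms
        "status": "success" if response.get("done") else "error",
    }

# Example with a trimmed-down /api/chat response body
record = to_trace_record(
    "chat",
    {"model": "llama3.2", "messages": [{"role": "user", "content": "Hi"}]},
    {"message": {"content": "Hello!"}, "done": True,
     "prompt_eval_count": 26, "eval_count": 8, "total_duration": 1_500_000_000},
)
```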
## With Workflows

Group multiple Ollama calls into a single trace:

```python
import valiqor.auto
from valiqor.trace import trace_workflow
import requests

OLLAMA_URL = "http://localhost:11434/api/chat"

with trace_workflow("local-assistant"):
    # Step 1: Research
    research = requests.post(OLLAMA_URL, json={
        "model": "llama3.2",
        "messages": [{"role": "user", "content": "What are the key facts about Mars?"}],
        "stream": False
    })

    # Step 2: Summarize
    summary = requests.post(OLLAMA_URL, json={
        "model": "llama3.2",
        "messages": [
            {"role": "user", "content": f"Summarize: {research.json()['message']['content']}"}
        ],
        "stream": False
    })
```
## Disabling

```python
from valiqor.trace import disable_autolog

disable_autolog("ollama")  # Disable Ollama only
disable_autolog()          # Disable all providers
```
## Limitations

- **Sync only** — only synchronous `requests.post` calls are traced. Async
  HTTP clients (e.g. `httpx`, `aiohttp`) are not intercepted.
- **Streaming is not instrumented** — set `"stream": False` in your requests
  for traces to be captured.
- **Ollama Python library** — if you use the `ollama` Python package instead
  of raw `requests`, calls are traced only as long as the package uses
  `requests.post` internally.
## Next Steps