Valiqor automatically traces Ollama API calls by intercepting HTTP requests to the Ollama server. Both the /api/chat and /api/generate endpoints are captured, including model name, token usage, and response content. No install extras are required: Ollama support is built into the core valiqor package.

Install

pip install valiqor
No extra dependencies are needed: Ollama tracing uses the requests library, which is already a core dependency. Make sure Ollama is installed and running locally with a model pulled:
ollama pull llama3.2

Add a single import at the top of your app — all Ollama calls are automatically traced:
import valiqor.auto  # ← Add this line

import requests

# Chat endpoint
response = requests.post("http://localhost:11434/api/chat", json={
    "model": "llama3.2",
    "messages": [{"role": "user", "content": "Explain quantum computing"}],
    "stream": False
})
print(response.json()["message"]["content"])

Selective Instrumentation

If you only want Ollama tracing:
from valiqor.trace import ollama_autolog

ollama_autolog()

Chat Endpoint

The /api/chat endpoint uses the messages format:
import valiqor.auto
import requests

response = requests.post("http://localhost:11434/api/chat", json={
    "model": "llama3.2",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "What is Python?"}
    ],
    "stream": False
})

result = response.json()
print(result["message"]["content"])

Generate Endpoint

The /api/generate endpoint uses a prompt string:
import valiqor.auto
import requests

response = requests.post("http://localhost:11434/api/generate", json={
    "model": "llama3.2",
    "prompt": "Explain machine learning in simple terms",
    "stream": False
})

result = response.json()
print(result["response"])

What Gets Captured

Each traced Ollama call records:
Field               Description
model               Model name (e.g. llama3.2, mistral)
endpoint            API endpoint (chat or generate)
prompt_tokens       Input token count (from prompt_eval_count)
completion_tokens   Output token count (from eval_count)
messages            User prompt or messages
response            Model response text
duration_ms         Call latency in milliseconds
status              Success or error
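Several of these fields map directly onto the raw Ollama response body, which reports token counts as prompt_eval_count and eval_count and timings in nanoseconds. A minimal sketch of that mapping, using a hard-coded sample response rather than a live call:

```python
# Abridged non-streaming /api/chat response, shaped like Ollama's documented
# output. Hard-coded here for illustration; a real call returns this JSON.
sample = {
    "model": "llama3.2",
    "message": {"role": "assistant", "content": "Python is a programming language."},
    "done": True,
    "total_duration": 1_250_000_000,  # reported in nanoseconds
    "prompt_eval_count": 26,          # maps to prompt_tokens
    "eval_count": 12,                 # maps to completion_tokens
}

record = {
    "model": sample["model"],
    "prompt_tokens": sample.get("prompt_eval_count", 0),
    "completion_tokens": sample.get("eval_count", 0),
    "response": sample["message"]["content"],
    "duration_ms": sample["total_duration"] / 1_000_000,  # ns -> ms
}
print(record)
```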

With Workflows

Group multiple Ollama calls into a single trace:
import valiqor.auto
from valiqor.trace import trace_workflow
import requests

OLLAMA_URL = "http://localhost:11434/api/chat"

with trace_workflow("local-assistant"):
    # Step 1: Research
    research = requests.post(OLLAMA_URL, json={
        "model": "llama3.2",
        "messages": [{"role": "user", "content": "What are the key facts about Mars?"}],
        "stream": False
    })

    # Step 2: Summarize
    summary = requests.post(OLLAMA_URL, json={
        "model": "llama3.2",
        "messages": [
            {"role": "user", "content": f"Summarize: {research.json()['message']['content']}"}
        ],
        "stream": False
    })
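The repeated request bodies in a workflow like this can be factored into a small helper. A sketch under our own naming (chat_payload is a hypothetical helper, not part of valiqor or Ollama):

```python
def chat_payload(content: str, model: str = "llama3.2") -> dict:
    """Build a non-streaming /api/chat request body for a single user message."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": content}],
        "stream": False,
    }

# Each workflow step then stays one line:
#   research = requests.post(OLLAMA_URL, json=chat_payload("What are the key facts about Mars?"))
payload = chat_payload("What are the key facts about Mars?")
print(payload["model"], payload["stream"])
```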

Disabling

from valiqor.trace import disable_autolog

disable_autolog("ollama")    # Disable Ollama only
disable_autolog()            # Disable all providers

Limitations

  • Sync only — only synchronous requests.post calls are traced. Async HTTP clients (e.g. httpx, aiohttp) are not intercepted.
  • Streaming is not instrumented — set "stream": False in your requests for traces to be captured.
  • Ollama Python library — if you use the ollama Python package instead of raw requests, tracing only applies when that package issues its HTTP calls through requests.post; versions built on a different HTTP client (such as httpx) are not intercepted.
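For context on the streaming limitation: a streamed Ollama response body is a sequence of newline-delimited JSON chunks, with token counts appearing only on the final chunk, rather than a single JSON object. A sketch of how such a body reassembles, using hard-coded sample data rather than a live call:

```python
import json

# Abridged example of a streamed ("stream": true) /api/chat response body:
# one JSON object per line, the last one marked "done" and carrying counts.
raw_stream = b"""{"message": {"role": "assistant", "content": "Hello"}, "done": false}
{"message": {"role": "assistant", "content": " there"}, "done": false}
{"message": {"role": "assistant", "content": ""}, "done": true, "prompt_eval_count": 5, "eval_count": 2}
"""

chunks = [json.loads(line) for line in raw_stream.splitlines() if line.strip()]
text = "".join(c["message"]["content"] for c in chunks)
final = chunks[-1]
print(text)
print(final["eval_count"])
```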

Next Steps