Valiqor automatically traces Ollama API calls by intercepting HTTP requests to the Ollama server. Both the /api/chat and /api/generate endpoints are captured, including model name, token usage, and response content. No install extras are required: Ollama support is built into the core valiqor package.

Install

pip install valiqor
No extra dependencies are needed: Ollama tracing uses the requests library, which is already a core dependency. Make sure Ollama is installed and running locally with a model pulled:
ollama pull llama3.2

Add a single import at the top of your app — all Ollama calls are automatically traced:
import valiqor.auto  # ← Add this line

import requests

# Chat endpoint
response = requests.post("http://localhost:11434/api/chat", json={
    "model": "llama3.2",
    "messages": [{"role": "user", "content": "Explain quantum computing"}],
    "stream": False
})
print(response.json()["message"]["content"])

Selective Instrumentation

If you only want Ollama tracing:
from valiqor.trace import ollama_autolog

ollama_autolog()

Chat Endpoint

The /api/chat endpoint uses the messages format:
import valiqor.auto
import requests

response = requests.post("http://localhost:11434/api/chat", json={
    "model": "llama3.2",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "What is Python?"}
    ],
    "stream": False
})

result = response.json()
print(result["message"]["content"])

Generate Endpoint

The /api/generate endpoint uses a prompt string:
import valiqor.auto
import requests

response = requests.post("http://localhost:11434/api/generate", json={
    "model": "llama3.2",
    "prompt": "Explain machine learning in simple terms",
    "stream": False
})

result = response.json()
print(result["response"])

What Gets Captured

Each traced Ollama call records:
Field               Description
model               Model name (e.g. llama3.2, mistral)
endpoint            API endpoint (chat or generate)
prompt_tokens       Input token count (from prompt_eval_count)
completion_tokens   Output token count (from eval_count)
messages            User prompt or messages
response            Model response text
duration_ms         Call latency in milliseconds
status              Success or error
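Several of these fields map directly onto the raw Ollama response body, which reports token counts as prompt_eval_count and eval_count and timings in nanoseconds. A minimal sketch of that mapping, using a hard-coded sample response rather than a live call:

```python
# Abridged non-streaming /api/chat response, shaped like Ollama's documented
# output. Hard-coded here for illustration; a real call returns this JSON.
sample = {
    "model": "llama3.2",
    "message": {"role": "assistant", "content": "Python is a programming language."},
    "done": True,
    "total_duration": 1_250_000_000,  # reported in nanoseconds
    "prompt_eval_count": 26,          # maps to prompt_tokens
    "eval_count": 12,                 # maps to completion_tokens
}

record = {
    "model": sample["model"],
    "prompt_tokens": sample.get("prompt_eval_count", 0),
    "completion_tokens": sample.get("eval_count", 0),
    "response": sample["message"]["content"],
    "duration_ms": sample["total_duration"] / 1_000_000,  # ns -> ms
}
print(record)
```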

With Workflows

Group multiple Ollama calls into a single trace:
import valiqor.auto
from valiqor.trace import trace_workflow
import requests

OLLAMA_URL = "http://localhost:11434/api/chat"

with trace_workflow("local-assistant"):
    # Step 1: Research
    research = requests.post(OLLAMA_URL, json={
        "model": "llama3.2",
        "messages": [{"role": "user", "content": "What are the key facts about Mars?"}],
        "stream": False
    })

    # Step 2: Summarize
    summary = requests.post(OLLAMA_URL, json={
        "model": "llama3.2",
        "messages": [
            {"role": "user", "content": f"Summarize: {research.json()['message']['content']}"}
        ],
        "stream": False
    })
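The repeated request bodies in a workflow like this can be factored into a small helper. A sketch under our own naming (chat_payload is a hypothetical helper, not part of valiqor or Ollama):

```python
def chat_payload(content: str, model: str = "llama3.2") -> dict:
    """Build a non-streaming /api/chat request body for a single user message."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": content}],
        "stream": False,
    }

# Each workflow step then stays one line:
#   research = requests.post(OLLAMA_URL, json=chat_payload("What are the key facts about Mars?"))
payload = chat_payload("What are the key facts about Mars?")
print(payload["model"], payload["stream"])
```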

Disabling

from valiqor.trace import disable_autolog

disable_autolog("ollama")    # Disable Ollama only
disable_autolog()            # Disable all providers

Limitations

  • Sync only — only synchronous requests.post calls are traced. Async HTTP clients (e.g. httpx, aiohttp) are not intercepted.
  • Streaming is not instrumented — set "stream": False in your requests for traces to be captured.
  • Ollama Python library — if you use the ollama Python package instead of raw requests, tracing only applies when that package issues its HTTP calls through requests.post; versions built on a different HTTP client (such as httpx) are not intercepted.
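For context on the streaming limitation: a streamed Ollama response body is a sequence of newline-delimited JSON chunks, with token counts appearing only on the final chunk, rather than a single JSON object. A sketch of how such a body reassembles, using hard-coded sample data rather than a live call:

```python
import json

# Abridged example of a streamed ("stream": true) /api/chat response body:
# one JSON object per line, the last one marked "done" and carrying counts.
raw_stream = b"""{"message": {"role": "assistant", "content": "Hello"}, "done": false}
{"message": {"role": "assistant", "content": " there"}, "done": false}
{"message": {"role": "assistant", "content": ""}, "done": true, "prompt_eval_count": 5, "eval_count": 2}
"""

chunks = [json.loads(line) for line in raw_stream.splitlines() if line.strip()]
text = "".join(c["message"]["content"] for c in chunks)
final = chunks[-1]
print(text)
print(final["eval_count"])
```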

Next Steps