

Valiqor automatically traces Ollama API calls by intercepting HTTP requests to the Ollama server. Both the /api/chat and /api/generate endpoints are captured, including model name, token usage, and response content. No additional install extras are needed; Ollama support is built into the core valiqor package.

Install

pip install valiqor
No extra dependencies are needed. Ollama tracing uses the requests library, which is already a core dependency. Make sure Ollama is installed and running locally with a model pulled:
ollama pull llama3.2
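
To confirm the server is reachable, you can list the models you have pulled via Ollama's /api/tags endpoint (an optional sanity check, not required by Valiqor):
import requests

# Ollama's default local address; /api/tags lists locally available models.
tags = requests.get("http://localhost:11434/api/tags").json()
print([m["name"] for m in tags["models"]])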

Add a single import at the top of your app — all Ollama calls are automatically traced:
import valiqor.auto  # ← Add this line

import requests

# Chat endpoint
response = requests.post("http://localhost:11434/api/chat", json={
    "model": "llama3.2",
    "messages": [{"role": "user", "content": "Explain quantum computing"}],
    "stream": False
})
print(response.json()["message"]["content"])

Selective Instrumentation

If you only want Ollama tracing:
from valiqor.trace import ollama_autolog

ollama_autolog()
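
Calling ollama_autolog() once at startup has the same effect as the import valiqor.auto line, but only for Ollama traffic. A minimal sketch, using the generate endpoint described below:
from valiqor.trace import ollama_autolog
import requests

ollama_autolog()  # instrument Ollama HTTP calls only; other providers stay untraced

response = requests.post("http://localhost:11434/api/generate", json={
    "model": "llama3.2",
    "prompt": "Say hello in one sentence",
    "stream": False
})
print(response.json()["response"])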

Chat Endpoint

The /api/chat endpoint uses the messages format:
import valiqor.auto
import requests

response = requests.post("http://localhost:11434/api/chat", json={
    "model": "llama3.2",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "What is Python?"}
    ],
    "stream": False
})

result = response.json()
print(result["message"]["content"])

Generate Endpoint

The /api/generate endpoint uses a prompt string:
import valiqor.auto
import requests

response = requests.post("http://localhost:11434/api/generate", json={
    "model": "llama3.2",
    "prompt": "Explain machine learning in simple terms",
    "stream": False
})

result = response.json()
print(result["response"])

What Gets Captured

Each traced Ollama call records:
Field              Description
model              Model name (e.g. llama3.2, mistral)
endpoint           API endpoint (chat or generate)
prompt_tokens      Input token count (from prompt_eval_count)
completion_tokens  Output token count (from eval_count)
messages           User prompt or messages
response           Model response text
duration_ms        Call latency
status             Success or error
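
The two token fields map directly onto the counters Ollama includes in a non-streaming response body. A minimal sketch of reading the raw fields from an /api/chat reply:
import requests

response = requests.post("http://localhost:11434/api/chat", json={
    "model": "llama3.2",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": False
})
result = response.json()

# Raw Ollama counters that Valiqor records as prompt_tokens / completion_tokens
print(result.get("prompt_eval_count"))
print(result.get("eval_count"))
print(result["message"]["content"])   # recorded as the response field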

With Workflows

Group multiple Ollama calls into a single trace:
import valiqor.auto
from valiqor.trace import trace_workflow
import requests

OLLAMA_URL = "http://localhost:11434/api/chat"

with trace_workflow("local-assistant"):
    # Step 1: Research
    research = requests.post(OLLAMA_URL, json={
        "model": "llama3.2",
        "messages": [{"role": "user", "content": "What are the key facts about Mars?"}],
        "stream": False
    })

    # Step 2: Summarize
    summary = requests.post(OLLAMA_URL, json={
        "model": "llama3.2",
        "messages": [
            {"role": "user", "content": f"Summarize: {research.json()['message']['content']}"}
        ],
        "stream": False
    })

Disabling

from valiqor.trace import disable_autolog

disable_autolog("ollama")    # Disable Ollama only
disable_autolog()            # Disable all providers

Limitations

  • Sync only — only synchronous requests.post calls are traced. Async HTTP clients (e.g. httpx, aiohttp) are not intercepted.
  • Streaming is not instrumented — set "stream": False in your requests for traces to be captured.
  • Ollama Python library — if you use the ollama Python package instead of raw requests, tracing only works if that package issues its HTTP calls through requests.post internally.

Next Steps

Tracing Guide

Learn about traces, spans, workflows, and exporters.

Failure Analysis

Run failure analysis on your traced Ollama calls.