Cost Tracking¶

Machina tracks LLM costs per action via the ActionTracer. Every LLM call records token counts and estimated USD cost.

How It Works¶

When the agent makes an LLM call, the tracer records:

prompt_tokens — input tokens sent to the model
completion_tokens — output tokens generated
total_tokens — sum of prompt + completion
usd_cost — estimated cost based on the model's pricing
model — which model was used (e.g., openai/gpt-4o)

These fields appear in every trace entry where an LLM call occurs.

Analyzing Costs¶

Per-Conversation Cost¶

python -c "
import json, sys
total = 0
for line in open(sys.argv[1]):
    e = json.loads(line)
    cost = e.get('usd_cost', 0)
    if cost:
        total += cost
        print(f'  {e[\"action\"]}: \${cost:.4f} ({e[\"total_tokens\"]} tokens, {e[\"model\"]})')
print(f'\nTotal: \${total:.4f}')
" traces/2026-04-20_conv-abc123.jsonl

Daily Cost Summary¶

python -c "
import json, glob
total = 0
for f in sorted(glob.glob('traces/2026-04-20_*.jsonl')):
    for line in open(f):
        total += json.loads(line).get('usd_cost', 0)
print(f'Daily total: \${total:.4f}')
"

Cost Anomaly Detection¶

Flag conversations that exceed a budget threshold:

python -c "
import json, glob, sys
BUDGET = float(sys.argv[1])  # e.g., 0.50
for f in sorted(glob.glob('traces/*.jsonl')):
    total = sum(json.loads(l).get('usd_cost', 0) for l in open(f))
    if total > BUDGET:
        print(f'OVER BUDGET: {f} — \${total:.4f}')
" 0.50

Cost Benchmarks¶

Typical costs for common maintenance operations (GPT-4o pricing):

Operation	Tokens	Est. Cost
Simple asset lookup + response	500–1,500	$0.01–$0.03
Failure diagnosis with manual search	2,000–5,000	$0.03–$0.08
Work order creation with context	1,500–3,000	$0.02–$0.05
Full alarm-to-WO workflow (6 steps)	5,000–15,000	$0.08–$0.25

Costs vary significantly by model. Ollama (local) is free. Claude and GPT-4o have similar per-token pricing. Smaller models (GPT-4o-mini, Haiku) can reduce costs 5–10x for simpler tasks.

Cost Optimization¶

Use cheaper models for simple tasks. Asset lookups and spare part checks don't need GPT-4o — GPT-4o-mini or Haiku work fine.
Reduce context size. The entity resolver and context injection functions control how much data is sent to the LLM. Trim large asset registries or maintenance histories to relevant subsets.
Cache common queries. If technicians ask the same questions repeatedly, consider caching LLM responses (not built into Machina — implement at the application layer).
Monitor with budgets. Use the per-conversation cost analysis above to set alerts and catch runaway conversations.