Burak Emir 1af81b5edf
Strip content from list/search results; truncate oversized tool output
noteworx_list_notes and noteworx_search_notes now remove the `content`
field from each note before returning to the model — the full text of
every note was blowing the 8K context window. The model only needs
slug, title, and tags to decide which note to read.

Also added a 6000-char truncation guard on all tool results as a safety
net against future oversized responses.

Verified: 4-turn noteworx integration test passes — list notes, read
one, create attachment, report back. 42s total, 3 tool calls.
2026-04-17 12:22:14 +02:00
config Initial agent-harness skeleton 2026-04-09 20:31:02 +02:00
deploy Initial agent-harness skeleton 2026-04-09 20:31:02 +02:00
src Strip content from list/search results; truncate oversized tool output 2026-04-17 12:22:14 +02:00
tasks/inbox Initial agent-harness skeleton 2026-04-09 20:31:02 +02:00
.gitignore Poll on async completion; use reasoning_content; TLS native roots 2026-04-09 21:13:41 +02:00
Cargo.lock Poll on async completion; use reasoning_content; TLS native roots 2026-04-09 21:13:41 +02:00
Cargo.toml Poll on async completion; use reasoning_content; TLS native roots 2026-04-09 21:13:41 +02:00
README.md Initial agent-harness skeleton 2026-04-09 20:31:02 +02:00

agent-harness

Multi-turn agent harness for Gemma 4 served via the runpod-gemma4 RunPod serverless endpoint. Reads task files from an inbox directory, calls the model in a tool-use loop, and posts the results back to noteworx as signed text attachments.

Architecture

  systemd timer (daily/hourly)
           │
           ▼
  agent-harness run
           │
           ├──► read tasks/inbox/*.md
           │
           ▼
  multi-turn tool-use loop
     │             │
     │             ├──► webfetch  (URL → text)
     │             │
     │             └──► noteworx  (list/search/read notes,
     │                             read/create text attachments)
     │
     │  (thinking channel stripped from replayed messages but
     │   preserved in logs)
     │
     ├──► POST /runsync  (RunPod → llama-server → Gemma 4)
     │
     └──► logs/<task>/<run>/{conversation,tools}.jsonl + summary.json

Each task invocation is independent. Task files carry their own context; nothing is remembered across runs. The harness is one-shot: systemd or cron triggers it, it drains the inbox (or runs one specific entry), and exits.
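The loop in the diagram can be sketched roughly as follows. This is a minimal illustration, not the harness's actual code: `ModelReply`, `Message`, `Tool`, and `run_task` are hypothetical names, and the `call_model` closure stands in for the POST /runsync round trip.

```rust
use std::collections::HashMap;

/// One model turn: either a final answer or a request to call a tool.
#[derive(Debug, Clone, PartialEq)]
pub enum ModelReply {
    Final(String),
    ToolCall { name: String, arguments: String },
}

/// A message in the running conversation (roles simplified for this sketch).
#[derive(Debug, Clone, PartialEq)]
pub enum Message {
    User(String),
    Assistant(String),
    ToolResult { name: String, output: String },
}

/// Hypothetical tool signature: raw argument text in, result text out.
pub type Tool = Box<dyn Fn(&str) -> String>;

/// Runs the loop until the model gives a final answer or `max_turns` is reached.
pub fn run_task(
    task_body: &str,
    max_turns: usize,
    tools: &HashMap<String, Tool>,
    mut call_model: impl FnMut(&[Message]) -> ModelReply,
) -> Result<String, String> {
    // The task file body is the initial user message.
    let mut conversation = vec![Message::User(task_body.to_string())];
    for _ in 0..max_turns {
        match call_model(&conversation) {
            ModelReply::Final(text) => return Ok(text),
            ModelReply::ToolCall { name, arguments } => {
                // Unknown tools are reported back to the model as an error result.
                let output = match tools.get(&name) {
                    Some(tool) => tool(&arguments),
                    None => format!("error: unknown tool `{name}`"),
                };
                conversation.push(Message::Assistant(format!("call {name}({arguments})")));
                conversation.push(Message::ToolResult { name, output });
            }
        }
    }
    Err(format!("no final answer after {max_turns} turns"))
}
```

Passing the model call as a closure keeps the loop testable without a live endpoint; a test can script the replies turn by turn.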

Task files

A task is a markdown file under tasks/inbox/. The filename (sans .md) is the task name and appears in log paths. The file body is sent as the initial user message. Optional TOML frontmatter (delimited by +++) carries per-task overrides:

+++
max_turns = 5
tools = ["webfetch", "noteworx_read_note", "noteworx_create_attachment"]
system_prompt = "You are a concise research assistant."
+++

# Weekly research scan

Find all notes tagged `research-active` and, for each URL they mention,
fetch a short summary and attach it to the note.

After a successful run the file moves to tasks/done/<ts>-<name>.md; on failure it moves to tasks/failed/<ts>-<name>.md.
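As an illustration of the task-file conventions above, the sketch below splits the optional `+++` frontmatter from the body and builds the archive path for a finished task. The function names are hypothetical, the frontmatter is returned as raw text (parsing it as TOML is left out), and the timestamp is passed in as a string because its exact format is not specified here.

```rust
use std::path::{Path, PathBuf};

/// Splits a task file into (frontmatter, body).
/// Frontmatter is returned only when the file opens with a `+++` line and a
/// closing `+++` line follows; otherwise the whole input is treated as body.
pub fn split_frontmatter(input: &str) -> (Option<&str>, &str) {
    let Some(rest) = input.strip_prefix("+++\n") else {
        return (None, input);
    };
    let mut offset = 0;
    for line in rest.split_inclusive('\n') {
        if line.trim_end() == "+++" {
            let frontmatter = &rest[..offset];
            let body = &rest[offset + line.len()..];
            // Drop the blank line(s) separating frontmatter from the body.
            return (Some(frontmatter), body.trim_start_matches('\n'));
        }
        offset += line.len();
    }
    // No closing delimiter: treat the file as having no frontmatter.
    (None, input)
}

/// Destination for a processed task: tasks/done/ or tasks/failed/.
pub fn archive_path(tasks_root: &Path, succeeded: bool, ts: &str, name: &str) -> PathBuf {
    let dir = if succeeded { "done" } else { "failed" };
    tasks_root.join(dir).join(format!("{ts}-{name}.md"))
}
```

Treating an unterminated `+++` block as plain body is one possible choice; a stricter harness might reject such a file instead.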

Tools

Name                         Purpose
webfetch                     HTTP GET with size and timeout caps
noteworx_list_notes          List notes (optional space/tag filters)
noteworx_search_notes        Full-text, title, or semantic search
noteworx_read_note           Read a single note by slug
noteworx_list_attachments    List attachments on a note
noteworx_read_attachment     Read a text attachment
noteworx_create_attachment   Create a suggestion attachment with agent signature

All noteworx tools authenticate with Authorization: Bearer nwx_… (from the NOTEWORX_TOKEN env var by default). llama.cpp's server translates the OpenAI-format tools array into Gemma 4's native chat template, so we don't have to parse the raw tool-call markup.
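A small sketch of how the bearer header value might be derived from the environment. The function names are hypothetical, and trimming whitespace and rejecting an empty token are assumptions of this sketch rather than documented harness behaviour.

```rust
use std::env;

/// Builds the `Authorization` header value from a raw token.
pub fn bearer_value(token: &str) -> Result<String, String> {
    let token = token.trim();
    if token.is_empty() {
        return Err("token is empty".to_string());
    }
    Ok(format!("Bearer {token}"))
}

/// Reads the token from the environment variable `var_name`
/// (NOTEWORX_TOKEN by default, per the text above).
pub fn bearer_from_env(var_name: &str) -> Result<String, String> {
    let token = env::var(var_name).map_err(|e| format!("{var_name}: {e}"))?;
    bearer_value(&token)
}
```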

Suggestion attachments

When the agent calls noteworx_create_attachment, the content is wrapped in YAML frontmatter identifying the run, so the noteworx UI can mark it as an agent-generated suggestion:

---
agent: noteworx-agent-harness
agent_name: default-agent
agent_version: 0.1.0
agent_task: weekly-research-scan
agent_run_id: 20260409T061500Z
agent_created_at: 2026-04-09T06:15:14Z
parent_note: some-note-slug
---

...model-produced body...

The model is told in its tool description not to add its own frontmatter; the harness's tool implementation applies the wrapper before the attachment is sent to noteworx.
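The wrapping step could look roughly like this. It is a sketch under assumptions: `RunIdentity` and `wrap_suggestion` are hypothetical names, and the values are written as plain YAML scalars with no quoting or escaping, which only holds for simple values like those in the example above.

```rust
/// Identity fields for one run (hypothetical struct for this sketch).
pub struct RunIdentity<'a> {
    pub agent_name: &'a str,
    pub agent_version: &'a str,
    pub task: &'a str,
    pub run_id: &'a str,
    pub created_at: &'a str,
}

/// Prepends the YAML frontmatter shown above to the model-produced body.
pub fn wrap_suggestion(run: &RunIdentity, parent_note: &str, body: &str) -> String {
    let mut out = String::from("---\n");
    out.push_str("agent: noteworx-agent-harness\n");
    for (key, value) in [
        ("agent_name", run.agent_name),
        ("agent_version", run.agent_version),
        ("agent_task", run.task),
        ("agent_run_id", run.run_id),
        ("agent_created_at", run.created_at),
        ("parent_note", parent_note),
    ] {
        // Assumes values need no YAML quoting (no colons-with-space, newlines, etc.).
        out.push_str(&format!("{key}: {value}\n"));
    }
    out.push_str("---\n\n");
    out.push_str(body);
    out
}
```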

Thinking channel

Requests include chat_template_kwargs = { enable_thinking = true } so Gemma 4 produces chain-of-thought output. The full response (including thinking) is persisted to conversation.jsonl and surfaced in run.log, but the thinking block is stripped from the assistant message before it's appended back to the conversation for the next turn — this keeps context lean without losing observability.
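The log-everything, replay-without-thinking behaviour can be sketched as below. This assumes the thinking text arrives in a separate `reasoning_content` field next to `content`; the struct and function names are hypothetical, and in-memory vectors stand in for conversation.jsonl and the replayed message list.

```rust
/// Assistant reply as received from the endpoint (fields simplified).
#[derive(Debug, Clone, PartialEq)]
pub struct AssistantMessage {
    pub content: String,
    /// Thinking-channel text, if the model produced any.
    pub reasoning_content: Option<String>,
}

/// Copy of the reply with the thinking channel removed, for the next turn.
pub fn for_replay(full: &AssistantMessage) -> AssistantMessage {
    AssistantMessage {
        content: full.content.clone(),
        reasoning_content: None,
    }
}

/// Logs the full reply and appends the stripped copy to the conversation.
pub fn record_turn(
    log: &mut Vec<AssistantMessage>,
    conversation: &mut Vec<AssistantMessage>,
    reply: AssistantMessage,
) {
    conversation.push(for_replay(&reply));
    log.push(reply);
}
```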

Running locally

export RUNPOD_API_KEY=rp_…
export NOTEWORX_TOKEN=nwx_…

cp config/agent.toml.example agent.toml
$EDITOR agent.toml

# Smoke-test the endpoint
cargo run --release -- --config agent.toml test-runpod

# Run everything in the inbox
cargo run --release -- --config agent.toml run

# Or a specific task
cargo run --release -- --config agent.toml run-task example

Deploying to the VPS

See deploy/README.md.

Logs

Per-run:

logs/
└── <task-name>/
    └── 20260409T061500Z/
        ├── conversation.jsonl   # every API request and response, raw
        ├── tools.jsonl          # tool calls + results
        ├── run.log              # human-readable trace
        └── summary.json         # turns, tokens, runpod_exec_ms, status
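For illustration, the per-run layout above maps onto paths like these. `RunLogPaths` and `run_log_paths` are hypothetical names; the run id is passed in as a string rather than generated.

```rust
use std::path::{Path, PathBuf};

/// File locations for one run, following the tree above.
#[derive(Debug, PartialEq)]
pub struct RunLogPaths {
    pub dir: PathBuf,
    pub conversation: PathBuf,
    pub tools: PathBuf,
    pub run_log: PathBuf,
    pub summary: PathBuf,
}

/// Builds logs/<task-name>/<run-id>/ and the four files inside it.
pub fn run_log_paths(logs_root: &Path, task: &str, run_id: &str) -> RunLogPaths {
    let dir = logs_root.join(task).join(run_id);
    RunLogPaths {
        conversation: dir.join("conversation.jsonl"),
        tools: dir.join("tools.jsonl"),
        run_log: dir.join("run.log"),
        summary: dir.join("summary.json"),
        dir,
    }
}
```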

Configuration

See config/agent.toml.example. Key sections:

  • [runpod] — endpoint id and API key env var name
  • [paths] — inbox, done, failed, logs locations
  • [limits] — max_turns, max_wall_secs, max_total_tokens
  • [model] — sampling (temp/top_p/top_k) + chat_template_kwargs
  • [tools.webfetch] — per-tool settings
  • [tools.noteworx] — base URL, token env var, agent identity name
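As one example of how the [limits] keys might be enforced between turns, the sketch below checks a run's usage against them. Only the three field names come from the configuration; the types, the `Usage` and `LimitHit` definitions, and the "stop once usage reaches the limit" semantics are assumptions.

```rust
use std::time::Duration;

/// Mirrors the keys of the [limits] section (types are assumed).
pub struct Limits {
    pub max_turns: u32,
    pub max_wall_secs: u64,
    pub max_total_tokens: u64,
}

/// Running totals for the current task (hypothetical).
pub struct Usage {
    pub turns: u32,
    pub elapsed: Duration,
    pub total_tokens: u64,
}

#[derive(Debug, PartialEq)]
pub enum LimitHit {
    Turns,
    WallClock,
    Tokens,
}

/// Returns the first limit that has been reached, if any.
pub fn check_limits(limits: &Limits, usage: &Usage) -> Option<LimitHit> {
    if usage.turns >= limits.max_turns {
        Some(LimitHit::Turns)
    } else if usage.elapsed.as_secs() >= limits.max_wall_secs {
        Some(LimitHit::WallClock)
    } else if usage.total_tokens >= limits.max_total_tokens {
        Some(LimitHit::Tokens)
    } else {
        None
    }
}
```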