# agent-harness

Multi-turn agent harness for Gemma 4 served via the runpod-gemma4 RunPod serverless endpoint. Reads task files from an inbox directory, calls the model in a tool-use loop, and posts the results back to noteworx as signed text attachments.
## Architecture

```
systemd timer (daily/hourly)
      │
      ▼
agent-harness run
      │
      ├──► read tasks/inbox/*.md
      │
      ▼
multi-turn tool-use loop
      │     │
      │     ├──► webfetch (URL → text)
      │     │
      │     └──► noteworx (list/search/read notes,
      │                    read/create text attachments)
      │
      │  (thinking channel stripped from replayed messages but
      │   preserved in logs)
      │
      ├──► POST /runsync (RunPod → llama-server → Gemma 4)
      │
      └──► logs/<task>/<run>/{conversation,tools}.jsonl + summary.json
```
Each task invocation is independent. Task files carry their own context; nothing is remembered across runs. The harness is a one-shot: systemd or cron triggers it, it drains (or runs a specific entry in) the inbox, and exits.
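The drain step amounts to a plain directory scan. A minimal sketch of that idea, not the harness's actual code (the function name and sort order are assumptions):

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Collect pending task files from the inbox, sorted by filename.
/// Illustrative sketch: the real harness may order or filter differently.
fn inbox_tasks(inbox: &Path) -> std::io::Result<Vec<PathBuf>> {
    let mut tasks: Vec<PathBuf> = fs::read_dir(inbox)?
        .filter_map(|entry| entry.ok().map(|e| e.path()))
        // Only markdown files count as tasks.
        .filter(|p| p.extension().map_or(false, |ext| ext == "md"))
        .collect();
    tasks.sort();
    Ok(tasks)
}

fn main() {
    // Missing inbox just means nothing to do.
    for task in inbox_tasks(Path::new("tasks/inbox")).unwrap_or_default() {
        println!("would run: {}", task.display());
    }
}
```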
## Task files
A task is a markdown file under `tasks/inbox/`. The filename (sans `.md`) is the task name and appears in log paths. The file body is sent as the initial user message. Optional TOML frontmatter (delimited by `+++`) carries per-task overrides:
```markdown
+++
max_turns = 5
tools = ["webfetch", "noteworx_read_note", "noteworx_create_attachment"]
system_prompt = "You are a concise research assistant."
+++
# Weekly research scan
Find all notes tagged `research-active` and, for each URL they mention,
fetch a short summary and attach it to the note.
```
After a successful run the file moves to `tasks/done/<ts>-<name>.md`;
on failure it moves to `tasks/failed/<ts>-<name>.md`.
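Splitting the optional `+++` frontmatter from the body is a small string operation. A minimal sketch under assumed edge-case handling (the function name is illustrative, not the harness's actual parser):

```rust
/// Split a task file into optional TOML frontmatter and the message body.
/// Sketch only: assumes `+++` delimiters at the very start of the file.
fn split_frontmatter(raw: &str) -> (Option<String>, String) {
    if let Some(rest) = raw.trim_start().strip_prefix("+++") {
        if let Some(end) = rest.find("\n+++") {
            let front = rest[..end].trim().to_string();
            let body = rest[end + "\n+++".len()..].trim_start().to_string();
            return (Some(front), body);
        }
    }
    // No frontmatter: the whole file is the initial user message.
    (None, raw.to_string())
}

fn main() {
    let raw = "+++\nmax_turns = 5\n+++\n# Weekly research scan\n";
    let (front, body) = split_frontmatter(raw);
    println!("front: {:?}", front);
    println!("body: {}", body);
}
```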
## Tools
| Name | Purpose |
|---|---|
| `webfetch` | HTTP GET with size and timeout caps |
| `noteworx_list_notes` | List notes (optional space/tag filters) |
| `noteworx_search_notes` | Full-text, title, or semantic search |
| `noteworx_read_note` | Read a single note by slug |
| `noteworx_list_attachments` | List attachments on a note |
| `noteworx_read_attachment` | Read a text attachment |
| `noteworx_create_attachment` | Create a suggestion attachment with agent signature |
All noteworx tools authenticate with `Authorization: Bearer nwx_…`
(from the `NOTEWORX_TOKEN` env var by default). llama.cpp's server
translates the OpenAI-format `tools` array into Gemma 4's native chat
template, so we don't have to parse the raw tool-call markup.
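For orientation, one entry in that `tools` array might look like the following. The field layout is the standard OpenAI function-calling schema; the parameter names here are an illustrative guess, not the harness's actual definition:

```json
{
  "type": "function",
  "function": {
    "name": "noteworx_read_note",
    "description": "Read a single note by slug",
    "parameters": {
      "type": "object",
      "properties": {
        "slug": { "type": "string", "description": "Slug of the note to read" }
      },
      "required": ["slug"]
    }
  }
}
```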
## Suggestion attachments
When the agent calls `noteworx_create_attachment`, the content is
wrapped in YAML frontmatter identifying the run, so the noteworx UI can
mark it as an agent-generated suggestion:
```yaml
---
agent: noteworx-agent-harness
agent_name: default-agent
agent_version: 0.1.0
agent_task: weekly-research-scan
agent_run_id: 20260409T061500Z
agent_created_at: 2026-04-09T06:15:14Z
parent_note: some-note-slug
---
...model-produced body...
```
The model is told in its tool description not to add its own frontmatter; the wrapper is applied by the tool handler on the harness side, not by the model.
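That wrapping step can be sketched as straightforward string assembly. The function name and signature below are illustrative, and only a subset of the frontmatter fields is shown:

```rust
/// Wrap a model-produced body in the agent-identifying YAML frontmatter.
/// Sketch: fields abbreviated, values passed in rather than derived from
/// real run state.
fn wrap_suggestion(body: &str, task: &str, run_id: &str, parent_note: &str) -> String {
    format!(
        "---\n\
         agent: noteworx-agent-harness\n\
         agent_task: {task}\n\
         agent_run_id: {run_id}\n\
         parent_note: {parent_note}\n\
         ---\n\
         {body}"
    )
}

fn main() {
    let wrapped = wrap_suggestion("Summary text.", "weekly-research-scan",
                                  "20260409T061500Z", "some-note-slug");
    println!("{}", wrapped);
}
```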
## Thinking channel
Requests include `chat_template_kwargs = { enable_thinking = true }` so
Gemma 4 produces chain-of-thought output. The full response (including
thinking) is persisted to `conversation.jsonl` and surfaced in
`run.log`, but the thinking block is stripped from the assistant message
before it's appended back to the conversation for the next turn — this
keeps context lean without losing observability.
## Running locally

```sh
export RUNPOD_API_KEY=rp_…
export NOTEWORX_TOKEN=nwx_…
cp config/agent.toml.example agent.toml
$EDITOR agent.toml

# Smoke-test the endpoint
cargo run --release -- --config agent.toml test-runpod

# Run everything in the inbox
cargo run --release -- --config agent.toml run

# Or a specific task
cargo run --release -- --config agent.toml run-task example
```
## Deploying to the VPS

See `deploy/README.md`.
## Logs

Per-run:

```
logs/
└── <task-name>/
    └── 20260409T061500Z/
        ├── conversation.jsonl   # every API request and response, raw
        ├── tools.jsonl          # tool calls + results
        ├── run.log              # human-readable trace
        └── summary.json         # turns, tokens, runpod_exec_ms, status
```
## Configuration

See `config/agent.toml.example`. Key sections:

- `[runpod]` — endpoint id and API key env var name
- `[paths]` — inbox, done, failed, logs locations
- `[limits]` — `max_turns`, `max_wall_secs`, `max_total_tokens`
- `[model]` — sampling (temp/top_p/top_k) + `chat_template_kwargs`
- `[tools.webfetch]` — per-tool settings
- `[tools.noteworx]` — base URL, token env var, agent identity name
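Pieced together from the section list above, an `agent.toml` might look roughly like this; every value below is an illustrative placeholder, not the shipped `agent.toml.example`:

```toml
[runpod]
endpoint_id = "your-endpoint-id"        # placeholder
api_key_env = "RUNPOD_API_KEY"

[paths]
inbox = "tasks/inbox"
done = "tasks/done"
failed = "tasks/failed"
logs = "logs"

[limits]
max_turns = 8            # illustrative values
max_wall_secs = 600
max_total_tokens = 100000

[model]
temperature = 0.7
top_p = 0.95
top_k = 64

[model.chat_template_kwargs]
enable_thinking = true

[tools.webfetch]
max_bytes = 262144       # size cap
timeout_secs = 30        # timeout cap

[tools.noteworx]
base_url = "https://noteworx.example.com/api"   # placeholder URL
token_env = "NOTEWORX_TOKEN"
agent_name = "default-agent"
```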