# agent-harness

Multi-turn agent harness for Gemma 4 served via the runpod-gemma4 RunPod serverless endpoint. Reads task files from an inbox directory, calls the model in a tool-use loop, and posts the results back to noteworx as signed text attachments.
## Architecture

```
systemd timer (daily/hourly)
      │
      ▼
agent-harness run
      │
      ├──► read tasks/inbox/*.md
      │
      ▼
multi-turn tool-use loop
      │     │
      │     ├──► webfetch (URL → text)
      │     │
      │     └──► noteworx (list/search/read notes,
      │                    read/create text attachments)
      │
      │  (thinking channel stripped from replayed messages but
      │   preserved in logs)
      │
      ├──► POST /runsync (RunPod → llama-server → Gemma 4)
      │
      └──► logs/<task>/<run>/{conversation,tools}.jsonl + summary.json
```
Each task invocation is independent. Task files carry their own context; nothing is remembered across runs. The harness is a one-shot: systemd or cron triggers it, it drains (or runs a specific entry in) the inbox, and exits.
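The drain step amounts to a plain directory scan. A minimal sketch of that idea, not the harness's actual code (the function name and sort order are assumptions):

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Collect pending task files from the inbox, sorted by filename.
/// Illustrative sketch: the real harness may order or filter differently.
fn inbox_tasks(inbox: &Path) -> std::io::Result<Vec<PathBuf>> {
    let mut tasks: Vec<PathBuf> = fs::read_dir(inbox)?
        .filter_map(|entry| entry.ok().map(|e| e.path()))
        // Only markdown files count as tasks.
        .filter(|p| p.extension().map_or(false, |ext| ext == "md"))
        .collect();
    tasks.sort();
    Ok(tasks)
}

fn main() {
    // Missing inbox just means nothing to do.
    for task in inbox_tasks(Path::new("tasks/inbox")).unwrap_or_default() {
        println!("would run: {}", task.display());
    }
}
```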
## Task files
A task is a markdown file under `tasks/inbox/`. The filename (sans `.md`) is the task name and appears in log paths. The file body is sent as the initial user message. Optional TOML frontmatter (delimited by `+++`) carries per-task overrides:
```markdown
+++
max_turns = 5
tools = ["webfetch", "noteworx_read_note", "noteworx_create_attachment"]
system_prompt = "You are a concise research assistant."
+++
# Weekly research scan
Find all notes tagged `research-active` and, for each URL they mention,
fetch a short summary and attach it to the note.
```
After a successful run the file moves to `tasks/done/<ts>-<name>.md`;
on failure it moves to `tasks/failed/<ts>-<name>.md`.
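Splitting the optional `+++` frontmatter from the body is a small string operation. A minimal sketch under assumed edge-case handling (the function name is illustrative, not the harness's actual parser):

```rust
/// Split a task file into optional TOML frontmatter and the message body.
/// Sketch only: assumes `+++` delimiters at the very start of the file.
fn split_frontmatter(raw: &str) -> (Option<String>, String) {
    if let Some(rest) = raw.trim_start().strip_prefix("+++") {
        if let Some(end) = rest.find("\n+++") {
            let front = rest[..end].trim().to_string();
            let body = rest[end + "\n+++".len()..].trim_start().to_string();
            return (Some(front), body);
        }
    }
    // No frontmatter: the whole file is the initial user message.
    (None, raw.to_string())
}

fn main() {
    let raw = "+++\nmax_turns = 5\n+++\n# Weekly research scan\n";
    let (front, body) = split_frontmatter(raw);
    println!("front: {:?}", front);
    println!("body: {}", body);
}
```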
## Tools
| Name | Purpose |
|---|---|
| `webfetch` | HTTP GET with size and timeout caps |
| `noteworx_list_notes` | List notes (optional space/tag filters) |
| `noteworx_search_notes` | Full-text, title, or semantic search |
| `noteworx_read_note` | Read a single note by slug |
| `noteworx_list_attachments` | List attachments on a note |
| `noteworx_read_attachment` | Read a text attachment |
| `noteworx_create_attachment` | Create a suggestion attachment with agent signature |
All noteworx tools authenticate with `Authorization: Bearer nwx_…`
(from the `NOTEWORX_TOKEN` env var by default). llama.cpp's server
translates the OpenAI-format `tools` array into Gemma 4's native chat
template, so we don't have to parse the raw tool-call markup.
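For orientation, one entry in that `tools` array might look like the following. The field layout is the standard OpenAI function-calling schema; the parameter names here are an illustrative guess, not the harness's actual definition:

```json
{
  "type": "function",
  "function": {
    "name": "noteworx_read_note",
    "description": "Read a single note by slug",
    "parameters": {
      "type": "object",
      "properties": {
        "slug": { "type": "string", "description": "Slug of the note to read" }
      },
      "required": ["slug"]
    }
  }
}
```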
## Suggestion attachments
When the agent calls `noteworx_create_attachment`, the content is
wrapped in YAML frontmatter identifying the run, so the noteworx UI can
mark it as an agent-generated suggestion:
```yaml
---
agent: noteworx-agent-harness
agent_name: default-agent
agent_version: 0.1.0
agent_task: weekly-research-scan
agent_run_id: 20260409T061500Z
agent_created_at: 2026-04-09T06:15:14Z
parent_note: some-note-slug
---
...model-produced body...
```
The model is told in its tool description not to add its own frontmatter; the wrapper is applied by the tool handler on the harness side, not by the model.
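That wrapping step can be sketched as straightforward string assembly. The function name and signature below are illustrative, and only a subset of the frontmatter fields is shown:

```rust
/// Wrap a model-produced body in the agent-identifying YAML frontmatter.
/// Sketch: fields abbreviated, values passed in rather than derived from
/// real run state.
fn wrap_suggestion(body: &str, task: &str, run_id: &str, parent_note: &str) -> String {
    format!(
        "---\n\
         agent: noteworx-agent-harness\n\
         agent_task: {task}\n\
         agent_run_id: {run_id}\n\
         parent_note: {parent_note}\n\
         ---\n\
         {body}"
    )
}

fn main() {
    let wrapped = wrap_suggestion("Summary text.", "weekly-research-scan",
                                  "20260409T061500Z", "some-note-slug");
    println!("{}", wrapped);
}
```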
## Thinking channel
Requests include `chat_template_kwargs = { enable_thinking = true }` so
Gemma 4 produces chain-of-thought output. The full response (including
thinking) is persisted to `conversation.jsonl` and surfaced in
`run.log`, but the thinking block is stripped from the assistant message
before it's appended back to the conversation for the next turn — this
keeps context lean without losing observability.
## Running locally

```sh
export RUNPOD_API_KEY=rp_…
export NOTEWORX_TOKEN=nwx_…
cp config/agent.toml.example agent.toml
$EDITOR agent.toml

# Smoke-test the endpoint
cargo run --release -- --config agent.toml test-runpod

# Run everything in the inbox
cargo run --release -- --config agent.toml run

# Or a specific task
cargo run --release -- --config agent.toml run-task example
```
## Deploying to the VPS

See `deploy/README.md`.
## Logs

Per-run:

```
logs/
└── <task-name>/
    └── 20260409T061500Z/
        ├── conversation.jsonl   # every API request and response, raw
        ├── tools.jsonl          # tool calls + results
        ├── run.log              # human-readable trace
        └── summary.json         # turns, tokens, runpod_exec_ms, status
```
## Configuration

See `config/agent.toml.example`. Key sections:

- `[runpod]` — endpoint id and API key env var name
- `[paths]` — inbox, done, failed, logs locations
- `[limits]` — `max_turns`, `max_wall_secs`, `max_total_tokens`
- `[model]` — sampling (temp/top_p/top_k) + `chat_template_kwargs`
- `[tools.webfetch]` — per-tool settings
- `[tools.noteworx]` — base URL, token env var, agent identity name
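Pieced together from the section list above, an `agent.toml` might look roughly like this; every value below is an illustrative placeholder, not the shipped `agent.toml.example`:

```toml
[runpod]
endpoint_id = "your-endpoint-id"        # placeholder
api_key_env = "RUNPOD_API_KEY"

[paths]
inbox = "tasks/inbox"
done = "tasks/done"
failed = "tasks/failed"
logs = "logs"

[limits]
max_turns = 8            # illustrative values
max_wall_secs = 600
max_total_tokens = 100000

[model]
temperature = 0.7
top_p = 0.95
top_k = 64

[model.chat_template_kwargs]
enable_thinking = true

[tools.webfetch]
max_bytes = 262144       # size cap
timeout_secs = 30        # timeout cap

[tools.noteworx]
base_url = "https://noteworx.example.com/api"   # placeholder URL
token_env = "NOTEWORX_TOKEN"
agent_name = "default-agent"
```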