Local-first · Rust · local LLMs

A morning briefing that
improves itself every day.

ibrief builds a curated briefing overnight, tailored to one specific person, then measures what was actually useful and adapts accordingly. Five minutes of signal instead of hours of noise. Runs fully local on an Apple Silicon machine.

  • grow
  • stay current
  • decide well
  • better conversations

Design philosophy

The memory is the product — not the model.

ibrief follows an "LLM-OS" stance: the LLM is a swappable compute core; the value lives in persistent, versioned state. Self-improvement stays on a leash: propose, verify against real feedback, safeguard, adopt, roll back.

// swappable

Model ≠ lock-in

Only data artifacts are learned — weights, prompts, sources. Never the application code. The config survives any model swap.

// autonomy slider

On a leash

Every proposed change passes a safety gate before becoming the default. Verification is cheaper than generation — propose a lot, adopt little (only what's checked).

// anti-bubble

Engagement isn't the goal

A naive engagement optimizer makes you more comfortable but dumber. ibrief deliberately sacrifices short-term score to preserve diversity and anti-bubble invariants.

// gated autoresearch

Depth, controlled

Multi-step research (Karpathy-style) is powerful but expensive and hallucination-prone — so it is a budgeted, citation-required module, not a free-running agent.
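The propose, verify, adopt, roll back cycle described under "On a leash" can be illustrated with a small versioned store. This is a minimal sketch, not ibrief's actual code: the `Versioned` type, its methods, and the gate closure are hypothetical names chosen for illustration.

```rust
/// Hypothetical versioned store: every adopted value is kept, so rollback is cheap.
#[derive(Debug)]
pub struct Versioned<T> {
    history: Vec<T>, // history[0] is the initial default; the last entry is active
}

impl<T> Versioned<T> {
    pub fn new(initial: T) -> Self {
        Self { history: vec![initial] }
    }

    /// The currently active version.
    pub fn current(&self) -> &T {
        self.history.last().expect("history is never empty")
    }

    /// Number of the active version (0 = initial).
    pub fn version(&self) -> usize {
        self.history.len() - 1
    }

    /// Adopt `candidate` only if `gate(current, candidate)` accepts it.
    /// Returns whether the candidate became the new default.
    pub fn propose(&mut self, candidate: T, gate: impl Fn(&T, &T) -> bool) -> bool {
        let accepted = gate(self.current(), &candidate);
        if accepted {
            self.history.push(candidate);
        }
        accepted
    }

    /// Return to the previous version; the initial version cannot be rolled back.
    pub fn rollback(&mut self) -> bool {
        if self.history.len() > 1 {
            self.history.pop();
            true
        } else {
            false
        }
    }
}
```

Because the gate is just a closure, many candidates can be proposed cheaply while only the ones that pass the check ever become the default.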

The self-improvement loop

Deliver → Feedback → Evaluate → Learn → Safeguard.

The same loop runs every night. What was useful gets reinforced; what wasn't fades — within hard safety bounds.

01

Deliver

build & push the briefing

02

Feedback

👍/👎 via Telegram buttons

03

Eval

behavior + judge + structure

04

Learn

weights · prompts · sources

05

Gate

check · adopt · rollback

↺ The behavioral score is ground truth; the LLM judge mainly helps in cold-start and is calibrated against real feedback.
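One way to read the note above is as a blend that trusts the LLM judge while feedback is scarce and shifts to the behavioral score as votes accumulate. The sketch below assumes a linear ramp; the function names and the `warmup` parameter are hypothetical and not taken from ibrief.

```rust
/// Behavioral score from 👍/👎 counts: fraction of positive votes,
/// or None when there is no feedback yet.
pub fn behavioral_score(up: u32, down: u32) -> Option<f64> {
    let total = up + down;
    (total > 0).then(|| up as f64 / total as f64)
}

/// Blend judge and behavioral scores (both assumed to be on the same 0..1 scale).
/// `feedback_count` is the number of votes seen so far; `warmup` is a
/// hypothetical number of votes after which the judge is ignored entirely.
pub fn blended_score(behavior: f64, judge: f64, feedback_count: u32, warmup: u32) -> f64 {
    if warmup == 0 {
        return behavior;
    }
    // Trust in behavior grows linearly from 0 to 1 during the warm-up phase.
    let trust = (feedback_count as f64 / warmup as f64).min(1.0);
    trust * behavior + (1.0 - trust) * judge
}
```

With no feedback the result equals the judge score; once `feedback_count` reaches `warmup`, it equals the behavioral score.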

Three axes, each safeguarded

Self-improvement isn't one feature — it's three.

Axis | Lever | Safeguard
Weights (sources & topics) | Multi-armed bandit (Thompson / Beta sampling) with an exploration floor [0.2, 2.0] | Safety gate: bounds + source diversity (max 50%), versioned, rollback
Prompts (e.g. TL;DR synthesis) | An optimizer LLM generates variants; a shadow test scores both on the same input | Adopted only on a clear judge margin; prompt versioning + experiment log
Sources (feed registry) | Quality scoring from feedback + selection frequency; weak ones retired | Drift watchdog (HHI): on narrowing, pruning is suspended (breadth over score)
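The weights axis can be sketched as a Thompson step over a Beta posterior built from 👍/👎 counts. This is a dependency-free illustration under stated assumptions: the tech stack lists rand_distr, whose Beta distribution would be the natural choice in practice, while this sketch uses a tiny xorshift generator and samples Beta(a, b) for integer parameters as the a-th smallest of a+b-1 uniform draws. Reading [0.2, 2.0] as the range that weights are mapped into is also an assumption, and `Rng`, `Arm`, `sample_beta`, and `thompson_weight` are hypothetical names.

```rust
/// Tiny xorshift64* generator so the sketch needs no external crates.
pub struct Rng(u64);

impl Rng {
    pub fn new(seed: u64) -> Self {
        Rng(seed.max(1)) // the state must be non-zero
    }

    /// Uniform sample in [0, 1).
    pub fn uniform(&mut self) -> f64 {
        self.0 ^= self.0 >> 12;
        self.0 ^= self.0 << 25;
        self.0 ^= self.0 >> 27;
        let bits = self.0.wrapping_mul(0x2545_F491_4F6C_DD1D) >> 11; // keep 53 bits
        bits as f64 / (1u64 << 53) as f64
    }
}

/// Sample Beta(a, b) for positive integers a, b:
/// the a-th smallest of (a + b - 1) independent uniforms is Beta(a, b) distributed.
pub fn sample_beta(rng: &mut Rng, a: u32, b: u32) -> f64 {
    assert!(a >= 1 && b >= 1, "Beta parameters must be positive");
    let mut draws: Vec<f64> = (0..a + b - 1).map(|_| rng.uniform()).collect();
    draws.sort_by(|x, y| x.total_cmp(y));
    draws[(a - 1) as usize]
}

/// One arm per source or topic, with vote counts as evidence.
pub struct Arm {
    pub name: &'static str,
    pub up: u32,
    pub down: u32,
}

/// Thompson step: draw from Beta(1 + up, 1 + down) and map the draw into [lo, hi].
/// The lower bound acts as an exploration floor: no arm ever drops to zero weight.
pub fn thompson_weight(rng: &mut Rng, arm: &Arm, lo: f64, hi: f64) -> f64 {
    let theta = sample_beta(rng, 1 + arm.up, 1 + arm.down);
    (lo + theta * (hi - lo)).clamp(lo, hi)
}
```

Calling `thompson_weight(&mut rng, &arm, 0.2, 2.0)` once per arm each night yields weights that favor well-rated sources on average while still giving poorly rated or untested ones an occasional high draw.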

The nightly pipeline

From raw feed to curated briefing.

Ingest → Dedup → Enrich → Score (weights) → Curate → TL;DR (learned prompt) → Render → Persist → Push

The bulk tier (summarizing, tagging) runs on a fast small model and the synthesis tier (curation, judge) on a 70B-class model, both local. The entire learning cycle runs overnight essentially for free; a frontier model is optional and used only for judge calibration through an existing subscription (claude -p).
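The two-tier split can be pictured as a routing table from pipeline stage to model tier. The sketch below is illustrative only: the `Stage` and `Tier` enums and both functions are hypothetical, the model names are the ones used in the quick start, and treating every stage other than Enrich, Curate, and TL;DR as LLM-free is an assumption.

```rust
/// The nine pipeline stages, in nightly execution order.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Stage {
    Ingest,
    Dedup,
    Enrich,
    Score,
    Curate,
    Tldr,
    Render,
    Persist,
    Push,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tier {
    Bulk,      // fast small model: summarizing, tagging
    Synthesis, // 70B-class model: curation, TL;DR, judge
}

pub const PIPELINE: [Stage; 9] = [
    Stage::Ingest,
    Stage::Dedup,
    Stage::Enrich,
    Stage::Score,
    Stage::Curate,
    Stage::Tldr,
    Stage::Render,
    Stage::Persist,
    Stage::Push,
];

/// Which model tier a stage needs, or None if it makes no LLM call (assumed).
pub fn tier_for(stage: Stage) -> Option<Tier> {
    match stage {
        Stage::Enrich => Some(Tier::Bulk),
        Stage::Curate | Stage::Tldr => Some(Tier::Synthesis),
        _ => None,
    }
}

/// Ollama model tag per tier, matching the quick-start defaults.
pub fn model_for(tier: Tier) -> &'static str {
    match tier {
        Tier::Bulk => "qwen2.5:14b",
        Tier::Synthesis => "llama3.3:70b",
    }
}
```

Keeping the mapping in one place means a model swap touches only `model_for`, which fits the idea that the model is a swappable compute core.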

Anatomy of a briefing

Three sections with clear intent.

The last two are fixed and cannot be optimized away by personalization, so the learning loop cannot steer you into a comfortable bubble.

Highlights

The most important stories of the day — personalized through the learned weights. Plus "The 3 things today" as an executive summary.

Counterpoint (fixed)

A seriously argued, steel-manned counter-argument to your own position — so you know the other side in conversation instead of arguing inside the bubble.

Wildcard (fixed)

A deliberately surprising, real article outside your usual interests — preferably from a source that didn't make the main selection.
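The three-section anatomy can be expressed in types: if Counterpoint and Wildcard are plain, non-optional fields, a briefing without them cannot be constructed. The sketch below also shows one possible Wildcard pick. All names are hypothetical, and using the lowest personalized score as a proxy for "outside your usual interests" is an assumption made for illustration.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, PartialEq)]
pub struct Article {
    pub title: String,
    pub source: String,
    pub score: f64, // personalized score from the learned weights
}

/// Counterpoint and Wildcard are not Option: they cannot be left out.
pub struct Briefing {
    pub highlights: Vec<Article>,
    pub counterpoint: String,
    pub wildcard: Article,
}

/// Pick a wildcard among articles that are not already highlights.
/// Articles from sources absent from the highlights come first; ties are
/// broken by the lowest personalized score.
pub fn pick_wildcard<'a>(candidates: &'a [Article], highlights: &[Article]) -> Option<&'a Article> {
    let used_sources: HashSet<&str> = highlights.iter().map(|a| a.source.as_str()).collect();
    let used_titles: HashSet<&str> = highlights.iter().map(|a| a.title.as_str()).collect();
    candidates
        .iter()
        .filter(|a| !used_titles.contains(a.title.as_str()))
        .min_by(|a, b| {
            // false sorts before true, so unseen sources are preferred.
            let seen_a = used_sources.contains(a.source.as_str());
            let seen_b = used_sources.contains(b.source.as_str());
            seen_a.cmp(&seen_b).then(a.score.total_cmp(&b.score))
        })
}
```

`pick_wildcard` returns `None` only when every candidate is already a highlight, in which case the caller would need a fallback before it can build a `Briefing`.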

Roadmap — fully implemented

Six milestones, each useful on its own.

M1

Static briefing

Ingest → Enrich → Curate → Render against local Ollama.

M2

Persistence & feedback

SQLite (sqlx), dedup, briefing records; Telegram push with 👍/👎 buttons.

M3

Eval engine

Behavioral score + LLM judge against a versioned rubric + deterministic structural checks.

M4

Learning: weights

Thompson bandit over sources/topics, safety gate, config versioning & rollback.

M5

Learning: prompts

Optimizer LLM + shadow test; a temporal A/B decision primitive.

M6

Source evolution + AutoResearch

Scoring, pruning, drift watchdog; gated deep-research loop with citation verification.
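The shadow test from M5 can be sketched as a paired comparison: the incumbent prompt and a candidate variant are judged on the same inputs, and the candidate is adopted only if its mean gain clears a margin. This is an illustrative sketch; `Decision`, `shadow_decision`, and the `margin` parameter are hypothetical names, and using the mean paired difference as the decision statistic is an assumption.

```rust
#[derive(Debug, PartialEq)]
pub enum Decision {
    Adopt,
    Keep,
}

/// Compare judge scores for the incumbent and the candidate prompt.
/// Both slices must be paired: index i holds the scores for the same input.
pub fn shadow_decision(incumbent: &[f64], candidate: &[f64], margin: f64) -> Decision {
    // Without paired evidence there is nothing to verify, so keep the default.
    if incumbent.is_empty() || incumbent.len() != candidate.len() {
        return Decision::Keep;
    }
    let mean_gain = incumbent
        .iter()
        .zip(candidate)
        .map(|(old, new)| new - old)
        .sum::<f64>()
        / incumbent.len() as f64;
    if mean_gain >= margin {
        Decision::Adopt
    } else {
        Decision::Keep
    }
}
```

Defaulting to `Keep` on missing or mismatched data reflects the "propose a lot, adopt little" stance: a variant has to earn adoption.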

Quick start

Local in two minutes.

Requirement: Ollama with one bulk and one synthesis model. On smaller hardware just put smaller models in config/profile.toml.

# pull models
ollama pull qwen2.5:14b      # bulk tier (enrich)
ollama pull llama3.3:70b     # synthesis tier (TL;DR, judge)

# build a briefing
cargo run -p ibrief-app                 # no subcommand: same as brief

# learn & optimize
cargo run -p ibrief-app -- eval       # score today's briefing
cargo run -p ibrief-app -- learn      # learn weights (Thompson + gate)
cargo run -p ibrief-app -- optimize   # improve the TL;DR prompt via shadow test
cargo run -p ibrief-app -- sources evolve   # score sources + drift watchdog
cargo run -p ibrief-app -- research "What's new in local LLMs?"

Tech stack

  • Rust 1.95 · Edition 2024
  • Tokio async
  • Ollama local LLMs
  • sqlx + SQLite
  • feed-rs ingest
  • reqwest Telegram API
  • rand_distr Thompson
  • 12 crates · CI green

FAQ

Frequently asked.

What is ibrief?

A self-improving, personalized morning briefing. It builds a curated briefing overnight and adapts daily to measured feedback — fully local in Rust, using local LLMs via Ollama.

How does ibrief improve itself?

Across three safeguarded axes: weights (Thompson bandit), prompts (shadow test) and sources (scoring plus a drift watchdog). Every change passes a safety gate and is versioned and reversible.
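For the weights axis, the safety gate can be pictured as a pure check that a proposed weight set must pass before adoption. The sketch below is a guess at the shape of such a check, not ibrief's implementation: `GateError` and `check_weights` are hypothetical, and interpreting "source diversity (max 50%)" as "no single source may hold more than half of the total weight" is an assumption.

```rust
use std::collections::BTreeMap;

#[derive(Debug, PartialEq)]
pub enum GateError {
    Empty,
    OutOfBounds(String),
    TooConcentrated(String),
}

/// Reject weight sets that leave [0.2, 2.0] or let one source dominate.
pub fn check_weights(weights: &BTreeMap<String, f64>) -> Result<(), GateError> {
    const LO: f64 = 0.2;
    const HI: f64 = 2.0;
    const MAX_SHARE: f64 = 0.5;

    if weights.is_empty() {
        return Err(GateError::Empty);
    }
    // Bounds check first; NaN fails `contains` and is rejected as well.
    for (name, &w) in weights {
        if !(LO..=HI).contains(&w) {
            return Err(GateError::OutOfBounds(name.clone()));
        }
    }
    // Diversity check: share of total weight per source.
    let total: f64 = weights.values().sum();
    for (name, &w) in weights {
        if w / total > MAX_SHARE {
            return Err(GateError::TooConcentrated(name.clone()));
        }
    }
    Ok(())
}
```

A rejected proposal simply leaves the previous, versioned weight set in place.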

How does ibrief avoid a filter bubble?

Engagement is not the optimization target. Two sections — Counterpoint and Wildcard — cannot be disabled, and a drift watchdog enforces source diversity even at the cost of the short-term score.
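The drift watchdog mentioned above is based on the Herfindahl-Hirschman index (HHI), the sum of squared shares: it is 1/n when n sources are selected equally often and 1.0 when a single source takes everything. A minimal sketch follows, assuming selection counts per source as input; the `max_hhi` threshold and both function names are hypothetical.

```rust
/// HHI over per-source selection counts: sum of squared shares.
/// Returns None when nothing has been selected yet.
pub fn hhi(counts: &[u32]) -> Option<f64> {
    let total: u64 = counts.iter().map(|&c| c as u64).sum();
    if total == 0 {
        return None;
    }
    let index = counts
        .iter()
        .map(|&c| {
            let share = c as f64 / total as f64;
            share * share
        })
        .sum();
    Some(index)
}

/// Pruning weak sources is allowed only while concentration stays at or
/// below `max_hhi`; with no data, the watchdog errs on the side of breadth.
pub fn pruning_allowed(counts: &[u32], max_hhi: f64) -> bool {
    match hhi(counts) {
        Some(h) => h <= max_hhi,
        None => false,
    }
}
```

For example, four equally used sources give an HHI of 0.25, while counts of 35, 3, 1, and 1 give roughly 0.77, which would suspend pruning under a threshold of 0.4.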

Do I need a cloud API?

No. ibrief runs fully local with Ollama. A cloud frontier model is optional and only used for periodic judge calibration through an existing subscription (claude -p).