The Experience Compiler: ground-up rework of distillation + graph

Status: DESIGN APPROVED FOR IMPLEMENTATION (operator demanded a qualitative leap, not patches). Replaces the reflect→playbook→skillify pipeline and the Impact/graph surface. Implementation runs from this document.

0. The paradigm error being corrected

The current pipeline distills "whatever a single project's recent log supports". That is distillation for distillation's sake: even with quality gates, it produces descriptions of things that happened once. The operator's definition of value is different:

A skill is something we REPEAT. Its value = how much manual time it removes from future work.

So the unit of distillation is not "a session log" — it is a recurring procedure across the whole history of work, and the output is not prose — it is a compiled, executable, feedback-tracked artifact.

1. The pipeline (replaces Reflector)

[Pipeline diagram not recoverable from the source. Later sections refer to its stages as ① through ⑤.]

Key mechanical guarantees (not prompt hopes):

nothing with <2 successful occurrences is ever synthesised — recurrence is computed by clustering, the LLM never decides it;
pitfall candidates keep today's gate but require either recurrence OR a failed trace with a root cause + working alternative;
every candidate carries saves_minutes_estimate = avg(trace duration) × recurrence — the ranking key.
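The recurrence floor and the ranking key above can be sketched in a few lines of Go. This is a minimal illustration, not the implementation: the `Trace` and `Cluster` types, their fields, and the function names are hypothetical, and it assumes that "recurrence" means the number of traces in a cluster and that the average is taken over those same traces.

```go
package main

import "sort"

// Trace is a hypothetical, simplified view of one extracted procedure trace.
type Trace struct {
	DurationMinutes float64
	Succeeded       bool
}

// Cluster is a hypothetical group of traces that clustering judged to be
// the same recurring procedure.
type Cluster struct {
	Name   string
	Traces []Trace
}

// minSuccessfulOccurrences is the mechanical floor from the design:
// nothing with fewer than 2 successful occurrences is synthesised.
const minSuccessfulOccurrences = 2

// successfulOccurrences counts the traces in the cluster that succeeded.
func successfulOccurrences(c Cluster) int {
	n := 0
	for _, t := range c.Traces {
		if t.Succeeded {
			n++
		}
	}
	return n
}

// savesMinutesEstimate computes avg(trace duration) × recurrence.
// Assumption: recurrence is the number of traces in the cluster.
func savesMinutesEstimate(c Cluster) float64 {
	if len(c.Traces) == 0 {
		return 0
	}
	total := 0.0
	for _, t := range c.Traces {
		total += t.DurationMinutes
	}
	avg := total / float64(len(c.Traces))
	return avg * float64(len(c.Traces))
}

// rankCandidates drops clusters below the recurrence floor and orders the
// rest by saves_minutes_estimate, largest first.
func rankCandidates(clusters []Cluster) []Cluster {
	var out []Cluster
	for _, c := range clusters {
		if successfulOccurrences(c) >= minSuccessfulOccurrences {
			out = append(out, c)
		}
	}
	sort.SliceStable(out, func(i, j int) bool {
		return savesMinutesEstimate(out[i]) > savesMinutesEstimate(out[j])
	})
	return out
}
```

The gate is plain counting over clustered traces, so no model output can promote a one-off event into a candidate. Under the stated assumption the estimate equals the total observed time of the cluster; a different definition of recurrence (for example, successful traces only) would change the ranking.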

2. Data model

[Data model listing not recoverable from the source.]

3. UI: the workbench becomes a decision queue

Distillation tab (full replacement of the two-column view):

[Distillation tab layout not recoverable from the source.]

Every candidate answers "why is this worth it?": recurrence, success ratio, time saved, evidence quotes inline-expandable.
Compile choice is the operator's; scripts are always generated with --dry-run and shown for review before "Compile and enable" (编译并启用).
Skill rows show effectiveness (loaded→succeeded ratio), not just use count; retire proposals appear here.
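The loaded→succeeded ratio and the retire proposals can be illustrated with a short Go sketch. Everything here is an assumption made for illustration: the `SkillOutcome` shape (one row per load of a skill), the function names, and the retirement rule (a minimum number of loads plus a minimum ratio). The source does not specify the actual criteria.

```go
package main

import "sort"

// SkillOutcome is a hypothetical row: one load of a skill and whether the
// work it was loaded for succeeded.
type SkillOutcome struct {
	SkillID   string
	Succeeded bool
}

// Effectiveness aggregates outcomes for one skill.
type Effectiveness struct {
	Loaded    int
	Succeeded int
}

// Ratio is the loaded→succeeded ratio; it is 0 for a skill never loaded.
func (e Effectiveness) Ratio() float64 {
	if e.Loaded == 0 {
		return 0
	}
	return float64(e.Succeeded) / float64(e.Loaded)
}

// tally folds raw outcomes into per-skill effectiveness.
func tally(outcomes []SkillOutcome) map[string]Effectiveness {
	stats := make(map[string]Effectiveness)
	for _, o := range outcomes {
		e := stats[o.SkillID]
		e.Loaded++
		if o.Succeeded {
			e.Succeeded++
		}
		stats[o.SkillID] = e
	}
	return stats
}

// retireProposals returns, in sorted order, the skills that have been loaded
// at least minLoads times and whose ratio is below minRatio.
// Both thresholds are hypothetical tuning knobs.
func retireProposals(stats map[string]Effectiveness, minLoads int, minRatio float64) []string {
	var ids []string
	for id, e := range stats {
		if e.Loaded >= minLoads && e.Ratio() < minRatio {
			ids = append(ids, id)
		}
	}
	sort.Strings(ids)
	return ids
}
```

Requiring a minimum number of loads before proposing retirement keeps a freshly compiled skill from being flagged after a single unlucky run.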

4. Graph → the compiler's x-ray (replaces Impact-as-list)

The graph earns its place by serving the pipeline, not by existing:

Recurrence map (primary view): clusters as rows — which procedures recur, across which projects, trend over time. This IS the graph (trace→cluster→project edges) rendered as a worklist.
Blast radius (kept from Impact view): entity → dependents. Wired into compilation: a skill that touches entity X lists X's dependents as pre-flight warnings in its compiled script.
The raw node browser stays deleted.
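The blast-radius wiring can be sketched as a graph walk that turns dependents into pre-flight warning lines. This is a hedged illustration: the adjacency-map representation, the function names, the warning text, and the choice to follow dependents transitively are all assumptions, not details confirmed by the design.

```go
package main

import "sort"

// blastRadius returns every entity reachable from `entity` by following
// entity → dependents edges, in sorted order. `dependents` maps an entity to
// its direct dependents (a hypothetical in-memory form of the graph).
// Assumption: dependents are followed transitively.
func blastRadius(entity string, dependents map[string][]string) []string {
	seen := map[string]bool{entity: true}
	queue := []string{entity}
	var out []string
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		for _, d := range dependents[cur] {
			if !seen[d] {
				seen[d] = true
				out = append(out, d)
				queue = append(queue, d)
			}
		}
	}
	sort.Strings(out)
	return out
}

// preflightWarnings builds the warning lines a compiled script could print
// before acting, one per (touched entity, dependent) pair.
func preflightWarnings(touched []string, dependents map[string][]string) []string {
	var warnings []string
	for _, e := range touched {
		for _, d := range blastRadius(e, dependents) {
			warnings = append(warnings, "WARNING: changing "+e+" may affect "+d)
		}
	}
	return warnings
}
```

For a skill that touches `db-schema`, with `db-schema → api, reports` and `api → web`, the sketch yields three warnings: one each for `api`, `reports`, and `web`.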

5. What gets deleted

Reflector's per-project playbook drafting (reflect.go synthesis path) — trace extraction replaces it. The structural gate (validateDraftPlaybook) survives as the floor for ③'s output.
'Automate:' title-prefix heuristic — recurrence clustering IS the automation detector now.
The skills/playbooks two-column workbench.

6. Implementation order (each step green + committed)

procedure_traces table + trace extractor (worker task trace, prompt + strict schema + outcome heuristics) hooked into session-end (journaler), embedding via memory embedder.
Clustering job in the consolidation engine (pgvector neighbour query, union-find; no LLM).
Candidate synthesis (worker task synthesize) for qualifying clusters; structural gate floor; saves_minutes computation.
Compilation: script generation with dry-run + review; custom-task emission; SKILL.md fallback. skill_outcomes + effectiveness.
Workbench UI (decision queue) + recurrence map + blast-radius pre-flight; retire/recompile proposals.
Backfill: run trace extraction over existing session_logs history so the queue is warm on day one.
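The clustering step in the list above (neighbour query, then union-find, no LLM) can be sketched as follows. This is a minimal sketch under stated assumptions: traces are addressed by index `0..n-1`, and the neighbour query has already returned pairs of traces whose embeddings fall within some distance threshold. The pair format and the function names are hypothetical.

```go
package main

// unionFind is a disjoint-set structure over trace indices.
type unionFind struct {
	parent []int
}

// newUnionFind starts with every trace in its own set.
func newUnionFind(n int) *unionFind {
	p := make([]int, n)
	for i := range p {
		p[i] = i
	}
	return &unionFind{parent: p}
}

// find returns the representative of x's set, halving paths as it goes.
func (u *unionFind) find(x int) int {
	for u.parent[x] != x {
		u.parent[x] = u.parent[u.parent[x]]
		x = u.parent[x]
	}
	return x
}

// union merges the sets containing a and b.
func (u *unionFind) union(a, b int) {
	ra, rb := u.find(a), u.find(b)
	if ra != rb {
		u.parent[rb] = ra
	}
}

// clusterTraces merges traces linked by neighbour pairs and returns the
// resulting clusters. Clusters are ordered by their smallest member, and
// members within a cluster are in ascending order.
func clusterTraces(n int, pairs [][2]int) [][]int {
	uf := newUnionFind(n)
	for _, p := range pairs {
		uf.union(p[0], p[1])
	}
	index := make(map[int]int) // set representative → position in clusters
	var clusters [][]int
	for i := 0; i < n; i++ {
		root := uf.find(i)
		pos, ok := index[root]
		if !ok {
			pos = len(clusters)
			index[root] = pos
			clusters = append(clusters, nil)
		}
		clusters[pos] = append(clusters[pos], i)
	}
	return clusters
}
```

Because membership is transitive, two traces can land in the same cluster without being direct neighbours. A tight distance threshold in the neighbour query is what keeps unrelated procedures from chaining together.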

7. Cost control

Stage ① uses the capture-grade cheap model, one call per session end; stages ② and ⑤ are SQL/embedding-only; stages ③ and ④ use strong models but fire only for clusters with proven recurrence. By construction, the expensive calls happen exactly where value is already demonstrated.