Skills that write the method down
Canonical skills for long coding, benchmark feedback, legal closure sweeps, evidence maps, validation gates, and bounded autonomy.
Open-source agent skills · Codex · Claude Code
Fairy Tale reads public Fable/Mythos-class reports like old fables: not to copy the magic, but to write down the repeatable wisdom as skills, validation gates, adapters, and sample results.
The project keeps the fairy-tale feeling, but the product promise is practical: separate melody from myth, preserve provenance, and turn useful agent behavior into repeatable workflow artifacts.
A small set of friendly but disciplined artifacts for making agent work less mysterious and more reproducible.
Canonical skills cover long coding, benchmark feedback, legal closure sweeps, evidence maps, validation gates, and bounded autonomy.
Codex and Claude Code plugin packages keep the workflow close to the tools where developers already work.
Residency checks, feedback governance, and benchmark ledgers make it harder to lose the process halfway through a long run.
Side-by-side outputs show how the same task behaves with and without Fairy Tale feedback across legal, finance, security, biology, spatial, and narrative tasks.
Fairy Tale treats public scores, local baselines, and local Fairy Tale measurements as different kinds of evidence.
Bars show local, reproducible measurements; their 95% Wilson confidence intervals (CIs) are noted in the README. Fable/Mythos values are taken from published images; HLE has no comparable public Fable row.
| Domain | Benchmark | Fable / Mythos | GPT-5.5 | + Fairy Tale | Delta | CI / note |
|---|---|---|---|---|---|---|
| Agentic coding | SWE-Bench Pro, n=20 | 80.3% | 58.6% | 55.0% | -3.6 pp | Wilson 34.2–74.2% |
| Biology | BioMysteryBench-preview, n=5 | 46.1 / 83.9% | 60.0% | 80.0% | +20.0 pp | Wilson 37.6–96.4% |
| Cybersecurity | ExploitBench v8 ladder, n=6 | 78.0% Cap% | 34.0% Cap% | 1.33 avg · 4/6 + | reference only | Ladder score, defensive |
| Legal | Harvey LAB-compatible, n=100 | 13.3% | 2.1% | 11.0% | +8.9 pp | Wilson 6.25–18.63%, p=8.9e-6 |
Same model, effort, judge, and task IDs. Only the feedback skill changed.
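The Wilson intervals in the table can be checked with a few lines of Python. This is a minimal sketch, not the project's own scoring code: the function name `wilson_interval` is hypothetical, and the success counts (11 of 20, 4 of 5, 11 of 100) are assumptions inferred from the reported percentages and sample sizes.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.959964) -> tuple[float, float]:
    """Return the (low, high) Wilson score interval for a binomial proportion.

    z defaults to the two-sided 95% normal quantile.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must be between 0 and n")
    p = successes / n
    z2 = z * z
    denom = 1 + z2 / n
    # The interval is centered on a value pulled from p toward 0.5.
    center = (p + z2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z2 / (4 * n * n)) / denom
    return center - half_width, center + half_width


if __name__ == "__main__":
    # Counts below are assumed from the table's percentages, not taken from raw logs.
    rows = [
        ("SWE-Bench Pro", 11, 20),
        ("BioMysteryBench-preview", 4, 5),
        ("Harvey LAB-compatible", 11, 100),
    ]
    for name, k, n in rows:
        low, high = wilson_interval(k, n)
        print(f"{name}: {k}/{n} = {k / n:.1%}, 95% Wilson CI {low:.2%} to {high:.2%}")
```

With those assumed counts, the intervals should land close to the CI column. The small-sample rows (n=5, n=20) are wide enough that their deltas are better read as directional than as settled.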
Each sample is a concrete comparison output, not a slogan. The samples are useful both as product proof and as design material for future workflow improvements.
Install the plugin package when your agent supports it, or install only the canonical skills into a compatible skills directory.
Claude Code plugin:
/plugin marketplace add bonginkan/fairy_tale
/plugin install fairy-tale@fairy-tale-marketplace
Codex plugin:
codex plugin marketplace add bonginkan/fairy_tale
Codex, canonical skills only:
mkdir -p "$HOME/.codex/skills"
curl -fsSL https://raw.githubusercontent.com/bonginkan/fairy_tale/main/install.sh | sh -s -- --agent codex
Fairy Tale uses public official information and public user reports as workflow evidence. It does not attempt to access restricted models, bypass safeguards, or turn security work into weaponization.