Open-source agent skills · Codex · Claude Code

A friendly spellbook for evidence-driven agents.

Fairy Tale reads public Fable/Mythos-class reports like old fables: not to copy the magic, but to write down the repeatable wisdom as skills, validation gates, adapters, and sample results.

License: Apache-2.0
Package: Codex + Claude Code plugins
Rule: Public reports only
[Image: an open fable book with a feather pen, key, lantern, and paper birds rising from the page]
Stories become checks. Checks become workflows.

Not a model. Not a bypass. A readable method.

The project keeps the fairy-tale feeling, but the product promise is practical: separate melody from myth, preserve provenance, and turn useful agent behavior into repeatable workflow artifacts.

The Spellbook

What Fairy Tale gives you

A small set of friendly but disciplined artifacts for making agent work less mysterious and more reproducible.

quill

Skills that write the method down

Canonical skills for long-running coding tasks, benchmark feedback, legal closure sweeps, evidence maps, validation gates, and bounded autonomy.

key

Plugins that open the right door

Codex and Claude Code plugin packages keep the workflow close to the tools where developers already work.

lantern

Checks that keep the light on

Residency checks, feedback governance, and benchmark ledgers make it harder to lose the process halfway through a long run.

mirror

Samples that show the reflection

Side-by-side outputs show how the same task behaves with and without Fairy Tale feedback across legal, finance, security, biology, spatial, and narrative tasks.

Measured, Not Mythologized

Benchmark signals stay in separate jars

Fairy Tale treats public scores, local baselines, and local Fairy Tale measurements as different kinds of evidence.
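One way to picture the "separate jars" rule is a small ledger type that refuses to mix evidence kinds. The sketch below is illustrative only: `EvidenceKind`, `Score`, and `local_delta_pp` are hypothetical names, not part of the Fairy Tale packages. It assumes a delta is meaningful only between a local baseline and a local Fairy Tale run on the same benchmark.

```python
from dataclasses import dataclass
from enum import Enum


class EvidenceKind(Enum):
    """Hypothetical labels for the three kinds of evidence named above."""
    PUBLIC_REPORTED = "public_reported"    # scores taken from public reports
    LOCAL_BASELINE = "local_baseline"      # local run without Fairy Tale
    LOCAL_FAIRY_TALE = "local_fairy_tale"  # local run with Fairy Tale


@dataclass(frozen=True)
class Score:
    benchmark: str
    kind: EvidenceKind
    pass_rate_pct: float


def local_delta_pp(baseline: Score, treated: Score) -> float:
    """Return the percentage-point delta, but only for local-vs-local pairs."""
    if baseline.kind is not EvidenceKind.LOCAL_BASELINE:
        raise ValueError("baseline must be a local baseline measurement")
    if treated.kind is not EvidenceKind.LOCAL_FAIRY_TALE:
        raise ValueError("treated score must be a local Fairy Tale measurement")
    if baseline.benchmark != treated.benchmark:
        raise ValueError("scores must come from the same benchmark")
    return round(treated.pass_rate_pct - baseline.pass_rate_pct, 1)


# Values taken from the Legal row of the pass-rate table.
legal_base = Score("Harvey LAB-compatible", EvidenceKind.LOCAL_BASELINE, 2.1)
legal_ft = Score("Harvey LAB-compatible", EvidenceKind.LOCAL_FAIRY_TALE, 11.0)
print(local_delta_pp(legal_base, legal_ft))
```

Passing a `PUBLIC_REPORTED` score to `local_delta_pp` raises a `ValueError`, so a public number stays a reference point and never becomes one side of a delta.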

Pass rate by domain

Bars are local reproducible measurements with 95% Wilson CIs noted in the README. Fable/Mythos values are image-reported; HLE has no comparable public Fable row.

| Domain | Benchmark | Fable / Mythos | GPT-5.5 | + Fairy Tale | Delta | CI / note |
|---|---|---|---|---|---|---|
| Agentic coding | SWE-Bench Pro, n=20 | 80.3% | 58.6% | 55.0% | -3.6 pp | Wilson 34.2–74.2% |
| Biology | BioMysteryBench-preview, n=5 | 46.1 / 83.9% | 60.0% | 80.0% | +20.0 pp | Wilson 37.6–96.4% |
| Cybersecurity | ExploitBench v8 ladder, n=6 | 78.0% Cap% | 34.0% Cap% | 1.33 avg · 4/6 + | reference only | Ladder score, defensive |
| Legal | Harvey LAB-compatible, n=100 | 13.3% | 2.1% | 11.0% | +8.9 pp | Wilson 6.25–18.63%, p=8.9e-6 |
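The Wilson intervals quoted for the Fairy Tale column can be checked with the standard Wilson score formula. The sketch below is a minimal illustration, not the project's own ledger code. The success counts (11/20, 4/5, 11/100) are an assumption, inferred from the reported pass rates and sample sizes.

```python
import math
from typing import Tuple

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal


def wilson_interval(successes: int, n: int, z: float = Z_95) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion, as fractions in [0, 1]."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must be between 0 and n")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Success counts are inferred from the table (assumption), not stated in it.
rows = [
    ("SWE-Bench Pro", 11, 20),
    ("BioMysteryBench-preview", 4, 5),
    ("Harvey LAB-compatible", 11, 100),
]
for name, k, n in rows:
    low, high = wilson_interval(k, n)
    print(f"{name}: {k}/{n} = {k / n:.1%}, 95% Wilson {low:.2%} to {high:.2%}")
```

Under those inferred counts, the function reproduces the table's intervals to the stated precision. The width of the n=5 interval, roughly 38% to 96%, shows why the biology delta should be read as a weak signal.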
Legal Feedback Retry — n=15 prior misses

Same model, effort, judge, and task IDs. Only the feedback skill changed.

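| Metric | Without feedback | With feedback | Delta |
|---|---|---|---|
| All-pass rate | 0.0% | 20.0% | +20.0 pp |
| Criterion pass rate | 83.21% | 90.61% | +7.40 pp |
| One-miss failures | 10 | 5 | -5 |
| Large collapses < 70% | 5 | 4 | -1 |
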
Quick Start

Choose your doorway

Install the plugin package when your agent supports it, or install only the canonical skills into a compatible skills directory.

/plugin marketplace add bonginkan/fairy_tale
/plugin install fairy-tale@fairy-tale-marketplace
Safety Boundaries

A lantern, not a lockpick

Fairy Tale uses public official information and public user reports as workflow evidence. It does not attempt to access restricted models, bypass safeguards, or turn security work into weaponization.

  • Preserve provenance for research claims.
  • Keep defensive security work authorized and non-weaponized.
  • Set budgets before broad or long-running agent work.
  • Validate before claiming workflow improvement.