Open-source agent skills · Codex · Claude Code

A friendly spellbook for evidence-driven agents.

Fairy Tale reads public Fable/Mythos-class reports like old fables: not to copy the magic, but to write down the repeatable wisdom as skills, validation gates, adapters, and sample results.


License: Apache-2.0
Package: Codex + Claude Code plugins
Rule: Public reports only

(Illustration: an open fable book with a feather pen, key, lantern, and paper birds rising from the page.)

Stories become checks. Checks become workflows.

Not a model. Not a bypass. A readable method.

The project keeps the fairy-tale feeling, but the product promise is practical: separate melody from myth, preserve provenance, and turn useful agent behavior into repeatable workflow artifacts.

The Spellbook

What Fairy Tale gives you

A small set of friendly but disciplined artifacts for making agent work less mysterious and more reproducible.


Skills that write the method down

Canonical skills for long-running coding, benchmark feedback, legal closure sweeps, evidence maps, validation gates, and bounded autonomy.


Plugins that open the right door

Codex and Claude Code plugin packages keep the workflow close to the tools where developers already work.


Checks that keep the light on

Residency checks, feedback governance, and benchmark ledgers make it harder to lose the process halfway through a long run.


Samples that show the reflection

Side-by-side outputs show how the same task turns out with and without Fairy Tale feedback, across legal, finance, security, biology, spatial, and narrative domains.

Measured, Not Mythologized

Benchmark signals stay in separate jars

Fairy Tale treats public scores, local baselines, and local Fairy Tale measurements as different kinds of evidence.
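The "separate jars" rule can be pictured as a provenance tag on every score. The hypothetical Python sketch below (the `Source`, `Score`, and `delta_pp` names are illustrative, not Fairy Tale's actual schema) refuses to compute a delta unless both numbers are local measurements on the same benchmark. This mirrors how the Delta column in the benchmark table compares GPT-5.5 with + Fairy Tale rather than with the public Fable/Mythos values.

```python
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    # Hypothetical provenance labels; one "jar" per kind of evidence.
    PUBLIC_REPORT = "public report"        # e.g. image-reported Fable/Mythos values
    LOCAL_BASELINE = "local baseline"      # e.g. GPT-5.5 measured locally
    LOCAL_FAIRY_TALE = "local fairy tale"  # same local setup plus Fairy Tale skills


@dataclass(frozen=True)
class Score:
    benchmark: str
    pass_rate: float  # percent
    source: Source


def delta_pp(baseline: Score, treated: Score) -> float:
    """Return a percentage-point delta only for a local baseline vs. a local Fairy Tale run."""
    if baseline.benchmark != treated.benchmark:
        raise ValueError("scores come from different benchmarks")
    if baseline.source is not Source.LOCAL_BASELINE or treated.source is not Source.LOCAL_FAIRY_TALE:
        raise ValueError("delta requires a local baseline and a local Fairy Tale run")
    return treated.pass_rate - baseline.pass_rate


legal_baseline = Score("Harvey LAB-compatible", 2.1, Source.LOCAL_BASELINE)
legal_fairy_tale = Score("Harvey LAB-compatible", 11.0, Source.LOCAL_FAIRY_TALE)
print(f"{delta_pp(legal_baseline, legal_fairy_tale):+.1f} pp")  # prints "+8.9 pp"
```

Passing a `PUBLIC_REPORT` score to `delta_pp` raises instead of silently mixing jars, which is the point of the design.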

Pass rate by domain

Domain	Benchmark	Fable / Mythos	GPT-5.5	+ Fairy Tale
Legal	Harvey LAB-compatible, n=100	13.3%	2.1%	11.0%
Biology	BioMysteryBench-preview, n=5	83.9%	60.0%	80.0%
Agentic coding	SWE-Bench Pro, n=20	80.3%	58.6%	55.0%
HLE	Random sample, n=100	n/a	35.0%	51.0%

The GPT-5.5 and + Fairy Tale values are local, reproducible measurements; their 95% Wilson confidence intervals are listed in the README. Fable/Mythos values are taken from published report images, and HLE has no comparable public Fable/Mythos value.

Domain	Benchmark	Fable / Mythos	GPT-5.5	+ Fairy Tale	Delta vs GPT-5.5	95% Wilson CI (+ Fairy Tale) / note
Agentic coding	SWE-Bench Pro, n=20	80.3%	58.6%	55.0%	-3.6 pp	Wilson 34.2–74.2%
Biology	BioMysteryBench-preview, n=5	46.1 / 83.9%	60.0%	80.0%	+20.0 pp	Wilson 37.6–96.4%
Cybersecurity	ExploitBench v8 ladder, n=6	78.0% Cap%	34.0% Cap%	1.33 avg · 4/6 +	reference only	Ladder score, defensive
Legal	Harvey LAB-compatible, n=100	13.3%	2.1%	11.0%	+8.9 pp	Wilson 6.25–18.63%, p=8.9e-6
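The Wilson intervals in the table can be checked by hand. A minimal Python sketch, assuming each + Fairy Tale rate is k passes out of n tasks (11/20, 4/5, and 11/100, inferred from the listed rates and sample sizes) and z = 1.96 for 95% coverage; `wilson_interval` is a hypothetical helper, not part of the Fairy Tale package.

```python
import math


def wilson_interval(passes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial pass rate (z=1.96 gives ~95% coverage)."""
    p = passes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# Pass counts inferred from the table's rates and sample sizes (an assumption).
rows = {
    "SWE-Bench Pro": (11, 20),
    "BioMysteryBench-preview": (4, 5),
    "Harvey LAB-compatible": (11, 100),
}

for name, (k, n) in rows.items():
    lo, hi = wilson_interval(k, n)
    print(f"{name}: {k}/{n} = {k / n:.1%}, Wilson {lo:.2%} to {hi:.2%}")
```

The Wilson interval is used here instead of the simple normal approximation because it stays inside [0, 1] and behaves better at small n, which matters for the n=5 biology run.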

Legal Feedback Retry: n=15 prior misses

Same model, effort, judge, and task IDs. Only the feedback skill changed.

Metric	Baseline	With feedback skill	Change
All-pass rate	0.0%	20.0%	+20.0 pp
Criterion pass rate	83.21%	90.61%	+7.40 pp
One-miss failures	10	5	-5
Large collapses (< 70%)	5	4	-1
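One way to read the four retry metrics, sketched in Python under stated assumptions: each task is graded on a list of pass/fail rubric criteria; "all-pass" means every criterion passed; the criterion pass rate is pooled over all criteria; a "one-miss failure" is a task that missed exactly one criterion; and a "large collapse" is a task whose own criterion pass rate falls below 70%. These definitions, the `summarize` helper, and the toy data are hypothetical and are not taken from the Fairy Tale ledger.

```python
from dataclasses import dataclass


@dataclass
class RetryMetrics:
    all_pass_rate: float        # share of tasks with every criterion passed
    criterion_pass_rate: float  # passed criteria / all criteria (pooled; an assumption)
    one_miss_failures: int      # tasks that missed exactly one criterion
    large_collapses: int        # tasks whose own criterion pass rate is below the threshold


def summarize(tasks: list[list[bool]], collapse_below: float = 0.70) -> RetryMetrics:
    """Summarize per-task rubric verdicts; each inner list holds one task's criteria."""
    total = sum(len(t) for t in tasks)
    passed = sum(sum(t) for t in tasks)
    misses = [len(t) - sum(t) for t in tasks]
    return RetryMetrics(
        all_pass_rate=sum(m == 0 for m in misses) / len(tasks),
        criterion_pass_rate=passed / total,
        one_miss_failures=sum(m == 1 for m in misses),
        large_collapses=sum(sum(t) / len(t) < collapse_below for t in tasks),
    )


# Hypothetical toy data: the same three tasks (four criteria each) before and after feedback.
baseline = [[True, True, True, False], [True, False, False, False], [True, True, False, True]]
with_feedback = [[True, True, True, True], [True, True, False, False], [True, True, True, False]]

before, after = summarize(baseline), summarize(with_feedback)
print(f"All-pass rate {before.all_pass_rate:.1%} -> {after.all_pass_rate:.1%}")
print(f"Criterion pass rate {before.criterion_pass_rate:.2%} -> {after.criterion_pass_rate:.2%}")
print(f"One-miss failures {before.one_miss_failures} -> {after.one_miss_failures}")
print(f"Large collapses {before.large_collapses} -> {after.large_collapses}")
```

Because all-pass is strict, a retry can raise the criterion pass rate by only a few points while moving several near-miss tasks across the all-pass line, which is the pattern the retry table reports.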

Sample Outputs

Open the fable, inspect the lesson

Each sample is a concrete comparison output, not a slogan. They are useful as product proof and as design material for future workflow improvements.

01 Advanced legal comparison: redline matrices, diligence questions, hard stops
02 Finance document comparison: board memo corrections and follow-up analysis
03 Bio / health AI safety: evidence gates and safety-aware classification
04 Spatial / 3D comparison: visual reconstruction and validation discipline
05 Agentic coding security: patch slices, tests, and rollout cautions
06 Cybersecurity comparison: defensive findings, safe evidence, detection coverage
07 Narrative expression: constraint adherence, motif discipline, resonance

Quick Start

Choose your doorway

Install the plugin package when your agent supports it, or install only the canonical skills into a compatible skills directory.

Claude Code plugin:
/plugin marketplace add bonginkan/fairy_tale
/plugin install fairy-tale@fairy-tale-marketplace

Codex plugin:
codex plugin marketplace add bonginkan/fairy_tale

Skills only (Codex skills directory):
mkdir -p "$HOME/.codex/skills"
curl -fsSL https://raw.githubusercontent.com/bonginkan/fairy_tale/main/install.sh | sh -s -- --agent codex

Safety Boundaries

A lantern, not a lockpick

Fairy Tale uses public official information and public user reports as workflow evidence. It does not attempt to access restricted models, bypass safeguards, or turn security work into weaponization.

Preserve provenance for research claims.
Keep defensive security work authorized and non-weaponized.
Set budgets before broad or long-running agent work.
Validate before claiming workflow improvement.