What Makes a Good AI Skill? Procedure Over Template
By sundae_bar
Skills are the layer that turns a capable model into a reliable worker. The gap between a skill that ships and a skill that breaks is not about length, polish, or clever phrasing.
It comes down to whether the skill teaches procedure or encodes templates.
The Distinction That Matters
A template tells the agent what to say. A procedure tells the agent how to decide what to say.
This sounds like a small difference. It is not. A template-based skill holds up only as long as the task looks like the example it was built around. The moment the details change, the template breaks. A procedural skill holds up across variants because it teaches the agent how to recognise the situation first, then shape the response to fit.
Skills do not guarantee execution. The model still decides whether to follow the instructions, which is why they are structured guidance, not deterministic automation.
That is exactly why procedural design matters. A good skill increases the probability that the model makes the right call, even when the surface details rotate.
What Procedural Skills Actually Teach
A well-designed skill teaches four things. How to recognise the situation. What steps to take. What the output should look like when it is done well. What failure modes to avoid.
Each of these is generalised. None of them is specific to a single scenario.
Take customer email drafting as an example. A template-driven skill might store five pre-written email templates for common complaint types. The moment a customer writes something that does not match one of the five, the skill produces either a forced-fit response or nothing useful.
A procedural skill handles the same task differently. It teaches the agent to identify the situation type first. Is this a relationship repair? A reasonable request that can be granted? A request that has to be declined? A customer misunderstanding? Each situation has a different shape. The skill teaches the agent to acknowledge the specific quantified impact, take clear action, set expectations, and avoid defensive language, regardless of which exact scenario the customer raised.
The procedure generalises. The template does not.
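The recognise-then-respond flow above can be sketched as a small dispatcher. This is a minimal illustration, not a real skill runtime: the situation labels, keyword stubs, and response rules are all hypothetical stand-ins for what an agent would actually reason through.

```python
# Minimal sketch of a procedural skill: recognise the situation first,
# then shape the response from general rules. All labels and keyword
# triggers here are hypothetical illustrations, not a real skill format.

def recognise(message: str) -> str:
    """Classify the situation type before drafting anything.
    A real agent would reason over the message; this stub keys on words."""
    text = message.lower()
    if "refund" in text or "cancel" in text:
        return "grantable_request"
    if "angry" in text or "third time" in text:
        return "relationship_repair"
    if "thought" in text or "assumed" in text:
        return "misunderstanding"
    return "declined_request"

def draft_response(message: str) -> list[str]:
    """Apply the same procedure to every situation type: acknowledge the
    specific impact, take clear action, set expectations, and avoid
    defensive language."""
    situation = recognise(message)
    steps = [f"Acknowledge the specific impact described in: {message!r}"]
    if situation == "relationship_repair":
        steps.append("Own the failure plainly; no policy jargon.")
    elif situation == "grantable_request":
        steps.append("Grant the request and state exactly what happens next.")
    elif situation == "misunderstanding":
        steps.append("Correct the misunderstanding without blame-shifting.")
    else:
        steps.append("Decline clearly and offer the nearest alternative.")
    steps.append("Set a concrete expectation: who does what, by when.")
    return steps
```

The point of the sketch is the shape, not the keywords: recognition happens once, up front, and the same response procedure then applies to every branch.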
Why Templates Fail in Production
Enterprise deployments have consistently surfaced the limits of template-based thinking. Research on AI agent failure patterns repeatedly shows that agents which perform well in demos collapse once they hit the variance of real business inputs.
The reason is structural. Real tasks do not arrive in clean categories. Customer complaints contain multiple issues at once. Sales enquiries shift mid-conversation. Internal requests use terminology that does not match any documentation.
A template-based skill tries to match each variant to a stored response. A procedural skill tells the agent how to handle the class of situation, whatever the specific details.
The Anatomy of a Good Skill
Good skills tend to share structural features, regardless of the domain they cover.
They name the failure modes explicitly. Strong skills warn the agent away from specific traps. Over-apologising. Hiding behind policy jargon. Blame-shifting. Asking for a survey mid-resolution. Naming what not to do is often more useful than describing what to do.
They scaffold without dictating. A good skill tells the agent what sections or decisions the output needs, without prescribing exact wording. This leaves room for the model to adapt to context while ensuring structural consistency.
They teach recognition before action. The first step in most strong skills is not "do this." It is "identify what kind of situation this is." The response shape follows from the recognition, not from a script.
They use progressive disclosure. Well-designed skills keep metadata lightweight so it always sits in context, with detailed instructions loaded only when the task triggers the skill. This keeps the context window clean and the procedural knowledge available on demand.
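Progressive disclosure can be sketched as a two-stage loader: only the lightweight metadata stays resident in context, and the full instructions are read when the task triggers the skill. The file layout and field names below are illustrative assumptions, not a fixed format.

```python
# Sketch of progressive disclosure: keep only skill metadata in context,
# load the detailed instructions on demand. The header layout here
# (key: value lines above a "---" divider) is an assumed format.

SKILL_FILE = """\
name: customer-email-drafting
description: Drafts replies to customer emails by situation type.
---
1. Identify the situation type before writing anything.
2. Acknowledge the specific, quantified impact.
3. Take clear action and set expectations.
4. Avoid defensive language and policy jargon.
"""

def load_metadata(raw: str) -> dict[str, str]:
    """Parse only the lightweight header; this is all that sits in context."""
    header = raw.split("---", 1)[0]
    meta = {}
    for line in header.strip().splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta

def load_instructions(raw: str) -> str:
    """Read the full procedure only when the task triggers the skill."""
    return raw.split("---", 1)[1].strip()
```

The metadata is cheap enough to keep loaded for every skill in the library; the body is paid for only when it is actually needed.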
The Test of a Good Skill
The strongest test of a skill is simple. Would the output hold up if you treated the agent like a capable new team member and asked it to ship real work?
A skill that captures how experienced operators approach the task passes this test. A skill built around templates fails it the moment the details change.
This is also why skills written to memorise specific scenarios are a dead end. They rate well on the exact cases they were built for and collapse on variants. Integrity checks on skill submissions increasingly look for scenario-specific encoding as a disqualifying signal, because a skill that memorises test cases produces misleading quality ratings.
The useful question is not "Did the skill produce the right answer on this example?" It is "Would the skill produce the right answer on a variant the author never saw?"
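That question can be turned into a simple evaluation loop: score the skill separately on the cases it was written against and on held-out variants the author never saw. The check function and the case format below are hypothetical placeholders, not a real integrity-check pipeline.

```python
# Sketch of variant-based evaluation: a skill should be judged on
# held-out variants, not just the authored examples. The quality check
# and case format below are hypothetical placeholders.

def passes(output: str, must_include: list[str]) -> bool:
    """Crude quality check: does the output hit the required elements?"""
    return all(phrase in output.lower() for phrase in must_include)

def evaluate(skill, authored: list[dict], held_out: list[dict]) -> dict[str, float]:
    """Score the skill on authored cases and on unseen variants.
    A large gap between the two scores is the memorisation signal."""
    def score(cases):
        hits = sum(passes(skill(c["input"]), c["must_include"]) for c in cases)
        return hits / len(cases)
    return {"authored": score(authored), "held_out": score(held_out)}
```

A skill that scores well on `authored` but poorly on `held_out` has encoded scenarios rather than procedure, which is exactly the disqualifying signal described above.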
Where Skills Come From
Good skills are rarely written in one sitting. They tend to emerge from iteration, usually starting with someone capable performing the task manually, then encoding the decisions they make along the way.
Practical experience from teams building skills in production suggests most useful skills go through three to five revisions before they produce consistent output. Each revision narrows the gap between what the author meant and what the agent actually does.
That is procedural work. The author learns what the agent gets wrong, then adds explicit guidance to prevent it. Over time, the skill stops being a draft and starts behaving like institutional knowledge the agent can reliably call on.
Skills as Institutional Knowledge
The commercial implication is significant. A procedural skill captures how an experienced operator approaches a task in a form the agent can execute.
That is organisational IP. Skills turn know-how into something portable, version-controlled, and shareable. The best skills carry the judgement of the best operators, applied consistently across every variant of the task.
This is the layer that sundae_bar is building at scale. The generalist agent being trained on SN121 pulls from a growing library of skills, each designed to teach procedure rather than memorise scenarios. Every challenge on the subnet adds another capability to the agent's job description, and each skill has to pass the same test: does it hold up when the details change?
The Takeaway
A skill that memorises specific scenarios will not survive contact with real business work.
A skill that teaches procedure, names failure modes, scaffolds without dictating, and captures how good operators actually think will.
The difference is what separates a demo from a digital worker.