Writing Effective Skills: Best Practices for Agent Onboarding

How to write skills that trigger reliably, load efficiently, and stay maintainable

Skills are one of the cleanest ways to turn a general-purpose agent into a specialist. Instead of repeating instructions in every conversation, you package a reusable workflow into a folder with a SKILL.md file and optional scripts, references, and assets.

That simple packaging model creates a temptation: treat a skill like a miniature wiki. In practice, that usually makes the skill worse. The best skills are not documentation dumps. They are compact onboarding guides for an agent: enough information to route the task correctly, enough instruction to execute well, and enough structure to avoid wasting context.

In a survey of the Agent Skills specification, Hugging Face's docs, and public skills repositories, the pattern is remarkably consistent. Good skills are narrow, procedural, and aggressively organized for progressive disclosure[1, 2, 3].

This post summarizes the design rules that matter most, with concrete examples and the runtime behavior that makes those rules worth following.

The Right Mental Model #

A skill is not "extra prompt text." It is a three-layer system:

  1. Metadata for routing: the name and description tell the agent when the skill is relevant.
  2. Instructions for execution: the body of SKILL.md contains the workflow and guardrails.
  3. Resources for depth: scripts, references, and assets are loaded only when needed.

This staged loading model is the entire point. In many skill systems, agents pay a small routing cost for all installed skills, then load full instructions on activation, then pull supporting files only when needed[1]. The exact loading behavior still varies by runtime and mode, so always verify platform-specific semantics before optimizing for token budget.

Once you adopt that mental model, most best practices become obvious. If metadata does routing, the description must be precise. If SKILL.md is loaded whole, it must stay lean. If scripts are available, repeated deterministic logic should move out of prose and into code.

How Skills Actually Reach the Model #

Best practices make more sense once you look at the actual loading path.

Across the Agent Skills ecosystem, the core pattern is consistent. A skill exposes a small always-visible routing surface through name and description. If the runtime decides the skill is relevant, it loads the SKILL.md body. Then, if needed, it pulls scripts/, references/, and assets/ during execution.

That loading path is the reason the standard advice works:

  1. The router only sees a thin slice of the skill. If name and description are vague, the skill either never triggers or triggers at the wrong times.
  2. Activation is expensive. If the whole SKILL.md body is injected after activation, every unnecessary paragraph becomes recurring context overhead.
  3. Deferred resources are a feature, not just a folder convention. Large examples, schemas, and API details belong in references because they should stay out of context until the workflow actually needs them.
  4. Deterministic work should live outside the prompt. If a repeatable transform can be done by a script, moving it out of prose reduces both ambiguity and token cost.
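The staged loading path above can be sketched in a few lines of Python. This is an illustrative model only, not part of any runtime; names like `index_skills` and `activate` are invented here for clarity:

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional

@dataclass
class Skill:
    name: str
    description: str
    root: Path
    body: Optional[str] = None  # loaded only on activation

def parse_frontmatter(text: str) -> tuple:
    """Split a SKILL.md into frontmatter key/value pairs and body."""
    _, fm, body = text.split("---", 2)
    meta = {}
    for line in fm.strip().splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta, body.strip()

def index_skills(skills_dir: Path) -> list:
    """Routing pass: only name and description enter context."""
    skills = []
    for skill_md in sorted(skills_dir.glob("*/SKILL.md")):
        meta, _ = parse_frontmatter(skill_md.read_text())
        skills.append(Skill(meta["name"], meta["description"], skill_md.parent))
    return skills

def activate(skill: Skill) -> str:
    """Activation pass: the full body is loaded only now."""
    _, skill.body = parse_frontmatter((skill.root / "SKILL.md").read_text())
    return skill.body
```

Note where each layer pays its cost: every installed skill contributes its description to the routing pass, but a body is read only inside `activate`, and `scripts/` or `references/` files never appear here at all.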

This is why skill-writing feels different from normal documentation. You are not writing reference material for a patient human reader. You are designing a staged context package: route with metadata, execute with compact instructions, and defer everything else until needed.

It also helps explain what a skill is not. Persistent instruction files like AGENTS.md, CLAUDE.md, and GEMINI.md are closer to always-loaded project context than to on-demand skills[4, 5, 6, 7]. Those files should optimize for broad project guidance; skills should optimize for narrow routing and task execution.

Best Practice 1: Make the Skill Easy to Route #

The official guidance is clear: effective skills solve a specific, repeatable task and focus on one workflow rather than trying to do everything[1]. The same pattern appears across public skills repositories[3].

That principle matters for two reasons.

First, narrow skills trigger more reliably. If a skill claims to handle "data analysis, dashboarding, ETL, schema design, and notebooks," its description becomes vague, and the agent has a harder time knowing when it applies.

Second, narrow skills age better. A skill for "building Gradio demos" can evolve independently from a skill for "deploying Hugging Face Jobs." When one workflow changes, you update one skill instead of a giant mixed abstraction.

Bad scope

---
name: ml-workflows
description: Helps with machine learning tasks.
---

This is too broad to route well and too vague to maintain.

Better scope

---
name: hf-jobs
description: Run Python or Docker workloads on Hugging Face Jobs. Use when the user wants cloud CPUs/GPUs without local setup, including batch inference, experiments, or scheduled jobs.
---

The second version names the platform, the action, and the trigger conditions.

The description field is not a tagline. It is the dispatch rule. Both the spec and production docs emphasize that the description should explain what the skill does and when to use it.

The easiest way to write a good description is to force it into two parts:

  • Capability: what the skill helps the agent do
  • Trigger cues: what user requests should cause it to fire

For example:

description: Extract text and tables from PDF files, fill forms, and merge documents. Use when working with PDFs, forms, or document extraction.

As a practical rule, descriptions should be literal, not clever. Avoid product copy. Include the nouns and verbs a user will actually say. If the user says "merge PDFs" or "fill this form," the description should contain those phrases or close equivalents.

Also account for platform constraints. In constrained environments, concise and explicit beats comprehensive. Claude Code, for example, truncates skill descriptions in its listing to reduce context usage, which makes front-loaded wording even more important[8]. Description text is routing-critical, so optimize for signal density over prose style.

This is also the part you should test most aggressively. A skill can have a perfect workflow and still fail in practice because it never triggers. Official guides recommend testing not just the workflow, but whether the description actually causes invocation[1].
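One crude offline proxy for that test, and it is only a proxy since real routing is model-driven, is keyword overlap between sample requests and the description. The function below is a hypothetical smoke test, not a routing algorithm:

```python
def overlap_score(request: str, description: str) -> float:
    """Fraction of request words that appear in the description.
    A rough check that the description is literal, not clever - nothing more."""
    req = set(request.lower().split())
    desc = set(description.lower().replace(",", " ").replace(".", " ").split())
    return len(req & desc) / len(req)

description = ("Extract text and tables from PDF files, fill forms, and merge "
               "documents. Use when working with PDFs, forms, or document extraction.")

# Requests the skill should handle ought to score well above ones it should not.
print(overlap_score("merge pdf documents", description))
print(overlap_score("train a neural network", description))
```

If a request you expect to trigger the skill scores near zero here, the description is probably missing the user's actual vocabulary; the real test is still live prompts against the agent.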

Best Practice 2: Keep Activated Context Small #

Multiple ecosystems recommend keeping SKILL.md lean and moving detailed material into separate files[1]. That guidance is easy to ignore, but it is one of the highest-leverage rules.

Why? Because once a skill is activated, the agent typically loads the whole SKILL.md. Every extra paragraph becomes permanent cognitive and token overhead for the duration of that task.

The best SKILL.md files are not encyclopedias. They contain just enough: the default workflow, key directives or guardrails, and references to scripts and supporting documents. The important nuance is that routing should live in frontmatter, not in a "when to use" section in the body. The body is for execution once the skill is already active.

This is enough:

# Dataset SQL

## Workflow
1. Inspect schema first with `scripts/sql_manager.py describe`.
2. Run small sample queries before writing transformations.
3. Push derived datasets only after validating row counts and column names.

## Key directives
- Validate row counts before export.
- Do not overwrite the source dataset.

## References
- SQL syntax examples: [references/sql_patterns.md](references/sql_patterns.md)
- Export rules: [references/export.md](references/export.md)

What usually bloats SKILL.md unnecessarily? Full API references, long installation guides, exhaustive edge-case catalogs, multiple unrelated workflows, and duplicated examples that already exist in scripts or references.

The skills architecture is explicitly built around progressive disclosure: metadata is loaded for routing, instructions are typically loaded on activation, and supporting resources are loaded as needed. So once a skill is active, every extra paragraph competes with the actual task for attention.

Move deterministic work to scripts. If the same task requires the agent to repeatedly rewrite the same parsing logic, formatting transform, or validation routine, that logic should probably live in scripts/.

Use prose for judgment; use scripts for determinism. Prose works well for deciding which workflow applies, explaining tradeoffs, listing guardrails, and choosing between modes. Scripts work well for parsing file formats, validating schemas, extracting frontmatter, generating machine-readable output, and performing repeatable API calls.
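As one concrete example of this split, a hypothetical scripts/extract_frontmatter.py (invented here, not from any referenced repository) replaces prose instructions like "find the YAML between the --- markers" with machine-checkable output:

```python
#!/usr/bin/env python3
"""Hypothetical scripts/extract_frontmatter.py: deterministic frontmatter parsing.

Instead of asking the agent to re-derive parsing logic in prose each time,
the skill can say: run this script and consume its JSON output.
"""
import json
import re
import sys

def extract_frontmatter(text: str) -> dict:
    """Return frontmatter key/value pairs, or {} if none is present."""
    match = re.match(r"^---\n(.*?)\n---\n", text, re.DOTALL)
    if not match:
        return {}
    fields = {}
    for line in match.group(1).splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields

if __name__ == "__main__":
    # Emit machine-readable output so the agent never parses by eye.
    print(json.dumps(extract_frontmatter(sys.stdin.read()), indent=2))
```

The agent's instruction shrinks from a paragraph of parsing rules to a single line: pipe the file through the script and use the JSON.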

Split files around decision points, not just for organization. A good author does not just split files mechanically. They split them around workflow branches.

For example:

pdf-processing/
├── SKILL.md
├── references/
│   ├── forms.md
│   ├── OCR.md
│   └── tables.md
└── scripts/
    └── extract_tables.py

This structure is good because the agent can ask:

  • Is this a forms task?
  • Is this an OCR task?
  • Is this a table extraction task?

and load exactly one reference file instead of an undifferentiated 3,000-line manual.

A bad split is one that mirrors author convenience instead of task selection, for example part1.md, part2.md, advanced.md, or nested chains of references the agent has to discover indirectly.

The spec's guidance to keep file references shallow is correct. SKILL.md should act like a table of contents the agent can navigate directly. The point of the file split is not aesthetics—it is to keep the default path compact and defer depth until the workflow actually needs it.

Best Practice 3: Teach the Happy Path and Constrain Failure #

Examples matter. The docs explicitly recommend them[1]. But too many skills confuse "example" with "full tutorial."

The purpose of an example is to teach the agent the shape of success: what the input looks like, what the expected output looks like, and what sequence of steps is preferred. That usually means one or two sharp examples, not ten variations.

For instance, this is useful:

## Example
If the user asks to "query a Hugging Face dataset by SQL", first:
1. inspect the schema
2. run a small sample query
3. only then run the full transformation

Avoid pages of sample invocations with tiny differences, examples duplicated from scripts, or examples that introduce new rules not stated elsewhere. Examples should compress understanding, not expand the skill into a textbook.

The Agent Skills integration guidance explicitly highlights security concerns around script execution and recommends sandboxing, allowlisting, confirmation for dangerous operations, and logging[9]. The same principle applies broadly: restrict capabilities early, not after failure.

That same principle should show up in skill authoring. If a task is risky, the skill should say so plainly: validate before pushing, ask before destructive operations, never hardcode secrets, prefer read-only inspection first, and require a schema or preview before mutation. These are not stylistic flourishes—they are operational constraints. A good skill tells the agent where improvisation is dangerous.

And once the guardrails are written, test them under real prompts, not just by reading the file. A practical check is:

| Test | Question |
| --- | --- |
| Positive trigger | Does the skill activate for the requests it should handle? |
| Negative trigger | Does it stay out of the way for requests it should not handle? |
| Happy path | Can it complete the default workflow end-to-end? |
| Failure path | Does it behave sensibly when inputs are missing, malformed, or risky? |
| Resource path | Do referenced files and scripts actually exist and resolve correctly? |

Validation tools help with structure, but they do not tell you whether the skill is well-routed[9]. That still requires real prompts.
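The structural rows of that checklist are easy to script yourself. The sketch below (a minimal, illustrative checker, with no relation to any official validator) covers frontmatter fields and the resource path; trigger quality still needs live prompts:

```python
import re
from pathlib import Path

def check_skill(skill_dir: Path) -> list:
    """Structural checks only: frontmatter fields and resolvable references.
    Whether the description actually routes well requires real prompts."""
    problems = []
    skill_md = skill_dir / "SKILL.md"
    if not skill_md.exists():
        return ["missing SKILL.md"]
    text = skill_md.read_text()
    fm = re.match(r"^---\n(.*?)\n---", text, re.DOTALL)
    if not fm:
        problems.append("missing frontmatter")
    else:
        for field in ("name", "description"):
            if not re.search(rf"^{field}:\s*\S", fm.group(1), re.MULTILINE):
                problems.append(f"frontmatter missing '{field}'")
    # Resource path: every relative markdown link should resolve on disk.
    for target in re.findall(r"\]\(([^)]+)\)", text):
        if not target.startswith(("http://", "https://")):
            if not (skill_dir / target).exists():
                problems.append(f"broken reference: {target}")
    return problems
```

Running a check like this in CI catches the most embarrassing failure mode, a skill that confidently links to a references file that was renamed or never committed, before any prompt testing begins.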

A Practical Template #

If you want a default pattern that works well most of the time, start here:

---
name: skill-name
description: What this skill does, plus the kinds of requests that should trigger it.
---

# Skill Title

## Default workflow
1. first action
2. second action
3. validation or handoff

## Key directives
- important constraint
- important preference
- important safety rule

## Example
If the user asks for X, first do Y, then validate with Z before proceeding.

## References
- [references/mode-a.md](references/mode-a.md) when the task is X
- [references/mode-b.md](references/mode-b.md) when the task is Y

## Scripts
- `scripts/do_the_thing.py` for deterministic execution

This is enough structure for most skills. Notice what is missing: a long motivational preface, repeated trigger text, and embedded reference dumps. The example section is optional, but when included it should teach the preferred path, not restate the whole manual. The point is not to follow a rigid template forever. The point is to preserve the separation of concerns: routing in frontmatter, workflow in the body, and deep resources outside the core instructions.

The Core Principle #

The biggest mistake in skill authoring is assuming the agent needs more explanation. Usually it needs better structure.

A good skill makes the right task easy to recognize, keeps the activated instructions compact, defers depth and deterministic work out of the prompt, and teaches the happy path while constraining failure. That is why the best skills feel less like manuals and more like runbooks—optimized for invocation, execution, context efficiency, and maintenance at the same time.

If you remember only one rule, remember this: write skills as onboarding guides for specialists, not as documentation for humans. Once you do that, the right scope, file layout, examples, and scripts follow naturally.

References #

[1]
Agent Skills, “Specification.” Accessed: Mar. 07, 2026. [Online]. Available: https://agentskills.io/specification
[2]
Hugging Face Hub Docs, “Agent Skills.” Accessed: Mar. 07, 2026. [Online]. Available: https://huggingface.co/docs/hub/en/agents-skills
[3]
Hugging Face, “huggingface/skills.” Accessed: Mar. 07, 2026. [Online]. Available: https://github.com/huggingface/skills
[4]
OpenAI, “How OpenAI uses Codex.” Accessed: Mar. 29, 2026. [Online]. Available: https://openai.com/business/guides-and-resources/how-openai-uses-codex/
[5]
Agentic AI Foundation, “AGENTS.md.” Accessed: Mar. 29, 2026. [Online]. Available: https://agents.md/
[6]
Claude Code Docs, “Claude Code settings.” Accessed: Mar. 29, 2026. [Online]. Available: https://code.claude.com/docs/en/settings
[7]
Gemini CLI Docs, “Gemini CLI configuration.” Accessed: Mar. 29, 2026. [Online]. Available: https://geminicli.com/docs/reference/configuration/
[8]
Claude Code Docs, “Extend Claude with skills.” Accessed: Mar. 29, 2026. [Online]. Available: https://code.claude.com/docs/en/skills
[9]
Agent Skills, “Integrate skills into your agent.” Accessed: Mar. 07, 2026. [Online]. Available: https://agentskills.io/integrate-skills

Citation #

If you found this useful, please cite this blog as:

Non-linear AI. (Mar 2026). Writing Effective Skills: Best Practices for Agent Onboarding. non-linear.ai. https://non-linear.ai/blog/skills-best-practices/

or

@article{ai2026skillsbestpractices,
  title   = {Writing Effective Skills: Best Practices for Agent Onboarding},
  author  = {{Non-linear AI}},
  journal = {non-linear.ai},
  year    = {2026},
  month   = {Mar},
  url     = {https://non-linear.ai/blog/skills-best-practices/}
}