Building GitHub Copilot Plugins: First Principles, Real Submissions, Honest Lessons

Why this post exists

We built two GitHub Copilot plugins. We submitted them to the Awesome Copilot community marketplace. Both were rejected. This post is about why that happened and what it taught us about what a Copilot plugin is actually for.

The two plugins are orqit and respit. orqit orchestrates PLANNER → BUILDER → REVIEWER workflows for solo founders running multi-repo product pipelines. These are named agent roles, explained in the section on agent profiles below. respit provides personal-sustainability skills for the same context: cognitive state monitoring, scope-creep intervention, and momentum tracking. Both are in active use internally at Foculoom. Neither belongs in a public marketplace.

The rejection feedback was fair. The maintainer gave clear, specific reasons, and the automated intake system worked exactly as designed. Aaron Powell's post on automating the Awesome Copilot intake used orqit's submission as a showcase example of how the iterative feedback loop works — we went through multiple /rerun-intake cycles without any human maintainer involvement before the final manual review. The process is genuinely impressive engineering.

If you want a tutorial that ends with "and then it got accepted," this is not that post. But if you want to understand how Copilot plugins are structured, what the submission process looks like from the inside, and where the line is between useful community tooling and useful internal tooling, read on.

Prerequisites: this post assumes you are comfortable with git, Markdown, and GitHub pull requests. No prior experience with Copilot extensions or Model Context Protocol is assumed — both are explained here.

What a Copilot plugin actually is

GitHub Copilot's extensibility model has three distinct layers. Understanding each one is necessary before you can make good decisions about what to put where.

Instructions

The base layer is .github/copilot-instructions.md. It is a Markdown file in a repository's .github directory, and it gets injected into every Copilot chat context in that repo. Think of it as a README written for the AI rather than for humans. A human README explains how to get started. A copilot-instructions.md explains the workspace conventions, naming rules, agent routing logic, and project-specific context that you want the model to carry into every interaction.

Instructions are static and scoped to the repository. They don't execute anything. They're the ambient knowledge layer.

Skills and agent profiles

The next layer is .github/skills/ and .github/agents/. Skills are reusable procedural playbooks — Markdown files with YAML front matter that describe a step-by-step workflow. They're invoked by name in chat (/dev-session, /ship-issue, /status). Agent profiles define bounded personas — a BUILDER that implements one scoped issue, a PLANNER that triages ideas and writes specs, a REVIEWER that evaluates quality. Each agent profile describes what the role does, what it doesn't do, and what signals should route work elsewhere.

The key distinction from instructions: skills and agents are invoked. Instructions are always present; skills and agents are pulled in on demand.

MCP tools

The third layer is Model Context Protocol: external tool endpoints that the model can call at runtime. Registered in ~/.copilot/mcp-config.json, these extend what the model can do rather than what it knows. An image-generation MCP gives the model the ability to call an external image-generation service. A database MCP gives it read access to a live schema. A calendar MCP lets it query real availability.

MCP tools are the only layer that creates runtime side effects outside the conversation. Everything else — instructions, skills, agents — lives in context. MCP tools reach out.

How the layers compose

The composition model is straightforward in principle: instructions set the ambient context, skills define repeatable procedures, agent profiles define bounded roles, MCP tools extend runtime capability. In practice the boundary between "this belongs in instructions" and "this belongs in a skill" requires some judgment, and getting it wrong creates either bloated context on every request or skills that are too fragile to invoke reliably.

Building the plugin components

Here is what each layer looks like in practice, with synthetic examples that represent the structure without exposing internal files.

Instructions

A good copilot-instructions.md has three sections: what this project is, how it is structured, and how to route work. The first two are orientation for the model. The third is operational — it tells the model which agent role to use for which class of task.

```markdown
# Acme App: Copilot workspace instructions

## Project overview
Acme App is a SwiftUI iOS app for managing household tasks.
Source lives in `Sources/AcmeApp/`. Tests in `Tests/`. CI uses xcodebuild.

## Conventions
- Use `feat(scope):` Conventional Commits.
- All PRs close a GitHub issue with `Closes acme/acme-app#N`.
- Never push directly to `main`.

## Agent routing
- New feature ideas → invoke /planner
- Implementation tasks → invoke /builder
- Quality review → invoke /reviewer
- Unsure → ask before acting
```

The file above takes about a minute to read. That's the right size. If your copilot-instructions.md is longer than a few hundred lines, you've probably mixed conventions with procedures, and the procedures belong in skills.

SKILL.md structure

A skill file opens with a YAML front-matter block, which is the section between --- markers at the top of the file. After it come three prose sections: ## When to use, ## Steps, and optionally ## Fallback. The front matter carries metadata. Copilot's skill format expects a name and a description, and our skills add a tier field that controls model routing.

```markdown
---
name: feature-flag-check
description: Verify a feature flag's production state before a PR enables it.
tier: standard
---
# feature-flag-check

## When to use
Run before enabling a feature in production. Use when a PR touches
any file under `src/flags/` or references a flag key.

## Steps
1. Read the current flag state as raw JSON:
   `gh api -H "Accept: application/vnd.github.raw+json" repos/{owner}/{repo}/contents/config/flags.json`
2. Confirm the flag key exists and check its production value.
3. If the flag does not exist, stop and report. Do not create flags inline.
4. If the flag is already `true`, skip and note that in the PR body.
5. Otherwise (the flag is `false`), add a checklist item to the PR body: `- [ ] Flag enabled: <key>`.

## Fallback
If the contents request returns 404, the flags config may have moved.
Check `config/` for an alternative path and update this skill.
```

The prose-based format is deliberate. A skill written as prose can be read and audited by a human in 30 seconds. A skill written as a shell script cannot. The model executes prose instructions; it does not need a runnable script to follow a procedure.
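Because the format is this regular, a few lines of Python can check that a skill file has the expected shape before anyone relies on it. This is a minimal sketch under some assumptions. It treats name, description, and tier as the required front-matter keys, and ## When to use and ## Steps as the required sections. It parses only flat key: value lines rather than full YAML. The check_skill and parse_front_matter helpers are hypothetical and not part of any Copilot tooling.

```python
import re

# Assumed requirements: name/description are what Copilot's skill format expects,
# tier is this post's routing convention. ## Fallback is optional.
REQUIRED_KEYS = ("name", "description", "tier")
REQUIRED_SECTIONS = ("## When to use", "## Steps")


def parse_front_matter(text: str) -> tuple[dict[str, str], str]:
    """Split a SKILL.md into flat `key: value` metadata and the Markdown body.

    Handles only one-line scalar values; a real tool should use a YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing opening --- for front matter")
    for end in range(1, len(lines)):
        if lines[end].strip() == "---":
            break
    else:
        raise ValueError("missing closing --- for front matter")
    meta = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta, "\n".join(lines[end + 1:])


def check_skill(text: str) -> list[str]:
    """Return a list of problems; an empty list means the skill has the expected shape."""
    try:
        meta, body = parse_front_matter(text)
    except ValueError as exc:
        return [str(exc)]
    problems = [f"front matter missing key: {key}" for key in REQUIRED_KEYS if key not in meta]
    headings = {line.strip() for line in body.splitlines() if line.startswith("## ")}
    problems += [f"missing section: {s}" for s in REQUIRED_SECTIONS if s not in headings]
    # At least one "1. ..." style line somewhere in the body.
    if not re.search(r"^\d+\.\s", body, flags=re.MULTILINE):
        problems.append("no numbered steps found")
    return problems
```

Running check_skill in CI over every file under .github/skills/ catches a malformed skill before the model ever tries to follow it.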

Agent profile structure

An agent profile describes a role's persona, capabilities, and explicit limits. The `<!-- tier: standard -->` HTML comment at the top is a machine-readable marker that our routing logic can parse.

```markdown
<!-- tier: standard -->

You are CODE-REVIEWER.

## Role
Review incoming PRs for bugs, logic errors, and missing test coverage.
Do not review style, formatting, or naming conventions.
Do not modify code.

## What you do
- Read the diff.
- Report BUGS / MISSING_TESTS / SHIP_RECOMMENDATION.
- For each bug: file path, line range, description of the defect.

## What you do not do
- Rewrite the code.
- Comment on style choices.
- Review files outside the diff scope.

## Stop conditions
If the diff is >1,000 lines, stop and ask the founder to split the PR.
```

The stop conditions matter. An agent profile without explicit stop conditions is an agent that will keep going when it should pause.
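Both the tier marker and the stop conditions are easy to check mechanically. Here is a minimal sketch of how a router might read them. The read_tier and has_stop_conditions helpers are hypothetical. Returning None for a missing marker, rather than falling back to a default tier, is our own choice, so that the router can refuse to guess.

```python
import re
from typing import Optional

# Matches a marker such as <!-- tier: standard --> anywhere in the profile.
TIER_MARKER = re.compile(r"<!--\s*tier:\s*([\w-]+)\s*-->")


def read_tier(profile: str) -> Optional[str]:
    """Return the declared tier, or None so the caller can refuse to guess."""
    match = TIER_MARKER.search(profile)
    return match.group(1) if match else None


def has_stop_conditions(profile: str) -> bool:
    """True if the profile has a '## Stop conditions' section with some content."""
    in_section = False
    for line in profile.splitlines():
        if line.startswith("## "):
            # Entering a new section; track whether it is the stop-conditions one.
            in_section = line.strip().lower() == "## stop conditions"
        elif in_section and line.strip():
            return True
    return False
```

A profile that fails either check is a good candidate for rejection at load time, before it is ever routed any work.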

MCP registration

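A partial ~/.copilot/mcp-config.json entry for a hypothetical image-generation server follows. The URL, tool names, and environment variable are placeholders. The ${IMAGE_GEN_API_KEY} reference assumes that your client expands environment variables in header values. If it does not, the key ends up written into the file, and the file then has to be treated as a secret.

```json
{
  "mcpServers": {
    "image-gen": {
      "type": "http",
      "url": "https://api.example.com/mcp",
      "headers": {
        "Authorization": "Bearer ${IMAGE_GEN_API_KEY}"
      },
      "tools": ["generate_image", "upscale_image"]
    }
  }
}
```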

The tools array controls which of the server's tools are exposed to the model. Registering fewer tools means fewer ways for the model to make unintended calls. Start minimal.

Where things should live

The decision tree is roughly:

Stable conventions (naming rules, commit format, PR structure) → instructions
Repeatable procedures (how to run a QA checklist, how to open a PR, how to triage a new idea) → skills
Bounded roles (who is responsible for what, what each role should not do) → agent profiles
Runtime side effects (calling an external API, generating an image, querying a database) → MCP tools

Mixing layers is the most common mistake. If your instructions file contains a 20-step deployment procedure, move it to a skill. If your skill tries to define a persona, move that part to an agent profile. The separation makes each layer easier to audit and update independently.
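The decision tree above can be sketched as a small function. The Guidance fields are hypothetical labels you would assign by hand. The order of the checks is our assumption:

- Side effects are checked first, because nothing except an MCP tool can perform them.
- Roles come next, then procedures.
- Only guidance that holds on every request lands in instructions.

```python
from dataclasses import dataclass


@dataclass
class Guidance:
    """Hypothetical description of a piece of guidance you want Copilot to have."""
    name: str
    has_side_effects: bool = False       # calls an API, writes a database, generates an image
    defines_role: bool = False           # says who is responsible and what they must not do
    is_procedure: bool = False           # a repeatable, invocable sequence of steps
    applies_every_request: bool = False  # a stable convention, true on every request


def layer_for(item: Guidance) -> str:
    # Checked in order: reaching outside the conversation always means an MCP tool.
    if item.has_side_effects:
        return "MCP tool"
    if item.defines_role:
        return "agent profile"
    if item.is_procedure:
        return "skill"
    if item.applies_every_request:
        return "instructions"
    return "unclear: ask before adding it anywhere"
```

Consider a 20-step deployment procedure that someone pasted into the instructions file. It might be labeled both applies_every_request and is_procedure, and it still comes back as "skill", which is the move recommended above.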

The Awesome Copilot marketplace and automated intake

Awesome Copilot is a community-curated list of Copilot extensions, skills, agent profiles, and MCP servers. It's a GitHub repository maintained by the Copilot team, and it accepts community submissions via pull requests and, more recently, a structured issue-based intake flow for external plugins.

The external plugin intake, built and documented by Aaron Powell, is a GitHub Issues workflow backed by GitHub Actions. The flow:

1. You open an issue using a structured form template.
2. An Action fires and validates the submission against a schema.
3. A bot comments with the parsed JSON payload and any validation errors.
4. You fix the issues and trigger a rerun with /rerun-intake.
5. Once automated validation passes, a human maintainer does a final review and either approves or rejects the submission.

What's well-designed about this is the iterative feedback loop. The bot is specific. If your ref doesn't resolve to a real SHA, it tells you that and nothing else. If your description is missing, it tells you the field name. You can iterate on your own timeline without waiting for a human to respond.
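Here is a minimal sketch of that one-message-per-problem style of validation. The field names (name, description, repository, ref) are hypothetical; the real intake form and schema may differ. The sketch only checks that ref looks like a full 40-character commit SHA. Confirming that the commit actually exists needs a call to GitHub, for example the REST API's get-a-commit endpoint, and that call is left out here.

```python
import re

# Hypothetical field names; the real intake schema may differ.
REQUIRED_FIELDS = ("name", "description", "repository", "ref")
# Git prints full SHAs as 40 lowercase hex characters.
FULL_SHA = re.compile(r"[0-9a-f]{40}")


def validate_submission(payload: dict) -> list[str]:
    """Return one specific message per problem; an empty list means the format checks pass."""
    errors = [f"missing required field: {field}"
              for field in REQUIRED_FIELDS if not payload.get(field)]
    ref = payload.get("ref")
    if ref and not FULL_SHA.fullmatch(ref):
        # A branch name like "main" can move; only a pinned SHA is accepted.
        errors.append(f"ref {ref!r} is not a full 40-character commit SHA")
    return errors
```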

Aaron Powell's post on building this system specifically cited orqit's submission (issue #1813) as an example of the automation working as intended: "That whole back-and-forth happened without any human maintainer involvement." He then notes: "Ultimately, the plugin was rejected after I did a manual review, but I wanted to highlight the process, not the outcome."

That is a fair and accurate summary of what happened.

The security model is also worth noting. The intake pins submissions to immutable SHAs, not branches. SHAs identify a single fixed snapshot of the code — a branch can be updated to point to new commits at any time. Pinning to a SHA means the approved listing cannot change out from under reviewers after submission. Approved external plugins are flagged for re-review after six months. The system treats external plugin listings as a supply chain problem — because they are — and the design reflects that.
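The six-month window is simple to compute, with one calendar wrinkle. Adding months to a date like August 31 has to clamp to the last day of a shorter month. This is a minimal sketch that assumes calendar months and an approval date you already have. We don't know how the intake itself schedules re-reviews, so needs_rereview is a hypothetical helper.

```python
import calendar
from datetime import date

REVIEW_INTERVAL_MONTHS = 6  # the six-month re-review window


def add_months(start: date, months: int) -> date:
    """Move forward by whole months, clamping e.g. Aug 31 + 6 months to Feb 28/29."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def needs_rereview(approved_on: date, today: date) -> bool:
    """True once the listing has been approved for at least six calendar months."""
    return today >= add_months(approved_on, REVIEW_INTERVAL_MONTHS)
```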

Foculoom's real experience: orqit and respit

orqit

orqit is a Copilot CLI plugin that orchestrates PLANNER → BUILDER → REVIEWER workflows. It provides 12 portable workflow skills (dev-session, ship-issue, status, risk-review, fallback-mode, model-audit, usage, qa-validate, and others) and 4 named agent roles (conductor, planner, builder, reviewer). The goal was to reduce manual context-switching for a solo founder running multiple product repos simultaneously, while encoding model-routing rules, founder gate enforcement, and cost-tier step-down logic.

We submitted version 1.0.2 after an initial round of automated feedback caught compatibility issues — a pre-tool hook that fired a cleanup script not present on a fresh install, hard dependencies on MCP tools not available outside Foculoom's environment. We fixed those. The bot validated the new version. Then a maintainer did the manual review.

The rejection:

> Upon reviewing the files in the plugin repo, I'm going to be rejecting this plugin. The agents in the plugin are not bringing meaningful uplifts in capabilities, and also reimplementing features such as /plan which can result in a degraded experience. Similarly, with the recent inclusion of server-side model routing in auto mode, having model routing done on the client side using the agent is inefficient and error prone.

This is correct. GitHub Copilot now has a built-in plan mode. Server-side Auto routing selects models dynamically without client-side logic. When orqit was designed in early 2025, both of these were gaps. By the time we submitted, they were native features. A plugin that reimplements a native feature doesn't add capability — it adds a second code path that can disagree with the native behavior.

The CONDUCTOR orchestrator, the BUILDER workflow, the issue-gate enforcement — these are still valuable internally because they encode Foculoom-specific rules that Copilot's native plan mode doesn't know about. But those rules are only meaningful in Foculoom's context. For anyone else, they would be noise layered on top of features that already work.

respit

respit is a plugin for the personal-sustainability side of solo founder work. Its skills cover cognitive state monitoring (energy-check), scope-creep intervention (scope-brake), momentum tracking (project-health), structured reflection (personal-retro), and idea capture (ideas). The premise: generic productivity plugins assume a team; a solo founder is the entire team and needs different defaults.

The rejection:

> Upon reviewing the files in the plugin repo, I'm going to be rejecting this plugin. The skills seem to be tailored around a very specific working style and are rather vague in their generalised usage.

Both parts are accurate. The skills are tailored around a specific working style — Foculoom's. scope-brake fires based on Foculoom's definition of scope creep. project-health queries Foculoom's issue tracker conventions. personal-retro outputs in a format that feeds back into the Foculoom session-store database. Extracted from that context and handed to a different team, these skills would need to be rewritten almost entirely to be useful.

The "rather vague in their generalised usage" note is also fair. A skill like energy-check sounds meaningful if you know the context; in isolation, it reads as "prompts the founder to check in with themselves," which is something a sticky note does just as well.

The difference between the two rejections is instructive. orqit was rejected because it reimplemented things that exist natively. respit was rejected because its value depends entirely on context that doesn't transfer. These are two different failure modes, and both are legitimate.

Five design principles for Copilot plugins

These apply even if you never submit to the marketplace. They describe what we would do differently if we started over.

1. Design for the gap, not the overlap. If GitHub Copilot already handles something well — code generation, plan mode, model selection, chat context — don't replicate it. Your plugin should do what Copilot cannot do alone: enforce your team's specific constraints, encode your team's domain knowledge, or bridge Copilot to your specific external systems. If you can't clearly articulate what gap you're filling, you're probably building redundancy.

2. Fail loudly and explicitly. Workflows where an AI model takes autonomous sequential steps — writing code, calling tools, opening pull requests — can fail silently and produce confident wrong outputs. Every skill should have explicit stop conditions, named human-review gates, and clear escalation paths. A skill that says "I don't know — stop and ask" is better than one that makes a guess and continues. The stop conditions are not error handling; they are part of the intended behavior.
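Here is a minimal sketch of treating a stop as an intended outcome rather than an exception. The Stop type and the run loop are hypothetical. The check_flag step mirrors step 3 of the feature-flag-check skill shown earlier: when the flag is missing, it stops and reports instead of guessing.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Union


@dataclass
class Stop:
    """An explicit, expected request to halt and hand control to a human."""
    reason: str


# A step takes the current state and returns either new state or a Stop.
Step = Callable[[dict], Union[dict, Stop]]


def run(steps: list[Step], state: dict) -> tuple[dict, Optional[Stop]]:
    """Run steps in order; return (final_state, stop), where stop is None on success."""
    for step in steps:
        result = step(state)
        if isinstance(result, Stop):
            return state, result  # halt here; later steps never run
        state = result
    return state, None


def check_flag(state: dict) -> Union[dict, Stop]:
    """Hypothetical step: never guess about a flag that does not exist."""
    key = state["flag_key"]
    if key not in state["flags"]:
        return Stop(f"flag {key!r} does not exist; stopping instead of creating it inline")
    return {**state, "flag_value": state["flags"][key]}
```

The caller checks the second return value and surfaces stop.reason to a human. There is no path where a Stop is silently swallowed and the next step runs anyway.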

3. Separate instructions from procedures from roles. What lives in copilot-instructions.md should be stable conventions — things that are true on every request, not just some requests. What lives in skills should be repeatable, invocable procedures. What lives in agent profiles should be bounded personas with explicit limits. Mixing these creates context that is simultaneously too heavy for routine use and too vague for specific use.

4. Write the playbook before the automation. If you cannot write a clear step-by-step Markdown procedure for a workflow in about 20 lines, you are not ready to encode it as a skill. The SKILL.md format forces you to be explicit: what triggers this, what each step does, what to do when a step fails. That explicitness is the value. A skill that you cannot describe precisely will not execute reliably.

5. Context budget is the scarce resource. Every line of instructions, every invoked skill costs attention — the model can only hold so much context at once, measured in tokens (roughly three-quarters of a word each, with a hard per-request cap). Design your plugin with a token budget in mind, not a feature budget. A focused 300-line copilot-instructions.md with 6 well-defined skills is more effective than a sprawling 1,500-line instructions file with 25 skills that overlap. The model's attention is finite.
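To make the budget concrete, here is the arithmetic for the two files compared above. It uses the three-quarters-of-a-word rule plus an assumed average of 10 words per line. Real instruction files and real tokenizers will differ, so treat the numbers as orders of magnitude.

```python
# Rule of thumb from the text: one token is roughly three-quarters of a word.
WORDS_PER_TOKEN = 0.75


def estimate_tokens(text: str) -> int:
    """Very rough token estimate from a whitespace word count."""
    return round(len(text.split()) / WORDS_PER_TOKEN)


def estimate_file_tokens(lines: int, words_per_line: float = 10) -> int:
    """Estimate for a file of `lines` lines; 10 words per line is an assumed average."""
    return round(lines * words_per_line / WORDS_PER_TOKEN)


focused = estimate_file_tokens(300)     # 3,000 words  -> 4,000 tokens
sprawling = estimate_file_tokens(1500)  # 15,000 words -> 20,000 tokens
print(f"focused: {focused}, sprawling: {sprawling}, extra per request: {sprawling - focused}")
```

Under those assumptions, the sprawling file spends roughly 16,000 extra tokens on every request before any of its 25 skills is invoked.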

Closing

orqit and respit are staying internal. That is the right call.

Both plugins are useful precisely because they are specific to Foculoom's workflow. orqit encodes Foculoom's agent routing rules, founder gate requirements, and model-cost management logic. respit encodes Foculoom's specific working rhythms and how they feed back into the session-store database. Those specifics are not a limitation — they are the point. The moment you generalize them enough to be useful to people outside Foculoom, you have removed the parts that make them worth using.

The marketplace rejection forced a clarifying question: "Is this general enough to help people we've never met?" For both plugins, the honest answer was no. And "no" is a complete answer. Not every piece of useful tooling needs to be a public artifact.

The Awesome Copilot community is worth engaging with. The automated intake is well-designed — the iterative feedback loop, the SHA pinning, the staleness re-review at six months. The maintainers gave clear, specific feedback. The rejection criteria are consistent: does this bring meaningful uplift in capability, and is that uplift generalizable? Both questions are fair.

What we are doing with what we learned: keeping orqit and respit as internal tooling, continuing to develop them as Foculoom's workflows evolve, and not trying to generalize them until they earn generalization. If orqit's orchestration model becomes something that could genuinely help solo founders in other contexts — independent of Foculoom's specific rules — that would be a different conversation. We're not there.

If you want to see what these look like in practice, both repos are public: foculoom/orqit and foculoom/respit.

The best Copilot plugins are not the ones that add the most features. They are the ones that encode the judgment calls your team makes repeatedly, so you only have to make them once. The hard part is knowing which judgment calls are yours and which ones are everyone's.

Credit to Aaron Powell's post for documenting how the Awesome Copilot automated intake works — the engineering behind that workflow deserves attention independent of any particular plugin outcome.