Open Source Framework

SkillForge

A Self-Evolving Skill Synthesis Framework for AI Systems

Enable AI agents to autonomously generate, verify, evolve, and reuse structured skill packages — outperforming human-authored skills through co-evolutionary feedback loops.

72.3% Evolved Skill Pass Rate
+41.1pp vs No-Skill Baseline
5x Faster Than Human Authoring

Why SkillForge?

Modern AI agents face a fundamental skill acquisition bottleneck

Manual Authoring Doesn't Scale

Human-crafted skills require domain experts, are expensive to produce, and cannot keep pace with the diversity of tasks agents encounter.

Cognitive Misalignment

Skills designed for human intuition often degrade agent performance. What makes sense to a human expert does not always match how LLM agents reason and act.

Experience Is Wasted

Agents solve tasks from scratch without converting past experience into reusable knowledge. Every task starts at zero.

SkillForge addresses these bottlenecks through three complementary mechanisms:

1. Co-Evolutionary Synthesis

A Skill Generator and a Surrogate Verifier co-evolve, iteratively refining skills through structured feedback loops.

2. Tiered Memory

Experience is progressively distilled from raw episodes into cross-task patterns and executable rules, with adaptive retrieval.

3. Failure-Driven Evolution

Skills evolve in response to diagnosed failure patterns. LLM-driven reflection provides causal insights for targeted repairs.
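The tiered memory mechanism above can be sketched roughly as follows. This is a minimal illustration, not SkillForge's implementation: the `Episode` and `TieredMemory` names, the `promote_after` threshold, and the string-based rule format are all assumptions made for the example.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Episode:
    """Raw outcome of one task attempt (episodic tier)."""
    task_id: str
    failure_category: Optional[str]  # None means the attempt passed


@dataclass
class TieredMemory:
    """Hypothetical three-tier store: episodes -> patterns -> rules."""
    promote_after: int = 3  # assumed threshold, not a SkillForge default
    episodes: List[Episode] = field(default_factory=list)
    patterns: Counter = field(default_factory=Counter)    # semantic tier
    rules: Dict[str, str] = field(default_factory=dict)   # procedural tier

    def record(self, episode: Episode) -> None:
        self.episodes.append(episode)
        if episode.failure_category is None:
            return
        # Distil: count how often a failure category recurs across episodes.
        self.patterns[episode.failure_category] += 1
        # Promote a recurring pattern to a rule once it crosses the threshold.
        if self.patterns[episode.failure_category] >= self.promote_after:
            self.rules.setdefault(
                episode.failure_category,
                f"check for '{episode.failure_category}' before submitting",
            )
```

In this sketch a failure category only becomes a rule after it recurs, which keeps one-off failures in the episodic tier.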

Architecture

A modular, plug-in framework with eight key components

Skill Generator

Creates & refines multi-artifact skill packages

Surrogate Verifier

Co-evolving test generation with info isolation

Evolution Engine

Diagnose-before-prescribe improvement

Skill Bank

Persistent evolving skill repository

Episodic Memory

Raw outcomes per task

Semantic Memory

Cross-task patterns

Procedural Memory

Executable rules

Adaptive Retrieval Controller

Thompson Sampling-based policy selection for when and what to retrieve
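To make the Adaptive Retrieval Controller more concrete, here is a minimal Beta-Bernoulli Thompson Sampling sketch. It assumes each retrieval policy is rewarded with a binary pass/fail outcome. The class name and the policy names used with it are hypothetical and not taken from the SkillForge codebase.

```python
import random


class ThompsonRetrievalPolicy:
    """Beta-Bernoulli Thompson Sampling over named retrieval policies."""

    def __init__(self, policies, seed=None):
        self._rng = random.Random(seed)
        # One Beta(1, 1) prior per policy, stored as [alpha, beta].
        self._params = {name: [1.0, 1.0] for name in policies}

    def select(self):
        """Sample a success rate per policy and pick the largest draw."""
        draws = {
            name: self._rng.betavariate(alpha, beta)
            for name, (alpha, beta) in self._params.items()
        }
        return max(draws, key=draws.get)

    def update(self, policy, task_passed):
        """Update the chosen policy's posterior with the observed outcome."""
        self._params[policy][0 if task_passed else 1] += 1.0

    def posterior_mean(self, policy):
        alpha, beta = self._params[policy]
        return alpha / (alpha + beta)
```

A caller would pick a policy with `select()`, run the task, then report the result with `update()`. Policies that keep succeeding are sampled more often, while uncertain ones still get occasional exploration.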

Workflow

Co-evolutionary skill synthesis in iterative cycles

flowchart TD
    A[Task Input] --> B[Skill Generator]
    B --> C[Generate Skill Package v0]
    C --> D[Execute Skill in Environment]
    D --> E[Surrogate Verifier]
    E --> F{Tests Pass?}
    F -->|No| G[Generate Failure Diagnostics]
    G --> H[Append Feedback to Context]
    H --> B
    F -->|Yes| I[Ground Truth Oracle]
    I --> J{Oracle Pass?}
    J -->|No| K[Escalate Verifier Tests]
    K --> E
    J -->|Yes| L[Deploy Evolved Skill]
    L --> M[Store in Skill Bank]
    M --> N[Update Memory Tiers]
    N --> O[Adaptive Retrieval Update]
    O --> P{More Tasks?}
    P -->|Yes| A
    P -->|No| Q[Final Skill Portfolio]

    subgraph Co-Evolution Loop
        B
        C
        D
        E
        F
        G
        H
        I
        J
        K
    end

    subgraph Memory and Retrieval
        M
        N
        O
    end

    style A fill:#4CAF50,color:#fff
    style Q fill:#2196F3,color:#fff
    style L fill:#FF9800,color:#fff
                
Convergence: Skills typically converge within 3-5 evolution rounds, with the surrogate verifier absorbing ~60% of iteration cost before escalating to the oracle.
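The loop in the flowchart can be approximated in a few lines of Python. This is a simplified sketch under stated assumptions: `generate`, `surrogate`, and `oracle` are hypothetical callables standing in for the Skill Generator, Surrogate Verifier, and Ground Truth Oracle, and the integer `strictness` level is an invented stand-in for "Escalate Verifier Tests".

```python
def co_evolve(task, generate, surrogate, oracle, max_rounds=5):
    """Return (skill, rounds_used), or (None, max_rounds) if no skill passes.

    Assumed signatures (illustrative only):
      generate(task, feedback) -> skill
      surrogate(skill, strictness) -> (passed, diagnostics)
      oracle(skill) -> bool
    """
    feedback = []    # diagnostics accumulated for the generator
    strictness = 0   # how demanding the surrogate's tests currently are
    for round_no in range(1, max_rounds + 1):
        skill = generate(task, feedback)
        passed, diagnostics = surrogate(skill, strictness)
        if passed:
            if oracle(skill):
                return skill, round_no
            # Surrogate was too lenient: escalate its tests and re-verify.
            strictness += 1
            passed, diagnostics = surrogate(skill, strictness)
        if not passed:
            # Only surrogate diagnostics reach the generator, never oracle details.
            feedback.append(diagnostics)
    return None, max_rounds
```

Keeping oracle details out of `feedback` is one way to mirror the information isolation mentioned in the Architecture section. The real framework may enforce it differently.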

Evaluation

Rigorous, evaluation-driven design with built-in benchmarking

Pass Rate

tasks passed / total tasks

Primary quality metric — binary pass/fail per task

Correctness Score

Σ assertions passed / total assertions

Fine-grained per-assertion accuracy measure

Evolution Efficiency

final pass rate / evolution rounds

Final pass rate achieved per evolution round

Transfer Score

cross-model rate / self-evolved rate

Portability across AI models

Token Cost

tokens_gen + tokens_verify

Total computational cost of evolution

Failure Diversity

H(failure categories)

Entropy of failure mode distribution
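The six metrics above translate directly into small functions. The sketch below follows the formulas as listed. The function names, the input shapes, the pooling of assertions across tasks, and the base-2 logarithm for entropy are assumptions, since the page does not specify them.

```python
import math
from collections import Counter


def pass_rate(task_results):
    """tasks passed / total tasks; task_results is a list of booleans."""
    return sum(task_results) / len(task_results)


def correctness_score(assertion_results):
    """Assertions passed / total assertions, pooled over all tasks."""
    flat = [ok for task in assertion_results for ok in task]
    return sum(flat) / len(flat)


def evolution_efficiency(final_pass_rate, evolution_rounds):
    """final pass rate / evolution rounds."""
    return final_pass_rate / evolution_rounds


def transfer_score(cross_model_rate, self_evolved_rate):
    """cross-model rate / self-evolved rate."""
    return cross_model_rate / self_evolved_rate


def token_cost(tokens_gen, tokens_verify):
    """Tokens spent on generation plus tokens spent on verification."""
    return tokens_gen + tokens_verify


def failure_diversity(failure_categories):
    """Shannon entropy, in bits, of the failure category distribution."""
    counts = Counter(failure_categories)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

All ratio functions raise `ZeroDivisionError` on empty input. A production version would need to decide how to report a run with no tasks or no failures.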

Benchmark Results

Demonstrative results showing framework capabilities

Note: These results are synthetic/demonstrative and illustrate the expected output format.

Chart: Pass Rate (%) by Condition (No-Skill Baseline, Human-Curated, One-Shot Self-Gen, SkillForge (3 rnd), SkillForge (5 rnd), SkillForge (Full))

Cross-Model Transfer

Target Model             With Skills (%)   No Skill (%)   Δ (pp)
Model A (self-evolved)   72.3              31.2           +41.1
Model B (transferred)    66.8              28.5           +38.3
Model C (transferred)    62.1              22.3           +39.8
Model D (transferred)    55.4              12.8           +42.6
Model E (transferred)    50.2               9.1           +41.1
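As a worked example, the Δ column and the Transfer Score defined in the Evaluation section can be recomputed from the table. The numbers are copied from the table above and remain synthetic. The variable names are illustrative only.

```python
# (with skills %, no skill %) per target model, copied from the table.
results = {
    "Model A (self-evolved)": (72.3, 31.2),
    "Model B (transferred)": (66.8, 28.5),
    "Model C (transferred)": (62.1, 22.3),
    "Model D (transferred)": (55.4, 12.8),
    "Model E (transferred)": (50.2, 9.1),
}

self_evolved_rate = results["Model A (self-evolved)"][0]

# Improvement over the no-skill baseline, in percentage points.
deltas = {
    model: with_skills - no_skill
    for model, (with_skills, no_skill) in results.items()
}

# Transfer score = cross-model rate / self-evolved rate.
transfer_scores = {
    model: with_skills / self_evolved_rate
    for model, (with_skills, _) in results.items()
}

for model in results:
    print(f"{model}: delta = {deltas[model]:+.1f} pp, "
          f"transfer = {transfer_scores[model]:.2f}")
```

Under these numbers, Model B retains roughly 92% of the self-evolved pass rate and Model E roughly 69%, while the gain over the no-skill baseline stays near 40 percentage points for every model.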

Human vs AI Simulation

Controlled comparison validates framework effectiveness

Note: These results are synthetic/demonstrative.

Human-Authored Skills

66% Avg Pass Rate
40 min Avg Authoring Time
  • Better ambiguity interpretation
  • Creative problem-solving
  • Domain intuition for edge cases

VS

AI-Generated Skills

73% Avg Pass Rate
8 min Avg Generation Time
  • Systematic sub-task coverage
  • Format and precision compliance
  • 5x faster than human authoring

Key Insight

In this simulation, AI-evolved skills outperform human-authored ones by encoding agent-native reasoning patterns rather than following human assumptions. The results suggest that the best approach is human-AI collaboration: human high-level strategy combined with AI-refined executable details.

Agentic Platform

Discoverable agents for any orchestration framework

skillforge.evolver

SkillEvolver

Evolve a verified skill package for a task through co-evolutionary verification

Stateful

skillforge.retriever

SkillRetriever

Search the Skill Bank for existing skills matching a task description

Stateless

skillforge.executor

SkillExecutor

Execute a task augmented with a specific skill from the Skill Bank

Stateful

skillforge.evaluator

SkillEvaluator

Benchmark and compare skill quality using synthetic test generation

Stateless

skillforge.memory

MemoryConsultant

Query tiered memory for relevant experience, patterns, and procedural rules

Stateless

Integration Protocols

🔧 Tool Registration

Register as callable tools in LangChain, AutoGen, Semantic Kernel, or OpenAI function calling.

create_tools(forge)

🤖 Sub-Agent Delegation

Provide SkillForge agents to orchestrators via the Agent Provider interface.

provider.get_agent("skillforge.evolver")

⚡ Event-Driven

Subscribe to lifecycle events for reactive workflows — skill.evolved, task.failed, memory.promoted, and more.

@bus.on("skill.evolved")

🚀 MCP Server

Expose as a Model Context Protocol server for VS Code and Copilot agents.

python -m skillforge.agentic.mcp_server
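The event-driven protocol above uses a decorator-style subscription. The sketch below shows how such a bus could work in principle. The `EventBus` class here is a hypothetical stand-in written for illustration, not SkillForge's actual bus, and the payload fields are assumptions.

```python
from collections import defaultdict


class EventBus:
    """Hypothetical minimal event bus; not SkillForge's actual class."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def on(self, event_name):
        """Decorator that registers a handler for `event_name`."""
        def register(handler):
            self._handlers[event_name].append(handler)
            return handler
        return register

    def emit(self, event_name, payload):
        """Call every handler registered for `event_name`, in order."""
        for handler in self._handlers.get(event_name, []):
            handler(payload)


bus = EventBus()
evolved_skills = []


@bus.on("skill.evolved")
def log_evolved_skill(payload):
    # Payload fields are assumptions chosen for the example.
    evolved_skills.append((payload["skill_id"], payload["version"]))


bus.emit("skill.evolved", {"skill_id": "csv-to-parquet", "version": 3})
```

Emitting an event with no subscribers is a no-op in this sketch, so producers do not need to know which workflows are listening.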

Multi-Agent Collaboration

Orchestrator
SkillRetriever
Skill exists?
Yes
SkillExecutor
Result
No
SkillEvolver
MemoryConsultant
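The routing above can be expressed as a short function. This sketch assumes that, when no skill exists, the orchestrator consults memory first, passes the hints to the evolver, and then executes with the newly evolved skill. The diagram does not spell out that ordering, and the callables are hypothetical stand-ins for the five agents.

```python
def handle_task(task, retriever, executor, evolver, memory):
    """Route one task through retrieval, optional evolution, and execution.

    Assumed signatures (illustrative only):
      retriever(task) -> skill or None
      memory(task) -> list of hints
      evolver(task, hints) -> skill
      executor(task, skill) -> result
    """
    skill = retriever(task)
    if skill is None:
        # No matching skill in the Skill Bank: gather experience, then evolve.
        hints = memory(task)
        skill = evolver(task, hints)
    return executor(task, skill)
```

Because the agents are passed in as plain callables, the same routing can be tested with stubs before wiring in real SkillForge agents.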

Platform Setup

Integrate SkillForge into your AI coding platform in minutes

VS Code

VS Code Copilot Chat

Custom agents appear in the @ picker. Skills appear as / slash commands.

Setup

  1. Clone the repo and open in VS Code
  2. Install the GitHub Copilot Chat extension
  3. Agents and skills are auto-discovered from .github/

Usage

  • Type @SkillForge Evolver to evolve a new skill
  • Type @SkillForge Retriever to find existing skills
  • Type @SkillForge Evaluator to benchmark skills
  • Type /skillforge-evolve, /skillforge-retrieve, or /skillforge-evaluate for guided workflows

Files

.github/copilot-instructions.md: Project-wide instructions
.github/agents/*.agent.md: 3 custom agents
.github/skills/*/SKILL.md: 3 skill workflows

Claude

Claude Code / Workspace

CLAUDE.md is loaded automatically. Skills are discovered from .claude/skills/.

Setup

  1. Clone the repo
  2. Open in Claude Code or attach as a Claude workspace
  3. CLAUDE.md is read automatically at session start

Usage

  • Claude reads CLAUDE.md for project context, conventions, and architecture
  • Skills in .claude/skills/ provide guided workflows for evolving, retrieving, and evaluating skills
  • Ask Claude to “evolve a skill for X” or “find a skill for Y”

Files

CLAUDE.md: Project context & conventions
.claude/skills/*/SKILL.md: 3 skill workflows

OpenAI Codex

AGENTS.md at the repo root is automatically read by Codex at session start.

Setup

  1. Clone the repo
  2. Open in Codex
  3. AGENTS.md is read automatically — no extra configuration needed

Usage

  • Codex reads AGENTS.md for the full project schema, conventions, workflows, and agent interfaces
  • Reference skills/*.md for framework-agnostic skill definitions
  • Ask Codex to “evolve a skill”, “retrieve a skill”, or “evaluate skills”

Files

AGENTS.md: Full project schema & workflows
skills/*.md: 3 cross-platform skill definitions

Integration

Plug-in architecture for any AI system

📦 SDK / Library

Import and call SkillForge APIs directly in your Python application

from skillforge import SkillForge

Middleware

Wrap existing agent pipelines with transparent skill augmentation

@middleware.enhance

🌐 REST API

Deploy as a standalone service with HTTP endpoints

POST /v1/skills/evolve

🔧 Evaluation Pipeline

Integrate skill quality gates into CI/CD workflows

python -m skillforge.evaluate

🔌 Plug-in

Framework-specific adapters for popular agent systems

AgentAdapter(agent, forge)
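For the REST API option, a client might build a request to `POST /v1/skills/evolve` along these lines. Only the endpoint path comes from this page. The JSON body shape (mirroring the Quick Start task dictionary), the `build_evolve_request` helper, and the base URL are assumptions, so check the service's actual schema before relying on them.

```python
import json
import urllib.request


def build_evolve_request(base_url, instruction, tools, max_evolution_rounds=5):
    """Build (but do not send) a POST request for the evolve endpoint."""
    # Assumed payload shape, modelled on the Quick Start task dictionary.
    payload = {
        "task": {
            "instruction": instruction,
            "environment": {"tools": tools},
        },
        "max_evolution_rounds": max_evolution_rounds,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/skills/evolve",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The request can then be sent with `urllib.request.urlopen(request)` against a running deployment. Separating construction from sending keeps the payload easy to inspect and test offline.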

Quick Start

from skillforge import SkillForge

forge = SkillForge.from_config("config.yaml")

# Define your task
task = {
    "instruction": "Build a data pipeline that validates and transforms CSV to Parquet",
    "environment": {"tools": ["python", "pandas", "pyarrow"]},
}

# Evolve a skill through co-evolutionary verification
skill = forge.evolve_skill(task, max_evolution_rounds=5)

# Use the evolved skill with any agent
# (`agent` is your own agent instance, created elsewhere in your application)
result = forge.execute_with_skill(agent, task, skill)

print(f"Skill v{skill.version}: {skill.accuracy:.0%} accuracy")

Ready to Forge Better Skills?

SkillForge is open source and ready for integration into your AI systems. Start evolving skills that outperform human-authored ones.