CodeCome
open-source · built on OpenCode · early PoC

AI-assisted vulnerability research without losing the trail.

CodeCome is the harness I built to let an AI agent help me audit source code without losing the trail. Six discrete phases, Markdown findings with structured YAML frontmatter, validation inside a Docker sandbox, demonstrated impact, and a report you can grep — every time, for every target.

GPL or AGPL (your choice) · Local-first · Bring your own model · v0.x · early PoC
~/research/codecome — six phases
# run the six phases of an audit
$ make phase-1                  # recon + sandbox bootstrap
$ make phase-2                  # generate candidate findings
$ make phase-3                  # counter-analysis / dedup
$ make phase-4 FINDING=CC-0001  # validate in sandbox
$ make phase-5 FINDING=CC-0001  # build PoC, demonstrate impact
$ make phase-6
workspace · itemdb/ on disk
itemdb/
├─notes/ recon, sandbox-plan
├─findings/
│ ├─PENDING/ CC-0003 · CC-0004
│ ├─CONFIRMED/ CC-0001
│ ├─EXPLOITED/ CC-0022
│ ├─REJECTED/ CC-0005
│ └─DUPLICATE/ CC-0006
├─evidence/ PoCs, exploits/, logs
└─reports/ report.md
phases: 6
finding states: 5
agents: recon · auditor · reviewer · validator · exploiter · reporter
data plane: Markdown + YAML on disk
the problem

Chat is not an audit trail.

After watching too many chat sessions produce confident-sounding "potential SQL injection" claims with zero evidence, the goal became simple: a workflow where every claim is a file on disk, every file points at specific lines of code, every finding either has evidence or gets rejected, and the whole thing is reviewable by a human in an afternoon.

  • No database. No RAG. No disappearing chat history. Everything lives on disk as Markdown and YAML you can grep.
  • Every claim becomes an artifact. A hypothesis is a file. A confirmation is a file. A PoC is a file.
  • Hypotheses are not confirmed bugs. A plausible vulnerability is first a hypothesis — confirmation requires evidence.
  • Impact must be demonstrated. Without a PoC, developers dismiss findings as theoretical. Phase 5 is where impact is shown.
what CodeCome is

Research methodology, made executable.

CodeCome is not a vulnerability scanner. It is not a pentest tool. It is not a magic AI bug finder. It is a harness — a set of conventions, prompts, agents and Make targets — that encodes how a careful researcher actually audits code. The model helps you think. You stay in control.

01

Conventions over magic

A workspace layout, naming scheme, finding template and Make targets. Nothing you can't read in an afternoon.

02

Prompts as code

Each phase has explicit prompts checked in under prompts/. Fork, audit and version them like any other code.

03

Evidence over vibes

A finding is not "real" until it has been counter-argued, validated in a sandbox, and reproduced from an artifact on disk.

how it works

Drop source under src/, configure codecome.yml, run the phases.

src/ can be a copied source tree, a git submodule, a checked-out repo, an extracted archive, or a benchmark corpus. CodeCome doesn't care which. The harness will try to build, test and run the target inside the sandbox — that's the point. Validation happens against a real build.

codecome.yml
# Project + audit configuration. Defaults work out of the box.
project:
  name: my-target

audit:
  scope:
    include: ["src/**"]
    exclude: ["src/vendor/**", "src/**/*_test.*"]
  focus: [sqli, ssrf, deserialization, authz]
  extra_prompts:
    reconnaissance: |
      Focus sandbox on ASAN builds.

agents:
  auditor:
    model:   anthropic/claude-opus-4-7
    variant: high
  reviewer:
    model:   anthropic/claude-haiku-4-5

validation:
  allowed_write_paths: ["itemdb/**", "sandbox/**", "tmp/**"]
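
To make the audit.scope semantics concrete, here is a minimal Python sketch of include/exclude matching. It assumes plain glob matching via the standard fnmatch module, where * also crosses directory separators; how CodeCome itself resolves these globs may differ. The function name in_scope and the sample paths are hypothetical.

```python
from fnmatch import fnmatchcase


def in_scope(path: str, include: list[str], exclude: list[str]) -> bool:
    """Return True if path matches an include glob and no exclude glob.

    Assumption: fnmatch-style globs, where '*' also matches '/'.
    """
    included = any(fnmatchcase(path, pattern) for pattern in include)
    excluded = any(fnmatchcase(path, pattern) for pattern in exclude)
    return included and not excluded


# Illustrative patterns modelled on the audit.scope block above.
include = ["src/**"]
exclude = ["src/vendor/**"]

for candidate in ("src/app/login.php", "src/vendor/lib/util.php", "docs/readme.md"):
    print(candidate, in_scope(candidate, include, exclude))
```

Exclusion wins over inclusion in this sketch, which is the conservative reading of a scope block with both keys.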
$ run your first audit — 8 commands
# bootstrap a virtual env and install deps
$ make venv

# sanity-check the workspace, model creds, sandbox
$ make check

# six phases of an audit — each restartable
$ make phase-1                        # recon + sandbox bootstrap
$ make phase-2                        # generate candidate findings
$ make phase-3                        # counter-analysis (dedup / reject)
$ make phase-4 FINDING=CC-0001        # validate one finding
$ make phase-5 FINDING=CC-0001        # build a PoC, demonstrate impact
$ make phase-6                        # generate the report

# walk ONE finding end-to-end first — you'll learn more
# from a single CC-0001 than from twenty PENDING ones.

A flat, explicit workflow

  • 1. Mount. Put the source you want to audit under src/ — copied, submodule, symlink, your call.
  • 2. Configure. Tune audit.scope and audit.focus in codecome.yml; pick a model per agent under agents.<name>.model.
  • 3. Run the phases. Phases 1–3 are batch; phases 4–5 are per finding, so evidence stays traceable.
  • 4. Inspect on disk. Findings, evidence and reports live under itemdb/ as plain files — tree itemdb/ is the dashboard.
  • 5. Ship. Generate the report with make phase-6 or hand the workspace to a teammate.

Avoid make validate-all / exploit-all on a fresh project. Walk one finding through end-to-end first.
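
If you prefer to drive the phases from a script, the batch versus per-finding split can be sketched as below. This is an illustration under assumptions, not part of CodeCome: the helper names phase_command and walk_one_finding are hypothetical, and only the make targets and the FINDING variable come from the commands shown above.

```python
from typing import Optional

# Phases 4 and 5 run per finding; the other phases are batch operations.
PER_FINDING_PHASES = {4, 5}


def phase_command(phase: int, finding: Optional[str] = None) -> list[str]:
    """Build the argv for one `make phase-N` invocation."""
    if phase not in range(1, 7):
        raise ValueError(f"unknown phase: {phase}")
    cmd = ["make", f"phase-{phase}"]
    if phase in PER_FINDING_PHASES:
        if not finding:
            raise ValueError(f"phase {phase} needs a finding id such as CC-0001")
        cmd.append(f"FINDING={finding}")
    return cmd


def walk_one_finding(finding: str) -> list[list[str]]:
    """Commands for walking a single finding through all six phases."""
    return [
        phase_command(p, finding if p in PER_FINDING_PHASES else None)
        for p in range(1, 7)
    ]


for cmd in walk_one_finding("CC-0001"):
    print(" ".join(cmd))
```

Each list can be handed to subprocess.run(cmd, check=True) so a failing phase stops the walk instead of silently continuing.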

Read the workflow guide

Bias a phase without forking

Three layered ways to append instructions to any phase prompt. All additive, applied in this order:

  • YML: audit.extra_prompts.<phase> in codecome.yml. Persistent project-wide policy.
  • FILE: make phase-1 PROMPT_EXTRA_FILE=my-notes.md. Versioned notes you keep around.
  • ENV: make phase-1 PROMPT_EXTRA="…". One-shot inline bias.
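
The additive layering can be pictured with a short Python sketch. It assumes the layers are simply concatenated after the base phase prompt in the stated order (YML, then FILE, then ENV); the function name compose_prompt, the blank-line separator and the sample strings are hypothetical, not taken from the CodeCome source.

```python
from typing import Optional


def compose_prompt(
    base: str,
    yml_extra: Optional[str] = None,
    file_extra: Optional[str] = None,
    inline_extra: Optional[str] = None,
) -> str:
    """Append the three optional layers to the base prompt, in order.

    Layers are additive: none of them replaces the base prompt or each other.
    """
    layers = [base, yml_extra, file_extra, inline_extra]
    return "\n\n".join(layer.strip() for layer in layers if layer and layer.strip())


prompt = compose_prompt(
    "Base recon prompt.",
    yml_extra="Focus sandbox on ASAN builds.",
    inline_extra="Look at the auth code first.",
)
print(prompt)
```

Because later layers land closer to the end of the prompt, a one-shot ENV hint reads as the most recent instruction without erasing project-wide policy.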
six phases · six agents

The audit, broken into six discrete steps.

Each phase has its own prompt under prompts/, its own agent under .opencode/agents/, its own outputs, and writes to a known location on disk. Phases 1–3 are batch operations; phases 4 and 5 run per finding — intentional, to keep evidence traceable.

PHASE 01

Recon + sandbox bootstrap

1a: agent reads src/, infers target type, languages, build model, attack surface — notes under itemdb/notes/. 1b: picks a baseline from templates/sandboxes/, applies it to sandbox/, validates it, writes itemdb/notes/sandbox-plan.md.

recon · sandbox bootstrap
PHASE 02

Hypothesis

Generate candidate findings under itemdb/findings/PENDING/. Each points at specific code, sources, sinks and a trust boundary. Gated by the sandbox.

+ deep sweep
Pair Phase 2 with make sweep to force one auditor session per high-risk file.
PHASE 03

Counter-analysis

A reviewer pass tries to disprove or deduplicate findings. Looks for unreachable code, input validation, authorization, framework protections, false assumptions. Weak hypotheses → REJECTED/, repeats → DUPLICATE/.

REJECTED · DUPLICATE
PHASE 04

Validation

One finding at a time, inside the Docker sandbox. Build the target, write a small PoC, capture evidence under itemdb/evidence/<id>/, decide CONFIRMED or REJECTED.

CONFIRMED · sandbox
PHASE 05

Exploit

Build a real PoC that shows concrete impact: code execution, data exfiltration, privilege escalation. The exploiter may adjust severity based on what is demonstrated and move the finding to EXPLOITED/. Artifacts under evidence/<id>/exploits/.

EXPLOITED · severity_after
PHASE 06

Reporting

Generate a Markdown report grouping exploited and confirmed findings with evidence references. Default path: itemdb/reports/report.md. make report is the lightweight local variant (no agent).

report
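
As a rough picture of what a report grouped by state involves, here is a minimal Python sketch. The output layout, the function name render_report and the CC-0001 title are assumptions for illustration; the real report format is whatever Phase 6 and make report produce.

```python
def render_report(findings: list[dict]) -> str:
    """Render a Markdown report listing exploited, then confirmed findings.

    Findings in other states (PENDING, REJECTED, DUPLICATE) are left out.
    """
    lines = ["# Audit report", ""]
    for state in ("EXPLOITED", "CONFIRMED"):
        group = sorted(
            (f for f in findings if f["status"] == state),
            key=lambda f: f["id"],
        )
        if not group:
            continue
        lines.append(f"## {state}")
        lines.append("")
        for finding in group:
            lines.append(
                f"- {finding['id']}: {finding['title']} "
                f"(evidence: {finding['evidence_dir']})"
            )
        lines.append("")
    return "\n".join(lines)


# Illustrative records; only CC-0022's title comes from the example finding.
findings = [
    {"id": "CC-0001", "title": "Hypothetical confirmed finding",
     "status": "CONFIRMED", "evidence_dir": "itemdb/evidence/CC-0001"},
    {"id": "CC-0022", "title": "SQL injection via unvalidated selectRole in user.get",
     "status": "EXPLOITED", "evidence_dir": "itemdb/evidence/CC-0022"},
    {"id": "CC-0005", "title": "Hypothetical rejected finding",
     "status": "REJECTED", "evidence_dir": "itemdb/evidence/CC-0005"},
]
report = render_report(findings)
print(report)
```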
finding lifecycle

Five states. Explicit transitions.

Every finding lives in exactly one folder, named after its current state. Phase 3 moves to REJECTED/DUPLICATE. Phase 4 promotes to CONFIRMED. Phase 5 may promote to EXPLOITED. You can also move them manually with make findings-move.

PENDING

Hypothesis filed

An idea worth investigating. Filed by Phase 2 (or sweep). Not yet validated.

CONFIRMED

Reproduced in sandbox

Validation captured evidence. Real bug, not weaponized yet.

EXPLOITED

Impact demonstrated

A reproducible exploit exists under evidence/<id>/exploits/. Severity adjusted by Phase 5.

REJECTED

Falsified

Counter-analysis or validation killed the hypothesis. Kept on disk so it isn't re-investigated for free.

DUPLICATE

Already tracked

Same root cause as another finding. Linked, not deleted.

PENDING → CONFIRMED: evidence captured in sandbox (Phase 4)
PENDING → REJECTED: disproved by counter-analysis or validation (Phase 3 / Phase 4)
PENDING → DUPLICATE: already filed under another finding (Phase 3)
CONFIRMED → EXPLOITED: working PoC under exploits/ (Phase 5)
CONFIRMED (stays): not feasible to weaponize; documented and kept
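
The transitions listed above amount to a small state machine, sketched here in Python. Treating EXPLOITED, REJECTED and DUPLICATE as terminal is an assumption drawn from that list; manual moves with make findings-move may well allow more. The names ALLOWED_TRANSITIONS and can_transition are hypothetical.

```python
# Phase-driven transitions, as listed in the finding lifecycle.
ALLOWED_TRANSITIONS = {
    "PENDING": {"CONFIRMED", "REJECTED", "DUPLICATE"},
    "CONFIRMED": {"EXPLOITED"},
    # Assumption: the remaining states are terminal for the phases.
    "EXPLOITED": set(),
    "REJECTED": set(),
    "DUPLICATE": set(),
}


def can_transition(current: str, target: str) -> bool:
    """Return True if a phase may move a finding from current to target."""
    if current not in ALLOWED_TRANSITIONS or target not in ALLOWED_TRANSITIONS:
        raise ValueError(f"unknown state: {current!r} or {target!r}")
    return target in ALLOWED_TRANSITIONS[current]


print(can_transition("PENDING", "CONFIRMED"))
print(can_transition("PENDING", "EXPLOITED"))
```

The second call returns False because a finding has to be reproduced in the sandbox before Phase 5 can promote it.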
what a finding looks like

Plain Markdown. Structured YAML. Real evidence.

A finding is a single Markdown file with validated YAML frontmatter. It is the unit of work in CodeCome, not a Jira ticket or a row in a database. Below: the rendered view of CC-0022 (SQL injection in a PHP app's user.get JSON-RPC API), followed by the raw source on disk.

CC-0022

SQL injection via unvalidated selectRole in user.get

CRITICAL · EXPLOITED · CWE-89
Category: SQL Injection
Target area: JSON-RPC API user.get method
File: ui/include/classes/api/services/CUser.php
Symbol: CUser::addRelatedObjects()
Source: JSON-RPC options['selectRole']
Sink: DBselect() · CUser.php:2243-2248
Trust boundary: authenticated API user → raw SQL SELECT clause
Severity: HIGH → CRITICAL (after exploit)
Phases: recon · hypothesis · counter · validation · exploit
Evidence
itemdb/evidence/CC-0022/
itemdb/evidence/CC-0022/exploits/
# Summary

The user.get JSON-RPC API accepts a selectRole array whose elements are concatenated into a SQL SELECT clause via implode(',r.', ...) without any allowlist check. Authenticated users at the lowest privilege level can inject SQL fragments and extract data from the database.

# Counter-analysis

- Argued: CApiInputValidator would catch this.
- Outcome: not used on this code path. Verified at line 91.
- Argued: dbConditionInt() sanitises arrays.
- Outcome: only sanitises $userIds, not the SELECT clause.

# Validation plan

1. Send a user.get JSON-RPC request as a low-privilege user with selectRole: ["roleid,(SELECT version())"].
2. Observe the version string returned inline in the response.
3. Evidence under itemdb/evidence/CC-0022/.
itemdb/findings/EXPLOITED/CC-0022-sqli-user-get.md
---
id:            "CC-0022"
title:         "SQL injection via unvalidated selectRole in user.get JSON-RPC API"
status:        "EXPLOITED"
severity:      "CRITICAL"
confidence:    "CONFIRMED"
category:      "SQL Injection"
cwe:           ["CWE-89"]
language:      "php"
target_area:   "JSON-RPC API user.get method"
files:
  - "src/app-1.4.1/ui/include/classes/api/services/CUser.php"
symbols:
  - "CUser::addRelatedObjects()"
sources:
  - "JSON-RPC options['selectRole'] parameter"
sinks:
  - "DBselect() at CUser.php:2243-2248"
trust_boundary: "authenticated API user -> raw SQL SELECT clause"
validation:
  status:       "CONFIRMED"
  methods:      ["http_exploit", "runtime_reproduction"]
  evidence_dir: "itemdb/evidence/CC-0022"
exploitation:
  status:           "DEMONSTRATED"
  severity_before:  "HIGH"
  severity_after:   "CRITICAL"
  artifacts_dir:    "itemdb/evidence/CC-0022/exploits"
---

# Summary

The user.get JSON-RPC API accepts a selectRole array
whose elements are concatenated into a SQL SELECT clause without any
allowlist check. Low-privilege authenticated users can inject SQL.

Why files, not a database

A vulnerability research project should still be readable in five years. Files survive renaming, forking, GitHub outages and SQL migrations. grep, git log and diff are the only tools you need.

YAML you can validate

Run make frontmatter to validate every finding's metadata via tools/check-frontmatter.py. Bad frontmatter fails fast. The Markdown body stays free-form so researchers aren't fighting a form.
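
To give a feel for what "fails fast" can mean, here is a small standard-library sketch of a frontmatter check. It is not tools/check-frontmatter.py: the required-field list, the CC-NNNN id pattern and the folder-versus-status rule are assumptions inferred from the example finding, and the parser only reads top-level quoted scalars rather than full YAML.

```python
import re

STATES = {"PENDING", "CONFIRMED", "EXPLOITED", "REJECTED", "DUPLICATE"}
REQUIRED = ("id", "title", "status", "severity")  # assumption, not the real schema


def top_level_fields(markdown: str) -> dict[str, str]:
    """Extract top-level `key: "value"` pairs from the frontmatter block."""
    lines = markdown.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing opening --- delimiter")
    end = next(
        (i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---"),
        None,
    )
    if end is None:
        raise ValueError("missing closing --- delimiter")
    fields = {}
    for line in lines[1:end]:
        # Anchored at column 0, so nested keys such as validation.status are skipped.
        match = re.match(r'^([A-Za-z_]+):\s*"(.*)"\s*$', line)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields


def check_finding(markdown: str, folder: str) -> list[str]:
    """Return a list of problems; an empty list means the finding passes."""
    fields = top_level_fields(markdown)
    problems = [f"missing field: {key}" for key in REQUIRED if key not in fields]
    if "id" in fields and not re.fullmatch(r"CC-\d{4}", fields["id"]):
        problems.append(f"bad id: {fields['id']}")
    status = fields.get("status")
    if status is not None and status not in STATES:
        problems.append(f"unknown status: {status}")
    elif status is not None and status != folder:
        problems.append(f"status {status} does not match folder {folder}")
    return problems


sample = '''---
id:            "CC-0022"
title:         "SQL injection via unvalidated selectRole in user.get JSON-RPC API"
status:        "EXPLOITED"
severity:      "CRITICAL"
validation:
  status:       "CONFIRMED"
---

# Summary
'''
print(check_finding(sample, "EXPLOITED"))
print(check_finding(sample, "PENDING"))
```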

Tooling that travels

Findings render in any Markdown viewer — GitHub, Obsidian, an editor preview pane. The "dashboard" is just tree itemdb/findings/ or make status.

sandbox validation

Validation happens in a sandbox.

Before a finding is marked CONFIRMED, CodeCome reproduces it against a real build of the project — in a Docker container, behind a network namespace, away from your host. Phase 1b bootstraps a sandbox suited to the stack; if the payload doesn't fire there, it doesn't make the cut.

$ make phase-4 FINDING=CC-0022
 sandbox already bootstrapped (Phase 1b)
 starting container       … healthy
 replaying payload        … itemdb/evidence/CC-0022/exploit.sh

  POST /api_jsonrpc.php HTTP/1.1
  Authorization: Bearer <low-priv-token>
  Body: {"method":"user.get","params":{"selectRole":["roleid,(SELECT version())"]}}

  HTTP/1.1 200 OK
  Body contains: "10.6.18-MariaDB"

 assertions
  ✓ status 200
  ✓ response inlines server version()
  ✓ query log shows injected SELECT in r.* clause

 result: CONFIRMED
 moved itemdb/findings/PENDING/CC-0022 → itemdb/findings/CONFIRMED/CC-0022

Per-capability sandbox helpers exist as separate targets: sandbox-list, sandbox-detect, sandbox-inspect ID=python, sandbox-bootstrap, sandbox-validate, sandbox-regenerate, sandbox-status, plus runtime helpers sandbox-{setup,up,check,build,test,down,shell,logs,clean,reset}. See docs/sandbox.md.

make sweep · the secret weapon

When breadth isn’t enough, sweep file-by-file.

Phase 2 is wide and fast. make sweep is the opposite: it runs the auditor agent once per file, forcing exhaustive line-by-line analysis on every high-risk file in itemdb/notes/file-risk-index.yml. It catches what broad audits miss; Phase 3 cleans the overlap.

deep sweep
# preview which files would be swept
$ make list-risk-files

# dry-run: show selected files + prompts, no agent calls
$ python tools/run-sweep.py --dry-run

# sweep everything scoring 4+ in file-risk-index.yml
$ make sweep

# sweep a specific file…
$ make sweep FILE="src/path/to/file.ext"

# …or a glob
$ make sweep FILE="src/**/*.cs"

Trade-offs to know

  • One full agent session per file. Token cost scales linearly with the number of files swept — sweep on 10 files costs roughly 10 Phase-2 runs.
  • Produces overlap with Phase 2. By design. Phase 3 deduplicates on semantic frontmatter fields (sources, sinks, entry_points, trust_boundary, target_area).
  • Always --dry-run first. See what would be swept and the per-file prompts before committing tokens.
  • Reads itemdb/notes/file-risk-index.yml written by Phase 1. Without a fresh recon, the sweep set is stale.
docs/file-risk-sweeps.md
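
The selection rule and the linear cost can be sketched in a few lines of Python. The actual schema of file-risk-index.yml is not shown here, so this assumes it boils down to a mapping from file path to integer score; select_sweep_files and the sample paths are hypothetical.

```python
def select_sweep_files(risk_index: dict[str, int], min_score: int = 4) -> list[str]:
    """Pick files scoring at or above min_score, riskiest first.

    Assumption: the risk index reduces to a {path: score} mapping.
    """
    selected = [path for path, score in risk_index.items() if score >= min_score]
    return sorted(selected, key=lambda path: (-risk_index[path], path))


risk_index = {
    "src/api/user.php": 5,
    "src/util/strings.php": 2,
    "src/auth/session.php": 4,
}
selected = select_sweep_files(risk_index)
print(selected)

# One full auditor session per selected file, so cost grows linearly.
print(f"auditor sessions needed: {len(selected)}")
```

Raising min_score is the cheapest lever when the dry run shows more files than your token budget allows.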
screenshots

What CodeCome actually looks like.

Sanitized snapshots from real audits: enough to show the workflow, not enough to leak target-specific exploit details or credentials.

An asciinema cast of a full run is planned.

who it's for

Built for people who already do this work.

CodeCome won't turn a non-researcher into one. It will save a researcher hours of bookkeeping per audit. If you want a one-click vulnerability scanner, this is not it. CodeCome is for people who want the model to help them think, not to replace the thinking.

Solo security researchers

LLM help on source-code audits — without trusting an opaque chat

Audit codebases at your own pace, with a trail you can re-read months later or hand to someone else.

Blue + Red teamers

Source-code review that produces commit-friendly artifacts

From recon to PoC, every step lands in the workspace as a Markdown finding with evidence references, ready to ship.

LLM-assisted security studies

An instrumented harness you can fork or A/B

Intentionally simple — fork the prompts, swap the agent runner, compare runs across models. The harness is the experimental surface.

prerequisites

What you need before running it.

CodeCome runs on top of OpenCode (1.14.39 or newer) with your own LLM provider, plus a small Python + Make + Docker stack. make check warns about anything missing — the core workflow runs without the optional tools.

required core stack · every audit needs these
OpenCode 1.14.39+

The open-source AI coding agent CodeCome drives. Install guide.

An LLM provider key

At least one of Anthropic, OpenAI, Google, xAI, Groq, Cerebras, GitHub Copilot, Google Vertex — or a local OpenAI-compatible endpoint. Provider setup.

Python 3.10+

For workspace tooling. make venv creates a local virtualenv at .venv/.

GNU Make

The entire workflow is driven through make targets.

Docker

Required for the sandboxed validation environment used by Phases 1b / 4 / 5.

optional for Phase 5 visual evidence
asciinema

Terminal recordings of exploit replays.

agg

Renders .cast files to GIFs. CodeCome falls back to a Docker container if missing.

ffmpeg + xvfb

For GUI / browser exploits where video evidence matters. xvfb-run is fine too.

Auditing untrusted code?

Read Safety considerations below before pointing CodeCome at code you didn't write.

safety considerations

Treat unknown source code as data, not safe input.

Read this before pointing CodeCome at code you did not write.

CodeCome feeds target source code to an LLM agent with powerful tools: it reads and writes files in the workspace, executes commands in a sandbox, builds and runs the target, and can fetch resources from the network. Telling the agent to treat unknown source code as data does not, on its own, make the audit safe.

Risks worth knowing about

Prompt injection from the target

Comments, docstrings, READMEs, test fixtures, log strings, commit messages, filenames, and even crafted binary blobs inside src/ can carry instructions aimed at the agent ("ignore previous instructions…", "exfiltrate $HOME/.ssh/…"). The agent is supposed to read these as data, but LLMs remain susceptible to following them.

Supply-chain hazards in the sandbox

Phase 1b will try to build and run the target. A malicious setup.py, package.json lifecycle hook, Makefile, Dockerfile, or configure script executes inside the sandbox container with whatever permissions Docker gives it.

Resource exhaustion and side effects

Adversarial code may try to consume CPU, disk, or network from the validation phase. A prompt-injected runaway agent loop can burn tokens just as easily.

Exfiltration via network

If the sandbox (or your host) can reach the internet, an injected agent or a malicious build step can attempt to send data out. The default policy assumes egress is possible.

Recommended precautions

Run the whole workspace inside an isolation boundary when auditing untrusted sources — a disposable VM (Multipass, Vagrant, UTM, Proxmox), a dedicated container, or a remote throwaway host. Do not run CodeCome on a machine that holds credentials, SSH keys, browser profiles, or production access you can't afford to lose.

Treat src/ as untrusted. CodeCome funnels execution through sandbox/, but the make runner itself, the agent, and any helper scripts still execute on the host.

Restrict network egress from the sandbox (and ideally from the outer VM) to only what you need for builds and package installs.

Use a fresh API key with low spend limits for the LLM provider, so a prompt-injected runaway loop can't rack up an unbounded bill.

Review what the agent writes under itemdb/, sandbox/ and tmp/ before trusting any of it. Findings, evidence and reports are all attacker-influenced when the target is untrusted.

Avoid make exploit-all and make validate-all on untrusted targets until you have walked at least one finding through manually and confirmed the sandbox behaves the way you expect.

CodeCome's sandbox is a containment aid, not a security boundary against a determined attacker. If you wouldn't be willing to run docker build and ./run-tests.sh from the target's repo on the host, you shouldn't run CodeCome against it on the host either.
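
When reviewing what the agent wrote, a quick path check helps spot anything outside itemdb/, sandbox/ and tmp/. The Python sketch below assumes the validation.allowed_write_paths globs from the codecome.yml example and fnmatch-style matching; is_write_allowed is a hypothetical helper, it does not resolve symlinks, and it says nothing about how CodeCome enforces the setting.

```python
import posixpath
from fnmatch import fnmatchcase

# Mirrors validation.allowed_write_paths from the codecome.yml example.
ALLOWED_WRITE_PATHS = ["itemdb/**", "sandbox/**", "tmp/**"]


def is_write_allowed(path: str, allowed: list[str] = ALLOWED_WRITE_PATHS) -> bool:
    """Check a workspace-relative path against the allowed write globs.

    Absolute paths and paths that escape the workspace via '..' are refused.
    """
    if posixpath.isabs(path):
        return False
    normalized = posixpath.normpath(path)
    if normalized == ".." or normalized.startswith("../"):
        return False
    return any(fnmatchcase(normalized, pattern) for pattern in allowed)


for candidate in (
    "itemdb/findings/PENDING/CC-0003.md",
    "itemdb/../src/app/login.php",
    "/etc/passwd",
):
    print(candidate, is_write_allowed(candidate))
```

Normalizing before matching matters: without it, itemdb/../src/app/login.php would pass the itemdb/** glob while actually pointing into src/.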
workspace layout

One repo. Everything on disk.

A CodeCome workspace is a normal git repo with a small, fixed set of folders. The heart is itemdb/ — findings, evidence, notes and reports. Drop it into any IDE and read it as code.

~/research/my-target
my-target/
├─README.md
├─AGENTS.md agent rules
├─codecome.yml project + audit config
├─src/ target source code
├─sandbox/ Docker validation env
├─itemdb/ heart of the audit
│ ├─notes/ recon, sandbox-plan
│ ├─findings/
│ │ ├─PENDING/
│ │ ├─CONFIRMED/
│ │ ├─EXPLOITED/
│ │ ├─REJECTED/
│ │ └─DUPLICATE/
│ ├─evidence/ PoCs · exploits/
│ └─reports/ report.md
├─runs/ run summaries + transcripts
├─templates/ finding, evidence, sandbox templates
├─tools/ Python helper scripts
├─prompts/ per-phase prompts
├─docs/ deeper documentation
└─.opencode/ agents + skills

Why this shape

  • Code under audit lives under src/. Vendor it, submodule it, or symlink it — your call.
  • itemdb/ is the audit. Everything generated about the target — notes, findings, evidence, reports — lives there.
  • Finding states are folders. No DB, no SaaS. State is "which folder is this file in".
  • Prompts and sandbox are first-class. They live in the repo and ship with the audit.
  • .opencode/agents/ holds the six agents: recon, auditor, reviewer, validator, exploiter, reporter.
model strategy

Bring your own model. Pin per agent.

CodeCome lets you pin a different model for every phase via agents.<name>.model in codecome.yml (or CODECOME_MODEL on the command line). Different models see different bugs — running Phase 2 with two providers and letting Phase 3 dedupe is a legitimate strategy.
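
A minimal Python sketch of per-agent model resolution follows. The precedence shown (CODECOME_MODEL beats codecome.yml) is an assumption, as are the function name resolve_model and the source labels; make show-model is the authoritative way to see what a run will actually use.

```python
from typing import Optional


def resolve_model(agent: str, config: dict, env: dict) -> tuple[Optional[str], str]:
    """Return (model, source) for an agent.

    Assumption: a CODECOME_MODEL override wins over agents.<name>.model.
    """
    override = env.get("CODECOME_MODEL")
    if override:
        return override, "env:CODECOME_MODEL"
    model = config.get("agents", {}).get(agent, {}).get("model")
    if model:
        return model, f"codecome.yml:agents.{agent}.model"
    return None, "unset"


config = {
    "agents": {
        "auditor": {"model": "anthropic/claude-opus-4-7", "variant": "high"},
        "reviewer": {"model": "anthropic/claude-haiku-4-5"},
    }
}

print(resolve_model("auditor", config, env={}))
print(resolve_model("reviewer", config, env={"CODECOME_MODEL": "anthropic/claude-opus-4-7"}))
```

Returning the source next to the model is what makes a resolution banner possible: you can see which layer picked the value.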

Reasoning-heavy for P2 & P5

Audit and exploit phases benefit from reasoning models — Opus, GPT reasoning variants, Gemini Pro reasoning. Pin with agents.auditor.model: anthropic/claude-opus-4-7.

Fast workhorses for P3 & P6

Counter-analysis and reporting are well-served by smaller, cheaper models. Mix freely; no requirement to run a single provider end-to-end.

Local mode

Point any agent at an OpenAI-compatible local endpoint (Ollama, vLLM, llama.cpp). No code leaves your machine. Use make show-model to print the resolution table per agent.

The model helps you think. You stay in control. Nothing is committed to a finding folder without an explicit phase being run.

styled wrapper

Tool calls rendered as panels — not as JSON soup.

By default, phase targets wrap opencode run --format json with a CodeCome-owned styled renderer. Assistant output, tool calls and tool results render with consistent colors and structure. It also routes plain bash invocations (cat, head, tail, rg, ls, find, tree, rtk …) through the matching styled renderer.

environment toggles
# bypass the styled wrapper entirely
CODECOME_USE_WRAPPER=0

# control --thinking on the provider call
CODECOME_THINKING=1            # force on
CODECOME_THINKING=0            # force off (don't pay for reasoning tokens)

# model resolution
CODECOME_MODEL=anthropic/claude-opus-4-7
CODECOME_MODEL_VARIANT=high

# surfaces
CODECOME_RENDER_REASONING=0     # hide on-screen Thinking panels
CODECOME_SANDBOX_RENDER=0       # disable structured Sandbox panel
CODECOME_BASH_SHIM_RENDER=0     # disable rtk/cat/head/tail/rg routing

# budgets
CODECOME_BOOTSTRAP_MAX_RETRIES=3
CODECOME_REASONING_MAX_CHARS=4000

# forward extra flags to opencode run
OPENCODE_ARGS="--max-tokens 8192"

What it gives you

  • Per-tool panels. read, write, edit, apply_patch, grep, glob, bash, todowrite, skill all get their own renderer.
  • Sandbox panel. Detects tools/sandbox-bootstrap.py --format json calls and renders capability tables, validation tier summaries and color-coded gate badges.
  • Per-provider --thinking defaults. Anthropic off (interleaved), most others on. Override with CODECOME_THINKING.
  • Model resolution banner. Every phase prints which model it actually picked and where the value came from. Useful when a run feels off.
docs/development.md
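
The CODECOME_THINKING toggle behaves like a tri-state: forced on, forced off, or left to a provider default. The Python sketch below illustrates that reading. The default rule (off for Anthropic, on for everyone else) is a simplification of "most others on", and thinking_enabled is a hypothetical name.

```python
def thinking_enabled(provider: str, env: dict) -> bool:
    """Decide whether to request --thinking for a provider.

    "1" forces it on, "0" forces it off; otherwise fall back to a
    per-provider default (simplified here to: off for Anthropic only).
    """
    value = env.get("CODECOME_THINKING")
    if value == "1":
        return True
    if value == "0":
        return False
    return provider != "anthropic"


print(thinking_enabled("anthropic", {}))
print(thinking_enabled("anthropic", {"CODECOME_THINKING": "1"}))
print(thinking_enabled("openai", {"CODECOME_THINKING": "0"}))
```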
local helper commands

Everything you need without an agent.

A handful of make targets cover day-to-day workspace bookkeeping — no LLM call required.

make help
Show all available commands.
make check
Validate workspace + model creds + Docker reachability.
make status
Show finding status counts.
make findings
List findings (filter with STATUS=PENDING).
make findings-create TITLE="…"
Create a new finding skeleton.
make findings-move FINDING=CC-0001 STATUS=CONFIRMED
Move a finding between status folders.
make findings-evidence FINDING=CC-0001
Create the evidence directory for a finding.
make next-id
Print the next free finding id.
make frontmatter
Validate finding frontmatter via tools/check-frontmatter.py.
make index
Regenerate the finding index.
make report
Regenerate the lightweight local report (no agent).
make list-risk-files
Top-scoring risky files from the risk index.
make show-model [AGENT=auditor]
Print the model resolution table (optionally for a single agent).
make itemdb-reset
Reset local audit artifacts (destructive — keep prior work elsewhere first).
make tests
Run the Python test suite under tests/.
make sandbox-shell
Open an interactive shell inside the sandbox container.
project status

Early. Useful. Honest about both.

CodeCome is an early PoC. The conventions are stable enough to use; the tooling around them is still moving. Below is what works well today and what is still rough — no marketing.

Works well · Markdown findings with structured YAML frontmatter, stable schema · itemdb/findings/
Works well · File-based item DB: no DB, no RAG, easy to grep and commit · itemdb/
Works well · Per-phase make targets with readiness gates · Makefile
Works well · Docker sandbox bootstrap (Python, C/C++, .NET, PHP, IaC, …) · templates/sandboxes/
Works well · Styled wrapper output with per-tool renderers · tools/
Rough · One agent at a time: no parallel auditing or validation · v0.next
Rough · validate-all is sequential · v0.next
Rough · Docker is the only first-class sandbox runtime today · sandbox/
Rough · Phase 2 + deep sweep produce overlapping findings (Phase 3 cleans) · prompts/
Rough · Provider coverage for --thinking is hand-maintained · tools/
Missing · No CI: quality gate is make tests run locally · v0.next
documentation

Read the docs before you trust the output.

CodeCome's value is in its methodology, not its UI. The docs explain how each phase works and what its prompts assume.

authors

Who builds CodeCome.

Project Lead

Pablo Ruiz García

Architecture, engineering, implementation, and the person who turns vague ideas into working code.

Product Lead

Alejandro Ramos

Product direction, use cases, requirements, and official provider of impossible requests that somehow keep becoming roadmap items.

Pull requests are expected, encouraged, and appreciated. See CONTRIBUTING.md.

contributing

Help shape an honest research harness.

CodeCome is small. A patch to a phase prompt, a sandbox template for a new language, or a bug report on a confusing convention are all valuable. We won't accept PRs that turn this into a scanner.

Read CONTRIBUTING · Open an issue

Prompts

Improve a phase prompt with a diff and a short rationale. Bring a run summary if you can.

Sandbox templates

Contribute a Dockerfile + scripts for a stack we don't cover yet, under templates/sandboxes/.

Methodology

Disagree with a phase boundary? Open a discussion before a PR.

Tooling

CLI ergonomics, schema validation, report generation — all welcome.

license

GPL-3.0-or-later OR AGPL-3.0-or-later.

Dual-licensed — pick whichever copyleft fits your context. Files under templates/sandboxes/ are an exception: MIT, so they can be copied into user workspaces without imposing copyleft on those projects. See LICENSE, AGPL-LICENSE, templates/sandboxes/LICENSE and NOTICE.

LICENSE · AGPL-LICENSE · NOTICE