Getting Started with DeepWork — Your First Quality-Gated AI Workflow
DeepWork is a CLI tool that adds quality gates to multi-step Claude Code workflows — so every step is validated before the next one runs, failed outputs trigger automatic retries, and your workflow definitions persist as plain SKILL.md files you can version and share.
This is a hands-on walkthrough: install, create your first workflow, add quality gates, and run a real example. You'll have a working quality-gated workflow in under 15 minutes.
Install
DeepWork ships as a Homebrew package. Two commands:
brew tap unsupervisedcom/deepwork
brew install deepwork
Verify it's working:
$ deepwork --version
deepwork 0.4.2
$ deepwork --help
Usage: deepwork <command> [options]
Commands:
  init <workflow-name>    Scaffold a new workflow
  run <workflow-name>     Execute a workflow
  learn <workflow-name>   Run a learn cycle on recent outputs
  list                    List all local workflows
That's it. No Docker, no API keys to configure, no cloud accounts. DeepWork runs locally and calls Claude Code on your machine. The only requirement is an active Claude Code session (or ANTHROPIC_API_KEY set in your environment).
Your First Workflow
DeepWork workflows are defined by two files: a job.yml that describes the steps and wires them together, and one or more SKILL.md files that define what each step should do and what "good output" looks like.
Let's build a three-step workflow: Write a function → Add tests → Review for edge cases. It's minimal but complete, and it demonstrates every core concept.
Scaffold the workflow
$ deepwork init write-and-test
✓ Created .deepwork/write-and-test/
✓ Created .deepwork/write-and-test/job.yml
✓ Created .deepwork/write-and-test/skills/
✓ Created .deepwork/write-and-test/skills/implement/SKILL.md
✓ Created .deepwork/write-and-test/skills/write-tests/SKILL.md
✓ Created .deepwork/write-and-test/skills/edge-case-review/SKILL.md
Edit job.yml to define your steps, then:
deepwork run write-and-test
The scaffolded job.yml looks like this:
# .deepwork/write-and-test/job.yml
name: write-and-test
description: Write a function, add tests, review for edge cases
steps:
  - name: implement
    skill: implement
    input:
      task: "{{ inputs.task }}"
  - name: write-tests
    skill: write-tests
    input:
      implementation: "{{ steps.implement.output }}"
  - name: edge-case-review
    skill: edge-case-review
    input:
      implementation: "{{ steps.implement.output }}"
      tests: "{{ steps.write-tests.output }}"
The {{ inputs.task }} syntax is a template variable — you'll pass the actual task description at runtime. The {{ steps.implement.output }} syntax wires the output from the implement step directly into the write-tests step's input. No copy-pasting, no re-prompting.
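To make the substitution concrete: if you run the workflow with a task like "Parse a semver string" (an invented example, not from an actual run), the implement step would receive a resolved input along these lines:

```yaml
# Hypothetical resolved input for the implement step after
# template substitution (task text invented for illustration)
input:
  task: "Parse a semver string"
```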
Define the SKILL.md files
Each skill has a YAML frontmatter section that defines quality criteria, and a Markdown body that gives Claude its instructions. Here's the implement skill:
# .deepwork/write-and-test/skills/implement/SKILL.md
---
name: implement
description: Write a clean, production-ready function
quality_criteria:
- Function handles null and undefined inputs explicitly
- TypeScript types are present on all parameters and return values
- No bare throws — errors return Result types or throw typed errors
- Implementation is under 50 lines
---
You are implementing a function based on the task description.
Write a complete, production-ready implementation:
1. Define the function signature with full TypeScript types
2. Handle all edge cases including null, undefined, and out-of-range inputs
3. Use descriptive variable names — no single-letter variables except loop indices
4. Add a JSDoc comment explaining what the function does and its parameters
Output the implementation as a TypeScript code block only. No explanation,
no preamble — just the code.
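To make the criteria concrete, here is the kind of output that would clear this gate. It's a hypothetical example for an invented "apply a discount" task, not output from an actual run:

```typescript
// A Result type so errors are returned as values rather than bare throws
type Result<T> = { ok: true; value: T } | { ok: false; error: string };

/**
 * Applies a percentage discount to a price.
 * @param price - Original price; must be a non-negative, finite number.
 * @param rate - Discount rate between 0 and 1 inclusive.
 * @returns A Result holding the discounted price, or an error message.
 */
function applyDiscount(
  price: number | null | undefined,
  rate: number,
): Result<number> {
  // Null and undefined are rejected explicitly rather than producing NaN
  if (price === null || price === undefined) {
    return { ok: false, error: "price must not be null or undefined" };
  }
  if (!Number.isFinite(price) || price < 0) {
    return { ok: false, error: "price must be a non-negative finite number" };
  }
  if (rate < 0 || rate > 1) {
    return { ok: false, error: "rate must be between 0 and 1" };
  }
  return { ok: true, value: price * (1 - rate) };
}
```

Every parameter and the return value carry explicit types, null and undefined are handled up front, and errors come back as a Result instead of a bare throw, matching the frontmatter criteria above.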
The write-tests skill:
# .deepwork/write-and-test/skills/write-tests/SKILL.md
---
name: write-tests
description: Write comprehensive unit tests for the implementation
quality_criteria:
- Happy path test present
- Null/undefined input tests present
- At least one edge case test (empty array, zero, boundary value)
- Test descriptions are specific, not generic ("should work")
---
You are writing unit tests for the implementation provided.
Write tests using Vitest syntax (describe/it/expect).
Cover:
1. The primary happy-path case
2. Null and undefined inputs
3. At least two edge cases relevant to the function's logic
4. Any error paths if the function can throw or return an error
Output only the test code as a TypeScript code block.
And the edge-case-review skill:
# .deepwork/write-and-test/skills/edge-case-review/SKILL.md
---
name: edge-case-review
description: Review implementation and tests for missed edge cases
quality_criteria:
- Review explicitly addresses concurrency if relevant
- Review explicitly addresses large input handling
- All uncovered edge cases are flagged with specific examples
- A final score (0-100) is provided with reasoning
---
You are reviewing a function implementation and its test suite for edge cases.
Analyze both the implementation and tests together:
1. Identify any inputs that would cause incorrect behavior not covered by tests
2. Check for integer overflow, off-by-one errors, or empty collection handling
3. Note any assumptions the implementation makes that aren't enforced or tested
4. List uncovered edge cases with concrete example inputs
End your review with: SCORE: [0-100] — [one-sentence reasoning]
Quality Gates in Action
Without quality gates, step 2 runs regardless of whether step 1 produced good output. Add a quality_gate block to any step and DeepWork will evaluate the output before proceeding — retrying automatically if it doesn't pass.
Here's the job.yml with gates added:
name: write-and-test
description: Write a function, add tests, review for edge cases
steps:
  - name: implement
    skill: implement
    input:
      task: "{{ inputs.task }}"
    quality_gate:
      criteria:
        - "TypeScript types present on all parameters and return value"
        - "Null and undefined inputs handled explicitly"
        - "No bare throw statements"
      threshold: 85
      max_retries: 2
  - name: write-tests
    skill: write-tests
    input:
      implementation: "{{ steps.implement.output }}"
    quality_gate:
      criteria:
        - "Happy path test present"
        - "Null/undefined input tests present"
        - "At least one edge case test"
      threshold: 85
      max_retries: 2
  - name: edge-case-review
    skill: edge-case-review
    input:
      implementation: "{{ steps.implement.output }}"
      tests: "{{ steps.write-tests.output }}"
The gate evaluates the step's output against each criterion and produces a score from 0–100. If the score is below threshold, DeepWork retries the step up to max_retries times — passing the failure reasoning back into the next attempt so Claude can correct specifically what was wrong.
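The evaluate-and-retry loop described above can be sketched in a few lines. This is a simplified model of the behavior, not DeepWork's actual implementation; the function names and types are invented for illustration:

```typescript
interface GateResult {
  score: number;      // 0-100, from evaluating the output against the criteria
  failures: string[]; // reasoning for each criterion that did not pass
}

// Hypothetical sketch: runStep produces a step's output (optionally seeded
// with failure feedback), evaluate scores it against the gate criteria.
function runGatedStep(
  runStep: (feedback?: string) => string,
  evaluate: (output: string) => GateResult,
  threshold: number,
  maxRetries: number,
): string {
  let feedback: string | undefined;
  // One initial attempt plus up to maxRetries retries
  for (let attempt = 1; attempt <= maxRetries + 1; attempt++) {
    const output = runStep(feedback);
    const gate = evaluate(output);
    if (gate.score >= threshold) return output; // gate passed, move on
    // Failure reasoning is carried into the next attempt's prompt
    feedback = gate.failures.join("\n");
  }
  throw new Error(`gate failed after ${maxRetries + 1} attempts`);
}
```

The important design choice is that feedback flows forward: each retry sees the previous attempt's failure reasoning instead of starting from the same prompt.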
What a gate failure looks like
Step: implement
Score: 61/100
Failed criteria:
  ✗ TypeScript types present on all parameters and return value
    → found: (items) => { ... }
    → missing return type annotation
  ✗ Null and undefined inputs handled
    → no null check on `items` param

Retrying (attempt 2/3)...

Step: implement
Score: 91/100
  ✓ TypeScript types present on all parameters and return value
  ✓ Null and undefined inputs handled
  ✓ No bare throw statements

Passing output to: write-tests
The key detail: when a gate fails, DeepWork passes the failure reasoning back into the retry prompt. Claude doesn't just re-run the same prompt — it sees exactly what was wrong and corrects it. First-run pass rates on well-defined workflows land around 83–87%. With retry logic, effective pass rates climb to 92–96%.
You're reviewing outputs that already cleared a quality bar — not outputs that might be fine or might have missed half your criteria. For more on why this matters, see why Claude Code output quality degrades without enforcement.
Real Example: Refactor a Legacy Function with Test Coverage
Here's a practical workflow you'd actually use. The task: take a legacy JavaScript function with no types and no tests, refactor it to TypeScript, and add test coverage.
The job.yml definition
# .deepwork/refactor-legacy/job.yml
name: refactor-legacy
description: Refactor a legacy JS function to TypeScript with tests
steps:
  - name: analyze
    skill: analyze-legacy
    input:
      code: "{{ inputs.code }}"
  - name: refactor
    skill: ts-refactor
    input:
      code: "{{ inputs.code }}"
      analysis: "{{ steps.analyze.output }}"
    quality_gate:
      criteria:
        - "All parameters have TypeScript type annotations"
        - "Return type is explicitly declared"
        - "Original logic is preserved — no behavior changes"
        - "Magic numbers replaced with named constants"
      threshold: 88
      max_retries: 3
  - name: add-tests
    skill: write-tests
    input:
      implementation: "{{ steps.refactor.output }}"
    quality_gate:
      criteria:
        - "Tests cover the happy path"
        - "Tests cover at least 2 edge cases from the analysis"
        - "Tests use the TypeScript types from the refactored version"
      threshold: 85
      max_retries: 2
Run it
$ deepwork run refactor-legacy \
--code "$(cat src/utils/calculate-discount.js)"
Running: refactor-legacy
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
[1/3] analyze
✓ Completed — 3 edge cases identified, 2 magic numbers flagged
[2/3] refactor
Gate evaluation... score 74/100
✗ Magic numbers replaced with named constants → found: 0.15, 0.20 inline
Retry 1/3 with failure context...
Gate evaluation... score 92/100
✓ Passed
[3/3] add-tests
Gate evaluation... score 89/100
✓ Passed
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
✓ Workflow complete
Steps: 3/3 passed
Retries: 1 (refactor step, magic numbers)
Output: .deepwork/refactor-legacy/output/latest/
The output lands in .deepwork/refactor-legacy/output/latest/ — one file per step, plus a combined summary. The retry on the refactor step caught the inline magic numbers that the first attempt missed. You didn't have to re-run anything manually.
The scored output
The final output directory includes each step's result and a summary.md with the gate scores and what each retry corrected:
# summary.md
Workflow: refactor-legacy
Run: 2026-03-29T14:31:07Z
## Steps
### analyze — ✓ PASSED (no gate)
- 3 edge cases identified: negative inputs, zero quantity, discount > 1.0
- 2 magic numbers flagged: 0.15 (standard rate), 0.20 (premium rate)
### refactor — ✓ PASSED (attempt 2/4, score 92)
- Attempt 1 score: 74 — magic numbers not replaced
- Attempt 2 score: 92 — DISCOUNT_RATE_STANDARD, DISCOUNT_RATE_PREMIUM extracted
### add-tests — ✓ PASSED (score 89)
- 5 tests written: happy path, negative input, zero quantity, discount > 1.0, discount = 0
The learn loop: After a few runs, execute deepwork learn refactor-legacy --last-n 5. DeepWork reviews what triggered retries and updates the SKILL.md files automatically, so the magic-number check gets baked into the quality criteria and the first-attempt pass rate climbs. The result is a workflow that self-improves without manual skill-file maintenance.
What to Build First
The fastest way to see value: convert the Claude Code workflow you've been running manually most often. The one where you find yourself re-prompting the same corrections, re-explaining the same standards, catching the same missed edge cases.
- PR review workflows — Define your team's standards once in SKILL.md; every review hits the same bar
- Code generation — Generate + test + review as a single gated sequence instead of three separate sessions
- Documentation generation — Ensure generated docs always cover parameters, return values, and error conditions
- Data transformation scripts — Validate that transformation logic handles nulls, empty arrays, and type coercions before you ship it
If you've run the same Claude Code workflow manually more than twice, it's worth converting. Setup time pays back on the third run. For the full case on why the manual approach breaks down under repetition, read DeepWork vs. Manual Claude Code Workflows.
Install DeepWork and Run Your First Workflow
Two commands to install. Your first quality-gated workflow in under 15 minutes.
brew tap unsupervisedcom/deepwork
brew install deepwork
View on GitHub
Join Early Access
Or get early access for hosted workflows, scheduled runs, and team skill sharing.
Questions or feedback? Open an issue on GitHub.