Mutation Testing | The Garnet Wiki

Code coverage lies. A line can be covered without being tested: a test may execute it without asserting anything about the result. Mutation testing reveals the truth. It modifies your code in small ways (mutations) and checks whether your tests catch each change. If your tests still pass after the code is mutated, they are weak: they exercise the code but don't actually verify its behavior.

How Mutation Testing Works

Original code:
    if (age >= 18) return "adult"

Mutation 1: Change >= to >
    if (age > 18) return "adult"
    → Does any test fail? If not: SURVIVED (test gap!)

Mutation 2: Change >= to ==
    if (age == 18) return "adult"
    → Does any test fail? If yes: KILLED (test works)

Mutation 3: Remove the return
    if (age >= 18) { /* nothing */ }
    → Does any test fail? If yes: KILLED
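The three mutations above can be reproduced by hand as a rough sketch. The `"minor"` fallthrough, the function names, and both test suites are assumptions made for illustration, since the original snippet only shows the `"adult"` branch.

```python
# Assumption: ages below 18 fall through to "minor"; the original snippet
# only shows the "adult" branch.
def classify(age):
    if age >= 18:
        return "adult"
    return "minor"

def mutant_1(age):  # Mutation 1: >= changed to >
    if age > 18:
        return "adult"
    return "minor"

def mutant_2(age):  # Mutation 2: >= changed to ==
    if age == 18:
        return "adult"
    return "minor"

def mutant_3(age):  # Mutation 3: return removed
    if age >= 18:
        pass
    return "minor"

# Hypothetical test suites, written as (input, expected output) pairs.
WEAK_SUITE = [(30, "adult"), (5, "minor")]
BOUNDARY_SUITE = WEAK_SUITE + [(18, "adult")]

def status(mutant, suite):
    """Return KILLED if at least one test case fails against the mutant."""
    all_pass = all(mutant(age) == expected for age, expected in suite)
    return "SURVIVED" if all_pass else "KILLED"

for name, mutant in [("Mutation 1", mutant_1),
                     ("Mutation 2", mutant_2),
                     ("Mutation 3", mutant_3)]:
    print(name, status(mutant, WEAK_SUITE), status(mutant, BOUNDARY_SUITE))
```

With the weak suite, Mutation 1 survives because no test checks the boundary value 18. Adding the single `(18, "adult")` case kills it.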

Mutation Operators

Category	Operator	Example
Arithmetic	Replace + with -	`a + b` → `a - b`
Relational	Replace >= with >	`x >= 10` → `x > 10`
Logical	Replace && with ||	`a && b` → `a || b`
Negation	Negate condition	`if (valid)` → `if (!valid)`
Return value	Change return value	`return true` → `return false`
Void method	Remove method call	`logger.info(msg)` → `/* removed */`
Constant	Replace constant	`timeout = 30` → `timeout = 0`
Null	Return null instead	`return user` → `return null`
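To make the idea of an operator concrete, here is a minimal sketch of two operators from the table (relational and arithmetic) built on Python's standard `ast` module. It is not how any particular tool implements them. The class names and the `mutate` helper are hypothetical, and `ast.unparse` requires Python 3.9 or later.

```python
import ast

class ReplaceGtEWithGt(ast.NodeTransformer):
    """Relational operator: rewrite every >= as >."""
    def visit_Compare(self, node):
        self.generic_visit(node)  # handle nested comparisons first
        node.ops = [ast.Gt() if isinstance(op, ast.GtE) else op
                    for op in node.ops]
        return node

class ReplaceAddWithSub(ast.NodeTransformer):
    """Arithmetic operator: rewrite every + as -."""
    def visit_BinOp(self, node):
        self.generic_visit(node)
        if isinstance(node.op, ast.Add):
            node.op = ast.Sub()
        return node

def mutate(source, operator):
    """Parse source, apply one mutation operator, return the mutated source."""
    tree = operator().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)

print(mutate("x >= 10", ReplaceGtEWithGt))
print(mutate("a + b", ReplaceAddWithSub))
```

For brevity this sketch rewrites every matching site in one pass. Mutation tools typically generate one mutant per site so that each surviving mutant points at a single location.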

Mutation Score

Mutation Score = (Killed Mutants / Total Mutants) × 100%

Score	Interpretation	Action
90-100%	Excellent test strength	Maintain
70-89%	Good but gaps exist	Investigate survived mutants
50-69%	Significant test weakness	Prioritize critical code paths
Below 50%	Tests are decorative	Major test improvement needed
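As a worked example, the formula and the table above might be sketched as follows. The function names are hypothetical, and treating each band as a half-open range (so that 89.5 lands in the 70-89% row) is an assumption, because the table does not say how fractional scores are handled.

```python
def mutation_score(killed, total):
    """Mutation Score = (Killed Mutants / Total Mutants) x 100%."""
    if total <= 0:
        raise ValueError("total mutants must be positive")
    return 100 * killed / total

def recommended_action(score):
    """Map a score to the Action column of the table above."""
    if score >= 90:
        return "Maintain"
    if score >= 70:
        return "Investigate survived mutants"
    if score >= 50:
        return "Prioritize critical code paths"
    return "Major test improvement needed"

score = mutation_score(killed=45, total=60)
print(score, recommended_action(score))
```

Here 45 killed mutants out of 60 give a score of 75%, which falls in the "Good but gaps exist" band.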

Tool Comparison

Tool	Language	Speed	Integration	Best For
Stryker	JS/TS, C#, Scala	Fast (incremental)	Dashboard, CI	JavaScript/TypeScript projects
PIT (Pitest)	Java/JVM	Fast (bytecode)	Maven, Gradle	Java enterprise applications
mutmut	Python	Medium	pytest integration	Python projects
cosmic-ray	Python	Slow	Celery distributed	Large Python codebases
cargo-mutants	Rust	Medium	Cargo integration	Rust projects
infection	PHP	Medium	PHPUnit	PHP applications

Dealing with Equivalent Mutants

Equivalent mutants produce code that behaves identically to the original despite the syntactic change. They cannot be killed because there is no observable difference.

Example	Why It’s Equivalent
`i = 0; while (i < 10)` → `i = 0; while (i != 10)`	Same behavior when i increments by 1
`return x * 1` → `return x * -1` when x is always 0	Same result (0) regardless of the multiplier
Dead code mutation	Code is unreachable, mutation has no effect

Handling: Mark as equivalent and exclude from score calculation. Most tools allow annotations or filters for known equivalents.
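The loop example from the table can be checked with a small sketch. The `run_loop` helper and its iteration cap are hypothetical, added so the non-equivalent case does not spin forever. The score arithmetic at the end uses made-up counts to show the effect of excluding equivalents.

```python
def run_loop(condition, step, limit=100):
    """Run `i = 0; while condition(i): i += step` and return the final i.

    Returns None if the loop has not finished after `limit` iterations.
    """
    i = 0
    for _ in range(limit):
        if not condition(i):
            return i
        i += step
    return None

def original(i):   # while (i < 10)
    return i < 10

def mutant(i):     # while (i != 10)
    return i != 10

# Incrementing by 1: both loops stop at i == 10, so no test can tell them apart.
print(run_loop(original, step=1), run_loop(mutant, step=1))

# Incrementing by 3: i skips over 10, so the mutant never terminates.
print(run_loop(original, step=3), run_loop(mutant, step=3))

# Illustrative counts: excluding 10 equivalent mutants from 60 total.
killed, total, equivalent = 45, 60, 10
print(100 * killed / total, 100 * killed / (total - equivalent))
```

The second call shows why the equivalence holds only under the stated condition: once the step changes, the mutant is killable.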

CI/CD Integration

Pull request:
    → Unit tests pass
        → Mutation testing (incremental: only changed files)
            → If mutation score < threshold: WARN or BLOCK
            → Report: survived mutants as PR comments
                → Developer reviews and adds missing assertions
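The gating step in this flow could look roughly like the sketch below. It is tool-agnostic: the `Survivor` record, the `gate` function, the default threshold of 80, and the comment format are all assumptions, not the output format of any specific mutation testing tool.

```python
from dataclasses import dataclass

@dataclass
class Survivor:
    """A survived mutant, as it might be read from a tool's report."""
    file: str
    line: int
    description: str

def gate(killed, total, survivors, threshold=80.0, blocking=False):
    """Decide PASS, WARN, or BLOCK and build one PR comment per survivor."""
    # With no mutants in the changed files there is nothing to gate on.
    score = 100 * killed / total if total else 100.0
    if score >= threshold:
        status = "PASS"
    else:
        status = "BLOCK" if blocking else "WARN"
    comments = [f"{s.file}:{s.line}: survived mutant: {s.description}"
                for s in survivors]
    return status, score, comments

status, score, comments = gate(
    killed=7, total=10,
    survivors=[Survivor("age.py", 8, ">= replaced with >")],
)
print(status, score)
for comment in comments:
    print(comment)
```

In a CI job, a `BLOCK` status would translate into a non-zero exit code, while `WARN` would only post the comments.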

Performance Optimization

Technique	Speedup	How
Incremental mutation	10-50x	Only mutate changed files
Parallel execution	2-8x	Run mutants on multiple cores
Early termination	2-3x	Stop testing a mutant once one test kills it
Mutation sampling	Variable	Test a random subset of mutants
Baseline caching	5-10x	Cache previous results, only re-run affected
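Two of these techniques, early termination and mutation sampling, are simple enough to sketch directly. The function names, the representation of tests as callables, and the fixed seed are assumptions for illustration. Real tools implement these internally.

```python
import random

def run_until_killed(mutant, tests):
    """Early termination: stop at the first test that fails on the mutant.

    Returns (killed, number_of_tests_run).
    """
    for count, test in enumerate(tests, start=1):
        if not test(mutant):
            return True, count
    return False, len(tests)

def sample_mutants(mutants, fraction, seed=0):
    """Mutation sampling: pick a reproducible random subset of mutants."""
    if not mutants:
        return []
    k = min(len(mutants), max(1, round(len(mutants) * fraction)))
    return random.Random(seed).sample(mutants, k)

# Hypothetical tests: each takes the function under test, returns True on pass.
tests = [
    lambda f: f(18) == "adult",
    lambda f: f(30) == "adult",
    lambda f: f(5) == "minor",
]

def boundary_mutant(age):  # >= changed to >
    return "adult" if age > 18 else "minor"

print(run_until_killed(boundary_mutant, tests))
print(len(sample_mutants(list(range(100)), fraction=0.1)))
```

Ordering tests so that the most likely killers run first makes early termination more effective. A fixed seed keeps sampled runs comparable between builds.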

Anti-Patterns

Anti-Pattern	Problem	Fix
Running mutation tests on entire codebase every PR	Too slow for CI	Use incremental mode on changed files only
Treating mutation score as a target	Teams write trivial assertions to kill mutants	Use as a diagnostic tool, not a KPI
Ignoring equivalent mutants	Unkillable survivors drag the score down and frustrate developers	Classify and exclude equivalent mutants
No triage of survived mutants	Results are noise without action	Review survived mutants weekly, add tests for important ones
Mutating test utilities	Mutations in test helpers produce confusing results	Exclude test infrastructure from mutation

Checklist

Mutation testing tool selected and configured
Baseline mutation score established for critical modules
Incremental mutation testing integrated into CI (changed files only)
Survived mutants reviewed and triaged weekly
Equivalent mutants identified and excluded
Performance optimizations applied (parallelism, early termination)
Mutation score tracked as a health signal (not a target)
Team trained on interpreting mutation reports

:::note[Source] This guide is derived from operational intelligence at Garnet Grid Consulting. For test engineering consulting, visit garnetgrid.com. :::