Running Effective Architecture Reviews
Conduct architecture reviews that catch problems early without becoming bureaucratic bottlenecks. Covers review triggers, lightweight RFC processes, decision frameworks, review checklists, and scaling reviews across multiple teams.
Architecture reviews exist to catch expensive mistakes while they are still cheap to fix. A missed scaling bottleneck caught in review costs a whiteboard session. The same bottleneck caught in production costs an incident, a rewrite, and three months of rework.
But reviews can also become bureaucratic gates that slow teams to a crawl. The goal is finding the right balance: enough review to catch systemic risks, not so much that every feature needs a committee to approve.
When to Trigger a Review
Not every change needs an architecture review. Reviews should be triggered by risk, not by habit.
Trigger Criteria
A review is warranted when a change:
- Introduces a new service or removes an existing one
- Changes a public API contract (breaking or non-breaking)
- Adds a new data store or significantly changes schema
- Crosses a security boundary (authentication, authorization, encryption)
- Affects data retention or compliance (GDPR, SOC2, HIPAA)
- Introduces a new external dependency (third-party API, vendor SDK)
- Changes deployment topology (new region, new cloud provider, new network path)
- Has a blast radius affecting more than two teams
What Does NOT Need a Review
- Bug fixes within a single service
- Refactoring that does not change interfaces
- Adding tests or improving documentation
- Upgrading dependencies (unless major version with breaking changes)
- Feature work within established patterns
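The trigger criteria above can be expressed as a simple triage helper, so "does this need a review?" becomes a mechanical check rather than a judgment call each time. This is a minimal sketch: the `Change` fields and the two-team blast-radius threshold mirror the lists above, but the data model itself is an illustrative assumption, not a prescribed tool.

```python
# Triage sketch: one boolean per trigger criterion from the list above.
# The Change data model is an assumption for illustration.
from dataclasses import dataclass


@dataclass
class Change:
    adds_or_removes_service: bool = False
    changes_public_api: bool = False
    changes_data_store_or_schema: bool = False
    crosses_security_boundary: bool = False
    affects_compliance: bool = False
    adds_external_dependency: bool = False
    changes_deployment_topology: bool = False
    teams_affected: int = 1


def needs_review(change: Change) -> bool:
    """True when any trigger criterion applies; otherwise the team ships without review."""
    return any([
        change.adds_or_removes_service,
        change.changes_public_api,
        change.changes_data_store_or_schema,
        change.crosses_security_boundary,
        change.affects_compliance,
        change.adds_external_dependency,
        change.changes_deployment_topology,
        change.teams_affected > 2,  # blast radius: more than two teams
    ])


# A bug fix inside a single service trips no trigger:
print(needs_review(Change()))                  # False
# A change touching three teams does:
print(needs_review(Change(teams_affected=3)))  # True
```

Keeping the criteria in code (for example, behind a PR label bot) also makes the "what does NOT need a review" list enforceable by default: anything that trips no trigger ships without ceremony.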
The Lightweight RFC Process
A Request for Comments (RFC) is a short document that proposes a technical approach and invites feedback before implementation begins.
RFC Template
# RFC: [Title]
**Author**: [Name]
**Date**: [Date]
**Status**: Draft / In Review / Accepted / Rejected
## Problem Statement
What are we solving? [2-3 sentences]
## Proposed Solution
How do we solve it? [Architecture diagram + explanation]
## Alternatives Considered
What else did we consider and why did we reject it?
## Risks and Mitigations
What could go wrong? How do we handle it?
## Data Model Changes
Any schema changes, new tables, new fields?
## API Changes
Any new or modified endpoints?
## Rollout Plan
How will this be deployed? Canary? Feature flag? Big bang?
## Open Questions
What do you need input on?
RFC Review Workflow
- Author writes the RFC in a shared doc or Git PR
- Reviewers are automatically assigned based on affected systems (CODEOWNERS-style)
- Review period: 3-5 business days (not weeks)
- Synchronous review meeting only if async comments are insufficient
- Decision recorded in the document: Accepted, Accepted with modifications, or Rejected with rationale
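The CODEOWNERS-style assignment step can be sketched as a path-prefix lookup: map the systems an RFC touches to the reviewers who own them. The paths and names below are made up for illustration; real setups typically use a CODEOWNERS file or equivalent routing config.

```python
# CODEOWNERS-style reviewer assignment sketch.
# OWNERS maps a path prefix to the people who own that system;
# all paths and names here are hypothetical.
OWNERS = {
    "services/payments/": ["alice", "bob"],
    "services/orders/":   ["carol"],
    "platform/":          ["dave", "erin"],
}


def assign_reviewers(touched_paths: list[str]) -> set[str]:
    """Union of owners for every system the RFC touches."""
    reviewers: set[str] = set()
    for path in touched_paths:
        for prefix, owners in OWNERS.items():
            if path.startswith(prefix):
                reviewers.update(owners)
    return reviewers


print(sorted(assign_reviewers(["services/payments/api.py", "platform/deploy.yaml"])))
# ['alice', 'bob', 'dave', 'erin']
```

Taking the union across touched systems is what makes cross-team RFCs automatically pull in reviewers from every affected domain, rather than relying on the author to remember who to invite.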
Scaling RFCs
For organizations with 5+ teams:
- Lightweight RFCs (within one team’s domain): 2-3 reviewers, 3-day review, async only
- Cross-team RFCs (affecting multiple domains): 4-6 reviewers, 5-day review, one sync meeting
- Platform RFCs (affecting all teams): Platform team + stakeholders, 7-day review, dedicated meeting
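The three tiers above reduce to a small lookup table plus a routing rule. A sketch, with reviewer counts and review periods taken from the list; the tier names and the `None` for platform reviewers (platform team plus stakeholders rather than a fixed count) are assumptions:

```python
# The three RFC tiers as data; values mirror the list above.
RFC_TIERS = {
    "lightweight": {"reviewers": (2, 3), "review_days": 3, "sync_meeting": False},
    "cross_team":  {"reviewers": (4, 6), "review_days": 5, "sync_meeting": True},
    # Platform RFCs go to the platform team + stakeholders, not a fixed count.
    "platform":    {"reviewers": None,   "review_days": 7, "sync_meeting": True},
}


def tier_for(teams_affected: int, affects_all_teams: bool = False) -> str:
    """Route an RFC to a tier by blast radius; a judgment call in practice."""
    if affects_all_teams:
        return "platform"
    return "cross_team" if teams_affected > 1 else "lightweight"


print(tier_for(1))                          # lightweight
print(tier_for(4))                          # cross_team
print(tier_for(1, affects_all_teams=True))  # platform
```

Encoding the tiers this way keeps the review cost proportional to blast radius: the default path is the cheap async one, and heavier process has to be earned by wider impact.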
The Review Meeting
When a synchronous meeting is needed, structure it to be productive:
Before the Meeting
- All reviewers have read the RFC (enforce this — cancel if they have not)
- Author identifies 2-3 specific questions for the group
- Time-boxed to 45 minutes
During the Meeting
- 0:00-0:05: Author summarizes the proposal (a summary, not a read-through of the doc)
- 0:05-0:10: Clarifying questions only
- 0:10-0:30: Discussion of open questions and concerns
- 0:30-0:40: Decision: go / no-go / modify
- 0:40-0:45: Action items and next steps
The Review Checklist
Use a standard checklist to ensure consistency:
- Failure modes: What happens when this fails? Is there a fallback?
- Scale: Will this work at 10x current load? 100x?
- Data consistency: Are there race conditions? Eventual consistency issues?
- Security: Authentication, authorization, data encryption, input validation
- Observability: Can we detect problems? Are there metrics, logs, alerts?
- Reversibility: Can we roll this back without data loss?
- Dependencies: Are we coupling to services that have different SLAs?
- Cost: What is the infrastructure cost at projected scale?
- Team impact: Does this require knowledge from someone not on the team?
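Consistency comes from treating the checklist as data rather than folklore: a review tool or PR template generator can then require an explicit answer per item instead of a single "LGTM". The item keys below paraphrase the list above; the structure itself is an assumption.

```python
# The review checklist as data. Keys paraphrase the checklist items above.
REVIEW_CHECKLIST = [
    "failure_modes",     # what happens when this fails? is there a fallback?
    "scale",             # will this work at 10x current load? 100x?
    "data_consistency",  # race conditions, eventual consistency issues
    "security",          # authn, authz, encryption, input validation
    "observability",     # metrics, logs, alerts
    "reversibility",     # can we roll back without data loss?
    "dependencies",      # coupling to services with different SLAs
    "cost",              # infrastructure cost at projected scale
    "team_impact",       # knowledge needed from outside the team
]


def unanswered(answers: dict[str, str]) -> list[str]:
    """Checklist items with no answer (or a blank one); a review is incomplete until this is empty."""
    return [item for item in REVIEW_CHECKLIST if not answers.get(item, "").strip()]


# Two answers in, seven items still outstanding:
print(unanswered({"scale": "Load-tested at 20x projected traffic",
                  "security": "No new auth surfaces"}))
```

"We didn't think about that" is a valid, recordable answer too; what the check prevents is items being silently skipped.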
Decision Framework
When reviewers disagree, use a structured framework:
The DACI Model
- Driver: Owns the RFC and final decision (usually the author’s team lead)
- Approver: Has veto power (usually the architect or VP of Engineering)
- Contributors: Provide input and expertise
- Informed: Need to know the outcome but do not participate in the decision
Tie-Breaking Rules
- Prefer reversible decisions over irreversible ones
- Prefer boring technology over novel technology
- When two options are equally good, go with the one that has less operational burden
- If no decision is clearly best, set a time limit and choose — analysis paralysis is worse than a suboptimal choice
Architecture Decision Records
Every review should produce an ADR that captures:
## ADR-023: Use Event Sourcing for Order History
**Status**: Accepted
**Decision Group**: Backend Architects
**Date**: 2026-02-15
### Context
Order history requires audit logging, replay capability, and temporal queries.
### Decision
Implement event sourcing for the order domain using Kafka as the event store.
### Consequences
- (+) Full audit trail for compliance
- (+) Temporal queries ("what was the order state at 3pm?")
- (-) Higher complexity in read-path (CQRS required)
- (-) Team needs training on event-sourcing patterns
ADRs are immutable. If a decision is superseded, a new ADR references the old one:
**Supersedes**: ADR-023
**Reason**: Event sourcing complexity exceeded benefits for our scale
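A minimal sketch of how ADR conventions can be enforced automatically, for instance in CI: check that each ADR file has a Status line with an allowed value and the sections shown in the example above. The allowed status set and the regex are assumptions about one plausible house format, not a standard.

```python
# ADR lint sketch: required sections follow the ADR-023 example above;
# the allowed Status values and regex are assumptions about one house format.
import re

REQUIRED_SECTIONS = ["### Context", "### Decision", "### Consequences"]
ALLOWED_STATUSES = {"Proposed", "Accepted", "Rejected", "Superseded"}


def lint_adr(text: str) -> list[str]:
    """Return a list of problems; an empty list means the ADR passes."""
    problems = []
    match = re.search(r"\*\*Status\*\*:\s*(\w+)", text)
    if not match or match.group(1) not in ALLOWED_STATUSES:
        problems.append("missing or invalid Status")
    for section in REQUIRED_SECTIONS:
        if section not in text:
            problems.append(f"missing section: {section}")
    return problems


adr = (
    "## ADR-024: Replace Event Sourcing with Append-Only Audit Table\n"
    "**Status**: Accepted\n"
    "### Context\n...\n### Decision\n...\n### Consequences\n...\n"
)
print(lint_adr(adr))  # []
```

Running a check like this on every merge is one concrete way to implement the "automated checks" fix for the no-follow-through anti-pattern: decisions stay discoverable and well-formed instead of rotting in a wiki.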
Anti-Patterns
| Anti-Pattern | Impact | Fix |
|---|---|---|
| Reviewing every change | Teams cannot ship | Define trigger criteria, trust teams |
| Ivory tower reviews | Reviewers disconnected from reality | Include implementers in the review |
| No follow-through | Decisions made but not enforced | ADRs in the codebase, automated checks |
| Design by committee | Lowest-common-denominator decisions | Clear DACI roles with one final decision-maker |
| Reviewing after implementation | Too late to change course | Review at design phase, not PR phase |
Architecture reviews are an investment. Done well, they prevent weeks of rework and reduce incidents. Done poorly, they add weeks to every project and teach teams to route around the process. The key is calibrating the review intensity to the risk of the change.