Running Effective Architecture Reviews
Conduct architecture reviews that catch problems early without becoming bureaucratic bottlenecks. Covers review triggers, lightweight RFC processes, decision frameworks, review checklists, and scaling reviews across multiple teams.
Architecture reviews exist to catch expensive mistakes while they are still cheap to fix. A missed scaling bottleneck caught in review costs a whiteboard session. The same bottleneck caught in production costs an incident, a rewrite, and three months of rework.
But reviews can also become bureaucratic gates that slow teams to a crawl. The goal is finding the right balance: enough review to catch systemic risks, not so much that every feature needs a committee to approve.
When to Trigger a Review
Not every change needs an architecture review. Reviews should be triggered by risk, not by habit.
Trigger Criteria
A review is warranted when a change:
- Introduces a new service or removes an existing one
- Changes a public API contract (breaking or non-breaking)
- Adds a new data store or significantly changes schema
- Crosses a security boundary (authentication, authorization, encryption)
- Affects data retention or compliance (GDPR, SOC2, HIPAA)
- Introduces a new external dependency (third-party API, vendor SDK)
- Changes deployment topology (new region, new cloud provider, new network path)
- Has a blast radius affecting more than two teams
What Does NOT Need a Review
- Bug fixes within a single service
- Refactoring that does not change interfaces
- Adding tests or improving documentation
- Upgrading dependencies (unless major version with breaking changes)
- Feature work within established patterns
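The trigger criteria above can be expressed as a simple triage helper, so "does this need a review?" becomes a mechanical check rather than a judgment call each time. This is a minimal sketch: the `Change` fields and the two-team blast-radius threshold mirror the lists above, but the data model itself is an illustrative assumption, not a prescribed tool.

```python
# Triage sketch: one boolean per trigger criterion from the list above.
# The Change data model is an assumption for illustration.
from dataclasses import dataclass


@dataclass
class Change:
    adds_or_removes_service: bool = False
    changes_public_api: bool = False
    changes_data_store_or_schema: bool = False
    crosses_security_boundary: bool = False
    affects_compliance: bool = False
    adds_external_dependency: bool = False
    changes_deployment_topology: bool = False
    teams_affected: int = 1


def needs_review(change: Change) -> bool:
    """True when any trigger criterion applies; otherwise the team ships without review."""
    return any([
        change.adds_or_removes_service,
        change.changes_public_api,
        change.changes_data_store_or_schema,
        change.crosses_security_boundary,
        change.affects_compliance,
        change.adds_external_dependency,
        change.changes_deployment_topology,
        change.teams_affected > 2,  # blast radius: more than two teams
    ])


# A bug fix inside a single service trips no trigger:
print(needs_review(Change()))                  # False
# A change touching three teams does:
print(needs_review(Change(teams_affected=3)))  # True
```

Keeping the criteria in code (for example, behind a PR label bot) also makes the "what does NOT need a review" list enforceable by default: anything that trips no trigger ships without ceremony.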
The Lightweight RFC Process
A Request for Comments (RFC) is a short document that proposes a technical approach and invites feedback before implementation begins.
RFC Template
# RFC: [Title]
**Author**: [Name]
**Date**: [Date]
**Status**: Draft / In Review / Accepted / Rejected
## Problem Statement
What are we solving? [2-3 sentences]
## Proposed Solution
How do we solve it? [Architecture diagram + explanation]
## Alternatives Considered
What else did we consider and why did we reject it?
## Risks and Mitigations
What could go wrong? How do we handle it?
## Data Model Changes
Any schema changes, new tables, new fields?
## API Changes
Any new or modified endpoints?
## Rollout Plan
How will this be deployed? Canary? Feature flag? Big bang?
## Open Questions
What do you need input on?
RFC Review Workflow
- Author writes the RFC in a shared doc or Git PR
- Reviewers are automatically assigned based on affected systems (CODEOWNERS-style)
- Review period: 3-5 business days (not weeks)
- Synchronous review meeting only if async comments are insufficient
- Decision recorded in the document: Accepted, Accepted with modifications, or Rejected with rationale
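The CODEOWNERS-style assignment step can be sketched as a path-prefix lookup: map the systems an RFC touches to the reviewers who own them. The paths and names below are made up for illustration; real setups typically use a CODEOWNERS file or equivalent routing config.

```python
# CODEOWNERS-style reviewer assignment sketch.
# OWNERS maps a path prefix to the people who own that system;
# all paths and names here are hypothetical.
OWNERS = {
    "services/payments/": ["alice", "bob"],
    "services/orders/":   ["carol"],
    "platform/":          ["dave", "erin"],
}


def assign_reviewers(touched_paths: list[str]) -> set[str]:
    """Union of owners for every system the RFC touches."""
    reviewers: set[str] = set()
    for path in touched_paths:
        for prefix, owners in OWNERS.items():
            if path.startswith(prefix):
                reviewers.update(owners)
    return reviewers


print(sorted(assign_reviewers(["services/payments/api.py", "platform/deploy.yaml"])))
# ['alice', 'bob', 'dave', 'erin']
```

Taking the union across touched systems is what makes cross-team RFCs automatically pull in reviewers from every affected domain, rather than relying on the author to remember who to invite.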
Scaling RFCs
For organizations with 5+ teams:
- Lightweight RFCs (within one team’s domain): 2-3 reviewers, 3-day review, async only
- Cross-team RFCs (affecting multiple domains): 4-6 reviewers, 5-day review, one sync meeting
- Platform RFCs (affecting all teams): Platform team + stakeholders, 7-day review, dedicated meeting
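The three tiers above reduce to a small lookup table plus a routing rule. A sketch, with reviewer counts and review periods taken from the list; the tier names and the `None` for platform reviewers (platform team plus stakeholders rather than a fixed count) are assumptions:

```python
# The three RFC tiers as data; values mirror the list above.
RFC_TIERS = {
    "lightweight": {"reviewers": (2, 3), "review_days": 3, "sync_meeting": False},
    "cross_team":  {"reviewers": (4, 6), "review_days": 5, "sync_meeting": True},
    # Platform RFCs go to the platform team + stakeholders, not a fixed count.
    "platform":    {"reviewers": None,   "review_days": 7, "sync_meeting": True},
}


def tier_for(teams_affected: int, affects_all_teams: bool = False) -> str:
    """Route an RFC to a tier by blast radius; a judgment call in practice."""
    if affects_all_teams:
        return "platform"
    return "cross_team" if teams_affected > 1 else "lightweight"


print(tier_for(1))                          # lightweight
print(tier_for(4))                          # cross_team
print(tier_for(1, affects_all_teams=True))  # platform
```

Encoding the tiers this way keeps the review cost proportional to blast radius: the default path is the cheap async one, and heavier process has to be earned by wider impact.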
The Review Meeting
When a synchronous meeting is needed, structure it to be productive:
Before the Meeting
- All reviewers have read the RFC (enforce this — cancel if they have not)
- Author identifies 2-3 specific questions for the group
- Time-boxed to 45 minutes
During the Meeting
- 0:00-0:05: Author summarizes the proposal (a summary, not a read-through of the doc)
- 0:05-0:10: Clarifying questions only
- 0:10-0:30: Discussion of open questions and concerns
- 0:30-0:40: Decision: go / no-go / modify
- 0:40-0:45: Action items and next steps
The Review Checklist
Use a standard checklist to ensure consistency:
- Failure modes: What happens when this fails? Is there a fallback?
- Scale: Will this work at 10x current load? 100x?
- Data consistency: Are there race conditions? Eventual consistency issues?
- Security: Authentication, authorization, data encryption, input validation
- Observability: Can we detect problems? Are there metrics, logs, alerts?
- Reversibility: Can we roll this back without data loss?
- Dependencies: Are we coupling to services that have different SLAs?
- Cost: What is the infrastructure cost at projected scale?
- Team impact: Does this require knowledge from someone not on the team?
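Consistency comes from treating the checklist as data rather than folklore: a review tool or PR template generator can then require an explicit answer per item instead of a single "LGTM". The item keys below paraphrase the list above; the structure itself is an assumption.

```python
# The review checklist as data. Keys paraphrase the checklist items above.
REVIEW_CHECKLIST = [
    "failure_modes",     # what happens when this fails? is there a fallback?
    "scale",             # will this work at 10x current load? 100x?
    "data_consistency",  # race conditions, eventual consistency issues
    "security",          # authn, authz, encryption, input validation
    "observability",     # metrics, logs, alerts
    "reversibility",     # can we roll back without data loss?
    "dependencies",      # coupling to services with different SLAs
    "cost",              # infrastructure cost at projected scale
    "team_impact",       # knowledge needed from outside the team
]


def unanswered(answers: dict[str, str]) -> list[str]:
    """Checklist items with no answer (or a blank one); a review is incomplete until this is empty."""
    return [item for item in REVIEW_CHECKLIST if not answers.get(item, "").strip()]


# Two answers in, seven items still outstanding:
print(unanswered({"scale": "Load-tested at 20x projected traffic",
                  "security": "No new auth surfaces"}))
```

"We didn't think about that" is a valid, recordable answer too; what the check prevents is items being silently skipped.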
Decision Framework
When reviewers disagree, use a structured framework:
The DACI Model
- Driver: Owns the RFC and final decision (usually the author’s team lead)
- Approver: Has veto power (usually the architect or VP of Engineering)
- Contributors: Provide input and expertise
- Informed: Need to know the outcome but do not participate in the decision
Tie-Breaking Rules
- Prefer reversible decisions over irreversible ones
- Prefer boring technology over novel technology
- When two options are equally good, go with the one that has less operational burden
- If no decision is clearly best, set a time limit and choose — analysis paralysis is worse than a suboptimal choice
Architecture Decision Records
Every review should produce an ADR that captures:
## ADR-023: Use Event Sourcing for Order History
**Status**: Accepted
**Decision Group**: Backend Architects
**Date**: 2026-02-15
### Context
Order history requires audit logging, replay capability, and temporal queries.
### Decision
Implement event sourcing for the order domain using Kafka as the event store.
### Consequences
- (+) Full audit trail for compliance
- (+) Temporal queries ("what was the order state at 3pm?")
- (-) Higher complexity in read-path (CQRS required)
- (-) Team needs training on event-sourcing patterns
ADRs are immutable. If a decision is superseded, a new ADR references the old one:
**Supersedes**: ADR-023
**Reason**: Event sourcing complexity exceeded benefits for our scale
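A minimal sketch of how ADR conventions can be enforced automatically, for instance in CI: check that each ADR file has a Status line with an allowed value and the sections shown in the example above. The allowed status set and the regex are assumptions about one plausible house format, not a standard.

```python
# ADR lint sketch: required sections follow the ADR-023 example above;
# the allowed Status values and regex are assumptions about one house format.
import re

REQUIRED_SECTIONS = ["### Context", "### Decision", "### Consequences"]
ALLOWED_STATUSES = {"Proposed", "Accepted", "Rejected", "Superseded"}


def lint_adr(text: str) -> list[str]:
    """Return a list of problems; an empty list means the ADR passes."""
    problems = []
    match = re.search(r"\*\*Status\*\*:\s*(\w+)", text)
    if not match or match.group(1) not in ALLOWED_STATUSES:
        problems.append("missing or invalid Status")
    for section in REQUIRED_SECTIONS:
        if section not in text:
            problems.append(f"missing section: {section}")
    return problems


adr = (
    "## ADR-024: Replace Event Sourcing with Append-Only Audit Table\n"
    "**Status**: Accepted\n"
    "### Context\n...\n### Decision\n...\n### Consequences\n...\n"
)
print(lint_adr(adr))  # []
```

Running a check like this on every merge is one concrete way to implement the "automated checks" fix for the no-follow-through anti-pattern: decisions stay discoverable and well-formed instead of rotting in a wiki.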
Anti-Patterns
| Anti-Pattern | Impact | Fix |
|---|---|---|
| Reviewing every change | Teams cannot ship | Define trigger criteria, trust teams |
| Ivory tower reviews | Reviewers disconnected from reality | Include implementers in the review |
| No follow-through | Decisions made but not enforced | ADRs in the codebase, automated checks |
| Design by committee | Lowest-common-denominator decisions | Clear DACI roles with one final decision-maker |
| Reviewing after implementation | Too late to change course | Review at design phase, not PR phase |
Architecture reviews are an investment. Done well, they prevent weeks of rework and reduce incidents. Done poorly, they add weeks to every project and teach teams to route around the process. The key is calibrating the review intensity to the risk of the change.