GDPR for Engineers: Building Privacy-Compliant Systems
Implement GDPR compliance as an engineering practice, not a legal checkbox. Covers data minimization, consent management, right to erasure, data portability, privacy by design patterns, and the technical architecture that makes compliance maintainable.
GDPR is not a legal problem you hand to the legal team and forget about. It is an engineering constraint that affects how you design databases, APIs, logging systems, and data pipelines. Every INSERT INTO users statement, every log line that contains an email address, every analytics event that includes a user ID — these are all privacy decisions that engineers make dozens of times a day, usually without realizing it.
This guide covers the engineering patterns that make GDPR compliance a natural part of your architecture rather than a painful retrofit.
The 8 Principles That Affect Your Architecture
| GDPR Principle | What It Means for Engineering |
|---|---|
| Lawful basis | Every piece of personal data needs a legal reason to exist in your system |
| Purpose limitation | Data collected for login cannot be reused for marketing without a separate lawful basis, typically consent |
| Data minimization | Do not collect data you do not need. Do not keep data longer than necessary |
| Accuracy | Users can correct their data. Your system must support updates |
| Storage limitation | Data has a shelf life. Auto-delete when no longer needed |
| Integrity & confidentiality | Encrypt at rest and in transit. Access controls everywhere |
| Accountability | You must prove compliance, not just claim it. Audit trails required |
| Rights of data subjects | Users can access, export, correct, and delete their data |
Data Classification: Know What You Have
Before you can protect personal data, you need to know where it lives. This is harder than it sounds because personal data spreads through systems like water finding cracks.
┌────────────────────────────────────────┐
│ PERSONAL DATA CATEGORIES               │
├────────────────────────────────────────┤
│ Directly Identifying:                  │
│   Name, Email, Phone, Address          │
│   → Always encrypted at rest           │
│   → Never in logs                      │
│   → Retention: legal minimum only      │
├────────────────────────────────────────┤
│ Indirectly Identifying:                │
│   User ID, IP Address, Device ID       │
│   → Pseudonymized where possible       │
│   → Hashed in analytics                │
│   → Retention: purpose-dependent       │
├────────────────────────────────────────┤
│ Sensitive (Special Category):          │
│   Health, Religion, Ethnicity,         │
│   Biometrics, Political opinions       │
│   → Explicit consent required          │
│   → Encrypted with separate keys       │
│   → Strict access controls + audit log │
├────────────────────────────────────────┤
│ Non-Personal:                          │
│   Aggregated metrics, anonymized data  │
│   → No GDPR restrictions               │
│   → Still good practice to protect     │
└────────────────────────────────────────┘
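One way to make these categories enforceable is to keep them in code next to the schema. The sketch below is illustrative only: the field names, the `Handling` flags, and the choice to let indirect identifiers appear in logs are assumptions drawn from this guide's own rules, not a standard API.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    DIRECT = "directly_identifying"
    INDIRECT = "indirectly_identifying"
    SENSITIVE = "special_category"
    NON_PERSONAL = "non_personal"


@dataclass(frozen=True)
class Handling:
    encrypt_at_rest: bool
    allowed_in_logs: bool
    needs_explicit_consent: bool


# Handling rules per category (an interpretation of the box above).
RULES = {
    Category.DIRECT: Handling(encrypt_at_rest=True, allowed_in_logs=False,
                              needs_explicit_consent=False),
    Category.INDIRECT: Handling(encrypt_at_rest=False, allowed_in_logs=True,
                                needs_explicit_consent=False),
    Category.SENSITIVE: Handling(encrypt_at_rest=True, allowed_in_logs=False,
                                 needs_explicit_consent=True),
    Category.NON_PERSONAL: Handling(encrypt_at_rest=False, allowed_in_logs=True,
                                    needs_explicit_consent=False),
}

# Hypothetical field names; replace with your real schema.
FIELD_CATEGORIES = {
    "email": Category.DIRECT,
    "name": Category.DIRECT,
    "user_id": Category.INDIRECT,
    "ip_address": Category.INDIRECT,
    "health_notes": Category.SENSITIVE,
    "page_views": Category.NON_PERSONAL,
}


def classify(field: str) -> Category:
    # Unknown fields fail closed: treat them as directly identifying.
    return FIELD_CATEGORIES.get(field, Category.DIRECT)


def loggable(record: dict) -> dict:
    """Return only the fields whose category may appear in logs."""
    return {k: v for k, v in record.items()
            if RULES[classify(k)].allowed_in_logs}
```

Failing closed on unknown fields means a newly added column stays out of logs until someone classifies it, which is usually the safer default.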
Data Inventory Template
| Data Field | Category | Where Stored | Legal Basis | Retention | Deletion Method |
|---|---|---|---|---|---|
| Email | Direct PII | users table, email service, logs | Contract | Account lifetime + 30 days | Hard delete + provider API |
| IP Address | Indirect PII | access logs, CDN logs | Legitimate interest | 90 days | Auto-purge cron |
| Name | Direct PII | users table, billing | Contract | Account lifetime | Hard delete |
| Analytics events | Indirect PII | analytics DB | Consent | 26 months | Automated TTL |
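The inventory is easier to keep honest when it also lives in code, so that purge jobs read retention periods from the same place as the documentation. The following is a minimal sketch under stated assumptions: `InventoryEntry`, `INVENTORY`, and `purge_cutoffs` are hypothetical names, and 26 months is approximated as 26 × 30 days.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class InventoryEntry:
    field: str
    category: str
    stores: Tuple[str, ...]
    legal_basis: str
    retention: Optional[timedelta]  # None = tied to account lifetime
    deletion_method: str


# Rows taken from the template table; store names are illustrative.
INVENTORY = [
    InventoryEntry("ip_address", "indirect", ("access_logs", "cdn_logs"),
                   "legitimate_interest", timedelta(days=90), "auto_purge"),
    InventoryEntry("analytics_events", "indirect", ("analytics_db",),
                   "consent", timedelta(days=26 * 30), "ttl"),
    InventoryEntry("name", "direct", ("users", "billing"),
                   "contract", None, "hard_delete"),
]


def purge_cutoffs(now: datetime) -> Dict[str, datetime]:
    """Map each time-limited field to its cutoff.

    Rows older than the cutoff are due for deletion. Fields tied to
    account lifetime are skipped; the erasure pipeline handles those.
    """
    return {
        entry.field: now - entry.retention
        for entry in INVENTORY
        if entry.retention is not None
    }
```

A scheduled job could call `purge_cutoffs(datetime.now(timezone.utc))` and issue one delete per store for anything older than the returned timestamp.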
Right to Erasure: “Delete My Data”
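This is often the most technically challenging GDPR requirement. When a user requests deletion, you must remove their personal data from every system that holds it: primary databases, backups, logs, analytics, third-party services, and caches. Requests must generally be answered within one month (Article 12(3)), and Article 17(3) lists exceptions, such as records you are legally obliged to keep, so the pipeline needs a way to exempt specific data.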
Erasure Architecture
import hashlib
from datetime import datetime, timezone

# The handler classes and ErasureReport are assumed to be defined elsewhere.


class DataErasureService:
    """Orchestrate user data deletion across all systems."""

    def __init__(self, audit_log):
        # audit_log: any object with an async record(**fields) method (assumed interface)
        self.audit_log = audit_log
        self.handlers = [
            PrimaryDatabaseHandler(),
            AnalyticsDatabaseHandler(),
            LogPurgeHandler(),
            ThirdPartyServiceHandler(),  # Stripe, Sendgrid, etc.
            SearchIndexHandler(),
            CacheInvalidationHandler(),
            BackupRedactionHandler(),
        ]

    async def erase_user(self, user_id: str, request_id: str) -> ErasureReport:
        report = ErasureReport(user_id=user_id, request_id=request_id)
        for handler in self.handlers:
            try:
                result = await handler.erase(user_id)
                report.add_success(handler.name, result)
            except Exception as e:
                # Do not stop: continue with the other handlers.
                # Failures are recorded in the report so they can be retried.
                report.add_failure(handler.name, str(e))

        # Audit trail (must NOT contain personal data).
        # Built-in hash() is salted per process for strings, so it cannot be
        # compared across runs; use a stable digest instead of the real ID.
        await self.audit_log.record(
            event="data_erasure",
            request_id=request_id,
            user_id_hash=hashlib.sha256(user_id.encode("utf-8")).hexdigest(),
            results=report.summary(),
            timestamp=datetime.now(timezone.utc),
        )
        return report
Soft Delete vs Hard Delete
| Approach | GDPR Compliant? | Use When |
|---|---|---|
| Soft delete (set deleted=true) | ❌ No — data still exists | Only as a temporary step before hard delete |
| Hard delete (remove from DB) | ✅ Yes | Primary databases |
| Anonymize (replace with dummy data) | ✅ Yes | When you need to keep the record structure (e.g., for order history) |
| Crypto shredding (delete encryption key) | ✅ Yes | When data is encrypted per-user |
-- Anonymization example: keep order records, remove personal data
UPDATE orders
SET
    customer_name = 'REDACTED',
    customer_email = 'deleted-' || id || '@redacted.local',
    shipping_address = 'REDACTED',
    phone = NULL,
    anonymized_at = NOW()
WHERE customer_id = $1;
-- Then delete the user record entirely
DELETE FROM users WHERE id = $1;
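The two statements above should succeed or fail together, otherwise a crash between them can leave a user half-erased. Here is a small sketch of that idea using Python's built-in `sqlite3` module. The table layout and the `make_demo_db` helper are assumptions for illustration, and `CURRENT_TIMESTAMP` stands in for `NOW()`, which SQLite does not provide.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Build a throwaway in-memory database with one user and one order."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER,
            customer_name TEXT,
            customer_email TEXT,
            shipping_address TEXT,
            phone TEXT,
            total_cents INTEGER,
            anonymized_at TEXT
        );
        INSERT INTO users VALUES (1, 'ada@example.com');
        INSERT INTO orders VALUES
            (10, 1, 'Ada', 'ada@example.com', '1 Main St', '555-0100', 4200, NULL);
        """
    )
    return conn


def anonymize_and_delete(conn: sqlite3.Connection, user_id: int) -> None:
    # "with conn" commits if both statements succeed and rolls back
    # if either raises, so the erasure is all-or-nothing.
    with conn:
        conn.execute(
            """
            UPDATE orders
            SET customer_name = 'REDACTED',
                customer_email = 'deleted-' || id || '@redacted.local',
                shipping_address = 'REDACTED',
                phone = NULL,
                anonymized_at = CURRENT_TIMESTAMP
            WHERE customer_id = ?
            """,
            (user_id,),
        )
        conn.execute("DELETE FROM users WHERE id = ?", (user_id,))
```

The order row keeps its id and total, so revenue reporting still works, while nothing in it points back to a person.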
Consent Management
from datetime import datetime, timezone


class ConsentManager:
    """Track user consent with full audit trail."""

    PURPOSES = {
        'essential': 'Required for service operation (no consent needed)',
        'analytics': 'Anonymous usage analytics',
        'marketing': 'Email marketing and promotions',
        'personalization': 'Personalized content recommendations',
        'third_party': 'Data sharing with partners',
    }

    def __init__(self, db, current_policy_version: str):
        # db: async client exposing insert() and query() (assumed interface)
        self.db = db
        self.current_policy_version = current_policy_version

    async def record_consent(self, user_id: str, purpose: str,
                             granted: bool, source: str):
        if purpose not in self.PURPOSES:
            raise ValueError(f"Unknown consent purpose: {purpose}")
        # Append-only: never update old rows, so the history is the audit trail
        await self.db.insert('consent_records', {
            'user_id': user_id,
            'purpose': purpose,
            'granted': granted,
            'source': source,  # 'signup_form', 'settings_page', etc.
            'ip_address': None,  # Do NOT store IP with consent
            'timestamp': datetime.now(timezone.utc),
            'version': self.current_policy_version,
        })

    async def check_consent(self, user_id: str, purpose: str) -> bool:
        # Get most recent consent record for this purpose
        record = await self.db.query(
            'SELECT granted FROM consent_records '
            'WHERE user_id = $1 AND purpose = $2 '
            'ORDER BY timestamp DESC LIMIT 1',
            user_id, purpose
        )
        # No record means no consent; always return a real bool
        return bool(record and record['granted'])
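Recording consent only helps if every use of the data checks it first. The sketch below shows that gating step with a small in-memory stand-in for the consent table. `InMemoryConsentLog` and `send_marketing_email` are hypothetical names, and "most recent" here means the most recently recorded event rather than a timestamp comparison.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class ConsentEvent:
    user_id: str
    purpose: str
    granted: bool


class InMemoryConsentLog:
    """Append-only consent history; the latest event per purpose wins."""

    def __init__(self) -> None:
        self._events: List[ConsentEvent] = []

    def record(self, user_id: str, purpose: str, granted: bool) -> None:
        # Withdrawals are new events, not updates, so history is preserved.
        self._events.append(ConsentEvent(user_id, purpose, granted))

    def is_granted(self, user_id: str, purpose: str) -> bool:
        if purpose == "essential":
            return True  # needed to run the service, no consent required
        for event in reversed(self._events):
            if event.user_id == user_id and event.purpose == purpose:
                return event.granted
        return False  # no record means no consent


def send_marketing_email(log: InMemoryConsentLog, user_id: str,
                         send: Callable[[str], None]) -> bool:
    """Call send(user_id) only if marketing consent is currently granted."""
    if not log.is_granted(user_id, "marketing"):
        return False
    send(user_id)
    return True
```

Defaulting to "no consent" when no record exists keeps a missing row from being read as permission.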
Privacy by Design: Patterns That Work
| Pattern | Implementation | Example |
|---|---|---|
| Data minimization | Collect only what you need | Do not ask for birthdate if you do not need it |
| Pseudonymization | Replace identifiers with tokens | Use opaque user IDs in analytics, never emails |
| Encryption at rest | Encrypt PII columns or entire tables | AES-256 for PII fields, per-user keys for crypto shredding |
| Purpose binding | Tag data with its purpose at collection time | purpose='authentication' on email, cannot use for marketing |
| Automatic expiry | TTL on data that should not live forever | Log entries expire after 90 days |
| Access controls | Limit who can see what personal data | PII accessible only to specific service accounts |
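Two of these patterns fit in a few lines of standard-library Python: keyed pseudonymization for analytics identifiers, and a logging filter that scrubs email addresses before they reach a log sink. Treat this as a sketch, not a complete solution. The regular expression is a deliberately simple approximation, `EmailScrubFilter` is a hypothetical name, and the key would come from a secrets manager in practice. Pseudonymized values remain personal data under GDPR as long as the key exists.

```python
import hashlib
import hmac
import logging
import re

# Simplified email pattern; real addresses can be more exotic than this.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def pseudonymize(identifier: str, key: bytes) -> str:
    """Return a stable keyed token for an identifier (HMAC-SHA256, hex).

    The same identifier and key always give the same token, so joins in
    analytics still work, but the token cannot be recomputed without the key.
    """
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


class EmailScrubFilter(logging.Filter):
    """Replace anything that looks like an email address in log messages."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Render the message with its arguments first, then scrub the result.
        record.msg = EMAIL_RE.sub("[email]", record.getMessage())
        record.args = None
        return True  # keep the record, only its text is changed
```

Attaching the filter with `handler.addFilter(EmailScrubFilter())` acts as a safety net; the primary defense is still not passing email addresses to the logger at all.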
Implementation Checklist
- Create a data inventory: list every personal data field, where it is stored, and its legal basis
- Implement data retention policies: auto-delete data past its purpose
- Build the erasure pipeline: orchestrate deletion across all systems (DB, logs, third parties)
- Never log PII directly — use user IDs in logs, never emails or names
- Implement consent management with full audit trail
- Encrypt PII at rest (column-level or table-level encryption)
- Add data classification to your schema documentation
- Build data export API for right-to-portability (JSON format)
- Test the erasure pipeline monthly: create test user, request deletion, verify complete removal
- Train engineers on data classification: what is PII, what requires consent, what must auto-expire
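For the data export item, one workable shape is a function that asks each system for a user's data and bundles the answers into a single JSON document. This is a minimal sketch: `build_export`, the `sources` mapping, and the `format_version` field are assumptions, and a real endpoint would also need authentication and identity verification before releasing anything.

```python
import json
from datetime import datetime, timezone
from typing import Any, Callable, Dict, Optional


def build_export(user_id: str,
                 sources: Dict[str, Callable[[str], Any]],
                 generated_at: Optional[datetime] = None) -> str:
    """Collect a user's data from each source and return it as JSON.

    sources maps a section name (e.g. "profile") to a function that
    takes the user ID and returns JSON-serializable data.
    """
    generated_at = generated_at or datetime.now(timezone.utc)
    payload = {
        "format_version": 1,
        "user_id": user_id,
        "generated_at": generated_at.isoformat(),
        "data": {name: fetch(user_id) for name, fetch in sources.items()},
    }
    # default=str keeps dates and decimals from breaking serialization.
    return json.dumps(payload, indent=2, sort_keys=True, default=str)
```

Registering sources by name mirrors the erasure handler list: when a new system starts storing personal data, it gets one entry in each place.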