GDPR for Engineers: Building Privacy-Compliant Systems

GDPR is not a legal problem you hand to the legal team and forget about. It is an engineering constraint that affects how you design databases, APIs, logging systems, and data pipelines. Every INSERT INTO users statement, every log line that contains an email address, every analytics event that includes a user ID — these are all privacy decisions that engineers make dozens of times a day, usually without realizing it.

This guide covers the engineering patterns that make GDPR compliance a natural part of your architecture rather than a painful retrofit.

The 8 Principles That Affect Your Architecture

GDPR Principle	What It Means for Engineering
Lawful basis	Every piece of personal data needs a legal reason to exist in your system
Purpose limitation	Data collected for login cannot be reused for marketing without its own lawful basis, typically separate consent
Data minimization	Do not collect data you do not need. Do not keep data longer than necessary
Accuracy	Users can correct their data. Your system must support updates
Storage limitation	Data has a shelf life. Auto-delete when no longer needed
Integrity & confidentiality	Encrypt at rest and in transit. Access controls everywhere
Accountability	You must prove compliance, not just claim it. Audit trails required
Rights of data subjects	Users can access, export, correct, and delete their data

Data Classification: Know What You Have

Before you can protect personal data, you need to know where it lives. This is harder than it sounds because personal data spreads through systems like water finding cracks.

┌─────────────────────────────────────────┐
│  PERSONAL DATA CATEGORIES               │
├─────────────────────────────────────────┤
│  Directly Identifying:                  │
│    Name, Email, Phone, Address          │
│    → Always encrypted at rest           │
│    → Never in logs                      │
│    → Retention: legal minimum only      │
├─────────────────────────────────────────┤
│  Indirectly Identifying:                │
│    User ID, IP Address, Device ID       │
│    → Pseudonymized where possible       │
│    → Hashed in analytics                │
│    → Retention: purpose-dependent       │
├─────────────────────────────────────────┤
│  Sensitive (Special Category):          │
│    Health, Religion, Ethnicity,         │
│    Biometrics, Political opinions       │
│    → Explicit consent required          │
│    → Encrypted with separate keys       │
│    → Strict access controls + audit log │
├─────────────────────────────────────────┤
│  Non-Personal:                          │
│    Aggregated metrics, anonymized data  │
│    → Outside GDPR if truly anonymous    │
│    → Still good practice to protect     │
└─────────────────────────────────────────┘
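The "Never in logs" rule for directly identifying data is easier to keep when the logging layer enforces it. The following is a minimal sketch using Python's standard `logging` module. The names `EMAIL_RE`, `redact`, and `RedactEmailFilter` are hypothetical, and the pattern only catches email addresses; names, phone numbers, and addresses would need their own handling.

```python
import logging
import re

# Simple email pattern for illustration; an assumption, not a full RFC 5322 matcher.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact(text: str) -> str:
    """Replace anything that looks like an email address with a placeholder."""
    return EMAIL_RE.sub("[REDACTED_EMAIL]", text)


class RedactEmailFilter(logging.Filter):
    """Logging filter that scrubs email addresses from each record."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Render the message with its arguments first, then scrub the result.
        record.msg = redact(record.getMessage())
        record.args = ()
        return True  # keep the record, now redacted


logger = logging.getLogger("app")
logger.addFilter(RedactEmailFilter())
```

A filter attached to a logger only sees records logged directly on that logger. Attaching it to a handler instead would cover every logger that writes through that handler.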

Data Inventory Template

Data Field	Category	Where Stored	Legal Basis	Retention	Deletion Method
Email	Direct PII	users table, email service, logs	Contract	Account lifetime + 30 days	Hard delete + provider API
IP Address	Indirect PII	access logs, CDN logs	Legitimate interest	90 days	Auto-purge cron
Name	Direct PII	users table, billing	Contract	Account lifetime	Hard delete
Analytics events	Indirect PII	analytics DB	Consent	26 months	Automated TTL
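An inventory is more useful when code can read it. As a rough sketch, the two rows above that have a fixed retention period could be expressed as data and used to decide when a record is due for deletion. `InventoryEntry`, `INVENTORY`, and `is_expired` are hypothetical names, and 26 months is approximated here as 26 × 30 days.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class InventoryEntry:
    field: str
    category: str
    legal_basis: str
    retention: timedelta  # how long a record may be kept after collection


# Two rows from the inventory table, keyed by field name.
INVENTORY = {
    "ip_address": InventoryEntry(
        "ip_address", "indirect", "legitimate_interest", timedelta(days=90)
    ),
    "analytics_event": InventoryEntry(
        "analytics_event", "indirect", "consent", timedelta(days=26 * 30)
    ),
}


def is_expired(field: str, collected_at: datetime, now: datetime) -> bool:
    """True once a record has outlived the retention period for its field."""
    entry = INVENTORY[field]
    return now - collected_at > entry.retention
```

A purge job could call `is_expired` for each record, or translate `retention` into a database-level TTL where the store supports one.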

Right to Erasure: “Delete My Data”

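This is the most technically challenging GDPR requirement. A user requests deletion, and you must remove their personal data from everywhere: primary databases, backups, logs, analytics, third-party services, and caches. Data you are legally obliged to keep (tax records, for example) is exempt, but everything else has to go.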

Erasure Architecture

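import hashlib
from datetime import datetime, timezone


class DataErasureService:
    """Orchestrate user data deletion across all systems."""

    def __init__(self, audit_log):
        # audit_log: any object with an async record(**fields) method
        self.audit_log = audit_log
        # Handler classes are defined elsewhere. Each one exposes a `name`
        # attribute and an async `erase(user_id)` method.
        self.handlers = [
            PrimaryDatabaseHandler(),
            AnalyticsDatabaseHandler(),
            LogPurgeHandler(),
            ThirdPartyServiceHandler(),   # Stripe, Sendgrid, etc.
            SearchIndexHandler(),
            CacheInvalidationHandler(),
            BackupRedactionHandler(),
        ]

    async def erase_user(self, user_id: str, request_id: str) -> ErasureReport:
        report = ErasureReport(user_id=user_id, request_id=request_id)

        for handler in self.handlers:
            try:
                result = await handler.erase(user_id)
                report.add_success(handler.name, result)
            except Exception as e:
                report.add_failure(handler.name, str(e))
                # Do not stop: continue with the other handlers.
                # Failures are recorded in the report so they can be retried.

        # Audit trail (must NOT contain personal data)
        await self.audit_log.record(
            event="data_erasure",
            request_id=request_id,
            # Stable digest, not the real ID. The built-in hash() is randomized
            # per process for strings, so its output cannot be matched later.
            # A keyed HMAC is a stronger choice when IDs are guessable.
            user_id_hash=hashlib.sha256(user_id.encode("utf-8")).hexdigest(),
            results=report.summary(),
            timestamp=datetime.now(timezone.utc),
        )

        return report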
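The service above relies on an `ErasureReport` and a set of handlers that the article does not show. One possible shape for them is sketched below, purely as an assumption: the fields of `ErasureReport`, the `InMemoryCacheHandler` class, and the `user:<id>:` key format are all hypothetical.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class ErasureReport:
    """Collects per-handler outcomes for one erasure request."""
    user_id: str
    request_id: str
    successes: dict = field(default_factory=dict)
    failures: dict = field(default_factory=dict)

    def add_success(self, handler_name: str, result) -> None:
        self.successes[handler_name] = result

    def add_failure(self, handler_name: str, error: str) -> None:
        self.failures[handler_name] = error

    def summary(self) -> dict:
        # Handler names and a completion flag only; no personal data.
        return {
            "succeeded": sorted(self.successes),
            "failed": sorted(self.failures),
            "complete": not self.failures,
        }


class InMemoryCacheHandler:
    """Toy handler: deletes every cache key that belongs to the user."""
    name = "cache"

    def __init__(self, cache: dict):
        self.cache = cache

    async def erase(self, user_id: str) -> int:
        keys = [k for k in self.cache if k.startswith(f"user:{user_id}:")]
        for key in keys:
            del self.cache[key]
        return len(keys)  # number of entries removed


async def demo() -> dict:
    cache = {"user:42:profile": "p", "user:42:session": "s", "user:7:profile": "p"}
    handler = InMemoryCacheHandler(cache)
    report = ErasureReport(user_id="42", request_id="req-1")
    report.add_success(handler.name, await handler.erase("42"))
    return report.summary()
```

Because `summary()` lists which handlers failed, a background job could re-run only those handlers until `complete` is true.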

Soft Delete vs Hard Delete

Approach	GDPR Compliant?	Use When
Soft delete (set `deleted=true`)	❌ No — data still exists	Only as a temporary step before hard delete
Hard delete (remove from DB)	✅ Yes	Primary databases
Anonymize (replace with dummy data)	✅ Yes, if the result cannot be linked back to the person	When you need to keep the record structure (e.g., for order history)
Crypto shredding (delete encryption key)	✅ Yes, provided every copy of the key is destroyed	When data is encrypted per-user

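-- Anonymization example: keep order records, remove personal data.
-- Run both statements in one transaction so a failure cannot leave the user half-erased.
UPDATE orders
SET
  customer_name = 'REDACTED',
  customer_email = 'deleted-' || id || '@redacted.local',
  shipping_address = 'REDACTED',
  phone = NULL,
  anonymized_at = NOW()
WHERE customer_id = $1;

-- Then delete the user record entirely.
-- If orders.customer_id is a foreign key to users.id, it needs ON DELETE SET NULL
-- (or must be cleared in the UPDATE), otherwise this DELETE will fail.
DELETE FROM users WHERE id = $1;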
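To show the anonymize-then-delete step running atomically, here is a small sketch using Python's built-in `sqlite3` module and an in-memory database. The schema is a simplified assumption based on the SQL above. Clearing `customer_id` in the same update is an addition of this sketch, so the remaining order rows can no longer be grouped by customer.

```python
import sqlite3


def anonymize_and_delete(conn: sqlite3.Connection, user_id: int) -> None:
    """Anonymize a user's orders and delete the user row in one transaction."""
    # "with conn" commits on success and rolls back if either statement raises.
    with conn:
        conn.execute(
            """
            UPDATE orders
            SET customer_name = 'REDACTED',
                customer_email = 'deleted-' || id || '@redacted.local',
                customer_id = NULL
            WHERE customer_id = ?
            """,
            (user_id,),
        )
        conn.execute("DELETE FROM users WHERE id = ?", (user_id,))


# Hypothetical, simplified schema for the demonstration.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY, customer_id INTEGER,
        customer_name TEXT, customer_email TEXT, total REAL
    );
    INSERT INTO users VALUES (1, 'alice@example.com');
    INSERT INTO orders VALUES (10, 1, 'Alice', 'alice@example.com', 19.99);
    """
)
anonymize_and_delete(conn, 1)
```

After the call, the order row keeps its `id` and `total` for reporting, while the name, email, and customer link are gone and the `users` row no longer exists.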

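Consent Management

from datetime import datetime, timezone


class ConsentManager:
    """Track user consent with full audit trail."""

    PURPOSES = {
        'essential': 'Required for service operation (no consent needed)',
        'analytics': 'Anonymous usage analytics',
        'marketing': 'Email marketing and promotions',
        'personalization': 'Personalized content recommendations',
        'third_party': 'Data sharing with partners',
    }

    def __init__(self, db, current_policy_version: str):
        # db: any object with async insert(table, row) and query(sql, *params)
        self.db = db
        self.current_policy_version = current_policy_version

    async def record_consent(self, user_id: str, purpose: str,
                              granted: bool, source: str):
        if purpose not in self.PURPOSES:
            raise ValueError(f"Unknown consent purpose: {purpose}")
        # Append-only: every grant or withdrawal is a new row, never an update
        await self.db.insert('consent_records', {
            'user_id': user_id,
            'purpose': purpose,
            'granted': granted,
            'source': source,           # 'signup_form', 'settings_page', etc.
            'ip_address': None,         # Do NOT store IP with consent
            'timestamp': datetime.now(timezone.utc),
            'version': self.current_policy_version,
        })

    async def check_consent(self, user_id: str, purpose: str) -> bool:
        if purpose == 'essential':
            return True  # No consent needed, per PURPOSES
        # Get most recent consent record for this purpose
        record = await self.db.query(
            'SELECT granted FROM consent_records '
            'WHERE user_id = $1 AND purpose = $2 '
            'ORDER BY timestamp DESC LIMIT 1',
            user_id, purpose
        )
        # No record means consent was never given
        return bool(record and record['granted'])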
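The `ORDER BY timestamp DESC LIMIT 1` query in `check_consent` means that the latest record wins, so a withdrawal overrides any earlier grant. The same rule can be illustrated without a database. This is only a sketch: `ConsentRecord`, `has_consent`, and `NO_CONSENT_NEEDED` are hypothetical names, and treating `essential` as always allowed follows the description in `PURPOSES`.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Purposes that never require a consent record (assumption based on PURPOSES).
NO_CONSENT_NEEDED = {"essential"}


@dataclass(frozen=True)
class ConsentRecord:
    user_id: str
    purpose: str
    granted: bool
    timestamp: datetime


def has_consent(ledger: list, user_id: str, purpose: str) -> bool:
    """Return the user's current consent state for a purpose."""
    if purpose in NO_CONSENT_NEEDED:
        return True
    matching = [r for r in ledger if r.user_id == user_id and r.purpose == purpose]
    if not matching:
        return False  # no record means consent was never given
    latest = max(matching, key=lambda r: r.timestamp)
    return latest.granted


ledger = [
    ConsentRecord("u1", "marketing", True, datetime(2024, 1, 1, tzinfo=timezone.utc)),
    # A later withdrawal is appended as a new record, not an update.
    ConsentRecord("u1", "marketing", False, datetime(2024, 3, 1, tzinfo=timezone.utc)),
]
```

Keeping the ledger append-only preserves the full history, which is what lets you show when consent was given and when it was withdrawn.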

Privacy by Design: Patterns That Work

Pattern	Implementation	Example
Data minimization	Collect only what you need	Do not ask for birthdate if you do not need it
Pseudonymization	Replace identifiers with tokens	Use opaque user IDs in analytics, never emails
Encryption at rest	Encrypt PII columns or entire tables	AES-256 for PII fields, per-user keys for crypto shredding
Purpose binding	Tag data with its purpose at collection time	`purpose='authentication'` on email, cannot use for marketing
Automatic expiry	TTL on data that should not live forever	Log entries expire after 90 days
Access controls	Limit who can see what personal data	PII accessible only to specific service accounts
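For the pseudonymization row, one common approach is a keyed hash, so that analytics events carry a stable token instead of the real user ID or email. The sketch below uses HMAC-SHA256 from the Python standard library. `pseudonymize` and `ANALYTICS_KEY` are hypothetical names, and the key shown is a placeholder. Note that pseudonymized data is still personal data under GDPR, because anyone holding the key can re-link the token to the user.

```python
import hashlib
import hmac


def pseudonymize(user_id: str, secret_key: bytes) -> str:
    """Return a stable, keyed token for user_id (HMAC-SHA256, hex-encoded)."""
    return hmac.new(secret_key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


# Placeholder key for illustration; a real key would come from a secrets manager.
ANALYTICS_KEY = b"example-key-do-not-use"

event = {
    "event": "page_view",
    "user_token": pseudonymize("user-123", ANALYTICS_KEY),  # no raw ID or email
}
```

Using a key rather than a plain hash means that someone who obtains the analytics data cannot rebuild the mapping by hashing a list of known user IDs.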
