CCPA Data Engineering
Implement California Consumer Privacy Act technical requirements. Covers data discovery, consumer request automation, data deletion pipelines, opt-out mechanisms, privacy signals, and the patterns that make CCPA compliance a technical capability.
The California Consumer Privacy Act (CCPA), as amended by the California Privacy Rights Act (CPRA), gives California residents rights over their personal information: the right to know what data is collected, the right to delete it, the right to correct it, the right to opt out of its sale or sharing, and the right to non-discrimination. For engineering teams, this means building systems that can discover, report, delete, correct, and restrict personal data on demand.
Consumer Rights and Technical Requirements
Right to Know (§§1798.110, 1798.115):
Consumer asks: "What personal information do you have about me?"
Technical requirement:
☐ Discover all data stores containing PI
☐ Generate complete data inventory per consumer
☐ Deliver within 45 days (one 45-day extension is allowed with notice to the consumer)
☐ Machine-readable format (JSON, CSV)
Right to Delete (§1798.105):
Consumer asks: "Delete my personal information"
Technical requirement:
☐ Delete from all primary data stores
☐ Delete from backups (or flag for exclusion)
☐ Notify service providers to delete
☐ Confirm deletion within 45 days
Right to Opt Out (§1798.120):
Consumer says: "Do not sell/share my personal information"
Technical requirement:
☐ Global Privacy Control (GPC) signal detection
☐ "Do Not Sell" flag in user profile
☐ Propagate opt-out to all downstream systems
☐ Respect within 15 business days
Right to Correct (§1798.106):
Consumer says: "This information about me is wrong"
Technical requirement:
☐ Update in all systems of record
☐ Propagate corrections downstream
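The opt-out checklist above combines two signals: the browser-level Global Privacy Control and the stored "Do Not Sell" flag. A minimal sketch of that check follows, assuming the request headers are available as a plain mapping. The function names `gpc_signal_present` and `is_opted_out` are hypothetical; the only external fact relied on is that GPC is transmitted as the HTTP header `Sec-GPC: 1`.

```python
from typing import Mapping


def gpc_signal_present(headers: Mapping[str, str]) -> bool:
    """Return True when the request carries the GPC header `Sec-GPC: 1`."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    return normalized.get("sec-gpc") == "1"


def is_opted_out(headers: Mapping[str, str], profile_do_not_sell: bool) -> bool:
    """Treat the consumer as opted out if either signal says so."""
    # The stored flag covers logged-in users on any device; the header
    # covers the current browser even when no profile exists.
    return profile_do_not_sell or gpc_signal_present(headers)
```

Evaluating both signals in one place gives every downstream system a single answer to "may this data be sold or shared?" instead of each system re-deriving it.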
Data Discovery
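from datetime import datetime, timezone


class PIDiscoveryEngine:
    """Find all personal information across data stores."""

    def __init__(self, data_store_registry, audit_log):
        self.data_store_registry = data_store_registry
        self.audit_log = audit_log

    def discover_consumer_data(self, consumer_id: str):
        """Search all registered data stores for a consumer's PI."""
        results = {}
        for store in self.data_store_registry:
            try:
                # The request may identify the consumer by email, user ID,
                # or phone number, so match the value against each type.
                records = store.search(
                    identifiers=[
                        ("email", consumer_id),
                        ("user_id", consumer_id),
                        ("phone", consumer_id),
                    ]
                )
                if records:
                    results[store.name] = {
                        "categories": self.classify_pi(records),
                        "record_count": len(records),
                        "sources": store.collection_sources,
                        "purposes": store.processing_purposes,
                    }
            except Exception as e:
                # Keep scanning the remaining stores, but log the failure:
                # a store that errors out is missing from the report.
                self.audit_log.log_error(
                    f"Discovery failed for {store.name}: {e}"
                )
        # ConsumerDataReport is assumed to be defined elsewhere.
        return ConsumerDataReport(
            consumer_id=consumer_id,
            data_stores=results,
            generated_at=datetime.now(timezone.utc),  # timezone-aware UTC
        )

    def classify_pi(self, records):
        """Classify personal information by CCPA categories."""
        categories = set()
        for record in records:
            for field in record:
                # IP addresses, device IDs, and cookies are identifiers under
                # the CCPA definition; "Internet Activity" covers behavior
                # such as browsing and search history.
                if field in ("name", "address", "email", "phone",
                             "ip_address", "device_id", "cookies"):
                    categories.add("Identifiers")
                elif field in ("browsing_history", "search_history"):
                    categories.add("Internet Activity")
                elif field in ("purchase_history", "products_viewed"):
                    categories.add("Commercial Information")
                elif field in ("location", "gps"):
                    categories.add("Geolocation Data")
        # Sort so the report is stable across runs.
        return sorted(categories)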
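The Right to Know checklist asks for a machine-readable deliverable, so the discovery report needs a serialization step. The sketch below shows one way to export it as JSON. The `ConsumerDataReport` dataclass here is a hypothetical stand-in that carries only the three fields passed to the constructor above, and `report_to_json` is an illustrative helper, not part of any library.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass
class ConsumerDataReport:
    """Hypothetical stand-in mirroring the fields used by the discovery engine."""
    consumer_id: str
    data_stores: dict
    generated_at: datetime


def report_to_json(report: ConsumerDataReport) -> str:
    """Serialize a report to JSON, with the timestamp in ISO 8601 form."""
    payload = asdict(report)
    # datetime is not JSON-serializable, so convert it explicitly.
    payload["generated_at"] = report.generated_at.isoformat()
    return json.dumps(payload, indent=2, sort_keys=True)


report = ConsumerDataReport(
    consumer_id="c-123",
    data_stores={"crm": {"categories": ["Identifiers"], "record_count": 2}},
    generated_at=datetime(2024, 1, 15, tzinfo=timezone.utc),
)
exported = report_to_json(report)
```

Sorting the keys keeps the output stable, which makes it easier to diff two reports for the same consumer.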
Deletion Pipeline
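class DeletionExemptionError(Exception):
    """Raised by a store when data must be retained (fraud prevention, legal hold)."""


class DeletionPipeline:
    """Execute consumer data deletion requests."""

    def __init__(self, data_store_registry, service_providers, audit_log):
        self.data_store_registry = data_store_registry
        self.service_providers = service_providers
        self.audit_log = audit_log

    def execute_deletion(self, consumer_id: str, request_id: str):
        results = []
        for store in self.data_store_registry:
            try:
                deleted = store.delete_consumer_data(consumer_id)
                results.append({
                    "store": store.name,
                    "status": "deleted",
                    "records_deleted": deleted,
                })
            except DeletionExemptionError as e:
                # Some data exempt from deletion (fraud prevention, legal hold)
                results.append({
                    "store": store.name,
                    "status": "exempt",
                    "reason": str(e),
                })
            except Exception as e:
                # Record the failure and continue, so one broken store does
                # not abort the request or skip the audit entry.
                results.append({"store": store.name, "status": "failed", "reason": str(e)})

        # Notify service providers and record each outcome for the audit trail.
        # Assumes providers expose a `name` attribute, as stores do.
        for provider in self.service_providers:
            try:
                provider.request_deletion(consumer_id)
                status = "notified"
            except Exception as e:
                status = f"failed: {e}"
            results.append({"provider": provider.name, "status": status})

        self.audit_log.log_deletion(request_id, consumer_id, results)
        return results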
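Backups are usually immutable, so the checklist item "delete from backups (or flag for exclusion)" is often handled by the second option: keep a suppression list and apply it whenever a backup is restored. The sketch below illustrates that idea under stated assumptions. `SuppressionList` and `filter_restored_records` are hypothetical names, and records are assumed to be dicts with a `consumer_id` key.

```python
import hashlib
from typing import Iterable


class SuppressionList:
    """Remembers deleted consumers without storing their raw identifiers."""

    def __init__(self) -> None:
        self._hashes: set[str] = set()

    @staticmethod
    def _digest(consumer_id: str) -> str:
        # A plain SHA-256 digest is used for brevity; a keyed hash (HMAC)
        # would resist guessing attacks on low-entropy identifiers.
        return hashlib.sha256(consumer_id.encode("utf-8")).hexdigest()

    def add(self, consumer_id: str) -> None:
        """Flag a consumer for exclusion from any future restore."""
        self._hashes.add(self._digest(consumer_id))

    def contains(self, consumer_id: str) -> bool:
        return self._digest(consumer_id) in self._hashes


def filter_restored_records(
    records: Iterable[dict], suppressed: SuppressionList
) -> list[dict]:
    """Drop restored rows that belong to consumers who requested deletion."""
    return [r for r in records if not suppressed.contains(r["consumer_id"])]
```

A deletion pipeline would call `add` once the primary stores are purged, and every restore job would pass its rows through `filter_restored_records` before writing them back.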
Anti-Patterns
| Anti-Pattern | Consequence | Fix |
|---|---|---|
| Manual data discovery | Cannot respond within 45 days | Automated PI discovery across all stores |
| Deletion from primary only | PI persists in caches, logs, backups | Deletion pipeline covers all data stores |
| No GPC signal handling | Exposure to enforcement by the California AG or the CPPA | Detect and honor Global Privacy Control |
| No audit trail | Cannot prove compliance | Log every request, response, and timeline |
| Ignore service providers | PI still held by third parties | Automated downstream deletion notifications |
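Logging the timeline only helps if each request carries a computed due date. The sketch below derives one from the deadlines stated earlier: 45 days for know and delete requests, 15 business days for opt-outs. The function names and request type strings are hypothetical, and the business-day logic skips weekends only, ignoring public holidays.

```python
from datetime import date, timedelta


def add_business_days(start: date, days: int) -> date:
    """Advance by `days` weekdays (Mon-Fri); holidays are not considered."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday through Friday
            remaining -= 1
    return current


def response_deadline(request_type: str, received: date) -> date:
    """Return the latest date by which a request must be honored."""
    if request_type == "opt_out":
        return add_business_days(received, 15)
    if request_type in ("know", "delete"):
        return received + timedelta(days=45)
    raise ValueError(f"unknown request type: {request_type}")


# A request received on Monday, 1 January 2024.
received = date(2024, 1, 1)
know_due = response_deadline("know", received)        # 2024-02-15
opt_out_due = response_deadline("opt_out", received)  # 2024-01-22
```

Storing the due date alongside each audit entry lets a scheduled job flag requests that are approaching their limit.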
CCPA compliance is a data engineering problem. The companies that build privacy infrastructure — discovery, deletion pipelines, consent management — handle consumer requests in hours, not weeks.