Graph Databases: Neo4j Use Cases & Architecture
When and how to use graph databases in enterprise applications. Covers Neo4j fundamentals, Cypher queries, data modeling, performance tuning, and real-world use cases from fraud detection to recommendation engines.
Graph databases model data as nodes and relationships rather than tables and rows. When your queries care about how entities connect to each other, a graph database can outperform a relational database by orders of magnitude on deep traversals. A 12-hop relationship traversal that can take 30 minutes in PostgreSQL can complete in milliseconds in Neo4j, depending on data size and indexing.
The fundamental insight: relational databases store data and resolve relationships at query time through JOINs. Graph databases store relationships as first-class citizens, with each node holding direct references to its neighbors (index-free adjacency). Following a relationship therefore costs roughly O(1) per hop, whereas each JOIN needs an index lookup (about O(log n)) or, without an index, an O(n) scan.
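The difference between searching an edge table and following stored neighbor references can be sketched in a few lines of Python. This is a toy in-memory illustration, not how Neo4j or PostgreSQL are implemented; the `EDGES` data and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data: (source, target) friendship edges.
EDGES = [("alice", "bob"), ("bob", "carol"), ("carol", "dave"), ("alice", "erin")]


def neighbors_by_scan(edges, node):
    """Relational-style lookup with no index: scan every row of the edge table."""
    return [dst for src, dst in edges if src == node]


def build_adjacency(edges):
    """Graph-style storage: each node keeps direct references to its neighbors."""
    adjacency = defaultdict(list)
    for src, dst in edges:
        adjacency[src].append(dst)
    return adjacency


ADJACENCY = build_adjacency(EDGES)


def neighbors_by_adjacency(adjacency, node):
    """One dictionary lookup, independent of the total number of edges."""
    return adjacency.get(node, [])
```

Both functions return the same neighbors; only the cost differs. The scan grows with the size of the edge table, while the adjacency lookup grows only with the number of neighbors returned.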
When to Use a Graph Database
| Use Case | Why Graph Wins | Complexity |
|---|---|---|
| Social networks | “Friends of friends who like X”: multi-hop traversal | Medium |
| Fraud detection | Ring detection, suspicious transaction patterns | High |
| Recommendation engines | “Users who bought X also bought Y”: collaborative filtering | Medium |
| Knowledge graphs | Entity relationships, semantic search | High |
| Network/IT infrastructure | Dependency mapping, impact analysis | Medium |
| Supply chain | Track goods across multi-tier supplier networks | Medium |
| Authorization | Role hierarchies, permission inheritance | Low-Medium |
The Test: When SQL Gets Painful
If your SQL has self-joins, recursive CTEs, or you’re implementing a BFS/DFS algorithm in application code, a graph database is likely a better fit.
-- SQL: Find all friends-of-friends (painful)
SELECT DISTINCT f2.friend_id
FROM friendships f1
JOIN friendships f2 ON f1.friend_id = f2.user_id
WHERE f1.user_id = 'alice'
AND f2.friend_id != 'alice'
AND f2.friend_id NOT IN (
SELECT friend_id FROM friendships WHERE user_id = 'alice'
);
-- 3 hops? Add another JOIN. 4 hops? Another. N hops? Recursive CTE.
-- Each extra hop multiplies the intermediate row count, so cost grows roughly exponentially with depth.
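For the N-hop case, a recursive CTE might look like the sketch below. It uses Python's built-in `sqlite3` module with an in-memory database purely so the example is self-contained; the `friendships` table and its columns come from the query above, while the sample rows and the `people_within_hops` helper are assumptions. PostgreSQL accepts the same `WITH RECURSIVE` shape, though its parameter syntax differs.

```python
import sqlite3


def people_within_hops(conn, start, max_hops):
    """Return people reachable from `start` in 1..max_hops friendship hops."""
    rows = conn.execute(
        """
        WITH RECURSIVE reach(person, depth) AS (
            -- Anchor: direct friends are depth 1.
            SELECT friend_id, 1 FROM friendships WHERE user_id = :start
            UNION
            -- Recursive step: follow one more friendship, up to max_hops.
            SELECT f.friend_id, r.depth + 1
            FROM reach r
            JOIN friendships f ON f.user_id = r.person
            WHERE r.depth < :max_hops
        )
        SELECT DISTINCT person FROM reach WHERE person != :start ORDER BY person
        """,
        {"start": start, "max_hops": max_hops},
    ).fetchall()
    return [person for (person,) in rows]


# Hypothetical sample data: a four-person friendship ring.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE friendships (user_id TEXT, friend_id TEXT)")
conn.executemany(
    "INSERT INTO friendships VALUES (?, ?)",
    [("alice", "bob"), ("bob", "carol"), ("carol", "dave"), ("dave", "alice")],
)
```

The depth bound is what keeps the recursion from looping forever on cyclic data. Each level still re-joins against the whole table, which is where the cost at depth comes from.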
Performance Comparison at Depth
| Query Depth | SQL (PostgreSQL) | Graph (Neo4j) |
|---|---|---|
| 1 hop | <10ms | <5ms |
| 2 hops | 10-50ms | <5ms |
| 3 hops | 100-500ms | <10ms |
| 4 hops | 1-10 seconds | <10ms |
| 6 hops | 30+ seconds (may timeout) | <50ms |
| 12 hops | Not feasible | <100ms |
Neo4j Fundamentals
Data Model
Nodes: (Person {name: "Alice", age: 30})
(Company {name: "Acme Corp"})
Relationships: (Alice)-[:WORKS_AT {since: 2022}]->(Acme)
(Alice)-[:FRIENDS_WITH]->(Bob)
Labels: Person, Company, Project (like table names)
Properties: Key-value pairs on nodes and relationships
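As a rough mental model, the property graph above can be mirrored with two small Python dataclasses. This is only an illustrative sketch of the concepts (labels, properties, typed and directed relationships); the `Node` and `Relationship` classes are hypothetical and are not part of any Neo4j API.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set                                      # e.g. {"Person"}
    properties: dict = field(default_factory=dict)   # key-value pairs


@dataclass
class Relationship:
    type: str                                        # e.g. "WORKS_AT"
    start: Node                                      # relationships are directed
    end: Node
    properties: dict = field(default_factory=dict)   # relationships carry properties too


alice = Node({"Person"}, {"name": "Alice", "age": 30})
acme = Node({"Company"}, {"name": "Acme Corp"})
works_at = Relationship("WORKS_AT", alice, acme, {"since": 2022})
```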
Cypher Query Language
// Create nodes and relationships
CREATE (alice:Person {name: "Alice", role: "Engineer"})
CREATE (bob:Person {name: "Bob", role: "Manager"})
CREATE (acme:Company {name: "Acme Corp", industry: "Tech"})
CREATE (alice)-[:WORKS_AT {since: 2022, team: "Platform"}]->(acme)
CREATE (bob)-[:WORKS_AT {since: 2020, team: "Platform"}]->(acme)
CREATE (alice)-[:REPORTS_TO]->(bob)
// Query: Find Alice's coworkers
MATCH (alice:Person {name: "Alice"})-[:WORKS_AT]->(company)<-[:WORKS_AT]-(coworker)
RETURN coworker.name, company.name
// Query: Friends of friends (2 hops, excluding Alice and her direct friends)
MATCH (me:Person {name: "Alice"})-[:FRIENDS_WITH*2]-(fof:Person)
WHERE fof <> me AND NOT (me)-[:FRIENDS_WITH]-(fof)
RETURN DISTINCT fof.name
// Query: Shortest path between two people
MATCH path = shortestPath(
(a:Person {name: "Alice"})-[:FRIENDS_WITH*..6]-(b:Person {name: "Zara"})
)
RETURN path, length(path)
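To make the shortest-path query less of a black box, here is a minimal breadth-first search in Python over an undirected friendship map. It illustrates the general idea of a hop-bounded shortest path and is not Neo4j's actual algorithm; the `FRIENDS` data and the `shortest_path` function are hypothetical.

```python
from collections import deque


def shortest_path(adjacency, start, goal, max_hops=6):
    """Breadth-first search; returns the first (shortest) path found, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        if len(path) - 1 >= max_hops:
            continue  # hop budget used up on this branch
        for neighbor in adjacency.get(node, ()):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


# Hypothetical undirected friendships, listed in both directions.
FRIENDS = {
    "Alice": ["Bob", "Erin"],
    "Bob": ["Alice", "Carol"],
    "Carol": ["Bob", "Zara"],
    "Erin": ["Alice"],
    "Zara": ["Carol"],
}
```

Because the search expands level by level, the first path that reaches the goal is a shortest one, and the hop bound caps how much of the graph is explored.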
Real-World Use Cases
Fraud Detection
Detect circular payment patterns and suspicious clusters:
// Find circular money flows (potential money laundering)
MATCH path = (a:Account)-[:TRANSFERRED_TO*3..6]->(a)
WHERE ALL(t IN relationships(path) WHERE t.amount > 10000)
RETURN path,
reduce(total = 0, t IN relationships(path) | total + t.amount) AS total_flow
// Find accounts with unusually dense connections
MATCH (a:Account)-[t:TRANSFERRED_TO]->(b:Account)
WITH a, COUNT(DISTINCT b) AS connections, SUM(t.amount) AS total
WHERE connections > 50 AND total > 1000000
RETURN a.id, connections, total
ORDER BY connections DESC
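The circular-flow query can be approximated in plain Python with a depth-bounded walk, which may help when reasoning about what it returns. This is a sketch under assumptions: the `TRANSFERS` data and `find_cycles` are hypothetical, and unlike Cypher (which only forbids reusing a relationship within a path) this version forbids revisiting an account. Like the Cypher query, it reports a ring once per starting account.

```python
def find_cycles(transfers, min_len=3, max_len=6, min_amount=10_000):
    """Find transfer cycles of min_len..max_len hops where every amount exceeds min_amount."""
    outgoing = {}
    for src, dst, amount in transfers:
        if amount > min_amount:  # drop small transfers up front
            outgoing.setdefault(src, []).append((dst, amount))

    cycles = []

    def walk(origin, node, path, total):
        for dst, amount in outgoing.get(node, ()):
            hops = len(path)  # number of transfers after taking this one
            if dst == origin and min_len <= hops:
                cycles.append((path + [origin], total + amount))
            elif dst not in path and hops < max_len:
                walk(origin, dst, path + [dst], total + amount)

    for origin in outgoing:
        walk(origin, origin, [origin], 0)
    return cycles


# Hypothetical transfers: (from_account, to_account, amount).
TRANSFERS = [
    ("A", "B", 20_000),
    ("B", "C", 15_000),
    ("C", "A", 12_000),
    ("C", "D", 500),
    ("D", "A", 50_000),
]
```

Here the ring A → B → C → A is reported three times, once from each member, each with a total flow of 47,000. Deduplicating rotations is left out to keep the sketch short.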
Recommendation Engine
// Collaborative filtering: "People who bought X also bought..."
MATCH (user:Customer {id: "cust_123"})-[:PURCHASED]->(product:Product)
<-[:PURCHASED]-(other:Customer)-[:PURCHASED]->(rec:Product)
WHERE NOT (user)-[:PURCHASED]->(rec)
RETURN rec.name, COUNT(other) AS score
ORDER BY score DESC
LIMIT 10
// Content-based: Similar products by shared attributes
MATCH (p:Product {id: "prod_456"})-[:IN_CATEGORY]->(cat)<-[:IN_CATEGORY]-(similar)
WHERE similar <> p
OPTIONAL MATCH (similar)<-[r:PURCHASED]-()
RETURN similar.name, similar.price, COUNT(r) AS popularity
ORDER BY popularity DESC
LIMIT 10
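The scoring in the collaborative-filtering query can be reproduced in a few lines of Python, which is useful for sanity-checking results on a small sample. The `PURCHASES` data and the `recommend` function are hypothetical; the score mirrors the query's `COUNT(other)`, which counts one row per matching path.

```python
from collections import Counter


def recommend(purchases, customer, limit=10):
    """Score products bought by customers who share at least one purchase with `customer`."""
    mine = purchases.get(customer, set())
    scores = Counter()
    for other, theirs in purchases.items():
        shared = mine & theirs
        if other == customer or not shared:
            continue
        for product in theirs - mine:
            # One point per (shared product, other customer) pair, mirroring
            # how the Cypher pattern yields one row per matching path.
            scores[product] += len(shared)
    return scores.most_common(limit)


# Hypothetical purchase history: customer id -> set of product names.
PURCHASES = {
    "cust_123": {"laptop", "mouse"},
    "cust_200": {"laptop", "mouse", "dock"},
    "cust_300": {"laptop", "keyboard"},
    "cust_400": {"monitor"},
}
```

A customer who overlaps on two products contributes two points to each recommendation, so heavier overlap weighs more. Switching the query to `COUNT(DISTINCT other)` would instead count each customer once.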
Access Control / Authorization
// Does Alice have access to this document through any permission path?
MATCH path = (user:User {name: "Alice"})-[:MEMBER_OF|HAS_ROLE|INHERITS*1..5]->
(perm)-[:GRANTS_ACCESS]->(doc:Document {id: "doc_789"})
RETURN COUNT(path) > 0 AS has_access
// What can Alice access? (full permission tree)
MATCH (user:User {name: "Alice"})-[:MEMBER_OF|HAS_ROLE|INHERITS*1..5]->
(perm)-[:GRANTS_ACCESS]->(resource)
RETURN DISTINCT resource.name, resource.type, labels(resource)
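The access check boils down to bounded reachability: follow membership, role, and inheritance edges for up to five hops, then look for a grant. The Python sketch below shows that logic over a hypothetical edge list. `EDGES`, `TRAVERSABLE`, and `has_access` are illustrative names, and the list scan is used only for brevity.

```python
from collections import deque

# Hypothetical edges: (source, relationship type, target).
EDGES = [
    ("Alice", "MEMBER_OF", "Engineering"),
    ("Engineering", "HAS_ROLE", "Editor"),
    ("Editor", "INHERITS", "Viewer"),
    ("Viewer", "GRANTS_ACCESS", "doc_789"),
    ("Bob", "MEMBER_OF", "Sales"),
]
TRAVERSABLE = {"MEMBER_OF", "HAS_ROLE", "INHERITS"}


def has_access(edges, user, resource, max_hops=5):
    """True if `user` reaches a granting node within 1..max_hops traversable hops."""
    frontier = deque([(user, 0)])
    seen = {user}
    while frontier:
        node, hops = frontier.popleft()
        for src, rel, dst in edges:
            if src != node:
                continue
            # The pattern requires at least one hop before the grant.
            if rel == "GRANTS_ACCESS" and dst == resource and hops >= 1:
                return True
            if rel in TRAVERSABLE and hops < max_hops and dst not in seen:
                seen.add(dst)
                frontier.append((dst, hops + 1))
    return False
```

Breadth-first order means each node is first reached by its shortest chain, so the hop limit is applied to the shortest available permission path.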
Data Modeling Best Practices
1. Relationships Are First-Class Citizens
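// ❌ Don't store relationships as properties
CREATE (a:Person {name: "Alice", friends: ["bob", "charlie"]})
// ✅ Model them as graph relationships between real nodes
CREATE (a:Person {name: "Alice"}), (b:Person {name: "Bob"}), (c:Person {name: "Charlie"})
CREATE (a)-[:FRIENDS_WITH {since: 2023}]->(b)
CREATE (a)-[:FRIENDS_WITH {since: 2024}]->(c)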
2. Choose Between Embedding and Connecting
| Approach | Use When | Example |
|---|---|---|
| Embed (properties) | Few values, rarely queried independently | skills: ["Python", "Go"] |
| Connect (nodes) | Many values, shared across entities, queried independently | (Person)-[:HAS_SKILL]->(Skill) |
// Embed: few values, rarely queried independently
CREATE (p:Person {name: "Alice", skills: ["Python", "Go", "SQL"]})
// Connect: many values, queried independently, shared across nodes
CREATE (p:Person {name: "Alice"})
CREATE (s:Skill {name: "Python"})
CREATE (p)-[:HAS_SKILL {level: "expert", years: 8}]->(s)
3. Use Relationship Types Liberally
// ❌ Generic relationship with type property
(a)-[:INTERACTED {type: "purchased", date: "2026-01-15"}]->(b)
// ✅ Specific relationship types (faster traversal)
(a)-[:PURCHASED {date: "2026-01-15"}]->(b)
(a)-[:VIEWED {date: "2026-01-14"}]->(b)
(a)-[:WISHLISTED {date: "2026-01-10"}]->(b)
Performance Optimization
Indexes
// Unique constraint + index
CREATE CONSTRAINT FOR (p:Person) REQUIRE p.id IS UNIQUE;
// Composite index for lookup patterns
CREATE INDEX FOR (p:Product) ON (p.category, p.status);
// Full-text search index
CREATE FULLTEXT INDEX productSearch FOR (p:Product) ON EACH [p.name, p.description];
CALL db.index.fulltext.queryNodes("productSearch", "wireless headphones")
YIELD node, score
RETURN node.name, score LIMIT 10;
Query Optimization Rules
| Rule | Bad | Good |
|---|---|---|
| Bound depth | MATCH (a)-[*]->(b) | MATCH (a)-[*1..5]->(b) |
| Avoid cartesian products | MATCH (a:Person), (b:Product) | MATCH (a:Person)-[:PURCHASED]->(b:Product) |
| Use PROFILE | Guess at performance | PROFILE MATCH ... RETURN ... |
| Filter early | Filter after collecting all data | Use WHERE in the MATCH clause |
| Use parameters | Inline values (no plan caching) | $userId parameter (plan caching) |
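As an illustration of the "use parameters" rule, the sketch below passes the lookup value separately from the query text. It assumes `session` behaves like a session from the official `neo4j` Python driver, whose `run` method accepts query parameters as keyword arguments. The driver is not imported here, and the `find_coworkers` helper and `COWORKERS_QUERY` constant are hypothetical.

```python
COWORKERS_QUERY = """
MATCH (p:Person {name: $name})-[:WORKS_AT]->(c)<-[:WORKS_AT]-(coworker)
RETURN coworker.name AS coworker, c.name AS company
"""


def find_coworkers(session, name):
    """Run the query with `name` passed as a parameter rather than spliced into the text."""
    result = session.run(COWORKERS_QUERY, name=name)
    return [(record["coworker"], record["company"]) for record in result]
```

Because the query text stays identical across calls, the database can reuse one plan for every name. Keeping user input out of the query string also avoids Cypher injection.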
Neo4j vs Alternatives
| Feature | Neo4j | Amazon Neptune | ArangoDB | JanusGraph |
|---|---|---|---|---|
| Query language | Cypher | Gremlin/openCypher/SPARQL | AQL | Gremlin |
| Hosting | Self-hosted + Aura (cloud) | AWS managed | Self-hosted + cloud | Self-hosted |
| ACID | Full | Full | Full | Eventual (configurable) |
| Scalability | Composite databases, formerly Fabric (sharding) | Auto-scaling | Native sharding | Distributed |
| Visualization | Neo4j Browser (excellent) | Limited | Built-in | Third-party |
| Ecosystem | Largest, most mature | AWS integrated | Multi-model | Open-source |
| Best for | Most use cases, richest community | AWS-native shops | Multi-model needs | JVM/Hadoop ecosystems |
When NOT to Use a Graph Database
| Scenario | Better Alternative |
|---|---|
| Simple CRUD with no relationship queries | PostgreSQL, MySQL |
| Tabular analytics and aggregations | Data warehouse (Snowflake, BigQuery) |
| High-volume writes with minimal reads | Time-series DB (TimescaleDB), append-only store |
| Your “graph” is really just a tree | Nested sets or materialized paths in SQL |
| Full-text search as primary use case | Elasticsearch |
| Key-value lookups only | Redis, DynamoDB |
Graph Database Market Comparison
| Database | License | Hosting | Query Language | Best For |
|---|---|---|---|---|
| Neo4j | Community (free) / Enterprise | Self-hosted, AuraDB (cloud) | Cypher | General-purpose graph, knowledge graphs |
| Amazon Neptune | Proprietary | AWS only | Gremlin, openCypher, SPARQL | AWS-native, RDF support |
| ArangoDB | Apache 2.0 through 3.11; BSL 1.1 from 3.12 | Self-hosted, ArangoDB Cloud | AQL | Multi-model (graph + document + key-value) |
| TigerGraph | Community / Enterprise | Self-hosted, TigerGraph Cloud | GSQL | Large-scale analytics, deep link traversal |
| Dgraph | Apache 2.0 | Self-hosted, Dgraph Cloud | DQL (GraphQL-like) | GraphQL-native applications |
When to Start with a Graph Database
Start with a graph database from day one when:
- Your core value proposition depends on relationships (social networks, recommendation engines, fraud detection)
- You need real-time traversals at depth 3+ (friends-of-friends, supply chain tracing)
- Your schema evolves frequently and relationships are first-class entities
- You are building a knowledge graph that connects disparate data sources
Getting Started Checklist
- Identified the core entities (nodes) and their relationships
- Drawn the data model as a whiteboard graph
- Validated that traversal queries are a primary access pattern
- Compared performance at depth: SQL vs graph on your data
- Chosen between Neo4j Aura (managed) and self-hosted
- Created unique constraints on primary identifiers
- Bounded all variable-length path queries
- Used specific relationship types (not generic with type property)
- Populated test data and benchmarked critical queries with PROFILE
- Decided embed vs connect for each property
- Set up APOC library for advanced procedures
:::note[Source]
This guide is derived from operational intelligence at Garnet Grid Consulting. For database architecture consulting, visit garnetgrid.com.
:::