Graph Databases: Neo4j Use Cases & Architecture

Graph databases model data as nodes and relationships rather than tables and rows. When your queries care about how entities connect to each other, a graph database can outperform a relational database by orders of magnitude. A 12-hop relationship traversal that can take many minutes, or never finish, as a chain of JOINs in PostgreSQL can complete in milliseconds in Neo4j.

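The fundamental insight: relational databases store data and reconstruct relationships at query time with JOINs. Graph databases store relationships as first-class citizens. With index-free adjacency, each node holds direct references to its relationships, so the cost of a hop depends on how many relationships that node has rather than on the total size of the graph. A JOIN, by contrast, costs roughly O(log n) per lookup with an index and O(n) without one.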
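To make the per-hop cost concrete, here is a minimal Python sketch contrasting a scan over an edge table with a lookup in a per-node adjacency list. It is an illustration only, not how Neo4j or PostgreSQL are implemented internally; `EDGE_TABLE`, `neighbors_by_scan`, and `neighbors_by_adjacency` are hypothetical names introduced for this example.

```python
from collections import defaultdict

# Hypothetical friendship edges, stored the way a join table would hold them.
EDGE_TABLE = [
    ("alice", "bob"),
    ("alice", "carol"),
    ("bob", "dave"),
    ("carol", "dave"),
    ("dave", "erin"),
]


def neighbors_by_scan(user):
    """Unindexed join-style lookup: inspects every row, so cost grows with table size."""
    return [dst for src, dst in EDGE_TABLE if src == user]


def build_adjacency(edges):
    """Store each node's outgoing edges next to the node itself."""
    adjacency = defaultdict(list)
    for src, dst in edges:
        adjacency[src].append(dst)
    return adjacency


ADJACENCY = build_adjacency(EDGE_TABLE)


def neighbors_by_adjacency(user):
    """Adjacency lookup: cost depends on this node's degree, not on total edge count."""
    return ADJACENCY.get(user, [])
```

Both functions return the same neighbors; the difference is how much data each one has to touch to find them.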

When to Use a Graph Database

Use Case	Why Graph Wins	Complexity
Social networks	“Friends of friends who like X”: multi-hop traversal	Medium
Fraud detection	Ring detection, suspicious transaction patterns	High
Recommendation engines	“Users who bought X also bought Y”: collaborative filtering	Medium
Knowledge graphs	Entity relationships, semantic search	High
Network/IT infrastructure	Dependency mapping, impact analysis	Medium
Supply chain	Track goods across multi-tier supplier networks	Medium
Authorization	Role hierarchies, permission inheritance	Low-Medium

The Test: When SQL Gets Painful

If your SQL has self-joins, recursive CTEs, or you’re implementing a BFS/DFS algorithm in application code, a graph database is likely a better fit.

-- SQL: Find all friends-of-friends (painful)
SELECT DISTINCT f2.friend_id
FROM friendships f1
JOIN friendships f2 ON f1.friend_id = f2.user_id
WHERE f1.user_id = 'alice'
  AND f2.friend_id != 'alice'
  AND f2.friend_id NOT IN (
    SELECT friend_id FROM friendships WHERE user_id = 'alice'
  );

-- 3 hops? Add another JOIN. 4 hops? Another. N hops? Recursive CTE.
-- Each extra hop multiplies the intermediate rows by the average friend count,
-- so cost grows roughly exponentially with depth.
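The growth with depth can be illustrated with simple arithmetic. The sketch below assumes every user has the same number of friends (`fanout`, a deliberate simplification) and computes an upper bound on the intermediate rows a chain of self-JOINs can produce before `DISTINCT` removes duplicates. The function name is hypothetical.

```python
def intermediate_rows(fanout: int, hops: int) -> int:
    """Upper bound on rows produced by `hops` chained self-JOINs at uniform fan-out."""
    return fanout ** hops


# Print the bound for 1 to 4 hops, assuming 50 friends per user.
for hops in range(1, 5):
    print(hops, intermediate_rows(50, hops))
```

With an assumed fan-out of 50, the bound is 50, 2,500, 125,000, and 6,250,000 rows for 1 to 4 hops.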

Performance Comparison at Depth

Query Depth	SQL (PostgreSQL)	Graph (Neo4j)
1 hop	<10ms	<5ms
2 hops	10-50ms	<5ms
3 hops	100-500ms	<10ms
4 hops	1-10 seconds	<10ms
6 hops	30+ seconds (may timeout)	<50ms
12 hops	Not feasible	<100ms

Neo4j Fundamentals

Data Model

Nodes:       (Person {name: "Alice", age: 30})
             (Company {name: "Acme Corp"})

Relationships: (Alice)-[:WORKS_AT {since: 2022}]->(Acme)
               (Alice)-[:FRIENDS_WITH]->(Bob)

Labels:      Person, Company, Project (like table names)
Properties:  Key-value pairs on nodes and relationships
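For readers who think in application code, the property graph model above can be approximated with two small Python dataclasses. This is a conceptual sketch only; the `Node` and `Relationship` classes are hypothetical and are not part of any Neo4j API.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set[str]                                  # e.g. {"Person"}
    properties: dict = field(default_factory=dict)    # key-value pairs


@dataclass
class Relationship:
    start: Node                                       # relationships are directed
    type: str                                         # exactly one type, e.g. "WORKS_AT"
    end: Node
    properties: dict = field(default_factory=dict)    # relationships carry properties too


alice = Node({"Person"}, {"name": "Alice", "age": 30})
acme = Node({"Company"}, {"name": "Acme Corp"})
works_at = Relationship(alice, "WORKS_AT", acme, {"since": 2022})
```

The point to notice is that the relationship is its own object with a type and properties, not a foreign-key column on either node.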

Cypher Query Language

// Create nodes and relationships
CREATE (alice:Person {name: "Alice", role: "Engineer"})
CREATE (bob:Person {name: "Bob", role: "Manager"})
CREATE (acme:Company {name: "Acme Corp", industry: "Tech"})
CREATE (alice)-[:WORKS_AT {since: 2022, team: "Platform"}]->(acme)
CREATE (bob)-[:WORKS_AT {since: 2020, team: "Platform"}]->(acme)
CREATE (alice)-[:REPORTS_TO]->(bob)

// Query: Find Alice's coworkers
MATCH (alice:Person {name: "Alice"})-[:WORKS_AT]->(company)<-[:WORKS_AT]-(coworker)
RETURN coworker.name, company.name

// Query: Friends of friends (2 hops, trivial in Cypher)
MATCH (me:Person {name: "Alice"})-[:FRIENDS_WITH*2]-(fof:Person)
WHERE fof <> me
  AND NOT (me)-[:FRIENDS_WITH]-(fof)  // exclude direct friends, as the SQL version does
RETURN DISTINCT fof.name

// Query: Shortest path between two people
MATCH path = shortestPath(
  (a:Person {name: "Alice"})-[:FRIENDS_WITH*..6]-(b:Person {name: "Zara"})
)
RETURN path, length(path)
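For comparison, this is roughly the application code you would write to get a bounded shortest path without a graph query language. It is a minimal breadth-first search over an in-memory adjacency dict, not Neo4j's actual algorithm; `FRIENDS` and `shortest_path` are hypothetical names, and returning `[start]` when start and goal coincide is a choice made for this sketch.

```python
from collections import deque


def shortest_path(graph, start, goal, max_hops=6):
    """Breadth-first search over an undirected adjacency dict; returns a node list or None."""
    if start == goal:
        return [start]
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if len(path) - 1 >= max_hops:
            continue  # extending this path would exceed the hop limit
        for neighbor in graph.get(path[-1], ()):
            if neighbor in visited:
                continue
            if neighbor == goal:
                return path + [neighbor]
            visited.add(neighbor)
            queue.append(path + [neighbor])
    return None


# Hypothetical friendship graph; each friendship is listed in both directions.
FRIENDS = {
    "Alice": ["Bob", "Carol"],
    "Bob": ["Alice", "Dana"],
    "Carol": ["Alice", "Dana"],
    "Dana": ["Bob", "Carol", "Zara"],
    "Zara": ["Dana"],
}
```

Calling `shortest_path(FRIENDS, "Alice", "Zara")` finds a three-hop path through Bob and Dana, while `max_hops=2` returns `None`.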

Real-World Use Cases

Fraud Detection

Detect circular payment patterns and suspicious clusters:

// Find circular money flows (potential money laundering)
MATCH path = (a:Account)-[:TRANSFERRED_TO*3..6]->(a)
WHERE ALL(t IN relationships(path) WHERE t.amount > 10000)
RETURN path, 
       reduce(total = 0, t IN relationships(path) | total + t.amount) AS total_flow

// Find accounts with unusually dense connections
MATCH (a:Account)-[t:TRANSFERRED_TO]->(b:Account)
WITH a, COUNT(DISTINCT b) AS connections, SUM(t.amount) AS total
WHERE connections > 50 AND total > 1000000
RETURN a.id, connections, total
ORDER BY connections DESC
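The circular-flow query can be sketched in plain Python to show what the pattern match is doing. This is an illustrative depth-first search under stated assumptions: transfers are `(source, destination, amount)` tuples, and the sketch only reports simple cycles (no account repeated mid-path), which is stricter than Cypher's rule that relationships are not repeated. `find_cycles` and `TRANSFERS` are hypothetical names.

```python
def find_cycles(transfers, min_len=3, max_len=6, min_amount=10_000):
    """Return cycles as lists of (src, dst, amount); every transfer must exceed min_amount."""
    outgoing = {}
    for src, dst, amount in transfers:
        if amount > min_amount:  # same filter as the ALL(...) predicate
            outgoing.setdefault(src, []).append((src, dst, amount))

    cycles = []

    def walk(origin, current, path):
        for edge in outgoing.get(current, []):
            dst = edge[1]
            new_path = path + [edge]
            if dst == origin:
                if len(new_path) >= min_len:
                    cycles.append(new_path)  # path closed back on its starting account
            elif len(new_path) < max_len and all(dst != e[0] for e in new_path):
                walk(origin, dst, new_path)  # extend through an unvisited account

    for account in list(outgoing):
        walk(account, account, [])
    return cycles


# Hypothetical data: one three-account ring plus a small unrelated transfer.
TRANSFERS = [
    ("A", "B", 15_000),
    ("B", "C", 20_000),
    ("C", "A", 12_000),
    ("C", "D", 500),
]
```

Like the Cypher query, the sketch reports the ring once per starting account, so this data yields three cycles, each with a total flow of 47,000.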

Recommendation Engine

// Collaborative filtering: "People who bought X also bought..."
MATCH (user:Customer {id: "cust_123"})-[:PURCHASED]->(product:Product)
      <-[:PURCHASED]-(other:Customer)-[:PURCHASED]->(rec:Product)
WHERE NOT (user)-[:PURCHASED]->(rec)
RETURN rec.name, COUNT(DISTINCT other) AS score
ORDER BY score DESC
LIMIT 10

// Content-based: Similar products by shared attributes
MATCH (p:Product {id: "prod_456"})-[:IN_CATEGORY]->(cat)<-[:IN_CATEGORY]-(similar)
WHERE similar <> p
OPTIONAL MATCH (similar)<-[r:PURCHASED]-()
RETURN similar.name, similar.price, COUNT(r) AS popularity
ORDER BY popularity DESC
LIMIT 10
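The collaborative-filtering traversal reduces to set operations, which the following Python sketch makes explicit. It assumes purchases fit in an in-memory dict of customer to product set and scores each candidate by the number of distinct co-purchasers, a choice made for this sketch; `PURCHASES` and `recommend` are hypothetical names.

```python
from collections import Counter

# Hypothetical purchase history: customer id -> set of product names.
PURCHASES = {
    "cust_123": {"laptop", "mouse"},
    "cust_456": {"laptop", "keyboard", "monitor"},
    "cust_789": {"mouse", "keyboard"},
    "cust_000": {"desk"},
}


def recommend(user, purchases, limit=10):
    """Score products the user lacks by how many distinct co-purchasers bought them."""
    owned = purchases[user]
    scores = Counter()
    for other, items in purchases.items():
        if other == user or not (owned & items):
            continue  # skip the user and customers with no shared purchase
        for product in items - owned:
            scores[product] += 1
    return scores.most_common(limit)
```

For `cust_123`, the keyboard scores 2 (bought by two overlapping customers) and the monitor scores 1; the desk never appears because its buyer shares no purchase with the user.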

Access Control / Authorization

// Does Alice have access to this document through any permission path?
MATCH path = (user:User {name: "Alice"})-[:MEMBER_OF|HAS_ROLE|INHERITS*1..5]->
             (perm)-[:GRANTS_ACCESS]->(doc:Document {id: "doc_789"})
RETURN COUNT(path) > 0 AS has_access

// What can Alice access? (full permission tree)
MATCH (user:User {name: "Alice"})-[:MEMBER_OF|HAS_ROLE|INHERITS*1..5]->
      (perm)-[:GRANTS_ACCESS]->(resource)
RETURN DISTINCT resource.name, resource.type, labels(resource)
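The permission queries amount to a bounded traversal followed by a lookup of grants. A minimal Python sketch of that idea follows, assuming membership, role, and inheritance edges are merged into one adjacency dict and grants are stored per holder. `EDGES`, `GRANTS`, and `reachable_resources` are hypothetical names.

```python
def reachable_resources(user, edges, grants, max_depth=5):
    """Follow membership/role/inheritance edges for 1..max_depth hops, then collect grants."""
    frontier, seen = {user}, set()
    for _ in range(max_depth):
        # Expand one hop, keeping only holders not reached at a shallower depth.
        frontier = {dst for src in frontier for dst in edges.get(src, ())} - seen
        if not frontier:
            break
        seen |= frontier
    return {res for holder in seen for res in grants.get(holder, ())}


# Hypothetical hierarchy: alice -> engineering -> staff -> viewer.
EDGES = {
    "alice": ["engineering"],
    "engineering": ["staff"],
    "staff": ["viewer"],
}
GRANTS = {
    "engineering": ["repo_1"],
    "viewer": ["doc_789"],
}
```

An access check is then a membership test, for example `"doc_789" in reachable_resources("alice", EDGES, GRANTS)`. Lowering `max_depth` to 1 limits the result to what the user's direct groups grant.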

Data Modeling Best Practices

1. Relationships Are First-Class Citizens

// ❌ Don't store relationships as properties
CREATE (a:Person {friends: ["bob", "charlie"]})

// ✅ Model them as graph relationships (a, b, and c are Person nodes bound by an earlier MATCH)
CREATE (a)-[:FRIENDS_WITH {since: 2023}]->(b)
CREATE (a)-[:FRIENDS_WITH {since: 2024}]->(c)

2. Choose Between Embedding and Connecting

Approach	Use When	Example
Embed (properties)	Few values, rarely queried independently	`skills: ["Python", "Go"]`
Connect (nodes)	Many values, shared across entities, queried independently	`(Person)-[:HAS_SKILL]->(Skill)`

// Embed: few values, rarely queried independently
CREATE (p:Person {name: "Alice", skills: ["Python", "Go", "SQL"]})

// Connect: many values, queried independently, shared across nodes
CREATE (p:Person {name: "Alice"})
CREATE (s:Skill {name: "Python"})
CREATE (p)-[:HAS_SKILL {level: "expert", years: 8}]->(s)

3. Use Relationship Types Liberally

// ❌ Generic relationship with type property
(a)-[:INTERACTED {type: "purchased", date: "2026-01-15"}]->(b)

// ✅ Specific relationship types (faster traversal)
(a)-[:PURCHASED {date: "2026-01-15"}]->(b)
(a)-[:VIEWED {date: "2026-01-14"}]->(b)
(a)-[:WISHLISTED {date: "2026-01-10"}]->(b)

Performance Optimization

Indexes

// Unique constraint + index
CREATE CONSTRAINT FOR (p:Person) REQUIRE p.id IS UNIQUE;

// Composite index for lookup patterns
CREATE INDEX FOR (p:Product) ON (p.category, p.status);

// Full-text search index
CREATE FULLTEXT INDEX productSearch FOR (p:Product) ON EACH [p.name, p.description];
CALL db.index.fulltext.queryNodes("productSearch", "wireless headphones") 
YIELD node, score
RETURN node.name, score LIMIT 10;

Query Optimization Rules

Rule	Bad	Good
Bound depth	`MATCH (a)-[*]->(b)`	`MATCH (a)-[*1..5]->(b)`
Avoid cartesian products	`MATCH (a:Person), (b:Product)`	`MATCH (a:Person)-[:PURCHASED]->(b:Product)`
Use PROFILE	Guess at performance	`PROFILE MATCH ... RETURN ...`
Filter early	Filter after collecting all data	Put `WHERE` directly after the `MATCH` it constrains
Use parameters	Inline literal values (a new plan for each distinct query string)	`$userId` parameter (one cached plan, reused)
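The parameter rule is easiest to see from application code. The sketch below keeps the query text constant and passes the value separately. It assumes `session` behaves like a session from the official Neo4j Python driver, whose `run` method accepts the query text and a parameter dict, and it assumes `Person` nodes carry an `id` property. The function and constant names are hypothetical.

```python
# The query text never changes, so the database can reuse one cached plan.
FRIENDS_OF_FRIENDS = """
MATCH (me:Person {id: $userId})-[:FRIENDS_WITH*1..2]-(fof:Person)
WHERE fof <> me
RETURN DISTINCT fof.name AS name
"""


def friends_of_friends(session, user_id):
    """Run the fixed query text with the user id supplied as a parameter."""
    result = session.run(FRIENDS_OF_FRIENDS, {"userId": user_id})
    return [record["name"] for record in result]
```

Building the query by string concatenation instead would produce a different query string per user, and it would also open the door to Cypher injection.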

Neo4j vs Alternatives

Feature	Neo4j	Amazon Neptune	ArangoDB	JanusGraph
Query language	Cypher	Gremlin/SPARQL/openCypher	AQL	Gremlin
Hosting	Self-hosted + Aura (cloud)	AWS managed	Self-hosted + cloud	Self-hosted
ACID	Full	Full	Full	Eventual (configurable)
Scalability	Composite databases, formerly Fabric (sharding)	Auto-scaling	Native sharding	Distributed
Visualization	Neo4j Browser (excellent)	Limited	Built-in	Third-party
Ecosystem	Largest, most mature	AWS integrated	Multi-model	Open-source
Best for	Most use cases, richest community	AWS-native shops	Multi-model needs	JVM/Hadoop ecosystems

When NOT to Use a Graph Database

Scenario	Better Alternative
Simple CRUD with no relationship queries	PostgreSQL, MySQL
Tabular analytics and aggregations	Data warehouse (Snowflake, BigQuery)
High-volume writes with minimal reads	Time-series DB (TimescaleDB), append-only store
Your “graph” is really just a tree	Nested sets or materialized paths in SQL
Full-text search as primary use case	Elasticsearch
Key-value lookups only	Redis, DynamoDB

Graph Database Market Comparison

Database	License	Hosting	Query Language	Best For
Neo4j	Community (free) / Enterprise	Self-hosted, AuraDB (cloud)	Cypher	General-purpose graph, knowledge graphs
Amazon Neptune	Proprietary	AWS only	Gremlin, SPARQL, openCypher	AWS-native, RDF support
ArangoDB	Apache 2.0 through 3.11; BSL 1.1 from 3.12	Self-hosted, ArangoDB Cloud	AQL	Multi-model (graph + document + key-value)
TigerGraph	Community / Enterprise	Self-hosted, TigerGraph Cloud	GSQL	Large-scale analytics, deep link traversal
Dgraph	Apache 2.0	Self-hosted, Dgraph Cloud	DQL (GraphQL-like)	GraphQL-native applications

When to Start with a Graph Database

Start with a graph database from day one when:

Your core value proposition depends on relationships (social networks, recommendation engines, fraud detection)
You need real-time traversals at depth 3+ (friends-of-friends, supply chain tracing)
Your schema evolves frequently and relationships are first-class entities
You are building a knowledge graph that connects disparate data sources

Getting Started Checklist

:::note[Source] This guide is derived from operational intelligence at Garnet Grid Consulting. For database architecture consulting, visit garnetgrid.com. :::