# Knowledge Graphs for Enterprise AI
Build enterprise knowledge graphs for AI applications. Covers graph modeling, ontology design, ingestion pipelines, querying with Cypher/SPARQL, RAG integration, and production deployment.
Knowledge graphs represent the structured understanding that makes AI systems truly useful in enterprise contexts. While vector databases handle semantic similarity and LLMs handle language understanding, knowledge graphs capture the explicit relationships, hierarchies, and rules that define how your business actually works. A customer has orders, orders contain products, products belong to categories, categories map to business units — this structured knowledge is exactly what LLMs lack and what knowledge graphs provide.
The enterprise applications are compelling: intelligent search that understands entity relationships, recommendation engines that traverse business logic, compliance systems that trace regulatory dependencies, and AI copilots grounded in organizational structure rather than just document chunks. This guide covers how to build, populate, query, and integrate knowledge graphs into your AI stack.
## When Knowledge Graphs Beat Alternatives
| Scenario | Knowledge Graph | Vector DB | Relational DB |
|---|---|---|---|
| "Find all products affected by supplier X's recall" | ✅ Traverse relationships | ❌ Similarity isn't enough | ⚠️ Complex JOINs |
| "Who approved this change and what was the justification?" | ✅ Follow audit trail | ❌ Can't trace approvals | ⚠️ Multiple table lookups |
| "What regulations apply to this data type in the EU?" | ✅ Navigate regulatory graph | ❌ Wrong tool | ⚠️ Denormalized tables |
| "Find semantically similar documents" | ❌ Wrong tool | ✅ Embedding search | ❌ Wrong tool |
| "What was last quarter's revenue by region?" | ❌ Overkill | ❌ Wrong tool | ✅ SQL aggregation |
**Rule of thumb:** use knowledge graphs when the answer requires traversing relationships between entities, vector databases for semantic similarity, and SQL for structured aggregation.
## Graph Data Modeling
### Core Concepts
- **Nodes (entities)** → things: Person, Product, Order, Policy
- **Edges (relationships)** → connections: PURCHASED, REPORTS_TO, APPLIES_TO
- **Properties** → attributes: name, date, status, score
- **Labels** → categories: :Customer, :Employee, :Document
### Enterprise Domain Example
```cypher
// Organizational knowledge graph
CREATE (alice:Employee {name: "Alice Chen", title: "VP Engineering", department: "Engineering"})
CREATE (bob:Employee {name: "Bob Martinez", title: "Staff Engineer", department: "Platform"})
CREATE (proj:Project {name: "Customer Portal v2", status: "active", budget: 500000})
CREATE (tech:Technology {name: "React", version: "18.2", category: "frontend"})
CREATE (pol:Policy {name: "SOC 2 Data Handling", version: "3.1", effective_date: "2025-01-01"})
CREATE (team:Team {name: "Platform Engineering", headcount: 12})

// Relationships with properties
CREATE (bob)-[:REPORTS_TO {since: "2024-03-15"}]->(alice)
CREATE (bob)-[:WORKS_ON {role: "tech_lead", allocation: 0.8}]->(proj)
CREATE (proj)-[:USES]->(tech)
CREATE (proj)-[:MUST_COMPLY_WITH]->(pol)
CREATE (bob)-[:MEMBER_OF]->(team)
CREATE (alice)-[:LEADS]->(team)
```
### Ontology Design
An ontology defines the schema — the allowed entity types, relationship types, and constraints:
```yaml
# ontology.yaml
entities:
  Employee:
    properties:
      name: {type: string, required: true}
      title: {type: string, required: true}
      department: {type: string, required: true}
      hire_date: {type: date}
      level: {type: enum, values: [IC1, IC2, IC3, IC4, IC5, M1, M2, M3, VP, C]}
  Project:
    properties:
      name: {type: string, required: true}
      status: {type: enum, values: [planning, active, paused, completed, cancelled]}
      budget: {type: decimal}
      start_date: {type: date}
      end_date: {type: date}

relationships:
  REPORTS_TO:
    from: Employee
    to: Employee
    properties:
      since: {type: date}
    constraints:
      - "No self-loops"
      - "Employee can report to at most one manager"
  WORKS_ON:
    from: Employee
    to: Project
    properties:
      role: {type: string}
      allocation: {type: float, min: 0, max: 1.0}
```
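Enforcing the ontology at write time can start small. The sketch below (a hypothetical helper, not part of any library) checks only required properties and enum membership; a production validator would also cover types, dates, and relationship constraints:

```python
# Minimal ontology check for the two entity types defined above.
ONTOLOGY = {
    "Employee": {
        "required": ["name", "title", "department"],
        "enums": {"level": ["IC1", "IC2", "IC3", "IC4", "IC5",
                            "M1", "M2", "M3", "VP", "C"]},
    },
    "Project": {
        "required": ["name"],
        "enums": {"status": ["planning", "active", "paused",
                             "completed", "cancelled"]},
    },
}


def validate_entity(entity_type, properties):
    """Return a list of violations; an empty list means the entity is valid."""
    schema = ONTOLOGY.get(entity_type)
    if schema is None:
        return [f"unknown entity type: {entity_type}"]
    errors = [f"missing required property: {p}"
              for p in schema["required"] if p not in properties]
    for prop, allowed in schema["enums"].items():
        if prop in properties and properties[prop] not in allowed:
            errors.append(f"invalid value for {prop}: {properties[prop]}")
    return errors
```

Rejecting (or quarantining) invalid entities before they reach the graph is far cheaper than cleaning up a polluted graph later.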
## Graph Database Selection
| Database | Query Language | Hosting | Best For | License |
|---|---|---|---|---|
| Neo4j | Cypher | Self-hosted, Aura (cloud) | General purpose, mature ecosystem | Community/Enterprise |
| Amazon Neptune | Gremlin, SPARQL, openCypher | AWS managed | AWS-native, RDF + property graph | Proprietary |
| Azure Cosmos (Gremlin) | Gremlin | Azure managed | Azure-native, global distribution | Proprietary |
| TigerGraph | GSQL | Self-hosted, cloud | High-performance analytics | Community/Enterprise |
| Memgraph | Cypher | Self-hosted, cloud | Real-time streaming graphs | BSL/Enterprise |
| ArangoDB | AQL | Self-hosted, cloud | Multi-model (graph + document) | Apache 2.0 |
## Ingestion Pipeline
```python
from neo4j import GraphDatabase


def chunk(items, size):
    """Yield successive batches of `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


class KnowledgeGraphIngester:
    def __init__(self, uri, user, password):
        self.driver = GraphDatabase.driver(uri, auth=(user, password))

    def ingest_from_api(self, records, entity_type):
        """Batch ingest records into the knowledge graph."""
        with self.driver.session() as session:
            for batch in chunk(records, size=500):
                session.execute_write(self._create_entities, batch, entity_type)

    @staticmethod
    def _create_entities(tx, records, entity_type):
        # Labels cannot be parameterized in Cypher, so entity_type is
        # interpolated directly -- validate it against the ontology first
        # to avoid query injection.
        query = f"""
        UNWIND $records AS record
        MERGE (e:{entity_type} {{id: record.id}})
        SET e += record.properties
        """
        tx.run(query, records=records)

    def create_relationships(self, relationships):
        """Create edges between existing entities."""
        with self.driver.session() as session:
            for batch in chunk(relationships, size=500):
                session.execute_write(self._create_edges, batch)

    @staticmethod
    def _create_edges(tx, relationships):
        for rel in relationships:
            # Relationship types cannot be parameterized either.
            query = f"""
            MATCH (a {{id: $from_id}})
            MATCH (b {{id: $to_id}})
            MERGE (a)-[r:{rel['type']}]->(b)
            SET r += $properties
            """
            tx.run(query,
                   from_id=rel["from_id"],
                   to_id=rel["to_id"],
                   properties=rel.get("properties", {}))
```
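The ingester expects records shaped as `{id, properties}`, so raw API rows usually need a normalization pass first. A minimal sketch (the field name `employee_id` in the usage note is hypothetical):

```python
def normalize_records(rows, id_field):
    """Shape raw API rows into the {id, properties} records the ingester expects."""
    records = []
    for row in rows:
        if row.get(id_field) in (None, ""):
            continue  # skip rows without a stable identifier
        # Drop the id field and null values from the property map.
        props = {k: v for k, v in row.items()
                 if k != id_field and v is not None}
        records.append({"id": row[id_field], "properties": props})
    return records
```

The output can then be passed straight through, e.g. `ingester.ingest_from_api(normalize_records(rows, "employee_id"), "Employee")`.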
### LLM-Assisted Entity Extraction
```python
import json


def extract_entities_and_relationships(document_text):
    """Use an LLM to extract structured graph data from unstructured text."""
    prompt = f"""Extract entities and relationships from this text.

Return valid JSON with:
- entities: array of {{type, name, properties}}
- relationships: array of {{from, to, type, properties}}

Only extract entities matching: Employee, Project, Technology, Policy, Team
Only extract relationships matching: REPORTS_TO, WORKS_ON, USES, LEADS, MUST_COMPLY_WITH

Text:
{document_text}

JSON:"""
    # `llm` is the application's LLM client; temperature 0 keeps output deterministic.
    result = json.loads(llm.generate(prompt, temperature=0))
    # Discard anything that does not conform to the ontology.
    return validate_against_ontology(result)
```
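The `validate_against_ontology` helper is left abstract above. One plausible minimal version (a sketch, not a library function) simply drops any extracted entity or relationship whose type falls outside the allowed sets, which catches the most common LLM hallucinations:

```python
ALLOWED_ENTITIES = {"Employee", "Project", "Technology", "Policy", "Team"}
ALLOWED_RELATIONSHIPS = {"REPORTS_TO", "WORKS_ON", "USES",
                         "LEADS", "MUST_COMPLY_WITH"}


def validate_against_ontology(result):
    """Keep only entities and relationships whose types the ontology allows."""
    entities = [e for e in result.get("entities", [])
                if e.get("type") in ALLOWED_ENTITIES]
    names = {e["name"] for e in entities}
    relationships = [
        r for r in result.get("relationships", [])
        if r.get("type") in ALLOWED_RELATIONSHIPS
        # Both endpoints must refer to entities we actually kept.
        and r.get("from") in names and r.get("to") in names
    ]
    return {"entities": entities, "relationships": relationships}
```

A stricter version would also run the per-entity property checks from the ontology section before anything is merged into the graph.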
## Querying
### Cypher Query Examples
```cypher
// Find the reporting chain from an employee to the CEO
MATCH path = (emp:Employee {name: "Bob Martinez"})-[:REPORTS_TO*]->(ceo:Employee)
WHERE NOT (ceo)-[:REPORTS_TO]->()
RETURN path

// Find all projects using a technology with known vulnerabilities
MATCH (p:Project)-[:USES]->(t:Technology)-[:HAS_VULNERABILITY]->(v:Vulnerability)
WHERE v.severity = "critical" AND p.status = "active"
RETURN p.name, t.name, v.cve_id, v.severity

// Impact analysis: what's affected if we deprecate a technology?
MATCH (t:Technology {name: "React 16"})<-[:USES]-(p:Project)-[:OWNED_BY]->(team:Team)
RETURN team.name, collect(p.name) AS affected_projects, count(p) AS project_count
ORDER BY project_count DESC

// Compliance tracing: which projects need SOC 2 review?
MATCH (p:Project)-[:MUST_COMPLY_WITH]->(pol:Policy {name: "SOC 2 Data Handling"})
WHERE NOT (p)-[:HAS_REVIEW {type: "soc2", year: 2025}]->()
RETURN p.name, p.status, pol.version
```
## Knowledge Graph + RAG Integration
One of the most powerful patterns combines knowledge graph traversal with RAG retrieval:
```python
def graph_enhanced_rag(query, user_context):
    """Use the knowledge graph to enhance RAG retrieval."""
    # Step 1: Extract entities from the query
    entities = extract_entities(query)  # e.g. ["React", "Platform Team"]

    # Step 2: Expand context using graph traversal
    graph_context = []
    for entity in entities:
        # Get the entity's relationships and neighbors
        related = knowledge_graph.query("""
            MATCH (e {name: $name})-[r]-(related)
            RETURN e, type(r) AS relationship, related
            LIMIT 20
        """, name=entity)
        graph_context.extend(related)

    # Step 3: Use graph context to enhance vector search
    enhanced_query = f"{query}\nContext: {format_graph_context(graph_context)}"

    # Step 4: Standard RAG retrieval with the enhanced query
    chunks = vector_store.search(enhanced_query, top_k=5)

    # Step 5: Generate a response grounded in both graph and document context
    return llm.generate(
        system="Answer using both the knowledge graph context and retrieved documents.",
        context=f"Graph: {graph_context}\n\nDocuments: {chunks}",
        query=query,
    )
```
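The `format_graph_context` helper is assumed above; a minimal version might flatten graph query rows into one readable triple per line for the prompt (the row shape mirrors the `RETURN e, type(r) AS relationship, related` clause of the traversal query):

```python
def format_graph_context(rows):
    """Render graph query rows as readable subject -RELATION-> object triples."""
    lines = []
    for row in rows:
        # Each row carries the matched entity, relationship type, and neighbor.
        entity = row["e"].get("name", "?")
        neighbor = row["related"].get("name", "?")
        lines.append(f"{entity} -{row['relationship']}-> {neighbor}")
    return "\n".join(lines)
```

Plain-text triples like this tend to be easier for the LLM to ground on than raw node dumps, and they keep the prompt compact.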
## Anti-Patterns
| Anti-Pattern | Problem | Fix |
|---|---|---|
| Graph as relational DB | Using a graph for simple tabular queries | Use graphs ONLY when relationships are the primary value |
| No ontology | Free-form entity/relationship creation leads to chaos | Define ontology first, validate all ingestion against it |
| Ingestion without deduplication | Duplicate entities create phantom relationships | MERGE instead of CREATE, implement entity resolution |
| Monolithic graph | One massive graph with everything leads to slow queries | Partition by domain, use subgraphs |
| No freshness tracking | Stale relationships persist indefinitely | Add timestamps, implement TTL-based cleanup |
| Over-extraction with LLMs | LLM-based extraction introduces hallucinated entities | Validate extracted entities against existing graph and ontology |
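The deduplication anti-pattern usually comes down to entity resolution before `MERGE`. A minimal normalization-key approach (a hypothetical sketch, name-based only; real systems also match on identifiers, emails, or embeddings) looks like:

```python
import re


def resolution_key(name):
    """Normalize an entity name into a case- and punctuation-insensitive key."""
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()


def resolve_entities(entities):
    """Collapse entities that normalize to the same key, merging their properties."""
    resolved = {}
    for entity in entities:
        key = resolution_key(entity["name"])
        if key in resolved:
            # Later records fill gaps but never overwrite existing values.
            for prop, value in entity.get("properties", {}).items():
                resolved[key]["properties"].setdefault(prop, value)
        else:
            resolved[key] = {"name": entity["name"],
                             "properties": dict(entity.get("properties", {}))}
    return list(resolved.values())
```

Running resolution before ingestion means "Alice Chen" and "alice chen" become one node instead of two nodes with split relationship sets.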
## Knowledge Graph Checklist
- Use case validated: relationships are the primary value (not tabular data)
- Ontology designed with entity types, relationship types, and constraints
- Graph database selected based on scale, hosting, and query requirements
- Ingestion pipeline built with deduplication (MERGE, entity resolution)
- LLM-assisted extraction validated against ontology
- Cypher/SPARQL queries optimized with indexes on frequently searched properties
- Graph + RAG integration tested for enhanced retrieval quality
- Freshness tracking implemented (timestamps on all entities and edges)
- Access control: sensitive relationships filtered by user permissions
- Monitoring: query latency, graph size, ingestion throughput
- Backup and disaster recovery for graph database
- Documentation: ontology, query patterns, data sources, refresh cadence
:::note[Source]
This guide is derived from operational intelligence at Garnet Grid Consulting. For knowledge graph consulting, visit garnetgrid.com.
:::