# Knowledge Graphs for Enterprise AI
Build enterprise knowledge graphs for AI applications. Covers graph modeling, ontology design, ingestion pipelines, querying with Cypher/SPARQL, RAG integration, and production deployment.
Knowledge graphs represent the structured understanding that makes AI systems truly useful in enterprise contexts. While vector databases handle semantic similarity and LLMs handle language understanding, knowledge graphs capture the explicit relationships, hierarchies, and rules that define how your business actually works. A customer has orders, orders contain products, products belong to categories, categories map to business units — this structured knowledge is exactly what LLMs lack and what knowledge graphs provide.
The enterprise applications are compelling: intelligent search that understands entity relationships, recommendation engines that traverse business logic, compliance systems that trace regulatory dependencies, and AI copilots grounded in organizational structure rather than just document chunks. This guide covers how to build, populate, query, and integrate knowledge graphs into your AI stack.
## When Knowledge Graphs Beat Alternatives
| Scenario | Knowledge Graph | Vector DB | Relational DB |
|---|---|---|---|
| "Find all products affected by supplier X's recall" | ✅ Traverse relationships | ❌ Similarity isn't enough | ⚠️ Complex JOINs |
| "Who approved this change and what was the justification?" | ✅ Follow audit trail | ❌ Can't trace approvals | ⚠️ Multiple table lookups |
| "What regulations apply to this data type in the EU?" | ✅ Navigate regulatory graph | ❌ Wrong tool | ⚠️ Denormalized tables |
| "Find semantically similar documents" | ❌ Wrong tool | ✅ Embedding search | ❌ Wrong tool |
| "What was last quarter's revenue by region?" | ❌ Overkill | ❌ Wrong tool | ✅ SQL aggregation |
**Rule of thumb:** use knowledge graphs when the answer requires traversing relationships between entities, vector databases for semantic similarity, and SQL for structured aggregation.
## Graph Data Modeling
### Core Concepts
- **Nodes (entities)** → things: Person, Product, Order, Policy
- **Edges (relationships)** → connections: PURCHASED, REPORTS_TO, APPLIES_TO
- **Properties** → attributes: name, date, status, score
- **Labels** → categories: :Customer, :Employee, :Document
### Enterprise Domain Example
```cypher
// Organizational knowledge graph
CREATE (alice:Employee {name: "Alice Chen", title: "VP Engineering", department: "Engineering"})
CREATE (bob:Employee {name: "Bob Martinez", title: "Staff Engineer", department: "Platform"})
CREATE (proj:Project {name: "Customer Portal v2", status: "active", budget: 500000})
CREATE (tech:Technology {name: "React", version: "18.2", category: "frontend"})
CREATE (pol:Policy {name: "SOC 2 Data Handling", version: "3.1", effective_date: "2025-01-01"})
CREATE (team:Team {name: "Platform Engineering", headcount: 12})

// Relationships with properties
CREATE (bob)-[:REPORTS_TO {since: "2024-03-15"}]->(alice)
CREATE (bob)-[:WORKS_ON {role: "tech_lead", allocation: 0.8}]->(proj)
CREATE (proj)-[:USES]->(tech)
CREATE (proj)-[:MUST_COMPLY_WITH]->(pol)
CREATE (bob)-[:MEMBER_OF]->(team)
CREATE (alice)-[:LEADS]->(team)
```
### Ontology Design
An ontology defines the schema — the allowed entity types, relationship types, and constraints:
```yaml
# ontology.yaml
entities:
  Employee:
    properties:
      name: {type: string, required: true}
      title: {type: string, required: true}
      department: {type: string, required: true}
      hire_date: {type: date}
      level: {type: enum, values: [IC1, IC2, IC3, IC4, IC5, M1, M2, M3, VP, C]}
  Project:
    properties:
      name: {type: string, required: true}
      status: {type: enum, values: [planning, active, paused, completed, cancelled]}
      budget: {type: decimal}
      start_date: {type: date}
      end_date: {type: date}

relationships:
  REPORTS_TO:
    from: Employee
    to: Employee
    properties:
      since: {type: date}
    constraints:
      - "No self-loops"
      - "Employee can report to at most one manager"
  WORKS_ON:
    from: Employee
    to: Project
    properties:
      role: {type: string}
      allocation: {type: float, min: 0, max: 1.0}
```
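Enforcing the ontology at write time can start small. The sketch below (a hypothetical helper, not part of any library) checks only required properties and enum membership; a production validator would also cover types, dates, and relationship constraints:

```python
# Minimal ontology check for the two entity types defined above.
ONTOLOGY = {
    "Employee": {
        "required": ["name", "title", "department"],
        "enums": {"level": ["IC1", "IC2", "IC3", "IC4", "IC5",
                            "M1", "M2", "M3", "VP", "C"]},
    },
    "Project": {
        "required": ["name"],
        "enums": {"status": ["planning", "active", "paused",
                             "completed", "cancelled"]},
    },
}


def validate_entity(entity_type, properties):
    """Return a list of violations; an empty list means the entity is valid."""
    schema = ONTOLOGY.get(entity_type)
    if schema is None:
        return [f"unknown entity type: {entity_type}"]
    errors = [f"missing required property: {p}"
              for p in schema["required"] if p not in properties]
    for prop, allowed in schema["enums"].items():
        if prop in properties and properties[prop] not in allowed:
            errors.append(f"invalid value for {prop}: {properties[prop]}")
    return errors
```

Rejecting (or quarantining) invalid entities before they reach the graph is far cheaper than cleaning up a polluted graph later.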
## Graph Database Selection
| Database | Query Language | Hosting | Best For | License |
|---|---|---|---|---|
| Neo4j | Cypher | Self-hosted, Aura (cloud) | General purpose, mature ecosystem | Community/Enterprise |
| Amazon Neptune | Gremlin, SPARQL, openCypher | AWS managed | AWS-native, RDF + property graph | Proprietary |
| Azure Cosmos (Gremlin) | Gremlin | Azure managed | Azure-native, global distribution | Proprietary |
| TigerGraph | GSQL | Self-hosted, cloud | High-performance analytics | Community/Enterprise |
| Memgraph | Cypher | Self-hosted, cloud | Real-time streaming graphs | BSL/Enterprise |
| ArangoDB | AQL | Self-hosted, cloud | Multi-model (graph + document) | Apache 2.0 |
## Ingestion Pipeline
```python
from neo4j import GraphDatabase


def chunk(items, size):
    """Yield successive batches of `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


class KnowledgeGraphIngester:
    def __init__(self, uri, user, password):
        self.driver = GraphDatabase.driver(uri, auth=(user, password))

    def ingest_from_api(self, records, entity_type):
        """Batch ingest records into the knowledge graph."""
        with self.driver.session() as session:
            for batch in chunk(records, size=500):
                session.execute_write(self._create_entities, batch, entity_type)

    @staticmethod
    def _create_entities(tx, records, entity_type):
        # Labels cannot be parameterized in Cypher, so entity_type is
        # interpolated directly -- validate it against the ontology first
        # to avoid query injection.
        query = f"""
        UNWIND $records AS record
        MERGE (e:{entity_type} {{id: record.id}})
        SET e += record.properties
        """
        tx.run(query, records=records)

    def create_relationships(self, relationships):
        """Create edges between existing entities."""
        with self.driver.session() as session:
            for batch in chunk(relationships, size=500):
                session.execute_write(self._create_edges, batch)

    @staticmethod
    def _create_edges(tx, relationships):
        for rel in relationships:
            # Relationship types cannot be parameterized either.
            query = f"""
            MATCH (a {{id: $from_id}})
            MATCH (b {{id: $to_id}})
            MERGE (a)-[r:{rel['type']}]->(b)
            SET r += $properties
            """
            tx.run(query,
                   from_id=rel["from_id"],
                   to_id=rel["to_id"],
                   properties=rel.get("properties", {}))
```
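The ingester expects records shaped as `{id, properties}`, so raw API rows usually need a normalization pass first. A minimal sketch (the field name `employee_id` in the usage note is hypothetical):

```python
def normalize_records(rows, id_field):
    """Shape raw API rows into the {id, properties} records the ingester expects."""
    records = []
    for row in rows:
        if row.get(id_field) in (None, ""):
            continue  # skip rows without a stable identifier
        # Drop the id field and null values from the property map.
        props = {k: v for k, v in row.items()
                 if k != id_field and v is not None}
        records.append({"id": row[id_field], "properties": props})
    return records
```

The output can then be passed straight through, e.g. `ingester.ingest_from_api(normalize_records(rows, "employee_id"), "Employee")`.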
### LLM-Assisted Entity Extraction
```python
import json


def extract_entities_and_relationships(document_text):
    """Use an LLM to extract structured graph data from unstructured text."""
    prompt = f"""Extract entities and relationships from this text.

Return valid JSON with:
- entities: array of {{type, name, properties}}
- relationships: array of {{from, to, type, properties}}

Only extract entities matching: Employee, Project, Technology, Policy, Team
Only extract relationships matching: REPORTS_TO, WORKS_ON, USES, LEADS, MUST_COMPLY_WITH

Text:
{document_text}

JSON:"""
    # `llm` is the application's LLM client; temperature 0 keeps output deterministic.
    result = json.loads(llm.generate(prompt, temperature=0))
    # Discard anything that does not conform to the ontology.
    return validate_against_ontology(result)
```
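The `validate_against_ontology` helper is left abstract above. One plausible minimal version (a sketch, not a library function) simply drops any extracted entity or relationship whose type falls outside the allowed sets, which catches the most common LLM hallucinations:

```python
ALLOWED_ENTITIES = {"Employee", "Project", "Technology", "Policy", "Team"}
ALLOWED_RELATIONSHIPS = {"REPORTS_TO", "WORKS_ON", "USES",
                         "LEADS", "MUST_COMPLY_WITH"}


def validate_against_ontology(result):
    """Keep only entities and relationships whose types the ontology allows."""
    entities = [e for e in result.get("entities", [])
                if e.get("type") in ALLOWED_ENTITIES]
    names = {e["name"] for e in entities}
    relationships = [
        r for r in result.get("relationships", [])
        if r.get("type") in ALLOWED_RELATIONSHIPS
        # Both endpoints must refer to entities we actually kept.
        and r.get("from") in names and r.get("to") in names
    ]
    return {"entities": entities, "relationships": relationships}
```

A stricter version would also run the per-entity property checks from the ontology section before anything is merged into the graph.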
## Querying
### Cypher Query Examples
```cypher
// Find the reporting chain from an employee to the CEO
MATCH path = (emp:Employee {name: "Bob Martinez"})-[:REPORTS_TO*]->(ceo:Employee)
WHERE NOT (ceo)-[:REPORTS_TO]->()
RETURN path

// Find all projects using a technology with known vulnerabilities
MATCH (p:Project)-[:USES]->(t:Technology)-[:HAS_VULNERABILITY]->(v:Vulnerability)
WHERE v.severity = "critical" AND p.status = "active"
RETURN p.name, t.name, v.cve_id, v.severity

// Impact analysis: what's affected if we deprecate a technology?
MATCH (t:Technology {name: "React 16"})<-[:USES]-(p:Project)-[:OWNED_BY]->(team:Team)
RETURN team.name, collect(p.name) AS affected_projects, count(p) AS project_count
ORDER BY project_count DESC

// Compliance tracing: which projects need SOC 2 review?
MATCH (p:Project)-[:MUST_COMPLY_WITH]->(pol:Policy {name: "SOC 2 Data Handling"})
WHERE NOT (p)-[:HAS_REVIEW {type: "soc2", year: 2025}]->()
RETURN p.name, p.status, pol.version
```
## Knowledge Graph + RAG Integration
One of the most powerful patterns combines knowledge graph traversal with RAG retrieval:
```python
def graph_enhanced_rag(query, user_context):
    """Use the knowledge graph to enhance RAG retrieval."""
    # Step 1: Extract entities from the query
    entities = extract_entities(query)  # e.g. ["React", "Platform Team"]

    # Step 2: Expand context using graph traversal
    graph_context = []
    for entity in entities:
        # Get the entity's relationships and neighbors
        related = knowledge_graph.query("""
            MATCH (e {name: $name})-[r]-(related)
            RETURN e, type(r) AS relationship, related
            LIMIT 20
        """, name=entity)
        graph_context.extend(related)

    # Step 3: Use graph context to enhance vector search
    enhanced_query = f"{query}\nContext: {format_graph_context(graph_context)}"

    # Step 4: Standard RAG retrieval with the enhanced query
    chunks = vector_store.search(enhanced_query, top_k=5)

    # Step 5: Generate a response grounded in both graph and document context
    return llm.generate(
        system="Answer using both the knowledge graph context and retrieved documents.",
        context=f"Graph: {graph_context}\n\nDocuments: {chunks}",
        query=query,
    )
```
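The `format_graph_context` helper is assumed above; a minimal version might flatten graph query rows into one readable triple per line for the prompt (the row shape mirrors the `RETURN e, type(r) AS relationship, related` clause of the traversal query):

```python
def format_graph_context(rows):
    """Render graph query rows as readable subject -RELATION-> object triples."""
    lines = []
    for row in rows:
        # Each row carries the matched entity, relationship type, and neighbor.
        entity = row["e"].get("name", "?")
        neighbor = row["related"].get("name", "?")
        lines.append(f"{entity} -{row['relationship']}-> {neighbor}")
    return "\n".join(lines)
```

Plain-text triples like this tend to be easier for the LLM to ground on than raw node dumps, and they keep the prompt compact.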
## Anti-Patterns
| Anti-Pattern | Problem | Fix |
|---|---|---|
| Graph as relational DB | Using a graph for simple tabular queries | Use graphs ONLY when relationships are the primary value |
| No ontology | Free-form entity/relationship creation leads to chaos | Define ontology first, validate all ingestion against it |
| Ingestion without deduplication | Duplicate entities create phantom relationships | MERGE instead of CREATE, implement entity resolution |
| Monolithic graph | One massive graph with everything leads to slow queries | Partition by domain, use subgraphs |
| No freshness tracking | Stale relationships persist indefinitely | Add timestamps, implement TTL-based cleanup |
| Over-extraction with LLMs | LLM-based extraction introduces hallucinated entities | Validate extracted entities against existing graph and ontology |
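The deduplication anti-pattern usually comes down to entity resolution before `MERGE`. A minimal normalization-key approach (a hypothetical sketch, name-based only; real systems also match on identifiers, emails, or embeddings) looks like:

```python
import re


def resolution_key(name):
    """Normalize an entity name into a case- and punctuation-insensitive key."""
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()


def resolve_entities(entities):
    """Collapse entities that normalize to the same key, merging their properties."""
    resolved = {}
    for entity in entities:
        key = resolution_key(entity["name"])
        if key in resolved:
            # Later records fill gaps but never overwrite existing values.
            for prop, value in entity.get("properties", {}).items():
                resolved[key]["properties"].setdefault(prop, value)
        else:
            resolved[key] = {"name": entity["name"],
                             "properties": dict(entity.get("properties", {}))}
    return list(resolved.values())
```

Running resolution before ingestion means "Alice Chen" and "alice chen" become one node instead of two nodes with split relationship sets.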
## Knowledge Graph Checklist
- Use case validated: relationships are the primary value (not tabular data)
- Ontology designed with entity types, relationship types, and constraints
- Graph database selected based on scale, hosting, and query requirements
- Ingestion pipeline built with deduplication (MERGE, entity resolution)
- LLM-assisted extraction validated against ontology
- Cypher/SPARQL queries optimized with indexes on frequently searched properties
- Graph + RAG integration tested for enhanced retrieval quality
- Freshness tracking implemented (timestamps on all entities and edges)
- Access control: sensitive relationships filtered by user permissions
- Monitoring: query latency, graph size, ingestion throughput
- Backup and disaster recovery for graph database
- Documentation: ontology, query patterns, data sources, refresh cadence
:::note[Source]
This guide is derived from operational intelligence at Garnet Grid Consulting. For knowledge graph consulting, visit garnetgrid.com.
:::