Data Systems

Graph Databases

Graph databases excel at modeling and querying relationships. Master the graph data model, traversal algorithms, and the specific use cases where graphs outperform relational databases by orders of magnitude.

Relationships — First-class citizens, not foreign keys
Traversal — Constant-time relationship following via index-free adjacency
Pattern Matching — Express complex relationship queries declaratively

When relationships are as important as data, graphs are the answer.

The Graph Data Model

DfGraph Database

A graph database stores data as nodes (entities) and edges (relationships), with properties on both nodes and edges. Unlike relational databases where relationships are computed at query time via JOINs, graph databases store relationships physically, enabling constant-time traversal between connected nodes via index-free adjacency.

Graph Components

Neo4j and Cypher

DfCypher Query Language

Cypher is a declarative graph query language for Neo4j. It uses ASCII art patterns to describe graph structures, making it intuitive to express relationship queries. Cypher supports pattern matching, path finding, and graph algorithms.

Common Cypher Queries

Query Type	Cypher Example
Find nodes	`MATCH (p:Person) WHERE p.name = 'Alice' RETURN p`
Find relationships	`MATCH (p:Person)-[:WORKS_AT]->(c:Company) RETURN p, c`
Variable-length paths	`MATCH (p:Person)-[:KNOWS*1..3]-(friend) RETURN friend`
Shortest path	`MATCH path = shortestPath((a)-[:KNOWS*]-(b)) RETURN path`
Pattern matching	`MATCH (p:Person)-[:USES]->(prod:Product)<-[:USES]-(q:Person)`

Social Network Query in Cypher

Find all people within 3 degrees of separation who work at the same company:

MATCH (alice:Person {name: 'Alice'})-[:KNOWS*1..3]-(friend:Person)
      -[:WORKS_AT]->(company:Company)
      <-[:WORKS_AT]-(alice)
RETURN friend.name, company.name

In SQL, this requires 3 self-joins on the friends table plus a join on the company table—extremely expensive at scale. In Neo4j, this is a simple traversal using index-free adjacency.

Index-Free Adjacency

DfIndex-Free Adjacency

Index-free adjacency means each node directly references its relationships, without requiring index lookups. To traverse from node A to node B, the database follows a physical pointer, not an index search. This makes relationship traversal O(1) per hop, regardless of total graph size.

Traversal Complexity

T_{traversal} = O(d) \text{ where } d = \text{path depth}

Here,

$T_{traversal}$ =Traversal time
$d$ =Path depth (number of hops)
$O(1)$ =Constant time per hop (index-free adjacency)

This is the fundamental advantage of graph databases over relational databases. In a relational database, each JOIN requires an index lookup (O(log N)), making multi-hop traversals O(N log N). In a graph database, each hop is O(1), making multi-hop traversals O(d).

When to Use Graph Databases

Use Case	Why Graph
Social networks	Friend recommendations, degree of separation
Fraud detection	Finding suspicious patterns in transaction networks
Knowledge graphs	Connecting entities with semantic relationships
Recommendation engines	"Users who bought X also bought Y"
Network/IT operations	Mapping infrastructure dependencies
Access control	Role-based permissions with complex hierarchies

Graph databases are NOT ideal for: bulk data loading, analytics aggregations, simple CRUD without relationship queries, or time-series data. Use the right tool for the job.

Graph Algorithms

Algorithm	Use Case	Complexity
BFS/DFS	Reachability, path finding	O(V + E)
Dijkstra	Shortest weighted path	O((V + E) log V)
PageRank	Node importance	O(k × E)
Louvain	Community detection	O(E)
Betweenness Centrality	Bridge detection	O(V × E)

PageRank

PR(v) = \frac{1-d}{N} + d \sum_{u \in M(v)} \frac{PR(u)}{L(u)}

Here,

$PR(v)$ =PageRank of node v
$d$ =Damping factor (typically 0.85)
$N$ =Total number of nodes
$M(v)$ =Set of nodes linking to v
$L(u)$ =Number of outbound links from u

Practice Exercises

Data Modeling: Design a graph model for a movie database with actors, directors, movies, and genres. Write Cypher queries to find: (a) all movies featuring a specific actor, (b) the shortest path between two actors, (c) actors who worked with the same director more than 3 times.
Performance Analysis: Compare the performance of a graph traversal (3-degree friend finding) in Neo4j vs a relational database with self-joins. Estimate the difference at 100M users with 50 friends each.
Algorithm Application: Design a fraud detection system using graph algorithms. How would you identify suspicious transaction patterns using community detection and centrality analysis?
Migration Decision: Your team currently stores social network data in PostgreSQL. Evaluate whether migrating to a graph database would improve performance for your top 3 query patterns.

Key Takeaways:

Graph databases store relationships as first-class citizens with index-free adjacency
Traversal is O(1) per hop, regardless of total graph size
Cypher provides intuitive pattern matching for relationship queries
Best for: social networks, fraud detection, recommendations, knowledge graphs
Not ideal for: bulk analytics, time-series, simple CRUD without relationships

What to Learn Next

-> NoSQL Deep Dive Document, key-value, column-family, and graph databases overview.

-> Elasticsearch Deep Dive Full-text search, inverted indices, and relevance scoring.

-> MongoDB Deep Dive Advanced MongoDB features, aggregation pipeline, and sharding.

-> Choosing the Right Database Systematic framework for database selection.

-> Data Partitioning Sharding strategies, consistent hashing, and partition keys.

-> Data Replication Sync vs async replication, leader election, and consistency.

Graph Databases

Graph Databases

The Graph Data Model

DfGraph Database

Graph Components

Neo4j and Cypher

DfCypher Query Language

Common Cypher Queries

Social Network Query in Cypher

Index-Free Adjacency

DfIndex-Free Adjacency

Traversal Complexity

When to Use Graph Databases

Graph Algorithms

PageRank

Practice Exercises

What to Learn Next

Premium Content

Need Expert System Design Help?