Data Systems
Graph Databases
Graph databases excel at modeling and querying relationships. Master the graph data model, traversal algorithms, and the specific use cases where graphs outperform relational databases by orders of magnitude.
- Relationships — First-class citizens, not foreign keys
- Traversal — Constant-time relationship following via index-free adjacency
- Pattern Matching — Express complex relationship queries declaratively
When relationships are as important as data, graphs are the answer.
The Graph Data Model
DfGraph Database
A graph database stores data as nodes (entities) and edges (relationships), with properties on both nodes and edges. Unlike relational databases where relationships are computed at query time via JOINs, graph databases store relationships physically, enabling constant-time traversal between connected nodes via index-free adjacency.
Graph Components
Neo4j and Cypher
DfCypher Query Language
Cypher is a declarative graph query language for Neo4j. It uses ASCII art patterns to describe graph structures, making it intuitive to express relationship queries. Cypher supports pattern matching, path finding, and graph algorithms.
Common Cypher Queries
| Query Type | Cypher Example |
|---|---|
| Find nodes | MATCH (p:Person) WHERE p.name = 'Alice' RETURN p |
| Find relationships | MATCH (p:Person)-[:WORKS_AT]->(c:Company) RETURN p, c |
| Variable-length paths | MATCH (p:Person)-[:KNOWS*1..3]-(friend) RETURN friend |
| Shortest path | MATCH path = shortestPath((a)-[:KNOWS*]-(b)) RETURN path |
| Pattern matching | MATCH (p:Person)-[:USES]->(prod:Product)<-[:USES]-(q:Person) |
Social Network Query in Cypher
Find all people within 3 degrees of separation who work at the same company:
MATCH (alice:Person {name: 'Alice'})-[:KNOWS*1..3]-(friend:Person)
-[:WORKS_AT]->(company:Company)
<-[:WORKS_AT]-(alice)
RETURN friend.name, company.name
In SQL, this requires 3 self-joins on the friends table plus a join on the company table—extremely expensive at scale. In Neo4j, this is a simple traversal using index-free adjacency.
Index-Free Adjacency
DfIndex-Free Adjacency
Index-free adjacency means each node directly references its relationships, without requiring index lookups. To traverse from node A to node B, the database follows a physical pointer, not an index search. This makes relationship traversal O(1) per hop, regardless of total graph size.
Traversal Complexity
Here,
- =Traversal time
- =Path depth (number of hops)
- =Constant time per hop (index-free adjacency)
This is the fundamental advantage of graph databases over relational databases. In a relational database, each JOIN requires an index lookup (O(log N)), making multi-hop traversals O(N log N). In a graph database, each hop is O(1), making multi-hop traversals O(d).
When to Use Graph Databases
| Use Case | Why Graph |
|---|---|
| Social networks | Friend recommendations, degree of separation |
| Fraud detection | Finding suspicious patterns in transaction networks |
| Knowledge graphs | Connecting entities with semantic relationships |
| Recommendation engines | "Users who bought X also bought Y" |
| Network/IT operations | Mapping infrastructure dependencies |
| Access control | Role-based permissions with complex hierarchies |
Graph databases are NOT ideal for: bulk data loading, analytics aggregations, simple CRUD without relationship queries, or time-series data. Use the right tool for the job.
Graph Algorithms
| Algorithm | Use Case | Complexity |
|---|---|---|
| BFS/DFS | Reachability, path finding | O(V + E) |
| Dijkstra | Shortest weighted path | O((V + E) log V) |
| PageRank | Node importance | O(k × E) |
| Louvain | Community detection | O(E) |
| Betweenness Centrality | Bridge detection | O(V × E) |
PageRank
Here,
- =PageRank of node v
- =Damping factor (typically 0.85)
- =Total number of nodes
- =Set of nodes linking to v
- =Number of outbound links from u
Practice Exercises
-
Data Modeling: Design a graph model for a movie database with actors, directors, movies, and genres. Write Cypher queries to find: (a) all movies featuring a specific actor, (b) the shortest path between two actors, (c) actors who worked with the same director more than 3 times.
-
Performance Analysis: Compare the performance of a graph traversal (3-degree friend finding) in Neo4j vs a relational database with self-joins. Estimate the difference at 100M users with 50 friends each.
-
Algorithm Application: Design a fraud detection system using graph algorithms. How would you identify suspicious transaction patterns using community detection and centrality analysis?
-
Migration Decision: Your team currently stores social network data in PostgreSQL. Evaluate whether migrating to a graph database would improve performance for your top 3 query patterns.
Key Takeaways:
- Graph databases store relationships as first-class citizens with index-free adjacency
- Traversal is O(1) per hop, regardless of total graph size
- Cypher provides intuitive pattern matching for relationship queries
- Best for: social networks, fraud detection, recommendations, knowledge graphs
- Not ideal for: bulk analytics, time-series, simple CRUD without relationships
What to Learn Next
-> NoSQL Deep Dive Document, key-value, column-family, and graph databases overview.
-> Elasticsearch Deep Dive Full-text search, inverted indices, and relevance scoring.
-> MongoDB Deep Dive Advanced MongoDB features, aggregation pipeline, and sharding.
-> Choosing the Right Database Systematic framework for database selection.
-> Data Partitioning Sharding strategies, consistent hashing, and partition keys.
-> Data Replication Sync vs async replication, leader election, and consistency.