What is a skip list?
A skip list is a probabilistic data structure that arranges elements in a hierarchical linked list and uses randomized "skips" to enable fast search over ordered data. It provides logarithmic search and insertion complexity like balanced trees.
Skip lists use layers of pointers to "skip" over elements, giving fast access to elements based on their sort order. Extra pointer levels are added probabilistically to create shortcuts for faster traversal.
This allows efficient searches, inserts, and deletes in log N time on average. Skip lists have more locality and simpler implementation than tree structures.
Skip lists employ probabilistic techniques reminiscent of Bloom filters and interval arithmetic used in tree structures like B-trees. Skip lists are commonly used to implement fast, scalable ordered key-value storage including distributed variants.
How does a skip list work?
A skip list consists of sorted linked list layers where higher layers act as an "express lane" to skip segments of the lower layer. An element is promoted to a higher layer randomly based on a configurable probability.
Search and insert traverse the highest possible layer and drop down only as elements are encountered. This skipping allows fast access based on sort order.
Why are skip lists important? Where are they used?
Skip lists provide fast search over dynamic ordered data without complex rebalancing operations. They are used in databases, caches, networking, probabilistic data structures, and search trees.
Skip lists scale better than binary search trees while providing similar access complexity. Their simplicity compared to tree balancing enables efficient implementations.
FAQ
How is a skip list different from a binary search tree?
Skip lists use probabilistic "skipping" via multilayer linked lists rather than complex rebalancing rules. Insertions are faster but queries are probabilistic.
What are the advantages of skip lists over trees?
- Simpler to implement than balanced binary search trees.
- Faster to insert and delete since no rebalancing needed.
- Concurrent operations have less impact on performance.
- Cache-friendly linear structure.
What are the downsides of using skip lists?
- Worse (but still logarithmic) search time compared to balanced trees.
- Not deterministically balanced, small probability of degradation.
- Require random number generation.
Where are skip lists commonly used?
- Database indexing, in-memory caching layers
- Network routing table lookups
- Fast search over sorted data streams
- Probabilistic data structures like Bloof filters
References:
- [Book] Designing Data-Intensive Applications
- [Article] Skip lists: a probabilistic alternative to balanced trees
- [Post] Skip List
- [Post] Sliding Window Hash Join: Efficiently Joining Infinite Streams with Order Preservation
- [Post] General-purpose Stream Joins via Pruning Symmetric Hash Joins