Consistent hashing is an algorithm for distributing data across nodes.
## Main Objectives
1. **Scalability**. Allows dynamically adding or removing nodes.
2. **Efficiency**. Minimises re-mapping when nodes are added or removed.
3. **Weighted Load**. Distributes data according to node weights.
## Use Cases
1. **Database Sharding**. Cassandra and DynamoDB.
- Scalability: Allows adding or removing database instances.
- Efficiency: Minimizes data migration.
- Weighted Load: Powerful database instances can handle more data.
2. **Distributed Cache**. Memcached.
- Scalability: Allows adding or removing Memcached servers.
- Efficiency: Minimizes cache misses.
- Weighted Load: Distributes data evenly across servers.
## Python Implementation
This is my attempt in implementing Consistent Hashing. My goal is to keep the code simple and easy to understand.
### Hash function
We cannot use Python's native `hash()` because it is not stable across different processes. Therefore, we use our own implementation using MD5.
```python
import hashlib
def _hash(key: str) -> int:
return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)
```
### HashRing Interface
The interface is quite straightforward. The method `key_iter_from` returns a clockwise iterator of keys starting from the key that contains the input key. This is needed to support redundancy.
```python
from abc import ABC, abstractmethod
from typing import Iterator
class HashRing(ABC):
@abstractmethod
def add_key(self, key: str): ...
@abstractmethod
def remove_key(self, key: str): ...
@abstractmethod
def key_iter_from(self, key: str) -> Iterator[str]: ...
@abstractmethod
def num_keys(self) -> int: ...
```
### HashRing Implementation
We use `SortedDict` as our ring data structure. As the name suggests, the keys are ordered and thus allows efficient bisect search.
```python
from sortedcontainers import SortedDict
class HashRingImpl(HashRing):
def __init__(self):
self._ring = SortedDict()
@override
def add_key(self, key: str):
self._ring[_hash(key)] = key
@override
def remove_key(self, key: str):
self._ring.pop(_hash(key))
@override
def key_iter_from(self, key: str) -> Iterator[str]:
if self.num_keys() == 0:
raise ValueError("No keys")
idx = self._ring.bisect_left(_hash(key)) % self.num_keys()
for _ in range(self.num_keys()):
yield self._ring.peekitem(idx)[1]
idx = (idx + 1) % self.num_keys()
@override
def num_keys(self) -> int:
return len(self._ring)
```
This implementation is simple. To improve efficiency and weighted load, we can use the **virtual node** implementation technique. Instead of one key map to one node on the ring, we make one key map to several *virtual* nodes.
### WeightedHashRing Implementation
We want to reuse HashRing when implementing WeightedHashRing. We mainly have two choices: inheritance or composition. I think this is a good opportunity to use composition.
```python
class WeightedHashRingImpl(HashRing):
def __init__(self, default_weight: int):
self._default_weight = default_weight
self._weight_by_key: dict[str, int] = {}
self._key_by_vkey: dict[str, str] = {}
self._hash_ring = HashRingImpl()
def add_weighted_key(self, key: str, weight: int):
self._weight_by_key[key] = weight
for i in range(weight):
vkey = self._to_vkey(key, i)
self._key_by_vkey[vkey] = key
self._hash_ring.add_key(vkey)
@override
def add_key(self, key: str):
self.add_weighted_key(key, self._default_weight)
@override
def remove_key(self, key: str):
weight = self._weight_by_key.pop(key)
for i in range(weight):
vkey = self._to_vkey(key, i)
self._hash_ring.remove_key(vkey)
@override
def key_iter_from(self, key: str) -> Iterator[str]:
vkey_iter = self._hash_ring.key_iter_from(key)
keys = set()
while len(keys) < self.num_keys():
vkey = next(vkey_iter)
key = self._key_by_vkey[vkey]
if key not in keys:
keys.add(key)
yield key
@override
def num_keys(self) -> int:
return len(self._weight_by_key)
@staticmethod
def _to_vkey(key: str, vkey_index: int) -> str:
return f"{key}_{vkey_index}"
```