Consistent hashing is an algorithm for distributing data across nodes. ## Main Objectives 1. **Scalability**. Allows dynamically adding or removing nodes. 2. **Efficiency**. Minimises re-mapping when nodes are added or removed. 3. **Weighted Load**. Distributes data according to node weights. ## Use Cases 1. **Database Sharding**. Cassandra and DynamoDB. - Scalability: Allows adding or removing database instances. - Efficiency: Minimizes data migration. - Weighted Load: Powerful database instances can handle more data. 2. **Distributed Cache**. Memcached. - Scalability: Allows adding or removing Memcached servers. - Efficiency: Minimizes cache misses. - Weighted Load: Distributes data evenly across servers. ## Python Implementation This is my attempt in implementing Consistent Hashing. My goal is to keep the code simple and easy to understand. ### Hash function We cannot use Python's native `hash()` because it is not stable across different processes. Therefore, we use our own implementation using MD5. ```python import hashlib def _hash(key: str) -> int: return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16) ``` ### HashRing Interface The interface is quite straightforward. The method `key_iter_from` returns a clockwise iterator of keys starting from the key that contains the input key. This is needed to support redundancy. ```python from abc import ABC, abstractmethod from typing import Iterator class HashRing(ABC): @abstractmethod def add_key(self, key: str): ... @abstractmethod def remove_key(self, key: str): ... @abstractmethod def key_iter_from(self, key: str) -> Iterator[str]: ... @abstractmethod def num_keys(self) -> int: ... ``` ### HashRing Implementation We use `SortedDict` as our ring data structure. As the name suggests, the keys are ordered and thus allows efficient bisect search. ```python from sortedcontainers import SortedDict class HashRingImpl(HashRing): def __init__(self): self._ring = SortedDict() @override def add_key(self, key: str): self._ring[_hash(key)] = key @override def remove_key(self, key: str): self._ring.pop(_hash(key)) @override def key_iter_from(self, key: str) -> Iterator[str]: if self.num_keys() == 0: raise ValueError("No keys") idx = self._ring.bisect_left(_hash(key)) % self.num_keys() for _ in range(self.num_keys()): yield self._ring.peekitem(idx)[1] idx = (idx + 1) % self.num_keys() @override def num_keys(self) -> int: return len(self._ring) ``` This implementation is simple. To improve efficiency and weighted load, we can use the **virtual node** implementation technique. Instead of one key map to one node on the ring, we make one key map to several *virtual* nodes. ### WeightedHashRing Implementation We want to reuse HashRing when implementing WeightedHashRing. We mainly have two choices: inheritance or composition. I think this is a good opportunity to use composition. ```python class WeightedHashRingImpl(HashRing): def __init__(self, default_weight: int): self._default_weight = default_weight self._weight_by_key: dict[str, int] = {} self._key_by_vkey: dict[str, str] = {} self._hash_ring = HashRingImpl() def add_weighted_key(self, key: str, weight: int): self._weight_by_key[key] = weight for i in range(weight): vkey = self._to_vkey(key, i) self._key_by_vkey[vkey] = key self._hash_ring.add_key(vkey) @override def add_key(self, key: str): self.add_weighted_key(key, self._default_weight) @override def remove_key(self, key: str): weight = self._weight_by_key.pop(key) for i in range(weight): vkey = self._to_vkey(key, i) self._hash_ring.remove_key(vkey) @override def key_iter_from(self, key: str) -> Iterator[str]: vkey_iter = self._hash_ring.key_iter_from(key) keys = set() while len(keys) < self.num_keys(): vkey = next(vkey_iter) key = self._key_by_vkey[vkey] if key not in keys: keys.add(key) yield key @override def num_keys(self) -> int: return len(self._weight_by_key) @staticmethod def _to_vkey(key: str, vkey_index: int) -> str: return f"{key}_{vkey_index}" ```