Content-Defined Chunking (CDC) is an algorithm that splits data into smaller, variable-sized chunks. The main objective is **deduplication**. Because chunk boundaries are derived from the content itself rather than from fixed offsets, most chunks remain the same when only a small part of the data has changed.

The main use case is efficient **data synchronisation**, as in tools such as rsync. When a local file changes, we don't want to synchronise by sending the entire file, only the chunks that were modified.

### How it works

1. Compute a **rolling hash** over the data.
2. Create a chunk boundary whenever the hash matches a specific bit pattern.
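
The two steps above can be sketched in a few lines of Python. This is only a minimal illustration, not a reference implementation: the polynomial rolling hash, the function name `cdc_chunks`, and the constants `WINDOW`, `MASK`, `BASE` and `MOD` are all assumptions chosen for the example. Real tools typically use other hash functions and also enforce a maximum chunk size.

```python
import random
from typing import List

# All constants below are illustrative assumptions, not values from any specific tool.
WINDOW = 48            # number of trailing bytes covered by the rolling hash
MASK = (1 << 10) - 1   # cut when the low 10 bits are zero (roughly one cut per KiB)
BASE = 257             # base of the polynomial hash
MOD = (1 << 61) - 1    # large prime modulus


def cdc_chunks(data: bytes) -> List[bytes]:
    """Split data wherever the rolling hash of the last WINDOW bytes matches the pattern."""
    top = pow(BASE, WINDOW - 1, MOD)  # weight of the oldest byte in a full window
    chunks = []
    start = 0  # index of the first byte of the current chunk
    h = 0
    for i, byte in enumerate(data):
        if i - start >= WINDOW:
            # Step 1: the window is full, so remove the byte that slides out of it.
            h = (h - data[i - WINDOW] * top) % MOD
        h = (h * BASE + byte) % MOD
        # Step 2: cut once the window is full and the low bits of the hash are all zero.
        if i - start + 1 >= WINDOW and (h & MASK) == 0:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing bytes form the last chunk
    return chunks


# Demo: insert a single byte into pseudo-random data and compare the chunk sets.
rng = random.Random(0)
original = bytes(rng.randrange(256) for _ in range(64 * 1024))
edited = original[:30000] + b"!" + original[30000:]

chunks_a = cdc_chunks(original)
chunks_b = cdc_chunks(edited)
shared = set(chunks_a) & set(chunks_b)
print(f"{len(chunks_a)} chunks before, {len(chunks_b)} after, {len(shared)} unchanged")
```

Because each cut depends only on the last `WINDOW` bytes, the one-byte insertion should disturb only the chunks near the edit. The boundaries further along reappear at shifted offsets, so those chunks have identical content and would not need to be sent again.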