Content-Defined Chunking (CDC) is a technique for splitting data into smaller, variable-sized chunks whose boundaries are determined by the content itself rather than by fixed offsets.
The main objective is **deduplication**. If only a small part of the data has changed, most chunks remain the same.
The main use case is efficient **data synchronisation**, such as rsync. When a local file changes, we don't want to synchronise by sending the entire file, only the chunks that were modified.
### How it works
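1. Compute a **rolling hash** over the data, so that the hash at each position depends only on the most recent bytes.
2. Create a chunk boundary whenever the hash matches a specific bit pattern (for example, when a fixed number of its bits are all zero).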
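
The two steps above can be sketched in a few lines of Python. This is only a minimal illustration, not a reference implementation: it assumes a Gear-style rolling hash (shift left, then add a table value for the incoming byte) and treats "the top `mask_bits` bits are zero" as the boundary pattern. The names `GEAR`, `chunk_boundaries`, `split_chunks`, and `mask_bits` are hypothetical and chosen for this example only.

```python
import random
from typing import List

# Hypothetical lookup table: one pseudo-random 64-bit value per byte value.
# A fixed seed keeps the table (and therefore the boundaries) reproducible.
_rng = random.Random(42)
GEAR = [_rng.getrandbits(64) for _ in range(256)]
MASK64 = (1 << 64) - 1


def chunk_boundaries(data: bytes, mask_bits: int = 6) -> List[int]:
    """Return the end offset of every chunk in `data`."""
    boundaries = []
    h = 0
    shift = 64 - mask_bits
    for i, byte in enumerate(data):
        # Step 1: rolling hash. Every shift pushes older bytes towards the
        # top of the 64-bit state, so h depends only on the last 64 bytes.
        h = ((h << 1) + GEAR[byte]) & MASK64
        # Step 2: cut when the top `mask_bits` bits of the hash are all zero.
        if (h >> shift) == 0:
            boundaries.append(i + 1)
    # The tail of the data always forms a final chunk.
    if data and (not boundaries or boundaries[-1] != len(data)):
        boundaries.append(len(data))
    return boundaries


def split_chunks(data: bytes, mask_bits: int = 6) -> List[bytes]:
    """Cut `data` at the offsets returned by chunk_boundaries."""
    chunks, start = [], 0
    for end in chunk_boundaries(data, mask_bits):
        chunks.append(data[start:end])
        start = end
    return chunks


if __name__ == "__main__":
    rng = random.Random(1)
    original = bytes(rng.getrandbits(8) for _ in range(4096))
    edited = b"XYZ" + original  # insert three bytes at the front
    shared = set(split_chunks(original)) & set(split_chunks(edited))
    print(len(shared), "distinct chunks appear in both versions")
```

Because each boundary decision depends only on the preceding 64 bytes, inserting bytes at the front shifts the later boundaries without changing where they fall relative to the content, so the chunks after the edit can be reused. With `mask_bits=6` a boundary is expected roughly every 64 bytes on random data. Practical chunkers typically also enforce minimum and maximum chunk sizes, which this sketch leaves out.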