Column Family Stores & Apache Cassandra — Unit IV Study Notes
Three Catchy Title Options:
- “Cassandra Knew Everything - Now So Will You: The Ultimate NoSQL Column Store Guide”
- “Never Go Down Again: How Apache Cassandra Makes Your Data Bulletproof”
- “Rows Are So Last Decade: A Student’s Survival Guide to Apache Cassandra”
🎣 The Hook: Why Should You Even Care About Cassandra?
Imagine you run the world’s busiest messaging app — billions of messages flying around every second, users on every continent, and your database can never, ever go down. What do you use?
Facebook had this exact problem. Their solution? They built Apache Cassandra — a database so distributed, so fault-tolerant, and so fast that even if half your servers catch fire, your app keeps running without skipping a beat.
Cassandra isn’t just a database. It’s a philosophy — one that says: “Scale wide, fail gracefully, and write fast.” Whether you’re a student, a developer, or just a curious human, understanding Cassandra means understanding how the world’s biggest tech companies keep their data alive at ridiculous scale.
Let’s dive in. 🚀
Unit IV: Column Family Stores (Apache Cassandra)
4.1 Introduction to Apache Cassandra
4.1.1 The Cassandra Elevator Pitch
4.1.1.1 Cassandra in 50 Words or Less
Apache Cassandra is an open-source, distributed, NoSQL column-family database designed for high availability, elastic scalability, and fault tolerance — with no single point of failure. It excels at handling massive write-heavy workloads across multiple data centers and cloud environments at global scale.
Think of it as a database that’s been to the gym, studied abroad, and has a backup plan for every backup plan.
4.1.1.2 Distributed and Decentralized Architecture
One of Cassandra’s most powerful traits: there is no master node.
- In traditional RDBMS systems, one node is “in charge” — if it dies, everything crashes.
- In Cassandra, every node is equal (a “peer-to-peer” architecture).
- Data is distributed across all nodes in a ring topology.
🧠 Analogy: Traditional databases are like a monarchy — one king rules all. Cassandra is more like a republic — every node has a voice, and no single node dying brings down the whole country.
Key characteristics:
| Feature | Description |
|---|---|
| No Master Node | All nodes are equal peers |
| Ring Topology | Nodes form a logical ring for data distribution |
| Data Partitioning | Data is distributed via consistent hashing |
| Replication | Data is automatically copied across multiple nodes |
4.1.1.3 Elastic Scalability and High Performance
- Horizontal scaling: Add more nodes → get more capacity. It’s that simple.
- No need to shut down or reconfigure existing nodes.
- Performance scales linearly — double the nodes, roughly double the throughput.
- Optimized for write-heavy workloads: Cassandra can handle hundreds of thousands of writes per second.
🧠 Analogy: Scaling Cassandra is like adding more lanes to a highway — traffic keeps flowing while construction happens. Traditional databases are like resurfacing the only road in town: everything stops.
4.1.1.4 High Availability and Fault Tolerance
- No single point of failure (SPOF) — the death of one (or many) nodes doesn’t kill the cluster.
- Data is replicated across multiple nodes and data centers.
- Even during node failures, reads and writes can continue.
- Supports multi-data-center replication out of the box.
💡 Key Term: Replication Factor (RF) — the number of copies of each piece of data stored across the cluster. RF=3 means your data lives on 3 different nodes.
4.1.1.5 Tuneable Consistency
Cassandra gives you a dial, not a binary switch, for consistency vs. availability.
- You choose how many nodes must acknowledge a read or write before it’s considered successful.
- This is called the Consistency Level (CL).
- More nodes required = stronger consistency, but slower performance.
- Fewer nodes required = faster performance, but data might be slightly stale.
💡 Key Term: Tuneable Consistency — the ability to configure the trade-off between data consistency and read/write availability on a per-operation basis.
4.1.2 Theoretical Foundations
4.1.2.1 Brewer’s CAP Theorem
CAP Theorem states that any distributed data store can only guarantee two of the three following properties simultaneously:
C: Consistency (every read gets the most recent write)
A: Availability (every request gets a response)
P: Partition Tolerance (the system keeps working even if nodes can't talk to each other)
| System Type | Guarantees | Trade-off |
|---|---|---|
| CP (e.g., HBase) | Consistency + Partition Tolerance | May be unavailable during partition |
| AP (e.g., Cassandra) | Availability + Partition Tolerance | May return stale data |
| CA (e.g., Traditional RDBMS) | Consistency + Availability | Cannot handle network partitions |
Cassandra is an AP system — it prioritizes Availability and Partition Tolerance over strict consistency. But (and this is key) — with tuneable consistency, you can lean toward consistency when needed.
Analogy: Imagine a group chat with friends in different countries. CAP Theorem says you can have messages that are: (1) always the same for everyone, (2) always delivered, or (3) delivered even when the internet is patchy — but never all three perfectly at once.
4.1.2.2 Row-Oriented Data Model
Despite being a “column-family” store, Cassandra organizes data in a wide-row model:
- Data is stored in tables (like SQL), but rows can have many, many columns.
- Each row is uniquely identified by a primary key.
- Columns are grouped into column families (now called tables in modern Cassandra).
- Unlike RDBMS, rows don’t need to share the same columns (sparse model).
💡 Key Term: Column Family — a container for rows that share a similar structure, analogous to a table in RDBMS, but far more flexible in column structure.
4.1.3 Cassandra’s Origins and Evolution
| Year | Milestone |
|---|---|
| 2007 | Developed at Facebook to power the Inbox Search feature |
| 2008 | Open-sourced by Facebook |
| 2009 | Became an Apache Incubator project |
| 2010 | Graduated to a top-level Apache project |
| 2010 | DataStax founded (originally as Riptano); enterprise adoption followed |
| 2021 | Cassandra 4.0 released with major stability and performance improvements |
🎉 Fun fact: Cassandra is named after the prophet from Greek mythology who could foresee the future but was cursed so that no one would believe her.
4.1.4 Use Cases and Applications
4.1.4.1 Large Deployments
- Netflix: Tracks viewing history and personalization for 200M+ subscribers.
- Apple: Runs over 75,000 Cassandra nodes to manage billions of devices.
- Instagram: Uses Cassandra for media metadata storage.
Best fit when:
- You have terabytes to petabytes of data.
- You need always-on availability with zero downtime tolerance.
4.1.4.2 Write-Heavy Workloads and Analytics
Cassandra is built for writes — inserts and updates are extremely fast because data is written to an in-memory structure first (no read-before-write required in most cases).
Ideal for:
- IoT sensor data (millions of writes per second)
- Time-series data (logs, metrics, financial ticks)
- Event tracking (clickstreams, user activity)
4.1.4.3 Geographical Distribution
- Cassandra supports multi-data-center replication natively.
- Data can be replicated to nodes in New York, London, and Tokyo simultaneously.
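- Clients can be routed to the nearest data center (for example, with a data-center-aware load-balancing policy in the driver).
- Per-keyspace, per-DC replication settings help meet data sovereignty requirements (for example, keeping EU data in EU data centers).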
4.1.4.4 Hybrid Cloud and Multicloud Deployment
- Cassandra runs on on-premises servers, public clouds, and in containers.
- A single cluster can span AWS + Azure + bare metal simultaneously.
- This makes it ideal for organizations transitioning to the cloud or avoiding vendor lock-in.
4.2 Cassandra Architecture and Data Model
4.2.1 Cassandra’s Distributed Architecture
4.2.1.1 Data Centers and Racks
Cassandra uses a hierarchical topology:
Cluster
└── Data Center (DC)
    └── Rack
        └── Node
- Cluster: The top-level container, made up of all nodes that work together.
- Data Center: A logical or physical grouping of nodes (often one per geographic region).
- Rack: A grouping within a data center (often represents physical server racks).
- Node: A single Cassandra instance on a machine.
This hierarchy helps Cassandra make smart replication decisions — spreading replicas across different racks and DCs to survive hardware failures.
4.2.1.2 Rings and Tokens
Cassandra maps all nodes into a logical ring:
- Each node is assigned one or more tokens: values in a fixed numeric range (-2^63 to 2^63 - 1 with the default Murmur3Partitioner; 0 to 2^127 - 1 with the legacy RandomPartitioner).
- When data is written, its partition key is hashed to produce a token value.
- The node responsible for that token range handles (and replicates) that data.
Token Ring (simplified):
Node A: tokens 0–33
Node B: tokens 34–66
Node C: tokens 67–100
💡 Key Term: Consistent Hashing, a technique that maps both data and nodes to the same numeric space, so adding/removing nodes only redistributes a small portion of the data.
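To make the ring idea concrete, here is a minimal sketch of consistent hashing in Python. It is not Cassandra's implementation: the `Ring` class, the node names, and the choice of MD5 (instead of Murmur3) are all assumptions made for illustration, and each node gets just one token.

```python
import bisect
import hashlib


def token_for(key: str) -> int:
    """Hash a key onto the ring. MD5 is used here only for illustration."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


class Ring:
    """Hypothetical ring with a single token per node."""

    def __init__(self, nodes):
        # Each node's token is derived from its name (an assumed scheme).
        self._ring = sorted((token_for(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, partition_key: str, rf: int = 1):
        """Return the rf nodes found walking clockwise from the key's token."""
        start = bisect.bisect_left(self._tokens, token_for(partition_key))
        count = min(rf, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]


keys = [f"user-{i}" for i in range(1000)]

ring = Ring(["node-a", "node-b", "node-c"])
before = {k: ring.replicas(k)[0] for k in keys}

bigger = Ring(["node-a", "node-b", "node-c", "node-d"])
after = {k: bigger.replicas(k)[0] for k in keys}

# Only keys that now belong to the new node change owner.
moved = [k for k in keys if before[k] != after[k]]
print(len(moved), "of", len(keys), "keys moved")
```

Every key in `moved` ends up on `node-d`; the other three nodes keep whatever they did not hand over. Calling `replicas(key, rf=3)` shows the same clockwise walk used to pick extra replicas.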
4.2.1.3 Virtual Nodes (vnodes)
- Traditionally, each node owned one large token range → uneven distribution when adding nodes.
- Virtual nodes assign many small token ranges to each physical node (set by num_tokens; the default was 256 before Cassandra 4.0 and is 16 since).
- Benefits:
- Better load balancing across nodes
- Faster cluster resizing (adding or removing nodes)
- Automatic data redistribution without manual token assignment
🧠 Analogy: Instead of each delivery driver covering one huge zone, vnodes split the city into hundreds of tiny zones and distribute them evenly. Add a new driver? They take a few zones from everyone.
4.2.2 Core Components
4.2.2.1 Gossip Protocol and Failure Detection
- Cassandra nodes communicate using a Gossip Protocol — they periodically share state information with random neighbors.
- Within seconds, every node knows the state of every other node.
- Failure detection is handled by Phi Accrual Failure Detector — instead of a binary “alive/dead” signal, it calculates a suspicion score that rises the longer a node goes silent.
💡 Key Term: Gossip Protocol — a peer-to-peer communication protocol where nodes exchange state information in a manner similar to how rumors spread in a social network.
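The "suspicion score" idea can be sketched in a few lines of Python. This is a simplified model, not Cassandra's code: the `PhiAccrualDetector` class is hypothetical, and it assumes heartbeat gaps follow an exponential distribution, so phi is just the silence measured in "average gaps", scaled by 1/ln(10).

```python
import math
from collections import deque


class PhiAccrualDetector:
    """Toy phi accrual detector for a single remote node."""

    def __init__(self, window: int = 1000):
        self._intervals = deque(maxlen=window)  # recent gaps between heartbeats
        self._last = None

    def heartbeat(self, now: float) -> None:
        if self._last is not None:
            self._intervals.append(now - self._last)
        self._last = now

    def phi(self, now: float) -> float:
        """Suspicion score: grows with silence relative to the usual gap."""
        if not self._intervals:
            return 0.0
        mean = sum(self._intervals) / len(self._intervals)
        # Exponential approximation: -log10(exp(-t / mean)) = t / mean / ln(10)
        return (now - self._last) / mean / math.log(10)


detector = PhiAccrualDetector()
for t in range(10):                 # heartbeats at t = 0, 1, ..., 9 seconds
    detector.heartbeat(float(t))

print(round(detector.phi(10.0), 2))   # after 1 second of silence
print(round(detector.phi(30.0), 2))   # after 21 seconds of silence
```

A node is treated as down once phi crosses a configured threshold, so a slow network raises suspicion gradually instead of flipping a switch.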
4.2.2.2 Snitches and Partitioners
Snitches tell Cassandra about the network topology — which nodes are in which rack and data center.
| Snitch Type | Description |
|---|---|
| SimpleSnitch | For single DC, development use only |
| GossipingPropertyFileSnitch | Production standard; reads DC/rack from config file |
| Ec2Snitch | Auto-detects topology on AWS |
| GoogleCloudSnitch | Auto-detects topology on GCP |
Partitioners determine how data is distributed across nodes:
- Murmur3Partitioner (default): Uses Murmur3 hash — fast, even distribution.
- RandomPartitioner: Uses MD5 — legacy option.
- ByteOrderedPartitioner: Preserves key order — generally avoided (causes hotspots).
4.2.2.3 Replication Strategies
When writing data, Cassandra places replicas on multiple nodes based on the Replication Strategy:
| Strategy | Use Case |
|---|---|
| SimpleStrategy | Single data center only (dev/test) |
| NetworkTopologyStrategy | Multi-DC production deployments |
-- Example: NetworkTopologyStrategy with RF=3 in each DC
CREATE KEYSPACE my_app
WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'DC1': 3,
  'DC2': 3
};
4.2.3 Cassandra’s Data Model
4.2.3.1 Clusters and Keyspaces
- Cluster: The entire Cassandra deployment (all DCs + nodes).
- Keyspace: The outermost data container — equivalent to a database in RDBMS.
- Defines the replication strategy and factor.
- A cluster can contain multiple keyspaces.
CREATE KEYSPACE ecommerce
WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
4.2.3.2 Tables and Columns
Cassandra tables look SQL-like but work very differently:
CREATE TABLE users (
  user_id UUID,
  email TEXT,
  username TEXT,
  created TIMESTAMP,
  PRIMARY KEY (user_id)
);
Primary Key Anatomy:
PRIMARY KEY (partition_key, clustering_column_1, clustering_column_2)
- Partition Key: Determines which node stores the data (via hashing).
- Clustering Columns: Determine the sort order of rows within a partition.
- Regular Columns: The actual data payload.
🚨 Critical Cassandra Truth: You design tables around your queries, not around your entities. This is the single biggest mindset shift from RDBMS!
4.2.3.3 CQL Types: Simple, Collection, and User-Defined
Simple Types:
| Type | Description |
|---|---|
| UUID / TIMEUUID | Unique identifiers |
| TEXT / VARCHAR | String data |
| INT, BIGINT, FLOAT, DOUBLE | Numeric types |
| BOOLEAN | True/False |
| TIMESTAMP / DATE / TIME | Date & time |
| BLOB | Binary data |
Collection Types:
| Type | Description | Example |
|---|---|---|
| LIST<T> | Ordered list of values | ['a', 'b', 'c'] |
| SET<T> | Unordered unique values | {'red', 'blue'} |
| MAP<K,V> | Key-value pairs | {'name': 'Alice'} |
User-Defined Types (UDTs):
CREATE TYPE address (
  street TEXT,
  city TEXT,
  zip TEXT
);
CREATE TABLE customers (
  id UUID PRIMARY KEY,
  name TEXT,
  home FROZEN<address>
);
4.3 Installing and Configuring Cassandra
4.3.1 Installation Methods
4.3.1.1 Apache Distribution
Download directly from the Apache Cassandra project:
# Download and extract
wget https://downloads.apache.org/cassandra/4.1.x/apache-cassandra-4.1.x-bin.tar.gz
tar -xvzf apache-cassandra-4.1.x-bin.tar.gz
# Start Cassandra
cd apache-cassandra-4.1.x
bin/cassandra
Prerequisites: A supported JDK must be installed (Java 8 or 11 for Cassandra 4.x).
4.3.1.2 Building from Source
git clone https://github.com/apache/cassandra.git
cd cassandra
ant
Used by contributors and those needing custom builds. Not recommended for production.
4.3.1.3 Docker Deployment
The fastest way to get started locally:
# Pull official image
docker pull cassandra:latest
# Start a single node
docker run --name cassandra-node -d cassandra:latest
# Connect with cqlsh
docker exec -it cassandra-node cqlsh
4.3.2 Basic Server Operations
4.3.2.1 Starting and Stopping Cassandra
# Start (foreground)
bin/cassandra -f
# Start (background)
bin/cassandra
# Stop (only works if Cassandra was started with -p cassandra.pid to write a PID file)
kill $(cat cassandra.pid)
# OR using nodetool
bin/nodetool stopdaemon
# Check status
bin/nodetool status
4.3.2.2 Environment Configuration
Key configuration files:
| File | Purpose |
|---|---|
| conf/cassandra.yaml | Main config: cluster name, seeds, directories, ports |
| conf/cassandra-env.sh | JVM settings (heap size, GC options) |
| conf/jvm.options | Fine-grained JVM tuning (split into jvm-server.options, jvm8-server.options, and jvm11-server.options in 4.x) |
| conf/logback.xml | Logging configuration |
Important cassandra.yaml settings:
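cluster_name: 'MyCluster'
# Seeds are nested under seed_provider; they are not a top-level key
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "192.168.1.10,192.168.1.11"
listen_address: 192.168.1.12
native_transport_port: 9042
data_file_directories:
  - /var/lib/cassandra/data
commitlog_directory: /var/lib/cassandra/commitlog
4.3.3 CQL Shell (cqlsh)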
cqlsh is the interactive command-line interface for Cassandra — think of it like psql for PostgreSQL or mysql CLI.
# Connect to local instance
bin/cqlsh
# Connect to remote
bin/cqlsh 192.168.1.10 9042
# Connect with credentials
bin/cqlsh -u username -p password
4.3.3.1 Basic cqlsh Commands
-- Show all keyspaces
DESCRIBE KEYSPACES;
-- Show all tables in current keyspace
DESCRIBE TABLES;
-- Show table schema
DESCRIBE TABLE users;
-- Check cluster info
SELECT * FROM system.local;
-- Exit
EXIT;
4.3.3.2 Creating Keyspaces and Tables
-- Create keyspace
CREATE KEYSPACE blog
WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
-- Use keyspace
USE blog;
-- Create table
CREATE TABLE posts (
  author_id UUID,
  created_at TIMESTAMP,
  post_id UUID,
  title TEXT,
  body TEXT,
  tags SET<TEXT>,
  PRIMARY KEY ((author_id), created_at, post_id)
) WITH CLUSTERING ORDER BY (created_at DESC, post_id ASC);
4.3.3.3 Writing and Reading Data
-- Insert data
INSERT INTO posts (author_id, created_at, post_id, title, body)
VALUES (
uuid(),
toTimestamp(now()),
uuid(),
'Hello Cassandra',
'This is my first post!'
);
-- Read data
SELECT * FROM posts WHERE author_id = <some-uuid>;
-- Update data
UPDATE posts SET title = 'Updated Title'
WHERE author_id = <uuid> AND created_at = <ts> AND post_id = <uuid>;
-- Delete data
DELETE FROM posts
WHERE author_id = <uuid> AND created_at = <ts> AND post_id = <uuid>;
4.4 Data Modeling in Cassandra
⚠️ This is the hardest part to unlearn if you know SQL. Read carefully.
4.4.1 Conceptual Data Modeling
At this stage, you identify:
- Entities (what objects exist — Users, Orders, Products)
- Relationships (how they connect)
- Attributes (what data they hold)
This phase looks similar to RDBMS ER modeling — but it’s just the starting point.
4.4.2 Logical Data Modeling
4.4.2.1 Differences from RDBMS Design
| RDBMS Approach | Cassandra Approach |
|---|---|
| Normalize data — eliminate redundancy | Denormalize — redundancy is OK, even encouraged |
| Design around entities | Design around queries |
| Joins allowed | No joins — ever |
| Ad hoc queries supported | Queries must be predefined |
| Foreign keys enforce relationships | Application handles relationships |
4.4.2.2 Query-Driven Modeling Approach
The golden rule of Cassandra modeling:
“Know your queries first. Design your tables around them.”
Workflow:
- Identify all application queries (e.g., “Get all posts by author, ordered by date”)
- For each query, design a table that satisfies it directly
- Accept that you may need multiple tables for the same data (that’s normal!)
Example:
Query 1: "Get user by email"
→ Table: users_by_email (partition key: email)
Query 2: "Get user by user_id"
→ Table: users_by_id (partition key: user_id)
Yes: two tables, same data, different access patterns. This is correct Cassandra modeling.
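As a rough picture of what "two tables, same data" means for the application, here is a small Python sketch. The dictionaries stand in for the two tables and `create_user` is a hypothetical helper; in a real application the two writes would be CQL INSERTs issued by your code.

```python
# Each dict plays the role of one query-specific table.
users_by_id = {}      # serves "Get user by user_id"
users_by_email = {}   # serves "Get user by email"


def create_user(user_id: str, email: str, username: str) -> None:
    """Write the same row to every table that needs it (denormalization)."""
    row = {"user_id": user_id, "email": email, "username": username}
    users_by_id[user_id] = dict(row)
    users_by_email[email] = dict(row)


create_user("u-1", "alice@example.com", "alice")

# Each query is a single lookup by partition key; no join is needed.
print(users_by_email["alice@example.com"]["user_id"])
print(users_by_id["u-1"]["email"])
```

The price is that the application must keep both copies in step on every write, which is the "application handles relationships" row in the comparison table.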
4.4.3 Physical Data Modeling
4.4.3.1 Partition Design
The partition key is the most important design decision in Cassandra.
Rules for a good partition key:
- Should distribute data evenly across all nodes (high cardinality)
- Should be used in every query that accesses this table
- Avoid low-cardinality keys (e.g., status = 'active'/'inactive' → only two partitions hold all the data)
- Avoid monotonically increasing keys with ByteOrderedPartitioner (causes hotspots)
-- BAD: Low cardinality partition key
PRIMARY KEY (status) -- only 2–3 values, huge hotspots
-- GOOD: High cardinality
PRIMARY KEY (user_id) -- millions of unique values, even distribution
-- COMPOSITE partition key (when needed for distribution)
PRIMARY KEY ((region, user_id), created_at)
4.4.3.2 Clustering Columns
Clustering columns define the sort order of rows within a partition:
CREATE TABLE sensor_readings (
  sensor_id UUID,
  recorded TIMESTAMP,
  value DOUBLE,
  PRIMARY KEY (sensor_id, recorded)
) WITH CLUSTERING ORDER BY (recorded DESC);
- Data within the sensor_id partition is stored sorted by recorded (newest first).
- You can query ranges: WHERE sensor_id = X AND recorded > Y AND recorded < Z
- Clustering columns must be used in order in WHERE clauses.
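Why range queries on a clustering column are cheap can be seen in a small Python model. This is only a sketch: a dict of sorted lists stands in for partitions, timestamps are plain integers, and the `insert` and `range_query` helpers are hypothetical names. It also ignores overwrites of the same timestamp.

```python
import bisect
from collections import defaultdict

# partition key -> list of (recorded, value), kept sorted by recorded
partitions = defaultdict(list)


def insert(sensor_id: str, recorded: int, value: float) -> None:
    """Place the row at its sorted position inside its partition."""
    bisect.insort(partitions[sensor_id], (recorded, value))


def range_query(sensor_id: str, after: int, before: int):
    """Rows with after < recorded < before, newest first."""
    rows = partitions[sensor_id]
    lo = bisect.bisect_right(rows, (after, float("inf")))
    hi = bisect.bisect_left(rows, (before, float("-inf")))
    return rows[lo:hi][::-1]


for recorded, value in [(10, 1.0), (30, 3.0), (20, 2.0), (40, 4.0)]:
    insert("sensor-1", recorded, value)

print(range_query("sensor-1", after=10, before=40))   # [(30, 3.0), (20, 2.0)]
```

Because rows are already sorted, the range is one contiguous slice found by binary search. A filter on a non-clustering column would have to look at every row instead.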
4.4.4 Schema Optimization Techniques
4.4.4.1 Calculating Partition Size
Large partitions cause performance problems. Use this formula:
Nv = Nr * (Nc - Npk - Nck - Nsc) + Nsc
Where:
Nv = number of values (cells) in the partition
Nr = number of rows
Nc = number of columns in table
Npk = number of partition key columns
Nck = number of clustering columns
Nsc = number of static columns
Static columns are stored once per partition, so they are left out of the per-row count and added back once.
🎯 Target: As a rule of thumb, keep partitions under 100MB in size and under 100,000 rows.
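A quick worked example helps here. The Python sketch below counts cells per partition, treating static columns as stored once per partition (so they are excluded from the per-row count and added once). The function name is made up, and the one-reading-per-minute rate is an assumed workload for the sensor_readings table.

```python
def partition_values(rows: int, columns: int, partition_key_cols: int,
                     clustering_cols: int, static_cols: int = 0) -> int:
    """Estimate the number of cells stored in one partition."""
    per_row = columns - partition_key_cols - clustering_cols - static_cols
    return rows * per_row + static_cols


# sensor_readings has 3 columns: sensor_id (partition key),
# recorded (clustering column), and value (regular column).
rows_per_year = 60 * 24 * 365   # assumed: one reading per minute for a year

print(partition_values(rows_per_year, columns=3,
                       partition_key_cols=1, clustering_cols=1))   # 525600
```

One sensor already blows past the 100,000-row guideline within a year, which is the motivation for splitting partitions into buckets.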
4.4.4.2 Breaking Up Large Partitions
Technique 1: Add a bucket to the partition key
-- BEFORE: One huge partition per sensor
PRIMARY KEY (sensor_id, recorded)
-- AFTER: Bucket by month to limit partition size
PRIMARY KEY ((sensor_id, month), recorded)
-- Now query: WHERE sensor_id = X AND month = '2024-01'
Technique 2: Time-based bucketing
- Partition by user_id + year_month so each partition holds only one month of data.
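On the application side, bucketing means computing the bucket value on every write and listing the buckets to visit on every read. A minimal Python sketch, assuming a 'YYYY-MM' string as the bucket format (both helper names are hypothetical):

```python
from datetime import datetime


def bucketed_partition_key(sensor_id: str, recorded: datetime) -> tuple:
    """Composite partition key: (sensor_id, 'YYYY-MM')."""
    return (sensor_id, recorded.strftime("%Y-%m"))


def buckets_between(start: datetime, end: datetime) -> list:
    """Every 'YYYY-MM' bucket a query from start to end (inclusive) must read."""
    months = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        months.append(f"{year:04d}-{month:02d}")
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return months


print(bucketed_partition_key("sensor-1", datetime(2024, 1, 15)))
print(buckets_between(datetime(2023, 11, 5), datetime(2024, 2, 1)))
```

The trade-off: a query spanning several months becomes one query per bucket, so pick a bucket size that matches the time ranges you usually read.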
4.4.5 Data Modeling Tools for Cassandra
| Tool | Description |
|---|---|
| DataStax Studio | Visual query and data modeling IDE |
| Hackolade | Entity-relationship modeling for NoSQL databases |
| Chebotko Diagrams | Notation system for visually representing Cassandra table designs |
| NoSQLBench | Benchmarking and load testing tool |
📎 Reference: DataStax Data Modeling Guide
4.5 Advanced Cassandra Concepts
4.5.1 Consistency Models
4.5.1.1 Consistency Levels
Consistency Level (CL) = how many replica nodes must respond before a read/write is considered successful.
| Consistency Level | Description | Notes (RF=3) |
|---|---|---|
| ONE | Only 1 replica must respond | Fast, weakest consistency |
| TWO | 2 replicas must respond | Moderate |
| THREE | 3 replicas must respond | All replicas |
| QUORUM | Majority must respond | floor(3/2) + 1 = 2 nodes |
| LOCAL_QUORUM | Majority in local DC | Best for multi-DC |
| EACH_QUORUM | Majority in each DC | Strongest multi-DC |
| ALL | All replicas must respond | Strongest, least available |
| ANY | At least 1 node (even a stored hint counts); writes only | Fastest write |
💡 Strong Consistency Formula:
Read CL + Write CL > Replication Factor = Strong Consistency
Example: QUORUM + QUORUM with RF=3 gives 2 + 2 = 4 > 3 ✅
🧠 Analogy: Think of CL as needing signatures on a document. ONE = just one co-signer. QUORUM = majority of the board. ALL = everyone must sign, or it’s invalid.
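The arithmetic behind the formula is small enough to check by hand, but a short Python sketch makes the pattern easy to replay for other replication factors. The function names are made up for these notes.

```python
def quorum(rf: int) -> int:
    """Majority of replicas: floor(rf / 2) + 1."""
    return rf // 2 + 1


def is_strongly_consistent(read_replicas: int, write_replicas: int, rf: int) -> bool:
    """True when every read set must overlap every write set."""
    return read_replicas + write_replicas > rf


rf = 3
print(quorum(rf))                                           # 2
print(is_strongly_consistent(quorum(rf), quorum(rf), rf))   # True  (2 + 2 > 3)
print(is_strongly_consistent(1, 1, rf))                     # False (ONE + ONE)
print(is_strongly_consistent(1, rf, rf))                    # True  (read ONE, write ALL)
```

The overlap is the whole trick: if reads and writes together touch more than RF replicas, at least one replica in every read has seen the latest write.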
4.5.1.2 Lightweight Transactions and Paxos
Cassandra supports compare-and-swap (CAS) operations using the Paxos consensus protocol:
-- Insert only if no row with this primary key exists yet (IF NOT EXISTS).
-- The check is on the primary key (user_id here), so pass a known id;
-- a fresh uuid() would never collide.
INSERT INTO users (user_id, email)
VALUES (<uuid>, 'alice@example.com')
IF NOT EXISTS;
-- Only update if current value matches (optimistic locking)
UPDATE users SET email = 'new@example.com'
WHERE user_id = <uuid>
IF email = 'old@example.com';
⚠️ Warning: Lightweight Transactions (LWT) are significantly slower than normal writes (Paxos needs four round trips between the coordinator and the replicas). Use sparingly, and only when true compare-and-swap is required.
4.5.2 Read and Write Path
4.5.2.1 Memtables, SSTables, and Commit Logs
Write Path:
Write Request
│
├──→ Commit Log (WAL — durability on disk, sequential write)
│
└──→ Memtable (in-memory table, fast write)
     │
     │ [when full or timeout]
     ▼
     SSTable (immutable, sorted file on disk)
- Commit Log: Write-Ahead Log (WAL) that ensures durability. If Cassandra crashes, this is replayed on restart.
- Memtable: In-memory buffer — writes are lightning fast here.
- SSTable (Sorted String Table): Immutable on-disk file — once written, never modified. New versions of rows create new SSTables.
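The three components can be modeled in a few lines of Python. This is a toy: the `StorageEngine` class and its tiny `memtable_limit` are assumptions for illustration, everything lives in memory, and "newest SSTable wins" stands in for Cassandra's timestamp-based reconciliation.

```python
class StorageEngine:
    """Toy write path: commit log + memtable, flushed to immutable SSTables."""

    def __init__(self, memtable_limit: int = 2):
        self.commit_log = []     # append-only record of unflushed writes
        self.memtable = {}       # in-memory, mutable
        self.sstables = []       # immutable sorted tuples; newest last
        self.memtable_limit = memtable_limit

    def write(self, key: str, value: str) -> None:
        self.commit_log.append((key, value))   # durability first
        self.memtable[key] = value             # then the fast in-memory write
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        """Freeze the memtable into a sorted, immutable SSTable."""
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()  # flushed data no longer needs replay

    def read(self, key: str):
        if key in self.memtable:
            return self.memtable[key]
        for sstable in reversed(self.sstables):   # newest version wins
            for k, v in sstable:
                if k == key:
                    return v
        return None


engine = StorageEngine(memtable_limit=2)
engine.write("a", "1")
engine.write("b", "2")   # memtable is full, so it is flushed to an SSTable
engine.write("a", "3")   # the newer version of "a" lives in the memtable
print(engine.read("a"), engine.read("b"), len(engine.sstables))   # 3 2 1
```

Notice that the old value of "a" is still sitting in the first SSTable. Nothing is updated in place, which is why compaction is needed later.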
Read Path:
Read Request
│
├──→ Memtable (recent writes not yet flushed; merged with SSTable data)
│
├──→ Row Cache (if enabled and hit → return immediately)
│
├──→ Bloom Filter (is this key probably in this SSTable?)
│
├──→ Key Cache (is the offset of this key cached?)
│
├──→ Partition Summary → Partition Index
│
└──→ SSTable data file → return result
4.5.2.2 Bloom Filters and Caching
Bloom Filter:
- A probabilistic data structure that quickly answers: “Is this row definitely NOT in this SSTable?”
- If the Bloom Filter says NO → skip the SSTable entirely (huge I/O savings).
- Can have false positives (says “maybe yes” when actually no) — but never false negatives.
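A Bloom filter is easier to trust once you have seen one. The Python sketch below is a minimal version, not Cassandra's implementation: the `BloomFilter` class, its size, and the use of SHA-256 with a numeric prefix to simulate several hash functions are all assumptions for illustration.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: answers 'definitely not present' or 'maybe present'."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)   # one byte per bit, for readability

    def _positions(self, key: str):
        # Derive num_hashes positions by salting the key with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: str) -> bool:
        # False means definitely absent; True only means "maybe".
        return all(self.bits[pos] for pos in self._positions(key))


sstable_filter = BloomFilter()
for key in ("user-1", "user-2", "user-3"):
    sstable_filter.add(key)

print(sstable_filter.might_contain("user-2"))   # True: added keys are never missed
```

A key that was never added usually returns False, letting the read skip that SSTable. Occasionally it returns True by coincidence, which costs one wasted lookup but never a wrong answer.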
Caching Options:
| Cache | What It Stores | Use Case |
|---|---|---|
| Row Cache | Entire rows from SSTables | Frequently read, rarely updated rows |
| Key Cache | SSTable offset for a partition key | General purpose; enabled by default |
| Chunk Cache | Compressed SSTable chunks (off-heap) | High-throughput read workloads |
| Counter Cache | Counter column values | Counter-heavy workloads |
4.5.3 Background Processes
4.5.3.1 Hinted Handoff
Problem: Node B goes down. A write meant for Node B arrives.
Solution: Another node (the coordinator) stores a hint — a temporary record of the write.
- When Node B comes back online, the coordinator replays the hint to bring it up to date.
- Hints are stored for a configurable duration (max_hint_window: default 3 hours).
- If Node B stays down past that window, new hints for it are no longer written, so read repair or a manual repair is needed.
💡 Key Term: Hinted Handoff — a mechanism ensuring writes are not lost when a replica node is temporarily unavailable, by storing the write as a “hint” on another node.
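The store-and-replay cycle can be sketched in Python as follows. This is a simplified model: the `Coordinator` class is hypothetical, plain dicts stand in for node storage, and the hint window and consistency levels are ignored.

```python
class Coordinator:
    """Toy coordinator that keeps hints for replicas that are down."""

    def __init__(self, replicas: dict):
        self.replicas = replicas   # node name -> dict acting as its storage
        self.down = set()          # names of nodes currently unreachable
        self.hints = []            # (target node, key, value)

    def write(self, key, value) -> None:
        for name, store in self.replicas.items():
            if name in self.down:
                self.hints.append((name, key, value))   # remember for later
            else:
                store[key] = value

    def node_recovered(self, name) -> None:
        """Replay the hints destined for a node that has come back."""
        self.down.discard(name)
        remaining = []
        for target, key, value in self.hints:
            if target == name:
                self.replicas[name][key] = value
            else:
                remaining.append((target, key, value))
        self.hints = remaining


nodes = {"A": {}, "B": {}, "C": {}}
coordinator = Coordinator(nodes)

coordinator.down.add("B")
coordinator.write("k", "v")
print(nodes["B"], len(coordinator.hints))   # {} 1  (B missed the write)

coordinator.node_recovered("B")
print(nodes["B"], len(coordinator.hints))   # {'k': 'v'} 0
```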
4.5.3.2 Anti-Entropy and Repair
Over time, replicas can drift out of sync. Anti-Entropy Repair fixes this:
- Uses Merkle Trees (hash trees) to efficiently compare data between replicas.
- Only the token ranges whose hashes differ are streamed between replicas, not entire datasets.
- Run with nodetool repair.
# Repair a node (incremental by default since Cassandra 2.2)
nodetool repair
# Repair a specific keyspace
nodetool repair my_keyspace
# Full repair (compares all data, not only unrepaired SSTables)
nodetool repair --full
🧠 Merkle Tree Analogy: Think of it like a file system checksum tree: if the top-level hash matches, nothing inside changed. If it doesn’t, drill down until you find exactly which files differ. Efficient!
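The "drill down only where hashes differ" idea looks like this in Python. It is a sketch under stated assumptions: leaves are individual values here (Cassandra hashes token ranges), the number of leaves must be a power of two, and `build_tree` and `differing_leaves` are hypothetical helper names.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_tree(leaves):
    """Return tree levels, from leaf hashes up to the single root hash."""
    level = [_h(repr(leaf).encode("utf-8")) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def differing_leaves(a, b):
    """Indexes of leaves that differ, descending only into mismatched subtrees."""
    suspects = [0]                                   # start at the root
    for depth in range(len(a) - 1, 0, -1):
        suspects = [child
                    for i in suspects if a[depth][i] != b[depth][i]
                    for child in (2 * i, 2 * i + 1)]
    return [i for i in suspects if a[0][i] != b[0][i]]


replica_a = build_tree(["r0", "r1", "r2", "r3"])
replica_b = build_tree(["r0", "r1", "CHANGED", "r3"])

print(differing_leaves(replica_a, replica_b))   # [2]
print(differing_leaves(replica_a, replica_a))   # []
```

When the roots match, the comparison stops after a single hash check, which is why repair stays cheap for replicas that are mostly in sync.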
4.5.3.3 Compaction
Problem: Every memtable flush creates a new SSTable, and updates and deletes are written as new data instead of changing old files in place. Over time → thousands of SSTable files → slow reads.
Solution: Compaction — merging SSTables into fewer, larger, optimized files.
During compaction:
- Old versions of updated rows are discarded.
- Tombstones (deletion markers) are removed (after the gc_grace_seconds window).
- Remaining data is merged into a new, cleaner SSTable.
Compaction Strategies:
| Strategy | Best For | How It Works |
|---|---|---|
| STCS (Size-Tiered) | Write-heavy workloads (default) | Merges SSTables of similar size |
| LCS (Leveled) | Read-heavy workloads | Organizes SSTables into levels; more predictable reads |
| TWCS (Time-Window) | Time-series data | Groups SSTables by time window; perfect for expiring data |
-- Set compaction strategy on a table
ALTER TABLE sensor_readings
WITH compaction = {'class': 'TimeWindowCompactionStrategy',
'compaction_window_unit': 'HOURS',
'compaction_window_size': 1};
💡 Key Term: Tombstone. A special marker written to disk when a row or column is deleted. It tells Cassandra “this data was deleted” until compaction cleans it up.
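What a compaction pass does to versions and tombstones can be sketched in Python. This is a toy model, not any particular compaction strategy: SSTables are dicts of key → (timestamp, value), `TOMBSTONE` is a made-up sentinel, and timestamps are plain integers in seconds.

```python
TOMBSTONE = object()   # sentinel standing in for a deletion marker


def compact(sstables, now: int, gc_grace_seconds: int) -> dict:
    """Merge SSTables: keep the newest cell per key, purge expired tombstones."""
    newest = {}
    for sstable in sstables:
        for key, (timestamp, value) in sstable.items():
            if key not in newest or timestamp > newest[key][0]:
                newest[key] = (timestamp, value)

    merged = {}
    for key, (timestamp, value) in sorted(newest.items()):
        if value is TOMBSTONE and now - timestamp > gc_grace_seconds:
            continue                      # tombstone is old enough to drop
        merged[key] = (timestamp, value)
    return merged


older = {"a": (100, "v1"), "b": (100, "x")}
newer = {"a": (200, "v2"), "b": (150, TOMBSTONE)}

# Long after the delete: the tombstone and the data it shadows both disappear.
print(compact([older, newer], now=1000, gc_grace_seconds=500))   # {'a': (200, 'v2')}

# Still inside the grace window: the tombstone must be kept.
print(list(compact([older, newer], now=400, gc_grace_seconds=500)))   # ['a', 'b']
```

Keeping the tombstone during the grace window matters: if it vanished early, a replica that missed the delete could bring the old value back during repair.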
4.5.4 System Management
4.5.4.1 Managers and Services Overview
Cassandra’s internal architecture includes several key managers:
| Manager / Service | Responsibility |
|---|---|
| StorageService | Coordinates ring operations, token assignment |
| StorageProxy | Routes read/write requests to correct replicas |
| MessagingService | Handles inter-node communication |
| GossipStage | Manages gossip protocol execution |
| CommitLogService | Manages WAL writes and fsync |
| CompactionManager | Schedules and executes compaction tasks |
| HintedHandoffManager | Stores and replays hints for unavailable nodes |
| RepairService | Coordinates anti-entropy repair operations |
| StreamManager | Manages data streaming during topology changes |
4.5.4.2 System Keyspaces
Cassandra maintains internal system keyspaces that store cluster metadata. Never delete or modify these!
| Keyspace | Contents |
|---|---|
| system | Local node state, schema, compaction history |
| system_schema | All keyspace, table, and type definitions |
| system_auth | User credentials and permissions |
| system_distributed | Distributed metadata: repair history, views |
| system_traces | Query trace data (for debugging) |
-- Inspect system keyspace
SELECT * FROM system.local;
SELECT * FROM system_schema.keyspaces;
SELECT * FROM system_schema.tables WHERE keyspace_name = 'my_app';
📚 References
- Apache Cassandra Documentation — https://cassandra.apache.org/doc/latest/
- DataStax Documentation — https://docs.datastax.com/
- Eben Hewitt & Jeff Carpenter — Cassandra: The Definitive Guide (O’Reilly) — https://www.oreilly.com/library/view/cassandra-the-definitive/9781491933657/
- CAP Theorem — Brewer, E. (2000) — https://dl.acm.org/doi/10.1145/343477.343502
- DataStax Data Modeling Guide — https://docs.datastax.com/en/dse/6.8/dse-dev/datastax_enterprise/dbDesign/dbDesignIntro.html
- Apache Cassandra GitHub — https://github.com/apache/cassandra
- Paxos / Lightweight Transactions — https://cassandra.apache.org/doc/latest/cassandra/cql/dml.html#conditions
⚡ TL;DR — The Cheat Sheet You Actually Need
Too long? Fine. Here’s everything squeezed into one power block:
🏗️ Architecture:
- Cassandra = peer-to-peer ring, no master node, no single point of failure
- Data is distributed via consistent hashing of the partition key
- Virtual nodes (vnodes) = even load distribution across the ring
- Gossip Protocol = how nodes know each other’s state
📐 Data Model:
- Keyspace → Table → Row → Column
- Primary Key = Partition Key (which node?) + Clustering Columns (sort order within partition)
- Design tables around queries, NOT around data entities — this is the #1 rule
Read & Write Path:
- Writes go to Commit Log (durability) + Memtable (speed) → flush to SSTable (disk)
- Reads use Bloom Filters → Caches → SSTables
- Compaction merges SSTables and removes stale data/tombstones
Consistency:
- Cassandra is AP (CAP Theorem) — favors Availability + Partition Tolerance
- Use tuneable consistency levels (ONE, QUORUM, ALL) to balance speed vs. accuracy
- QUORUM + QUORUM > RF = strong consistency guarantee
Operations:
- nodetool status: check node health
- nodetool repair: sync out-of-sync replicas
- cqlsh: your SQL-like interface into Cassandra
- Hinted Handoff = writes survive temporary node failures
- Anti-Entropy Repair using Merkle Trees = long-term replica synchronization
What Cassandra Can’t Do (and you shouldn’t try):
- No JOINs
- No ad hoc queries across arbitrary columns
- No foreign keys
- Avoid ALLOW FILTERING (it’s effectively a table scan and very slow)
