Data distribution and storage in Apache Cassandra | Perfomatix | Full Stack Engineering Company

  • Storing event data whose length varies from event to event
  • Querying the massive, fast-growing dataset for insights and continuous, iterative improvement
  • Accommodating evolving, variable-length data at large scale in a distributed database
  • Scalability and high availability of data without compromising performance
  • Managing the data with a familiar, SQL-like query language

Apache Cassandra

Key advantages of using Apache Cassandra as a database are:

  • Decentralized: There is no master node. Every node can serve client requests, and each node holds a complete or partial replica of the database.
  • Distributed: Data is spread across many nodes, which can span multiple data centers.
  • Highly scalable: Each node communicates with only a constant number of other nodes, so throughput scales roughly linearly as nodes are added, even across a very large cluster.
  • Fault-tolerant: Because data is replicated across a decentralized cluster, it remains available even if several nodes become unavailable or an entire data center goes down.
  • Tunable consistency: The trade-off between availability and consistency is adjustable through the replication factor and consistency level settings. For example, on a 3-node cluster with a replication factor of 3, setting the consistency level to THREE (equivalent here to ALL) requires all three replicas to acknowledge each read or write. This gives the strongest consistency in this cluster, but requests fail if any one of those nodes is down.
  • Deployable on-premises, in the cloud, or in hybrid environments
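To make the consistency trade-off concrete, here is a minimal Python sketch of how many replica acknowledgements a few consistency levels require, and the common rule of thumb that reads see the latest write when read replicas plus write replicas exceed the replication factor (R + W > RF). It assumes a single data center and covers only a subset of Cassandra's levels; the function names are hypothetical helpers, not part of any Cassandra driver.

```python
# Minimal sketch (single data center, subset of consistency levels):
# how many replicas must acknowledge a request, and whether a read/write
# pair guarantees that reads see the latest write (R + W > RF).

def replicas_required(level: str, replication_factor: int) -> int:
    """Number of replica acknowledgements needed at a consistency level."""
    fixed = {"ONE": 1, "TWO": 2, "THREE": 3}
    if level in fixed:
        needed = fixed[level]
    elif level == "QUORUM":
        # A quorum is a strict majority of the replicas.
        needed = replication_factor // 2 + 1
    elif level == "ALL":
        needed = replication_factor
    else:
        raise ValueError(f"unsupported consistency level: {level}")
    if needed > replication_factor:
        # The level can never be satisfied with this few replicas.
        raise ValueError(f"{level} needs {needed} replicas but RF is {replication_factor}")
    return needed


def is_strongly_consistent(read_level: str, write_level: str, replication_factor: int) -> bool:
    """True when every read replica set overlaps every write replica set."""
    r = replicas_required(read_level, replication_factor)
    w = replicas_required(write_level, replication_factor)
    return r + w > replication_factor


if __name__ == "__main__":
    rf = 3
    for read, write in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
        print(read, write, is_strongly_consistent(read, write, rf))
```

With RF = 3, QUORUM reads and writes (2 + 2 > 3) overlap on at least one replica, so they tolerate one node being down while still returning the latest write; ONE/ONE is faster and more available but may return stale data.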

How does data distribution happen in Apache Cassandra?

  • Tokens determine which node holds which data. With the default Murmur3Partitioner, a token is a signed 64-bit integer, and Cassandra assigns ranges of these tokens to nodes so that every token is owned by some node. As a result, adding nodes to or removing nodes from a cluster requires redistributing these token ranges among the nodes.
  • A row's partition key is passed through the partitioner (a hash function that computes a token from a partition key), and the node whose token range contains the resulting token owns that row. The remaining replicas are placed on other nodes according to the keyspace's replication strategy, which is how Cassandra locates every replica of the data.
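A toy token ring in Python makes these two steps concrete. It is a sketch under stated assumptions: the node names, token values, and the MD5-based `token_for` stand-in are hypothetical (Cassandra's default Murmur3Partitioner produces different token values), and replica placement follows a simple clockwise walk in the style of SimpleStrategy rather than rack- or data-center-aware placement.

```python
import bisect
import hashlib

# Toy model of a Cassandra-style token ring. Assumption: a 64-bit slice of
# MD5 stands in for Murmur3; the token values differ, but ownership works
# the same way.


def token_for(partition_key: str) -> int:
    """Hash a partition key to a signed 64-bit token (stand-in hash)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


class TokenRing:
    def __init__(self, node_tokens: dict[str, int]):
        # Sort nodes by token so the ring can be walked clockwise.
        self.ring = sorted((tok, node) for node, tok in node_tokens.items())
        self.tokens = [tok for tok, _ in self.ring]

    def owner_index(self, token: int) -> int:
        # A node owns the range (previous node's token, its own token], so
        # the owner is the first node whose token is >= the key's token,
        # wrapping around to the first node past the largest token.
        i = bisect.bisect_left(self.tokens, token)
        return i % len(self.ring)

    def replicas(self, partition_key: str, replication_factor: int) -> list[str]:
        # SimpleStrategy-style placement: the owning node holds the first
        # replica and the next nodes clockwise hold the rest.
        start = self.owner_index(token_for(partition_key))
        count = min(replication_factor, len(self.ring))
        return [self.ring[(start + k) % len(self.ring)][1] for k in range(count)]


if __name__ == "__main__":
    three = TokenRing({"node-a": -6 * 10**18, "node-b": 0, "node-c": 6 * 10**18})
    print(three.replicas("user:42", 2))

    # Adding node-d at token 3e18 splits node-c's range: only keys whose
    # tokens fall in (0, 3e18] change primary owner (node-c -> node-d).
    four = TokenRing({"node-a": -6 * 10**18, "node-b": 0,
                      "node-c": 6 * 10**18, "node-d": 3 * 10**18})
    keys = [f"user:{i}" for i in range(1000)]
    moved = [k for k in keys if three.replicas(k, 1) != four.replicas(k, 1)]
    print(f"{len(moved)} of {len(keys)} keys changed primary owner")
```

The second part of the usage shows why token ranges matter when a cluster grows: a new node takes over part of one existing range, so only the keys in that slice move instead of the whole dataset being reshuffled.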

Data Modeling in Cassandra

How is data added to Cassandra?

  • Columns are defined in the schema up front. Any column not supplied in an insert is simply not stored for that row and reads back as NULL, so sparse rows cost no extra space.
  • Applications can run ALTER TABLE ... ADD statements at any time to add new columns to the schema without rewriting existing rows.
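The sparse-row behaviour described above can be illustrated with a small in-memory Python model. This is a hypothetical toy (the `SparseTable` class and its methods are invented for illustration), not Cassandra's storage engine: each row stores only the cells actually written, missing columns read back as `None`, and adding a column changes only the schema.

```python
# Toy model (not Cassandra's storage engine): rows store only the columns
# actually written; unwritten columns read back as None, and adding a
# column is a schema-only change that leaves existing rows untouched.


class SparseTable:
    def __init__(self, columns):
        self.columns = list(columns)
        self.rows = {}  # partition key -> {column: value}, written cells only

    def insert(self, key, **values):
        unknown = set(values) - set(self.columns)
        if unknown:
            raise KeyError(f"undefined columns: {sorted(unknown)}")
        # Like a Cassandra INSERT, writing to an existing key is an upsert.
        self.rows.setdefault(key, {}).update(values)

    def alter_add(self, column):
        # Metadata-only change: no existing row is rewritten.
        if column not in self.columns:
            self.columns.append(column)

    def select(self, key):
        stored = self.rows.get(key, {})
        return {col: stored.get(col) for col in self.columns}


if __name__ == "__main__":
    events = SparseTable(["event_type", "payload"])
    events.insert("e1", event_type="click")
    print(events.select("e1"))  # {'event_type': 'click', 'payload': None}
    events.alter_add("duration_ms")
    events.insert("e2", event_type="view", duration_ms=1200)
    print(events.select("e2"))
```

Because only written cells are stored, events of very different shapes can share one table, which fits the variable-length event data described at the start of the article.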

How is data read in Cassandra?

Summing Up




Perfomatix is your trusted technology partner for Software Product Engineering Services.
