Data distribution and storage in Apache Cassandra | Perfomatix | Full Stack Engineering Company

  • Store event data of variable length
  • Query a massive, fast-growing dataset for insights and continuous, iterative improvement
  • Use a distributed database that can accommodate evolving, variable-length data at large scale
  • Scale and keep data highly available without compromising performance
  • Manage the data with a widely understood query language

Apache Cassandra

The key advantages of using Apache Cassandra as a database are:

  • Decentralized database: Each node can serve client requests as a complete or a partial replica of the database.
  • Distributed: Cassandra is distributed among many data nodes or data centers.
  • Highly scalable: Each node communicates with a constant number of other nodes; this allows the application to scale linearly over a massive number of nodes.
  • Fault-tolerant: Because the data is stored in a decentralized network, it remains available even if several nodes are unavailable or data centers crash, provided enough replicas survive.
  • Variable consistency: The availability and consistency of a Cassandra cluster are adjustable through the replication factor and consistency level settings. For example, with a replication factor of 3 on a 3-node cluster, a consistency level of THREE requires all three replicas to acknowledge a request. This gives the strongest consistency available in this cluster, at the cost of tolerating no unavailable replica.
  • Deployable in cloud or hybrid data environments
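The trade-off between consistency level and replication factor can be made concrete with a little arithmetic. The following is a minimal sketch, assuming a single data center; the function names are hypothetical and this is not a Cassandra API. It shows how many replicas must acknowledge a request at a given consistency level, how many replica failures the request can still survive, and when a read is guaranteed to overlap the latest write.

```python
def replicas_required(consistency_level, replication_factor):
    """Number of replica acknowledgements one request needs (single data center assumed)."""
    levels = {
        "ONE": 1,
        "TWO": 2,
        "THREE": 3,
        "QUORUM": replication_factor // 2 + 1,  # strict majority of replicas
        "ALL": replication_factor,
    }
    required = levels[consistency_level]
    if required > replication_factor:
        # Simplification: a real cluster would reject the request as unavailable.
        raise ValueError("consistency level needs more replicas than exist")
    return required


def tolerated_replica_failures(consistency_level, replication_factor):
    """Replicas that may be down while the request can still succeed."""
    return replication_factor - replicas_required(consistency_level, replication_factor)


def reads_see_latest_write(read_cl, write_cl, replication_factor):
    """True when every read set must overlap every write set (R + W > RF)."""
    read_acks = replicas_required(read_cl, replication_factor)
    write_acks = replicas_required(write_cl, replication_factor)
    return read_acks + write_acks > replication_factor


if __name__ == "__main__":
    for level in ("ONE", "QUORUM", "THREE"):
        print(level, replicas_required(level, 3), tolerated_replica_failures(level, 3))
```

With a replication factor of 3, QUORUM needs two acknowledgements and tolerates one unavailable replica, while THREE needs all three and tolerates none.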

How does data distribution happen in Apache Cassandra?

  • Tokens determine which node holds which data. A token is a 64-bit integer (with the default Murmur3 partitioner), and Cassandra assigns ranges of these tokens to nodes so that every token is owned by some node. Adding or removing nodes from a cluster therefore requires redistributing the token ranges among the nodes.
  • A partitioner (a hash function) computes a token from a row's partition key, and that token determines which node owns the row. This is how Cassandra locates data replicas.
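The lookup from partition key to owning node can be sketched in a few lines. This is an illustrative model, not Cassandra's implementation: the class and node names are hypothetical, each node holds a single token, MD5 truncated to 64 bits stands in for the Murmur3 hash that the default partitioner uses, and replicas are placed on the next nodes clockwise around the ring, as a simple single-data-center strategy would do.

```python
import bisect
import hashlib


def token_for(partition_key):
    """Stand-in partitioner: hash a key to a signed 64-bit token.

    Assumption: MD5 replaces Murmur3 only because it ships with Python.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


class TokenRing:
    """Each node owns the token range (previous node's token, its own token]."""

    def __init__(self, node_tokens):
        # node_tokens maps a node name to the token that ends its range.
        self._ring = sorted((token, node) for node, token in node_tokens.items())
        self._tokens = [token for token, _ in self._ring]

    def replicas_for_token(self, token, replication_factor=1):
        # First node whose token is >= the given token; wrap around past the end.
        start = bisect.bisect_left(self._tokens, token) % len(self._ring)
        count = min(replication_factor, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]

    def replicas(self, partition_key, replication_factor=1):
        return self.replicas_for_token(token_for(partition_key), replication_factor)


if __name__ == "__main__":
    ring = TokenRing({
        "node-a": -3074457345618258603,
        "node-b": 3074457345618258602,
        "node-c": 9223372036854775807,
    })
    print(ring.replicas("user-42", replication_factor=2))
```

Adding a node to this model means inserting one more token into the sorted list, which splits an existing range; that is the redistribution of token ranges described above.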

Data Modeling in Cassandra

How is data added to Cassandra?

  • If the columns already exist in the schema, any columns left unused in a new row hold null values after the insert operation.
  • If they do not, applications can run ALTER TABLE commands dynamically to add new columns to the schema.
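Both cases can be illustrated with the CQL an application might generate. The sketch below only builds statement text and does not talk to a cluster; the helper names and the keyspace, table, and column names are hypothetical. Sending the statements would require a driver session, which is assumed here and not shown.

```python
def add_column_cql(keyspace, table, column, cql_type):
    """CQL that adds a new column to an existing table."""
    return f"ALTER TABLE {keyspace}.{table} ADD {column} {cql_type};"


def insert_cql(keyspace, table, columns):
    """CQL insert naming only the columns this row uses.

    Columns that exist in the schema but are not listed read back as null.
    The '?' markers are bind placeholders for a prepared statement.
    """
    column_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    return f"INSERT INTO {keyspace}.{table} ({column_list}) VALUES ({placeholders});"


if __name__ == "__main__":
    # Hypothetical schema: shop.events(event_id, payload, device, ...)
    print(insert_cql("shop", "events", ["event_id", "payload"]))
    print(add_column_cql("shop", "events", "device", "text"))
```

Listing only the columns a row actually uses keeps the insert independent of how many other columns the table has accumulated over time.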

How is data read in Cassandra?

Summing Up



