
{"id":4862,"date":"2023-07-16T22:21:37","date_gmt":"2023-07-17T02:21:37","guid":{"rendered":"https:\/\/ikriv.com\/blog\/?p=4862"},"modified":"2023-07-16T22:21:37","modified_gmt":"2023-07-17T02:21:37","slug":"cassandra-database","status":"publish","type":"post","link":"https:\/\/ikriv.com\/blog\/?p=4862","title":{"rendered":"Cassandra Database"},"content":{"rendered":"<p><img decoding=\"async\" style=\"float: right;\" src=\"https:\/\/ikriv.com\/blog\/wp-content\/uploads\/2023\/07\/cassandra.png\" alt=\"Cassandra\" width=\"160\" \/><\/p>\n<p>Last week I had an opportunity to experiment with Cassandra database in a semi-real environment.<\/p>\n<p>A few notes:<\/p>\n<ul>\n<li>Cassandra goes out of its way to be friendly to &#8220;traditional&#8221; SQL developers.\n<ul>\n<li>It&#8217;s query language and DDL are as close to SQL as humanely possible.<\/li>\n<li>It allows composite partition and clustering keys.<\/li>\n<li>It allows &#8220;bad&#8221; queries with filtering by &#8220;wrong&#8221; columns if you ask nicely.<\/li>\n<\/ul>\n<\/li>\n<li>There are two ways to ask nicely:\n<ul>\n<li>Add <span style=\"text-decoration: line-through;\">pretty please<\/span> ALLOW FILTERING keywords to the query.<\/li>\n<li>Create a secondary index.<\/li>\n<\/ul>\n<\/li>\n<li>Don&#8217;t believe people who say to never use ALLOW FILTERING in production. Sometimes &#8220;bad&#8221; query with ALLOW FILTERING works faster than &#8220;good&#8221; query with secondary index: see <a href=\"https:\/\/stackoverflow.com\/questions\/76675960\/cassandra-allow-filtering-works-twice-as-fast-as-index-on-a-mapped-column\">this StackOverflow question<\/a>.<\/li>\n<\/ul>\n<pre><code>CREATE TABLE IF NOT EXISTS test.data (\r\n  channel text,\r\n  subdomain text,\r\n  data_center text,\r\n  day date, \/\/ we need this to limit the partition size\r\n  message_id timeuuid,\r\n  tags map&lt;text,text&gt;,\r\n  sender_id text,\r\n  sequence_number bigint,\r\n  sent_at timestamp,\r\n  received_at timestamp,\r\n  header blob,\r\n  payload blob,\r\n  \/\/ the stuff in the inner parens is the partition key; the rest is the clustering (sorting) key\r\n  PRIMARY KEY ((channel, subdomain, data_center, day), sent_at, message_id)\r\n)\r\nWITH CLUSTERING ORDER BY (sent_at ASC, message_id ASC);<\/code><\/pre>\n<p>Contrast it with Amazon DynamoDB that insists that both partition key and clustering key be strings with all the parts squished together as in <code>InvoiceNumber#121212#0<\/code>.<\/p>\n<p>A few more things to note:<\/p>\n<ul>\n<li>I was able to insert ~90,000 records per second using multiple processes, with the client and the server on the same machine, and each record being slightly over 1KB.<\/li>\n<li>When creating a new index, Cassandra will return right away, but will not allow to query the table for a while, presumably until the index has been built.<\/li>\n<li>Some queries that go cross partition are allowed, e.g.\n<ul>\n<li>SELECT * FROM table [LIMIT N] or<\/li>\n<li>SELECT COUNT(*) from table.<\/li>\n<\/ul>\n<\/li>\n<li>The latter tends to time out when the table reaches a few million records.<\/li>\n<\/ul>\n<p>Generally, inability to find the size of the table is annoying. There is a tool called nodetool that would return statistics, but the number of records is not part of that statistics unfortunately.<\/p>\n<p>I operated with a single-node setup, and did not have an opportunity to explore replication, quorum and the like, this would be the subject of another excercise I suppose.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Last week I had an opportunity to experiment with Cassandra database in a semi-real environment. A few notes: Cassandra goes out of its way to be friendly to &#8220;traditional&#8221; SQL <a href=\"https:\/\/ikriv.com\/blog\/?p=4862\" class=\"more-link\">[&hellip;]<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"Layout":"","footnotes":""},"categories":[1],"tags":[],"class_list":["entry","author-ikriv","post-4862","post","type-post","status-publish","format-standard","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/ikriv.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/4862","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ikriv.com\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ikriv.com\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ikriv.com\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/ikriv.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=4862"}],"version-history":[{"count":1,"href":"https:\/\/ikriv.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/4862\/revisions"}],"predecessor-version":[{"id":4864,"href":"https:\/\/ikriv.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/4862\/revisions\/4864"}],"wp:attachment":[{"href":"https:\/\/ikriv.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=4862"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ikriv.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=4862"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ikriv.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=4862"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}