
Posted to pr@cassandra.apache.org by GitBox <gi...@apache.org> on 2022/06/15 16:26:25 UTC

[GitHub] [cassandra-website] dtopdontstop opened a new pull request, #141: CASSANDRA-17692 June 2022 blog "Apache Cassandra 4.1: New SSTable Identifiers"

dtopdontstop opened a new pull request, #141:
URL: https://github.com/apache/cassandra-website/pull/141

   patch by Jacek Lewandowski, Chris Thornett, Diogenese Topper; reviewed by -- for CASSANDRA-17692
   
   Co-authored by: Jacek Lewandowski <le...@gmail.com>
   Co-authored by: Chris Thornett <ch...@constantia.io>
   Co-authored by: Diogenese Topper <di...@constantia.io>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org

[GitHub] [cassandra-website] ErickRamirezAU merged pull request #141: CASSANDRA-17692 June 2022 blog "Apache Cassandra 4.1: New SSTable Identifiers"

Posted by GitBox <gi...@apache.org>.

ErickRamirezAU merged PR #141:
URL: https://github.com/apache/cassandra-website/pull/141



[GitHub] [cassandra-website] ErickRamirezAU commented on a diff in pull request #141: CASSANDRA-17692 June 2022 blog "Apache Cassandra 4.1: New SSTable Identifiers"

Posted by GitBox <gi...@apache.org>.

ErickRamirezAU commented on code in PR #141:
URL: https://github.com/apache/cassandra-website/pull/141#discussion_r898708011


##########
site-content/source/modules/ROOT/pages/blog/Apache-Cassandra-4.1-New-SSTable-Identifiers.adoc:
##########
@@ -0,0 +1,138 @@
+= Apache Cassandra 4.1: New SSTable Identifiers
+:page-layout: single-post
+:page-role: blog-post
+:page-post-date: June 16, 2022
+:page-post-author: Jacek Lewandowski
+:description: SSTable Identifiers in Apache Cassandra 4.1
+:keywords: apache cassandra, 4.1, sstable
+
+:!figure-caption:
+
+.Image credit: https://unsplash.com/@qwitka[Maksym Kaharlytskyi on Unsplash^]
+image::blog/apache-cassandra-4.1-new-sstable-identifiers-unsplash-maksym-kaharlytskyi.jpg[SSTable Identifiers in Apache Cassandra 4.1]
+
+Apache Cassandra, like many other databases, stores data in files. These files are located in data directories and organized in SSTables. This post will discuss the directory layout and the naming pattern used for these files and explain the new naming pattern introduced in Apache Cassandra 4.1.
+
+=== SSTables
+
+SSTables are files where Cassandra stores data from tables. In a typical operation, an SSTable is created either as a result of flushing a http://distributeddatastore.blogspot.com/2020/03/cassandra-memtable.html[memtable to disk^] or a https://cassandra.apache.org/doc/latest/cassandra/operating/compaction/index.html[compaction process]. Each SSTable contains data from a single table, but for a single table, there are usually many SSTables.
+A single SSTable is made of multiple files, called components. These components are generally specific to the SSTable format. BigTable is the only format supported right now, and its components are the only type you will see Apache Cassandra create (at least at the time of writing). For example, a single SSTable can consist of the following files:
+
+----
+    nb-1-big-CompressionInfo.db
+    nb-1-big-Data.db
+    nb-1-big-Digest.crc32
+    nb-1-big-Filter.db
+    nb-1-big-Index.db
+    nb-1-big-Statistics.db
+    nb-1-big-Summary.db
+    nb-1-big-TOC.txt
+----
+
+You can read more about the particular SSTable components of the BigTable format in the  https://cassandra.apache.org/doc/latest/cassandra/architecture/storage_engine.html#sstables[documentation].

Review Comment:
   Fix `/doc` link:
   ```suggestion
   You can read more about the particular SSTable components of the BigTable format in the link:/doc/latest/cassandra/architecture/storage_engine.html#sstables[documentation^].
   ```




[GitHub] [cassandra-website] ErickRamirezAU commented on a diff in pull request #141: CASSANDRA-17692 June 2022 blog "Apache Cassandra 4.1: New SSTable Identifiers"

Posted by GitBox <gi...@apache.org>.

ErickRamirezAU commented on code in PR #141:
URL: https://github.com/apache/cassandra-website/pull/141#discussion_r898707523


##########
site-content/source/modules/ROOT/pages/blog/Apache-Cassandra-4.1-New-SSTable-Identifiers.adoc:
##########
@@ -0,0 +1,138 @@
+= Apache Cassandra 4.1: New SSTable Identifiers
+:page-layout: single-post
+:page-role: blog-post
+:page-post-date: June 16, 2022
+:page-post-author: Jacek Lewandowski
+:description: SSTable Identifiers in Apache Cassandra 4.1
+:keywords: apache cassandra, 4.1, sstable
+
+:!figure-caption:
+
+.Image credit: https://unsplash.com/@qwitka[Maksym Kaharlytskyi on Unsplash^]
+image::blog/apache-cassandra-4.1-new-sstable-identifiers-unsplash-maksym-kaharlytskyi.jpg[SSTable Identifiers in Apache Cassandra 4.1]
+
+Apache Cassandra, like many other databases, stores data in files. These files are located in data directories and organized in SSTables. This post will discuss the directory layout and the naming pattern used for these files and explain the new naming pattern introduced in Apache Cassandra 4.1.
+
+=== SSTables
+
+SSTables are files where Cassandra stores data from tables. In a typical operation, an SSTable is created either as a result of flushing a http://distributeddatastore.blogspot.com/2020/03/cassandra-memtable.html[memtable to disk^] or a https://cassandra.apache.org/doc/latest/cassandra/operating/compaction/index.html[compaction process]. Each SSTable contains data from a single table, but for a single table, there are usually many SSTables.

Review Comment:
   Fix `/doc` link:
   ```suggestion
   SSTables are files where Cassandra stores data from tables. In a typical operation, an SSTable is created either as a result of flushing a http://distributeddatastore.blogspot.com/2020/03/cassandra-memtable.html[memtable to disk^] or a link:/doc/latest/cassandra/operating/compaction/index.html[compaction process]. Each SSTable contains data from a single table, but for a single table, there are usually many SSTables.
   ```



##########
site-content/source/modules/ROOT/pages/blog/Apache-Cassandra-4.1-New-SSTable-Identifiers.adoc:
##########
@@ -0,0 +1,138 @@
+= Apache Cassandra 4.1: New SSTable Identifiers
+:page-layout: single-post
+:page-role: blog-post
+:page-post-date: June 16, 2022
+:page-post-author: Jacek Lewandowski
+:description: SSTable Identifiers in Apache Cassandra 4.1
+:keywords: apache cassandra, 4.1, sstable
+
+:!figure-caption:
+
+.Image credit: https://unsplash.com/@qwitka[Maksym Kaharlytskyi on Unsplash^]
+image::blog/apache-cassandra-4.1-new-sstable-identifiers-unsplash-maksym-kaharlytskyi.jpg[SSTable Identifiers in Apache Cassandra 4.1]
+
+Apache Cassandra, like many other databases, stores data in files. These files are located in data directories and organized in SSTables. This post will discuss the directory layout and the naming pattern used for these files and explain the new naming pattern introduced in Apache Cassandra 4.1.
+
+=== SSTables
+
+SSTables are files where Cassandra stores data from tables. In a typical operation, an SSTable is created either as a result of flushing a http://distributeddatastore.blogspot.com/2020/03/cassandra-memtable.html[memtable to disk^] or a https://cassandra.apache.org/doc/latest/cassandra/operating/compaction/index.html[compaction process]. Each SSTable contains data from a single table, but for a single table, there are usually many SSTables.
+A single SSTable is made of multiple files, called components. These components are generally specific to the SSTable format. BigTable is the only format supported right now, and its components are the only type you will see Apache Cassandra create (at least at the time of writing). For example, a single SSTable can consist of the following files:
+
+----

Review Comment:
   ``````suggestion
   ```
   ``````



##########
site-content/source/modules/ROOT/pages/blog/Apache-Cassandra-4.1-New-SSTable-Identifiers.adoc:
##########
@@ -0,0 +1,138 @@
+= Apache Cassandra 4.1: New SSTable Identifiers
+:page-layout: single-post
+:page-role: blog-post
+:page-post-date: June 16, 2022
+:page-post-author: Jacek Lewandowski
+:description: SSTable Identifiers in Apache Cassandra 4.1
+:keywords: apache cassandra, 4.1, sstable
+
+:!figure-caption:
+
+.Image credit: https://unsplash.com/@qwitka[Maksym Kaharlytskyi on Unsplash^]
+image::blog/apache-cassandra-4.1-new-sstable-identifiers-unsplash-maksym-kaharlytskyi.jpg[SSTable Identifiers in Apache Cassandra 4.1]
+
+Apache Cassandra, like many other databases, stores data in files. These files are located in data directories and organized in SSTables. This post will discuss the directory layout and the naming pattern used for these files and explain the new naming pattern introduced in Apache Cassandra 4.1.
+
+=== SSTables
+
+SSTables are files where Cassandra stores data from tables. In a typical operation, an SSTable is created either as a result of flushing a http://distributeddatastore.blogspot.com/2020/03/cassandra-memtable.html[memtable to disk^] or a https://cassandra.apache.org/doc/latest/cassandra/operating/compaction/index.html[compaction process]. Each SSTable contains data from a single table, but for a single table, there are usually many SSTables.
+A single SSTable is made of multiple files, called components. These components are generally specific to the SSTable format. BigTable is the only format supported right now, and its components are the only type you will see Apache Cassandra create (at least at the time of writing). For example, a single SSTable can consist of the following files:
+
+----
+    nb-1-big-CompressionInfo.db
+    nb-1-big-Data.db
+    nb-1-big-Digest.crc32
+    nb-1-big-Filter.db
+    nb-1-big-Index.db
+    nb-1-big-Statistics.db
+    nb-1-big-Summary.db
+    nb-1-big-TOC.txt
+----
+
+You can read more about the particular SSTable components of the BigTable format in the  https://cassandra.apache.org/doc/latest/cassandra/architecture/storage_engine.html#sstables[documentation].
+
+=== Directory Layout and File Names
+
+SSTable files are stored in data directories. The directory layout consists of a directory per keyspace and a directory per table under the keyspace directory. 
+
+----
+data0/
+  /ks_foo
+    /tab_bar-<id>
+   	/<version>-<generation id>-<format>-<component>.<ext>
+
+
+data1/
+  /ks_foo
+    /tab_bar-<id>
+   	/<version>-<generation id>-<format>-<component>.<ext>
+----
+
+The table directory name has an identifier `<id>`, which is unique for that table, and the same identifier is used for each table’s data directory on each Cassandra node.
+SSTable files have a precisely defined file name pattern, enabling Cassandra to determine the SSTable format, version, and order in which SSTables were created:
+
+*<version>* - The version identifier is made up of two lowercase letters. The letters denote the major and minor format versions (in the ancient Cassandra distributions, the version was denoted by one letter).
+
+*<generation id>* - This is the identifier that distinguishes SSTables and determines their order.
+
+*<format>* - This is the SSTable format identifier. As mentioned, currently, the only existing format is BigTable, and its identifier is ‘big’.
+
+*<component>.<ext>*	- The component's name and the extension specific to that component.
+
+=== SSTable Identifiers
+
+SSTable identifiers (also known as generation identifiers) are used to distinguish and order different SSTables. Since an SSTable is created every time a table is flushed, many SSTables can exist simultaneously in the same directory. The generation identifier of a newly stored SSTable is guaranteed to be greater than any identifiers of previously-stored SSTables for a certain table on the node.
+Natural numbers are used as generation identifiers. Cassandra scans the directories on start up before any new SSTable is written, and the starting number is obtained by incrementing the largest generation identifier found across the local data directories for a certain table.
+
+*NOTE:* Cassandra includes live data directories and backup directories but ignores snapshots directories when performing its startup scan. Therefore, there may be SSTables with the same identifier among all the data directories while being different SSTables. 
+The generation identifiers based on natural numbers aim to be unique per Cassandra node and table. This means that not only may two SSTables of two different tables created on the same node have the same identifiers and, thus, the same file names, but so may two different SSTables of the same table created on different nodes.
+
+As you might expect, there can be some maintenance problems due to the identifier properties discussed above. For example, as we illustrate below, truncation of a table triggers snapshot creation and removal of all SSTables from data directories. If the node is restarted and there is no SSTable created before that, the sequence is restarted from the beginning because there is no existing SSTable for identifying the last generated identifier:
+
+----
+  /ks_foo
+    /tab_bar-<id>
+      /nb-1-big-Data.db
+----
+
+There is a snapshot made before the truncation - that is, Cassandra creates hard links to all the SSTable files in a snapshot directory, and then the files are removed from the live data directory:
+
+----
+  /ks_foo
+    /tab_bar-<id>
+      /snapshots
+        /truncated-<timestamp>-tab_bar
+          /nb-1-big-Data.db
+----

Review Comment:
   ``````suggestion
   ```
   ``````



##########
site-content/source/modules/ROOT/pages/blog/Apache-Cassandra-4.1-New-SSTable-Identifiers.adoc:
##########
@@ -0,0 +1,138 @@
+= Apache Cassandra 4.1: New SSTable Identifiers
+:page-layout: single-post
+:page-role: blog-post
+:page-post-date: June 16, 2022
+:page-post-author: Jacek Lewandowski
+:description: SSTable Identifiers in Apache Cassandra 4.1
+:keywords: apache cassandra, 4.1, sstable
+
+:!figure-caption:
+
+.Image credit: https://unsplash.com/@qwitka[Maksym Kaharlytskyi on Unsplash^]
+image::blog/apache-cassandra-4.1-new-sstable-identifiers-unsplash-maksym-kaharlytskyi.jpg[SSTable Identifiers in Apache Cassandra 4.1]
+
+Apache Cassandra, like many other databases, stores data in files. These files are located in data directories and organized in SSTables. This post will discuss the directory layout and the naming pattern used for these files and explain the new naming pattern introduced in Apache Cassandra 4.1.
+
+=== SSTables
+
+SSTables are files where Cassandra stores data from tables. In a typical operation, an SSTable is created either as a result of flushing a http://distributeddatastore.blogspot.com/2020/03/cassandra-memtable.html[memtable to disk^] or a https://cassandra.apache.org/doc/latest/cassandra/operating/compaction/index.html[compaction process]. Each SSTable contains data from a single table, but for a single table, there are usually many SSTables.
+A single SSTable is made of multiple files, called components. These components are generally specific to the SSTable format. BigTable is the only format supported right now, and its components are the only type you will see Apache Cassandra create (at least at the time of writing). For example, a single SSTable can consist of the following files:
+
+----
+    nb-1-big-CompressionInfo.db
+    nb-1-big-Data.db
+    nb-1-big-Digest.crc32
+    nb-1-big-Filter.db
+    nb-1-big-Index.db
+    nb-1-big-Statistics.db
+    nb-1-big-Summary.db
+    nb-1-big-TOC.txt
+----
+
+You can read more about the particular SSTable components of the BigTable format in the  https://cassandra.apache.org/doc/latest/cassandra/architecture/storage_engine.html#sstables[documentation].

Review Comment:
   Fix `/doc` link:
   ```suggestion
   You can read more about the particular SSTable components of the BigTable format in the  link:/doc/latest/cassandra/architecture/storage_engine.html#sstables[documentation].
   ```



##########
site-content/source/modules/ROOT/pages/blog/Apache-Cassandra-4.1-New-SSTable-Identifiers.adoc:
##########
@@ -0,0 +1,138 @@
+= Apache Cassandra 4.1: New SSTable Identifiers
+:page-layout: single-post
+:page-role: blog-post
+:page-post-date: June 16, 2022
+:page-post-author: Jacek Lewandowski
+:description: SSTable Identifiers in Apache Cassandra 4.1
+:keywords: apache cassandra, 4.1, sstable
+
+:!figure-caption:
+
+.Image credit: https://unsplash.com/@qwitka[Maksym Kaharlytskyi on Unsplash^]
+image::blog/apache-cassandra-4.1-new-sstable-identifiers-unsplash-maksym-kaharlytskyi.jpg[SSTable Identifiers in Apache Cassandra 4.1]
+
+Apache Cassandra, like many other databases, stores data in files. These files are located in data directories and organized in SSTables. This post will discuss the directory layout and the naming pattern used for these files and explain the new naming pattern introduced in Apache Cassandra 4.1.
+
+=== SSTables
+
+SSTables are files where Cassandra stores data from tables. In a typical operation, an SSTable is created either as a result of flushing a http://distributeddatastore.blogspot.com/2020/03/cassandra-memtable.html[memtable to disk^] or a https://cassandra.apache.org/doc/latest/cassandra/operating/compaction/index.html[compaction process]. Each SSTable contains data from a single table, but for a single table, there are usually many SSTables.
+A single SSTable is made of multiple files, called components. These components are generally specific to the SSTable format. BigTable is the only format supported right now, and its components are the only type you will see Apache Cassandra create (at least at the time of writing). For example, a single SSTable can consist of the following files:
+
+----
+    nb-1-big-CompressionInfo.db
+    nb-1-big-Data.db
+    nb-1-big-Digest.crc32
+    nb-1-big-Filter.db
+    nb-1-big-Index.db
+    nb-1-big-Statistics.db
+    nb-1-big-Summary.db
+    nb-1-big-TOC.txt
+----
+
+You can read more about the particular SSTable components of the BigTable format in the  https://cassandra.apache.org/doc/latest/cassandra/architecture/storage_engine.html#sstables[documentation].
+
+=== Directory Layout and File Names
+
+SSTable files are stored in data directories. The directory layout consists of a directory per keyspace and a directory per table under the keyspace directory. 
+
+----

Review Comment:
   ``````suggestion
   ```
   ``````



##########
site-content/source/modules/ROOT/pages/blog/Apache-Cassandra-4.1-New-SSTable-Identifiers.adoc:
##########
@@ -0,0 +1,138 @@
+= Apache Cassandra 4.1: New SSTable Identifiers
+:page-layout: single-post
+:page-role: blog-post
+:page-post-date: June 16, 2022
+:page-post-author: Jacek Lewandowski
+:description: SSTable Identifiers in Apache Cassandra 4.1
+:keywords: apache cassandra, 4.1, sstable
+
+:!figure-caption:
+
+.Image credit: https://unsplash.com/@qwitka[Maksym Kaharlytskyi on Unsplash^]
+image::blog/apache-cassandra-4.1-new-sstable-identifiers-unsplash-maksym-kaharlytskyi.jpg[SSTable Identifiers in Apache Cassandra 4.1]
+
+Apache Cassandra, like many other databases, stores data in files. These files are located in data directories and organized in SSTables. This post will discuss the directory layout and the naming pattern used for these files and explain the new naming pattern introduced in Apache Cassandra 4.1.
+
+=== SSTables
+
+SSTables are files where Cassandra stores data from tables. In a typical operation, an SSTable is created either as a result of flushing a http://distributeddatastore.blogspot.com/2020/03/cassandra-memtable.html[memtable to disk^] or a https://cassandra.apache.org/doc/latest/cassandra/operating/compaction/index.html[compaction process]. Each SSTable contains data from a single table, but for a single table, there are usually many SSTables.
+A single SSTable is made of multiple files, called components. These components are generally specific to the SSTable format. BigTable is the only format supported right now, and its components are the only type you will see Apache Cassandra create (at least at the time of writing). For example, a single SSTable can consist of the following files:
+
+----
+    nb-1-big-CompressionInfo.db
+    nb-1-big-Data.db
+    nb-1-big-Digest.crc32
+    nb-1-big-Filter.db
+    nb-1-big-Index.db
+    nb-1-big-Statistics.db
+    nb-1-big-Summary.db
+    nb-1-big-TOC.txt
+----

Review Comment:
   ``````suggestion
   ```
   ``````



##########
site-content/source/modules/ROOT/pages/blog/Apache-Cassandra-4.1-New-SSTable-Identifiers.adoc:
##########
@@ -0,0 +1,138 @@
+= Apache Cassandra 4.1: New SSTable Identifiers
+:page-layout: single-post
+:page-role: blog-post
+:page-post-date: June 16, 2022
+:page-post-author: Jacek Lewandowski
+:description: SSTable Identifiers in Apache Cassandra 4.1
+:keywords: apache cassandra, 4.1, sstable
+
+:!figure-caption:
+
+.Image credit: https://unsplash.com/@qwitka[Maksym Kaharlytskyi on Unsplash^]
+image::blog/apache-cassandra-4.1-new-sstable-identifiers-unsplash-maksym-kaharlytskyi.jpg[SSTable Identifiers in Apache Cassandra 4.1]
+
+Apache Cassandra, like many other databases, stores data in files. These files are located in data directories and organized in SSTables. This post will discuss the directory layout and the naming pattern used for these files and explain the new naming pattern introduced in Apache Cassandra 4.1.
+
+=== SSTables
+
+SSTables are files where Cassandra stores data from tables. In a typical operation, an SSTable is created either as a result of flushing a http://distributeddatastore.blogspot.com/2020/03/cassandra-memtable.html[memtable to disk^] or a https://cassandra.apache.org/doc/latest/cassandra/operating/compaction/index.html[compaction process]. Each SSTable contains data from a single table, but for a single table, there are usually many SSTables.
+A single SSTable is made of multiple files, called components. These components are generally specific to the SSTable format. BigTable is the only format supported right now, and its components are the only type you will see Apache Cassandra create (at least at the time of writing). For example, a single SSTable can consist of the following files:
+
+----
+    nb-1-big-CompressionInfo.db
+    nb-1-big-Data.db
+    nb-1-big-Digest.crc32
+    nb-1-big-Filter.db
+    nb-1-big-Index.db
+    nb-1-big-Statistics.db
+    nb-1-big-Summary.db
+    nb-1-big-TOC.txt
+----
+
+You can read more about the particular SSTable components of the BigTable format in the  https://cassandra.apache.org/doc/latest/cassandra/architecture/storage_engine.html#sstables[documentation].
+
+=== Directory Layout and File Names
+
+SSTable files are stored in data directories. The directory layout consists of a directory per keyspace and a directory per table under the keyspace directory. 
+
+----
+data0/
+  /ks_foo
+    /tab_bar-<id>
+   	/<version>-<generation id>-<format>-<component>.<ext>
+
+
+data1/
+  /ks_foo
+    /tab_bar-<id>
+   	/<version>-<generation id>-<format>-<component>.<ext>
+----
+
+The table directory name has an identifier `<id>`, which is unique for that table, and the same identifier is used for each table’s data directory on each Cassandra node.
+SSTable files have a precisely defined file name pattern, enabling Cassandra to determine the SSTable format, version, and order in which SSTables were created:
+
+*<version>* - The version identifier is made up of two lowercase letters. The letters denote the major and minor format versions (in the ancient Cassandra distributions, the version was denoted by one letter).
+
+*<generation id>* - This is the identifier that distinguishes SSTables and determines their order.
+
+*<format>* - This is the SSTable format identifier. As mentioned, currently, the only existing format is BigTable, and its identifier is ‘big’.
+
+*<component>.<ext>*	- The component's name and the extension specific to that component.
+
+=== SSTable Identifiers
+
+SSTable identifiers (also known as generation identifiers) are used to distinguish and order different SSTables. Since an SSTable is created every time a table is flushed, many SSTables can exist simultaneously in the same directory. The generation identifier of a newly stored SSTable is guaranteed to be greater than any identifiers of previously-stored SSTables for a certain table on the node.
+Natural numbers are used as generation identifiers. Cassandra scans the directories on start up before any new SSTable is written, and the starting number is obtained by incrementing the largest generation identifier found across the local data directories for a certain table.
+
+*NOTE:* Cassandra includes live data directories and backup directories but ignores snapshots directories when performing its startup scan. Therefore, there may be SSTables with the same identifier among all the data directories while being different SSTables. 
+The generation identifiers based on natural numbers aim to be unique per Cassandra node and table. This means that not only may two SSTables of two different tables created on the same node have the same identifiers and, thus, the same file names, but so may two different SSTables of the same table created on different nodes.
+
+As you might expect, there can be some maintenance problems due to the identifier properties discussed above. For example, as we illustrate below, truncation of a table triggers snapshot creation and removal of all SSTables from data directories. If the node is restarted and there is no SSTable created before that, the sequence is restarted from the beginning because there is no existing SSTable for identifying the last generated identifier:
+
+----

Review Comment:
   ``````suggestion
   ```
   ``````
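
For readers following the thread, the startup scan described in the quoted text can be illustrated with a minimal Python sketch. This is not Cassandra's actual implementation: the function name `next_generation`, the regular expression, and the assumption that numbering starts at 1 when no SSTable is found are hypothetical, and only the numeric identifiers discussed here are handled.

```python
import re
from typing import Iterable

# Hypothetical pattern for <version>-<generation id>-<format>-<component>.<ext>,
# capturing only the numeric generation identifier.
GENERATION = re.compile(r"^[a-z]{2}-(\d+)-[a-z]+-[^.]+\.\w+$")


def next_generation(file_names: Iterable[str]) -> int:
    """Return the largest generation id found in the scanned names, plus one.

    Names that do not match the pattern are ignored. Starting from 1 when
    nothing is found is an assumption made for this illustration.
    """
    largest = 0
    for name in file_names:
        match = GENERATION.match(name)
        if match:
            largest = max(largest, int(match.group(1)))
    return largest + 1


# Live data directory holding SSTables 1, 3 and 7.
live = ["nb-1-big-Data.db", "nb-7-big-Data.db", "nb-3-big-TOC.txt"]
print(next_generation(live))

# After truncation the live directory is empty; the snapshot is not scanned.
print(next_generation([]))
```

With an empty live directory, as after a truncation followed by a restart, the sketch hands out 1 again, which is the number already used by the SSTable hard-linked in the snapshot directory.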



##########
site-content/source/modules/ROOT/pages/blog/Apache-Cassandra-4.1-New-SSTable-Identifiers.adoc:
##########
@@ -0,0 +1,138 @@
+= Apache Cassandra 4.1: New SSTable Identifiers
+:page-layout: single-post
+:page-role: blog-post
+:page-post-date: June 16, 2022
+:page-post-author: Jacek Lewandowski
+:description: SSTable Identifiers in Apache Cassandra 4.1

Review Comment:
   ```suggestion
   :description: New SSTable Identifiers in Apache Cassandra 4.1
   ```



##########
site-content/source/modules/ROOT/pages/blog/Apache-Cassandra-4.1-New-SSTable-Identifiers.adoc:
##########
@@ -0,0 +1,138 @@
+= Apache Cassandra 4.1: New SSTable Identifiers
+:page-layout: single-post
+:page-role: blog-post
+:page-post-date: June 16, 2022
+:page-post-author: Jacek Lewandowski
+:description: SSTable Identifiers in Apache Cassandra 4.1
+:keywords: apache cassandra, 4.1, sstable
+
+:!figure-caption:
+
+.Image credit: https://unsplash.com/@qwitka[Maksym Kaharlytskyi on Unsplash^]
+image::blog/apache-cassandra-4.1-new-sstable-identifiers-unsplash-maksym-kaharlytskyi.jpg[SSTable Identifiers in Apache Cassandra 4.1]
+
+Apache Cassandra, like many other databases, stores data in files. These files are located in data directories and organized in SSTables. This post will discuss the directory layout and the naming pattern used for these files and explain the new naming pattern introduced in Apache Cassandra 4.1.
+
+=== SSTables
+
+SSTables are files where Cassandra stores data from tables. In a typical operation, an SSTable is created either as a result of flushing a http://distributeddatastore.blogspot.com/2020/03/cassandra-memtable.html[memtable to disk^] or a https://cassandra.apache.org/doc/latest/cassandra/operating/compaction/index.html[compaction process]. Each SSTable contains data from a single table, but for a single table, there are usually many SSTables.
+A single SSTable is made of multiple files, called components. These components are generally specific to the SSTable format. BigTable is the only format supported right now, and its components are the only type you will see Apache Cassandra create (at least at the time of writing). For example, a single SSTable can consist of the following files:
+
+----
+    nb-1-big-CompressionInfo.db
+    nb-1-big-Data.db
+    nb-1-big-Digest.crc32
+    nb-1-big-Filter.db
+    nb-1-big-Index.db
+    nb-1-big-Statistics.db
+    nb-1-big-Summary.db
+    nb-1-big-TOC.txt
+----
+
+You can read more about the particular SSTable components of the BigTable format in the  https://cassandra.apache.org/doc/latest/cassandra/architecture/storage_engine.html#sstables[documentation].
+
+=== Directory Layout and File Names
+
+SSTable files are stored in data directories. The directory layout consists of a directory per keyspace and a directory per table under the keyspace directory. 
+
+----
+data0/
+  /ks_foo
+    /tab_bar-<id>
+   	/<version>-<generation id>-<format>-<component>.<ext>
+
+
+data1/
+  /ks_foo
+    /tab_bar-<id>
+   	/<version>-<generation id>-<format>-<component>.<ext>
+----

Review Comment:
   ``````suggestion
   ```
   ``````
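
As a side note on the file name pattern shown in the layout above, here is a minimal Python sketch of splitting a component file name into its parts. The names `SSTableFile` and `parse_sstable_name` and the regular expression are hypothetical, chosen for illustration; the sketch assumes a two-letter version and a numeric generation identifier, so it does not cover the new identifiers introduced in 4.1.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical regex for <version>-<generation id>-<format>-<component>.<ext>.
SSTABLE_NAME = re.compile(
    r"^(?P<version>[a-z]{2})"
    r"-(?P<generation>\d+)"
    r"-(?P<format>[a-z]+)"
    r"-(?P<component>[A-Za-z0-9]+)"
    r"\.(?P<ext>[A-Za-z0-9]+)$"
)


class SSTableFile(NamedTuple):
    version: str
    generation: int
    format: str
    component: str
    ext: str


def parse_sstable_name(name: str) -> Optional[SSTableFile]:
    """Split a component file name into its parts, or return None if it does not match."""
    match = SSTABLE_NAME.match(name)
    if match is None:
        return None
    return SSTableFile(
        version=match["version"],
        generation=int(match["generation"]),
        format=match["format"],
        component=match["component"],
        ext=match["ext"],
    )


print(parse_sstable_name("nb-1-big-Data.db"))
print(parse_sstable_name("nb-1-big-Digest.crc32"))
```

Returning `None` for non-matching names keeps the helper usable when scanning a directory that also contains unrelated files.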



##########
site-content/source/modules/ROOT/pages/blog/Apache-Cassandra-4.1-New-SSTable-Identifiers.adoc:
##########
@@ -0,0 +1,138 @@
+= Apache Cassandra 4.1: New SSTable Identifiers
+:page-layout: single-post
+:page-role: blog-post
+:page-post-date: June 16, 2022
+:page-post-author: Jacek Lewandowski
+:description: SSTable Identifiers in Apache Cassandra 4.1
+:keywords: apache cassandra, 4.1, sstable
+
+:!figure-caption:
+
+.Image credit: https://unsplash.com/@qwitka[Maksym Kaharlytskyi on Unsplash^]
+image::blog/apache-cassandra-4.1-new-sstable-identifiers-unsplash-maksym-kaharlytskyi.jpg[SSTable Identifiers in Apache Cassandra 4.1]
+
+Apache Cassandra, like many other databases, stores data in files. These files are located in data directories and organized in SSTables. This post will discuss the directory layout and the naming pattern used for these files and explain the new naming pattern introduced in Apache Cassandra 4.1.
+
+=== SSTables
+
+SSTables are files where Cassandra stores data from tables. In a typical operation, an SSTable is created either as a result of flushing a http://distributeddatastore.blogspot.com/2020/03/cassandra-memtable.html[memtable to disk^] or a https://cassandra.apache.org/doc/latest/cassandra/operating/compaction/index.html[compaction process]. Each SSTable contains data from a single table, but for a single table, there are usually many SSTables.
+A single SSTable is made of multiple files, called components. These components are generally specific to the SSTable format. BigTable is the only format supported right now, and its components are the only type you will see Apache Cassandra create (at least at the time of writing). For example, a single SSTable can consist of the following files:
+
+----
+    nb-1-big-CompressionInfo.db
+    nb-1-big-Data.db
+    nb-1-big-Digest.crc32
+    nb-1-big-Filter.db
+    nb-1-big-Index.db
+    nb-1-big-Statistics.db
+    nb-1-big-Summary.db
+    nb-1-big-TOC.txt
+----
+
+You can read more about the particular SSTable components of the BigTable format in the  https://cassandra.apache.org/doc/latest/cassandra/architecture/storage_engine.html#sstables[documentation].
+
+=== Directory Layout and File Names
+
+SSTable files are stored in data directories. The directory layout consists of a directory per keyspace and a directory per table under the keyspace directory. 
+
+----
+data0/
+  /ks_foo
+    /tab_bar-<id>
+   	/<version>-<generation id>-<format>-<component>.<ext>
+
+
+data1/
+  /ks_foo
+    /tab_bar-<id>
+   	/<version>-<generation id>-<format>-<component>.<ext>
+----
+
+The table directory name has an identifier `<id>`, which is unique for that table, and the same identifier is used for each table’s data directory on each Cassandra node.
+SSTable files have a precisely defined file name pattern, enabling Cassandra to determine the SSTable format, version, and order in which SSTables were created:
+
+*<version>* - The version identifier is made up of two lowercase letters. The letters denote the major and minor format versions (in the ancient Cassandra distributions, the version was denoted by one letter).
+
+*<generation id>* - This is the identifier that allows SSTables to be distinguished and the order of different SSTables.
+
+*<format>* - This is the SSTable format identifier. As mentioned, currently, the only existing format is BigTable, and its identifier is ‘big’.
+
+*<component>.<ext>*	- The component's name and the extension specific to that component.
+
+=== SSTable Identifiers
+
+SSTable identifiers (also known as generation identifiers) are used to distinguish and order different SSTables. Since an SSTable is created every time a table is flushed, many SSTables can exist simultaneously in the same directory. The generation identifier of a newly stored SSTable is guaranteed to be greater than any identifiers of previously-stored SSTables for a certain table on the node.
+Natural numbers are used as generation identifiers. Cassandra scans the directories on start up before any new SSTable is written, and the starting number is obtained by incrementing the largest generation identifier found across the local data directories for a certain table.
+
+*NOTE:* Cassandra includes live data directories and backup directories but ignores snapshots directories when performing its startup scan. Therefore, there may be SSTables with the same identifier among all the data directories while being different SSTables. 
+The general identifiers based on the natural numbers aim to be unique per Cassandra node and table. This means that not only two SSTables of two different tables created on the same node may have the same identifiers and, thus, the same file names, but two different SSTables of the same table created on different nodes.
+
+As you might expect, there can be some maintenance problems due to the identifier properties discussed above. For example, as we illustrate below, truncation of a table triggers snapshot creation and removal of all SSTables from data directories. If the node is restarted and there is no SSTable created before that, the sequence is restarted from the beginning because there is no existing SSTable for identifying the last generated identifier:
+
+----
+  /ks_foo
+    /tab_bar-<id>
+      /nb-1-big-Data.db
+----

Review Comment:
   ``````suggestion
   ```
   ``````



##########
site-content/source/modules/ROOT/pages/blog/Apache-Cassandra-4.1-New-SSTable-Identifiers.adoc:
##########
@@ -0,0 +1,138 @@
+= Apache Cassandra 4.1: New SSTable Identifiers
+:page-layout: single-post
+:page-role: blog-post
+:page-post-date: June 16, 2022
+:page-post-author: Jacek Lewandowski
+:description: SSTable Identifiers in Apache Cassandra 4.1
+:keywords: apache cassandra, 4.1, sstable
+
+:!figure-caption:
+
+.Image credit: https://unsplash.com/@qwitka[Maksym Kaharlytskyi on Unsplash^]
+image::blog/apache-cassandra-4.1-new-sstable-identifiers-unsplash-maksym-kaharlytskyi.jpg[SSTable Identifiers in Apache Cassandra 4.1]
+
+Apache Cassandra, like many other databases, stores data in files. These files are located in data directories and organized in SSTables. This post will discuss the directory layout and the naming pattern used for these files and explain the new naming pattern introduced in Apache Cassandra 4.1.
+
+=== SSTables
+
+SSTables are files where Cassandra stores data from tables. In a typical operation, an SSTable is created either as a result of flushing a http://distributeddatastore.blogspot.com/2020/03/cassandra-memtable.html[memtable to disk^] or a https://cassandra.apache.org/doc/latest/cassandra/operating/compaction/index.html[compaction process]. Each SSTable contains data from a single table, but for a single table, there are usually many SSTables.
+A single SSTable is made of multiple files, called components. These components are generally specific to the SSTable format. BigTable is the only format supported right now, and they are the only type of component you will see being created by Apache Cassandra (at least at the time of writing). For example, a single SSTable can be a set of such files:
+
+----
+    nb-1-big-CompressionInfo.db
+    nb-1-big-Data.db
+    nb-1-big-Digest.crc32
+    nb-1-big-Filter.db
+    nb-1-big-Index.db
+    nb-1-big-Statistics.db
+    nb-1-big-Summary.db
+    nb-1-big-TOC.txt
+----
+
+You can read more about the particular SSTable components of the BigTable format in the  https://cassandra.apache.org/doc/latest/cassandra/architecture/storage_engine.html#sstables[documentation].
+
+=== Directory Layout and File Names
+
+SSTable files are stored in data directories. The directory layout consists of a directory per keyspace and a directory per table under the keyspace directory. 
+
+----
+data0/
+  /ks_foo
+    /tab_bar-<id>
+   	/<version>-<generation id>-<format>-<component>.<ext>
+
+
+data1/
+  /ks_foo
+    /tab_bar-<id>
+   	/<version>-<generation id>-<format>-<component>.<ext>
+----
+
+The table directory name has an identifier `<id>`, which is unique for that table, and the same identifier is used for each table’s data directory on each Cassandra node.
+SSTable files have a precisely defined file name pattern, enabling Cassandra to determine the SSTable format, version, and order in which SSTables were created:
+
+*<version>* - The version identifier is made up of two lowercase letters. The letters denote the major and minor format versions (in the ancient Cassandra distributions, the version was denoted by one letter).
+
+*<generation id>* - This is the identifier that allows SSTables to be distinguished and the order of different SSTables.
+
+*<format>* - This is the SSTable format identifier. As mentioned, currently, the only existing format is BigTable, and its identifier is ‘big’.
+
+*<component>.<ext>*	- The component's name and the extension specific to that component.
+
+=== SSTable Identifiers
+
+SSTable identifiers (also known as generation identifiers) are used to distinguish and order different SSTables. Since an SSTable is created every time a table is flushed, many SSTables can exist simultaneously in the same directory. The generation identifier of a newly stored SSTable is guaranteed to be greater than any identifiers of previously-stored SSTables for a certain table on the node.
+Natural numbers are used as generation identifiers. Cassandra scans the directories on start up before any new SSTable is written, and the starting number is obtained by incrementing the largest generation identifier found across the local data directories for a certain table.
+
+*NOTE:* Cassandra includes live data directories and backup directories but ignores snapshots directories when performing its startup scan. Therefore, there may be SSTables with the same identifier among all the data directories while being different SSTables. 
+The general identifiers based on the natural numbers aim to be unique per Cassandra node and table. This means that not only two SSTables of two different tables created on the same node may have the same identifiers and, thus, the same file names, but two different SSTables of the same table created on different nodes.
+
+As you might expect, there can be some maintenance problems due to the identifier properties discussed above. For example, as we illustrate below, truncation of a table triggers snapshot creation and removal of all SSTables from data directories. If the node is restarted and there is no SSTable created before that, the sequence is restarted from the beginning because there is no existing SSTable for identifying the last generated identifier:
+
+----
+  /ks_foo
+    /tab_bar-<id>
+      /nb-1-big-Data.db
+----
+
+There is a snapshot made before the truncation - that is, Cassandra creates hard links to all the SSTables files in a snapshot directory then files are removed from the live data directory:
+
+----

Review Comment:
   ``````suggestion
   ```
   ``````



##########
site-content/source/modules/ROOT/pages/blog/Apache-Cassandra-4.1-New-SSTable-Identifiers.adoc:
##########
@@ -0,0 +1,138 @@
+= Apache Cassandra 4.1: New SSTable Identifiers
+:page-layout: single-post
+:page-role: blog-post
+:page-post-date: June 16, 2022
+:page-post-author: Jacek Lewandowski
+:description: SSTable Identifiers in Apache Cassandra 4.1
+:keywords: apache cassandra, 4.1, sstable
+
+:!figure-caption:
+
+.Image credit: https://unsplash.com/@qwitka[Maksym Kaharlytskyi on Unsplash^]
+image::blog/apache-cassandra-4.1-new-sstable-identifiers-unsplash-maksym-kaharlytskyi.jpg[SSTable Identifiers in Apache Cassandra 4.1]
+
+Apache Cassandra, like many other databases, stores data in files. These files are located in data directories and organized in SSTables. This post will discuss the directory layout and the naming pattern used for these files and explain the new naming pattern introduced in Apache Cassandra 4.1.
+
+=== SSTables
+
+SSTables are files where Cassandra stores data from tables. In a typical operation, an SSTable is created either as a result of flushing a http://distributeddatastore.blogspot.com/2020/03/cassandra-memtable.html[memtable to disk^] or a https://cassandra.apache.org/doc/latest/cassandra/operating/compaction/index.html[compaction process]. Each SSTable contains data from a single table, but for a single table, there are usually many SSTables.
+A single SSTable is made of multiple files, called components. These components are generally specific to the SSTable format. BigTable is the only format supported right now, and they are the only type of component you will see being created by Apache Cassandra (at least at the time of writing). For example, a single SSTable can be a set of such files:
+
+----
+    nb-1-big-CompressionInfo.db
+    nb-1-big-Data.db
+    nb-1-big-Digest.crc32
+    nb-1-big-Filter.db
+    nb-1-big-Index.db
+    nb-1-big-Statistics.db
+    nb-1-big-Summary.db
+    nb-1-big-TOC.txt
+----
+
+You can read more about the particular SSTable components of the BigTable format in the  https://cassandra.apache.org/doc/latest/cassandra/architecture/storage_engine.html#sstables[documentation].
+
+=== Directory Layout and File Names
+
+SSTable files are stored in data directories. The directory layout consists of a directory per keyspace and a directory per table under the keyspace directory. 
+
+----
+data0/
+  /ks_foo
+    /tab_bar-<id>
+   	/<version>-<generation id>-<format>-<component>.<ext>
+
+
+data1/
+  /ks_foo
+    /tab_bar-<id>
+   	/<version>-<generation id>-<format>-<component>.<ext>
+----
+
+The table directory name has an identifier `<id>`, which is unique for that table, and the same identifier is used for each table’s data directory on each Cassandra node.
+SSTable files have a precisely defined file name pattern, enabling Cassandra to determine the SSTable format, version, and order in which SSTables were created:
+
+*<version>* - The version identifier is made up of two lowercase letters. The letters denote the major and minor format versions (in the ancient Cassandra distributions, the version was denoted by one letter).
+
+*<generation id>* - This is the identifier that allows SSTables to be distinguished and the order of different SSTables.
+
+*<format>* - This is the SSTable format identifier. As mentioned, currently, the only existing format is BigTable, and its identifier is ‘big’.
+
+*<component>.<ext>*	- The component's name and the extension specific to that component.
+
+=== SSTable Identifiers
+
+SSTable identifiers (also known as generation identifiers) are used to distinguish and order different SSTables. Since an SSTable is created every time a table is flushed, many SSTables can exist simultaneously in the same directory. The generation identifier of a newly stored SSTable is guaranteed to be greater than any identifiers of previously-stored SSTables for a certain table on the node.
+Natural numbers are used as generation identifiers. Cassandra scans the directories on start up before any new SSTable is written, and the starting number is obtained by incrementing the largest generation identifier found across the local data directories for a certain table.
+
+*NOTE:* Cassandra includes live data directories and backup directories but ignores snapshots directories when performing its startup scan. Therefore, there may be SSTables with the same identifier among all the data directories while being different SSTables. 
+The general identifiers based on the natural numbers aim to be unique per Cassandra node and table. This means that not only two SSTables of two different tables created on the same node may have the same identifiers and, thus, the same file names, but two different SSTables of the same table created on different nodes.
+
+As you might expect, there can be some maintenance problems due to the identifier properties discussed above. For example, as we illustrate below, truncation of a table triggers snapshot creation and removal of all SSTables from data directories. If the node is restarted and there is no SSTable created before that, the sequence is restarted from the beginning because there is no existing SSTable for identifying the last generated identifier:
+
+----
+  /ks_foo
+    /tab_bar-<id>
+      /nb-1-big-Data.db
+----
+
+There is a snapshot made before the truncation - that is, Cassandra creates hard links to all the SSTables files in a snapshot directory then files are removed from the live data directory:
+
+----
+  /ks_foo
+    /tab_bar-<id>
+      /snapshots
+        /truncated-<timestamp>-tab_bar
+          /nb-1-big-Data.db
+----
+
+When the node gets restarted, Cassandra forgets about the current sequence and starts it over; when a new SSTable is stored, it gets ‘1’ as the identifier:
+
+----
+  /ks_foo
+    /tab_bar-<id>
+      /nb-1-big-Data.db
+      /snapshots
+        /truncated-<timestamp>-tab_bar
+          /nb-1-big-Data.db
+----

Review Comment:
   ``````suggestion
   ```
   ``````
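
The restart scenario quoted in the hunk above can be reproduced with a small simulation. This is only a sketch under stated assumptions, not Cassandra's actual startup code: the function name `next_generation`, the table directory suffix `1234`, and the timestamp are hypothetical, and the file names are assumed to follow the numeric `<version>-<generation id>-<format>-<component>.<ext>` pattern from the post.

```python
from pathlib import PurePosixPath


def next_generation(paths):
    """Return the next sequential id: the largest id found outside snapshots, plus one."""
    highest = 0
    for raw in paths:
        path = PurePosixPath(raw)
        if "snapshots" in path.parts:
            # The post says snapshot directories are ignored by the startup scan.
            continue
        fields = path.name.split("-")
        # Assumed layout: the second field is the numeric generation id.
        if len(fields) >= 4 and fields[1].isdigit():
            highest = max(highest, int(fields[1]))
    return highest + 1


# Hypothetical paths: before truncation one live SSTable exists.
before = ["ks_foo/tab_bar-1234/nb-1-big-Data.db"]
# After truncation only the hard link in the snapshot directory remains.
after = ["ks_foo/tab_bar-1234/snapshots/truncated-1655300000-tab_bar/nb-1-big-Data.db"]

print(next_generation(before))
print(next_generation(after))
```

With the live file present the scan continues the sequence; once only the snapshot copy is left, the sequence starts over and the next SSTable reuses the identifier that the snapshot already holds.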



##########
site-content/source/modules/ROOT/pages/blog/Apache-Cassandra-4.1-New-SSTable-Identifiers.adoc:
##########
@@ -0,0 +1,138 @@
+= Apache Cassandra 4.1: New SSTable Identifiers
+:page-layout: single-post
+:page-role: blog-post
+:page-post-date: June 16, 2022
+:page-post-author: Jacek Lewandowski
+:description: SSTable Identifiers in Apache Cassandra 4.1
+:keywords: apache cassandra, 4.1, sstable
+
+:!figure-caption:
+
+.Image credit: https://unsplash.com/@qwitka[Maksym Kaharlytskyi on Unsplash^]
+image::blog/apache-cassandra-4.1-new-sstable-identifiers-unsplash-maksym-kaharlytskyi.jpg[SSTable Identifiers in Apache Cassandra 4.1]
+
+Apache Cassandra, like many other databases, stores data in files. These files are located in data directories and organized in SSTables. This post will discuss the directory layout and the naming pattern used for these files and explain the new naming pattern introduced in Apache Cassandra 4.1.
+
+=== SSTables
+
+SSTables are files where Cassandra stores data from tables. In a typical operation, an SSTable is created either as a result of flushing a http://distributeddatastore.blogspot.com/2020/03/cassandra-memtable.html[memtable to disk^] or a https://cassandra.apache.org/doc/latest/cassandra/operating/compaction/index.html[compaction process]. Each SSTable contains data from a single table, but for a single table, there are usually many SSTables.
+A single SSTable is made of multiple files, called components. These components are generally specific to the SSTable format. BigTable is the only format supported right now, and they are the only type of component you will see being created by Apache Cassandra (at least at the time of writing). For example, a single SSTable can be a set of such files:
+
+----
+    nb-1-big-CompressionInfo.db
+    nb-1-big-Data.db
+    nb-1-big-Digest.crc32
+    nb-1-big-Filter.db
+    nb-1-big-Index.db
+    nb-1-big-Statistics.db
+    nb-1-big-Summary.db
+    nb-1-big-TOC.txt
+----
+
+You can read more about the particular SSTable components of the BigTable format in the  https://cassandra.apache.org/doc/latest/cassandra/architecture/storage_engine.html#sstables[documentation].
+
+=== Directory Layout and File Names
+
+SSTable files are stored in data directories. The directory layout consists of a directory per keyspace and a directory per table under the keyspace directory. 
+
+----
+data0/
+  /ks_foo
+    /tab_bar-<id>
+   	/<version>-<generation id>-<format>-<component>.<ext>
+
+
+data1/
+  /ks_foo
+    /tab_bar-<id>
+   	/<version>-<generation id>-<format>-<component>.<ext>
+----
+
+The table directory name has an identifier `<id>`, which is unique for that table, and the same identifier is used for each table’s data directory on each Cassandra node.
+SSTable files have a precisely defined file name pattern, enabling Cassandra to determine the SSTable format, version, and order in which SSTables were created:
+
+*<version>* - The version identifier is made up of two lowercase letters. The letters denote the major and minor format versions (in the ancient Cassandra distributions, the version was denoted by one letter).
+
+*<generation id>* - This is the identifier that allows SSTables to be distinguished and the order of different SSTables.
+
+*<format>* - This is the SSTable format identifier. As mentioned, currently, the only existing format is BigTable, and its identifier is ‘big’.
+
+*<component>.<ext>*	- The component's name and the extension specific to that component.
+
+=== SSTable Identifiers
+
+SSTable identifiers (also known as generation identifiers) are used to distinguish and order different SSTables. Since an SSTable is created every time a table is flushed, many SSTables can exist simultaneously in the same directory. The generation identifier of a newly stored SSTable is guaranteed to be greater than any identifiers of previously-stored SSTables for a certain table on the node.
+Natural numbers are used as generation identifiers. Cassandra scans the directories on start up before any new SSTable is written, and the starting number is obtained by incrementing the largest generation identifier found across the local data directories for a certain table.
+
+*NOTE:* Cassandra includes live data directories and backup directories but ignores snapshots directories when performing its startup scan. Therefore, there may be SSTables with the same identifier among all the data directories while being different SSTables. 
+The general identifiers based on the natural numbers aim to be unique per Cassandra node and table. This means that not only two SSTables of two different tables created on the same node may have the same identifiers and, thus, the same file names, but two different SSTables of the same table created on different nodes.
+
+As you might expect, there can be some maintenance problems due to the identifier properties discussed above. For example, as we illustrate below, truncation of a table triggers snapshot creation and removal of all SSTables from data directories. If the node is restarted and there is no SSTable created before that, the sequence is restarted from the beginning because there is no existing SSTable for identifying the last generated identifier:
+
+----
+  /ks_foo
+    /tab_bar-<id>
+      /nb-1-big-Data.db
+----
+
+There is a snapshot made before the truncation - that is, Cassandra creates hard links to all the SSTables files in a snapshot directory then files are removed from the live data directory:
+
+----
+  /ks_foo
+    /tab_bar-<id>
+      /snapshots
+        /truncated-<timestamp>-tab_bar
+          /nb-1-big-Data.db
+----
+
+When the node gets restarted, Cassandra forgets about the current sequence and starts it over; when a new SSTable is stored, it gets ‘1’ as the identifier:
+
+----

Review Comment:
   ``````suggestion
   ```
   ``````



##########
site-content/source/modules/ROOT/pages/blog/Apache-Cassandra-4.1-New-SSTable-Identifiers.adoc:
##########
@@ -0,0 +1,138 @@
+= Apache Cassandra 4.1: New SSTable Identifiers
+:page-layout: single-post
+:page-role: blog-post
+:page-post-date: June 16, 2022
+:page-post-author: Jacek Lewandowski
+:description: SSTable Identifiers in Apache Cassandra 4.1
+:keywords: apache cassandra, 4.1, sstable
+
+:!figure-caption:
+
+.Image credit: https://unsplash.com/@qwitka[Maksym Kaharlytskyi on Unsplash^]
+image::blog/apache-cassandra-4.1-new-sstable-identifiers-unsplash-maksym-kaharlytskyi.jpg[SSTable Identifiers in Apache Cassandra 4.1]
+
+Apache Cassandra, like many other databases, stores data in files. These files are located in data directories and organized in SSTables. This post will discuss the directory layout and the naming pattern used for these files and explain the new naming pattern introduced in Apache Cassandra 4.1.
+
+=== SSTables
+
+SSTables are files where Cassandra stores data from tables. In a typical operation, an SSTable is created either as a result of flushing a http://distributeddatastore.blogspot.com/2020/03/cassandra-memtable.html[memtable to disk^] or a https://cassandra.apache.org/doc/latest/cassandra/operating/compaction/index.html[compaction process]. Each SSTable contains data from a single table, but for a single table, there are usually many SSTables.
+A single SSTable is made of multiple files, called components. These components are generally specific to the SSTable format. BigTable is the only format supported right now, and they are the only type of component you will see being created by Apache Cassandra (at least at the time of writing). For example, a single SSTable can be a set of such files:
+
+----
+    nb-1-big-CompressionInfo.db
+    nb-1-big-Data.db
+    nb-1-big-Digest.crc32
+    nb-1-big-Filter.db
+    nb-1-big-Index.db
+    nb-1-big-Statistics.db
+    nb-1-big-Summary.db
+    nb-1-big-TOC.txt
+----
+
+You can read more about the particular SSTable components of the BigTable format in the  https://cassandra.apache.org/doc/latest/cassandra/architecture/storage_engine.html#sstables[documentation].
+
+=== Directory Layout and File Names
+
+SSTable files are stored in data directories. The directory layout consists of a directory per keyspace and a directory per table under the keyspace directory. 
+
+----
+data0/
+  /ks_foo
+    /tab_bar-<id>
+   	/<version>-<generation id>-<format>-<component>.<ext>
+
+
+data1/
+  /ks_foo
+    /tab_bar-<id>
+   	/<version>-<generation id>-<format>-<component>.<ext>
+----
+
+The table directory name has an identifier `<id>`, which is unique for that table, and the same identifier is used for each table’s data directory on each Cassandra node.
+SSTable files have a precisely defined file name pattern, enabling Cassandra to determine the SSTable format, version, and order in which SSTables were created:
+
+*<version>* - The version identifier is made up of two lowercase letters. The letters denote the major and minor format versions (in the ancient Cassandra distributions, the version was denoted by one letter).
+
+*<generation id>* - This is the identifier that allows SSTables to be distinguished and the order of different SSTables.
+
+*<format>* - This is the SSTable format identifier. As mentioned, currently, the only existing format is BigTable, and its identifier is ‘big’.
+
+*<component>.<ext>*	- The component's name and the extension specific to that component.
+
+=== SSTable Identifiers
+
+SSTable identifiers (also known as generation identifiers) are used to distinguish and order different SSTables. Since an SSTable is created every time a table is flushed, many SSTables can exist simultaneously in the same directory. The generation identifier of a newly stored SSTable is guaranteed to be greater than any identifiers of previously-stored SSTables for a certain table on the node.
+Natural numbers are used as generation identifiers. Cassandra scans the directories on start up before any new SSTable is written, and the starting number is obtained by incrementing the largest generation identifier found across the local data directories for a certain table.
+
+*NOTE:* Cassandra includes live data directories and backup directories but ignores snapshots directories when performing its startup scan. Therefore, there may be SSTables with the same identifier among all the data directories while being different SSTables. 
+The general identifiers based on the natural numbers aim to be unique per Cassandra node and table. This means that not only two SSTables of two different tables created on the same node may have the same identifiers and, thus, the same file names, but two different SSTables of the same table created on different nodes.
+
+As you might expect, there can be some maintenance problems due to the identifier properties discussed above. For example, as we illustrate below, truncation of a table triggers snapshot creation and removal of all SSTables from data directories. If the node is restarted and there is no SSTable created before that, the sequence is restarted from the beginning because there is no existing SSTable for identifying the last generated identifier:
+
+----
+  /ks_foo
+    /tab_bar-<id>
+      /nb-1-big-Data.db
+----
+
+There is a snapshot made before the truncation - that is, Cassandra creates hard links to all the SSTables files in a snapshot directory then files are removed from the live data directory:
+
+----
+  /ks_foo
+    /tab_bar-<id>
+      /snapshots
+        /truncated-<timestamp>-tab_bar
+          /nb-1-big-Data.db
+----
+
+When the node gets restarted, Cassandra forgets about the current sequence and starts it over; when a new SSTable is stored, it gets ‘1’ as the identifier:
+
+----
+  /ks_foo
+    /tab_bar-<id>
+      /nb-1-big-Data.db
+      /snapshots
+        /truncated-<timestamp>-tab_bar
+          /nb-1-big-Data.db
+----
+
+As you can see, it is possible to have two SSTables with the same name but with potentially different content. This situation only becomes a problem when a user stores the SSTables in a different location for a backup. The backups of the SSTables will likely clash with the existing ones due to the identical file names.
+
+=== Introducing Globally Unique Identifiers
+
+To solve some of the problems with SSTable identifiers based on natural numbers, Cassandra 4.1 introduces the ability to switch to globally unique identifiers. These new identifiers are based on Time UUIDs (UUID type 1), though their string representation is different, making them lexically ordered and providing a little aid for the administrators. 
+The structure of a globally unique identifier is as follows:
+
+`<date part>_<time part>_<nano part><random part>`

Review Comment:
   ``````suggestion
   ```
   <date part>_<time part>_<nano part><random part>
   ```
   ``````
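
The backup clash described in the hunk above (two different SSTables sharing one file name) can be detected before files are copied into a single location. A minimal sketch, assuming the backup tool flattens files by bare name; `find_name_clashes` and the example paths are hypothetical and not part of Cassandra or any backup tool.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def find_name_clashes(paths):
    """Group paths by bare file name and keep only names that occur more than once."""
    by_name = defaultdict(list)
    for raw in paths:
        by_name[PurePosixPath(raw).name].append(raw)
    return {name: found for name, found in by_name.items() if len(found) > 1}


# Hypothetical paths mirroring the post's example after truncation and restart.
candidates = [
    "ks_foo/tab_bar-1234/nb-1-big-Data.db",
    "ks_foo/tab_bar-1234/snapshots/truncated-1655300000-tab_bar/nb-1-big-Data.db",
]

for name, found in find_name_clashes(candidates).items():
    print(name, "appears in", len(found), "locations")
```

Globally unique identifiers would make such a check unnecessary, since file names would no longer repeat across restarts or nodes.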



##########
site-content/source/modules/ROOT/pages/blog/Apache-Cassandra-4.1-New-SSTable-Identifiers.adoc:
##########
@@ -0,0 +1,138 @@
+= Apache Cassandra 4.1: New SSTable Identifiers
+:page-layout: single-post
+:page-role: blog-post
+:page-post-date: June 16, 2022
+:page-post-author: Jacek Lewandowski
+:description: SSTable Identifiers in Apache Cassandra 4.1
+:keywords: apache cassandra, 4.1, sstable
+
+:!figure-caption:
+
+.Image credit: https://unsplash.com/@qwitka[Maksym Kaharlytskyi on Unsplash^]
+image::blog/apache-cassandra-4.1-new-sstable-identifiers-unsplash-maksym-kaharlytskyi.jpg[SSTable Identifiers in Apache Cassandra 4.1]
+
+Apache Cassandra, like many other databases, stores data in files. These files are located in data directories and organized in SSTables. This post will discuss the directory layout and the naming pattern used for these files and explain the new naming pattern introduced in Apache Cassandra 4.1.
+
+=== SSTables
+
+SSTables are files where Cassandra stores data from tables. In a typical operation, an SSTable is created either as a result of flushing a http://distributeddatastore.blogspot.com/2020/03/cassandra-memtable.html[memtable to disk^] or a https://cassandra.apache.org/doc/latest/cassandra/operating/compaction/index.html[compaction process]. Each SSTable contains data from a single table, but for a single table, there are usually many SSTables.
+A single SSTable is made of multiple files, called components. These components are generally specific to the SSTable format. BigTable is the only format supported right now, and they are the only type of component you will see being created by Apache Cassandra (at least at the time of writing). For example, a single SSTable can be a set of such files:
+
+----
+    nb-1-big-CompressionInfo.db
+    nb-1-big-Data.db
+    nb-1-big-Digest.crc32
+    nb-1-big-Filter.db
+    nb-1-big-Index.db
+    nb-1-big-Statistics.db
+    nb-1-big-Summary.db
+    nb-1-big-TOC.txt
+----
+
+You can read more about the particular SSTable components of the BigTable format in the  https://cassandra.apache.org/doc/latest/cassandra/architecture/storage_engine.html#sstables[documentation].

Review Comment:
   ```suggestion
   You can read more about the particular SSTable components of the BigTable format in the link:/doc/latest/cassandra/architecture/storage_engine.html#sstables[documentation^].
   ```




[GitHub] [cassandra-website] ErickRamirezAU commented on a diff in pull request #141: CASSANDRA-17692 June 2022 blog "Apache Cassandra 4.1: New SSTable Identifiers"

Posted by GitBox <gi...@apache.org>.

ErickRamirezAU commented on code in PR #141:
URL: https://github.com/apache/cassandra-website/pull/141#discussion_r898723155


##########
site-content/source/modules/ROOT/pages/blog/Apache-Cassandra-4.1-New-SSTable-Identifiers.adoc:
##########
@@ -0,0 +1,138 @@
+= Apache Cassandra 4.1: New SSTable Identifiers
+:page-layout: single-post
+:page-role: blog-post
+:page-post-date: June 16, 2022
+:page-post-author: Jacek Lewandowski
+:description: SSTable Identifiers in Apache Cassandra 4.1
+:keywords: apache cassandra, 4.1, sstable
+
+:!figure-caption:
+
+.Image credit: https://unsplash.com/@qwitka[Maksym Kaharlytskyi on Unsplash^]
+image::blog/apache-cassandra-4.1-new-sstable-identifiers-unsplash-maksym-kaharlytskyi.jpg[SSTable Identifiers in Apache Cassandra 4.1]
+
+Apache Cassandra, like many other databases, stores data in files. These files are located in data directories and organized in SSTables. This post will discuss the directory layout and the naming pattern used for these files and explain the new naming pattern introduced in Apache Cassandra 4.1.
+
+=== SSTables
+
+SSTables are files where Cassandra stores data from tables. In a typical operation, an SSTable is created either as a result of flushing a http://distributeddatastore.blogspot.com/2020/03/cassandra-memtable.html[memtable to disk^] or a https://cassandra.apache.org/doc/latest/cassandra/operating/compaction/index.html[compaction process]. Each SSTable contains data from a single table, but for a single table, there are usually many SSTables.
+A single SSTable is made of multiple files, called components. These components are generally specific to the SSTable format. BigTable is the only format supported right now, so its components are the only kind you will see Apache Cassandra create (at least at the time of writing). For example, a single SSTable can consist of the following files:
+
+----
+    nb-1-big-CompressionInfo.db
+    nb-1-big-Data.db
+    nb-1-big-Digest.crc32
+    nb-1-big-Filter.db
+    nb-1-big-Index.db
+    nb-1-big-Statistics.db
+    nb-1-big-Summary.db
+    nb-1-big-TOC.txt
+----
+
+You can read more about the particular SSTable components of the BigTable format in the  https://cassandra.apache.org/doc/latest/cassandra/architecture/storage_engine.html#sstables[documentation].
+
+=== Directory Layout and File Names
+
+SSTable files are stored in data directories. The directory layout consists of a directory per keyspace and a directory per table under the keyspace directory. 
+
+----
+data0/
+  /ks_foo
+    /tab_bar-<id>
+   	/<version>-<generation id>-<format>-<component>.<ext>
+

Review Comment:
   ```suggestion
   ```
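
For readers following the hunk above, the `<version>-<generation id>-<format>-<component>.<ext>` pattern can be illustrated with a minimal sketch. This is not code from Cassandra or from the PR: the function name `parse_sstable_filename`, the regular expression, and the choice to treat the generation id as an opaque hyphen-free token are all assumptions made for illustration.

```python
import re
from typing import Optional

# Hypothetical pattern for <version>-<generation id>-<format>-<component>.<ext>.
# Assumption: the generation id is an opaque token that contains no hyphen.
SSTABLE_FILE_RE = re.compile(
    r"^(?P<version>[a-z]+)-"
    r"(?P<generation_id>[^-]+)-"
    r"(?P<format>[a-z]+)-"
    r"(?P<component>[A-Za-z]+)\."
    r"(?P<ext>[A-Za-z0-9]+)$"
)


def parse_sstable_filename(name: str) -> Optional[dict]:
    """Split a component file name into its parts, or return None if it does not match."""
    match = SSTABLE_FILE_RE.match(name)
    return match.groupdict() if match else None


if __name__ == "__main__":
    # File names taken from the example listing in the blog post under review.
    for file_name in ("nb-1-big-Data.db", "nb-1-big-Digest.crc32", "nb-1-big-TOC.txt"):
        print(file_name, "->", parse_sstable_filename(file_name))
```

Files that share the same version, generation id, and format belong to the same SSTable, so the parsed fields could also serve as a grouping key.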


