Posted to user@cassandra.apache.org by Léo FERLIN SUTTON <lf...@mailjet.com.INVALID> on 2019/04/04 15:23:36 UTC
SStable format change in 3.0.18 ?
Hello !
I have noticed something since I upgraded to Cassandra 3.0.18.
Before the upgrade, all my SSTables were named this way:
```
mc-130817-big-CompressionInfo.db
mc-130817-big-Data.db
mc-130817-big-Digest.crc32
mc-130817-big-Filter.db
mc-130817-big-Index.db
mc-130817-big-Statistics.db
mc-130817-big-Summary.db
mc-130817-big-TOC.txt
```
Since the upgrade I have a new type of file:
```
md-20631-big-Statistics.db
md-20631-big-Filter.db
md-20631-big-TOC.txt
md-20631-big-Summary.db
md-20631-big-CompressionInfo.db
md-20631-big-Data.db
md-20631-big-Digest.crc32
md-20631-big-Index.db
```
These new files start with "md" and are mixed in with the older files starting with "mc".
Other than the name, these files seem identical to regular SSTables. I
haven't seen any information about this in the changelog
(lines mentioning "sstables"):
```
* Fix handling of collection tombstones for dropped columns from
legacy sstables (CASSANDRA-14912)
* Fix missing rows when reading 2.1 SSTables with static columns in
3.0 (CASSANDRA-14873)
* Sstable min/max metadata can cause data loss (CASSANDRA-14861)
* Dropped columns can cause reverse sstable iteration to return
prematurely (CASSANDRA-14838)
* Legacy sstables with multi block range tombstones create invalid
bound sequences (CASSANDRA-14823)
* Handle failures in parallelAllSSTableOperation
(cleanup/upgradesstables/etc) (CASSANDRA-14657)
* sstableloader should use discovered broadcast address to connect
intra-cluster (CASSANDRA-14522)
```
I am asking because I have read online that the "mc" is the SSTable file
version, and that it changes whenever a new release of Cassandra changes
anything in the way data is stored in any of the SSTable component files.
https://blog.pythian.com/so-you-have-a-broken-cassandra-sstable-file/
Does anyone have any information about this?
Regards,
Leo
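A minimal Python sketch of the naming scheme discussed in this message, assuming the layout `<version>-<generation>-<format>-<component>` seen in the listings above. The regular expression and the names `SSTableName` and `parse_sstable_name` are illustrative only and are not part of Cassandra.

```python
import re
from typing import NamedTuple

# Assumed layout, inferred from the file listings in this thread:
#   <version>-<generation>-<format>-<component>
# e.g. "md-20631-big-Data.db"
SSTABLE_NAME = re.compile(
    r"^(?P<version>[a-z]{2})-(?P<generation>\d+)-(?P<format>[a-z]+)-(?P<component>.+)$"
)


class SSTableName(NamedTuple):
    version: str      # two-letter format version, e.g. "mc" or "md"
    generation: int   # per-table sequence number, e.g. 20631
    format: str       # on-disk format name, e.g. "big"
    component: str    # component file, e.g. "Data.db" or "TOC.txt"


def parse_sstable_name(filename: str) -> SSTableName:
    """Split an SSTable component file name into its four parts."""
    match = SSTABLE_NAME.match(filename)
    if match is None:
        raise ValueError(f"not a recognised SSTable file name: {filename!r}")
    return SSTableName(
        version=match["version"],
        generation=int(match["generation"]),
        format=match["format"],
        component=match["component"],
    )


if __name__ == "__main__":
    for name in ("mc-130817-big-Data.db", "md-20631-big-Digest.crc32"):
        print(parse_sstable_name(name))
```

Only the first field differs between the two groups of files in the listings, which is consistent with a change of format version rather than a new kind of file.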
Re: SStable format change in 3.0.18 ?
Posted by Jeff Jirsa <jj...@gmail.com>.
This is CASSANDRA-14861
--
Jeff Jirsa
Re: SStable format change in 3.0.18 ?
Posted by Léo FERLIN SUTTON <lf...@mailjet.com.INVALID>.
Thank you guys !
On Thu, Apr 4, 2019 at 5:49 PM Dmitry Saprykin <sa...@gmail.com>
wrote:
> Hello,
>
> I think it was done in the following issue: Sstable min/max metadata can
> cause data loss (CASSANDRA-14861)
>
>
> https://github.com/apache/cassandra/commit/d60c78358b6f599a83f3c112bfd6ce72c1129c9f
> src/java/org/apache/cassandra/io/sstable/format/big/BigFormat.java
> <https://github.com/apache/cassandra/commit/d60c78358b6f599a83f3c112bfd6ce72c1129c9f#diff-62875acfa21fb24c7167a0a2d761780e> :
> 129
> // md (3.0.18, 3.11.4): corrected sstable min/max clustering
>
> Dmitry Saprykin
Re: SStable format change in 3.0.18 ?
Posted by Dmitry Saprykin <sa...@gmail.com>.
Hello,
I think it was done in the following issue: Sstable min/max metadata can
cause data loss (CASSANDRA-14861)
https://github.com/apache/cassandra/commit/d60c78358b6f599a83f3c112bfd6ce72c1129c9f
src/java/org/apache/cassandra/io/sstable/format/big/BigFormat.java (line 129)
<https://github.com/apache/cassandra/commit/d60c78358b6f599a83f3c112bfd6ce72c1129c9f#diff-62875acfa21fb24c7167a0a2d761780e>:
// md (3.0.18, 3.11.4): corrected sstable min/max clustering
Dmitry Saprykin
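Since the "md" version was introduced in 3.0.18 according to the commit comment quoted above, a directory written partly before and partly after the upgrade will hold both prefixes. As a rough sketch (the helper name `count_sstables_by_version` is hypothetical, and the `<version>-<generation>-<format>-<component>` layout is assumed from the listings in this thread), one way to see how many SSTables are on each version is to count the `Data.db` components per prefix:

```python
from collections import Counter
from typing import Iterable


def count_sstables_by_version(filenames: Iterable[str]) -> Counter:
    """Count SSTables per format-version prefix.

    Only the Data.db component is counted, so each SSTable
    contributes exactly once regardless of how many component
    files it has on disk.
    """
    counts: Counter = Counter()
    for name in filenames:
        # Assumed layout: <version>-<generation>-<format>-<component>
        parts = name.split("-", 3)
        if len(parts) == 4 and parts[3] == "Data.db":
            counts[parts[0]] += 1
    return counts


if __name__ == "__main__":
    listing = [
        "mc-130817-big-Data.db",
        "mc-130817-big-Index.db",
        "md-20631-big-Data.db",
        "md-20631-big-TOC.txt",
    ]
    for version, total in sorted(count_sstables_by_version(listing).items()):
        print(version, total)
```

The function takes any iterable of file names, for example the result of `os.listdir` on a table's data directory.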