Posted to user@cassandra.apache.org by Léo FERLIN SUTTON <lf...@mailjet.com.INVALID> on 2019/04/04 15:23:36 UTC

SStable format change in 3.0.18 ?

Hello!

I have noticed something since I upgraded to Cassandra 3.0.18.

Before the upgrade, all my SSTables were named this way:
```
mc-130817-big-CompressionInfo.db
mc-130817-big-Data.db
mc-130817-big-Digest.crc32
mc-130817-big-Filter.db
mc-130817-big-Index.db
mc-130817-big-Statistics.db
mc-130817-big-Summary.db
mc-130817-big-TOC.txt
```

Since the upgrade, I have a new type of file:

```
md-20631-big-Statistics.db
md-20631-big-Filter.db
md-20631-big-TOC.txt
md-20631-big-Summary.db
md-20631-big-CompressionInfo.db
md-20631-big-Data.db
md-20631-big-Digest.crc32
md-20631-big-Index.db
```

These start with `md` and are mixed in with my older files starting with `mc`.
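A minimal Python sketch for telling the two kinds of files apart, assuming the naming pattern visible in the listings above (`<version>-<generation>-<format>-<Component>`). The pattern is inferred from these listings rather than taken from Cassandra's source, and the function names are hypothetical. Component files are grouped by version and generation, so each SSTable is counted once rather than once per component file:

```python
import re
from collections import defaultdict

# Assumed pattern, inferred from the listings above:
#   <version>-<generation>-<format>-<Component>
# e.g. "md-20631-big-Data.db" -> ("md", 20631, "big", "Data.db")
SSTABLE_NAME = re.compile(r"^([a-z]{2})-(\d+)-([a-z]+)-(.+)$")


def parse_component(filename):
    """Split one SSTable component filename into its fields, or return None."""
    m = SSTABLE_NAME.match(filename)
    if m is None:
        return None
    version, generation, fmt, component = m.groups()
    return version, int(generation), fmt, component


def sstables_per_version(filenames):
    """Count distinct SSTables (one per generation) for each format version."""
    generations = defaultdict(set)
    for name in filenames:
        parsed = parse_component(name)
        if parsed is not None:
            version, generation, _fmt, _component = parsed
            generations[version].add(generation)
    return {version: len(gens) for version, gens in generations.items()}


if __name__ == "__main__":
    listing = [
        "mc-130817-big-Data.db", "mc-130817-big-Index.db",
        "md-20631-big-Data.db", "md-20631-big-TOC.txt",
    ]
    print(sstables_per_version(listing))  # {'mc': 1, 'md': 1}
```

On a node, the input would be something like `os.listdir()` of one table's data directory; names that do not match the assumed pattern are simply skipped.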

Other than the name, these files seem identical to regular SSTables. I
haven't seen any information about this in the changelog (these are the
changelog lines that mention "sstables"):
```
 * Fix handling of collection tombstones for dropped columns from legacy sstables (CASSANDRA-14912)
 * Fix missing rows when reading 2.1 SSTables with static columns in 3.0 (CASSANDRA-14873)
 * Sstable min/max metadata can cause data loss (CASSANDRA-14861)
 * Dropped columns can cause reverse sstable iteration to return prematurely (CASSANDRA-14838)
 * Legacy sstables with  multi block range tombstones create invalid bound sequences (CASSANDRA-14823)
 * Handle failures in parallelAllSSTableOperation (cleanup/upgradesstables/etc) (CASSANDRA-14657)
 * sstableloader should use discovered broadcast address to connect intra-cluster (CASSANDRA-14522)
```

I am asking because I read the following in this post:
https://blog.pythian.com/so-you-have-a-broken-cassandra-sstable-file/

"The "mc" is the SSTable file version. This changes whenever a new release
of Cassandra changes anything in the way data is stored in any of the files
listed in the table above." (The table is in that post.)

Does anyone have any information about this?

Regards,

Leo

Re: SStable format change in 3.0.18 ?

Posted by Jeff Jirsa <jj...@gmail.com>.
This is CASSANDRA-14861



-- 
Jeff Jirsa


Re: SStable format change in 3.0.18 ?

Posted by Léo FERLIN SUTTON <lf...@mailjet.com.INVALID>.
Thank you, guys!


On Thu, Apr 4, 2019 at 5:49 PM Dmitry Saprykin <sa...@gmail.com>
wrote:

> I think it was done in the following issue: Sstable min/max metadata can
> cause data loss (CASSANDRA-14861)

Re: SStable format change in 3.0.18 ?

Posted by Dmitry Saprykin <sa...@gmail.com>.
Hello,

I think it was done in the following issue: Sstable min/max metadata can
cause data loss (CASSANDRA-14861)

https://github.com/apache/cassandra/commit/d60c78358b6f599a83f3c112bfd6ce72c1129c9f
src/java/org/apache/cassandra/io/sstable/format/big/BigFormat.java, line 129
<https://github.com/apache/cassandra/commit/d60c78358b6f599a83f3c112bfd6ce72c1129c9f#diff-62875acfa21fb24c7167a0a2d761780e>:
// md (3.0.18, 3.11.4): corrected sstable min/max clustering

Dmitry Saprykin
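Following the `md (3.0.18, 3.11.4)` comment quoted above, here is a minimal Python sketch that lists the SSTables in a directory listing that are still on some other version (here, `mc`), for example to watch how many remain over time. It looks only at `-Data.db` files, so each SSTable is reported once. The `big` format name and the filename pattern are assumptions taken from the listings in this thread, and `generations_not_at` is a hypothetical helper, not part of Cassandra:

```python
import re

# Hypothetical constant: "md" mirrors the BigFormat.java comment quoted above.
CURRENT_VERSION = "md"

# Assumed pattern, inferred from the listings in this thread:
#   <version>-<generation>-big-Data.db
DATA_FILE = re.compile(r"^([a-z]{2})-(\d+)-big-Data\.db$")


def generations_not_at(filenames, version=CURRENT_VERSION):
    """Return sorted (version, generation) pairs for Data.db files on another version."""
    stale = []
    for name in filenames:
        m = DATA_FILE.match(name)
        # Only Data.db files match, so each SSTable appears at most once.
        if m and m.group(1) != version:
            stale.append((m.group(1), int(m.group(2))))
    return sorted(stale)


if __name__ == "__main__":
    listing = [
        "mc-130817-big-Data.db", "mc-130817-big-Index.db",
        "md-20631-big-Data.db", "md-20631-big-Summary.db",
    ]
    print(generations_not_at(listing))  # [('mc', 130817)]
```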