Posted to user@cassandra.apache.org by chandra Varahala <ha...@gmail.com> on 2013/07/11 20:34:08 UTC

merge sstables

Hello,

 I have around 2000 small sstables of about 5 MB each. Is there a way I
can merge them into bigger ones?

thanks
chandra

Re: merge sstables

Posted by sankalp kohli <ko...@gmail.com>.
He has around 10 GB of data, so it should not be bad. This problem only
shows up if you have a lot of data.


On Thu, Jul 11, 2013 at 2:10 PM, Robert Coli <rc...@eventbrite.com> wrote:

> On Thu, Jul 11, 2013 at 1:52 PM, sankalp kohli <ko...@gmail.com> wrote:
>
>> Scrub will keep the file size the same. You need to move all sstables to
>> L0. The way to do this is to remove the json file which holds the level
>> information.
>>
>
> This will work, but I believe it is subject to this:
>
> "./src/java/org/apache/cassandra/db/compaction/LeveledManifest.java" line
> 228 of 577
> "
>         // LevelDB gives each level a score of how much data it contains
> vs its ideal amount, and
>         // compacts the level with the highest score. But this falls apart
> spectacularly once you
>         // get behind.  Consider this set of levels:
>         // L0: 988 [ideal: 4]
>         // L1: 117 [ideal: 10]
>         // L2: 12  [ideal: 100]
>         //
>         // The problem is that L0 has a much higher score (almost 250)
> than L1 (11), so what we'll
>         // do is compact a batch of MAX_COMPACTING_L0 sstables with all
> 117 L1 sstables, and put the
>         // result (say, 120 sstables) in L1. Then we'll compact the next
> batch of MAX_COMPACTING_L0,
>         // and so forth.  So we spend most of our i/o rewriting the L1
> data with each batch.
>         //
>         // If we could just do *all* L0 a single time with L1, that would
> be ideal.  But we can't
>         // -- see the javadoc for MAX_COMPACTING_L0.
>         //
>         // LevelDB's way around this is to simply block writes if L0
> compaction falls behind.
>         // We don't have that luxury.
>         //
>         // So instead, we force compacting higher levels first.  This may
> not minimize the number
>         // of reads done as quickly in the short term, but it minimizes
> the i/o needed to compact
>         // optimially which gives us a long term win.
> "
>
> Ideal would be something like a major compaction for LCS, which would let
> the end user change the resulting SSTable sizes without forcing everything
> back to L0.
>
> =Rob
>
>

Re: merge sstables

Posted by Robert Coli <rc...@eventbrite.com>.
On Thu, Jul 11, 2013 at 1:52 PM, sankalp kohli <ko...@gmail.com> wrote:

> Scrub will keep the file size the same. You need to move all sstables to
> L0. The way to do this is to remove the json file which holds the level
> information.
>

This will work, but I believe it is subject to this:

"./src/java/org/apache/cassandra/db/compaction/LeveledManifest.java" line
228 of 577
"
        // LevelDB gives each level a score of how much data it contains vs
its ideal amount, and
        // compacts the level with the highest score. But this falls apart
spectacularly once you
        // get behind.  Consider this set of levels:
        // L0: 988 [ideal: 4]
        // L1: 117 [ideal: 10]
        // L2: 12  [ideal: 100]
        //
        // The problem is that L0 has a much higher score (almost 250) than
L1 (11), so what we'll
        // do is compact a batch of MAX_COMPACTING_L0 sstables with all 117
L1 sstables, and put the
        // result (say, 120 sstables) in L1. Then we'll compact the next
batch of MAX_COMPACTING_L0,
        // and so forth.  So we spend most of our i/o rewriting the L1 data
with each batch.
        //
        // If we could just do *all* L0 a single time with L1, that would
be ideal.  But we can't
        // -- see the javadoc for MAX_COMPACTING_L0.
        //
        // LevelDB's way around this is to simply block writes if L0
compaction falls behind.
        // We don't have that luxury.
        //
        // So instead, we force compacting higher levels first.  This may
not minimize the number
        // of reads done as quickly in the short term, but it minimizes the
i/o needed to compact
        // optimially which gives us a long term win.
"

Ideal would be something like a major compaction for LCS, which would let
the end user change the resulting SSTable sizes without forcing everything
back to L0.

=Rob

Re: merge sstables

Posted by sankalp kohli <ko...@gmail.com>.
Scrub will keep the file size the same. You need to move all sstables to L0.
The way to do this is to remove the json file which holds the level
information.
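
Concretely, that looks something like the following. This is a sketch only:
it assumes a Cassandra 1.x layout where LCS keeps its level information in a
<columnfamily>.json manifest alongside the sstables, and the keyspace and
column family names are placeholders, so check your own data directory first:

    # flush memtables and stop the node cleanly
    nodetool -h localhost drain
    # ... stop the cassandra process ...
    # move the manifest aside; without it, every sstable comes back as L0
    cd /var/lib/cassandra/data/my_keyspace
    mv my_cf.json my_cf.json.bak
    # restart cassandra and let LCS re-level everything from L0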


On Thu, Jul 11, 2013 at 11:48 AM, chandra Varahala <
hadoopandcassandra@gmail.com> wrote:

> Yes, but nodetool scrub is not working.
>
>
> thanks
> chandra
>
>
> On Thu, Jul 11, 2013 at 2:39 PM, Faraaz Sareshwala <
> fsareshwala@quantcast.com> wrote:
>
>> I assume you are using the leveled compaction strategy, because you have
>> 5 MB sstables and 5 MB is the default sstable size for leveled compaction.
>>
>> To change this default, you can run the following in the cassandra-cli:
>>
>> update column family cf_name with compaction_strategy_options =
>> {sstable_size_in_mb: 256};
>>
>> To force the current sstables to be rewritten, I think you'll need to
>> issue a
>> nodetool scrub on each node. Someone please correct me if I'm wrong on
>> this.
>>
>> Faraaz
>>
>> On Thu, Jul 11, 2013 at 11:34:08AM -0700, chandra Varahala wrote:
>> > Hello,
>> >
>> >  I have around 2000 small sstables of about 5 MB each. Is there a way I
>> > can merge them into bigger ones?
>> >
>> > thanks
>> > chandra
>>
>
>

Re: merge sstables

Posted by chandra Varahala <ha...@gmail.com>.
Yes, but nodetool scrub is not working.


thanks
chandra


On Thu, Jul 11, 2013 at 2:39 PM, Faraaz Sareshwala <
fsareshwala@quantcast.com> wrote:

> I assume you are using the leveled compaction strategy, because you have
> 5 MB sstables and 5 MB is the default sstable size for leveled compaction.
>
> To change this default, you can run the following in the cassandra-cli:
>
> update column family cf_name with compaction_strategy_options =
> {sstable_size_in_mb: 256};
>
> To force the current sstables to be rewritten, I think you'll need to
> issue a
> nodetool scrub on each node. Someone please correct me if I'm wrong on
> this.
>
> Faraaz
>
> On Thu, Jul 11, 2013 at 11:34:08AM -0700, chandra Varahala wrote:
> > Hello,
> >
> >  I have around 2000 small sstables of about 5 MB each. Is there a way I
> > can merge them into bigger ones?
> >
> > thanks
> > chandra
>

Re: merge sstables

Posted by Faraaz Sareshwala <fs...@quantcast.com>.
I assume you are using the leveled compaction strategy, because you have
5 MB sstables and 5 MB is the default sstable size for leveled compaction.

To change this default, you can run the following in the cassandra-cli:

update column family cf_name with compaction_strategy_options = {sstable_size_in_mb: 256};
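
If you are on CQL rather than cassandra-cli, the equivalent (assuming
CQL3-style syntax; the keyspace and table names are placeholders) would be
something like:

    ALTER TABLE my_keyspace.cf_name
      WITH compaction = {'class': 'LeveledCompactionStrategy',
                         'sstable_size_in_mb': 256};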

To force the current sstables to be rewritten, I think you'll need to issue a
nodetool scrub on each node. Someone please correct me if I'm wrong on this.
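
For example (keyspace and column family names are placeholders):

    nodetool -h localhost scrub my_keyspace cf_name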

Faraaz

On Thu, Jul 11, 2013 at 11:34:08AM -0700, chandra Varahala wrote:
> Hello,
>
>  I have around 2000 small sstables of about 5 MB each. Is there a way I can
> merge them into bigger ones?
> 
> thanks
> chandra