Posted to user@cassandra.apache.org by Franc Carter <fr...@sirca.org.au> on 2013/06/07 06:44:06 UTC

Large number of files for Leveled Compaction

Hi,

We are trialling Cassandra-1.2(.4) with Leveled compaction as it looks like
it may be a win for us.

The first step of testing was to push a fairly large slab of data into the
Column Family - we did this much faster (more than 100x) than we would in a
production environment. This has left the Column Family with about 140,000
files in the Column Family directory, which seems way too high. On two of
the nodes CompactionStats shows 2 outstanding tasks, and on a third node
there are over 13,000 outstanding tasks. However, from looking at the log
activity, it looks like compaction has finished on all nodes.

Is this number of files expected/normal?

cheers

-- 

*Franc Carter* | Systems architect | Sirca Ltd
 <ma...@sirca.org.au>

franc.carter@sirca.org.au | www.sirca.org.au

Tel: +61 2 8355 2514

Level 4, 55 Harrington St, The Rocks NSW 2000

PO Box H58, Australia Square, Sydney NSW 1215

Re: Large number of files for Leveled Compaction

Posted by Franc Carter <fr...@sirca.org.au>.
On Mon, Jun 17, 2013 at 2:59 PM, Manoj Mainali <ma...@gmail.com> wrote:

> Not in the case of LeveledCompaction. Only SizeTieredCompaction merges
> smaller sstables into larger ones. With LeveledCompaction, the sstables
> are always of a fixed size, but they are grouped into different levels.
>
> You can refer to this page
> http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra for
> details of how LeveledCompaction works.
>
>
Yes, but it seems I've misinterpreted that page ;-(

I took this paragraph

> In figure 3, new sstables are added to the first level, L0, and immediately
> compacted with the sstables in L1 (blue). When L1 fills up, extra sstables
> are promoted to L2 (violet). Subsequent sstables generated in L1 will be
> compacted with the sstables in L2 with which they overlap. As more data is
> added, leveled compaction results in a situation like the one shown in
> figure 4.
>

to mean that once a level fills up, it gets compacted into a higher level.

cheers



Re: Large number of files for Leveled Compaction

Posted by Manoj Mainali <ma...@gmail.com>.
Not in the case of LeveledCompaction. Only SizeTieredCompaction merges
smaller sstables into larger ones. With LeveledCompaction, the sstables
are always of a fixed size, but they are grouped into different levels.

You can refer to this page
http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra for
details of how LeveledCompaction works.
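
To make the "fixed size, grouped into levels" point concrete, here is a
rough sketch of how a given number of sstables would spread across levels.
It assumes the idealized picture from the blog post linked above, where each
level holds ten times as many sstables as the previous one (L1 holds 10, L2
holds 100, and so on), and it ignores L0 and any in-progress compaction. The
function name and the fanout parameter are made up for illustration; this is
not how Cassandra itself computes anything.

```python
def spread_across_levels(total_sstables, fanout=10):
    """Fill L1, L2, ... in order, assuming level N holds fanout**N sstables.

    Idealized model only: every sstable has the same fixed size, so a higher
    level holds more files rather than bigger files.
    """
    levels = {}
    remaining = total_sstables
    level = 1
    while remaining > 0:
        capacity = fanout ** level
        levels[level] = min(capacity, remaining)
        remaining -= levels[level]
        level += 1
    return levels


if __name__ == "__main__":
    # 38,912 is roughly 190GB divided into 5MB sstables.
    for level, count in spread_across_levels(38912).items():
        print(f"L{level}: {count} sstables")
```

Under these assumptions the higher levels simply contain more 5MB sstables,
which is consistent with never seeing a larger data file on disk.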

Cheers
Manoj



Re: Large number of files for Leveled Compaction

Posted by Franc Carter <fr...@sirca.org.au>.
On Mon, Jun 17, 2013 at 2:47 PM, Manoj Mainali <ma...@gmail.com> wrote:

> With LeveledCompaction, each sstable has a fixed size, defined by
> sstable_size_in_mb in the compaction configuration of the CF definition;
> the default value is 5MB. In your case, you may not have defined your own
> value, which is why each of your sstables is 5MB. And if your dataset is
> huge, you will see a lot of sstables.
>


Ok, seems like I do have (at least) an incomplete understanding. I realise
that the minimum size is 5MB, but I thought compaction would merge these
into a smaller number of larger sstables?

thanks



Re: Large number of files for Leveled Compaction

Posted by Manoj Mainali <ma...@gmail.com>.
With LeveledCompaction, each sstable has a fixed size, defined by
sstable_size_in_mb in the compaction configuration of the CF definition; the
default value is 5MB. In your case, you may not have defined your own value,
which is why each of your sstables is 5MB. And if your dataset is huge, you
will see a lot of sstables.
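
As a rough sanity check, the sstable count can be estimated by dividing the
data size by sstable_size_in_mb. The sketch below is only arithmetic. The
function name is made up for illustration, and it assumes the 190GB figure
mentioned elsewhere in this thread is the amount of data being divided up
(if that is a cluster-wide number, the per-node count would be lower). Each
sstable is also stored as several component files, so the number of files in
the directory will be a multiple of the sstable count.

```python
import math


def estimate_sstable_count(data_size_gb, sstable_size_in_mb=5):
    """Estimate how many fixed-size sstables are needed to hold the data."""
    data_size_mb = data_size_gb * 1024
    return math.ceil(data_size_mb / sstable_size_in_mb)


if __name__ == "__main__":
    # Compare the 5MB default against two larger, purely illustrative values.
    for size_mb in (5, 50, 160):
        count = estimate_sstable_count(190, size_mb)
        print(f"sstable_size_in_mb={size_mb}: about {count} sstables")
```

With the 5MB default this comes to tens of thousands of sstables, so a very
large file count is not surprising by itself.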

Cheers

Manoj



Re: Large number of files for Leveled Compaction

Posted by Franc Carter <fr...@sirca.org.au>.
On Fri, Jun 7, 2013 at 2:44 PM, Franc Carter <fr...@sirca.org.au> wrote:

>
> Hi,
>
> We are trialling Cassandra-1.2(.4) with Leveled compaction as it looks
> like it may be a win for us.
>
> The first step of testing was to push a fairly large slab of data into the
> Column Family - we did this much faster (> x100) than we would in a
> production environment. This has left the Column Family with about 140,000
> files in the Column Family directory which seems way too high. On two of
> the nodes the CompactionStats show 2 outstanding tasks and on a third node
> there are over 13,000 outstanding tasks. However from looking at the log
> activity it looks like compaction has finished on all nodes.
>
> Is this number of files expected/normal ?
>

An addendum to this.

None of the *Data.db files are bigger than 5MB (including on the nodes that
have finished compaction). I'm wondering if I have misunderstood Leveled
Compaction; I thought that there should be data files of 50MB and 500MB
(the dataset is 190GB).
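
For anyone wanting to repeat this check, a minimal sketch of one way to do
it is below. It just globs for *Data.db files in a directory and reports how
many there are, the largest size, and any files above a limit. The function
name and the 5MB default limit are assumptions for illustration, and the
directory to scan has to be supplied on the command line.

```python
import sys
from pathlib import Path


def data_file_sizes(cf_dir, limit_mb=5.0):
    """Return (count, largest size in MB, names above limit_mb) for *Data.db files."""
    sizes = {
        p.name: p.stat().st_size / (1024 * 1024)
        for p in Path(cf_dir).glob("*Data.db")
    }
    largest = max(sizes.values(), default=0.0)
    oversized = sorted(name for name, mb in sizes.items() if mb > limit_mb)
    return len(sizes), largest, oversized


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_sizes.py <column family directory>
    count, largest, oversized = data_file_sizes(sys.argv[1])
    print(f"{count} data files, largest {largest:.1f}MB, {len(oversized)} over limit")
```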

cheers

