Posted to user@cassandra.apache.org by Yang <te...@gmail.com> on 2011/09/23 03:45:21 UTC

is it possible for light-traffic CF to hold down many commit logs?

In 1.0.0 we no longer have memtable_throughput for each individual CF.
Instead, the memtable/CF to flush is the one with the largest
getTotalMemtableLiveSize() (MeteredFlusher.java, line 81).


What would happen in the following case? I have only 2 CFs, and the
traffic for one CF is 1000 times that of the second. The high-traffic
CF constantly triggers the total memory threshold, and every time it is
the busy CF that gets flushed.

But the light-traffic CF is never flushed (well, not until we have
flushed the busy CF about 1000 times). Now we are left with many commit
logs, each of them containing a few entries for the light-traffic CF.
We have to keep these commit logs because those entries have not been
flushed to an sstable yet.

Then there are 2 problems:
1) to persist the few records from the light-traffic CF, you have to
   keep 1000 times the commit logs necessary, taking up disk space;
2) when you do a recovery on server restart, you have to read through
   all those commit logs.
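
As a rough illustration of the scenario above, here is a toy model in
Python. It is not Cassandra's code: the names (Segment, simulate), the
unit-sized writes, and the segment and threshold sizes are all made up
for the sketch. It only assumes what is described above: the largest
memtable is the one flushed, and a commit log segment can be deleted
only once no CF has unflushed writes in it.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """One commit log segment (hypothetical model, not Cassandra's class)."""
    dirty: set = field(default_factory=set)  # CFs with unflushed writes here
    used: int = 0                            # number of unit-sized writes stored


def simulate(total_writes, ratio=1000, segment_size=500, mem_threshold=200):
    """Return (segments still on disk, memtable sizes) after total_writes.

    One write in `ratio` goes to the light CF, the rest to the busy CF.
    When the combined memtable size reaches mem_threshold, the largest
    memtable is flushed, mirroring the "largest live size" rule.
    """
    memtable = {"busy": 0, "light": 0}
    segments = [Segment()]
    for i in range(total_writes):
        cf = "light" if i % ratio == 0 else "busy"
        if segments[-1].used >= segment_size:
            segments.append(Segment())       # roll over to a new segment
        active = segments[-1]
        active.used += 1
        active.dirty.add(cf)
        memtable[cf] += 1

        if sum(memtable.values()) >= mem_threshold:
            victim = max(memtable, key=memtable.get)
            memtable[victim] = 0
            for seg in segments:
                seg.dirty.discard(victim)    # victim's writes are now in an sstable
            # Delete segments nobody needs any more; always keep the active one.
            segments = [s for s in segments[:-1] if s.dirty] + [segments[-1]]
    return len(segments), memtable


if __name__ == "__main__":
    retained, memtables = simulate(10_000)
    print(retained, memtables)
```

With these made-up numbers, 10,000 writes fill 20 segments and the model
keeps 11 of them: the ten that each hold a single light-CF write, plus
the active one. The light CF's memtable holds only 10 writes at that
point.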

Does the above hypothesis sound right?

Thanks
Yang

Re: is it possible for light-traffic CF to hold down many commit logs?

Posted by Yang <te...@gmail.com>.
Thanks Sylvain, this is exactly what I need.




On Fri, Sep 23, 2011 at 12:10 AM, Sylvain Lebresne <sy...@datastax.com> wrote:
> In 1.0.0, you have this in cassandra.yaml:
>
> # Total space to use for commitlogs.
> # If space gets above this value (it will round up to the next nearest
> # segment multiple), Cassandra will flush every dirty CF in the oldest
> # segment and remove it.
> # commitlog_total_space_in_mb: 4096
>
> In 0.8, you're supposed to set the memtableFlushAfterMins property
> on each CF to avoid filling up your commit log partition. That is a
> little more involved, which is why we improved it in 1.0.
>
> --
> Sylvain

Re: is it possible for light-traffic CF to hold down many commit logs?

Posted by Sylvain Lebresne <sy...@datastax.com>.
In 1.0.0, you have this in cassandra.yaml:

# Total space to use for commitlogs.
# If space gets above this value (it will round up to the next nearest
# segment multiple), Cassandra will flush every dirty CF in the oldest
# segment and remove it.
# commitlog_total_space_in_mb: 4096

In 0.8, you're supposed to set the memtableFlushAfterMins property
on each CF to avoid filling up your commit log partition. That is a
little more involved, which is why we improved it in 1.0.
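
To make the 1.0 behaviour described in that config comment concrete,
here is a minimal sketch in Python. It is a toy model under stated
assumptions, not Cassandra's implementation: enforce_cap and
max_segments are hypothetical names, and a segment count stands in for
the commitlog_total_space_in_mb limit. While the log is over the cap,
every CF that is dirty in the oldest segment is flushed, and segments
that are clean afterwards are removed.

```python
def enforce_cap(segments, memtables, max_segments):
    """Flush dirty CFs in the oldest segments until the log fits the cap.

    segments     -- list of sets of dirty CF names, oldest first; the
                    last entry is the active segment (hypothetical model)
    memtables    -- dict mapping CF name to memtable size
    max_segments -- stand-in for commitlog_total_space_in_mb (assumption)

    Returns the CF names flushed, in order. Mutates both arguments.
    """
    flushed = []
    while len(segments) > max_segments:
        oldest = segments[0]
        for cf in sorted(oldest):          # sorted() copies, so discard is safe
            memtables[cf] = 0              # memtable written out as an sstable
            flushed.append(cf)
            for seg in segments:
                seg.discard(cf)            # cf no longer pins any segment
        # Remove segments that are now clean; always keep the active one.
        segments[:] = [s for s in segments[:-1] if s] + [segments[-1]]
    return flushed


if __name__ == "__main__":
    # Three old segments pinned only by the light CF, plus the active one.
    segs = [{"light"}, {"light"}, {"light"}, {"busy", "light"}]
    mems = {"busy": 150, "light": 3}
    print(enforce_cap(segs, mems, max_segments=2), segs, mems)
```

In this model the cap forces a flush of the light CF even though its
memtable is tiny, which is what releases the old segments.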

--
Sylvain


On Fri, Sep 23, 2011 at 7:47 AM, Yang <te...@gmail.com> wrote:
> Thanks for the input.
>
> If that's the case, I think the solution would be to sort the CFs to
> flush by a more complex criterion than just size. For example, the
> number of dirty commit logs that contain this CF should be considered
> as a score.
>
> Yang

Re: is it possible for light-traffic CF to hold down many commit logs?

Posted by Yang <te...@gmail.com>.
Thanks for the input.

If that's the case, I think the solution would be to sort the CFs to
flush by a more complex criterion than just size. For example, the
number of dirty commit logs that contain this CF should be considered
as a score.
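
One possible shape for such a score, sketched in Python. Everything
here is an assumption for illustration: the function names, the linear
combination, and the bytes_per_segment weight are not taken from
Cassandra or spelled out above. The only idea it takes from the
proposal is that the number of dirty commit log segments containing a
CF should count toward its flush priority.

```python
def dirty_segment_count(cf, segments):
    """Number of commit log segments that still hold unflushed writes for cf."""
    return sum(1 for dirty in segments if cf in dirty)


def pick_cf_to_flush(memtable_sizes, segments, bytes_per_segment):
    """Pick the CF with the highest combined score.

    score = memtable size + bytes_per_segment * (dirty segments pinned)

    bytes_per_segment is a hypothetical weight: how much memtable size
    one pinned segment is worth. With 0 this reduces to "largest
    memtable wins".
    """
    def score(cf):
        pinned = dirty_segment_count(cf, segments)
        return memtable_sizes[cf] + bytes_per_segment * pinned

    return max(memtable_sizes, key=score)


if __name__ == "__main__":
    sizes = {"busy": 190, "light": 10}
    # Five old segments pinned only by the light CF, one active with both.
    segs = [{"light"} for _ in range(5)] + [{"busy", "light"}]
    print(pick_cf_to_flush(sizes, segs, bytes_per_segment=0))
    print(pick_cf_to_flush(sizes, segs, bytes_per_segment=50))
```

With a weight of 0 the busy CF is chosen, as with size-only ordering.
With a weight of 50 the light CF scores 10 + 6 * 50 = 310 against
190 + 1 * 50 = 240 for the busy CF, so the light CF is flushed first.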

Yang

On Thu, Sep 22, 2011 at 10:40 PM, Philippe <wa...@gmail.com> wrote:
> It sure looks like what I'm seeing on my cluster, where a 100G commit
> log partition fills up in 12 hours (0.8.x).

Re: is it possible for light-traffic CF to hold down many commit logs?

Posted by Philippe <wa...@gmail.com>.
It sure looks like what I'm seeing on my cluster, where a 100G commit
log partition fills up in 12 hours (0.8.x).