Posted to user@cassandra.apache.org by Ying Tang <iv...@gmail.com> on 2010/12/01 08:02:27 UTC

When to call the major compaction ?

Every time Cassandra creates a new sstable, does it call
CompactionManager.submitMinorIfNeeded? And if the number of sstables is
beyond MinimumCompactionThreshold, a minor compaction will be triggered.
There is also a method named CompactionManager.submitMajor, and its
call chains are:

NodeCmd --> NodeProbe --> StorageService.forceTableCompaction -->
Table.forceCompaction --> CompactionManager.performMajor -->
CompactionManager.submitMajor

ColumnFamilyStore.forceMajorCompaction --> CompactionManager.performMajor
--> CompactionManager.submitMajor

HintedHandOffManager --> CompactionManager.submitMajor

So I have 3 questions:
1. Once a new sstable has been created,
CompactionManager.submitMinorIfNeeded is called, and a minor compaction
may be triggered.
    But when is a major compaction triggered? Only from NodeCmd?
2. What work do minor and major compactions do?
    Does a minor compaction delete data that has been marked as deleted?
    And what about a major compaction?
3. When is gc invoked? Every time a compaction runs?



-- 
Best regards,

Ivy Tang

Re: When to call the major compaction ?

Posted by Nick Bailey <ni...@riptano.com>.
The part about gc refers to old sstable files on disk. After a compaction,
the old files on disk will be deleted when garbage collection happens.

On Wed, Dec 1, 2010 at 7:31 AM, Ying Tang <iv...@gmail.com> wrote:

> I'm confused, please ignore the mail above.
> Here is my confusion:
>    since 0.6.6/0.7, both minor and major compaction can clean out
> rows marked with tombstones and generate a new sstable without
> tombstones.
>     And the tombstones remain in memory, waiting to be removed by the JVM gc.
> Am I right?

Re: When to call the major compaction ?

Posted by Chen Xinli <ch...@gmail.com>.
You are right, the JVM gc is for memory.
In Cassandra, there is a small trick using *PhantomReference*, which is
triggered when the JVM gc runs. The deletion is actually done through the
PhantomReference.
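
To make the PhantomReference trick concrete, here is a minimal Java sketch of
the general pattern. It is not Cassandra's code: the names PhantomFileCleanup,
SSTableHandle and FileDeletingReference are hypothetical, and only the standard
java.lang.ref API is assumed. The idea is that the on-disk file is tied to an
in-memory object, and the file is deleted only after the JVM gc reports that
object as unreachable.

```java
import java.io.File;
import java.io.IOException;
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.nio.file.Files;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class PhantomFileCleanup {
    /** Hypothetical stand-in for an in-memory sstable reader. */
    static final class SSTableHandle {
        final File file;
        SSTableHandle(File file) { this.file = file; }
    }

    /** Remembers which file to delete once the handle becomes unreachable. */
    static final class FileDeletingReference extends PhantomReference<SSTableHandle> {
        final File file;
        FileDeletingReference(SSTableHandle handle, ReferenceQueue<SSTableHandle> queue) {
            super(handle, queue);
            this.file = handle.file;
        }
    }

    static final ReferenceQueue<SSTableHandle> QUEUE = new ReferenceQueue<>();
    // The reference objects themselves must stay strongly reachable.
    static final Set<FileDeletingReference> PENDING = ConcurrentHashMap.newKeySet();

    static FileDeletingReference track(SSTableHandle handle) {
        FileDeletingReference ref = new FileDeletingReference(handle, QUEUE);
        PENDING.add(ref);
        return ref;
    }

    /** Deletes the file behind every reference that has been enqueued so far. */
    static int drain() {
        int deleted = 0;
        Reference<? extends SSTableHandle> ref;
        while ((ref = QUEUE.poll()) != null) {
            FileDeletingReference fdr = (FileDeletingReference) ref;
            PENDING.remove(fdr);
            if (fdr.file.delete()) {
                deleted++;
            }
        }
        return deleted;
    }

    public static void main(String[] args) throws IOException {
        File f = Files.createTempFile("old-sstable", ".db").toFile();
        FileDeletingReference ref = track(new SSTableHandle(f));
        // Normally the gc enqueues the reference at a time of its choosing;
        // enqueue() simulates that moment so the demo is deterministic.
        ref.enqueue();
        System.out.println("deleted=" + drain() + " exists=" + f.exists());
    }
}
```

This also shows why the timing looks tied to gc: nothing is deleted until the
collector notices that the in-memory object is gone, which may be long after
the compaction finished.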

2010/12/2 Ying Tang <iv...@gmail.com>

> @Chen Xinli
> "and mark old sstables as deleted which will be deleted while jvm gc."
> An SSTable is on the hard disk, so how could the JVM gc delete it? The JVM gc
> is in charge of memory usage.
>
> @Nick
> So the GC in Cassandra doesn't refer to the JVM gc? Is this kind of gc
> Cassandra's own gc, intended to remove unused files on the hard disk?


-- 
Best Regards,
Chen Xinli

Re: When to call the major compaction ?

Posted by Ying Tang <iv...@gmail.com>.
@Chen Xinli
"and mark old sstables as deleted which will be deleted while jvm gc."
An SSTable is on the hard disk, so how could the JVM gc delete it? The JVM gc
is in charge of memory usage.

@Nick
So the GC in Cassandra doesn't refer to the JVM gc? Is this kind of gc
Cassandra's own gc, intended to remove unused files on the hard disk?



-- 
Best regards,

Ivy Tang

Re: When to call the major compaction ?

Posted by Chen Xinli <ch...@gmail.com>.
2010/12/1 Ying Tang <iv...@gmail.com>

> I'm confused, please ignore the mail above.
> Here is my confusion:
>    since 0.6.6/0.7, both minor and major compaction can clean out
> rows marked with tombstones and generate a new sstable without
> tombstones.
>

This is right.


>     And the tombstones remain in memory, waiting to be removed by the JVM gc.
> Am I right?
>

No! Compactions merge several old sstables into one and mark the old sstables
as deleted; the marked files are then deleted when the JVM gc runs.
SSTables are files on the hard disk and have nothing to do with memory. You'd
better have a look at Google's Bigtable paper.
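
The merge step can be sketched in a few lines of Java. This is only an
illustration under simplifying assumptions, not Cassandra's implementation:
CompactionSketch and Cell are hypothetical names, each sstable is modelled as a
sorted map from row key to a single cell, and the sketch ignores the grace
period and assumes every sstable holding a key takes part in the compaction.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CompactionSketch {
    /** Hypothetical cell: a value, its write timestamp, and a delete marker. */
    static final class Cell {
        final String value;
        final long timestamp;
        final boolean tombstone;
        Cell(String value, long timestamp, boolean tombstone) {
            this.value = value;
            this.timestamp = timestamp;
            this.tombstone = tombstone;
        }
    }

    /** Merges several sorted "sstables" into one new sorted map. */
    static TreeMap<String, Cell> compact(List<? extends Map<String, Cell>> sstables) {
        TreeMap<String, Cell> merged = new TreeMap<>();
        for (Map<String, Cell> sstable : sstables) {
            for (Map.Entry<String, Cell> e : sstable.entrySet()) {
                // For each key, keep the version with the newest timestamp.
                merged.merge(e.getKey(), e.getValue(),
                        (a, b) -> a.timestamp >= b.timestamp ? a : b);
            }
        }
        // Drop keys whose newest version is a tombstone.
        merged.values().removeIf(c -> c.tombstone);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Cell> older = new TreeMap<>();
        older.put("a", new Cell("1", 10, false));
        older.put("b", new Cell("2", 10, false));

        Map<String, Cell> newer = new TreeMap<>();
        newer.put("b", new Cell(null, 20, true)); // "b" was deleted later
        newer.put("c", new Cell("3", 20, false));

        System.out.println(compact(Arrays.asList(older, newer)).keySet());
    }
}
```

The input maps are left untouched; in the real system the corresponding old
files are the ones that get marked for deletion after the new sstable is
written.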


-- 
Best Regards,
Chen Xinli

Re: When to call the major compaction ?

Posted by Ying Tang <iv...@gmail.com>.
I'm confused, please ignore the mail above.
Here is my confusion:
   since 0.6.6/0.7, both minor and major compaction can clean out rows
marked with tombstones and generate a new sstable without tombstones.
    And the tombstones remain in memory, waiting to be removed by the JVM gc.
Am I right?

-- 
Best regards,

Ivy Tang

Re: When to call the major compaction ?

Posted by Ying Tang <iv...@gmail.com>.
1. So since 0.6.6/0.7, both minor and major compaction can clean out rows
marked with tombstones, but this kind of clean-out doesn't mean removing
them from the disk permanently.
    Is the real removal done by the JVM GC?
2. Is the intent of compaction to merge multiple sstables into one, clean out
the tombstones, and write the non-tombstoned rows into a new ordered sstable?



-- 
Best regards,

Ivy Tang

Re: When to call the major compaction ?

Posted by Sylvain Lebresne <sy...@yakaz.com>.
On Wed, Dec 1, 2010 at 12:11 PM, Ying Tang <iv...@gmail.com> wrote:
> And I have another question: what's the difference between minor
> compaction and major compaction?

A major compaction is a compaction that compacts *all* the SSTables of a given
column family (compaction compacts one CF at a time).

Before https://issues.apache.org/jira/browse/CASSANDRA-1074 (introduced in
0.6.6 and recent 0.7 betas/rcs), major compactions were the only ones that
removed tombstones (see http://wiki.apache.org/cassandra/DistributedDeletes),
and this is the reason major compaction exists. Now, with #1074, minor
compactions should remove most if not all tombstones, so major compactions are
no longer useful, or at least much less so (it may depend on your workload,
though, as a minor compaction can't always delete the tombstones).
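
One way to picture why a minor compaction can't always delete a tombstone is
the check sketched below in Java. It rests on two assumptions that go beyond
this mail: that a tombstone is only dropped after a grace period has elapsed,
and that it must be kept whenever an sstable outside the current compaction
might still hold the same row key. TombstonePurgeCheck, SSTable, mayContain and
canPurge are hypothetical names, not Cassandra's API.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class TombstonePurgeCheck {
    /** Hypothetical sstable: just the set of row keys it may contain. */
    static final class SSTable {
        final Set<String> keys;
        SSTable(String... keys) { this.keys = new HashSet<>(Arrays.asList(keys)); }
        boolean mayContain(String key) { return keys.contains(key); }
    }

    /**
     * A tombstone can be dropped only if the grace period has elapsed and no
     * sstable outside the compaction could still hold data for the key.
     */
    static boolean canPurge(String key, long deletedAtSeconds, long nowSeconds,
                            long gcGraceSeconds, List<SSTable> all,
                            List<SSTable> compacting) {
        if (nowSeconds - deletedAtSeconds < gcGraceSeconds) {
            return false;
        }
        for (SSTable s : all) {
            if (!compacting.contains(s) && s.mayContain(key)) {
                return false; // older data elsewhere would reappear
            }
        }
        return true;
    }

    public static void main(String[] args) {
        SSTable s1 = new SSTable("k", "x");
        SSTable s2 = new SSTable("k");
        SSTable s3 = new SSTable("k", "y");
        List<SSTable> all = Arrays.asList(s1, s2, s3);
        long grace = 864000; // hypothetical grace period: 10 days in seconds
        long deletedAt = 1000000;
        long now = 2000000;

        // Minor compaction of s1 and s2: s3 may still hold "k".
        System.out.println(canPurge("k", deletedAt, now, grace, all, Arrays.asList(s1, s2)));
        // Major compaction: every sstable takes part.
        System.out.println(canPurge("k", deletedAt, now, grace, all, all));
    }
}
```

Under these assumptions a major compaction always passes the second condition,
which matches the point that it depends on the workload how often a minor
compaction can do the same.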

--
Sylvain


Re: When to call the major compaction ?

Posted by Ying Tang <iv...@gmail.com>.
And I have another question: what's the difference between minor
compaction and major compaction?

-- 
Best regards,

Ivy Tang

Re: When to call the major compaction ?

Posted by Chen Xinli <ch...@gmail.com>.
2010/12/1 Ying Tang <iv...@gmail.com>

> So I have 3 questions:
> 1. Once a new sstable has been created,
> CompactionManager.submitMinorIfNeeded is called, and a minor compaction
> may be triggered.
>     But when is a major compaction triggered? Only from NodeCmd?
>

Yes, a major compaction must be triggered manually from NodeCmd.


> 2. What work do minor and major compactions do?
>     Does a minor compaction delete data that has been marked as deleted?
>     And what about a major compaction?
>

Compaction only marks sstables as deleted. The deletion itself happens when
there is a full gc or when the node is restarted.


> 3. When is gc invoked? Every time a compaction runs?
>

GC has nothing to do with compaction; you may be confusing the two concepts.


-- 
Best Regards,
Chen Xinli