Posted to user@cassandra.apache.org by Dikang Gu <di...@gmail.com> on 2016/03/12 07:05:29 UTC

Compaction Filter in Cassandra

Hello there,

RocksDB has a feature called "Compaction Filter" that allows an application
to modify or delete a key-value pair during background compaction:
https://github.com/facebook/rocksdb/blob/v4.1/include/rocksdb/options.h#L201-L226

I'm wondering whether there is a plan to add this to C*, or any value in
doing so. Or is there already something similar in C*?
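
To make the idea concrete, here is a minimal toy sketch of a compaction
filter. It is not RocksDB's or Cassandra's API: the names compact, FilterFn
and drop_tmp_and_upper are hypothetical, and each sorted run is modeled as a
plain dict. The callback sees every entry that survives the merge and either
drops it (returns None) or returns the value to write out.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical callback type: return None to drop the entry, or the
# (possibly modified) value to keep it in the compacted output.
FilterFn = Callable[[str, str], Optional[str]]


def compact(runs: List[Dict[str, str]], filter_fn: FilterFn) -> Dict[str, str]:
    """Merge runs (oldest first) and apply filter_fn to each surviving entry."""
    merged: Dict[str, str] = {}
    for run in runs:
        # Later (newer) runs overwrite values from earlier (older) runs.
        merged.update(run)

    output: Dict[str, str] = {}
    for key in sorted(merged):
        new_value = filter_fn(key, merged[key])
        if new_value is not None:
            output[key] = new_value
    return output


def drop_tmp_and_upper(key: str, value: str) -> Optional[str]:
    """Example filter: delete keys under "tmp:", upper-case everything else."""
    if key.startswith("tmp:"):
        return None
    return value.upper()


runs = [{"a": "old", "tmp:1": "x"}, {"a": "new", "b": "y"}]
result = compact(runs, drop_tmp_and_upper)
```

The point of doing this inside compaction is that the data is already being
read and rewritten, so the filter adds no extra pass over the files.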

Thanks

-- 
Dikang

Re: Compaction Filter in Cassandra

Posted by Robert Coli <rc...@eventbrite.com>.
On Fri, Mar 11, 2016 at 10:05 PM, Dikang Gu <di...@gmail.com> wrote:

> RocksDB has the feature called "Compaction Filter" to allow application to
> modify/delete a key-value during the background compaction.
> https://github.com/facebook/rocksdb/blob/v4.1/include/rocksdb/options.h#L201-L226
>
> I'm wondering is there a plan/value to add this into C* as well? Or is
> there already a similar thing in C*?
>

I think it's far more reasonable to do this via an offline tool such as
"sstablefilter":

https://issues.apache.org/jira/browse/CASSANDRA-1581

I used the internal Digg version of this to purge a bunch of obsolete keys
from a multi-tenancy CF (bad practice). It worked great.

=Rob

Re: Compaction Filter in Cassandra

Posted by Dikang Gu <di...@gmail.com>.
FYI, this is the JIRA: https://issues.apache.org/jira/browse/CASSANDRA-11348

We can move the discussion to the JIRA if you want.

On Thu, Mar 17, 2016 at 11:46 AM, Dikang Gu <di...@gmail.com> wrote:

> Hi Eric,
>
> Thanks for sharing the information!
>
> We also mainly want to use it for trimming data, either by the time or the
> number of columns in a row. We haven't started the work yet, do you mind to
> share some patches? We'd love to try it and test it in our environment.
>
> Thanks.
>
> On Tue, Mar 15, 2016 at 9:36 PM, Eric Stevens <mi...@gmail.com> wrote:
>
>> We have been working on filtering compaction for a month or so (though we
>> call it deleting compaction, its implementation is as a filtering
>> compaction strategy).  The feature is nearing completion, and we have used
>> it successfully in a limited production capacity against DSE 4.8 series.
>>
>> Our use case is that our records are written anywhere between a month, up
>> to several years before they are scheduled for deletion.  Tombstones are
>> too expensive, as we have tables with hundreds of billions of rows.  In
>> addition, traditional TTLs don't work for us because our customers are
>> permitted to change their retention policy such that already-written
>> records should not be deleted if they increase their retention after the
>> record was written (or vice versa).
>>
>> We can clean up data more cheaply and more quickly with filtered
>> compaction than with tombstones and traditional compaction.  Our
>> implementation is a wrapper compaction strategy for another underlying
>> strategy, so that you can have the characteristics of whichever strategy
>> makes sense in terms of managing your SSTables, while interceding and
>> removing records during compaction (including cleaning up secondary
>> indexes) that otherwise would have survived into the new SSTable.
>>
>> We are hoping to contribute it back to the community, so if you'd be
>> interested in helping test it out, I'd love to hear from you.
>>
>> On Sat, Mar 12, 2016 at 5:12 AM Marcus Eriksson <kr...@gmail.com>
>> wrote:
>>
>>> We don't have anything like that, do you have a specific use case in
>>> mind?
>>>
>>> Could you create a JIRA ticket and we can discuss there?
>>>
>>> /Marcus
>>>
>>> On Sat, Mar 12, 2016 at 7:05 AM, Dikang Gu <di...@gmail.com> wrote:
>>>
>>>> Hello there,
>>>>
>>>> RocksDB has the feature called "Compaction Filter" to allow application
>>>> to modify/delete a key-value during the background compaction.
>>>> https://github.com/facebook/rocksdb/blob/v4.1/include/rocksdb/options.h#L201-L226
>>>>
>>>> I'm wondering is there a plan/value to add this into C* as well? Or is
>>>> there already a similar thing in C*?
>>>>
>>>> Thanks
>>>>
>>>> --
>>>> Dikang
>>>>
>>>>
>>>
>
>
> --
> Dikang
>
>


-- 
Dikang

Re: Compaction Filter in Cassandra

Posted by Dikang Gu <di...@gmail.com>.
Hi Eric,

Thanks for sharing the information!

We also mainly want to use it for trimming data, either by time or by the
number of columns in a row. We haven't started the work yet; would you mind
sharing some patches? We'd love to try it and test it in our environment.
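
A rough sketch of what trimming "by time or by number of columns" could look
like as a per-row filter. Everything here is an assumption made for
illustration: the Column type, the trim_row function and its parameters are
hypothetical and do not correspond to any Cassandra API.

```python
from typing import List, NamedTuple, Optional


class Column(NamedTuple):
    name: str
    value: str
    written_at: int  # write time, in seconds since the epoch


def trim_row(
    columns: List[Column],
    now: int,
    max_age_seconds: Optional[int] = None,
    max_columns: Optional[int] = None,
) -> List[Column]:
    """Return the columns of one row that survive trimming."""
    kept = list(columns)
    if max_age_seconds is not None:
        # Trim by time: drop columns written too long ago.
        kept = [c for c in kept if now - c.written_at <= max_age_seconds]
    if max_columns is not None:
        # Trim by count: keep only the most recently written columns,
        # then restore name order.
        kept = sorted(kept, key=lambda c: c.written_at, reverse=True)[:max_columns]
        kept.sort(key=lambda c: c.name)
    return kept


row = [
    Column("c1", "a", 100),
    Column("c2", "b", 200),
    Column("c3", "c", 300),
    Column("c4", "d", 400),
]
by_time = trim_row(row, now=500, max_age_seconds=350)
by_count = trim_row(row, now=500, max_columns=2)
```

Both limits can be passed together, in which case the age limit is applied
first and the count limit is applied to whatever remains.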

Thanks.

On Tue, Mar 15, 2016 at 9:36 PM, Eric Stevens <mi...@gmail.com> wrote:

> We have been working on filtering compaction for a month or so (though we
> call it deleting compaction, its implementation is as a filtering
> compaction strategy).  The feature is nearing completion, and we have used
> it successfully in a limited production capacity against DSE 4.8 series.
>
> Our use case is that our records are written anywhere between a month, up
> to several years before they are scheduled for deletion.  Tombstones are
> too expensive, as we have tables with hundreds of billions of rows.  In
> addition, traditional TTLs don't work for us because our customers are
> permitted to change their retention policy such that already-written
> records should not be deleted if they increase their retention after the
> record was written (or vice versa).
>
> We can clean up data more cheaply and more quickly with filtered
> compaction than with tombstones and traditional compaction.  Our
> implementation is a wrapper compaction strategy for another underlying
> strategy, so that you can have the characteristics of whichever strategy
> makes sense in terms of managing your SSTables, while interceding and
> removing records during compaction (including cleaning up secondary
> indexes) that otherwise would have survived into the new SSTable.
>
> We are hoping to contribute it back to the community, so if you'd be
> interested in helping test it out, I'd love to hear from you.
>
> On Sat, Mar 12, 2016 at 5:12 AM Marcus Eriksson <kr...@gmail.com> wrote:
>
>> We don't have anything like that, do you have a specific use case in mind?
>>
>> Could you create a JIRA ticket and we can discuss there?
>>
>> /Marcus
>>
>> On Sat, Mar 12, 2016 at 7:05 AM, Dikang Gu <di...@gmail.com> wrote:
>>
>>> Hello there,
>>>
>>> RocksDB has the feature called "Compaction Filter" to allow application
>>> to modify/delete a key-value during the background compaction.
>>> https://github.com/facebook/rocksdb/blob/v4.1/include/rocksdb/options.h#L201-L226
>>>
>>> I'm wondering is there a plan/value to add this into C* as well? Or is
>>> there already a similar thing in C*?
>>>
>>> Thanks
>>>
>>> --
>>> Dikang
>>>
>>>
>>


-- 
Dikang

Re: Compaction Filter in Cassandra

Posted by Clint Martin <cl...@coolfiretechnologies.com>.
I would definitely be interested in this.

Clint
On Mar 15, 2016 9:36 PM, "Eric Stevens" <mi...@gmail.com> wrote:

> We have been working on filtering compaction for a month or so (though we
> call it deleting compaction, its implementation is as a filtering
> compaction strategy).  The feature is nearing completion, and we have used
> it successfully in a limited production capacity against DSE 4.8 series.
>
> Our use case is that our records are written anywhere between a month, up
> to several years before they are scheduled for deletion.  Tombstones are
> too expensive, as we have tables with hundreds of billions of rows.  In
> addition, traditional TTLs don't work for us because our customers are
> permitted to change their retention policy such that already-written
> records should not be deleted if they increase their retention after the
> record was written (or vice versa).
>
> We can clean up data more cheaply and more quickly with filtered
> compaction than with tombstones and traditional compaction.  Our
> implementation is a wrapper compaction strategy for another underlying
> strategy, so that you can have the characteristics of whichever strategy
> makes sense in terms of managing your SSTables, while interceding and
> removing records during compaction (including cleaning up secondary
> indexes) that otherwise would have survived into the new SSTable.
>
> We are hoping to contribute it back to the community, so if you'd be
> interested in helping test it out, I'd love to hear from you.
>
> On Sat, Mar 12, 2016 at 5:12 AM Marcus Eriksson <kr...@gmail.com> wrote:
>
>> We don't have anything like that, do you have a specific use case in mind?
>>
>> Could you create a JIRA ticket and we can discuss there?
>>
>> /Marcus
>>
>> On Sat, Mar 12, 2016 at 7:05 AM, Dikang Gu <di...@gmail.com> wrote:
>>
>>> Hello there,
>>>
>>> RocksDB has the feature called "Compaction Filter" to allow application
>>> to modify/delete a key-value during the background compaction.
>>> https://github.com/facebook/rocksdb/blob/v4.1/include/rocksdb/options.h#L201-L226
>>>
>>> I'm wondering is there a plan/value to add this into C* as well? Or is
>>> there already a similar thing in C*?
>>>
>>> Thanks
>>>
>>> --
>>> Dikang
>>>
>>>
>>

Re: Compaction Filter in Cassandra

Posted by Eric Stevens <mi...@gmail.com>.
We have been working on filtering compaction for a month or so (we call it
deleting compaction, but it is implemented as a filtering compaction
strategy).  The feature is nearing completion, and we have used it
successfully in a limited production capacity against the DSE 4.8 series.

Our use case is that our records are written anywhere from a month to
several years before they are scheduled for deletion.  Tombstones are
too expensive, as we have tables with hundreds of billions of rows.  In
addition, traditional TTLs don't work for us because our customers are
permitted to change their retention policy such that already-written
records should not be deleted if they increase their retention after the
record was written (or vice versa).
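
To illustrate why a changeable retention policy does not map onto a TTL, here
is a small sketch of a predicate that is evaluated when a record is
compacted, rather than fixed when it is written. The Record type, the
retention_days mapping and is_expired are hypothetical names chosen for this
example, not part of the implementation described above.

```python
from dataclasses import dataclass
from typing import Dict

SECONDS_PER_DAY = 86_400


@dataclass(frozen=True)
class Record:
    customer_id: str
    written_at: int  # write time, in seconds since the epoch


def is_expired(record: Record, retention_days: Dict[str, int], now: int) -> bool:
    """Decide at compaction time, using the customer's *current* policy."""
    days = retention_days.get(record.customer_id)
    if days is None:
        # No policy known for this customer: keep the record.
        return False
    return now - record.written_at > days * SECONDS_PER_DAY


record = Record(customer_id="acme", written_at=0)
now = 40 * SECONDS_PER_DAY

expired_under_30_days = is_expired(record, {"acme": 30}, now)
# The customer later raises retention to 90 days; the same record survives.
expired_under_90_days = is_expired(record, {"acme": 90}, now)
```

Because the policy is looked up at the moment of compaction, raising or
lowering retention takes effect for records that were written long before
the change.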

We can clean up data more cheaply and more quickly with filtered compaction
than with tombstones and traditional compaction.  Our implementation is a
wrapper compaction strategy for another underlying strategy, so that you
can have the characteristics of whichever strategy makes sense in terms of
managing your SSTables, while interceding and removing records during
compaction (including cleaning up secondary indexes) that otherwise would
have survived into the new SSTable.
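
The wrapper idea can be sketched roughly as follows. This is a toy model
under stated assumptions, not the actual implementation and not Cassandra's
compaction strategy interface: an SSTable is modeled as a dict, and
SizeTieredLike and FilteringStrategy are hypothetical classes. The underlying
strategy decides which tables to compact; the wrapper only decides which
records survive. Secondary index cleanup is not modeled.

```python
from typing import Callable, Dict, List, Optional

SSTable = Dict[str, str]  # toy model: key -> value


class SizeTieredLike:
    """Hypothetical underlying strategy: chooses which tables to compact."""

    def __init__(self, min_threshold: int = 2) -> None:
        self.min_threshold = min_threshold

    def select(self, sstables: List[SSTable]) -> List[SSTable]:
        # Compact everything once enough tables have accumulated.
        return list(sstables) if len(sstables) >= self.min_threshold else []


class FilteringStrategy:
    """Wrapper: delegates selection, drops unwanted records while merging."""

    def __init__(self, underlying, should_drop: Callable[[str, str], bool]) -> None:
        self.underlying = underlying
        self.should_drop = should_drop

    def select(self, sstables: List[SSTable]) -> List[SSTable]:
        return self.underlying.select(sstables)

    def compact(self, sstables: List[SSTable]) -> Optional[SSTable]:
        candidates = self.select(sstables)  # oldest first
        if not candidates:
            return None  # the underlying strategy found nothing to do
        merged: SSTable = {}
        for table in candidates:
            merged.update(table)  # newer tables win
        return {k: v for k, v in sorted(merged.items()) if not self.should_drop(k, v)}


strategy = FilteringStrategy(
    SizeTieredLike(min_threshold=2),
    should_drop=lambda key, value: key.startswith("expired:"),
)
new_table = strategy.compact([{"expired:1": "a", "k1": "b"}, {"k1": "c", "k2": "d"}])
```

Keeping selection in the wrapped strategy means the filtering logic does not
need to know anything about how tables are grouped or sized.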

We are hoping to contribute it back to the community, so if you'd be
interested in helping test it out, I'd love to hear from you.

On Sat, Mar 12, 2016 at 5:12 AM Marcus Eriksson <kr...@gmail.com> wrote:

> We don't have anything like that, do you have a specific use case in mind?
>
> Could you create a JIRA ticket and we can discuss there?
>
> /Marcus
>
> On Sat, Mar 12, 2016 at 7:05 AM, Dikang Gu <di...@gmail.com> wrote:
>
>> Hello there,
>>
>> RocksDB has the feature called "Compaction Filter" to allow application
>> to modify/delete a key-value during the background compaction.
>> https://github.com/facebook/rocksdb/blob/v4.1/include/rocksdb/options.h#L201-L226
>>
>> I'm wondering is there a plan/value to add this into C* as well? Or is
>> there already a similar thing in C*?
>>
>> Thanks
>>
>> --
>> Dikang
>>
>>
>

Re: Compaction Filter in Cassandra

Posted by Marcus Eriksson <kr...@gmail.com>.
We don't have anything like that. Do you have a specific use case in mind?

Could you create a JIRA ticket so we can discuss it there?

/Marcus

On Sat, Mar 12, 2016 at 7:05 AM, Dikang Gu <di...@gmail.com> wrote:

> Hello there,
>
> RocksDB has the feature called "Compaction Filter" to allow application to
> modify/delete a key-value during the background compaction.
> https://github.com/facebook/rocksdb/blob/v4.1/include/rocksdb/options.h#L201-L226
>
> I'm wondering is there a plan/value to add this into C* as well? Or is
> there already a similar thing in C*?
>
> Thanks
>
> --
> Dikang
>
>
