Posted to user@cassandra.apache.org by Kauzki Aranami <ka...@gmail.com> on 2010/05/03 01:42:06 UTC

How do you, Bloom filter of the false positive rate or remove the problem of distributed databases?

Hi

Cassandra adopts the Bloom filter as one of its data structures. As I
understand it, this fits the idea of eventual consistency in BASE,
because the structure tolerates a limited amount of inexactness.

In short, Bloom filters produce false positives.
Moreover, deleting data is a problem common to existing OS
filesystems and to distributed databases, including Google's
BigTable.

For deleting data, I am thinking of trying a solution that uses
Interval Tree Clocks, a variant of vector clocks, which are in turn
a kind of Lamport logical clock.

So, my question:

How can false positives in the Bloom filter be detected, or the
problem otherwise resolved?
My guess is that Merkle trees and tombstones are involved?


PS. I have contributed to the Japanese translation of the Cassandra
Wiki, despite my poor ability. :-)

-----------------------------------------------------------
  Kazuki Aranami

 Twitter: http://twitter.com/kimtea
 Email: kazuki.aranami@gmail.com
 http://d.hatena.ne.jp/kazuki-aranami/
 -----------------------------------------------------------

Re: How do you, Bloom filter of the false positive rate or remove the problem of distributed databases?

Posted by vineet daniel <vi...@gmail.com>.
Reduce GCGraceSeconds in storage-conf.xml; that should work.


On Tue, May 4, 2010 at 2:31 PM, vineet daniel <vi...@gmail.com> wrote:

> Only major compactions can clean out obsolete tombstones.
>
> On Tue, May 4, 2010 at 9:59 AM, Jonathan Ellis <jb...@gmail.com> wrote:
>
>> On Mon, May 3, 2010 at 8:45 PM, Kauzki Aranami <ka...@gmail.com>
>> wrote:
>> > Let me rephrase my question.
>> >
>> > How does Cassandra deal with bloom filter's false positives on deleted
>> > records?
>>
>> The same way it deals with tombstones that it encounters otherwise
>> (part of a row slice, or in a memtable).
>>
>> All the bloom filter does is keep you from having to check rows that
>> don't have any data at all for a given key.  Tombstones are not the
>> same as "no data at all," we do need to propagate tombstones during
>> replication.
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of Riptano, the source for professional Cassandra support
>> http://riptano.com
>>
>
>

Re: How do you, Bloom filter of the false positive rate or remove the problem of distributed databases?

Posted by vineet daniel <vi...@gmail.com>.
Only major compactions can clean out obsolete tombstones.

On Tue, May 4, 2010 at 9:59 AM, Jonathan Ellis <jb...@gmail.com> wrote:

> On Mon, May 3, 2010 at 8:45 PM, Kauzki Aranami <ka...@gmail.com>
> wrote:
> > Let me rephrase my question.
> >
> > How does Cassandra deal with bloom filter's false positives on deleted
> > records?
>
> The same way it deals with tombstones that it encounters otherwise
> (part of a row slice, or in a memtable).
>
> All the bloom filter does is keep you from having to check rows that
> don't have any data at all for a given key.  Tombstones are not the
> same as "no data at all," we do need to propagate tombstones during
> replication.
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
>
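To illustrate the tombstone cleanup discussed in this thread, here is a minimal sketch in Python. It is not Cassandra's code. The function `compact`, the `TOMBSTONE` marker and the dict-based tables are all hypothetical and exist only for illustration. The sketch merges every table at once, which loosely corresponds to a major compaction, and drops a tombstone only after a grace period (the role GCGraceSeconds plays) has passed.

```python
TOMBSTONE = object()  # hypothetical marker for a deleted value


def compact(sstables, now, gc_grace_seconds):
    """Merge tables, keeping the newest cell per key.

    Each table maps key -> (timestamp, value). A tombstone is dropped
    only once it is older than gc_grace_seconds. Until then it is kept,
    so the deletion can still reach replicas that missed it.
    """
    merged = {}
    for table in sstables:
        for key, (ts, value) in table.items():
            if key not in merged or ts > merged[key][0]:
                merged[key] = (ts, value)
    return {
        key: (ts, value)
        for key, (ts, value) in merged.items()
        if not (value is TOMBSTONE and now - ts > gc_grace_seconds)
    }


older = {"a": (100, "x"), "b": (100, "y")}
newer = {"a": (200, TOMBSTONE)}  # "a" was deleted at time 200

# Inside the grace period the tombstone survives the merge.
recent = compact([older, newer], now=250, gc_grace_seconds=100)

# After the grace period both the tombstone and the data it hides are gone.
later = compact([older, newer], now=1000, gc_grace_seconds=100)
```

Because all tables take part in the merge, the old value of "a" and its tombstone meet in the same pass, so dropping the tombstone cannot bring the old value back in this toy model.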

Re: How do you, Bloom filter of the false positive rate or remove the problem of distributed databases?

Posted by Jonathan Ellis <jb...@gmail.com>.
On Mon, May 3, 2010 at 8:45 PM, Kauzki Aranami <ka...@gmail.com> wrote:
> Let me rephrase my question.
>
> How does Cassandra deal with bloom filter's false positives on deleted records?

The same way it deals with tombstones that it encounters otherwise
(part of a row slice, or in a memtable).

All the bloom filter does is keep you from having to check rows that
don't have any data at all for a given key.  Tombstones are not the
same as "no data at all," we do need to propagate tombstones during
replication.

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
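As an illustration of the point above, here is a minimal sketch in Python. It is not Cassandra's implementation. The classes `BloomFilter` and `ToySSTable`, the `TOMBSTONE` marker, and the sizes chosen are hypothetical. It shows that a false positive is never "detected" or removed from the filter: it only costs one unnecessary lookup, which then finds nothing. A tombstone, by contrast, is real data that the lookup returns.

```python
import hashlib

TOMBSTONE = object()  # hypothetical marker for a deleted value


class BloomFilter:
    """Toy Bloom filter: k hash probes into an m-bit array."""

    def __init__(self, num_bits=64, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, key):
        # Derive k bit positions by salting the key with the probe index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))


class ToySSTable:
    """Hypothetical immutable table: a dict plus a filter over its keys."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.filter = BloomFilter()
        for key in self.rows:
            self.filter.add(key)
        self.disk_reads = 0

    def get(self, key):
        if not self.filter.might_contain(key):
            return None            # skipped without touching the "disk"
        self.disk_reads += 1       # filter said "maybe": do the real lookup
        return self.rows.get(key)  # None here means a false positive


table = ToySSTable({"alice": "v1", "bob": TOMBSTONE})
live = table.get("alice")    # a live value
deleted = table.get("bob")   # the tombstone itself, not "no data"
missing = [table.get(f"missing{i}") for i in range(100)]
```

Every lookup of an absent key returns None whether or not the filter gave a false positive. The only difference is how often `disk_reads` is incremented.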

Re: How do you, Bloom filter of the false positive rate or remove the problem of distributed databases?

Posted by Kauzki Aranami <ka...@gmail.com>.
Let me rephrase my question.


How does Cassandra deal with bloom filter's false positives on deleted records?


Bloom filters can return false positives, especially for deleted
records. How does Cassandra detect them?
And how does Cassandra remove those *detected* false positives from
the bloom filter?


-----------------------------------------------------------
  Kazuki Aranami

 Twitter: http://twitter.com/kimtea
 Email: kazuki.aranami@gmail.com
 http://d.hatena.ne.jp/kazuki-aranami/
 -----------------------------------------------------------


