Posted to user@cassandra.apache.org by Tobias Jungen <to...@gmail.com> on 2010/05/08 02:21:16 UTC

BinaryMemtable and collisions

Greetings,

Started getting my feet wet with Cassandra in earnest this week. I'm
building a custom inverted index of sorts on top of Cassandra, in part
inspired by the work of Jake Luciani in Lucandra. I've successfully loaded
nearly a million documents over a 3-node cluster, and initial query tests
look promising.

The problem is that our target use case has hundreds of millions of
documents (each document is very small, however). Loading time will be an
important factor. I've investigated using the BinaryMemtable interface (as
found in contrib/bmt_example) to speed up bulk insertion. I have a prototype
up that successfully inserts data using BMT, but there is a problem.

If I perform multiple writes for the same row key & column family, the row
ends up containing only one of the writes. I'm guessing this is because with
BMT I need to group all writes for a given row key & column family into one
operation, rather than doing it incrementally as is possible with the thrift
interface. Hadoop is the obvious solution for doing such a grouping.
Unfortunately, we can't perform such a process over our entire dataset; we
will need to do it in increments.
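
To make the grouping concrete, here's a rough sketch of the kind of
client-side buffering I mean within one batch (plain Java; an illustration
of the idea, not our actual loader):

    import java.util.HashMap;
    import java.util.Map;

    // Merge every write destined for the same row key before anything is
    // handed to the BMT path, since BMT won't merge the writes for us.
    public class RowBuffer {
        // rowKey -> (columnName -> value), one inner map per row in the batch
        private final Map<String, Map<String, byte[]>> rows =
                new HashMap<String, Map<String, byte[]>>();

        public void add(String rowKey, String columnName, byte[] value) {
            Map<String, byte[]> columns = rows.get(rowKey);
            if (columns == null) {
                columns = new HashMap<String, byte[]>();
                rows.put(rowKey, columns);
            }
            columns.put(columnName, value); // later writes to a column win
        }

        // Hand each fully grouped row to the bulk-load path exactly once.
        public Map<String, Map<String, byte[]>> drain() {
            Map<String, Map<String, byte[]>> batch =
                    new HashMap<String, Map<String, byte[]>>(rows);
            rows.clear();
            return batch;
        }
    }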

So my question is: If I properly flush every node after performing a large
bulk insert, can Cassandra merge multiple writes on a single row & column
family when using the BMT interface? Or is using BMT only feasible for
loading data on rows that don't exist yet?

Thanks in advance,
Toby Jungen

Re: BinaryMemtable and collisions

Posted by Tobias Jungen <to...@gmail.com>.
> Yes. When you flush from BMT, it's like any other SSTable. Cassandra will
> merge them through compaction.

That's good news, thanks for clarifying!

A few more related questions:

Are there any problems with issuing the flush command directly from code at
the end of a bulk insert? The BMT example mentions running nodetool, but
poking around the Cassandra source suggests it should be doable
programmatically.
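
In case it helps anyone following along, this is the sort of thing I had in
mind, based on my reading of how nodetool flush works in 0.6. The NodeProbe
constructor, the forceTableFlush signature, and the default JMX port are all
assumptions on my part, so treat it as a sketch rather than verified API:

    import org.apache.cassandra.tools.NodeProbe;

    public class Flusher {
        // Sketch: flush each node over JMX at the end of a bulk load, the
        // same call path `nodetool flush` takes. The names and the port are
        // my unverified reading of the 0.6 source.
        public static void flushAll(String[] hosts, String keyspace, String cfName)
                throws Exception {
            for (String host : hosts) {
                NodeProbe probe = new NodeProbe(host, 8080); // 8080: 0.6's default JMX port?
                probe.forceTableFlush(keyspace, cfName);
            }
        }
    }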

Also, in my BMT prototype I've noticed that the JVM won't exit after
completion, so I have to hard kill it (ctrl-c). A thread dump shows that
some of Cassandra's network threads are still open, keeping the JVM from
exiting. Some digging revealed that Cassandra isn't designed with a "clean"
shutdown in mind, so perhaps such behavior is expected. Still, it is a bit
unsettling since the cluster nodes log an error after I kill the client
node. Is calling StorageService.stopClient enough to ensure that any
client-side buffers are flushed and writes are completed?

Finally, the wiki page on BMT (
http://wiki.apache.org/cassandra/BinaryMemtable) suggests using
StorageProxy, but the example in contrib does not. Under the hood, both
StorageProxy and the contrib example call MessagingService.sendOneWay. The
additional code in StorageProxy seems mostly related to the extra
bookkeeping associated with hinted handoff and waiting on write acks.
Perhaps that extra work isn't necessary for a bulk load operation?
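
For reference, the core of the write path as I read it out of
contrib/bmt_example looks roughly like the following. Every identifier here
is my transcription of the 0.6-era code and could be off (e.g. whether
MessagingService.instance is a field or a method varies by version), so take
it as a sketch of the idea, fire-and-forget with no hints and no ack
waiting, rather than exact API:

    import java.net.InetAddress;
    import org.apache.cassandra.db.RowMutation;
    import org.apache.cassandra.net.Message;
    import org.apache.cassandra.net.MessagingService;
    import org.apache.cassandra.service.StorageService;

    public class BinarySender {
        // Send one pre-built mutation straight to each natural endpoint for
        // its row key, skipping StorageProxy's hinting and ack bookkeeping.
        public static void sendBinaryMutation(RowMutation rm, String rowKey)
                throws Exception {
            Message message = rm.makeRowMutationMessage(StorageService.Verb.BINARY);
            for (InetAddress endpoint : StorageService.instance.getNaturalEndpoints(rowKey)) {
                MessagingService.instance.sendOneWay(message, endpoint);
            }
        }
    }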

That should be enough questions from me for a while. :)

-Toby

Re: BinaryMemtable and collisions

Posted by Chris Goffinet <go...@digg.com>.
> 
> So my question is: If I properly flush every node after performing a large bulk insert, can Cassandra merge multiple writes on a single row & column family when using the BMT interface? Or is using BMT only feasible for loading data on rows that don't exist yet?
> 

Yes. When you flush from BMT, it's like any other SSTable. Cassandra will merge them through compaction.

Re: BinaryMemtable and collisions

Posted by Jake Luciani <ja...@gmail.com>.
Got it. I'm working on making term vectors optional and just storing the
frequency in that case. Just FYI.

On Sat, May 8, 2010 at 1:17 AM, Tobias Jungen <to...@gmail.com> wrote:

> Without going into too much depth: Our retrieval model is a bit more
> structured than standard Lucene retrieval, and I'm trying to leverage that
> structure. Some of the terms we're going to retrieve against have high
> occurrence, and because of that I'm worried about getting killed by
> processing large term vectors. Instead I'm trying to index on term
> relationships, if that makes sense.

Re: BinaryMemtable and collisions

Posted by Tobias Jungen <to...@gmail.com>.
Without going into too much depth: Our retrieval model is a bit more
structured than standard Lucene retrieval, and I'm trying to leverage that
structure. Some of the terms we're going to retrieve against have high
occurrence, and because of that I'm worried about getting killed by
processing large term vectors. Instead I'm trying to index on term
relationships, if that makes sense.

On Sat, May 8, 2010 at 12:09 AM, Jake Luciani <ja...@gmail.com> wrote:

> Any reason why you aren't using Lucandra directly?

Re: BinaryMemtable and collisions

Posted by Jake Luciani <ja...@gmail.com>.
Any reason why you aren't using Lucandra directly?
