Posted to user@cassandra.apache.org by "Brent N. Chun" <bn...@nutanix.com> on 2010/07/08 09:21:44 UTC

Reading all rows in a column family in parallel

Hello,

I'm running Cassandra 0.6.0 on a cluster and have an application that 
needs to read all rows from a column family using the Cassandra Thrift 
API. Ideally, I'd like to be able to do this by having all nodes in the 
cluster read in parallel (i.e., each node reads a disjoint set of rows 
that are stored locally). I should also mention that I'm using the 
RandomPartitioner.

Here's what I was thinking:

   1. Have one node invoke describe_ring to find the token range on the 
ring that each node is responsible for.

   2. For each token range, have the node that owns that portion of the 
ring read the rows in that range using a sequence of get_range_slices 
calls (using start/end tokens, not keys), as in the sketch below.
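
Roughly, I imagine the driver looking something like this. (Just a 
sketch: client is a Thrift Cassandra.Client, and ScanWorker is a 
placeholder for whatever would actually issue the get_range_slices 
calls against each node.)

import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.apache.cassandra.thrift.TokenRange;

// Fan out one scan task per token range returned by describe_ring,
// pointing each task at a node that owns that range.
List<TokenRange> ranges = client.describe_ring(keyspace);
ExecutorService executor = Executors.newFixedThreadPool(ranges.size());
for (TokenRange range : ranges)
{
    // range.endpoints lists the replicas that hold this range locally;
    // ScanWorker (hypothetical) would connect to one of them and page
    // through (start_token, end_token] with get_range_slices.
    executor.submit(new ScanWorker(range.getEndpoints().get(0),
                                   range.getStart_token(),
                                   range.getEnd_token()));
}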

This type of functionality seems to already be there in the tree with 
the recent Cassandra/Hadoop integration.

...
KeyRange keyRange = new KeyRange(batchRowCount)
         .setStart_token(startToken)
         .setEnd_token(split.getEndToken());
try
{
     rows = client.get_range_slices(new ColumnParent(cfName),
            predicate,
            keyRange,
            ConsistencyLevel.ONE);
      ...

     // prepare for the next slice to be read
     KeySlice lastRow = rows.get(rows.size() - 1);
     IPartitioner p = DatabaseDescriptor.getPartitioner();
     byte[] rowkey = lastRow.getKey();
     startToken = p.getTokenFactory().toString(p.getToken(rowkey));
...

The above snippet from ColumnFamilyRecordReader.java seems to suggest it 
is possible to scan an entire column family by reading disjoint sets of 
rows using token-based range queries (as opposed to key-based range 
queries). Is this possible in 0.6.0? (Note: for the next startToken, I 
was just planning on computing the MD5 digest of the last key directly 
since I'm accessing Cassandra through Thrift.)
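
For concreteness, here's the kind of helper I have in mind; it assumes 
the RandomPartitioner's convention of using the absolute value of the 
MD5 digest, interpreted as a BigInteger, as the token:

import java.math.BigInteger;
import java.security.MessageDigest;

// Client-side mirror of RandomPartitioner: token = abs(MD5(key)).
static String tokenFor(byte[] rowKey) throws Exception
{
    MessageDigest md5 = MessageDigest.getInstance("MD5");
    return new BigInteger(md5.digest(rowKey)).abs().toString();
}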

Thoughts?

bnc

Re: Reading all rows in a column family in parallel

Posted by "Brent N. Chun" <bn...@nutanix.com>.
Jonathan Ellis wrote:
> There have been a number of bug fixes to this since 0.6.0 -- as Thomas
> said, it works in 0.6.3.  (Although there is one related bug scheduled
> to be fixed in 0.6.4,
> https://issues.apache.org/jira/browse/CASSANDRA-1042)

Ah, this is exactly one of the cases I've been seeing! Thanks, Jonathan.

bnc

Re: Reading all rows in a column family in parallel

Posted by Jonathan Ellis <jb...@gmail.com>.
There have been a number of bug fixes to this since 0.6.0 -- as Thomas
said, it works in 0.6.3.  (Although there is one related bug scheduled
to be fixed in 0.6.4,
https://issues.apache.org/jira/browse/CASSANDRA-1042)

On Thu, Jul 8, 2010 at 2:06 PM, Brent N. Chun <bn...@nutanix.com> wrote:
> Hi Jonathan,
>
> The code snippet below was from the repository. I mentioned 0.6.0
> specifically just to confirm that reading a CF using token-based range
> queries with the RandomPartitioner should (or shouldn't) also work in
> that version. For example, I've seen discussions about whether range
> queries are now supported with the RandomPartitioner; those discussions
> mostly seem to involve key-based range queries, though, not the
> token-based range queries that CFRR uses. If you're saying that this
> functionality essentially works for everyone but me in 0.6.0, then that
> implies I have a bug in my code, which would be great news for me. What
> I'm seeing is either all rows, all rows plus duplicates, or missing
> rows, even when using a single node. Which of these I get is entirely
> deterministic: if I delete all the data and insert the same rows, the
> ranges returned by describe_ring change, but the end result of reading
> the CF is still one of those three cases.
>
> Thanks,
> bnc
>
> Jonathan Ellis wrote:
>>
>> "CFRR does this.  Is this possible?"
>>
>> I guess I don't understand the question. :)
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com

Re: Reading all rows in a column family in parallel

Posted by "Brent N. Chun" <bn...@nutanix.com>.
Hi Jonathan,

The code snippet below was from the repository. I mentioned 0.6.0 
specifically just to confirm that reading a CF using token-based range 
queries with the RandomPartitioner should (or shouldn't) also work in 
that version. For example, I've seen discussions about whether range 
queries are now supported with the RandomPartitioner; those discussions 
mostly seem to involve key-based range queries, though, not the 
token-based range queries that CFRR uses. If you're saying that this 
functionality essentially works for everyone but me in 0.6.0, then that 
implies I have a bug in my code, which would be great news for me. What 
I'm seeing is either all rows, all rows plus duplicates, or missing 
rows, even when using a single node. Which of these I get is entirely 
deterministic: if I delete all the data and insert the same rows, the 
ranges returned by describe_ring change, but the end result of reading 
the CF is still one of those three cases.

Thanks,
bnc

Jonathan Ellis wrote:
> "CFRR does this.  Is this possible?"
> 
> I guess I don't understand the question. :)

Re: Reading all rows in a column family in parallel

Posted by Jonathan Ellis <jb...@gmail.com>.
"CFRR does this.  Is this possible?"

I guess I don't understand the question. :)

On Thu, Jul 8, 2010 at 2:21 AM, Brent N. Chun <bn...@nutanix.com> wrote:
> Hello,
>
> I'm running Cassandra 0.6.0 on a cluster and have an application that needs
> to read all rows from a column family using the Cassandra Thrift API.
> Ideally, I'd like to be able to do this by having all nodes in the cluster
> read in parallel (i.e., each node reads a disjoint set of rows that are
> stored locally). I should also mention that I'm using the RandomPartitioner.
>
> Here's what I was thinking:
>
>  1. Have one node invoke describe_ring to find the token range on the ring
> that each node is responsible for.
>
>  2. For each token range, have the node that owns that portion of the ring
> read the rows in that range using a sequence of get_range_slices calls
> (using start/end tokens, not keys).
>
> This type of functionality seems to already be there in the tree with the
> recent Cassandra/Hadoop integration.
>
> ...
> KeyRange keyRange = new KeyRange(batchRowCount)
>        .setStart_token(startToken)
>        .setEnd_token(split.getEndToken());
> try
> {
>    rows = client.get_range_slices(new ColumnParent(cfName),
>           predicate,
>           keyRange,
>           ConsistencyLevel.ONE);
>     ...
>
>    // prepare for the next slice to be read
>    KeySlice lastRow = rows.get(rows.size() - 1);
>    IPartitioner p = DatabaseDescriptor.getPartitioner();
>    byte[] rowkey = lastRow.getKey();
>    startToken = p.getTokenFactory().toString(p.getToken(rowkey));
> ...
>
> The above snippet from ColumnFamilyRecordReader.java seems to suggest it is
> possible to scan an entire column family by reading disjoint sets of rows
> using token-based range queries (as opposed to key-based range queries). Is
> this possible in 0.6.0? (Note: for the next startToken, I was just planning
> on computing the MD5 digest of the last key directly since I'm accessing
> Cassandra through Thrift.)
>
> Thoughts?
>
> bnc
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com

Re: Reading all rows in a column family in parallel

Posted by "Brent N. Chun" <bn...@nutanix.com>.
Thomas Heller wrote:
> Hey,
> 
>> .... Is
>> this possible in 0.6.0? (Note: for the next startToken, I was just planning
>> on computing the MD5 digest of the last key directly since I'm accessing
>> Cassandra through Thrift.)
> 
> Can't speak for 0.6.0 but it works for 0.6.3.
> 
> Just implemented this in Ruby (minus the parallel part).
> 
> Cheers,
> /thomas

Hm, I must be doing something fundamentally wrong then. I just tried 
0.6.3 and got the same result. In this example, I have a one-node 
system with 100 rows in a single CF. When I try to read them back using 
token-based range queries and the RandomPartitioner, I get the output 
below (only 33 of 100 rows returned).

The 100 rows have keys that hash to random points on the ring. In the 
example below, I'm reading rows in chunks of 20.
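
Concretely, each range is scanned with a loop roughly like this (a 
simplified sketch: tokenFor() is the abs(MD5(key)) helper from my first 
mail, keys are hashed as their UTF-8 bytes, process() stands in for 
whatever the application does with each row, and error handling is 
omitted):

String startToken = rangeStartToken;        // initially the range start
while (true)
{
    KeyRange keyRange = new KeyRange(20)    // chunks of 20 rows
            .setStart_token(startToken)
            .setEnd_token(rangeEndToken);
    List<KeySlice> rows = client.get_range_slices(new ColumnParent(cfName),
            predicate, keyRange, ConsistencyLevel.ONE);
    if (rows.isEmpty())
        break;
    for (KeySlice row : rows)
        process(row);
    if (rows.size() < 20)                   // short read: range exhausted
        break;
    // the next chunk starts at the token of the last key returned;
    // (start, end] excludes the start token, so that row isn't re-read
    KeySlice lastRow = rows.get(rows.size() - 1);
    startToken = tokenFor(lastRow.getKey().getBytes("UTF-8"));
}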

In the first range query, the initial range is the entire ring. The 20 
rows returned have MD5 hashes in no particular order, it seems, and 
could fall anywhere on the ring. Taking the MD5 hash of the last row's 
key, I start the second range query.

In the second range query, ( 292996472659622939455744264432842142924, 
34571752641348786448680284622901156834 ], what's returned below seems 
to be exactly what the range suggests: rows whose MD5 hashes fall in 
that interval. But some of the remaining 80 rows we want may lie 
outside that range; hence only 33 rows total below.

If the rows returned by the token-based range queries were in MD5 hash 
order (and wraps were handled, ideally), then it seems like this 
interface could work. But others seem to be using this functionality 
successfully, which suggests that ordering is somehow unnecessary. Can 
someone help me out here?
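
For what it's worth, by "handled wraps" I mean something like the 
following (my speculation only, not anything Cassandra does): split a 
range whose start token is numerically greater than or equal to its 
end token into two non-wrapping pieces before scanning.

import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Speculative sketch: a range whose start token is >= its end token
// wraps past the top of the ring (start == end denotes the full ring),
// so scan it as two non-wrapping pieces. Assumes the RandomPartitioner
// token space of [0, 2**127].
static List<BigInteger[]> unwrap(BigInteger start, BigInteger end)
{
    BigInteger ringMax = BigInteger.ONE.shiftLeft(127);
    List<BigInteger[]> pieces = new ArrayList<BigInteger[]>();
    if (start.compareTo(end) < 0)
    {
        pieces.add(new BigInteger[] { start, end });            // no wrap
    }
    else
    {
        pieces.add(new BigInteger[] { start, ringMax });        // to the top
        pieces.add(new BigInteger[] { BigInteger.ZERO, end });  // back around
    }
    return pieces;
}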

Thanks,
bnc

--------------------------------------------------------------------------------

Scanning range 0 ( 34571752641348786448680284622901156834, 
34571752641348786448680284622901156834 ]
Scanning chunk ( 34571752641348786448680284622901156834, 
34571752641348786448680284622901156834 ] in range 0
Read 20 rows
Read row 0, token 336932469034906281211924193433194809371, key 0_my_key62
Read row 1, token 5919946189209861803345840641668714978, key G_my_key16
Read row 2, token 6676056754427192599913432294390467082, key N_my_key85
Read row 3, token 330974738873996707017206868970060026330, key 6_my_key6
Read row 4, token 9595097897929687061907189837471352784, key E_my_key14
Read row 5, token 16575788966172751729835323651471549632, key a_my_key98
Read row 6, token 20927090112620661198733690835293074593, key 5_my_key67
Read row 7, token 28411545431179372696834683157677733478, key B_my_key73
Read row 8, token 29636277939148773659952116897998650776, key Q_my_key26
Read row 9, token 31186550159320208451777665196866508345, key j_my_key45
Read row 10, token 309081729348188654502493750295907191249, key D_my_key75
Read row 11, token 308480936859450293438865473928962136114, key W_my_key32
Read row 12, token 33060929359846763792204741553927689627, key Q_my_key88
Read row 13, token 36834373239213294576855495985365240744, key D_my_key13
Read row 14, token 302818545694924710056493830778421143168, key C_my_key12
Read row 15, token 39723252966237722984897584840501933181, key I_my_key18
Read row 16, token 297899763604776667052026292305780186395, key 2_my_key2
Read row 17, token 45994786947573748381278100108617428931, key U_my_key92
Read row 18, token 294076607175826631726358986726954934589, key T_my_key29
Read row 19, token 292996472659622939455744264432842142924, key M_my_key84
Scanning chunk ( 292996472659622939455744264432842142924, 
34571752641348786448680284622901156834 ] in range 0
Read 13 rows
Read row 20, token 336932469034906281211924193433194809371, key 0_my_key62
Read row 21, token 5919946189209861803345840641668714978, key G_my_key16
Read row 22, token 6676056754427192599913432294390467082, key N_my_key85
Read row 23, token 330974738873996707017206868970060026330, key 6_my_key6
Read row 24, token 9595097897929687061907189837471352784, key E_my_key14
Read row 25, token 16575788966172751729835323651471549632, key a_my_key98
Read row 26, token 20927090112620661198733690835293074593, key 5_my_key67
Read row 27, token 28411545431179372696834683157677733478, key B_my_key73
Read row 28, token 29636277939148773659952116897998650776, key Q_my_key26
Read row 29, token 31186550159320208451777665196866508345, key j_my_key45
Read row 30, token 309081729348188654502493750295907191249, key D_my_key75
Read row 31, token 308480936859450293438865473928962136114, key W_my_key32
Read row 32, token 33060929359846763792204741553927689627, key Q_my_key88
Scanning chunk ( 33060929359846763792204741553927689627, 
34571752641348786448680284622901156834 ] in range 0
Read 0 rows

--------------------------------------------------------------------------------

Re: Reading all rows in a column family in parallel

Posted by Thomas Heller <in...@zilence.net>.
Hey,

> .... Is
> this possible in 0.6.0? (Note: for the next startToken, I was just planning
> on computing the MD5 digest of the last key directly since I'm accessing
> Cassandra through Thrift.)

Can't speak for 0.6.0 but it works for 0.6.3.

Just implemented this in Ruby (minus the parallel part).

Cheers,
/thomas