Posted to user@cassandra.apache.org by Tharindu Mathew <mc...@gmail.com> on 2011/09/24 09:34:55 UTC

Re: Efficient way of figuring out which nodes a set of keys belong to - Hadoop integration

Would really appreciate any help on this.

On Thu, Sep 22, 2011 at 11:34 PM, Tharindu Mathew <mc...@gmail.com> wrote:

> Hi,
>
> I managed to modify the Hadoop-Cassandra integration to start with a column
> of a CF used for indexing. In the map phase, I get keys from different CFs
> and get the row I need. So this all works fine, for a single node. :)
>
> I'd like to effectively identify a set of nodes for a set of rows and get
> them efficiently into Hadoop. So my initial design was something like this.
>
> Have a new operation in the thrift interface that allows us to do,
>
> Map<(CF+key), List<endpoints>> client.get_endpoints ( List<CF+keys>)
>
> Functionality would be similar to nodetool#getEndpoints.
>
> Then, during processing, we can look up the endpoints for each CF and key
> from this result, without querying a node for each and every key. Only if
> a key is not found at its endpoint (because a node was added or displaced
> during processing) do we calculate the relevant endpoint again.
>
> I'd like to ask the Cassandra devs whether this sounds like the best way
> to do this, or whether there are improvements to make or flaws in my
> approach.
>
> Thanks in advance.
>
> --
> Regards,
>
> Tharindu
>
> blog: http://mackiemathew.com/
>
>
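
As an illustration of the batched lookup proposed above, here is a minimal
sketch over a toy token ring. Everything in it is hypothetical: the class
EndpointLookupSketch, the token() stand-in for the partitioner, and the
simple "walk clockwise" replica placement are assumptions made for the
example, not Cassandra or Thrift APIs. It only shows the shape of a
get_endpoints-style call that resolves many keys at once and then groups
them by node.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: none of these names are Cassandra or Thrift APIs.
class EndpointLookupSketch {
    // Hypothetical ring: token -> node address, kept sorted by token.
    private final TreeMap<Integer, String> ring = new TreeMap<>();
    private final int replicationFactor;

    EndpointLookupSketch(Map<Integer, String> tokenToNode, int replicationFactor) {
        this.ring.putAll(tokenToNode);
        this.replicationFactor = replicationFactor;
    }

    // Stand-in for the partitioner: maps a key to a token in [0, 100).
    static int token(String key) {
        return Math.floorMod(key.hashCode(), 100);
    }

    // Replicas for one key: the first node at or after the key's token,
    // then the following nodes clockwise, wrapping around the ring.
    List<String> endpointsFor(String key) {
        List<String> replicas = new ArrayList<>();
        if (ring.isEmpty()) {
            return replicas;
        }
        Integer start = ring.ceilingKey(token(key));
        if (start == null) {
            start = ring.firstKey(); // past the last token: wrap around
        }
        List<String> clockwise = new ArrayList<>(ring.tailMap(start, true).values());
        clockwise.addAll(ring.headMap(start, false).values());
        for (String node : clockwise) {
            if (replicas.size() == replicationFactor) {
                break;
            }
            if (!replicas.contains(node)) {
                replicas.add(node);
            }
        }
        return replicas;
    }

    // Batched form of the proposed call: one request resolves many keys.
    Map<String, List<String>> getEndpoints(List<String> keys) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String key : keys) {
            result.put(key, endpointsFor(key));
        }
        return result;
    }

    // Group keys by their first replica so each group can be read from one node.
    Map<String, List<String>> groupByFirstReplica(List<String> keys) {
        Map<String, List<String>> byNode = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : getEndpoints(keys).entrySet()) {
            if (e.getValue().isEmpty()) {
                continue;
            }
            byNode.computeIfAbsent(e.getValue().get(0), n -> new ArrayList<>())
                  .add(e.getKey());
        }
        return byNode;
    }
}
```

A caller would build the grouping once per batch and only recompute the
endpoints for a key when a read at its cached endpoint misses, which is the
fallback described in the quoted message.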


-- 
Regards,

Tharindu

blog: http://mackiemathew.com/