Posted to dev@cassandra.apache.org by Tharindu Mathew <mc...@gmail.com> on 2011/09/22 20:04:01 UTC

Efficient way of figuring out which nodes a set of keys belong to - Hadoop integration

Hi,

I managed to modify the Hadoop-Cassandra integration to start from a column
of a CF used for indexing. In the map phase, I get keys from different CFs
and fetch the rows I need. This all works fine for a single node. :)

I'd like to identify the set of nodes that hold a given set of rows and get
those rows into Hadoop efficiently. My initial design is something like this.

Add a new operation to the Thrift interface that allows us to do:

Map<(CF+key), List<endpoints>> client.get_endpoints ( List<CF+keys>)

Functionality would be similar to nodetool's getendpoints command.
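
To make the shape of the proposed call concrete, here is a minimal sketch of a
batched key-to-endpoints lookup over a toy token ring. Everything in it is an
assumption for illustration: ToyRing, tokenFor and the "cf:key" string format
are hypothetical names, the hash is a stand-in for a real partitioner, each
node owns exactly one token, and only the row key part decides placement. It
is not Cassandra's Thrift API or its actual replica placement logic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: ToyRing is an illustrative name, not a Cassandra class.
final class ToyRing {
    // token -> node address, kept sorted to model the ring (one token per node)
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int replicationFactor;

    ToyRing(int replicationFactor) {
        this.replicationFactor = replicationFactor;
    }

    void addNode(long token, String address) {
        ring.put(token, address);
    }

    // Assumption: a stand-in hash over the row key, NOT a real partitioner.
    // The part before the first ':' (the CF name) is ignored for placement.
    static long tokenFor(String cfAndKey) {
        String rowKey = cfAndKey.substring(cfAndKey.indexOf(':') + 1);
        return Math.floorMod((long) rowKey.hashCode(), 100L);
    }

    // Walk clockwise from the first node token >= the key's token and
    // collect up to replicationFactor nodes, wrapping at the end of the ring.
    List<String> endpointsFor(String cfAndKey) {
        List<String> result = new ArrayList<>();
        if (ring.isEmpty()) {
            return result;
        }
        int wanted = Math.min(replicationFactor, ring.size());
        Long token = ring.ceilingKey(tokenFor(cfAndKey));
        if (token == null) {
            token = ring.firstKey();
        }
        while (result.size() < wanted) {
            result.add(ring.get(token));
            token = ring.higherKey(token);
            if (token == null) {
                token = ring.firstKey();
            }
        }
        return result;
    }

    // Batched form, mirroring the proposed get_endpoints(List<CF+keys>).
    Map<String, List<String>> getEndpoints(List<String> cfAndKeys) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        for (String k : cfAndKeys) {
            out.put(k, endpointsFor(k));
        }
        return out;
    }

    public static void main(String[] args) {
        ToyRing toy = new ToyRing(2);
        toy.addNode(0L, "10.0.0.1");
        toy.addNode(33L, "10.0.0.2");
        toy.addNode(66L, "10.0.0.3");
        System.out.println(toy.getEndpoints(Arrays.asList("users:alice", "events:42")));
    }
}
```

The point of the batched form is that one round trip returns the endpoints for
many keys, so the map phase does not need a separate lookup per key.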

Then, while processing, we can look up the endpoints for each CF and key
from this map, without querying a node for each and every key. Only if the
key is not found on the endpoint (due to a node being added or displaced
while processing) do we calculate the relevant endpoint again.
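
The lookup-then-fallback behaviour described above could be sketched roughly as
follows. This is only an illustration under assumptions: EndpointCache, lookup
and hasRow are hypothetical names, the lookup function stands in for whatever
call recomputes endpoints, and hasRow stands in for "the read against that
endpoint found the row".

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;
import java.util.function.Function;

// Hypothetical sketch: EndpointCache is an illustrative name, not a Cassandra class.
final class EndpointCache {
    private final Map<String, List<String>> cache = new HashMap<>();
    // Recomputes the endpoints for one "cf:key" (e.g. a call to the cluster).
    private final Function<String, List<String>> lookup;
    // (endpoint, cfAndKey) -> true if the row was found on that endpoint.
    private final BiPredicate<String, String> hasRow;
    int lookups = 0; // number of times we had to recompute endpoints

    EndpointCache(Map<String, List<String>> prefetched,
                  Function<String, List<String>> lookup,
                  BiPredicate<String, String> hasRow) {
        this.cache.putAll(prefetched);
        this.lookup = lookup;
        this.hasRow = hasRow;
    }

    // Returns an endpoint that holds the row, or null if none was found.
    String resolve(String cfAndKey) {
        String hit = firstHolding(cache.get(cfAndKey), cfAndKey);
        if (hit != null) {
            return hit;
        }
        // Entry is missing or stale (ring changed): recompute once and re-cache.
        lookups++;
        List<String> fresh = lookup.apply(cfAndKey);
        cache.put(cfAndKey, fresh);
        return firstHolding(fresh, cfAndKey);
    }

    private String firstHolding(List<String> endpoints, String cfAndKey) {
        if (endpoints == null) {
            return null;
        }
        for (String endpoint : endpoints) {
            if (hasRow.test(endpoint, cfAndKey)) {
                return endpoint;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, List<String>> prefetched = new HashMap<>();
        prefetched.put("users:alice", Arrays.asList("node-a")); // stale entry
        EndpointCache c = new EndpointCache(
            prefetched,
            key -> Arrays.asList("node-b"),          // the row now lives on node-b
            (endpoint, key) -> endpoint.equals("node-b"));
        System.out.println(c.resolve("users:alice") + ", lookups=" + c.lookups);
    }
}
```

With this shape, the common case costs no extra call, and a topology change
costs one recomputation for each affected key rather than one per read.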

I'd like to ask the Cassandra devs whether this sounds like the best way to
do this, or to point out any improvements or flaws in my approach.

Thanks in advance.

-- 
Regards,

Tharindu

blog: http://mackiemathew.com/

Re: Efficient way of figuring out which nodes a set of keys belong to - Hadoop integration

Posted by Tharindu Mathew <mc...@gmail.com>.
Would really appreciate any help on this.

On Thu, Sep 22, 2011 at 11:34 PM, Tharindu Mathew <mc...@gmail.com> wrote:

> Hi,
>
> I managed to modify the Hadoop-Cassandra integration to start with a column
> of a CF used for indexing. In the map phase, I get keys from different CFs
> and get the row I need. So this all works fine, for a single node. :)
>
> I'd like to effectively identify a set of nodes for a set of rows and get
> them efficiently into Hadoop. So my initial design was something like this.
>
> Have a new operation in the thrift interface that allows us to do,
>
> Map<(CF+key), List<endpoints>> client.get_endpoints ( List<CF+keys>)
>
> Functionality would be similar to node tools#getEndpoints.
>
> And, then when processing we can get the relevant endpoint relevant to each
> CF and key, through this without querying for node for each and every key.
> If the key is not found in the endpoint (due to node been added/ displaced
> while processing), only then we calculate the relevant end point again.
>
> I'd like to ask from the cassandra devs whether this method sounds the best
> way to do this or to point out any improvements/ flaws in the way I'm
> approaching this?
>
> Thanks in advance.
>
> --
> Regards,
>
> Tharindu
>
> blog: http://mackiemathew.com/
>
>


-- 
Regards,

Tharindu

blog: http://mackiemathew.com/

Re: Efficient way of figuring out which nodes a set of keys belong to - Hadoop integration

Posted by Tharindu Mathew <mc...@gmail.com>.
No, Mick, that's not exactly what I'm trying to do.

I want to start from a wide row with a very large number of columns and
build the Hadoop integration on that.

So basically, in the map phase we'll be dealing with multiple CFs, and
hence I want to solve the problem of identifying which Cassandra node each
key/CF belongs to.

On Mon, Sep 26, 2011 at 12:37 AM, Mick Semb Wever <mc...@apache.org> wrote:

> On Thu, 2011-09-22 at 23:34 +0530, Tharindu Mathew wrote:
> > I managed to modify the Hadoop-Cassandra integration to start with a
> > column of a CF used for indexing.
>
> Are you chasing CASSANDRA-2878 here?
> The above issue is waiting on CASSANDRA-1600, which in turn is waiting on
> CASSANDRA-1034.
>
> ~mck
>
> --
> "Although the Buddhists will tell you that desire is the root of
> suffering, my personal experience leads me to point the finger at system
> administration." Philip Greenspun
>
> | http://semb.wever.org | http://sesat.no |
> | http://tech.finn.no   | Java XSS Filter |
>



-- 
Regards,

Tharindu

blog: http://mackiemathew.com/

Re: Efficient way of figuring out which nodes a set of keys belong to - Hadoop integration

Posted by Mick Semb Wever <mc...@apache.org>.
On Thu, 2011-09-22 at 23:34 +0530, Tharindu Mathew wrote:
> I managed to modify the Hadoop-Cassandra integration to start with a
> column of a CF used for indexing. 

Are you chasing CASSANDRA-2878 here?
The above issue is waiting on CASSANDRA-1600, which in turn is waiting on
CASSANDRA-1034.

~mck

-- 
"Although the Buddhists will tell you that desire is the root of
suffering, my personal experience leads me to point the finger at system
administration." Philip Greenspun 

| http://semb.wever.org | http://sesat.no |
| http://tech.finn.no   | Java XSS Filter |