Posted to user@hbase.apache.org by Naama Kraus <na...@gmail.com> on 2008/07/14 06:59:49 UTC

Moving HBase MR computation close to data

Hi,

I've peeked at the HBase code, specifically TableSplit#getLocations(). I
noticed that the method returns a random node for now. I was trying to think
what should be returned if one wishes to have computation close to data. As a
table split is per region, I could think of returning the node managing that
region (the region server). That would get computation close to data at the
HBase level, but not necessarily at the file system level. As HStoreFiles are
stored in HDFS, their actual location could be on a remote node, and not the
region server node.
Can anyone comment on my line of thinking? Am I wrong somewhere?
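
To make the first option concrete, here is a minimal sketch of a split that
reports the region server host as its preferred location. It only mirrors the
shape of Hadoop's InputSplit#getLocations(), which returns host names as
scheduling hints. The class RegionSplitSketch, its fields, and the host name
are hypothetical and are not HBase's actual TableSplit code.

```java
// Hypothetical stand-in for a table split; NOT the real HBase TableSplit.
final class RegionSplitSketch {
    private final String startRow;
    private final String endRow;
    // Assumption: the host of the serving region server is known
    // at the time the split is created.
    private final String regionServerHost;

    RegionSplitSketch(String startRow, String endRow, String regionServerHost) {
        this.startRow = startRow;
        this.endRow = endRow;
        this.regionServerHost = regionServerHost;
    }

    String getStartRow() { return startRow; }

    String getEndRow() { return endRow; }

    // Locality hint: the single host that serves this region.
    // This is "close to data" at the HBase level only; the HDFS blocks
    // behind the region may still live on other nodes.
    String[] getLocations() {
        return new String[] { regionServerHost };
    }

    public static void main(String[] args) {
        RegionSplitSketch split =
            new RegionSplitSketch("row-a", "row-m", "rs1.example.com");
        System.out.println(String.join(",", split.getLocations()));
    }
}
```

A scheduler that honors such hints would try to run the map task on that
host, but nothing here guarantees that the underlying store file blocks are
local to it.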

To sum up, I'd like to understand whether there is any notion of bringing
computation close to data when working with HBase. If so, are there any
plans to implement it in future releases? Or is the current implementation
good enough (if so, can you explain why)?

Thanks for any input, Naama

-- 
oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo
00 oo 00 oo
"If you want your children to be intelligent, read them fairy tales. If you
want them to be more intelligent, read them more fairy tales." (Albert
Einstein)

Re: Moving HBase MR computation close to data

Posted by Billy Pearson <sa...@pearsonwholesale.com>.

I opened a ticket for 3.0 for the same issue:
https://issues.apache.org/jira/browse/HBASE-675

Vote for it if you think we need it.

Billy

Re: Moving HBase MR computation close to data

Posted by Naama Kraus <na...@gmail.com>.

Well, both questions are around the same topic (sorry for bugging you about
it :-) though I am not sure they are identical.

What I was trying to understand in this post: in a future implementation of
TableSplit#getLocations(), will it return the HRegion server node of the
split, or the nodes where the blocks of the HStoreFiles (composing the split)
are actually located? (Or none of the above ...)
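
For comparison, the second option could be sketched as ranking hosts by how
many bytes of the split's store file blocks they hold. Everything below is an
assumption for illustration: the Block type, the topHosts helper, and the
idea that block sizes and replica hosts have already been fetched from the
file system. It is not how any HBase release computes locations.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class BlockLocalitySketch {
    // Hypothetical description of one store file block:
    // its length in bytes and the hosts holding a replica.
    static final class Block {
        final long length;
        final List<String> hosts;

        Block(long length, List<String> hosts) {
            this.length = length;
            this.hosts = hosts;
        }
    }

    // Returns up to `limit` hosts, ordered by the number of block bytes
    // they store (descending); ties are broken by host name.
    static List<String> topHosts(List<Block> blocks, int limit) {
        Map<String, Long> bytesPerHost = new HashMap<>();
        for (Block block : blocks) {
            for (String host : block.hosts) {
                bytesPerHost.merge(host, block.length, Long::sum);
            }
        }
        List<String> hosts = new ArrayList<>(bytesPerHost.keySet());
        hosts.sort((h1, h2) -> {
            int byBytes = Long.compare(bytesPerHost.get(h2), bytesPerHost.get(h1));
            return byBytes != 0 ? byBytes : h1.compareTo(h2);
        });
        return new ArrayList<>(hosts.subList(0, Math.min(limit, hosts.size())));
    }
}
```

The trade-off between the two options is that the region server host is cheap
to look up, while block-level ranking reflects where the bytes really are but
needs extra file system metadata calls per split.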

Naama


Re: Moving HBase MR computation close to data

Posted by Jean-Daniel Cryans <jd...@gmail.com>.

Naama,

I think that Jim already answered your questions in this thread:
http://markmail.org/message/jdf2mfg3g2tsiswe

J-D
