Posted to user@hbase.apache.org by "Biedermann,S.,Fa. Post Direkt" <S....@postdirekt.de> on 2011/04/08 13:18:23 UTC

data locality for reducer writes?

Hi,

 

we have a number of Reducer tasks, each writing a bunch of rows into the
latest HBase via Puts.

What is working is that each Reducer only creates Puts for one single
Region by using HRegionPartitioner.
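For reference, the job wiring described here can be sketched roughly as follows. The table and reducer names are placeholders, not from this thread, and this assumes the HBase 0.90-era TableMapReduceUtil API:

```java
// Sketch only: job setup for reducers that write Puts into HBase, with
// HRegionPartitioner so each reducer's output targets a single region.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.mapreduce.HRegionPartitioner;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.mapreduce.Job;

Configuration conf = HBaseConfiguration.create();
Job job = new Job(conf, "puts-into-hbase");
// initTableReducerJob wires up TableOutputFormat and, when a partitioner
// class is given, partitions reducer input along the table's region
// boundaries, so each reducer only sees rows for one region.
TableMapReduceUtil.initTableReducerJob(
    "my_table",              // placeholder table name
    MyPutReducer.class,      // placeholder reducer that emits Put objects
    job,
    HRegionPartitioner.class);
```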

 

However, we are seeing that the Region flush itself is not local, but
going to some other node in the cluster. This puts load on the network.

We'd like to see that instead the Reducer would be run on the same node
where the region is served.

 

Is that possible?

Any ideas or suggestions?

 

Sven


AW: data locality for reducer writes?

Posted by "Biedermann,S.,Fa. Post Direkt" <S....@postdirekt.de>.
Many thanks for this explanation of the data flow.

I'll have a closer look at the incremental bulk load.



-----Original Message-----
From: jdcryans@gmail.com [mailto:jdcryans@gmail.com] On behalf of Jean-Daniel Cryans
Sent: Wednesday, April 13, 2011 19:10
To: user@hbase.apache.org
Subject: Re: data locality for reducer writes?

It's not just a matter of transferring the data from the reducer to
the region server, you have to take into account that that data is
also replicated to other nodes.

So in a suboptimal setup you have:

Reducer  -> Network -> RegionServer -> Local Datanode -> Network ->
Remote Datanode1 -> Network -> Remote Datanode2

What you are trying to get is:

Reducer  -> Local RegionServer -> Local Datanode -> Network -> Remote
Datanode1 -> Network -> Remote Datanode2

Subsequent flushes of the inserted data will also follow the latter
pattern. That's what I meant earlier when I said the gain would be
marginal, you're only saving one network trip among many others. Also
I took a look at the JobTracker code and modifying it doesn't look so
easy.

Instead, since you already use the HRegionPartitioner, why don't you
do an incremental bulk load? http://hbase.apache.org/bulk-loads.html

J-D

On Wed, Apr 13, 2011 at 7:49 AM, Biedermann,S.,Fa. Post Direkt
<S....@postdirekt.de> wrote:
> Hi Jean-Daniel,
>
> thx for your reply.
>
> I assume that the total network load during reduce is O(n), with n the number of nodes in the cluster. We saw a major performance loss in the reduce step when our network accidentally degraded to 100 Mbit (1 h vs. 13 minutes).
>
> With more nodes I see 2 options:
>
> 1) using switches with a higher switching capacity
> 2) improve hbase/hadoop's assignment of reduce tasks to those nodes which serve the corresponding hbase regions.
>
> What do you think?
>
> Sven
>
> -----Original Message-----
> From: jdcryans@gmail.com [mailto:jdcryans@gmail.com] On behalf of Jean-Daniel Cryans
> Sent: Friday, April 8, 2011 18:04
> To: user@hbase.apache.org
> Subject: Re: data locality for reducer writes?
>
> Unfortunately it seems that there's nothing in the OutputFormat
> interface that we could implement (like getSplits in the InputFormat)
> to inform the JobTracker of the location of the regions. It kinda makes
> sense, since when you're writing to HDFS in a "normal" MR job you
> always write to the local DataNode (well if there's one), but even
> then it is replicated to two other nodes. IMO even if we had that the
> gain would be marginal.
>
> J-D
>
> On Fri, Apr 8, 2011 at 4:18 AM, Biedermann,S.,Fa. Post Direkt
> <S....@postdirekt.de> wrote:
>> Hi,
>>
>>
>>
>> we have a number of Reducer tasks, each writing a bunch of rows into the
>> latest HBase via Puts.
>>
>> What is working is that each Reducer only creates Puts for one single
>> Region by using HRegionPartitioner.
>>
>>
>>
>> However, we are seeing that the Region flush itself is not local, but
>> going to some other node in the cluster. This puts load on the network.
>>
>> We'd like to see that instead the Reducer would be run on the same node
>> where the region is served.
>>
>>
>>
>> Is that possible?
>>
>> Any ideas or suggestions?
>>
>>
>>
>> Sven
>>
>>
>

Re: data locality for reducer writes?

Posted by Jean-Daniel Cryans <jd...@apache.org>.
It's not just a matter of transferring the data from the reducer to
the region server, you have to take into account that that data is
also replicated to other nodes.

So in a suboptimal setup you have:

Reducer  -> Network -> RegionServer -> Local Datanode -> Network ->
Remote Datanode1 -> Network -> Remote Datanode2

What you are trying to get is:

Reducer  -> Local RegionServer -> Local Datanode -> Network -> Remote
Datanode1 -> Network -> Remote Datanode2

Subsequent flushes of the inserted data will also follow the latter
pattern. That's what I meant earlier when I said the gain would be
marginal, you're only saving one network trip among many others. Also
I took a look at the JobTracker code and modifying it doesn't look so
easy.

Instead, since you already use the HRegionPartitioner, why don't you
do an incremental bulk load? http://hbase.apache.org/bulk-loads.html
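For reference, the incremental bulk load J-D suggests can be sketched roughly like this. The table name and output path are placeholders, and the snippet assumes the HBase 0.90-era HFileOutputFormat API:

```java
// Sketch only: prepare HFiles with a MapReduce job, then adopt them into
// the table's regions, bypassing the Put/WAL write path entirely.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

Configuration conf = HBaseConfiguration.create();
Job job = new Job(conf, "prepare-hfiles");
HTable table = new HTable(conf, "my_table");   // placeholder table name
// Configures the reducer, a total-order partitioner over the table's
// current region boundaries, and HFileOutputFormat in one call.
HFileOutputFormat.configureIncrementalLoad(job, table);
FileOutputFormat.setOutputPath(job, new Path("/tmp/hfiles"));  // placeholder
// After the job completes, load the HFiles into the live table, e.g.:
//   hadoop jar hbase-VERSION.jar completebulkload /tmp/hfiles my_table
```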

J-D

On Wed, Apr 13, 2011 at 7:49 AM, Biedermann,S.,Fa. Post Direkt
<S....@postdirekt.de> wrote:
> Hi Jean-Daniel,
>
> thx for your reply.
>
> I assume that the total network load during reduce is O(n), with n the number of nodes in the cluster. We saw a major performance loss in the reduce step when our network accidentally degraded to 100 Mbit (1 h vs. 13 minutes).
>
> With more nodes I see 2 options:
>
> 1) using switches with a higher switching capacity
> 2) improve hbase/hadoop's assignment of reduce tasks to those nodes which serve the corresponding hbase regions.
>
> What do you think?
>
> Sven
>
> -----Original Message-----
> From: jdcryans@gmail.com [mailto:jdcryans@gmail.com] On behalf of Jean-Daniel Cryans
> Sent: Friday, April 8, 2011 18:04
> To: user@hbase.apache.org
> Subject: Re: data locality for reducer writes?
>
> Unfortunately it seems that there's nothing in the OutputFormat
> interface that we could implement (like getSplits in the InputFormat)
> to inform the JobTracker of the location of the regions. It kinda makes
> sense, since when you're writing to HDFS in a "normal" MR job you
> always write to the local DataNode (well if there's one), but even
> then it is replicated to two other nodes. IMO even if we had that the
> gain would be marginal.
>
> J-D
>
> On Fri, Apr 8, 2011 at 4:18 AM, Biedermann,S.,Fa. Post Direkt
> <S....@postdirekt.de> wrote:
>> Hi,
>>
>>
>>
>> we have a number of Reducer tasks, each writing a bunch of rows into the
>> latest HBase via Puts.
>>
>> What is working is that each Reducer only creates Puts for one single
>> Region by using HRegionPartitioner.
>>
>>
>>
>> However, we are seeing that the Region flush itself is not local, but
>> going to some other node in the cluster. This puts load on the network.
>>
>> We'd like to see that instead the Reducer would be run on the same node
>> where the region is served.
>>
>>
>>
>> Is that possible?
>>
>> Any ideas or suggestions?
>>
>>
>>
>> Sven
>>
>>
>

AW: data locality for reducer writes?

Posted by "Biedermann,S.,Fa. Post Direkt" <S....@postdirekt.de>.
Hi Jean-Daniel,

thx for your reply.

I assume that the total network load during reduce is O(n), with n the number of nodes in the cluster. We saw a major performance loss in the reduce step when our network accidentally degraded to 100 Mbit (1 h vs. 13 minutes).

With more nodes I see 2 options: 

1) using switches with a higher switching capacity 
2) improve hbase/hadoop's assignment of reduce tasks to those nodes which serve the corresponding hbase regions.

What do you think?

Sven

-----Original Message-----
From: jdcryans@gmail.com [mailto:jdcryans@gmail.com] On behalf of Jean-Daniel Cryans
Sent: Friday, April 8, 2011 18:04
To: user@hbase.apache.org
Subject: Re: data locality for reducer writes?

Unfortunately it seems that there's nothing in the OutputFormat
interface that we could implement (like getSplits in the InputFormat)
to inform the JobTracker of the location of the regions. It kinda makes
sense, since when you're writing to HDFS in a "normal" MR job you
always write to the local DataNode (well if there's one), but even
then it is replicated to two other nodes. IMO even if we had that the
gain would be marginal.

J-D

On Fri, Apr 8, 2011 at 4:18 AM, Biedermann,S.,Fa. Post Direkt
<S....@postdirekt.de> wrote:
> Hi,
>
>
>
> we have a number of Reducer tasks, each writing a bunch of rows into the
> latest HBase via Puts.
>
> What is working is that each Reducer only creates Puts for one single
> Region by using HRegionPartitioner.
>
>
>
> However, we are seeing that the Region flush itself is not local, but
> going to some other node in the cluster. This puts load on the network.
>
> We'd like to see that instead the Reducer would be run on the same node
> where the region is served.
>
>
>
> Is that possible?
>
> Any ideas or suggestions?
>
>
>
> Sven
>
>

Re: data locality for reducer writes?

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Unfortunately it seems that there's nothing in the OutputFormat
interface that we could implement (like getSplits in the InputFormat)
to inform the JobTracker of the location of the regions. It kinda makes
sense, since when you're writing to HDFS in a "normal" MR job you
always write to the local DataNode (well if there's one), but even
then it is replicated to two other nodes. IMO even if we had that the
gain would be marginal.
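For context, the asymmetry J-D describes is visible in the API: an input split for a table region can carry the hostname of the region server, which the JobTracker uses for map task placement, while OutputFormat offers no counterpart. A rough sketch, where the table name, row keys, and hostname are made-up values:

```java
// Sketch only: TableInputFormat hands the JobTracker one TableSplit per
// region, and getLocations() names the host serving that region.
import org.apache.hadoop.hbase.mapreduce.TableSplit;
import org.apache.hadoop.hbase.util.Bytes;

TableSplit split = new TableSplit(
    Bytes.toBytes("my_table"),        // placeholder table name
    Bytes.toBytes("row-aaa"),         // region start row (made up)
    Bytes.toBytes("row-mmm"),         // region end row (made up)
    "node3.example.com");             // host serving the region (made up)
// The JobTracker consults getLocations() to schedule the map task close to
// the data; OutputFormat has no such hook, hence no reducer locality.
String[] hosts = split.getLocations();
```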

J-D

On Fri, Apr 8, 2011 at 4:18 AM, Biedermann,S.,Fa. Post Direkt
<S....@postdirekt.de> wrote:
> Hi,
>
>
>
> we have a number of Reducer tasks, each writing a bunch of rows into the
> latest HBase via Puts.
>
> What is working is that each Reducer only creates Puts for one single
> Region by using HRegionPartitioner.
>
>
>
> However, we are seeing that the Region flush itself is not local, but
> going to some other node in the cluster. This puts load on the network.
>
> We'd like to see that instead the Reducer would be run on the same node
> where the region is served.
>
>
>
> Is that possible?
>
> Any ideas or suggestions?
>
>
>
> Sven
>
>

Re: data locality for reducer writes?

Posted by Iulia Zidaru <iu...@1and1.ro>.
  I'm not sure I understand your problem well. Is it possible for you to
use a combiner class (a reducer which runs locally, where the map is executed)?
Iulia

On 04/08/2011 02:18 PM, Biedermann,S.,Fa. Post Direkt wrote:
> Hi,
>
>
>
> we have a number of Reducer tasks, each writing a bunch of rows into the
> latest HBase via Puts.
>
> What is working is that each Reducer only creates Puts for one single
> Region by using HRegionPartitioner.
>
>
>
> However, we are seeing that the Region flush itself is not local, but
> going to some other node in the cluster. This puts load on the network.
>
> We'd like to see that instead the Reducer would be run on the same node
> where the region is served.
>
>
>
> Is that possible?
>
> Any ideas or suggestions?
>
>
>
> Sven
>