Posted to user@hbase.apache.org by Lior Schachter <li...@infolinks.com> on 2011/08/14 18:32:02 UTC

M/R vs hbase problem in production

Hi,

cluster details:
hbase 0.90.2. 10 machines. 1GB switch.

use-case
An M/R job that inserts about 10 million rows into HBase in the reducer, followed
by an M/R job that works with HDFS files.
When the maps of the first job finish and the maps of the second job start, a
region server crashes.
Please note that when running the 2 jobs separately they both finish
successfully.

From our monitoring we see that when the 2 jobs work together the network
load reaches our max bandwidth - 1GB.

In the region server log we see these exceptions:
a.
2011-08-14 18:37:36,263 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
Responder, call multi(org.apache.hadoop.hbase.client.MultiAction@491fb2f4)
from 10.11.87.73:33737: output error
2011-08-14 18:37:36,264 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 24 on 8041 caught: java.nio.channels.ClosedChannelException
        at
sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:133)
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
        at
org.apache.hadoop.hbase.ipc.HBaseServer.channelIO(HBaseServer.java:1387)
        at
org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1339)
        at
org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:727)
        at
org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:792)
        at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1083)

b.
2011-08-14 18:41:56,225 WARN org.apache.hadoop.hdfs.DFSClient:
DFSOutputStream ResponseProcessor exception  for block
blk_-8181634225601608891_579246java.io.EOFException
        at java.io.DataInputStream.readFully(DataInputStream.java:180)
        at java.io.DataInputStream.readLong(DataInputStream.java:399)
        at
org.apache.hadoop.hdfs.protocol.DataTransferProtocol$PipelineAck.readFields(DataTransferProtocol.java:122)
        at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2548)

c.
2011-08-14 18:42:02,960 WARN org.apache.hadoop.hdfs.DFSClient: Failed
recovery attempt #0 from primary datanode 10.11.87.72:50010
org.apache.hadoop.ipc.RemoteException:
org.apache.hadoop.ipc.RemoteException: java.io.IOException:
blk_-8181634225601608891_579246 is already commited, storedBlock == null.
        at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.nextGenerationStampForBlock(FSNamesystem.java:4877)
        at
org.apache.hadoop.hdfs.server.namenode.NameNode.nextGenerationStamp(NameNode.java:501)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:961)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:957)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:955)

        at org.apache.hadoop.ipc.Client.call(Client.java:740)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
        at $Proxy4.nextGenerationStamp(Unknown Source)
        at
org.apache.hadoop.hdfs.server.datanode.DataNode.syncBlock(DataNode.java:1577)
        at
org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:1551)
        at
org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:1617)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:961)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:957)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:955)

        at org.apache.hadoop.ipc.Client.call(Client.java:740)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
        at $Proxy9.recoverBlock(Unknown Source)
        at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2706)
        at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1500(DFSClient.java:2173)
        at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2372)

A few questions:
1. Can we configure Hadoop/HBase not to consume all network resources (e.g.,
specify an upper limit for map/reduce network load)?
2. Should we increase the timeout for open connections?
3. Can we assign different IPs for data transfer and for the region quorum
check protocol (ZooKeeper)?

Thanks,
Lior
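
Regarding question 1: stock Hadoop 0.20 / HBase 0.90 has no per-job network
throttle. The levers people usually reach for are lowering the per-TaskTracker
slot counts (mapred.tasktracker.map.tasks.maximum and
mapred.tasktracker.reduce.tasks.maximum in mapred-site.xml, which needs a
TaskTracker restart) and batching the client writes so the reducers send fewer,
larger RPCs. A minimal sketch of the batching side - table, family and
qualifier names are made up here, not taken from this thread:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class BatchedWriter {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "my_table");   // hypothetical table name
    table.setAutoFlush(false);                     // buffer puts client-side
    table.setWriteBufferSize(8 * 1024 * 1024);     // flush in ~8 MB batches
    for (int i = 0; i < 10; i++) {                 // stand-in for the reducer loop
      Put put = new Put(Bytes.toBytes("row-" + i));
      put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes(i));
      table.put(put);                              // queued in the write buffer
    }
    table.flushCommits();                          // push anything still buffered
    table.close();
  }
}

The same batching can be done inside a reducer by keeping the HTable open in
setup() and flushing it in cleanup().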

Re: M/R vs hbase problem in production

Posted by Lior Schachter <li...@infolinks.com>.
And in the gc.log of the region server we see CMS concurrent mode failures
that trigger full GCs (which fail to free memory):

11867.254: [Full GC 11867.254: [CMS: 3712638K->3712638K(3712640K), 4.7779250
secs] 4032614K->4032392K(4057664K), [CMS Perm : 20062K->19883K(33548K)]
icms_dc=100 , 4.7780440 secs] [Times: user=4.76 sys=0.02, real=4.78 secs]
11872.033: [GC [1 CMS-initial-mark: 3712638K(3712640K)] 4032392K(4057664K),
0.0734520 secs] [Times: user=0.07 sys=0.00, real=0.07 secs]
11872.107: [CMS-concurrent-mark-start]
11872.107: [Full GC 11872.107: [CMS11872.693: [CMS-concurrent-mark:
0.584/0.586 secs] [Times: user=2.92 sys=0.00, real=0.59 secs]
 (concurrent mode failure): 3712638K->3712638K(3712640K), 5.3078630 secs]
4032392K->4032392K(4057664K), [CMS Perm : 19883K->19883K(33548K)]
icms_dc=100 , 5.3079940 secs] [Times: user=7.63 sys=0.00, real=5.31 secs]
11877.415: [Full GC 11877.415: [CMS: 3712638K->3712638K(3712640K), 4.6467720
secs] 4032392K->4032392K(4057664K), [CMS Perm : 19883K->19883K(33548K)]
icms_dc=100 , 4.6468910 secs] [Times: user=4.65 sys=0.00, real=4.65 secs]
11882.063: [GC [1 CMS-initial-mark: 3712638K(3712640K)] 4032402K(4057664K),
0.0730580 secs] [Times: user=0.07 sys=0.00, real=0.07 secs]
11882.136: [CMS-concurrent-mark-start]
11882.300: [Full GC 11882.300: [CMS11882.784: [CMS-concurrent-mark:
0.628/0.648 secs] [Times: user=3.79 sys=0.12, real=0.65 secs]
 (concurrent mode failure): 3712638K->3712639K(3712640K), 7.2815000 secs]
4057662K->4044438K(4057664K), [CMS Perm : 20001K->20000K(33548K)]
icms_dc=100 , 7.2816440 secs] [Times: user=9.19 sys=0.01, real=7.28 secs]
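
In this log the old generation is essentially full (3712638K of 3712640K), so
CMS has nothing left to reclaim and every cycle degenerates into a
stop-the-world full GC; the icms_dc entries also show incremental CMS is on,
which is often worth reconsidering on server-class hardware. The usual
responses are a larger RegionServer heap (if the slaves can spare it), starting
CMS earlier, and enabling the MemStore-local allocation buffers shipped in
0.90.1+ (hbase.hregion.memstore.mslab.enabled in hbase-site.xml, off by
default) to reduce old-gen fragmentation. A hedged hbase-env.sh sketch - the
numbers are illustrative assumptions, not figures from this thread:

# hbase-env.sh (illustrative values; HBASE_HEAPSIZE is in MB)
export HBASE_HEAPSIZE=8000
export HBASE_OPTS="$HBASE_OPTS -XX:+UseConcMarkSweepGC \
  -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly"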




On Sun, Aug 14, 2011 at 7:32 PM, Lior Schachter <li...@infolinks.com> wrote:

> Hi,
>
> cluster details:
> hbase 0.90.2. 10 machines. 1GB switch.
>
> use-case
> M/R job that inserts about 10 million rows to hbase in the reducer,
> followed by M/R that works with hdfs files.
> When the first job maps finish the second job maps starts and region server
> crushes.
> please note, that when running the 2 jobs separately they both finish
> successfully.
>
> From our monitoring we see that when the 2 jobs work together the network
> load reaches to our max bandwidth - 1GB.
>
> In the region server log we see these exceptions:
> a.
> 2011-08-14 18:37:36,263 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
> Responder, call multi(org.apache.hadoop.hbase.client.MultiAction@491fb2f4)
> from 10.11.87.73:33737: output error
> 2011-08-14 18:37:36,264 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 24 on 8041 caught: java.nio.channels.ClosedChannelException
>         at
> sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:133)
>         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
>         at
> org.apache.hadoop.hbase.ipc.HBaseServer.channelIO(HBaseServer.java:1387)
>         at
> org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1339)
>         at
> org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:727)
>         at
> org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:792)
>         at
> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1083)
>
> b.
> 2011-08-14 18:41:56,225 WARN org.apache.hadoop.hdfs.DFSClient:
> DFSOutputStream ResponseProcessor exception  for block
> blk_-8181634225601608891_579246java.io.EOFException
>         at java.io.DataInputStream.readFully(DataInputStream.java:180)
>         at java.io.DataInputStream.readLong(DataInputStream.java:399)
>         at
> org.apache.hadoop.hdfs.protocol.DataTransferProtocol$PipelineAck.readFields(DataTransferProtocol.java:122)
>         at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2548)
>
> c.
> 2011-08-14 18:42:02,960 WARN org.apache.hadoop.hdfs.DFSClient: Failed
> recovery attempt #0 from primary datanode 10.11.87.72:50010
> org.apache.hadoop.ipc.RemoteException:
> org.apache.hadoop.ipc.RemoteException: java.io.IOException:
> blk_-8181634225601608891_579246 is already commited, storedBlock == null.
>         at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.nextGenerationStampForBlock(FSNamesystem.java:4877)
>         at
> org.apache.hadoop.hdfs.server.namenode.NameNode.nextGenerationStamp(NameNode.java:501)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:961)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:957)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:955)
>
>         at org.apache.hadoop.ipc.Client.call(Client.java:740)
>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
>         at $Proxy4.nextGenerationStamp(Unknown Source)
>         at
> org.apache.hadoop.hdfs.server.datanode.DataNode.syncBlock(DataNode.java:1577)
>         at
> org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:1551)
>         at
> org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:1617)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:961)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:957)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:955)
>
>         at org.apache.hadoop.ipc.Client.call(Client.java:740)
>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
>         at $Proxy9.recoverBlock(Unknown Source)
>         at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2706)
>         at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1500(DFSClient.java:2173)
>         at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2372)
>
> Few questions:
> 1. Can we configure hadoop/hbase not to consume all network resources
> (e.g., to specify upper limit for map/reduce network load)?
> 2. Should we increase the timeout for open connections ?
> 3. Can we assign different IPs for data transfer and region quorum check
> protocol (zookeeper) ?
>
> Thanks,
> Lior
>
>
>
>

Re: M/R vs hbase problem in production

Posted by Oleg Ruchovets <or...@gmail.com>.
On Tue, Aug 16, 2011 at 5:50 AM, Michael Segel <mi...@hotmail.com> wrote:

>
> It could be that its the results from the reducer.
>
Yes - the end result of the M/R job is what we persist to HBase.


> My guess is that he's got an issue where he's over extending his system.
> Sounds like a tuning issue.
>
> How much memory on the system?
>

We have a 10-machine grid:
  the master has 48G RAM
  each slave has 16G RAM.


> What's being used by HBase?
>
The RegionServer process has 4G RAM.
The ZooKeeper process has 2G RAM.


> How many reducers, How many mappers?
>

We have 4 mappers / 2 reducers per machine.


> How large is the cache on DN, and how much cache does each job have
> allocated?
>
>
I am not sure I understand the DN cache correctly. Where can I see this
parameter?
If you mean the DataNode Java process, it has 1G RAM.

> That's the first place to look.
>
>
Thanks in advance,
Oleg
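
For what it's worth, a rough per-slave memory budget from the figures above
(the per-task heap is an assumption - the stock mapred.child.java.opts default
is only 200 MB, but it is commonly raised to around 1 GB):

  RegionServer                      4 GB
  ZooKeeper                         2 GB
  DataNode                          1 GB
  6 task JVMs (4 map + 2 reduce)   ~6 GB   (assuming ~1 GB per child JVM)
  ---------------------------------------
  total                           ~13 GB   of 16 GB, before the TaskTracker
                                            and the OS page cache

That leaves little headroom, which fits the "over-extending the system" guess
quoted above.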

>
> > From: buttler1@llnl.gov
> > To: user@hbase.apache.org
> > Date: Mon, 15 Aug 2011 13:20:30 -0700
> > Subject: RE: M/R vs hbase problem in production
> >
> > Are you sure you need to use a reducer to put rows into hbase?  You can
> save a lot of time if you can put the rows into hbase directly in the
> mappers.
> >
> > Dave
> >
> > -----Original Message-----
> > From: Lior Schachter [mailto:liors@infolinks.com]
> > Sent: Sunday, August 14, 2011 9:32 AM
> > To: user@hbase.apache.org; mapreduce-user@hadoop.apache.org
> > Subject: M/R vs hbase problem in production
> >
> > Hi,
> >
> > cluster details:
> > hbase 0.90.2. 10 machines. 1GB switch.
> >
> > use-case
> > M/R job that inserts about 10 million rows to hbase in the reducer,
> followed
> > by M/R that works with hdfs files.
> > When the first job maps finish the second job maps starts and region
> server
> > crushes.
> > please note, that when running the 2 jobs separately they both finish
> > successfully.
> >
> > From our monitoring we see that when the 2 jobs work together the network
> > load reaches to our max bandwidth - 1GB.
> >
> > In the region server log we see these exceptions:
> > a.
> > 2011-08-14 18:37:36,263 WARN org.apache.hadoop.ipc.HBaseServer: IPC
> Server
> > Responder, call multi(org.apache.hadoop.hbase.client.MultiAction@491fb2f4
> )
> > from 10.11.87.73:33737: output error
> > 2011-08-14 18:37:36,264 WARN org.apache.hadoop.ipc.HBaseServer: IPC
> Server
> > handler 24 on 8041 caught: java.nio.channels.ClosedChannelException
> >         at
> > sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:133)
> >         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
> >         at
> > org.apache.hadoop.hbase.ipc.HBaseServer.channelIO(HBaseServer.java:1387)
> >         at
> >
> org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1339)
> >         at
> >
> org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:727)
> >         at
> >
> org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:792)
> >         at
> >
> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1083)
> >
> > b.
> > 2011-08-14 18:41:56,225 WARN org.apache.hadoop.hdfs.DFSClient:
> > DFSOutputStream ResponseProcessor exception  for block
> > blk_-8181634225601608891_579246java.io.EOFException
> >         at java.io.DataInputStream.readFully(DataInputStream.java:180)
> >         at java.io.DataInputStream.readLong(DataInputStream.java:399)
> >         at
> >
> org.apache.hadoop.hdfs.protocol.DataTransferProtocol$PipelineAck.readFields(DataTransferProtocol.java:122)
> >         at
> >
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2548)
> >
> > c.
> > 2011-08-14 18:42:02,960 WARN org.apache.hadoop.hdfs.DFSClient: Failed
> > recovery attempt #0 from primary datanode 10.11.87.72:50010
> > org.apache.hadoop.ipc.RemoteException:
> > org.apache.hadoop.ipc.RemoteException: java.io.IOException:
> > blk_-8181634225601608891_579246 is already commited, storedBlock == null.
> >         at
> >
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.nextGenerationStampForBlock(FSNamesystem.java:4877)
> >         at
> >
> org.apache.hadoop.hdfs.server.namenode.NameNode.nextGenerationStamp(NameNode.java:501)
> >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >         at
> >
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >         at
> >
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >         at java.lang.reflect.Method.invoke(Method.java:597)
> >         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
> >         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:961)
> >         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:957)
> >         at java.security.AccessController.doPrivileged(Native Method)
> >         at javax.security.auth.Subject.doAs(Subject.java:396)
> >         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:955)
> >
> >         at org.apache.hadoop.ipc.Client.call(Client.java:740)
> >         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
> >         at $Proxy4.nextGenerationStamp(Unknown Source)
> >         at
> >
> org.apache.hadoop.hdfs.server.datanode.DataNode.syncBlock(DataNode.java:1577)
> >         at
> >
> org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:1551)
> >         at
> >
> org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:1617)
> >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >         at
> >
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >         at
> >
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >         at java.lang.reflect.Method.invoke(Method.java:597)
> >         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
> >         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:961)
> >         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:957)
> >         at java.security.AccessController.doPrivileged(Native Method)
> >         at javax.security.auth.Subject.doAs(Subject.java:396)
> >         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:955)
> >
> >         at org.apache.hadoop.ipc.Client.call(Client.java:740)
> >         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
> >         at $Proxy9.recoverBlock(Unknown Source)
> >         at
> >
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2706)
> >         at
> >
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1500(DFSClient.java:2173)
> >         at
> >
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2372)
> >
> > Few questions:
> > 1. Can we configure hadoop/hbase not to consume all network resources
> (e.g.,
> > to specify upper limit for map/reduce network load)?
> > 2. Should we increase the timeout for open connections ?
> > 3. Can we assign different IPs for data transfer and region quorum check
> > protocol (zookeeper) ?
> >
> > Thanks,
> > Lior
>
>

RE: M/R vs hbase problem in production

Posted by Michael Segel <mi...@hotmail.com>.
It could be that it's the results from the reducer.

My guess is that he's got an issue where he's over-extending his system.
Sounds like a tuning issue.

How much memory is on each system?
What's being used by HBase?
How many reducers, and how many mappers?
How large is the cache on the DN, and how much cache does each job have allocated?

That's the first place to look.


> From: buttler1@llnl.gov
> To: user@hbase.apache.org
> Date: Mon, 15 Aug 2011 13:20:30 -0700
> Subject: RE: M/R vs hbase problem in production
> 
> Are you sure you need to use a reducer to put rows into hbase?  You can save a lot of time if you can put the rows into hbase directly in the mappers.
> 
> Dave
> 
> -----Original Message-----
> From: Lior Schachter [mailto:liors@infolinks.com] 
> Sent: Sunday, August 14, 2011 9:32 AM
> To: user@hbase.apache.org; mapreduce-user@hadoop.apache.org
> Subject: M/R vs hbase problem in production
> 
> Hi,
> 
> cluster details:
> hbase 0.90.2. 10 machines. 1GB switch.
> 
> use-case
> M/R job that inserts about 10 million rows to hbase in the reducer, followed
> by M/R that works with hdfs files.
> When the first job maps finish the second job maps starts and region server
> crushes.
> please note, that when running the 2 jobs separately they both finish
> successfully.
> 
> From our monitoring we see that when the 2 jobs work together the network
> load reaches to our max bandwidth - 1GB.
> 
> In the region server log we see these exceptions:
> a.
> 2011-08-14 18:37:36,263 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
> Responder, call multi(org.apache.hadoop.hbase.client.MultiAction@491fb2f4)
> from 10.11.87.73:33737: output error
> 2011-08-14 18:37:36,264 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 24 on 8041 caught: java.nio.channels.ClosedChannelException
>         at
> sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:133)
>         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
>         at
> org.apache.hadoop.hbase.ipc.HBaseServer.channelIO(HBaseServer.java:1387)
>         at
> org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1339)
>         at
> org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:727)
>         at
> org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:792)
>         at
> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1083)
> 
> b.
> 2011-08-14 18:41:56,225 WARN org.apache.hadoop.hdfs.DFSClient:
> DFSOutputStream ResponseProcessor exception  for block
> blk_-8181634225601608891_579246java.io.EOFException
>         at java.io.DataInputStream.readFully(DataInputStream.java:180)
>         at java.io.DataInputStream.readLong(DataInputStream.java:399)
>         at
> org.apache.hadoop.hdfs.protocol.DataTransferProtocol$PipelineAck.readFields(DataTransferProtocol.java:122)
>         at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2548)
> 
> c.
> 2011-08-14 18:42:02,960 WARN org.apache.hadoop.hdfs.DFSClient: Failed
> recovery attempt #0 from primary datanode 10.11.87.72:50010
> org.apache.hadoop.ipc.RemoteException:
> org.apache.hadoop.ipc.RemoteException: java.io.IOException:
> blk_-8181634225601608891_579246 is already commited, storedBlock == null.
>         at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.nextGenerationStampForBlock(FSNamesystem.java:4877)
>         at
> org.apache.hadoop.hdfs.server.namenode.NameNode.nextGenerationStamp(NameNode.java:501)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:961)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:957)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:955)
> 
>         at org.apache.hadoop.ipc.Client.call(Client.java:740)
>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
>         at $Proxy4.nextGenerationStamp(Unknown Source)
>         at
> org.apache.hadoop.hdfs.server.datanode.DataNode.syncBlock(DataNode.java:1577)
>         at
> org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:1551)
>         at
> org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:1617)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:961)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:957)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:955)
> 
>         at org.apache.hadoop.ipc.Client.call(Client.java:740)
>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
>         at $Proxy9.recoverBlock(Unknown Source)
>         at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2706)
>         at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1500(DFSClient.java:2173)
>         at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2372)
> 
> Few questions:
> 1. Can we configure hadoop/hbase not to consume all network resources (e.g.,
> to specify upper limit for map/reduce network load)?
> 2. Should we increase the timeout for open connections ?
> 3. Can we assign different IPs for data transfer and region quorum check
> protocol (zookeeper) ?
> 
> Thanks,
> Lior

Re: M/R vs hbase problem in production

Posted by Lior Schachter <li...@infolinks.com>.
Yes, I'm sure - the map stage is used to aggregate data for the reduce stage.

On Mon, Aug 15, 2011 at 11:20 PM, Buttler, David <bu...@llnl.gov> wrote:

> Are you sure you need to use a reducer to put rows into hbase?  You can
> save a lot of time if you can put the rows into hbase directly in the
> mappers.
>
> Dave
>
> -----Original Message-----
> From: Lior Schachter [mailto:liors@infolinks.com]
> Sent: Sunday, August 14, 2011 9:32 AM
> To: user@hbase.apache.org; mapreduce-user@hadoop.apache.org
> Subject: M/R vs hbase problem in production
>
> Hi,
>
> cluster details:
> hbase 0.90.2. 10 machines. 1GB switch.
>
> use-case
> M/R job that inserts about 10 million rows to hbase in the reducer,
> followed
> by M/R that works with hdfs files.
> When the first job maps finish the second job maps starts and region server
> crushes.
> please note, that when running the 2 jobs separately they both finish
> successfully.
>
> From our monitoring we see that when the 2 jobs work together the network
> load reaches to our max bandwidth - 1GB.
>
> In the region server log we see these exceptions:
> a.
> 2011-08-14 18:37:36,263 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
> Responder, call multi(org.apache.hadoop.hbase.client.MultiAction@491fb2f4)
> from 10.11.87.73:33737: output error
> 2011-08-14 18:37:36,264 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 24 on 8041 caught: java.nio.channels.ClosedChannelException
>        at
> sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:133)
>        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
>        at
> org.apache.hadoop.hbase.ipc.HBaseServer.channelIO(HBaseServer.java:1387)
>        at
> org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1339)
>        at
>
> org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:727)
>        at
>
> org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:792)
>        at
> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1083)
>
> b.
> 2011-08-14 18:41:56,225 WARN org.apache.hadoop.hdfs.DFSClient:
> DFSOutputStream ResponseProcessor exception  for block
> blk_-8181634225601608891_579246java.io.EOFException
>        at java.io.DataInputStream.readFully(DataInputStream.java:180)
>        at java.io.DataInputStream.readLong(DataInputStream.java:399)
>        at
>
> org.apache.hadoop.hdfs.protocol.DataTransferProtocol$PipelineAck.readFields(DataTransferProtocol.java:122)
>        at
>
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2548)
>
> c.
> 2011-08-14 18:42:02,960 WARN org.apache.hadoop.hdfs.DFSClient: Failed
> recovery attempt #0 from primary datanode 10.11.87.72:50010
> org.apache.hadoop.ipc.RemoteException:
> org.apache.hadoop.ipc.RemoteException: java.io.IOException:
> blk_-8181634225601608891_579246 is already commited, storedBlock == null.
>        at
>
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.nextGenerationStampForBlock(FSNamesystem.java:4877)
>        at
>
> org.apache.hadoop.hdfs.server.namenode.NameNode.nextGenerationStamp(NameNode.java:501)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at
>
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>        at
>
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:961)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:957)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at javax.security.auth.Subject.doAs(Subject.java:396)
>        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:955)
>
>        at org.apache.hadoop.ipc.Client.call(Client.java:740)
>        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
>        at $Proxy4.nextGenerationStamp(Unknown Source)
>        at
>
> org.apache.hadoop.hdfs.server.datanode.DataNode.syncBlock(DataNode.java:1577)
>        at
>
> org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:1551)
>        at
>
> org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:1617)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at
>
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>        at
>
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:961)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:957)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at javax.security.auth.Subject.doAs(Subject.java:396)
>        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:955)
>
>        at org.apache.hadoop.ipc.Client.call(Client.java:740)
>        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
>        at $Proxy9.recoverBlock(Unknown Source)
>        at
>
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2706)
>        at
>
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1500(DFSClient.java:2173)
>        at
>
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2372)
>
> Few questions:
> 1. Can we configure hadoop/hbase not to consume all network resources
> (e.g.,
> to specify upper limit for map/reduce network load)?
> 2. Should we increase the timeout for open connections ?
> 3. Can we assign different IPs for data transfer and region quorum check
> protocol (zookeeper) ?
>
> Thanks,
> Lior
>

RE: M/R vs hbase problem in production

Posted by "Buttler, David" <bu...@llnl.gov>.
Are you sure you need to use a reducer to put rows into hbase?  You can save a lot of time if you can put the rows into hbase directly in the mappers.

Dave
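
If the aggregation genuinely needs the reduce phase this will not apply, but
for completeness, here is a minimal sketch of writing Puts from a map-only job
(class, table and column names are made up; this is the stock TableOutputFormat
wiring, not code from this thread):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class DirectPutJob {

  // Each mapper emits (row key, Put); TableOutputFormat delivers them to HBase.
  public static class PutMapper
      extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
    @Override
    protected void map(LongWritable key, Text line, Context ctx)
        throws IOException, InterruptedException {
      byte[] row = Bytes.toBytes(line.toString());            // hypothetical row key
      Put put = new Put(row);
      put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), row);  // hypothetical column
      ctx.write(new ImmutableBytesWritable(row), put);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "direct-put");
    job.setJarByClass(DirectPutJob.class);
    job.setMapperClass(PutMapper.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    // Wires TableOutputFormat to the (hypothetical) target table; null = no reducer class.
    TableMapReduceUtil.initTableReducerJob("my_table", null, job);
    job.setNumReduceTasks(0);   // map-only: puts go straight from the mappers
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}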

-----Original Message-----
From: Lior Schachter [mailto:liors@infolinks.com] 
Sent: Sunday, August 14, 2011 9:32 AM
To: user@hbase.apache.org; mapreduce-user@hadoop.apache.org
Subject: M/R vs hbase problem in production

Hi,

cluster details:
hbase 0.90.2. 10 machines. 1GB switch.

use-case
M/R job that inserts about 10 million rows to hbase in the reducer, followed
by M/R that works with hdfs files.
When the first job maps finish the second job maps starts and region server
crushes.
please note, that when running the 2 jobs separately they both finish
successfully.

From our monitoring we see that when the 2 jobs work together the network
load reaches to our max bandwidth - 1GB.

In the region server log we see these exceptions:
a.
2011-08-14 18:37:36,263 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
Responder, call multi(org.apache.hadoop.hbase.client.MultiAction@491fb2f4)
from 10.11.87.73:33737: output error
2011-08-14 18:37:36,264 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 24 on 8041 caught: java.nio.channels.ClosedChannelException
        at
sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:133)
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
        at
org.apache.hadoop.hbase.ipc.HBaseServer.channelIO(HBaseServer.java:1387)
        at
org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1339)
        at
org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:727)
        at
org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:792)
        at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1083)

b.
2011-08-14 18:41:56,225 WARN org.apache.hadoop.hdfs.DFSClient:
DFSOutputStream ResponseProcessor exception  for block
blk_-8181634225601608891_579246java.io.EOFException
        at java.io.DataInputStream.readFully(DataInputStream.java:180)
        at java.io.DataInputStream.readLong(DataInputStream.java:399)
        at
org.apache.hadoop.hdfs.protocol.DataTransferProtocol$PipelineAck.readFields(DataTransferProtocol.java:122)
        at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2548)

c.
2011-08-14 18:42:02,960 WARN org.apache.hadoop.hdfs.DFSClient: Failed
recovery attempt #0 from primary datanode 10.11.87.72:50010
org.apache.hadoop.ipc.RemoteException:
org.apache.hadoop.ipc.RemoteException: java.io.IOException:
blk_-8181634225601608891_579246 is already commited, storedBlock == null.
        at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.nextGenerationStampForBlock(FSNamesystem.java:4877)
        at
org.apache.hadoop.hdfs.server.namenode.NameNode.nextGenerationStamp(NameNode.java:501)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:961)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:957)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:955)

        at org.apache.hadoop.ipc.Client.call(Client.java:740)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
        at $Proxy4.nextGenerationStamp(Unknown Source)
        at
org.apache.hadoop.hdfs.server.datanode.DataNode.syncBlock(DataNode.java:1577)
        at
org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:1551)
        at
org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:1617)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:961)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:957)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:955)

        at org.apache.hadoop.ipc.Client.call(Client.java:740)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
        at $Proxy9.recoverBlock(Unknown Source)
        at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2706)
        at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1500(DFSClient.java:2173)
        at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2372)

Few questions:
1. Can we configure hadoop/hbase not to consume all network resources (e.g.,
to specify upper limit for map/reduce network load)?
2. Should we increase the timeout for open connections ?
3. Can we assign different IPs for data transfer and region quorum check
protocol (zookeeper) ?

Thanks,
Lior
