Posted to user@hbase.apache.org by David Koch <og...@googlemail.com> on 2013/07/12 12:09:52 UTC

HBase issues since upgrade from 0.92.4 to 0.94.6

Hello,

NOTE: I posted the same message in the Cloudera group.

Since upgrading from CDH 4.0.1 (HBase 0.92.4) to 4.3.0 (HBase 0.94.6) we
systematically experience problems with region servers crashing silently
under workloads which used to pass without problems. More specifically, we
run about 30 Mapper jobs in parallel which read from HDFS and insert into
HBase.
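
For context, a stripped-down sketch of the kind of job we run (the real code
differs; the table name, column family/qualifier and the tab-separated input
format below are made up for illustration):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class HdfsToHBaseJob {

  // Turns each tab-separated input line into a Put keyed on the first field.
  static class InsertMapper
      extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      String[] fields = value.toString().split("\t");
      if (fields.length < 2) {
        return; // skip malformed lines
      }
      Put put = new Put(Bytes.toBytes(fields[0]));
      put.add(Bytes.toBytes("d"), Bytes.toBytes("v"), Bytes.toBytes(fields[1]));
      context.write(new ImmutableBytesWritable(put.getRow()), put);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // Speculative execution is off for these jobs.
    conf.setBoolean("mapred.map.tasks.speculative.execution", false);
    Job job = new Job(conf, "hdfs-to-hbase-load");
    job.setJarByClass(HdfsToHBaseJob.class);
    job.setMapperClass(InsertMapper.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    // Map-only job writing directly into the table, no reducer.
    TableMapReduceUtil.initTableReducerJob(args[1], null, job);
    job.setNumReduceTasks(0);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}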

region server log
NOTE: no trace of crash, but server is down and shows up as such in
Cloudera Manager.

2013-07-12 10:22:12,050 WARN
org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: File
hdfs://XXXXXXX:8020/hbase/.logs/XXXXXXX,60020,1373616547696-splitting/XXXXXXX%2C60020%2C1373616547696.1373617004286
might be still open, length is 0
2013-07-12 10:22:12,051 INFO org.apache.hadoop.hbase.util.FSHDFSUtils:
Recovering file
hdfs://XXXXXXX:8020/hbase/.logs/XXXXXXX,60020,1373616547696-splitting/XXXXXXX
t%2C60020%2C1373616547696.1373617004286
2013-07-12 10:22:13,064 INFO org.apache.hadoop.hbase.util.FSHDFSUtils:
Finished lease recover attempt for
hdfs://XXXXXXX:8020/hbase/.logs/XXXXXXX,60020,1373616547696-splitting/XXXXXXX%2C60020%2C1373616547696.1373617004286
2013-07-12 10:22:14,819 INFO org.apache.hadoop.io.compress.CodecPool: Got
brand-new compressor [.deflate]
2013-07-12 10:22:14,824 INFO org.apache.hadoop.io.compress.CodecPool: Got
brand-new compressor [.deflate]
...
2013-07-12 10:22:14,850 INFO org.apache.hadoop.io.compress.CodecPool: Got
brand-new compressor [.deflate]
2013-07-12 10:22:15,530 INFO org.apache.hadoop.io.compress.CodecPool: Got
brand-new compressor [.deflate]
< -- last log entry, region server is down here -- >


datanode log, same machine

2013-07-12 10:22:04,811 ERROR
org.apache.hadoop.hdfs.server.datanode.DataNode: XXXXXXX:50010:DataXceiver
error processing WRITE_BLOCK operation  src: /YYY.YY.YYY.YY:36024 dest:
/XXX.XX.XXX.XX:50010
java.io.IOException: Premature EOF from inputStream
at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:194)
at
org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
at
org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
at
org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
at
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:414)
at
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:635)
at
org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:564)
at
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:103)
at
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:67)
at
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
at java.lang.Thread.run(Thread.java:724)
< -- many repetitions of this -- >

What could have caused this difference in stability?

We did not change any configuration settings with respect to the previous
CDH 4.0.1 setup. In particular, we left ulimit and
dfs.datanode.max.xcievers at 32k. If need be, I can provide more complete
log/configuration information.

Thank you,

/David

Re: HBase issues since upgrade from 0.92.4 to 0.94.6

Posted by Azuryy Yu <az...@gmail.com>.
David,

You can set -Xmx1g if your JDK is 6 or above; you don't need to specify the
size in bytes.
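
For example, the region server launch command quoted further down this
thread passes the heap size twice (-Xmx1000m and -Xmx1073741824); a single
-Xmx1g says the same thing without spelling out the byte count.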

Re: HBase issues since upgrade from 0.92.4 to 0.94.6

Posted by David Koch <og...@googlemail.com>.
In the end we increased the heap allocation for the HBase region servers to
4 GB (from its default of 1 GB) and it seems to work now.
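
(For reference, outside of a managed setup this corresponds to raising the
region server JVM's -Xmx, e.g. via HBASE_REGIONSERVER_OPTS in hbase-env.sh;
how exactly the flag ends up on the command line depends on the deployment.)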



Re: HBase issues since upgrade from 0.92.4 to 0.94.6

Posted by Jamal B <jm...@gmail.com>.
I believe that your workload after the upgrade caused the process to exceed
its 1 GB memory allocation, and your JVM flag -XX:OnOutOfMemoryError=kill
-9 %p worked as expected and killed it. I would remove the kill hook, or at
least put out some sort of log entry to syslog before it kills the pid;
otherwise you have no log entry to point back to when the pid abruptly dies,
like in this case.
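
For example, instead of the bare kill, -XX:OnOutOfMemoryError could point at
a small wrapper script (say /usr/local/bin/rs_oom.sh %p, name made up) that
first calls logger to drop a line into syslog and only then issues the
kill -9 on the pid it was handed.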

Also, someone please correct me if I'm wrong, but I thought that the
hbase.hregion.max.filesize config property does not enforce the max size of
a region, but only a max size before compaction is required.



Re: HBase issues since upgrade from 0.92.4 to 0.94.6

Posted by David Koch <og...@googlemail.com>.
Hello,

This is the command that is used to launch the region servers:

/usr/java/jdk1.7.0_25/bin/java -XX:OnOutOfMemoryError=kill -9 %p -Xmx1000m
-Djava.net.preferIPv4Stack=true -Xmx1073741824 -XX:+UseParNewGC
-XX:+UseConcMarkSweepGC -XX:-CMSConcurrentMTEnabled
-XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled
-Dhbase.log.dir=/var/log/hbase
-Dhbase.log.file=hbase-cmf-hbase1-REGIONSERVER-big-4.ezakus.net.log.out
-Dhbase.home.dir=/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hbase
-Dhbase.id.str= -Dhbase.root.logger=INFO,RFA -Djava.library.path=<... libs
...>

so it seems garbage collection logging is not activated. I can try
re-launching with the -verbose:gc flag.
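(Concretely that would be something along the lines of -verbose:gc
-XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<path to a gc log> added
to the region server Java options; standard HotSpot flags, not yet tried on
this setup.)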

All HBase settings are left at their (CDH 4.3) defaults, for example:
hfile.block.cache.size=0.25
hbase.hregion.max.filesize=1GB

except:
hbase.hregion.majorcompaction=0

speculative execution is off.
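
(For what it's worth, with the ~1 GB heap above, hfile.block.cache.size=0.25
works out to roughly 256 MB reserved for the block cache alone, so there is
not a lot of headroom left for memstores and everything else.)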

The only solution we have found so far is lowering the workload by running
fewer jobs in parallel.

/David



Re: HBase issues since upgrade from 0.92.4 to 0.94.6

Posted by Azuryy Yu <az...@gmail.com>.
I do think your JVM on the RS crashed. Do you have a GC log?

Do you set mapred.map.tasks.speculative.execution=false when you use map
jobs to read from or write to HBase?

And if you have a heavy read/write load, how did you tune HBase, e.g. block
cache size, compaction, memstore, etc.?
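
If it is not already turned off, a minimal sketch of doing it on the job
side (classic mapred property names) is:

// in the job driver, before submitting the Job
Configuration conf = HBaseConfiguration.create();
conf.setBoolean("mapred.map.tasks.speculative.execution", false);
conf.setBoolean("mapred.reduce.tasks.speculative.execution", false);
Job job = new Job(conf, "hbase-load");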



Re: HBase issues since upgrade from 0.92.4 to 0.94.6

Posted by David Koch <og...@googlemail.com>.
Thank you for your responses. With respect to the version of Java, I found
that Cloudera recommends 1.7.x for CDH 4.3:
http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/latest/CDH4-Requirements-and-Supported-Versions/cdhrsv_topic_3.html



Re: HBase issues since upgrade from 0.92.4 to 0.94.6

Posted by Azuryy Yu <az...@gmail.com>.
David,
java.io.IOException: Premature EOF from inputStream
at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:194)

For this error, generally the client is still asking for bytes from the
stream but the server side has shut down, so it may be a network issue, a
JVM crash, or something else. I don't think this is related to the HBase
upgrade.

Re: HBase issues since upgrade from 0.92.4 to 0.94.6

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Might want to run memtest also, just to be sure there is no memory issue.
It should not be the case since it was working fine with 0.92.4, but it
costs nothing...

The last version of Java 6 is update 45... It might also be worth giving it
a try if you are running 1.6.


Re: HBase issues since upgrade from 0.92.4 to 0.94.6

Posted by Asaf Mesika <as...@gmail.com>.
You need to look for the JVM crash in the .out log file and see if maybe it
is the .so native Hadoop code that is causing the problem. In our case we
downgraded from JVM 1.6.0-37 to 1.6.0-33 and it solved the issue.



Re: HBase issues since upgrade from 0.92.4 to 0.94.6

Posted by lars hofhansl <la...@apache.org>.
Checked now. It is 0.94.6.1




----- Original Message -----
From: lars hofhansl <la...@apache.org>
To: "user@hbase.apache.org" <us...@hbase.apache.org>
Cc: 
Sent: Sunday, July 14, 2013 6:55 AM
Subject: Re: HBase issues since upgrade from 0.92.4 to 0.94.6

Didn't check, but I sincerely hope that CDH 4.3.0 ships with HBase 0.94.6.1 (and not 0.94.6).

Re: HBase issues since upgrade from 0.92.4 to 0.94.6

Posted by lars hofhansl <la...@apache.org>.
Didn't check, but I sincerely hope that CDH 4.3.0 ships with HBase 0.94.6.1 (and not 0.94.6).
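
For reference, a quick way to confirm what a given node actually ships is to
ask the hbase command itself (the jar path below is the usual CDH package
location and is an assumption):

  # Print the HBase build version bundled with the install
  hbase version

  # Or inspect the bundled jar name directly
  ls /usr/lib/hbase/hbase*.jar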

Re: HBase issues since upgrade from 0.92.4 to 0.94.6

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Hi David,

I would recommend that you run:
- FSCK from your OS (fsck.ext4) on this node;
- FSCK from Hadoop on your HDFS;
- HBCK from HBase.

It seems your node has some trouble reading something; I just want to see
whether there are any related issues. The commands are sketched below.
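
A rough sketch of those three checks (the device name is a placeholder, the
-n flag keeps fsck read-only so it makes no changes while you diagnose, and
the hadoop/hbase commands may need to run as the hdfs/hbase user depending
on your install):

  # OS-level filesystem check on the DataNode's data disk (read-only pass)
  fsck.ext4 -n /dev/sdb1

  # HDFS consistency check, reporting files and blocks
  hadoop fsck / -files -blocks

  # HBase consistency check
  hbase hbck -details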

JM
