Posted to user@hbase.apache.org by dva <dv...@gmail.com> on 2012/08/30 08:26:48 UTC
Re: Occasional regionserver crashes following socket errors writing to HDFS
Eran Kutner wrote:
>
> Hi,
> We're seeing occasional regionserver crashes during heavy write operations
> to Hbase (at the reduce phase of large M/R jobs). I have increased the
> file
> descriptors, HDFS xceivers, HDFS threads to the recommended settings and
> actually way above.
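> For concreteness, these are the knobs meant above (values are roughly what
> we run with, not a general recommendation):
>
>   <!-- hdfs-site.xml -->
>   <property>
>     <name>dfs.datanode.max.xcievers</name>  <!-- note: spelled "xcievers" -->
>     <value>8192</value>
>   </property>
>   <property>
>     <name>dfs.datanode.handler.count</name>
>     <value>10</value>
>   </property>
>
> plus a raised nofile limit (e.g. 32768) for the hdfs and hbase users in
> /etc/security/limits.conf.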
>
> Here is an example of the HBase log (showing only errors):
>
> 2012-05-10 03:34:54,291 WARN org.apache.hadoop.hdfs.DFSClient:
> DFSOutputStream ResponseProcessor exception for block
> blk_-8928911185099340956_5189425java.io.IOException: Bad response 1 for
> block blk_-8928911185099340956_5189425 from datanode 10.1.104.6:50010
> at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2986)
>
> 2012-05-10 03:34:54,494 WARN org.apache.hadoop.hdfs.DFSClient:
> DataStreamer
> Exception: java.io.InterruptedIOException: Interruped while waiting for IO
> on channel java.nio.channels.SocketChannel[connected
> local=/10.1.104.9:59642 remote=/10.1.104.9:50010]. 0 millis timeout left.
> at
> org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:349)
> at
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
> at
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
> at
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
> at
> java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
> at java.io.DataOutputStream.write(DataOutputStream.java:90)
> at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2848)
>
> 2012-05-10 03:34:54,531 WARN org.apache.hadoop.hdfs.DFSClient: Error
> Recovery for block blk_-8928911185099340956_5189425 bad datanode[2]
> 10.1.104.6:50010
> 2012-05-10 03:34:54,531 WARN org.apache.hadoop.hdfs.DFSClient: Error
> Recovery for block blk_-8928911185099340956_5189425 in pipeline
> 10.1.104.9:50010, 10.1.104.8:50010, 10.1.104.6:50010: bad datanode
> 10.1.104.6:50010
> 2012-05-10 03:48:30,174 FATAL
> org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server
> serverName=hadoop1-s09.farm-ny.gigya.com,60020,1336476100422,
> load=(requests=15741, regions=789, usedHeap=6822, maxHeap=7983):
> regionserver:60020-0x2372c0e8a2f0008 regionserver:60020-0x2372c0e8a2f0008
> received expired from ZooKeeper, aborting
> org.apache.zookeeper.KeeperException$SessionExpiredException:
> KeeperErrorCode = Session expired
> at
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:352)
> at
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:270)
> at
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:531)
> at
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:507)
> java.io.InterruptedIOException: Aborting compaction of store properties in
> region
> gs_users,6155551|QoCW/euBIKuMat/nRC5Xtw==,1334983658004.878522ea91f41cd76b903ea06ccd17f9.
> because user requested stop.
> at
> org.apache.hadoop.hbase.regionserver.Store.compact(Store.java:998)
> at
> org.apache.hadoop.hbase.regionserver.Store.compact(Store.java:779)
> at
> org.apache.hadoop.hbase.regionserver.HRegion.compactStores(HRegion.java:776)
> at
> org.apache.hadoop.hbase.regionserver.HRegion.compactStores(HRegion.java:721)
> at
> org.apache.hadoop.hbase.regionserver.CompactSplitThread.run(CompactSplitThread.java:81)
>
>
> This is from 10.1.104.9 (same machine running the region server that
> crashed):
> 2012-05-10 03:31:16,785 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
> blk_-8928911185099340956_5189425 src: /10.1.104.9:59642 dest: /
> 10.1.104.9:50010
> 2012-05-10 03:35:39,000 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder
> blk_-8928911185099340956_5189425 2 Exception java.net.SocketException:
> Connection reset
> 2012-05-10 03:35:39,052 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock
> for block blk_-8928911185099340956_5189425
> java.nio.channels.ClosedByInterruptException
> 2012-05-10 03:35:39,053 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock
> blk_-8928911185099340956_5189425 received exception java.io.IOException:
> Interrupted receiveBlock
> 2012-05-10 03:35:39,055 ERROR
> org.apache.hadoop.security.UserGroupInformation:
> PriviledgedActionException
> as:hdfs (auth:SIMPLE) cause:java.io.IOException: Block
> blk_-8928911185099340956_5189425 length is 24384000 does not match block
> file length 24449024
> 2012-05-10 03:35:39,055 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 3 on 50020, call
> startBlockRecovery(blk_-8928911185099340956_5189425) from
> 10.1.104.8:50251:
> error: java.io.IOException: Block blk_-8928911185099340956_5189425 length
> is 24384000 does not match block file length 24449024
> java.io.IOException: Block blk_-8928911185099340956_5189425 length is
> 24384000 does not match block file length 24449024
> 2012-05-10 03:35:39,077 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder
> blk_-8928911185099340956_5189425 2 Exception java.net.SocketException:
> Broken pipe
> 2012-05-10 03:35:39,077 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder
> blk_-8928911185099340956_5189425 2 Exception java.net.SocketException:
> Socket closed
> 2012-05-10 03:35:39,108 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder
> blk_-8928911185099340956_5189425 2 Exception java.net.SocketException:
> Socket closed
> 2012-05-10 03:35:39,136 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder
> blk_-8928911185099340956_5189425 2 Exception java.net.SocketException:
> Socket closed
> 2012-05-10 03:35:39,165 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder
> blk_-8928911185099340956_5189425 2 Exception java.net.SocketException:
> Socket closed
> 2012-05-10 03:35:39,196 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder
> blk_-8928911185099340956_5189425 2 Exception java.net.SocketException:
> Socket closed
> 2012-05-10 03:35:39,221 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder
> blk_-8928911185099340956_5189425 2 Exception java.net.SocketException:
> Socket closed
> 2012-05-10 03:35:39,246 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder
> blk_-8928911185099340956_5189425 2 Exception java.net.SocketException:
> Socket closed
> 2012-05-10 03:35:39,271 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder
> blk_-8928911185099340956_5189425 2 Exception java.net.SocketException:
> Socket closed
> 2012-05-10 03:35:39,296 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder
> blk_-8928911185099340956_5189425 2 Exception java.net.SocketException:
> Socket closed
>
> This is the log from 10.1.104.6 datanode for
> "blk_-8928911185099340956_5189425":
> 2012-05-10 03:31:16,772 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
> blk_-8928911185099340956_5189425 src: /10.1.104.8:43828 dest: /
> 10.1.104.6:50010
> 2012-05-10 03:35:39,041 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock
> for block blk_-8928911185099340956_5189425 java.net.SocketException:
> Connection reset
> 2012-05-10 03:35:39,043 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for
> block blk_-8928911185099340956_5189425 Interrupted.
> 2012-05-10 03:35:39,043 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for
> block blk_-8928911185099340956_5189425 terminating
> 2012-05-10 03:35:39,043 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock
> blk_-8928911185099340956_5189425 received exception
> java.net.SocketException: Connection reset
>
>
> Any idea why this is happening?
>
> Thanks.
>
> -eran
>
>
I wrote the following program:

#include <string>
#include <vector>

#include "hadoop/Pipes.hh"
#include "hadoop/TemplateFactory.hh"
#include "hadoop/StringUtils.hh"
#include "libpq-fe.h"

// traverser.h is a C header; the #include must sit on its own line
// inside the extern "C" block (a directive cannot share a line with "{").
extern "C" {
#include "traverser.h"
}

using namespace std;

class IndexMap : public HadoopPipes::Mapper {
public:
    IndexMap(HadoopPipes::TaskContext& context) {}

    void map(HadoopPipes::MapContext& context) {
        // Split the input value into paths on CRLF. My earlier version
        // passed the literal string "/r/n" instead of the escape "\r\n",
        // which never matched, so the whole input came back as one path.
        std::vector<std::string> paths =
            HadoopUtils::splitString(context.getInputValue(), "\r\n");
        if (paths.empty()) {
            return;
        }

        const unsigned int k = 4;  // number of groups handed to reducers
        // std::vector instead of the variable-length array
        // "string concatpaths[k]", which is not standard C++.
        std::vector<std::string> concatpaths(k);

        // Pack the paths into k space-separated groups of nearly equal
        // size, in contiguous chunks. This replaces my original two-branch
        // index arithmetic, which could read past the end of `paths` when
        // paths.size() was a multiple of (k - 1), and whose "= +" typo
        // overwrote each group instead of appending ("+=") to it.
        for (size_t i = 0; i < paths.size(); ++i) {
            size_t group = i * k / paths.size();
            if (!concatpaths[group].empty()) {
                concatpaths[group] += " ";
            }
            concatpaths[group] += paths[i];
        }

        // Emit every non-empty group. My original only emitted inside the
        // else branch, so inputs whose size was a multiple of k produced
        // no map output at all.
        for (unsigned int i = 0; i < k; ++i) {
            if (!concatpaths[i].empty()) {
                context.emit(concatpaths[i], "0");
            }
        }
    }
};

class IndexReduce : public HadoopPipes::Reducer {
public:
    IndexReduce(HadoopPipes::TaskContext& context) {}

    void reduce(HadoopPipes::ReduceContext& context) {
        long count = 0;
        std::string lastPath;

        // Configure() and Traverser() come from the C library behind
        // traverser.h. The return value of Configure() was unused in my
        // original, so I drop it here.
        Configure("/export/hadoop-1.0.1/src/c++/hadoopc++/src/nsindexer.conf");

        // Iterate over all values for this key; the original called
        // getInputValue() once without first calling nextValue().
        while (context.nextValue()) {
            std::vector<std::string> processedpaths =
                HadoopUtils::splitString(context.getInputValue(), " ");
            for (size_t i = 0; i < processedpaths.size(); ++i) {
                // As in my original, this keeps only the result of the
                // last Traverser() call.
                count = Traverser(processedpaths[i].c_str());
            }
            if (!processedpaths.empty()) {
                lastPath = processedpaths.back();
            }
        }
        if (!lastPath.empty()) {
            context.emit(lastPath, HadoopUtils::toString(count));
        }
    }
};

int main(int argc, char* argv[]) {
    return HadoopPipes::runTask(
        HadoopPipes::TemplateFactory<IndexMap, IndexReduce>());
}
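For reference, I built it roughly like this (the Pipes include/lib paths are
from my Hadoop 1.0.1 tree; the source file name and the -lpq/traverser.o
flags are specific to my setup):

g++ parindex.cpp traverser.o -o bin/parindex \
    -I/export/hadoop-1.0.1/c++/Linux-amd64-64/include \
    -L/export/hadoop-1.0.1/c++/Linux-amd64-64/lib \
    -lhadooppipes -lhadooputils -lpthread -lpq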
I started it as described in the tutorial at
http://cs.smith.edu/dftwiki/index.php/Hadoop_Tutorial_2.2_--_Running_C%2B%2B_Programs_on_Hadoop:

root@one:/export/hadoop-1.0.1/bin# ./hadoop pipes -D
hadoop.pipes.java.recordreader=true -D hadoop.pipes.java.recordwriter=true
-input paths.txt -output out.txt -program bin/parindex

and got the following error:
12/08/29 08:02:10 WARN util.NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
12/08/29 08:02:10 WARN mapred.JobClient: No job jar file set. User classes
may not be found. See JobConf(Class) or JobConf#setJar(String).
12/08/29 08:02:10 INFO mapred.JobClient: Cleaning up the staging area
file:/tmp/hadoop-root/mapred/staging/root-2093574148/.staging/job_local_0001
12/08/29 08:02:10 ERROR security.UserGroupInformation:
PriviledgedActionException as:root
cause:org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory
file:/export/hadoop-1.0.1/bin/out.txt already exists
Exception in thread "main"
org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory
file:/export/hadoop-1.0.1/bin/out.txt already exists
at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:121)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:891)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
at
org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:824)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1261)
at org.apache.hadoop.mapred.pipes.Submitter.runJob(Submitter.java:248)
at org.apache.hadoop.mapred.pipes.Submitter.run(Submitter.java:479)
at org.apache.hadoop.mapred.pipes.Submitter.main(Submitter.java:494)
--
View this message in context: http://old.nabble.com/Occasional-regionserver-crashes-following-socket-errors-writing-to-HDFS-tp33825488p34348273.html
Sent from the HBase User mailing list archive at Nabble.com.
Re: Occasional regionserver crashes following socket errors writing to HDFS
Posted by Stack <st...@duboce.net>.
On Wed, Aug 29, 2012 at 11:26 PM, dva <dv...@gmail.com> wrote:
> 12/08/29 08:02:10 ERROR security.UserGroupInformation:
> PriviledgedActionException as:root
> cause:org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory
> file:/export/hadoop-1.0.1/bin/out.txt already exists
Your message is hard to read. Your problem seems pretty basic. See
the above. Fix that (move the file aside) and retry?
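Something like:

  rm -rf /export/hadoop-1.0.1/bin/out.txt

(the file:/ scheme in the error means it is on the local filesystem), or
pass a -output directory that does not exist yet.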
St.Ack