Posted to user@hbase.apache.org by Jean-Marc Spaggiari <je...@spaggiari.org> on 2012/08/03 15:33:18 UTC

Never ending distributed log split

Hi,

I'm using HBase 0.94.0.

I stopped the cluster for some maintenance, and I'm having some trouble
restarting it.

I'm getting one line like this about every 20 seconds:

Start Time 	Description 	State 	Status
Fri Aug 03 08:59:54 EDT 2012 	Doing distributed log split in
[hdfs://node3:9000/hbase/.logs/latitude,60020,1343908057839-splitting,
hdfs://node3:9000/hbase/.logs/latitude,60020,1343998595290-splitting,
hdfs://node3:9000/hbase/.logs/node1,60020,1343908057567-splitting,
hdfs://node3:9000/hbase/.logs/node1,60020,1343939284240-splitting,
hdfs://node3:9000/hbase/.logs/node1,60020,1343998593757-splitting,
hdfs://node3:9000/hbase/.logs/node2,60020,1343908059614-splitting,
hdfs://node3:9000/hbase/.logs/node2,60020,1343939286369-splitting,
hdfs://node3:9000/hbase/.logs/node2,60020,1343998595830-splitting,
hdfs://node3:9000/hbase/.logs/node3,60020,1343908054414-splitting,
hdfs://node3:9000/hbase/.logs/node3,60020,1343939282294-splitting,
hdfs://node3:9000/hbase/.logs/node3,60020,1343998590612-splitting,
hdfs://node3:9000/hbase/.logs/node4,60020,1343908056186-splitting,
hdfs://node3:9000/hbase/.logs/node4,60020,1343939282889-splitting,
hdfs://node3:9000/hbase/.logs/node4,60020,1343998592129-splitting,
hdfs://node3:9000/hbase/.logs/node5,60020,1343908059158-splitting,
hdfs://node3:9000/hbase/.logs/node5,60020,1343998594856-splitting,
hdfs://node3:9000/hbase/.logs/phenom,60020,1343908053256-splitting,
hdfs://node3:9000/hbase/.logs/phenom,60020,1343939281065-splitting,
hdfs://node3:9000/hbase/.logs/phenom,60020,1343998580375-splitting]
	RUNNING (since 3sec ago) 	Waiting for distributed tasks to finish.
scheduled=1 done=0 error=0 (since 0sec ago)

If I let it run, it goes on like that for hours, adding line after
line until I stop it.


In the master logs, I can see this:
2012-08-03 09:02:49,788 INFO
org.apache.hadoop.hbase.master.SplitLogManager: task
/hbase/splitlog/hdfs%3A%2F%2Fnode3%3A9000%2Fhbase%2F.logs%2Fnode1%2C60020%2C1343908057567-splitting%2Fnode1%252C60020%252C1343908057567.1343914548297
entered state err node4,60020,1343998592129
2012-08-03 09:02:49,788 WARN
org.apache.hadoop.hbase.master.SplitLogManager: Error splitting
/hbase/splitlog/hdfs%3A%2F%2Fnode3%3A9000%2Fhbase%2F.logs%2Fnode1%2C60020%2C1343908057567-splitting%2Fnode1%252C60020%252C1343908057567.1343914548297
2012-08-03 09:02:49,788 WARN
org.apache.hadoop.hbase.master.SplitLogManager: error while splitting
logs in [hdfs://node3:9000/hbase/.logs/latitude,60020,1343908057839-splitting,
hdfs://node3:9000/hbase/.logs/latitude,60020,1343998595290-splitting,
hdfs://node3:9000/hbase/.logs/node1,60020,1343908057567-splitting,
hdfs://node3:9000/hbase/.logs/node1,60020,1343939284240-splitting,
hdfs://node3:9000/hbase/.logs/node1,60020,1343998593757-splitting,
hdfs://node3:9000/hbase/.logs/node2,60020,1343908059614-splitting,
hdfs://node3:9000/hbase/.logs/node2,60020,1343939286369-splitting,
hdfs://node3:9000/hbase/.logs/node2,60020,1343998595830-splitting,
hdfs://node3:9000/hbase/.logs/node3,60020,1343908054414-splitting,
hdfs://node3:9000/hbase/.logs/node3,60020,1343939282294-splitting,
hdfs://node3:9000/hbase/.logs/node3,60020,1343998590612-splitting,
hdfs://node3:9000/hbase/.logs/node4,60020,1343908056186-splitting,
hdfs://node3:9000/hbase/.logs/node4,60020,1343939282889-splitting,
hdfs://node3:9000/hbase/.logs/node4,60020,1343998592129-splitting,
hdfs://node3:9000/hbase/.logs/node5,60020,1343908059158-splitting,
hdfs://node3:9000/hbase/.logs/node5,60020,1343998594856-splitting,
hdfs://node3:9000/hbase/.logs/phenom,60020,1343908053256-splitting,
hdfs://node3:9000/hbase/.logs/phenom,60020,1343939281065-splitting,
hdfs://node3:9000/hbase/.logs/phenom,60020,1343998580375-splitting]
installed = 1 but only 0 done
2012-08-03 09:02:49,788 WARN
org.apache.hadoop.hbase.master.MasterFileSystem: Failed splitting of
[latitude,60020,1343908057839, latitude,60020,1343998595290,
node1,60020,1343908057567, node1,60020,1343939284240,
node1,60020,1343998593757, node2,60020,1343908059614,
node2,60020,1343939286369, node2,60020,1343998595830,
node3,60020,1343908054414, node3,60020,1343939282294,
node3,60020,1343998590612, node4,60020,1343908056186,
node4,60020,1343939282889, node4,60020,1343998592129,
node5,60020,1343908059158, node5,60020,1343998594856,
phenom,60020,1343908053256, phenom,60020,1343939281065,
phenom,60020,1343998580375]
java.io.IOException: error or interrupt while splitting logs in
[hdfs://node3:9000/hbase/.logs/latitude,60020,1343908057839-splitting,
hdfs://node3:9000/hbase/.logs/latitude,60020,1343998595290-splitting,
hdfs://node3:9000/hbase/.logs/node1,60020,1343908057567-splitting,
hdfs://node3:9000/hbase/.logs/node1,60020,1343939284240-splitting,
hdfs://node3:9000/hbase/.logs/node1,60020,1343998593757-splitting,
hdfs://node3:9000/hbase/.logs/node2,60020,1343908059614-splitting,
hdfs://node3:9000/hbase/.logs/node2,60020,1343939286369-splitting,
hdfs://node3:9000/hbase/.logs/node2,60020,1343998595830-splitting,
hdfs://node3:9000/hbase/.logs/node3,60020,1343908054414-splitting,
hdfs://node3:9000/hbase/.logs/node3,60020,1343939282294-splitting,
hdfs://node3:9000/hbase/.logs/node3,60020,1343998590612-splitting,
hdfs://node3:9000/hbase/.logs/node4,60020,1343908056186-splitting,
hdfs://node3:9000/hbase/.logs/node4,60020,1343939282889-splitting,
hdfs://node3:9000/hbase/.logs/node4,60020,1343998592129-splitting,
hdfs://node3:9000/hbase/.logs/node5,60020,1343908059158-splitting,
hdfs://node3:9000/hbase/.logs/node5,60020,1343998594856-splitting,
hdfs://node3:9000/hbase/.logs/phenom,60020,1343908053256-splitting,
hdfs://node3:9000/hbase/.logs/phenom,60020,1343939281065-splitting,
hdfs://node3:9000/hbase/.logs/phenom,60020,1343998580375-splitting]
Task = installed = 1 done = 0 error = 1
        at org.apache.hadoop.hbase.master.SplitLogManager.splitLogDistributed(SplitLogManager.java:269)
        at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:277)
        at org.apache.hadoop.hbase.master.MasterFileSystem.splitLogAfterStartup(MasterFileSystem.java:219)
        at org.apache.hadoop.hbase.master.HMaster.splitLogAfterStartup(HMaster.java:577)
        at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:522)
        at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:343)
        at java.lang.Thread.run(Thread.java:722)
2012-08-03 09:02:49,891 DEBUG
org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback:
deleted /hbase/splitlog/hdfs%3A%2F%2Fnode3%3A9000%2Fhbase%2F.logs%2Fnode1%2C60020%2C1343908057567-splitting%2Fnode1%252C60020%252C1343908057567.1343914548297
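
The task name in the SplitLogManager lines above is hard to read because it
is URL-encoded. A minimal sketch, assuming the task name is simply the WAL
file's HDFS path with one level of URL encoding applied (the helper name
task_to_wal_path is made up for illustration):

```python
from urllib.parse import unquote

# Task name copied from the master log above, without the
# /hbase/splitlog/ prefix.
TASK = (
    "hdfs%3A%2F%2Fnode3%3A9000%2Fhbase%2F.logs%2F"
    "node1%2C60020%2C1343908057567-splitting%2F"
    "node1%252C60020%252C1343908057567.1343914548297"
)


def task_to_wal_path(task_name: str) -> str:
    """Undo one level of URL encoding to recover the HDFS path."""
    return unquote(task_name)


wal_path = task_to_wal_path(TASK)
print(wal_path)
```

Decoding only once is deliberate: the file name keeps its literal %2C
sequences, which is how the file appears under the -splitting directory.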

I would like to try with 0.94.1 but I don't know where to find the
files. Does anyone have any idea where this is coming from, and where
I can find 0.94.1 RC1?

Thanks,

JM

Re: Never ending distributed log split

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
2012/8/3, Jean-Daniel Cryans <jd...@apache.org>:
> On Fri, Aug 3, 2012 at 8:15 AM, Jean-Marc Spaggiari
> <je...@spaggiari.org> wrote:
>> Me again ;)
>>
>> I did some more investigation.
>
> It would really help to see the region server log although the fsck
> output might be enough.

I looked under every directory and only one contains a file.

http://pastebin.com/8Fea2EnA

It seems to be related to node1. On this server, it seems that
everything started correctly:
hadoop@node1:~$ /usr/local/jdk1.7.0_05/bin/jps
2211 DataNode
2938 Jps
2136 TaskTracker

hbase@node1:~$ /usr/local/jdk1.7.0_05/bin/jps
2419 HRegionServer
3708 Jps

In the node1 region server logs, I can see the same information,
namely that the file is not hosted anywhere.

2012-08-03 15:01:31,216 WARN org.apache.hadoop.hdfs.DFSClient: DFS
Read: java.io.IOException: Could not obtain block:
blk_4965382127800577452_15852
file=/hbase/.logs/node1,60020,1343908057567-splitting/node1%2C60020%2C1343908057567.1343914548297
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:2266)
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:2060)
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:2221)
        at java.io.DataInputStream.read(DataInputStream.java:149)
        at java.io.DataInputStream.readFully(DataInputStream.java:195)
        at java.io.DataInputStream.readFully(DataInputStream.java:169)
        at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1508)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1486)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1475)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1470)
        at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.<init>(SequenceFileLogReader.java:55)
        at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.init(SequenceFileLogReader.java:175)
        at org.apache.hadoop.hbase.regionserver.wal.HLog.getReader(HLog.java:688)
        at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.getReader(HLogSplitter.java:850)
        at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.getReader(HLogSplitter.java:763)
        at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFileToTemp(HLogSplitter.java:384)
        at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFileToTemp(HLogSplitter.java:351)
        at org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:113)
        at org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:266)
        at org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:197)
        at org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:165)
        at java.lang.Thread.run(Thread.java:722)
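
When a region server log contains many of these warnings, it can help to
pull out just the block and file names. A rough sketch, assuming the message
always has the "Could not obtain block: <block> file=<path>" shape seen in
the warning above (the function name missing_blocks is made up for
illustration):

```python
import re

# The warning from the log above, with the wrapped lines joined.
LOG = (
    "2012-08-03 15:01:31,216 WARN org.apache.hadoop.hdfs.DFSClient: DFS "
    "Read: java.io.IOException: Could not obtain block: "
    "blk_4965382127800577452_15852 "
    "file=/hbase/.logs/node1,60020,1343908057567-splitting/"
    "node1%2C60020%2C1343908057567.1343914548297"
)

# Block IDs can be negative, hence the optional minus sign.
PATTERN = re.compile(r"Could not obtain block:\s+(blk_-?\d+_\d+)\s+file=(\S+)")


def missing_blocks(text):
    """Return (block, file) pairs for each 'Could not obtain block' message."""
    return PATTERN.findall(text)


pairs = missing_blocks(LOG)
for block, path in pairs:
    print(block, path)
```

Here the file reported is the same WAL file named in the failing split task
in the master log.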

> BTW you'll find 0.94.1 RC1 here:
> http://people.apache.org/~larsh/hbase-0.94.1-rc1/

Super, thanks! I will most probably try it instead of the 0.94.0


>> And I found that:
>>
>> http://pastebin.com/Bedm6Ldy
>>
>> Seems that no region is serving my logs. That's strange because all my
>> servers are up and fsck is telling me that FS is clean.
>
> I don't get the "Seems that no region is serving my logs" part. A
> region doesn't serve logs, it serves HFiles. You meant to say
> DataNode?

I was talking about the files under /hbase/.logs. Based on the
directory name I thought they were logs. Whatever this file is
supposed to be for, it seems it's not served by any datanode.


>> Can I just delete those files? What's the impact of such a delete? I
>> don't really worry about losing some data. It's a test environment.
>> But I really need it to start again.
>
> I wonder if it's related to:
> https://issues.apache.org/jira/browse/HBASE-6401
>
> Did you remove a datanode from the cluster as part of the maintenance?

It might be related to this Jira. Yes, I stopped all the datanodes for
the maintenance (I had to work on the power supply...). I had to do that
promptly so I "just" stopped everything with init 0.

>
> If you want you can probably move that folder aside but whatever was
> in those logs is lost (if there ever was anything) until it gets
> replayed properly.

That's fine. Nothing was happening in the cluster for hours, so I'm not
really expecting to lose anything. I will try to delete the
file...


> Kinda weird that a file wouldn't have any blocks like that, would be
> interesting to see the log of the region server that created it.

Here are the logs where we can see the file creation:
http://pastebin.com/HBc28zab

Nothing weird in them, I think.

When I removed the file, the region server crashed and had to be restarted.

The restart did not work:
2012-08-03 16:07:49,119 WARN
org.apache.hadoop.hbase.regionserver.HRegionServer: remote error
telling master we are up
org.apache.hadoop.ipc.RemoteException:
org.apache.hadoop.hbase.PleaseHoldException: Server
serverName=node1,60020

2012-08-03 16:07:46,112 WARN
org.apache.hadoop.hbase.regionserver.HRegionServer: remote error
telling master we are up
org.apache.hadoop.ipc.RemoteException:
org.apache.hadoop.hbase.PleaseHoldException: Server
serverName=node1,60020,1344024290513 rejected; we already have
node1,60020,1343998593757 registered with same hostname and port
        at org.apache.hadoop.hbase.master.ServerManager.checkAlreadySameHostPort(ServerManager.java:194)
        at org.apache.hadoop.hbase.master.ServerManager.regionServerStartup(ServerManager.java:153)
        at org.apache.hadoop.hbase.master.HMaster.regionServerStartup(HMaster.java:860)
        at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1376)

        at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:918)
        at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150)
        at $Proxy8.regionServerStartup(Unknown Source)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:1847)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:666)
        at java.lang.Thread.run(Thread.java:722)
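
The rejection above is easier to read once the two server names are split
into their parts. A small sketch, assuming the usual host,port,startcode
layout where the start code is the server's start time in milliseconds since
the epoch (the class and helper names are made up for illustration):

```python
from datetime import datetime, timezone
from typing import NamedTuple


class ServerName(NamedTuple):
    host: str
    port: int
    startcode: int  # assumed: start time in milliseconds since the epoch

    @property
    def started(self) -> datetime:
        return datetime.fromtimestamp(self.startcode / 1000, tz=timezone.utc)


def parse_server_name(text: str) -> ServerName:
    """Split 'host,port,startcode' into typed fields."""
    host, port, startcode = text.split(",")
    return ServerName(host, int(port), int(startcode))


def same_host_port(a: ServerName, b: ServerName) -> bool:
    """True when two different instances claim the same host and port."""
    return (a.host, a.port) == (b.host, b.port) and a.startcode != b.startcode


new = parse_server_name("node1,60020,1344024290513")
old = parse_server_name("node1,60020,1343998593757")
print(same_host_port(new, old), old.started, new.started)
```

Under that assumption, the master still holds the instance started in the
morning while the restarted process reports a start code from the afternoon,
which matches the "already have ... registered" message.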

So I had to restart the master. I'm still not able to start it, but I
think I just need to stop everything cleanly and restart...

JM

Re: Never ending distributed log split

Posted by Jean-Daniel Cryans <jd...@apache.org>.
On Fri, Aug 3, 2012 at 8:15 AM, Jean-Marc Spaggiari
<je...@spaggiari.org> wrote:
> Me again ;)
>
> I did some more investigation.

It would really help to see the region server log although the fsck
output might be enough.

BTW you'll find 0.94.1 RC1 here:
http://people.apache.org/~larsh/hbase-0.94.1-rc1/

>
> And I found that:
>
> http://pastebin.com/Bedm6Ldy
>
> Seems that no region is serving my logs. That's strange because all my
> servers are up and fsck is telling me that FS is clean.

I don't get the "Seems that no region is serving my logs" part. A
region doesn't serve logs, it serves HFiles. You meant to say
DataNode?

>
> Can I just delete those files? What's the impact of such a delete? I
> don't really worry about losing some data. It's a test environment.
> But I really need it to start again.

I wonder if it's related to: https://issues.apache.org/jira/browse/HBASE-6401

Did you remove a datanode from the cluster as part of the maintenance?

If you want you can probably move that folder aside but whatever was
in those logs is lost (if there ever was anything) until it gets
replayed properly.
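
As a rough illustration of "moving the folder aside", the sketch below only
builds and prints the commands; it runs nothing. The choice of source
directory (the node1 -splitting folder holding the unreadable file) and the
/hbase-sideline destination are assumptions made for this example, not
something stated in the thread.

```python
import shlex

# Assumed source: the -splitting directory that holds the unreadable file.
SPLITTING_DIR = "/hbase/.logs/node1,60020,1343908057567-splitting"
# Hypothetical destination outside /hbase/.logs; any unused path would do.
SIDELINE_ROOT = "/hbase-sideline"


def sideline_commands(src: str, sideline_root: str):
    """Build (but do not run) the commands that move src out of the way."""
    name = src.rstrip("/").rsplit("/", 1)[-1]
    return [
        ["hadoop", "fs", "-mkdir", sideline_root],
        ["hadoop", "fs", "-mv", src, f"{sideline_root}/{name}"],
    ]


commands = sideline_commands(SPLITTING_DIR, SIDELINE_ROOT)
for argv in commands:
    print(shlex.join(argv))
```

Moving rather than deleting keeps the files around in case the edits in them
need to be inspected or replayed later.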

Kinda weird that a file wouldn't have any blocks like that, would be
interesting to see the log of the region server that created it.

J-D

Re: Never ending distributed log split

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Me again ;)

I did some more investigation.

And I found that:

http://pastebin.com/Bedm6Ldy

Seems that no region is serving my logs. That's strange because all my
servers are up and fsck is telling me that FS is clean.

Can I just delete those files? What's the impact of such a delete? I
don't really worry about losing some data. It's a test environment.
But I really need it to start again.

Thanks,

JM


Re: Never ending distributed log split

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Here is the complete log. And it seems it's every 30 seconds and not
every 20 seconds...

http://pastebin.com/gMiURnnj
