Posted to common-dev@hadoop.apache.org by Scott Bush <sb...@yahoo-inc.com> on 2011/10/04 21:40:10 UTC
0.20.205.0 Release Candidate 1 Testing
Matt,
Yahoo testing has found one issue so far that we believe is a blocker in 0.20.205.0 RC1:
https://issues.apache.org/jira/browse/HDFS-2392 – distcp with hftp is failing
If there is an RC2, we would also like the following two JIRAs to be considered as candidates:
1. https://issues.apache.org/jira/browse/MAPREDUCE-2777 – CPU usage counters
2. https://issues.apache.org/jira/browse/HDFS-2358 – NPE when the default filesystem's uri has no authority
MAPREDUCE-2777 was originally included in 205 RC1 but was pulled due to (I believe) unrelated unit test failures that are no longer an issue.
Thanks,
Scott Bush
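For context, HDFS-2358 concerns a default filesystem URI that lacks an authority (the host:port part). A minimal core-site.xml sketch of the two URI shapes; the hostname and port below are placeholders, not values from the JIRA:

```xml
<!-- core-site.xml: a default filesystem URI *with* an authority (host:port). -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://namenode.example.com:8020</value>
</property>

<!-- An authority-less URI such as hdfs:/// is the shape that
     triggered the NPE reported in HDFS-2358. -->
```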
Re: 0.20.205.0 Release Candidate 1 Testing
Posted by Matt Foley <mf...@hortonworks.com>.
Thanks for the good catch!
On Thu, Oct 6, 2011 at 8:05 AM, Stack <st...@duboce.net> wrote:
> Thanks lads,
> St.Ack
Re: 0.20.205.0 Release Candidate 1 Testing
Posted by Stack <st...@duboce.net>.
Thanks lads,
St.Ack
On Thu, Oct 6, 2011 at 12:50 AM, Suresh Srinivas <su...@hortonworks.com> wrote:
> This change has been already committed to 205 and 20-security (thanks
> Aaron).
Re: 0.20.205.0 Release Candidate 1 Testing
Posted by Suresh Srinivas <su...@hortonworks.com>.
This change has been already committed to 205 and 20-security (thanks
Aaron).
On Wed, Oct 5, 2011 at 11:09 PM, Stack <st...@duboce.net> wrote:
> I found what I think is a critical issue w/ RC1. May we get
> https://issues.apache.org/jira/browse/HDFS-2408 into RC2?
>
> Thanks,
> St.Ack
Re: 0.20.205.0 Release Candidate 1 Testing
Posted by Stack <st...@duboce.net>.
I found what I think is a critical issue w/ RC1. May we get
https://issues.apache.org/jira/browse/HDFS-2408 into RC2?
Thanks,
St.Ack
On Tue, Oct 4, 2011 at 11:24 PM, Stack <st...@duboce.net> wrote:
>
> That seems to have been it. I see logs reporting 0 size but I'm
> getting edits out of them now. Thanks. Will keep banging on it.
> St.Ack
>
Re: 0.20.205.0 Release Candidate 1 Testing
Posted by Stack <st...@duboce.net>.
On Tue, Oct 4, 2011 at 10:05 PM, Stack <st...@duboce.net> wrote:
> On Tue, Oct 4, 2011 at 8:59 PM, Suresh Srinivas <su...@hortonworks.com> wrote:
>> Have you turned on the append supported flag in the configuration of HDFS?
>> Also can you please provide steps to duplicate this problem?
>>
That seems to have been it. I see logs reporting 0 size but I'm
getting edits out of them now. Thanks. Will keep banging on it.
St.Ack
Re: 0.20.205.0 Release Candidate 1 Testing
Posted by Stack <st...@duboce.net>.
On Tue, Oct 4, 2011 at 8:59 PM, Suresh Srinivas <su...@hortonworks.com> wrote:
> Have you turned on the append supported flag in the configuration of HDFS?
> Also can you please provide steps to duplicate this problem?
>
Thanks, Suresh:
I had it client-side only. Let me turn it on all-around. I'll be back....
St.Ack
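The flag Suresh mentions is presumably dfs.support.append, which in the 0.20 line has to be enabled on the HDFS servers as well as on the client (matching what Stack found above). A minimal hdfs-site.xml sketch:

```xml
<!-- hdfs-site.xml on the NameNode and DataNodes (and on the HBase side):
     enables the sync/append support that HBase WAL recovery relies on
     in the 0.20 line. -->
<property>
  <name>dfs.support.append</name>
  <value>true</value>
</property>
```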
Re: 0.20.205.0 Release Candidate 1 Testing
Posted by Suresh Srinivas <su...@hortonworks.com>.
Stack,
Have you turned on the append supported flag in the configuration of HDFS?
Also can you please provide steps to duplicate this problem?
Regards,
Suresh
On Tue, Oct 4, 2011 at 4:40 PM, Stack <st...@duboce.net> wrote:
> On Tue, Oct 4, 2011 at 3:41 PM, Matt Foley <mf...@hortonworks.com> wrote:
> > I am going to spin an RC2 early tomorrow. Does anyone have other issues
> > they consider critical for 205.0?
>
> I've been playing with it.
>
> When recovering the lease on an open file (an HBase WAL), the length is
> always zero and I don't seem to be able to recover any edits from the
> file we were writing at the time of the crash:
>
> ....
> Its probably me misconfiguring 205 compared to 0.20-append.
> ....
Re: 0.20.205.0 Release Candidate 1 Testing
Posted by Stack <st...@duboce.net>.
On Tue, Oct 4, 2011 at 3:41 PM, Matt Foley <mf...@hortonworks.com> wrote:
> I am going to spin an RC2 early tomorrow. Does anyone have other issues
> they consider critical for 205.0?
I've been playing with it.
When recovering the lease on an open file (an HBase WAL), the length is
always zero and I don't seem to be able to recover any edits from the
file we were writing at the time of the crash:
2011-10-04 21:17:04,486 DEBUG
org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Splitting hlog
34 of 34: hdfs://sv4r11s38:7000/hbase/.logs/sv4r8s38,7003,1317760866490/sv4r8s38%3A7003.1317762914728,
length=0
2011-10-04 21:17:04,486 INFO org.apache.hadoop.hbase.util.FSUtils:
Recovering file
hdfs://sv4r11s38:7000/hbase/.logs/sv4r8s38,7003,1317760866490/sv4r8s38%3A7003.1317762914728
2011-10-04 21:17:05,487 INFO org.apache.hadoop.hbase.util.FSUtils:
Finished lease recover attempt for
hdfs://sv4r11s38:7000/hbase/.logs/sv4r8s38,7003,1317760866490/sv4r8s38%3A7003.1317762914728
2011-10-04 21:17:05,488 WARN
org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: File
hdfs://sv4r11s38:7000/hbase/.logs/sv4r8s38,7003,1317760866490/sv4r8s38%3A7003.1317762914728
might be still open, length is 0
It's probably me misconfiguring 205 compared to 0.20-append.
I got some of these too, though I'd just opened the file a few seconds earlier:
2011-10-04 21:16:28,439 DEBUG
org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Creating writer
path=hdfs://sv4r11s38:7000/hbase/TestTable/62ff2cb514838519e5fa4282a8af4c35/recovered.edits/0000000000000008111
region=62ff2cb514838519e5fa4282a8af4c35
....
2011-10-04 21:17:06,883 ERROR
org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Couldn't close
log at hdfs://sv4r11s38:7000/hbase/TestTable/62ff2cb514838519e5fa4282a8af4c35/recovered.edits/0000000000000008111
org.apache.hadoop.ipc.RemoteException:
org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease
on /hbase/TestTable/62ff2cb514838519e5fa4282a8af4c35/recovered.edits/0000000000000008111
File does not exist. [Lease. Holder:
DFSClient_hb_m_sv4r11s38:7001_1317760883384, pendingcreates: 3]
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1604)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1595)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:1650)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:1638)
at org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:682)
at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)
at org.apache.hadoop.ipc.Client.call(Client.java:1066)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
at $Proxy6.complete(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy6.complete(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:3711)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3626)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:61)
at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:86)
at org.apache.hadoop.io.SequenceFile$Writer.close(SequenceFile.java:966)
at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.close(SequenceFileLogWriter.java:138)
at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$OutputSink.closeStreams(HLogSplitter.java:768)
at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$OutputSink.finishWritingAndClose(HLogSplitter.java:753)
at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLog(HLogSplitter.java:300)
at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLog(HLogSplitter.java:188)
at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:201)
at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:153)
at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:156)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
I'll keep banging at it.
St.Ack
Re: 0.20.205.0 Release Candidate 1 Testing
Posted by Matt Foley <mf...@hortonworks.com>.
Thanks, Scott. We have also found a couple of critical issues that should be
fixed in 205.0:
HDFS-2392 - Dist with hftp is failing again
HDFS-2395 - webhdfs api's should return a root element in the json
response
I am going to spin an RC2 early tomorrow. Does anyone have other issues
they consider critical for 205.0?
--Matt