Posted to mapreduce-user@hadoop.apache.org by David Parks <da...@yahoo.com> on 2013/05/21 09:30:10 UTC

Recovering the namenode from failure

I'm on CDH4, and trying to recover both the namenode and cloudera manager
VMs from HDFS after losing the namenode.

 

All of our backup VMs are on HDFS, so for the moment I just want to hack
something together, copy the backup VMs off HDFS and get on with properly
reconfiguring via CDH Manager.

 

So I've installed a plain 'ol namenode on one of my cluster nodes and
started it with -importCheckpoint (with the data from the secondary NN),
this seems to have worked, I have a namenode web UI up which expects to find
32178 blocks.

 

But my plain namenode (on the same hostname and IP as the old namenode) says
that there are no datanodes in the cluster.

 

What do I need in order to configure the datanodes to report their blocks
into this new namenode (same IP & hostname)?

 

Thanks,

David
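
Once the recovered namenode is serving and the datanodes have re-registered, the
backup VM images can be pulled off HDFS either with "hadoop fs -get" or
programmatically. Below is a minimal sketch using the standard FileSystem API;
the NameNode URI, the source directory and the local destination are
placeholders, not the actual cluster layout.

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CopyBackupsOffHdfs {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Placeholder URI -- must match the recovered namenode's host and RPC port.
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode-host:8020"), conf);

        // Placeholder paths: backup images on HDFS, local directory to copy them to.
        Path src = new Path("/backups/vms");
        Path dst = new Path("file:///data/recovered-vms");

        // copyToLocalFile(delSrc=false, src, dst) copies recursively and leaves
        // the HDFS copy in place.
        fs.copyToLocalFile(false, src, dst);
        fs.close();
    }
}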

 


Re: Recovering the namenode from failure

Posted by Harsh J <ha...@cloudera.com>.
I think he's mentioned that the new NN has the same IP and hostname as the old
one, and uses an actual checkpoint. All he has to do is start the DNs back up
again and they should report in fine.


On Tue, May 21, 2013 at 10:03 PM, Michael Segel
<mi...@hotmail.com> wrote:

> I think what he's missing is to change the configurations to point to the
> new name node.
>
> It sounds like the new NN has a different IP address from the old NN so
> the DNs don't know who to report to...
>
> On May 21, 2013, at 11:23 AM, Todd Lipcon <to...@cloudera.com> wrote:
>
> Hi David,
>
> You shouldn't need to do anything to get your DNs to report in -- as best
> they can tell, it's the same NN. Do you see any error messages in the DN
> logs?
>
> -Todd
>
> On Tue, May 21, 2013 at 12:30 AM, David Parks <da...@yahoo.com> wrote:
>
>> I'm on CDH4, and trying to recover both the namenode and cloudera manager
>> VMs from HDFS after losing the namenode.
>>
>> All of our backup VMs are on HDFS, so for the moment I just want to hack
>> something together, copy the backup VMs off HDFS and get on with properly
>> reconfiguring via CDH Manager.
>>
>> So I've installed a plain 'ol namenode on one of my cluster nodes and
>> started it with -importCheckpoint (with the data from the secondary NN),
>> this seems to have worked, I have a namenode web UI up which expects to
>> find 32178 blocks.
>>
>> But my plain namenode (on the same hostname and IP as the old namenode)
>> says that there are no datanodes in the cluster.
>>
>> What do I need in order to configure the datanodes to report their blocks
>> into this new namenode (same IP & hostname)?
>>
>> Thanks,
>>
>> David
>>
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>
>
>


-- 
Harsh J
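
Whether the restarted datanodes have actually re-registered with the recovered
namenode can be checked with "hdfs dfsadmin -report", or programmatically as in
the sketch below (the NameNode URI is a placeholder):

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

public class ListRegisteredDatanodes {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Placeholder URI for the recovered namenode.
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode-host:8020"), conf);
        DistributedFileSystem dfs = (DistributedFileSystem) fs;

        // Each entry is a datanode the namenode currently knows about; an empty
        // array matches the "no datanodes in the cluster" symptom above.
        for (DatanodeInfo dn : dfs.getDataNodeStats()) {
            System.out.println(dn.getHostName());
            System.out.println(dn.getDatanodeReport());
        }
        fs.close();
    }
}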


Re: Recovering the namenode from failure

Posted by Michael Segel <mi...@hotmail.com>.
I think what he's missing is to change the configurations to point to the new name node. 

It sounds like the new NN has a different IP address from the old NN so the DNs don't know who to report to... 

On May 21, 2013, at 11:23 AM, Todd Lipcon <to...@cloudera.com> wrote:

> Hi David,
> 
> You shouldn't need to do anything to get your DNs to report in -- as best they can tell, it's the same NN. Do you see any error messages in the DN logs?
> 
> -Todd
> 
> On Tue, May 21, 2013 at 12:30 AM, David Parks <da...@yahoo.com> wrote:
> I'm on CDH4, and trying to recover both the namenode and cloudera manager VMs from HDFS after losing the namenode.
> 
>  
> 
> All of our backup VMs are on HDFS, so for the moment I just want to hack something together, copy the backup VMs off HDFS and get on with properly reconfiguring via CDH Manager.
> 
>  
> 
> So I've installed a plain 'ol namenode on one of my cluster nodes and started it with -importCheckpoint (with the data from the secondary NN), this seems to have worked, I have a namenode web UI up which expects to find 32178 blocks.
> 
>  
> 
> But my plain namenode (on the same hostname and IP as the old namenode) says that there are no datanodes in the cluster.
> 
>  
> 
> What do I need in order to configure the datanodes to report their blocks into this new namenode (same IP & hostname)?
> 
>  
> 
> Thanks,
> 
> David
> 
>  
> 
> 
> 
> 
> -- 
> Todd Lipcon
> Software Engineer, Cloudera
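
For the configuration angle raised above: a datanode finds its namenode through
dfs.namenode.rpc-address or, failing that, fs.defaultFS (fs.default.name in
older configs) in its local core-site.xml / hdfs-site.xml. One quick way to see
what those resolve to on a given node is a sketch like the following -- it is
illustrative only and assumes the Hadoop configuration files are on the
classpath:

import org.apache.hadoop.conf.Configuration;

public class PrintNameNodeAddress {
    public static void main(String[] args) {
        // new Configuration() picks up core-default.xml and core-site.xml from
        // the classpath; hdfs-site.xml is added explicitly here.
        Configuration conf = new Configuration();
        conf.addResource("hdfs-site.xml");

        System.out.println("fs.defaultFS             = " + conf.get("fs.defaultFS"));
        System.out.println("fs.default.name          = " + conf.get("fs.default.name"));
        System.out.println("dfs.namenode.rpc-address = " + conf.get("dfs.namenode.rpc-address"));
    }
}

If these point at an old namenode address the datanodes will not report in; in
this thread they should already match, since the replacement namenode reuses the
same hostname and IP.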




Errors using MultipleOutputs and LZO compression

Posted by Sanjay Subramanian <Sa...@wizecommerce.com>.
Subject : Errors using MultipleOutputs and LZO compression
============================================
Hi

In Cloudera Manager 4.1.2, we have defined an MR action in Oozie through the Hue interface.
This MR action is designed to read GZIP input files (typically 350+ gzip files ranging from 20MB to 200MB gzip size) and output either GZIP or LZO files.
In the reducer, MultipleOutputs is used to write the output.

Success Usecases
==============
1. mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec
2. mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.SnappyCodec

3. mapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzopCodec
    Reduced set of input gzip files (5-10 gzip files only)

Failure Usecase
============
1. mapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzopCodec
    350 gzip input files

ERRORs in Logs
=============
2013-05-20 16:05:44,849 ERROR [Thread-2] org.apache.hadoop.hdfs.DFSClient: Failed to close file /user/sasubramanian/impressions/output/outpdir/2013-03-19/0000044-130515165107614-oozie-oozi-W/_temporary/1/_temporary/attempt_1368666339740_5579_r_000011_3/header/2013-03-19/ieeuu3.pv.ie.nextag.com/part-r-00011.lzo
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /user/sasubramanian/impressions/output/outpdir/2013-03-19/0000044-130515165107614-oozie-oozi-W/_temporary/1/_temporary/attempt_1368666339740_5579_r_000011_3/header/2013-03-19/ieeuu3.pv.ie.nextag.com/part-r-00011.lzo File does not exist. [Lease.  Holder: DFSClient_attempt_1368666339740_5579_r_000011_3_-1369131598_1, pendingcreates: 3]
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2308)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2299)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:2366)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2343)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:526)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:335)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44084)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)

at org.apache.hadoop.ipc.Client.call(Client.java:1160)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
at $Proxy10.complete(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
at $Proxy10.complete(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:329)
at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:1769)
at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1756)
at org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:654)
at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:671)
at org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:539)
at org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2308)
at org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2324)
at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
2013-05-20 16:05:44,850 WARN [Thread-895] org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception

Need your help
thanks


sanjay
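
For context on the setup described above, the sketch below shows the usual shape
of a MultipleOutputs reducer with LZO-compressed output. It is illustrative
only: the class, the "header" named output and the paths are hypothetical, and
com.hadoop.compression.lzo.LzopCodec assumes the hadoop-lzo libraries are
installed on the cluster. One frequently reported trigger for the
LeaseExpiredException / "Failed to close file" pattern with MultipleOutputs is
not closing it in cleanup(); another is killed or speculative task attempts
racing on the same temporary output files, which the _r_000011_3 (fourth)
attempt in the log may hint at.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class LzoMultipleOutputsSketch {

    // Reducer that writes records to a named output via MultipleOutputs.
    public static class MOReducer extends Reducer<Text, Text, Text, Text> {
        private MultipleOutputs<Text, Text> mos;

        @Override
        protected void setup(Context context) {
            mos = new MultipleOutputs<Text, Text>(context);
        }

        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            for (Text value : values) {
                // "header" is a hypothetical named output; the base path routes
                // records into per-key subdirectories under the job output dir.
                mos.write("header", key, value, "header/" + key.toString());
            }
        }

        @Override
        protected void cleanup(Context context) throws IOException, InterruptedException {
            // Flushes and closes the underlying HDFS streams; omitting this is a
            // common source of lease/close errors at task shutdown.
            mos.close();
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "lzo-multiple-outputs-sketch");
        job.setJarByClass(LzoMultipleOutputsSketch.class);

        // Identity mapper; KeyValueTextInputFormat yields Text/Text pairs that
        // match the reducer's input types, and gzip input is decompressed
        // transparently.
        job.setInputFormatClass(KeyValueTextInputFormat.class);
        job.setReducerClass(MOReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        // Compress the job output with the LZOP codec (requires hadoop-lzo).
        FileOutputFormat.setCompressOutput(job, true);
        FileOutputFormat.setOutputCompressorClass(job,
                com.hadoop.compression.lzo.LzopCodec.class);

        // Named output matching the mos.write() call in the reducer.
        MultipleOutputs.addNamedOutput(job, "header", TextOutputFormat.class,
                Text.class, Text.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}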




Re: Recovering the namenode from failure

Posted by Todd Lipcon <to...@cloudera.com>.
Hi David,

You shouldn't need to do anything to get your DNs to report in -- as best
they can tell, it's the same NN. Do you see any error messages in the DN
logs?

-Todd

On Tue, May 21, 2013 at 12:30 AM, David Parks <da...@yahoo.com> wrote:

> I'm on CDH4, and trying to recover both the namenode and cloudera manager
> VMs from HDFS after losing the namenode.
>
> All of our backup VMs are on HDFS, so for the moment I just want to hack
> something together, copy the backup VMs off HDFS and get on with properly
> reconfiguring via CDH Manager.
>
> So I've installed a plain 'ol namenode on one of my cluster nodes and
> started it with -importCheckpoint (with the data from the secondary NN),
> this seems to have worked, I have a namenode web UI up which expects to
> find 32178 blocks.
>
> But my plain namenode (on the same hostname and IP as the old namenode)
> says that there are no datanodes in the cluster.
>
> What do I need in order to configure the datanodes to report their blocks
> into this new namenode (same IP & hostname)?
>
> Thanks,
>
> David
>



-- 
Todd Lipcon
Software Engineer, Cloudera
