Posted to mapreduce-issues@hadoop.apache.org by "Arpit Gupta (Commented) (JIRA)" <ji...@apache.org> on 2012/02/01 21:55:00 UTC

[jira] [Commented] (MAPREDUCE-3782) teragen terasort jobs fail when using webhdfs://

    [ https://issues.apache.org/jira/browse/MAPREDUCE-3782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13198136#comment-13198136 ] 

Arpit Gupta commented on MAPREDUCE-3782:
----------------------------------------

Here are the steps to reproduce the issue (make sure WebHDFS is enabled).

1. Run teragen:

{code}
bin/hadoop --config HADOOP_CONF_DIR jar hadoop-mapreduce-examples-0.23.1-SNAPSHOT.jar teragen 10000 webhdfs://NN:50070/user/path
{code}

This prints the following:
{code}
12/02/01 20:03:33 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 158 for hrt_qa on 98.137.234.231:8020
12/02/01 20:03:33 INFO security.TokenCache: Got dt for hdfs://NN:8020;uri=IP:8020;t.service=IP:8020
{code}

Since a webhdfs:// URL was provided, a webhdfs delegation token should have been obtained, but the log shows an HDFS_DELEGATION_TOKEN instead.
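The expected behavior is that the filesystem scheme of the supplied URI drives which kind of delegation token is requested. A minimal illustrative sketch of that mapping (the method name and kind strings here are hypothetical, not Hadoop's actual API):

```java
import java.net.URI;

public class TokenKindForScheme {
    // Hypothetical helper: derive the delegation token kind from the
    // URI scheme, which is what a webhdfs:// path should do.
    static String tokenKindFor(String uriString) {
        String scheme = URI.create(uriString).getScheme();
        if ("webhdfs".equals(scheme)) {
            return "WEBHDFS delegation";
        } else if ("hdfs".equals(scheme)) {
            return "HDFS_DELEGATION_TOKEN";
        }
        return "UNKNOWN";
    }

    public static void main(String[] args) {
        // A webhdfs:// input path should map to a webhdfs token,
        // not the hdfs token seen in the log above.
        System.out.println(tokenKindFor("webhdfs://NN:50070/user/path"));
        System.out.println(tokenKindFor("hdfs://NN:8020/user/path"));
    }
}
```

The bug is that the scheme is being ignored at token-acquisition time, so the hdfs kind is fetched regardless.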


2. Run terasort:

{code}
bin/hadoop --config HADOOP_CONF_DIR jar hadoop-mapreduce-examples-0.23.1-SNAPSHOT.jar terasort webhdfs://NN:50070/user/path webhdfs://NN:50070/user/path2
{code}

This fetches two delegation tokens, one webhdfs and one hdfs:

{code}
12/02/01 20:03:48 INFO terasort.TeraSort: starting
12/02/01 20:03:49 INFO security.TokenCache: Got dt for webhdfs://NN:50070;uri=IP:50070;t.service=IP:50070
Spent 65ms computing base-splits.
Spent 2ms computing TeraScheduler splits.
Computing input splits took 67ms
Sampling 2 splits of 2
Making 1 from 10000 sampled records
Computing parititions took 1668ms
Spent 1740ms computing partitions.
12/02/01 20:03:51 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 161 for USER on 98.137.234.231:8020
12/02/01 20:03:51 INFO security.TokenCache: Got dt for hdfs://NN:8020;uri=IP:8020;t.service=IP:8020
{code}

Both tokens should be webhdfs delegation tokens.

The job then fails with a java.io.IOException. Below is the stack trace:

{code}
2/02/01 20:03:54 INFO mapreduce.Job: Job job_1328054538421_0037 failed with state FAILED due to: Application application_1328054538421_0037 failed 1 times due to AM Container for appattempt_1328054538421_0037_000001 exited with  exitCode: -1000 due to: RemoteTrace: 
java.io.IOException: Offset=0 out of the range [0, 0); OPEN, path=/path/terajobs/output/_partition.lst
        at org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:167)
        at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:267)
        at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$000(WebHdfsFileSystem.java:105)
        at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$OffsetUrlInputStream.checkResponseCode(WebHdfsFileSystem.java:676)
        at org.apache.hadoop.hdfs.ByteRangeInputStream.getInputStream(ByteRangeInputStream.java:106)
        at org.apache.hadoop.hdfs.ByteRangeInputStream.read(ByteRangeInputStream.java:130)
        at java.io.InputStream.read(InputStream.java:154)
        at java.io.DataInputStream.read(DataInputStream.java:83)
        at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:75)
        at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:49)
        at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:109)
        at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:260)
        at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:232)
        at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:183)
        at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1837)
        at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1806)
        at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1782)
        at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:95)
        at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:49)
        at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:157)
        at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:155)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
        at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:153)
        at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
 at LocalTrace: 
        org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: Offset=0 out of the range [0, 0); OPEN, path=/user/hrt_qa/hdfsRegressionData/terajobs/output/_partition.lst
        at org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb.LocalResourceStatusPBImpl.convertFromProtoFormat(LocalResourceStatusPBImpl.java:217)
        at org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb.LocalResourceStatusPBImpl.getException(LocalResourceStatusPBImpl.java:147)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.update(ResourceLocalizationService.java:827)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.processHeartbeat(ResourceLocalizationService.java:497)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.heartbeat(ResourceLocalizationService.java:222)
        at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.service.LocalizationProtocolPBServiceImpl.heartbeat(LocalizationProtocolPBServiceImpl.java:46)
        at org.apache.hadoop.yarn.proto.LocalizationProtocol$LocalizationProtocolService$2.callBlockingMethod(LocalizationProtocol.java:57)
        at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Server.call(ProtoOverHadoopRpcEngine.java:342)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1493)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1489)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1487)
{code}
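The "Offset=0 out of the range [0, 0)" message suggests the localized _partition.lst was seen as zero-length: with a half-open valid range [0, fileLength), even offset 0 is rejected when fileLength is 0. A minimal sketch of such a range check (hypothetical; this is not the actual WebHdfsFileSystem logic, just a reconstruction of the check implied by the message):

```java
public class OffsetRangeCheck {
    // Hypothetical reconstruction of a half-open range check of the
    // form implied by the RemoteException message: valid offsets are
    // [0, fileLength), so an empty file rejects every offset.
    static String checkOffset(long offset, long fileLength) {
        if (offset < 0 || offset >= fileLength) {
            return "Offset=" + offset
                + " out of the range [0, " + fileLength + ")";
        }
        return "ok";
    }

    public static void main(String[] args) {
        // A zero-length _partition.lst triggers the failure at offset 0.
        System.out.println(checkOffset(0, 0));
        System.out.println(checkOffset(0, 10));
    }
}
```

This points at the partition file appearing empty when read back over webhdfs, consistent with it having been written or resolved through the wrong token/filesystem pairing.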


                
> teragen terasort jobs fail when using webhdfs:// 
> -------------------------------------------------
>
>                 Key: MAPREDUCE-3782
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3782
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.1, 0.24.0
>            Reporter: Arpit Gupta
>
> When running a teragen job with a webhdfs:// URL, the delegation token that is retrieved is an hdfs delegation token,
> and the subsequent terasort job on its output fails with a java.io.IOException.

--
This message is automatically generated by JIRA.