You are viewing a plain text version of this content. The canonical link for it is here.
Posted to mapreduce-issues@hadoop.apache.org by "Aaron Baff (JIRA)" <ji...@apache.org> on 2011/05/04 00:09:04 UTC

[jira] [Created] (MAPREDUCE-2470) Receiving NPE occasionally on RunningJob.getCounters() call

Receiving NPE occasionally on RunningJob.getCounters() call
-----------------------------------------------------------

                 Key: MAPREDUCE-2470
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2470
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: client
    Affects Versions: 0.21.0
         Environment: FreeBSD, Java6, Hadoop r0.21.0
            Reporter: Aaron Baff


This is running in a Java daemon that is used as an interface (Thrift) to get information and data from MR Jobs. Using JobClient.getJob(JobID) I successfully get a RunningJob object (I'm checking for NULL), and then rarely I get an NPE when I do RunningJob.getCounters(). This seems to occur after the daemon has been up and running for a while, and in the event of an Exception, I close the JobClient, set it to NULL, and a new one should then be created on the next request for data. Yet, I still seem to be unable to fetch the Counters. Below is the stack trace.


java.lang.NullPointerException
            at org.apache.hadoop.mapred.Counters.downgrade(Counters.java:77)
            at org.apache.hadoop.mapred.JobClient$NetworkedJob.getCounters(JobClient.java:381)
            at com.telescope.HadoopThrift.service.ServiceImpl.getReportResults(ServiceImpl.java:350)
            at com.telescope.HadoopThrift.gen.HadoopThrift$Processor$getReportResults.process(HadoopThrift.java:545)
            at com.telescope.HadoopThrift.gen.HadoopThrift$Processor.process(HadoopThrift.java:421)
            at org.apache.thrift.server.TNonblockingServer$FrameBuffer.invoke(TNonblockingServer.java:697)
            at org.apache.thrift.server.THsHaServer$Invocation.run(THsHaServer.java:317)
            at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
            at java.lang.Thread.run(Thread.java:619)


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-2470) Receiving NPE occasionally on RunningJob.getCounters() call

Posted by "Robert Joseph Evans (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032596#comment-13032596 ] 

Robert Joseph Evans commented on MAPREDUCE-2470:
------------------------------------------------

The contrib test failure is org.apache.hadoop.raid.TestRaidShellFsck.testFileBlockAndParityBlockMissingHar2  which has been failing for a while and is a known issue.

> Receiving NPE occasionally on RunningJob.getCounters() call
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2470
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2470
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.21.0
>         Environment: FreeBSD, Java6, Hadoop r0.21.0
>            Reporter: Aaron Baff
>            Assignee: Robert Joseph Evans
>         Attachments: MAPREDUCE-2470-v1.patch, counters_null_data.pcap
>
>
> This is running in a Java daemon that is used as an interface (Thrift) to get information and data from MR Jobs. Using JobClient.getJob(JobID) I successfully get a RunningJob object (I'm checking for NULL), and then rarely I get an NPE when I do RunningJob.getCounters(). This seems to occur after the daemon has been up and running for a while, and in the event of an Exception, I close the JobClient, set it to NULL, and a new one should then be created on the next request for data. Yet, I still seem to be unable to fetch the Counters. Below is the stack trace.
> java.lang.NullPointerException
>             at org.apache.hadoop.mapred.Counters.downgrade(Counters.java:77)
>             at org.apache.hadoop.mapred.JobClient$NetworkedJob.getCounters(JobClient.java:381)
>             at com.telescope.HadoopThrift.service.ServiceImpl.getReportResults(ServiceImpl.java:350)
>             at com.telescope.HadoopThrift.gen.HadoopThrift$Processor$getReportResults.process(HadoopThrift.java:545)
>             at com.telescope.HadoopThrift.gen.HadoopThrift$Processor.process(HadoopThrift.java:421)
>             at org.apache.thrift.server.TNonblockingServer$FrameBuffer.invoke(TNonblockingServer.java:697)
>             at org.apache.thrift.server.THsHaServer$Invocation.run(THsHaServer.java:317)
>             at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>             at java.lang.Thread.run(Thread.java:619)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-2470) Receiving NPE occasionally on RunningJob.getCounters() call

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034447#comment-13034447 ] 

Chris Douglas commented on MAPREDUCE-2470:
------------------------------------------

Minor comments on v1:
* Typo in {{testNullCounterDoengrade}}
* {{TestCounters}} might be a better location for the unit test than {{TestJobCounters}}, which is an integration test (starts a {{MiniMRCluster}}, etc.)

More generally: this would not match the other API downgrade methods (e.g. the \*ID, \*Status, and \*Report classes), which throw NPEs. Would it make sense for the client to handle the {{null}} RPC, instead?

> Receiving NPE occasionally on RunningJob.getCounters() call
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2470
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2470
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.21.0
>         Environment: FreeBSD, Java6, Hadoop r0.21.0
>            Reporter: Aaron Baff
>            Assignee: Robert Joseph Evans
>         Attachments: MAPREDUCE-2470-v1.patch, counters_null_data.pcap
>
>
> This is running in a Java daemon that is used as an interface (Thrift) to get information and data from MR Jobs. Using JobClient.getJob(JobID) I successfully get a RunningJob object (I'm checking for NULL), and then rarely I get an NPE when I do RunningJob.getCounters(). This seems to occur after the daemon has been up and running for a while, and in the event of an Exception, I close the JobClient, set it to NULL, and a new one should then be created on the next request for data. Yet, I still seem to be unable to fetch the Counters. Below is the stack trace.
> java.lang.NullPointerException
>             at org.apache.hadoop.mapred.Counters.downgrade(Counters.java:77)
>             at org.apache.hadoop.mapred.JobClient$NetworkedJob.getCounters(JobClient.java:381)
>             at com.telescope.HadoopThrift.service.ServiceImpl.getReportResults(ServiceImpl.java:350)
>             at com.telescope.HadoopThrift.gen.HadoopThrift$Processor$getReportResults.process(HadoopThrift.java:545)
>             at com.telescope.HadoopThrift.gen.HadoopThrift$Processor.process(HadoopThrift.java:421)
>             at org.apache.thrift.server.TNonblockingServer$FrameBuffer.invoke(TNonblockingServer.java:697)
>             at org.apache.thrift.server.THsHaServer$Invocation.run(THsHaServer.java:317)
>             at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>             at java.lang.Thread.run(Thread.java:619)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-2470) Receiving NPE occasionally on RunningJob.getCounters() call

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039054#comment-13039054 ] 

Hudson commented on MAPREDUCE-2470:
-----------------------------------

Integrated in Hadoop-Mapreduce-trunk-Commit #700 (See [https://builds.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/700/])
    MAPREDUCE-2470. Fix NPE in RunningJobs::getCounters.
Contributed by Robert Joseph Evans

cdouglas : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1127444
Files : 
* /hadoop/mapreduce/trunk/CHANGES.txt
* /hadoop/mapreduce/trunk/src/test/mapred/org/apache/hadoop/mapred/TestNetworkedJob.java
* /hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapreduce/protocol/ClientProtocol.java
* /hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapred/RunningJob.java
* /hadoop/mapreduce/trunk/src/test/system/test/org/apache/hadoop/mapred/TestTaskKilling.java
* /hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapred/JobClient.java


> Receiving NPE occasionally on RunningJob.getCounters() call
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2470
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2470
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.21.0
>         Environment: FreeBSD, Java6, Hadoop r0.21.0
>            Reporter: Aaron Baff
>            Assignee: Robert Joseph Evans
>             Fix For: 0.23.0
>
>         Attachments: MAPREDUCE-2470-v1.patch, MAPREDUCE-2470-v2.patch, MAPREDUCE-2470-v3.patch, counters_null_data.pcap
>
>
> This is running in a Java daemon that is used as an interface (Thrift) to get information and data from MR Jobs. Using JobClient.getJob(JobID) I successfully get a RunningJob object (I'm checking for NULL), and then rarely I get an NPE when I do RunningJob.getCounters(). This seems to occur after the daemon has been up and running for a while, and in the event of an Exception, I close the JobClient, set it to NULL, and a new one should then be created on the next request for data. Yet, I still seem to be unable to fetch the Counters. Below is the stack trace.
> java.lang.NullPointerException
>             at org.apache.hadoop.mapred.Counters.downgrade(Counters.java:77)
>             at org.apache.hadoop.mapred.JobClient$NetworkedJob.getCounters(JobClient.java:381)
>             at com.telescope.HadoopThrift.service.ServiceImpl.getReportResults(ServiceImpl.java:350)
>             at com.telescope.HadoopThrift.gen.HadoopThrift$Processor$getReportResults.process(HadoopThrift.java:545)
>             at com.telescope.HadoopThrift.gen.HadoopThrift$Processor.process(HadoopThrift.java:421)
>             at org.apache.thrift.server.TNonblockingServer$FrameBuffer.invoke(TNonblockingServer.java:697)
>             at org.apache.thrift.server.THsHaServer$Invocation.run(THsHaServer.java:317)
>             at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>             at java.lang.Thread.run(Thread.java:619)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-2470) Receiving NPE occasionally on RunningJob.getCounters() call

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037406#comment-13037406 ] 

Hudson commented on MAPREDUCE-2470:
-----------------------------------

Integrated in Hadoop-Mapreduce-trunk #686 (See [https://builds.apache.org/hudson/job/Hadoop-Mapreduce-trunk/686/])
    Revert MAPREDUCE-2470
MAPREDUCE-2470. Fix NPE in RunningJobs::getCounters.
Contributed by Robert Joseph Evans

cdouglas : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1125599
Files : 
* /hadoop/mapreduce/trunk/CHANGES.txt
* /hadoop/mapreduce/trunk/src/test/mapred/org/apache/hadoop/mapred/TestNetworkedJob.java
* /hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapreduce/protocol/ClientProtocol.java
* /hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapred/RunningJob.java
* /hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapred/JobClient.java

cdouglas : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1125578
Files : 
* /hadoop/mapreduce/trunk/CHANGES.txt
* /hadoop/mapreduce/trunk/src/test/mapred/org/apache/hadoop/mapred/TestNetworkedJob.java
* /hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapreduce/protocol/ClientProtocol.java
* /hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapred/RunningJob.java
* /hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapred/JobClient.java


> Receiving NPE occasionally on RunningJob.getCounters() call
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2470
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2470
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.21.0
>         Environment: FreeBSD, Java6, Hadoop r0.21.0
>            Reporter: Aaron Baff
>            Assignee: Robert Joseph Evans
>             Fix For: 0.23.0
>
>         Attachments: MAPREDUCE-2470-v1.patch, MAPREDUCE-2470-v2.patch, counters_null_data.pcap
>
>
> This is running in a Java daemon that is used as an interface (Thrift) to get information and data from MR Jobs. Using JobClient.getJob(JobID) I successfully get a RunningJob object (I'm checking for NULL), and then rarely I get an NPE when I do RunningJob.getCounters(). This seems to occur after the daemon has been up and running for a while, and in the event of an Exception, I close the JobClient, set it to NULL, and a new one should then be created on the next request for data. Yet, I still seem to be unable to fetch the Counters. Below is the stack trace.
> java.lang.NullPointerException
>             at org.apache.hadoop.mapred.Counters.downgrade(Counters.java:77)
>             at org.apache.hadoop.mapred.JobClient$NetworkedJob.getCounters(JobClient.java:381)
>             at com.telescope.HadoopThrift.service.ServiceImpl.getReportResults(ServiceImpl.java:350)
>             at com.telescope.HadoopThrift.gen.HadoopThrift$Processor$getReportResults.process(HadoopThrift.java:545)
>             at com.telescope.HadoopThrift.gen.HadoopThrift$Processor.process(HadoopThrift.java:421)
>             at org.apache.thrift.server.TNonblockingServer$FrameBuffer.invoke(TNonblockingServer.java:697)
>             at org.apache.thrift.server.THsHaServer$Invocation.run(THsHaServer.java:317)
>             at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>             at java.lang.Thread.run(Thread.java:619)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (MAPREDUCE-2470) Receiving NPE occasionally on RunningJob.getCounters() call

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated MAPREDUCE-2470:
-------------------------------------

       Resolution: Fixed
    Fix Version/s: 0.23.0
     Hadoop Flags: [Reviewed]
           Status: Resolved  (was: Patch Available)

bq. The difference is that with Job.getStatus it calls internally updateStatus which will throw an IOException if it is unable to get an updated status from the JobTracker. The only time you can get an NPE from that is if you catch the IOException, ignore it, and keep trying to use the Job object what was originally created.

Got it. Thanks for the explanation.

+1 I committed this. Thanks!

> Receiving NPE occasionally on RunningJob.getCounters() call
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2470
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2470
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.21.0
>         Environment: FreeBSD, Java6, Hadoop r0.21.0
>            Reporter: Aaron Baff
>            Assignee: Robert Joseph Evans
>             Fix For: 0.23.0
>
>         Attachments: MAPREDUCE-2470-v1.patch, MAPREDUCE-2470-v2.patch, counters_null_data.pcap
>
>
> This is running in a Java daemon that is used as an interface (Thrift) to get information and data from MR Jobs. Using JobClient.getJob(JobID) I successfully get a RunningJob object (I'm checking for NULL), and then rarely I get an NPE when I do RunningJob.getCounters(). This seems to occur after the daemon has been up and running for a while, and in the event of an Exception, I close the JobClient, set it to NULL, and a new one should then be created on the next request for data. Yet, I still seem to be unable to fetch the Counters. Below is the stack trace.
> java.lang.NullPointerException
>             at org.apache.hadoop.mapred.Counters.downgrade(Counters.java:77)
>             at org.apache.hadoop.mapred.JobClient$NetworkedJob.getCounters(JobClient.java:381)
>             at com.telescope.HadoopThrift.service.ServiceImpl.getReportResults(ServiceImpl.java:350)
>             at com.telescope.HadoopThrift.gen.HadoopThrift$Processor$getReportResults.process(HadoopThrift.java:545)
>             at com.telescope.HadoopThrift.gen.HadoopThrift$Processor.process(HadoopThrift.java:421)
>             at org.apache.thrift.server.TNonblockingServer$FrameBuffer.invoke(TNonblockingServer.java:697)
>             at org.apache.thrift.server.THsHaServer$Invocation.run(THsHaServer.java:317)
>             at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>             at java.lang.Thread.run(Thread.java:619)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (MAPREDUCE-2470) Receiving NPE occasionally on RunningJob.getCounters() call

Posted by "Robert Joseph Evans (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated MAPREDUCE-2470:
-------------------------------------------

    Status: Open  (was: Patch Available)

Looks like Jenkins did not see the patch, so trying again

> Receiving NPE occasionally on RunningJob.getCounters() call
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2470
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2470
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.21.0
>         Environment: FreeBSD, Java6, Hadoop r0.21.0
>            Reporter: Aaron Baff
>            Assignee: Robert Joseph Evans
>         Attachments: MAPREDUCE-2470-v1.patch, counters_null_data.pcap
>
>
> This is running in a Java daemon that is used as an interface (Thrift) to get information and data from MR Jobs. Using JobClient.getJob(JobID) I successfully get a RunningJob object (I'm checking for NULL), and then rarely I get an NPE when I do RunningJob.getCounters(). This seems to occur after the daemon has been up and running for a while, and in the event of an Exception, I close the JobClient, set it to NULL, and a new one should then be created on the next request for data. Yet, I still seem to be unable to fetch the Counters. Below is the stack trace.
> java.lang.NullPointerException
>             at org.apache.hadoop.mapred.Counters.downgrade(Counters.java:77)
>             at org.apache.hadoop.mapred.JobClient$NetworkedJob.getCounters(JobClient.java:381)
>             at com.telescope.HadoopThrift.service.ServiceImpl.getReportResults(ServiceImpl.java:350)
>             at com.telescope.HadoopThrift.gen.HadoopThrift$Processor$getReportResults.process(HadoopThrift.java:545)
>             at com.telescope.HadoopThrift.gen.HadoopThrift$Processor.process(HadoopThrift.java:421)
>             at org.apache.thrift.server.TNonblockingServer$FrameBuffer.invoke(TNonblockingServer.java:697)
>             at org.apache.thrift.server.THsHaServer$Invocation.run(THsHaServer.java:317)
>             at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>             at java.lang.Thread.run(Thread.java:619)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-2470) Receiving NPE occasionally on RunningJob.getCounters() call

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038143#comment-13038143 ] 

Hadoop QA commented on MAPREDUCE-2470:
--------------------------------------

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12480124/MAPREDUCE-2470-v3.patch
  against trunk revision 1126499.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 6 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

    +1 system test framework.  The patch passed system test framework compile.

Test results: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/295//testReport/
Findbugs warnings: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/295//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/295//console

This message is automatically generated.

> Receiving NPE occasionally on RunningJob.getCounters() call
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2470
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2470
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.21.0
>         Environment: FreeBSD, Java6, Hadoop r0.21.0
>            Reporter: Aaron Baff
>            Assignee: Robert Joseph Evans
>             Fix For: 0.23.0
>
>         Attachments: MAPREDUCE-2470-v1.patch, MAPREDUCE-2470-v2.patch, MAPREDUCE-2470-v3.patch, counters_null_data.pcap
>
>
> This is running in a Java daemon that is used as an interface (Thrift) to get information and data from MR Jobs. Using JobClient.getJob(JobID) I successfully get a RunningJob object (I'm checking for NULL), and then rarely I get an NPE when I do RunningJob.getCounters(). This seems to occur after the daemon has been up and running for a while, and in the event of an Exception, I close the JobClient, set it to NULL, and a new one should then be created on the next request for data. Yet, I still seem to be unable to fetch the Counters. Below is the stack trace.
> java.lang.NullPointerException
>             at org.apache.hadoop.mapred.Counters.downgrade(Counters.java:77)
>             at org.apache.hadoop.mapred.JobClient$NetworkedJob.getCounters(JobClient.java:381)
>             at com.telescope.HadoopThrift.service.ServiceImpl.getReportResults(ServiceImpl.java:350)
>             at com.telescope.HadoopThrift.gen.HadoopThrift$Processor$getReportResults.process(HadoopThrift.java:545)
>             at com.telescope.HadoopThrift.gen.HadoopThrift$Processor.process(HadoopThrift.java:421)
>             at org.apache.thrift.server.TNonblockingServer$FrameBuffer.invoke(TNonblockingServer.java:697)
>             at org.apache.thrift.server.THsHaServer$Invocation.run(THsHaServer.java:317)
>             at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>             at java.lang.Thread.run(Thread.java:619)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-2470) Receiving NPE occasionally on RunningJob.getCounters() call

Posted by "Robert Joseph Evans (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032401#comment-13032401 ] 

Robert Joseph Evans commented on MAPREDUCE-2470:
------------------------------------------------

Yes I totally agree, we need to update the documentation, and make sure no NPE happens.  I also wanted to look at the other methods in RunningJob to be sure that they all behave correctly after the job has been retired.  I don't know if there is a way to look it up in memory and return it even after the job has been retired, I will look at it and see.

> Receiving NPE occasionally on RunningJob.getCounters() call
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2470
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2470
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.21.0
>         Environment: FreeBSD, Java6, Hadoop r0.21.0
>            Reporter: Aaron Baff
>            Assignee: Robert Joseph Evans
>         Attachments: counters_null_data.pcap
>
>
> This is running in a Java daemon that is used as an interface (Thrift) to get information and data from MR Jobs. Using JobClient.getJob(JobID) I successfully get a RunningJob object (I'm checking for NULL), and then rarely I get an NPE when I do RunningJob.getCounters(). This seems to occur after the daemon has been up and running for a while, and in the event of an Exception, I close the JobClient, set it to NULL, and a new one should then be created on the next request for data. Yet, I still seem to be unable to fetch the Counters. Below is the stack trace.
> java.lang.NullPointerException
>             at org.apache.hadoop.mapred.Counters.downgrade(Counters.java:77)
>             at org.apache.hadoop.mapred.JobClient$NetworkedJob.getCounters(JobClient.java:381)
>             at com.telescope.HadoopThrift.service.ServiceImpl.getReportResults(ServiceImpl.java:350)
>             at com.telescope.HadoopThrift.gen.HadoopThrift$Processor$getReportResults.process(HadoopThrift.java:545)
>             at com.telescope.HadoopThrift.gen.HadoopThrift$Processor.process(HadoopThrift.java:421)
>             at org.apache.thrift.server.TNonblockingServer$FrameBuffer.invoke(TNonblockingServer.java:697)
>             at org.apache.thrift.server.THsHaServer$Invocation.run(THsHaServer.java:317)
>             at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>             at java.lang.Thread.run(Thread.java:619)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-2470) Receiving NPE occasionally on RunningJob.getCounters() call

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037181#comment-13037181 ] 

Hudson commented on MAPREDUCE-2470:
-----------------------------------

Integrated in Hadoop-Mapreduce-trunk-Commit #694 (See [https://builds.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/694/])
    Revert MAPREDUCE-2470

cdouglas : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1125599
Files : 
* /hadoop/mapreduce/trunk/CHANGES.txt
* /hadoop/mapreduce/trunk/src/test/mapred/org/apache/hadoop/mapred/TestNetworkedJob.java
* /hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapreduce/protocol/ClientProtocol.java
* /hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapred/RunningJob.java
* /hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapred/JobClient.java


> Receiving NPE occasionally on RunningJob.getCounters() call
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2470
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2470
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.21.0
>         Environment: FreeBSD, Java6, Hadoop r0.21.0
>            Reporter: Aaron Baff
>            Assignee: Robert Joseph Evans
>             Fix For: 0.23.0
>
>         Attachments: MAPREDUCE-2470-v1.patch, MAPREDUCE-2470-v2.patch, counters_null_data.pcap
>
>
> This is running in a Java daemon that is used as an interface (Thrift) to get information and data from MR Jobs. Using JobClient.getJob(JobID) I successfully get a RunningJob object (I'm checking for NULL), and then rarely I get an NPE when I do RunningJob.getCounters(). This seems to occur after the daemon has been up and running for a while, and in the event of an Exception, I close the JobClient, set it to NULL, and a new one should then be created on the next request for data. Yet, I still seem to be unable to fetch the Counters. Below is the stack trace.
> java.lang.NullPointerException
>             at org.apache.hadoop.mapred.Counters.downgrade(Counters.java:77)
>             at org.apache.hadoop.mapred.JobClient$NetworkedJob.getCounters(JobClient.java:381)
>             at com.telescope.HadoopThrift.service.ServiceImpl.getReportResults(ServiceImpl.java:350)
>             at com.telescope.HadoopThrift.gen.HadoopThrift$Processor$getReportResults.process(HadoopThrift.java:545)
>             at com.telescope.HadoopThrift.gen.HadoopThrift$Processor.process(HadoopThrift.java:421)
>             at org.apache.thrift.server.TNonblockingServer$FrameBuffer.invoke(TNonblockingServer.java:697)
>             at org.apache.thrift.server.THsHaServer$Invocation.run(THsHaServer.java:317)
>             at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>             at java.lang.Thread.run(Thread.java:619)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-2470) Receiving NPE occasionally on RunningJob.getCounters() call

Posted by "Robert Joseph Evans (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036811#comment-13036811 ] 

Robert Joseph Evans commented on MAPREDUCE-2470:
------------------------------------------------

The difference is that with Job.getStatus it calls internally updateStatus which will throw an IOException if it is unable to get an updated status from the JobTracker.  The only time you can get an NPE from that is if you catch the IOException, ignore it, and keep trying to use the Job object what was originally created.  We can fix that, by modifying updateStatus to not overwrite Job.status until it knows that the updated status is not null.  But that was not explicitly part of this JIRA, and I could not find any other JIRA to cover it so I though it was not a big deal, and most likely worked as designed. 

> Receiving NPE occasionally on RunningJob.getCounters() call
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2470
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2470
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.21.0
>         Environment: FreeBSD, Java6, Hadoop r0.21.0
>            Reporter: Aaron Baff
>            Assignee: Robert Joseph Evans
>         Attachments: MAPREDUCE-2470-v1.patch, MAPREDUCE-2470-v2.patch, counters_null_data.pcap
>
>
> This is running in a Java daemon that is used as an interface (Thrift) to get information and data from MR Jobs. Using JobClient.getJob(JobID) I successfully get a RunningJob object (I'm checking for NULL), and then rarely I get an NPE when I do RunningJob.getCounters(). This seems to occur after the daemon has been up and running for a while, and in the event of an Exception, I close the JobClient, set it to NULL, and a new one should then be created on the next request for data. Yet, I still seem to be unable to fetch the Counters. Below is the stack trace.
> java.lang.NullPointerException
>             at org.apache.hadoop.mapred.Counters.downgrade(Counters.java:77)
>             at org.apache.hadoop.mapred.JobClient$NetworkedJob.getCounters(JobClient.java:381)
>             at com.telescope.HadoopThrift.service.ServiceImpl.getReportResults(ServiceImpl.java:350)
>             at com.telescope.HadoopThrift.gen.HadoopThrift$Processor$getReportResults.process(HadoopThrift.java:545)
>             at com.telescope.HadoopThrift.gen.HadoopThrift$Processor.process(HadoopThrift.java:421)
>             at org.apache.thrift.server.TNonblockingServer$FrameBuffer.invoke(TNonblockingServer.java:697)
>             at org.apache.thrift.server.THsHaServer$Invocation.run(THsHaServer.java:317)
>             at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>             at java.lang.Thread.run(Thread.java:619)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-2470) Receiving NPE occasionally on RunningJob.getCounters() call

Posted by "Robert Joseph Evans (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034786#comment-13034786 ] 

Robert Joseph Evans commented on MAPREDUCE-2470:
------------------------------------------------

Sorry, I misunderstood the comment from before.  Yes I think it would be fine for us to not modify Counters.downgrade, and instead have NetworkedJob check for null to avoid calling downgrade at all.  

All other instances of calling *.downgrade from inside NetworedJob can never have a null passed to them, except possibly in the odd case where a null was explicitly set programatically, or possibly when the jobtracker completely loses a job and client.getJobStatus returns null and the first IOExeption is ignored. Either way it takes someone working hard to make it through an NPE.

I looked and could not find an appropriate place to add in the test to an existing file, so I will create a new Unit Test file under org.apache.hadoop.mapred for it.  Unless you know of a better place to put it.

> Receiving NPE occasionally on RunningJob.getCounters() call
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2470
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2470
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.21.0
>         Environment: FreeBSD, Java6, Hadoop r0.21.0
>            Reporter: Aaron Baff
>            Assignee: Robert Joseph Evans
>         Attachments: MAPREDUCE-2470-v1.patch, counters_null_data.pcap
>
>
> This is running in a Java daemon that is used as an interface (Thrift) to get information and data from MR Jobs. Using JobClient.getJob(JobID) I successfully get a RunningJob object (I'm checking for NULL), and then rarely I get an NPE when I do RunningJob.getCounters(). This seems to occur after the daemon has been up and running for a while, and in the event of an Exception, I close the JobClient, set it to NULL, and a new one should then be created on the next request for data. Yet, I still seem to be unable to fetch the Counters. Below is the stack trace.
> java.lang.NullPointerException
>             at org.apache.hadoop.mapred.Counters.downgrade(Counters.java:77)
>             at org.apache.hadoop.mapred.JobClient$NetworkedJob.getCounters(JobClient.java:381)
>             at com.telescope.HadoopThrift.service.ServiceImpl.getReportResults(ServiceImpl.java:350)
>             at com.telescope.HadoopThrift.gen.HadoopThrift$Processor$getReportResults.process(HadoopThrift.java:545)
>             at com.telescope.HadoopThrift.gen.HadoopThrift$Processor.process(HadoopThrift.java:421)
>             at org.apache.thrift.server.TNonblockingServer$FrameBuffer.invoke(TNonblockingServer.java:697)
>             at org.apache.thrift.server.THsHaServer$Invocation.run(THsHaServer.java:317)
>             at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>             at java.lang.Thread.run(Thread.java:619)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-2470) Receiving NPE occasionally on RunningJob.getCounters() call

Posted by "Robert Joseph Evans (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034763#comment-13034763 ] 

Robert Joseph Evans commented on MAPREDUCE-2470:
------------------------------------------------

I will update the typo and move the test, thanks for the catch.

I will also investigate the other Downgrade methods.  I believe that none of them will ever be called with a null, unless something is seriously wrong, but I will investigate to be sure.

> Receiving NPE occasionally on RunningJob.getCounters() call
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2470
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2470
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.21.0
>         Environment: FreeBSD, Java6, Hadoop r0.21.0
>            Reporter: Aaron Baff
>            Assignee: Robert Joseph Evans
>         Attachments: MAPREDUCE-2470-v1.patch, counters_null_data.pcap
>
>
> This is running in a Java daemon that is used as an interface (Thrift) to get information and data from MR Jobs. Using JobClient.getJob(JobID) I successfully get a RunningJob object (I'm checking for NULL), and then rarely I get an NPE when I do RunningJob.getCounters(). This seems to occur after the daemon has been up and running for a while, and in the event of an Exception, I close the JobClient, set it to NULL, and a new one should then be created on the next request for data. Yet, I still seem to be unable to fetch the Counters. Below is the stack trace.
> java.lang.NullPointerException
>             at org.apache.hadoop.mapred.Counters.downgrade(Counters.java:77)
>             at org.apache.hadoop.mapred.JobClient$NetworkedJob.getCounters(JobClient.java:381)
>             at com.telescope.HadoopThrift.service.ServiceImpl.getReportResults(ServiceImpl.java:350)
>             at com.telescope.HadoopThrift.gen.HadoopThrift$Processor$getReportResults.process(HadoopThrift.java:545)
>             at com.telescope.HadoopThrift.gen.HadoopThrift$Processor.process(HadoopThrift.java:421)
>             at org.apache.thrift.server.TNonblockingServer$FrameBuffer.invoke(TNonblockingServer.java:697)
>             at org.apache.thrift.server.THsHaServer$Invocation.run(THsHaServer.java:317)
>             at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>             at java.lang.Thread.run(Thread.java:619)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (MAPREDUCE-2470) Receiving NPE occasionally on RunningJob.getCounters() call

Posted by "Robert Joseph Evans (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated MAPREDUCE-2470:
-------------------------------------------

    Status: Patch Available  (was: Open)

> Receiving NPE occasionally on RunningJob.getCounters() call
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2470
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2470
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.21.0
>         Environment: FreeBSD, Java6, Hadoop r0.21.0
>            Reporter: Aaron Baff
>            Assignee: Robert Joseph Evans
>         Attachments: MAPREDUCE-2470-v1.patch, counters_null_data.pcap
>
>
> This is running in a Java daemon that is used as an interface (Thrift) to get information and data from MR Jobs. Using JobClient.getJob(JobID) I successfully get a RunningJob object (I'm checking for NULL), and then rarely I get an NPE when I do RunningJob.getCounters(). This seems to occur after the daemon has been up and running for a while, and in the event of an Exception, I close the JobClient, set it to NULL, and a new one should then be created on the next request for data. Yet, I still seem to be unable to fetch the Counters. Below is the stack trace.
> java.lang.NullPointerException
>             at org.apache.hadoop.mapred.Counters.downgrade(Counters.java:77)
>             at org.apache.hadoop.mapred.JobClient$NetworkedJob.getCounters(JobClient.java:381)
>             at com.telescope.HadoopThrift.service.ServiceImpl.getReportResults(ServiceImpl.java:350)
>             at com.telescope.HadoopThrift.gen.HadoopThrift$Processor$getReportResults.process(HadoopThrift.java:545)
>             at com.telescope.HadoopThrift.gen.HadoopThrift$Processor.process(HadoopThrift.java:421)
>             at org.apache.thrift.server.TNonblockingServer$FrameBuffer.invoke(TNonblockingServer.java:697)
>             at org.apache.thrift.server.THsHaServer$Invocation.run(THsHaServer.java:317)
>             at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>             at java.lang.Thread.run(Thread.java:619)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-2470) Receiving NPE occasionally on RunningJob.getCounters() call

Posted by "Robert Joseph Evans (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031816#comment-13031816 ] 

Robert Joseph Evans commented on MAPREDUCE-2470:
------------------------------------------------

We saw this issue a little while ago through pig, but I had trouble reproducing it with pig.  I am looking into it now, and wanted to know if you could reproduce it easily and if so how?

> Receiving NPE occasionally on RunningJob.getCounters() call
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2470
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2470
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.21.0
>         Environment: FreeBSD, Java6, Hadoop r0.21.0
>            Reporter: Aaron Baff
>         Attachments: counters_null_data.pcap
>
>
> This is running in a Java daemon that is used as an interface (Thrift) to get information and data from MR Jobs. Using JobClient.getJob(JobID) I successfully get a RunningJob object (I'm checking for NULL), and then rarely I get an NPE when I do RunningJob.getCounters(). This seems to occur after the daemon has been up and running for a while, and in the event of an Exception, I close the JobClient, set it to NULL, and a new one should then be created on the next request for data. Yet, I still seem to be unable to fetch the Counters. Below is the stack trace.
> java.lang.NullPointerException
>             at org.apache.hadoop.mapred.Counters.downgrade(Counters.java:77)
>             at org.apache.hadoop.mapred.JobClient$NetworkedJob.getCounters(JobClient.java:381)
>             at com.telescope.HadoopThrift.service.ServiceImpl.getReportResults(ServiceImpl.java:350)
>             at com.telescope.HadoopThrift.gen.HadoopThrift$Processor$getReportResults.process(HadoopThrift.java:545)
>             at com.telescope.HadoopThrift.gen.HadoopThrift$Processor.process(HadoopThrift.java:421)
>             at org.apache.thrift.server.TNonblockingServer$FrameBuffer.invoke(TNonblockingServer.java:697)
>             at org.apache.thrift.server.THsHaServer$Invocation.run(THsHaServer.java:317)
>             at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>             at java.lang.Thread.run(Thread.java:619)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (MAPREDUCE-2470) Receiving NPE occasionally on RunningJob.getCounters() call

Posted by "Robert Joseph Evans (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated MAPREDUCE-2470:
-------------------------------------------

    Attachment: MAPREDUCE-2470-v3.patch

Fixed issues with fault injection build.

> Receiving NPE occasionally on RunningJob.getCounters() call
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2470
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2470
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.21.0
>         Environment: FreeBSD, Java6, Hadoop r0.21.0
>            Reporter: Aaron Baff
>            Assignee: Robert Joseph Evans
>             Fix For: 0.23.0
>
>         Attachments: MAPREDUCE-2470-v1.patch, MAPREDUCE-2470-v2.patch, MAPREDUCE-2470-v3.patch, counters_null_data.pcap
>
>
> This is running in a Java daemon that is used as an interface (Thrift) to get information and data from MR Jobs. Using JobClient.getJob(JobID) I successfully get a RunningJob object (I'm checking for NULL), and then rarely I get an NPE when I do RunningJob.getCounters(). This seems to occur after the daemon has been up and running for a while, and in the event of an Exception, I close the JobClient, set it to NULL, and a new one should then be created on the next request for data. Yet, I still seem to be unable to fetch the Counters. Below is the stack trace.
> java.lang.NullPointerException
>             at org.apache.hadoop.mapred.Counters.downgrade(Counters.java:77)
>             at org.apache.hadoop.mapred.JobClient$NetworkedJob.getCounters(JobClient.java:381)
>             at com.telescope.HadoopThrift.service.ServiceImpl.getReportResults(ServiceImpl.java:350)
>             at com.telescope.HadoopThrift.gen.HadoopThrift$Processor$getReportResults.process(HadoopThrift.java:545)
>             at com.telescope.HadoopThrift.gen.HadoopThrift$Processor.process(HadoopThrift.java:421)
>             at org.apache.thrift.server.TNonblockingServer$FrameBuffer.invoke(TNonblockingServer.java:697)
>             at org.apache.thrift.server.THsHaServer$Invocation.run(THsHaServer.java:317)
>             at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>             at java.lang.Thread.run(Thread.java:619)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Assigned] (MAPREDUCE-2470) Receiving NPE occasionally on RunningJob.getCounters() call

Posted by "Robert Joseph Evans (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans reassigned MAPREDUCE-2470:
----------------------------------------------

    Assignee: Robert Joseph Evans

> Receiving NPE occasionally on RunningJob.getCounters() call
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2470
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2470
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.21.0
>         Environment: FreeBSD, Java6, Hadoop r0.21.0
>            Reporter: Aaron Baff
>            Assignee: Robert Joseph Evans
>         Attachments: counters_null_data.pcap
>
>
> This is running in a Java daemon that is used as an interface (Thrift) to get information and data from MR Jobs. Using JobClient.getJob(JobID) I successfully get a RunningJob object (I'm checking for NULL), and then rarely I get an NPE when I do RunningJob.getCounters(). This seems to occur after the daemon has been up and running for a while, and in the event of an Exception, I close the JobClient, set it to NULL, and a new one should then be created on the next request for data. Yet, I still seem to be unable to fetch the Counters. Below is the stack trace.
> java.lang.NullPointerException
>             at org.apache.hadoop.mapred.Counters.downgrade(Counters.java:77)
>             at org.apache.hadoop.mapred.JobClient$NetworkedJob.getCounters(JobClient.java:381)
>             at com.telescope.HadoopThrift.service.ServiceImpl.getReportResults(ServiceImpl.java:350)
>             at com.telescope.HadoopThrift.gen.HadoopThrift$Processor$getReportResults.process(HadoopThrift.java:545)
>             at com.telescope.HadoopThrift.gen.HadoopThrift$Processor.process(HadoopThrift.java:421)
>             at org.apache.thrift.server.TNonblockingServer$FrameBuffer.invoke(TNonblockingServer.java:697)
>             at org.apache.thrift.server.THsHaServer$Invocation.run(THsHaServer.java:317)
>             at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>             at java.lang.Thread.run(Thread.java:619)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-2470) Receiving NPE occasionally on RunningJob.getCounters() call

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032581#comment-13032581 ] 

Hadoop QA commented on MAPREDUCE-2470:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12478982/MAPREDUCE-2470-v1.patch
  against trunk revision 1102363.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    -1 contrib tests.  The patch failed contrib unit tests.

    +1 system test framework.  The patch passed system test framework compile.

Test results: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/239//testReport/
Findbugs warnings: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/239//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/239//console

This message is automatically generated.

> Receiving NPE occasionally on RunningJob.getCounters() call
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2470
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2470
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.21.0
>         Environment: FreeBSD, Java6, Hadoop r0.21.0
>            Reporter: Aaron Baff
>            Assignee: Robert Joseph Evans
>         Attachments: MAPREDUCE-2470-v1.patch, counters_null_data.pcap
>
>
> This is running in a Java daemon that is used as an interface (Thrift) to get information and data from MR Jobs. Using JobClient.getJob(JobID) I successfully get a RunningJob object (I'm checking for NULL), and then rarely I get an NPE when I do RunningJob.getCounters(). This seems to occur after the daemon has been up and running for a while, and in the event of an Exception, I close the JobClient, set it to NULL, and a new one should then be created on the next request for data. Yet, I still seem to be unable to fetch the Counters. Below is the stack trace.
> java.lang.NullPointerException
>             at org.apache.hadoop.mapred.Counters.downgrade(Counters.java:77)
>             at org.apache.hadoop.mapred.JobClient$NetworkedJob.getCounters(JobClient.java:381)
>             at com.telescope.HadoopThrift.service.ServiceImpl.getReportResults(ServiceImpl.java:350)
>             at com.telescope.HadoopThrift.gen.HadoopThrift$Processor$getReportResults.process(HadoopThrift.java:545)
>             at com.telescope.HadoopThrift.gen.HadoopThrift$Processor.process(HadoopThrift.java:421)
>             at org.apache.thrift.server.TNonblockingServer$FrameBuffer.invoke(TNonblockingServer.java:697)
>             at org.apache.thrift.server.THsHaServer$Invocation.run(THsHaServer.java:317)
>             at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>             at java.lang.Thread.run(Thread.java:619)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (MAPREDUCE-2470) Receiving NPE occasionally on RunningJob.getCounters() call

Posted by "Aaron Baff (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron Baff updated MAPREDUCE-2470:
----------------------------------

    Attachment: counters_null_data.pcap

Wireshark capture of RunningJob.getCounters() call.

> Receiving NPE occasionally on RunningJob.getCounters() call
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2470
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2470
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.21.0
>         Environment: FreeBSD, Java6, Hadoop r0.21.0
>            Reporter: Aaron Baff
>         Attachments: counters_null_data.pcap
>
>
> This is running in a Java daemon that is used as an interface (Thrift) to get information and data from MR Jobs. Using JobClient.getJob(JobID) I successfully get a RunningJob object (I'm checking for NULL), and then rarely I get an NPE when I do RunningJob.getCounters(). This seems to occur after the daemon has been up and running for a while, and in the event of an Exception, I close the JobClient, set it to NULL, and a new one should then be created on the next request for data. Yet, I still seem to be unable to fetch the Counters. Below is the stack trace.
> java.lang.NullPointerException
>             at org.apache.hadoop.mapred.Counters.downgrade(Counters.java:77)
>             at org.apache.hadoop.mapred.JobClient$NetworkedJob.getCounters(JobClient.java:381)
>             at com.telescope.HadoopThrift.service.ServiceImpl.getReportResults(ServiceImpl.java:350)
>             at com.telescope.HadoopThrift.gen.HadoopThrift$Processor$getReportResults.process(HadoopThrift.java:545)
>             at com.telescope.HadoopThrift.gen.HadoopThrift$Processor.process(HadoopThrift.java:421)
>             at org.apache.thrift.server.TNonblockingServer$FrameBuffer.invoke(TNonblockingServer.java:697)
>             at org.apache.thrift.server.THsHaServer$Invocation.run(THsHaServer.java:317)
>             at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>             at java.lang.Thread.run(Thread.java:619)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-2470) Receiving NPE occasionally on RunningJob.getCounters() call

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037161#comment-13037161 ] 

Chris Douglas commented on MAPREDUCE-2470:
------------------------------------------

*sigh* Yea, it looks like the fault injection is broken. I'll revert it. Not sure why Hudson didn't flag that...

> Receiving NPE occasionally on RunningJob.getCounters() call
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2470
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2470
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.21.0
>         Environment: FreeBSD, Java6, Hadoop r0.21.0
>            Reporter: Aaron Baff
>            Assignee: Robert Joseph Evans
>             Fix For: 0.23.0
>
>         Attachments: MAPREDUCE-2470-v1.patch, MAPREDUCE-2470-v2.patch, counters_null_data.pcap
>
>
> This is running in a Java daemon that is used as an interface (Thrift) to get information and data from MR Jobs. Using JobClient.getJob(JobID) I successfully get a RunningJob object (I'm checking for NULL), and then rarely I get an NPE when I do RunningJob.getCounters(). This seems to occur after the daemon has been up and running for a while, and in the event of an Exception, I close the JobClient, set it to NULL, and a new one should then be created on the next request for data. Yet, I still seem to be unable to fetch the Counters. Below is the stack trace.
> java.lang.NullPointerException
>             at org.apache.hadoop.mapred.Counters.downgrade(Counters.java:77)
>             at org.apache.hadoop.mapred.JobClient$NetworkedJob.getCounters(JobClient.java:381)
>             at com.telescope.HadoopThrift.service.ServiceImpl.getReportResults(ServiceImpl.java:350)
>             at com.telescope.HadoopThrift.gen.HadoopThrift$Processor$getReportResults.process(HadoopThrift.java:545)
>             at com.telescope.HadoopThrift.gen.HadoopThrift$Processor.process(HadoopThrift.java:421)
>             at org.apache.thrift.server.TNonblockingServer$FrameBuffer.invoke(TNonblockingServer.java:697)
>             at org.apache.thrift.server.THsHaServer$Invocation.run(THsHaServer.java:317)
>             at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>             at java.lang.Thread.run(Thread.java:619)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Reopened] (MAPREDUCE-2470) Receiving NPE occasionally on RunningJob.getCounters() call

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas reopened MAPREDUCE-2470:
--------------------------------------


Reverted while FI breakage is investigated

> Receiving NPE occasionally on RunningJob.getCounters() call
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2470
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2470
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.21.0
>         Environment: FreeBSD, Java6, Hadoop r0.21.0
>            Reporter: Aaron Baff
>            Assignee: Robert Joseph Evans
>             Fix For: 0.23.0
>
>         Attachments: MAPREDUCE-2470-v1.patch, MAPREDUCE-2470-v2.patch, counters_null_data.pcap
>
>
> This is running in a Java daemon that is used as an interface (Thrift) to get information and data from MR Jobs. Using JobClient.getJob(JobID) I successfully get a RunningJob object (I'm checking for NULL), and then rarely I get an NPE when I do RunningJob.getCounters(). This seems to occur after the daemon has been up and running for a while, and in the event of an Exception, I close the JobClient, set it to NULL, and a new one should then be created on the next request for data. Yet, I still seem to be unable to fetch the Counters. Below is the stack trace.
> java.lang.NullPointerException
>             at org.apache.hadoop.mapred.Counters.downgrade(Counters.java:77)
>             at org.apache.hadoop.mapred.JobClient$NetworkedJob.getCounters(JobClient.java:381)
>             at com.telescope.HadoopThrift.service.ServiceImpl.getReportResults(ServiceImpl.java:350)
>             at com.telescope.HadoopThrift.gen.HadoopThrift$Processor$getReportResults.process(HadoopThrift.java:545)
>             at com.telescope.HadoopThrift.gen.HadoopThrift$Processor.process(HadoopThrift.java:421)
>             at org.apache.thrift.server.TNonblockingServer$FrameBuffer.invoke(TNonblockingServer.java:697)
>             at org.apache.thrift.server.THsHaServer$Invocation.run(THsHaServer.java:317)
>             at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>             at java.lang.Thread.run(Thread.java:619)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-2470) Receiving NPE occasionally on RunningJob.getCounters() call

Posted by "Robert Joseph Evans (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036814#comment-13036814 ] 

Robert Joseph Evans commented on MAPREDUCE-2470:
------------------------------------------------

Oh I forgot to add that all of the other downgrade methods in JobClient.java are similar to Job.getStatus, either they return values from the cached JobStatus which should never be null except if an IOException was caught and ignored, so they are not likely to cause an NPE.

> Receiving NPE occasionally on RunningJob.getCounters() call
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2470
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2470
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.21.0
>         Environment: FreeBSD, Java6, Hadoop r0.21.0
>            Reporter: Aaron Baff
>            Assignee: Robert Joseph Evans
>         Attachments: MAPREDUCE-2470-v1.patch, MAPREDUCE-2470-v2.patch, counters_null_data.pcap
>
>
> This is running in a Java daemon that is used as an interface (Thrift) to get information and data from MR Jobs. Using JobClient.getJob(JobID) I successfully get a RunningJob object (I'm checking for NULL), and then rarely I get an NPE when I do RunningJob.getCounters(). This seems to occur after the daemon has been up and running for a while, and in the event of an Exception, I close the JobClient, set it to NULL, and a new one should then be created on the next request for data. Yet, I still seem to be unable to fetch the Counters. Below is the stack trace.
> java.lang.NullPointerException
>             at org.apache.hadoop.mapred.Counters.downgrade(Counters.java:77)
>             at org.apache.hadoop.mapred.JobClient$NetworkedJob.getCounters(JobClient.java:381)
>             at com.telescope.HadoopThrift.service.ServiceImpl.getReportResults(ServiceImpl.java:350)
>             at com.telescope.HadoopThrift.gen.HadoopThrift$Processor$getReportResults.process(HadoopThrift.java:545)
>             at com.telescope.HadoopThrift.gen.HadoopThrift$Processor.process(HadoopThrift.java:421)
>             at org.apache.thrift.server.TNonblockingServer$FrameBuffer.invoke(TNonblockingServer.java:697)
>             at org.apache.thrift.server.THsHaServer$Invocation.run(THsHaServer.java:317)
>             at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>             at java.lang.Thread.run(Thread.java:619)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (MAPREDUCE-2470) Receiving NPE occasionally on RunningJob.getCounters() call

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated MAPREDUCE-2470:
-------------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

I committed the revised patch. Thanks, Robert!

> Receiving NPE occasionally on RunningJob.getCounters() call
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2470
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2470
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.21.0
>         Environment: FreeBSD, Java6, Hadoop r0.21.0
>            Reporter: Aaron Baff
>            Assignee: Robert Joseph Evans
>             Fix For: 0.23.0
>
>         Attachments: MAPREDUCE-2470-v1.patch, MAPREDUCE-2470-v2.patch, MAPREDUCE-2470-v3.patch, counters_null_data.pcap
>
>
> This is running in a Java daemon that is used as an interface (Thrift) to get information and data from MR Jobs. Using JobClient.getJob(JobID) I successfully get a RunningJob object (I'm checking for NULL), and then rarely I get an NPE when I do RunningJob.getCounters(). This seems to occur after the daemon has been up and running for a while, and in the event of an Exception, I close the JobClient, set it to NULL, and a new one should then be created on the next request for data. Yet, I still seem to be unable to fetch the Counters. Below is the stack trace.
> java.lang.NullPointerException
>             at org.apache.hadoop.mapred.Counters.downgrade(Counters.java:77)
>             at org.apache.hadoop.mapred.JobClient$NetworkedJob.getCounters(JobClient.java:381)
>             at com.telescope.HadoopThrift.service.ServiceImpl.getReportResults(ServiceImpl.java:350)
>             at com.telescope.HadoopThrift.gen.HadoopThrift$Processor$getReportResults.process(HadoopThrift.java:545)
>             at com.telescope.HadoopThrift.gen.HadoopThrift$Processor.process(HadoopThrift.java:421)
>             at org.apache.thrift.server.TNonblockingServer$FrameBuffer.invoke(TNonblockingServer.java:697)
>             at org.apache.thrift.server.THsHaServer$Invocation.run(THsHaServer.java:317)
>             at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>             at java.lang.Thread.run(Thread.java:619)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-2470) Receiving NPE occasionally on RunningJob.getCounters() call

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039166#comment-13039166 ] 

Hudson commented on MAPREDUCE-2470:
-----------------------------------

Integrated in Hadoop-Mapreduce-trunk #690 (See [https://builds.apache.org/hudson/job/Hadoop-Mapreduce-trunk/690/])
    MAPREDUCE-2470. Fix NPE in RunningJobs::getCounters.
Contributed by Robert Joseph Evans

cdouglas : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1127444
Files : 
* /hadoop/mapreduce/trunk/CHANGES.txt
* /hadoop/mapreduce/trunk/src/test/mapred/org/apache/hadoop/mapred/TestNetworkedJob.java
* /hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapreduce/protocol/ClientProtocol.java
* /hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapred/RunningJob.java
* /hadoop/mapreduce/trunk/src/test/system/test/org/apache/hadoop/mapred/TestTaskKilling.java
* /hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapred/JobClient.java


> Receiving NPE occasionally on RunningJob.getCounters() call
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2470
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2470
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.21.0
>         Environment: FreeBSD, Java6, Hadoop r0.21.0
>            Reporter: Aaron Baff
>            Assignee: Robert Joseph Evans
>             Fix For: 0.23.0
>
>         Attachments: MAPREDUCE-2470-v1.patch, MAPREDUCE-2470-v2.patch, MAPREDUCE-2470-v3.patch, counters_null_data.pcap
>
>
> This is running in a Java daemon that is used as an interface (Thrift) to get information and data from MR Jobs. Using JobClient.getJob(JobID) I successfully get a RunningJob object (I'm checking for NULL), and then rarely I get an NPE when I do RunningJob.getCounters(). This seems to occur after the daemon has been up and running for a while, and in the event of an Exception, I close the JobClient, set it to NULL, and a new one should then be created on the next request for data. Yet, I still seem to be unable to fetch the Counters. Below is the stack trace.
> java.lang.NullPointerException
>             at org.apache.hadoop.mapred.Counters.downgrade(Counters.java:77)
>             at org.apache.hadoop.mapred.JobClient$NetworkedJob.getCounters(JobClient.java:381)
>             at com.telescope.HadoopThrift.service.ServiceImpl.getReportResults(ServiceImpl.java:350)
>             at com.telescope.HadoopThrift.gen.HadoopThrift$Processor$getReportResults.process(HadoopThrift.java:545)
>             at com.telescope.HadoopThrift.gen.HadoopThrift$Processor.process(HadoopThrift.java:421)
>             at org.apache.thrift.server.TNonblockingServer$FrameBuffer.invoke(TNonblockingServer.java:697)
>             at org.apache.thrift.server.THsHaServer$Invocation.run(THsHaServer.java:317)
>             at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>             at java.lang.Thread.run(Thread.java:619)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-2470) Receiving NPE occasionally on RunningJob.getCounters() call

Posted by "Aaron Baff (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028623#comment-13028623 ] 

Aaron Baff commented on MAPREDUCE-2470:
---------------------------------------

This appears to occur only when the Job has been retired, and the JobInfo data has been removed from HDFS (default dir is /jobtracker/jobInfo I believe). In that case, apparently the getJobCounters() call returns NULL, instead of a Counters object with no Counter's in it. Thus, in the Counters.downgrade() for the old MR API, it doesn't check for NULL, instead just uses the results in a for-each loop.

> Receiving NPE occasionally on RunningJob.getCounters() call
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2470
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2470
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.21.0
>         Environment: FreeBSD, Java6, Hadoop r0.21.0
>            Reporter: Aaron Baff
>         Attachments: counters_null_data.pcap
>
>
> This is running in a Java daemon that is used as an interface (Thrift) to get information and data from MR Jobs. Using JobClient.getJob(JobID) I successfully get a RunningJob object (I'm checking for NULL), and then rarely I get an NPE when I do RunningJob.getCounters(). This seems to occur after the daemon has been up and running for a while, and in the event of an Exception, I close the JobClient, set it to NULL, and a new one should then be created on the next request for data. Yet, I still seem to be unable to fetch the Counters. Below is the stack trace.
> java.lang.NullPointerException
>             at org.apache.hadoop.mapred.Counters.downgrade(Counters.java:77)
>             at org.apache.hadoop.mapred.JobClient$NetworkedJob.getCounters(JobClient.java:381)
>             at com.telescope.HadoopThrift.service.ServiceImpl.getReportResults(ServiceImpl.java:350)
>             at com.telescope.HadoopThrift.gen.HadoopThrift$Processor$getReportResults.process(HadoopThrift.java:545)
>             at com.telescope.HadoopThrift.gen.HadoopThrift$Processor.process(HadoopThrift.java:421)
>             at org.apache.thrift.server.TNonblockingServer$FrameBuffer.invoke(TNonblockingServer.java:697)
>             at org.apache.thrift.server.THsHaServer$Invocation.run(THsHaServer.java:317)
>             at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>             at java.lang.Thread.run(Thread.java:619)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (MAPREDUCE-2470) Receiving NPE occasionally on RunningJob.getCounters() call

Posted by "Robert Joseph Evans (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated MAPREDUCE-2470:
-------------------------------------------

    Status: Open  (was: Patch Available)

Canceling the patch to address the comments to V1.  Will post V2 shortly.

> Receiving NPE occasionally on RunningJob.getCounters() call
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2470
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2470
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.21.0
>         Environment: FreeBSD, Java6, Hadoop r0.21.0
>            Reporter: Aaron Baff
>            Assignee: Robert Joseph Evans
>         Attachments: MAPREDUCE-2470-v1.patch, MAPREDUCE-2470-v2.patch, counters_null_data.pcap
>
>
> This is running in a Java daemon that is used as an interface (Thrift) to get information and data from MR Jobs. Using JobClient.getJob(JobID) I successfully get a RunningJob object (I'm checking for NULL), and then rarely I get an NPE when I do RunningJob.getCounters(). This seems to occur after the daemon has been up and running for a while, and in the event of an Exception, I close the JobClient, set it to NULL, and a new one should then be created on the next request for data. Yet, I still seem to be unable to fetch the Counters. Below is the stack trace.
> java.lang.NullPointerException
>             at org.apache.hadoop.mapred.Counters.downgrade(Counters.java:77)
>             at org.apache.hadoop.mapred.JobClient$NetworkedJob.getCounters(JobClient.java:381)
>             at com.telescope.HadoopThrift.service.ServiceImpl.getReportResults(ServiceImpl.java:350)
>             at com.telescope.HadoopThrift.gen.HadoopThrift$Processor$getReportResults.process(HadoopThrift.java:545)
>             at com.telescope.HadoopThrift.gen.HadoopThrift$Processor.process(HadoopThrift.java:421)
>             at org.apache.thrift.server.TNonblockingServer$FrameBuffer.invoke(TNonblockingServer.java:697)
>             at org.apache.thrift.server.THsHaServer$Invocation.run(THsHaServer.java:317)
>             at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>             at java.lang.Thread.run(Thread.java:619)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-2470) Receiving NPE occasionally on RunningJob.getCounters() call

Posted by "Aaron Baff (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036543#comment-13036543 ] 

Aaron Baff commented on MAPREDUCE-2470:
---------------------------------------

I haven't looked, but I could see that potentially being the case with all of the *.downgrade() methods. They might just assume that the new API returns an object no matter what, even if it doesn't contain any data.

> Receiving NPE occasionally on RunningJob.getCounters() call
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2470
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2470
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.21.0
>         Environment: FreeBSD, Java6, Hadoop r0.21.0
>            Reporter: Aaron Baff
>            Assignee: Robert Joseph Evans
>         Attachments: MAPREDUCE-2470-v1.patch, MAPREDUCE-2470-v2.patch, counters_null_data.pcap
>
>
> This is running in a Java daemon that is used as an interface (Thrift) to get information and data from MR Jobs. Using JobClient.getJob(JobID) I successfully get a RunningJob object (I'm checking for NULL), and then rarely I get an NPE when I do RunningJob.getCounters(). This seems to occur after the daemon has been up and running for a while, and in the event of an Exception, I close the JobClient, set it to NULL, and a new one should then be created on the next request for data. Yet, I still seem to be unable to fetch the Counters. Below is the stack trace.
> java.lang.NullPointerException
>             at org.apache.hadoop.mapred.Counters.downgrade(Counters.java:77)
>             at org.apache.hadoop.mapred.JobClient$NetworkedJob.getCounters(JobClient.java:381)
>             at com.telescope.HadoopThrift.service.ServiceImpl.getReportResults(ServiceImpl.java:350)
>             at com.telescope.HadoopThrift.gen.HadoopThrift$Processor$getReportResults.process(HadoopThrift.java:545)
>             at com.telescope.HadoopThrift.gen.HadoopThrift$Processor.process(HadoopThrift.java:421)
>             at org.apache.thrift.server.TNonblockingServer$FrameBuffer.invoke(TNonblockingServer.java:697)
>             at org.apache.thrift.server.THsHaServer$Invocation.run(THsHaServer.java:317)
>             at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>             at java.lang.Thread.run(Thread.java:619)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-2470) Receiving NPE occasionally on RunningJob.getCounters() call

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034920#comment-13034920 ] 

Hadoop QA commented on MAPREDUCE-2470:
--------------------------------------

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12479464/MAPREDUCE-2470-v2.patch
  against trunk revision 1103993.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

    +1 system test framework.  The patch passed system test framework compile.

Test results: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/256//testReport/
Findbugs warnings: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/256//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/256//console

This message is automatically generated.

> Receiving NPE occasionally on RunningJob.getCounters() call
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2470
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2470
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.21.0
>         Environment: FreeBSD, Java6, Hadoop r0.21.0
>            Reporter: Aaron Baff
>            Assignee: Robert Joseph Evans
>         Attachments: MAPREDUCE-2470-v1.patch, MAPREDUCE-2470-v2.patch, counters_null_data.pcap
>
>
> This is running in a Java daemon that is used as an interface (Thrift) to get information and data from MR Jobs. Using JobClient.getJob(JobID) I successfully get a RunningJob object (I'm checking for NULL), and then rarely I get an NPE when I do RunningJob.getCounters(). This seems to occur after the daemon has been up and running for a while, and in the event of an Exception, I close the JobClient, set it to NULL, and a new one should then be created on the next request for data. Yet, I still seem to be unable to fetch the Counters. Below is the stack trace.
> java.lang.NullPointerException
>             at org.apache.hadoop.mapred.Counters.downgrade(Counters.java:77)
>             at org.apache.hadoop.mapred.JobClient$NetworkedJob.getCounters(JobClient.java:381)
>             at com.telescope.HadoopThrift.service.ServiceImpl.getReportResults(ServiceImpl.java:350)
>             at com.telescope.HadoopThrift.gen.HadoopThrift$Processor$getReportResults.process(HadoopThrift.java:545)
>             at com.telescope.HadoopThrift.gen.HadoopThrift$Processor.process(HadoopThrift.java:421)
>             at org.apache.thrift.server.TNonblockingServer$FrameBuffer.invoke(TNonblockingServer.java:697)
>             at org.apache.thrift.server.THsHaServer$Invocation.run(THsHaServer.java:317)
>             at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>             at java.lang.Thread.run(Thread.java:619)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (MAPREDUCE-2470) Receiving NPE occasionally on RunningJob.getCounters() call

Posted by "Robert Joseph Evans (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated MAPREDUCE-2470:
-------------------------------------------

    Status: Patch Available  (was: Reopened)

> Receiving NPE occasionally on RunningJob.getCounters() call
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2470
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2470
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.21.0
>         Environment: FreeBSD, Java6, Hadoop r0.21.0
>            Reporter: Aaron Baff
>            Assignee: Robert Joseph Evans
>             Fix For: 0.23.0
>
>         Attachments: MAPREDUCE-2470-v1.patch, MAPREDUCE-2470-v2.patch, MAPREDUCE-2470-v3.patch, counters_null_data.pcap
>
>
> This is running in a Java daemon that is used as an interface (Thrift) to get information and data from MR Jobs. Using JobClient.getJob(JobID) I successfully get a RunningJob object (I'm checking for NULL), and then rarely I get an NPE when I do RunningJob.getCounters(). This seems to occur after the daemon has been up and running for a while, and in the event of an Exception, I close the JobClient, set it to NULL, and a new one should then be created on the next request for data. Yet, I still seem to be unable to fetch the Counters. Below is the stack trace.
> java.lang.NullPointerException
>             at org.apache.hadoop.mapred.Counters.downgrade(Counters.java:77)
>             at org.apache.hadoop.mapred.JobClient$NetworkedJob.getCounters(JobClient.java:381)
>             at com.telescope.HadoopThrift.service.ServiceImpl.getReportResults(ServiceImpl.java:350)
>             at com.telescope.HadoopThrift.gen.HadoopThrift$Processor$getReportResults.process(HadoopThrift.java:545)
>             at com.telescope.HadoopThrift.gen.HadoopThrift$Processor.process(HadoopThrift.java:421)
>             at org.apache.thrift.server.TNonblockingServer$FrameBuffer.invoke(TNonblockingServer.java:697)
>             at org.apache.thrift.server.THsHaServer$Invocation.run(THsHaServer.java:317)
>             at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>             at java.lang.Thread.run(Thread.java:619)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-2470) Receiving NPE occasionally on RunningJob.getCounters() call

Posted by "Aaron Baff (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032129#comment-13032129 ] 

Aaron Baff commented on MAPREDUCE-2470:
---------------------------------------

NULL is a reasonable return if it isn't able to get the counters. What if the JobTracker has the data cached in memory? Couldn't it be possible to return the values from there instead of the HDFS file? Or would that be a bunch of work to see if it is still in memory?

Just wanted to highlight that this is an NPE at org.apache.hadoop.mapred.Counters.downgrade(Counters.java:77), not just needing to get the Javadocs updated. Should be very quick and easy to simply add a if( counters == null ) { return null; } before the for-each loop.

> Receiving NPE occasionally on RunningJob.getCounters() call
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2470
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2470
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.21.0
>         Environment: FreeBSD, Java6, Hadoop r0.21.0
>            Reporter: Aaron Baff
>            Assignee: Robert Joseph Evans
>         Attachments: counters_null_data.pcap
>
>
> This is running in a Java daemon that is used as an interface (Thrift) to get information and data from MR Jobs. Using JobClient.getJob(JobID) I successfully get a RunningJob object (I'm checking for NULL), and then rarely I get an NPE when I do RunningJob.getCounters(). This seems to occur after the daemon has been up and running for a while, and in the event of an Exception, I close the JobClient, set it to NULL, and a new one should then be created on the next request for data. Yet, I still seem to be unable to fetch the Counters. Below is the stack trace.
> java.lang.NullPointerException
>             at org.apache.hadoop.mapred.Counters.downgrade(Counters.java:77)
>             at org.apache.hadoop.mapred.JobClient$NetworkedJob.getCounters(JobClient.java:381)
>             at com.telescope.HadoopThrift.service.ServiceImpl.getReportResults(ServiceImpl.java:350)
>             at com.telescope.HadoopThrift.gen.HadoopThrift$Processor$getReportResults.process(HadoopThrift.java:545)
>             at com.telescope.HadoopThrift.gen.HadoopThrift$Processor.process(HadoopThrift.java:421)
>             at org.apache.thrift.server.TNonblockingServer$FrameBuffer.invoke(TNonblockingServer.java:697)
>             at org.apache.thrift.server.THsHaServer$Invocation.run(THsHaServer.java:317)
>             at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>             at java.lang.Thread.run(Thread.java:619)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-2470) Receiving NPE occasionally on RunningJob.getCounters() call

Posted by "Robert Joseph Evans (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032114#comment-13032114 ] 

Robert Joseph Evans commented on MAPREDUCE-2470:
------------------------------------------------

That was the fix we were looking at.  Is making sure that RunningJob is documented properly to indicate what it will return for each API when the job is retired.  I was thinking simply to return a null for the counters, which is what the client API returns for that same situation.  I should be able to have a patch for it fairly quickly.

> Receiving NPE occasionally on RunningJob.getCounters() call
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2470
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2470
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.21.0
>         Environment: FreeBSD, Java6, Hadoop r0.21.0
>            Reporter: Aaron Baff
>         Attachments: counters_null_data.pcap
>
>
> This is running in a Java daemon that is used as an interface (Thrift) to get information and data from MR Jobs. Using JobClient.getJob(JobID) I successfully get a RunningJob object (I'm checking for NULL), and then rarely I get an NPE when I do RunningJob.getCounters(). This seems to occur after the daemon has been up and running for a while, and in the event of an Exception, I close the JobClient, set it to NULL, and a new one should then be created on the next request for data. Yet, I still seem to be unable to fetch the Counters. Below is the stack trace.
> java.lang.NullPointerException
>             at org.apache.hadoop.mapred.Counters.downgrade(Counters.java:77)
>             at org.apache.hadoop.mapred.JobClient$NetworkedJob.getCounters(JobClient.java:381)
>             at com.telescope.HadoopThrift.service.ServiceImpl.getReportResults(ServiceImpl.java:350)
>             at com.telescope.HadoopThrift.gen.HadoopThrift$Processor$getReportResults.process(HadoopThrift.java:545)
>             at com.telescope.HadoopThrift.gen.HadoopThrift$Processor.process(HadoopThrift.java:421)
>             at org.apache.thrift.server.TNonblockingServer$FrameBuffer.invoke(TNonblockingServer.java:697)
>             at org.apache.thrift.server.THsHaServer$Invocation.run(THsHaServer.java:317)
>             at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>             at java.lang.Thread.run(Thread.java:619)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-2470) Receiving NPE occasionally on RunningJob.getCounters() call

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036537#comment-13036537 ] 

Chris Douglas commented on MAPREDUCE-2470:
------------------------------------------

Is there another NPE in {{JobClient::getJob()}}? It looks like exactly the same issue, but with {{JobStatus}}.

> Receiving NPE occasionally on RunningJob.getCounters() call
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2470
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2470
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.21.0
>         Environment: FreeBSD, Java6, Hadoop r0.21.0
>            Reporter: Aaron Baff
>            Assignee: Robert Joseph Evans
>         Attachments: MAPREDUCE-2470-v1.patch, MAPREDUCE-2470-v2.patch, counters_null_data.pcap
>
>
> This is running in a Java daemon that is used as an interface (Thrift) to get information and data from MR Jobs. Using JobClient.getJob(JobID) I successfully get a RunningJob object (I'm checking for NULL), and then rarely I get an NPE when I do RunningJob.getCounters(). This seems to occur after the daemon has been up and running for a while, and in the event of an Exception, I close the JobClient, set it to NULL, and a new one should then be created on the next request for data. Yet, I still seem to be unable to fetch the Counters. Below is the stack trace.
> java.lang.NullPointerException
>             at org.apache.hadoop.mapred.Counters.downgrade(Counters.java:77)
>             at org.apache.hadoop.mapred.JobClient$NetworkedJob.getCounters(JobClient.java:381)
>             at com.telescope.HadoopThrift.service.ServiceImpl.getReportResults(ServiceImpl.java:350)
>             at com.telescope.HadoopThrift.gen.HadoopThrift$Processor$getReportResults.process(HadoopThrift.java:545)
>             at com.telescope.HadoopThrift.gen.HadoopThrift$Processor.process(HadoopThrift.java:421)
>             at org.apache.thrift.server.TNonblockingServer$FrameBuffer.invoke(TNonblockingServer.java:697)
>             at org.apache.thrift.server.THsHaServer$Invocation.run(THsHaServer.java:317)
>             at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>             at java.lang.Thread.run(Thread.java:619)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-2470) Receiving NPE occasionally on RunningJob.getCounters() call

Posted by "Robert Joseph Evans (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037940#comment-13037940 ] 

Robert Joseph Evans commented on MAPREDUCE-2470:
------------------------------------------------

I am very sorry about that.  I ran the tests after V1, but not V2 of the patch.  I will investigate the failures.

> Receiving NPE occasionally on RunningJob.getCounters() call
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2470
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2470
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.21.0
>         Environment: FreeBSD, Java6, Hadoop r0.21.0
>            Reporter: Aaron Baff
>            Assignee: Robert Joseph Evans
>             Fix For: 0.23.0
>
>         Attachments: MAPREDUCE-2470-v1.patch, MAPREDUCE-2470-v2.patch, counters_null_data.pcap
>
>
> This is running in a Java daemon that is used as an interface (Thrift) to get information and data from MR Jobs. Using JobClient.getJob(JobID) I successfully get a RunningJob object (I'm checking for NULL), and then rarely I get an NPE when I do RunningJob.getCounters(). This seems to occur after the daemon has been up and running for a while, and in the event of an Exception, I close the JobClient, set it to NULL, and a new one should then be created on the next request for data. Yet, I still seem to be unable to fetch the Counters. Below is the stack trace.
> java.lang.NullPointerException
>             at org.apache.hadoop.mapred.Counters.downgrade(Counters.java:77)
>             at org.apache.hadoop.mapred.JobClient$NetworkedJob.getCounters(JobClient.java:381)
>             at com.telescope.HadoopThrift.service.ServiceImpl.getReportResults(ServiceImpl.java:350)
>             at com.telescope.HadoopThrift.gen.HadoopThrift$Processor$getReportResults.process(HadoopThrift.java:545)
>             at com.telescope.HadoopThrift.gen.HadoopThrift$Processor.process(HadoopThrift.java:421)
>             at org.apache.thrift.server.TNonblockingServer$FrameBuffer.invoke(TNonblockingServer.java:697)
>             at org.apache.thrift.server.THsHaServer$Invocation.run(THsHaServer.java:317)
>             at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>             at java.lang.Thread.run(Thread.java:619)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (MAPREDUCE-2470) Receiving NPE occasionally on RunningJob.getCounters() call

Posted by "Robert Joseph Evans (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated MAPREDUCE-2470:
-------------------------------------------

    Status: Patch Available  (was: Open)

> Receiving NPE occasionally on RunningJob.getCounters() call
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2470
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2470
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.21.0
>         Environment: FreeBSD, Java6, Hadoop r0.21.0
>            Reporter: Aaron Baff
>            Assignee: Robert Joseph Evans
>         Attachments: MAPREDUCE-2470-v1.patch, MAPREDUCE-2470-v2.patch, counters_null_data.pcap
>
>
> This is running in a Java daemon that is used as an interface (Thrift) to get information and data from MR Jobs. Using JobClient.getJob(JobID) I successfully get a RunningJob object (I'm checking for NULL), and then rarely I get an NPE when I do RunningJob.getCounters(). This seems to occur after the daemon has been up and running for a while, and in the event of an Exception, I close the JobClient, set it to NULL, and a new one should then be created on the next request for data. Yet, I still seem to be unable to fetch the Counters. Below is the stack trace.
> java.lang.NullPointerException
>             at org.apache.hadoop.mapred.Counters.downgrade(Counters.java:77)
>             at org.apache.hadoop.mapred.JobClient$NetworkedJob.getCounters(JobClient.java:381)
>             at com.telescope.HadoopThrift.service.ServiceImpl.getReportResults(ServiceImpl.java:350)
>             at com.telescope.HadoopThrift.gen.HadoopThrift$Processor$getReportResults.process(HadoopThrift.java:545)
>             at com.telescope.HadoopThrift.gen.HadoopThrift$Processor.process(HadoopThrift.java:421)
>             at org.apache.thrift.server.TNonblockingServer$FrameBuffer.invoke(TNonblockingServer.java:697)
>             at org.apache.thrift.server.THsHaServer$Invocation.run(THsHaServer.java:317)
>             at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>             at java.lang.Thread.run(Thread.java:619)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-2470) Receiving NPE occasionally on RunningJob.getCounters() call

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037148#comment-13037148 ] 

Todd Lipcon commented on MAPREDUCE-2470:
----------------------------------------

This seems to have broken the MR trunk build:
https://builds.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/691/

> Receiving NPE occasionally on RunningJob.getCounters() call
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2470
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2470
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.21.0
>         Environment: FreeBSD, Java6, Hadoop r0.21.0
>            Reporter: Aaron Baff
>            Assignee: Robert Joseph Evans
>             Fix For: 0.23.0
>
>         Attachments: MAPREDUCE-2470-v1.patch, MAPREDUCE-2470-v2.patch, counters_null_data.pcap
>
>
> This is running in a Java daemon that is used as an interface (Thrift) to get information and data from MR Jobs. Using JobClient.getJob(JobID) I successfully get a RunningJob object (I'm checking for NULL), and then rarely I get an NPE when I do RunningJob.getCounters(). This seems to occur after the daemon has been up and running for a while, and in the event of an Exception, I close the JobClient, set it to NULL, and a new one should then be created on the next request for data. Yet, I still seem to be unable to fetch the Counters. Below is the stack trace.
> java.lang.NullPointerException
>             at org.apache.hadoop.mapred.Counters.downgrade(Counters.java:77)
>             at org.apache.hadoop.mapred.JobClient$NetworkedJob.getCounters(JobClient.java:381)
>             at com.telescope.HadoopThrift.service.ServiceImpl.getReportResults(ServiceImpl.java:350)
>             at com.telescope.HadoopThrift.gen.HadoopThrift$Processor$getReportResults.process(HadoopThrift.java:545)
>             at com.telescope.HadoopThrift.gen.HadoopThrift$Processor.process(HadoopThrift.java:421)
>             at org.apache.thrift.server.TNonblockingServer$FrameBuffer.invoke(TNonblockingServer.java:697)
>             at org.apache.thrift.server.THsHaServer$Invocation.run(THsHaServer.java:317)
>             at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>             at java.lang.Thread.run(Thread.java:619)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (MAPREDUCE-2470) Receiving NPE occasionally on RunningJob.getCounters() call

Posted by "Robert Joseph Evans (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated MAPREDUCE-2470:
-------------------------------------------

    Status: Patch Available  (was: Open)

> Receiving NPE occasionally on RunningJob.getCounters() call
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2470
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2470
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.21.0
>         Environment: FreeBSD, Java6, Hadoop r0.21.0
>            Reporter: Aaron Baff
>            Assignee: Robert Joseph Evans
>         Attachments: MAPREDUCE-2470-v1.patch, counters_null_data.pcap
>
>
> This is running in a Java daemon that is used as an interface (Thrift) to get information and data from MR Jobs. Using JobClient.getJob(JobID) I successfully get a RunningJob object (I'm checking for NULL), and then rarely I get an NPE when I do RunningJob.getCounters(). This seems to occur after the daemon has been up and running for a while, and in the event of an Exception, I close the JobClient, set it to NULL, and a new one should then be created on the next request for data. Yet, I still seem to be unable to fetch the Counters. Below is the stack trace.
> java.lang.NullPointerException
>             at org.apache.hadoop.mapred.Counters.downgrade(Counters.java:77)
>             at org.apache.hadoop.mapred.JobClient$NetworkedJob.getCounters(JobClient.java:381)
>             at com.telescope.HadoopThrift.service.ServiceImpl.getReportResults(ServiceImpl.java:350)
>             at com.telescope.HadoopThrift.gen.HadoopThrift$Processor$getReportResults.process(HadoopThrift.java:545)
>             at com.telescope.HadoopThrift.gen.HadoopThrift$Processor.process(HadoopThrift.java:421)
>             at org.apache.thrift.server.TNonblockingServer$FrameBuffer.invoke(TNonblockingServer.java:697)
>             at org.apache.thrift.server.THsHaServer$Invocation.run(THsHaServer.java:317)
>             at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>             at java.lang.Thread.run(Thread.java:619)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (MAPREDUCE-2470) Receiving NPE occasionally on RunningJob.getCounters() call

Posted by "Robert Joseph Evans (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated MAPREDUCE-2470:
-------------------------------------------

    Attachment: MAPREDUCE-2470-v1.patch

Added Version 1 of the patch.  It simply avoids the NPE by returning null from getCounters after the job has retired, just like the Job class does. 

> Receiving NPE occasionally on RunningJob.getCounters() call
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2470
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2470
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.21.0
>         Environment: FreeBSD, Java6, Hadoop r0.21.0
>            Reporter: Aaron Baff
>            Assignee: Robert Joseph Evans
>         Attachments: MAPREDUCE-2470-v1.patch, counters_null_data.pcap
>
>
> This is running in a Java daemon that is used as an interface (Thrift) to get information and data from MR Jobs. Using JobClient.getJob(JobID) I successfully get a RunningJob object (I'm checking for NULL), and then rarely I get an NPE when I do RunningJob.getCounters(). This seems to occur after the daemon has been up and running for a while, and in the event of an Exception, I close the JobClient, set it to NULL, and a new one should then be created on the next request for data. Yet, I still seem to be unable to fetch the Counters. Below is the stack trace.
> java.lang.NullPointerException
>             at org.apache.hadoop.mapred.Counters.downgrade(Counters.java:77)
>             at org.apache.hadoop.mapred.JobClient$NetworkedJob.getCounters(JobClient.java:381)
>             at com.telescope.HadoopThrift.service.ServiceImpl.getReportResults(ServiceImpl.java:350)
>             at com.telescope.HadoopThrift.gen.HadoopThrift$Processor$getReportResults.process(HadoopThrift.java:545)
>             at com.telescope.HadoopThrift.gen.HadoopThrift$Processor.process(HadoopThrift.java:421)
>             at org.apache.thrift.server.TNonblockingServer$FrameBuffer.invoke(TNonblockingServer.java:697)
>             at org.apache.thrift.server.THsHaServer$Invocation.run(THsHaServer.java:317)
>             at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>             at java.lang.Thread.run(Thread.java:619)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-2470) Receiving NPE occasionally on RunningJob.getCounters() call

Posted by "Aaron Baff (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031835#comment-13031835 ] 

Aaron Baff commented on MAPREDUCE-2470:
---------------------------------------

Yes, it's quite reproducible, at least for me. If a MR Job has been retired for around mapreduce.jobtracker.persist.jobstatus.hours hours (default 1 hour), then it is removed by the JobTracker from mapreduce.jobtracker.persist.jobstatus.dir (default /jobtracker/jobsInfo on HDFS). Once it's removed, the Counters are no longer available when fetching information about the Job. Going one step further, the JobTracker caches the last 1000 (default) MR Job info's in memory, yet even if it's still available in the cache and can be viewed through the Web UI, when you try and fetch the Counters programmatically but has been removed from HDFS by the JT, then you still get NULL returned from the JT, instead of an empty set of Counters. 

So, the real question for me, is this bad behavior by the JobTracker? Or should the Hadoop client library be the one to handle the NULL and either pass NULL along to user-code, or create an empty set of Counters? Whichever it is, it'd be very nice to have it documented in the Javadocs, including the old API as well as the new API.

> Receiving NPE occasionally on RunningJob.getCounters() call
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2470
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2470
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.21.0
>         Environment: FreeBSD, Java6, Hadoop r0.21.0
>            Reporter: Aaron Baff
>         Attachments: counters_null_data.pcap
>
>
> This is running in a Java daemon that is used as an interface (Thrift) to get information and data from MR Jobs. Using JobClient.getJob(JobID) I successfully get a RunningJob object (I'm checking for NULL), and then rarely I get an NPE when I do RunningJob.getCounters(). This seems to occur after the daemon has been up and running for a while, and in the event of an Exception, I close the JobClient, set it to NULL, and a new one should then be created on the next request for data. Yet, I still seem to be unable to fetch the Counters. Below is the stack trace.
> java.lang.NullPointerException
>             at org.apache.hadoop.mapred.Counters.downgrade(Counters.java:77)
>             at org.apache.hadoop.mapred.JobClient$NetworkedJob.getCounters(JobClient.java:381)
>             at com.telescope.HadoopThrift.service.ServiceImpl.getReportResults(ServiceImpl.java:350)
>             at com.telescope.HadoopThrift.gen.HadoopThrift$Processor$getReportResults.process(HadoopThrift.java:545)
>             at com.telescope.HadoopThrift.gen.HadoopThrift$Processor.process(HadoopThrift.java:421)
>             at org.apache.thrift.server.TNonblockingServer$FrameBuffer.invoke(TNonblockingServer.java:697)
>             at org.apache.thrift.server.THsHaServer$Invocation.run(THsHaServer.java:317)
>             at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>             at java.lang.Thread.run(Thread.java:619)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (MAPREDUCE-2470) Receiving NPE occasionally on RunningJob.getCounters() call

Posted by "Robert Joseph Evans (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated MAPREDUCE-2470:
-------------------------------------------

    Attachment: MAPREDUCE-2470-v2.patch

> Receiving NPE occasionally on RunningJob.getCounters() call
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2470
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2470
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.21.0
>         Environment: FreeBSD, Java6, Hadoop r0.21.0
>            Reporter: Aaron Baff
>            Assignee: Robert Joseph Evans
>         Attachments: MAPREDUCE-2470-v1.patch, MAPREDUCE-2470-v2.patch, counters_null_data.pcap
>
>
> This is running in a Java daemon that is used as an interface (Thrift) to get information and data from MR Jobs. Using JobClient.getJob(JobID) I successfully get a RunningJob object (I'm checking for NULL), and then rarely I get an NPE when I do RunningJob.getCounters(). This seems to occur after the daemon has been up and running for a while, and in the event of an Exception, I close the JobClient, set it to NULL, and a new one should then be created on the next request for data. Yet, I still seem to be unable to fetch the Counters. Below is the stack trace.
> java.lang.NullPointerException
>             at org.apache.hadoop.mapred.Counters.downgrade(Counters.java:77)
>             at org.apache.hadoop.mapred.JobClient$NetworkedJob.getCounters(JobClient.java:381)
>             at com.telescope.HadoopThrift.service.ServiceImpl.getReportResults(ServiceImpl.java:350)
>             at com.telescope.HadoopThrift.gen.HadoopThrift$Processor$getReportResults.process(HadoopThrift.java:545)
>             at com.telescope.HadoopThrift.gen.HadoopThrift$Processor.process(HadoopThrift.java:421)
>             at org.apache.thrift.server.TNonblockingServer$FrameBuffer.invoke(TNonblockingServer.java:697)
>             at org.apache.thrift.server.THsHaServer$Invocation.run(THsHaServer.java:317)
>             at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>             at java.lang.Thread.run(Thread.java:619)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira