Posted to mapreduce-dev@hadoop.apache.org by "Liyin Liang (Created) (JIRA)" <ji...@apache.org> on 2012/01/05 13:25:39 UTC
[jira] [Created] (MAPREDUCE-3619) Change streaming code to use new mapreduce api.
Change streaming code to use new mapreduce api.
-----------------------------------------------
Key: MAPREDUCE-3619
URL: https://issues.apache.org/jira/browse/MAPREDUCE-3619
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: contrib/streaming
Affects Versions: 0.23.1
Reporter: Liyin Liang
If we run a streaming job with the following Python script as the mapper or reducer, the job throws a NullPointerException.
{code:}
#!/usr/bin/python
import sys,os
class MyTask:
    def __init__(self, file=sys.stdin):
        self.file = file
        print >>sys.stderr, "reporter:counter:spam,disp_flag_record,0"
        print >>sys.stderr, "reporter:counter:spam,spam_record,0"
    def process(self):
        while True:
            line = self.file.readline()
            if not line:
                break
            print line
if __name__ == "__main__":
    task = MyTask()
    task.process()
{code}
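For reference, the two lines the script writes to stderr at startup follow the streaming counter convention reporter:counter:<group>,<counter>,<amount>. A minimal Python 3 sketch of a helper that produces such lines is below. The names counter_line and report_counter are illustrative only and are not part of Hadoop.

```python
import sys


def counter_line(group, counter, amount):
    """Format one counter update in the streaming stderr convention."""
    # Commas separate the three fields, so the names must not contain one.
    if "," in group or "," in counter:
        raise ValueError("group and counter names must not contain commas")
    return "reporter:counter:%s,%s,%d" % (group, counter, amount)


def report_counter(group, counter, amount, stream=sys.stderr):
    """Write one counter update to the stream the framework reads (stderr)."""
    stream.write(counter_line(group, counter, amount) + "\n")
    stream.flush()
```

Calling report_counter("spam", "spam_record", 0) as the first statement of a script reproduces the early counter update that triggers the problem described here.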
Here is the NPE-related log:
2011-12-22 14:14:06,310 WARN org.apache.hadoop.streaming.PipeMapRed: java.lang.NullPointerException
at org.apache.hadoop.streaming.PipeMapRed$MRErrorThread.incrCounter(PipeMapRed.java:502)
at org.apache.hadoop.streaming.PipeMapRed$MRErrorThread.run(PipeMapRed.java:444)
This happens because the script's "print >>sys.stderr" lines cause reporter.incrCounter() to be invoked during PipeMapper|PipeReducer.configure(), but the reporter is not yet available in the configure() function.
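The ordering problem can be illustrated with a toy model in Python 3. This is only a sketch of the lifecycle described above, not the actual streaming code: OldApiTaskModel and CountingReporter are hypothetical stand-ins, and Python's AttributeError on None plays the role of the Java NullPointerException.

```python
class CountingReporter:
    """Stand-in for the old-API reporter: sums amounts per (group, counter)."""

    def __init__(self):
        self.counts = {}

    def incr_counter(self, group, counter, amount):
        key = (group, counter)
        self.counts[key] = self.counts.get(key, 0) + amount


class OldApiTaskModel:
    """Toy model of a task whose reporter only arrives with the first map() call."""

    def __init__(self):
        self.reporter = None  # nothing is available while the task is being configured

    def on_stderr_line(self, line):
        # Handle one line read from the child script's stderr.
        prefix = "reporter:counter:"
        if line.startswith(prefix):
            group, counter, amount = line[len(prefix):].split(",")
            # Raises AttributeError if no reporter has been supplied yet.
            self.reporter.incr_counter(group, counter, int(amount))

    def map(self, reporter):
        # In this model the reporter is first seen here.
        self.reporter = reporter
```

In this model, passing "reporter:counter:spam,spam_record,0" to on_stderr_line() before map() has been called fails, while the same line is counted normally afterwards.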
To fix this problem, we should change the streaming code to use the new API. Then we can call context.getCounter() in the Mapper|Reducer.setup() function.
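A toy Python 3 model of the proposed lifecycle is sketched below. It only illustrates the idea that a context handed to setup() gives the stderr handler a place to record counters from the start. CounterModel, ContextModel and NewApiTaskModel are hypothetical names, and the real change would be made in the Java streaming classes.

```python
class CounterModel:
    """Stand-in for a single counter."""

    def __init__(self):
        self.value = 0

    def increment(self, amount):
        self.value += amount


class ContextModel:
    """Stand-in for the new-API task context."""

    def __init__(self):
        self._counters = {}

    def get_counter(self, group, name):
        # Create the counter on first use, then keep returning the same one.
        return self._counters.setdefault((group, name), CounterModel())


class NewApiTaskModel:
    """Toy model of a task that receives its context in setup()."""

    def __init__(self):
        self.context = None

    def setup(self, context):
        # The context is stored before any child output is handled.
        # (The real code would launch the child script after this point.)
        self.context = context

    def on_stderr_line(self, line):
        # Handle one line read from the child script's stderr.
        prefix = "reporter:counter:"
        if line.startswith(prefix):
            group, counter, amount = line[len(prefix):].split(",")
            self.context.get_counter(group, counter).increment(int(amount))
```

With this ordering, a counter line emitted by the script as soon as it starts can be recorded, because setup() has already supplied the context.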
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (MAPREDUCE-3619) Change streaming code to use new mapreduce api.
Posted by "Liyin Liang (Resolved) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Liyin Liang resolved MAPREDUCE-3619.
------------------------------------
Resolution: Duplicate
> Change streaming code to use new mapreduce api.
> -----------------------------------------------
>
> Key: MAPREDUCE-3619
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3619
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: contrib/streaming, mrv2
> Affects Versions: 0.23.1
> Reporter: Liyin Liang