Posted to mapreduce-dev@hadoop.apache.org by "Alexis (JIRA)" <ji...@apache.org> on 2010/12/23 04:21:02 UTC
[jira] Created: (MAPREDUCE-2229) Initialize reader in Sort example
Initialize reader in Sort example
---------------------------------
Key: MAPREDUCE-2229
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2229
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: examples
Affects Versions: 0.21.0
Reporter: Alexis
As described in the "Total Sort" section of the HTDG book (page 223), I tried to create a Hadoop job that globally sorts some input, using InputSampler with TotalOrderPartitioner.
To reproduce the exception, run the MapReduce Sort example with the following arguments:
{noformat}
org.apache.hadoop.examples.Sort
-r 2
-outKey org.apache.hadoop.io.Text
-outValue org.apache.hadoop.io.Text
-inFormat org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat
-outFormat org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
-totalOrder 0.1 10000 10
test/sortInput
test/sortOutput
{noformat}
The issue has already been described here:
- http://mail-archives.apache.org/mod_mbox/hadoop-mapreduce-user/201011.mbox/%3CDB1B07B75C01FB40B814678DEE6E0085175C86CDFF@bdc.taomee-ex.com%3E
- http://www.mail-archive.com/mapreduce-user@hadoop.apache.org/msg01372.html
This is a somewhat related comment:
http://www.mail-archive.com/common-user@hadoop.apache.org/msg03947.html
We need to initialize the reader to avoid the NPE that occurs when generating the partition file:
{noformat}
Exception in thread "main" java.lang.NullPointerException
at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.nextKeyValue(LineRecordReader.java:149)
at org.apache.hadoop.mapreduce.lib.input.KeyValueLineRecordReader.nextKeyValue(KeyValueLineRecordReader.java:91)
at org.apache.hadoop.mapreduce.lib.partition.InputSampler$RandomSampler.getSample(InputSampler.java:220)
at org.apache.hadoop.mapreduce.lib.partition.InputSampler.writePartitionFile(InputSampler.java:315)
at org.apache.hadoop.examples.Sort.run(Sort.java:166)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69)
at org.apache.hadoop.examples.Sort.main(Sort.java:192)
{noformat}
Right now, this initialization only happens in runNewMapper in org.apache.hadoop.mapred.MapTask, but the sampling is performed before the job starts, so the reader created by the sampler is never initialized. The TeraInputFormat class used by TeraSort has its own writePartitionFile method. This is the Javadoc comment for the createRecordReader method in the InputFormat class:
{noformat}
* Create a record reader for a given split. The framework will call
* {@link RecordReader#initialize(InputSplit, TaskAttemptContext)} before
* the split is used.
{noformat}
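To illustrate the contract quoted above, here is a minimal, self-contained sketch. It does not use the Hadoop classes: SketchLineReader and SamplerSketch are hypothetical stand-ins invented for this example, and the "split" is assumed to be just a list of strings. It only shows why a sampler that creates a reader outside the framework has to call initialize() itself before calling nextKeyValue().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical, simplified stand-in for a record reader (NOT the Hadoop class).
class SketchLineReader {
    private Iterator<String> lines; // stays null until initialize() is called
    private String current;

    // Plays the role of RecordReader#initialize(InputSplit, TaskAttemptContext).
    void initialize(List<String> split) {
        lines = split.iterator();
    }

    boolean nextKeyValue() {
        // Dereferences 'lines'; throws NullPointerException if initialize()
        // was never called, similar in spirit to the reported stack trace.
        if (!lines.hasNext()) {
            current = null;
            return false;
        }
        current = lines.next();
        return true;
    }

    String getCurrentKey() {
        return current;
    }
}

// Hypothetical sampler that reads every key of one split.
class SamplerSketch {
    // Mirrors the reported behavior: the reader is used without initialization.
    static List<String> sampleWithoutInit(List<String> split) {
        SketchLineReader reader = new SketchLineReader();
        return drain(reader);
    }

    // Mirrors the requested behavior: the caller initializes the reader itself,
    // because no framework code has done so at sampling time.
    static List<String> sampleWithInit(List<String> split) {
        SketchLineReader reader = new SketchLineReader();
        reader.initialize(split);
        return drain(reader);
    }

    private static List<String> drain(SketchLineReader reader) {
        List<String> samples = new ArrayList<>();
        while (reader.nextKeyValue()) {
            samples.add(reader.getCurrentKey());
        }
        return samples;
    }

    public static void main(String[] args) {
        List<String> split = Arrays.asList("b", "a", "c");
        try {
            sampleWithoutInit(split);
        } catch (NullPointerException e) {
            System.out.println("NPE without initialize()");
        }
        System.out.println(sampleWithInit(split));
    }
}
```

In the real code path, the analogous change would presumably be for the sampling code to call initialize on each reader it obtains from createRecordReader, since MapTask is not involved at that point.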
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Resolved: (MAPREDUCE-2229) Initialize reader in Sort example
Posted by "Alexis (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alexis resolved MAPREDUCE-2229.
-------------------------------
Resolution: Duplicate
Fix Version/s: 0.22.0