Posted to dev@mahout.apache.org by "Suneel Marthi (JIRA)" <ji...@apache.org> on 2013/12/11 09:21:07 UTC

[jira] [Comment Edited] (MAHOUT-1376) when mahout train data, there is Task Id : attempt_201312031842_0751_m_000000_0, Status : FAILED java.lang.IllegalArgumentException

    [ https://issues.apache.org/jira/browse/MAHOUT-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845181#comment-13845181 ] 

Suneel Marthi edited comment on MAHOUT-1376 at 12/11/13 8:19 AM:
-----------------------------------------------------------------

This issue has been reported several times before by various users and has been around since Mahout 0.7.

This happens when running 'trainnb' in MR mode only. The issue is that the call to createLabelIndex() in TrainNaiveBayesJob.java returns a value of 0 when running in MR mode.

I don't see the issue when running with MAHOUT_LOCAL=true.

I can look at this tomorrow. Meanwhile, if someone has a patch, feel free to submit one.
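
For anyone reading the stack trace: the exception comes from a Guava Preconditions.checkArgument call in WeightsMapper.setup (WeightsMapper.java:44), and it carries no message. The sketch below shows how a label count of 0 would produce exactly that kind of failure. It assumes the check is on the label count being positive, which fits the diagnosis above but is not confirmed by the trace itself. The class and method names are hypothetical, and a plain-Java stand-in replaces Guava and Hadoop so the snippet runs on its own.

```java
// Illustrative sketch only: not the Mahout source.
final class LabelCountCheckSketch {

    // Stand-in for Guava's Preconditions.checkArgument(boolean),
    // which throws IllegalArgumentException with no message.
    static void checkArgument(boolean expression) {
        if (!expression) {
            throw new IllegalArgumentException();
        }
    }

    // Hypothetical stand-in for a mapper setup step: parse the label count
    // passed in as a string (as a job configuration value would be),
    // require it to be positive, then size the per-label weights.
    static double[] setupWeights(String numLabelsValue) {
        int numLabels = Integer.parseInt(numLabelsValue);
        checkArgument(numLabels > 0); // assumed condition, see note above
        return new double[numLabels];
    }

    public static void main(String[] args) {
        // A non-empty label index lets setup proceed.
        System.out.println("weights allocated: " + setupWeights("20").length);

        // A label count of 0 is rejected before any record is mapped.
        try {
            setupWeights("0");
        } catch (IllegalArgumentException e) {
            System.out.println("setup rejected a label count of 0");
        }
    }
}
```

If this assumption holds, every attempt of the map task would fail in setup in the same way, which matches the three identical failed attempts in the log below.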


was (Author: smarthi):
This issue has been reported several times before by various users and has been around since Mahout 0.7.

This happens when running 'trainnb' in MR mode only. The issue seems to be that the call to createLabelIndex() in TrainNaiveBayesJob.java returns a value of 0 when running in MR mode.

I don't see the issue when running with MAHOUT_LOCAL=true.

I can look at this tomorrow. Meanwhile, if someone has a patch, feel free to submit one.

> when mahout train data, there is Task Id : attempt_201312031842_0751_m_000000_0, Status : FAILED java.lang.IllegalArgumentException
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-1376
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1376
>             Project: Mahout
>          Issue Type: Bug
>          Components: Classification
>    Affects Versions: 0.8
>         Environment: Hadoop 1.0.3,mahout 0.8
>            Reporter: wangqiaoshi
>              Labels: mahout,classification,trainnb
>             Fix For: 0.8
>
>
> vm001:/usr/local/hadoop/mahout-distribution-0.8 # ./bin/mahout trainnb -i /tmp/mahout-work-root/20news-train-vectors -el -o /tmp/mahout-work-root/model -li /tmp/mahout-work-root/labelindex -ow -c
> Running on hadoop, using /usr/local/hadoop/hadoop-0.20.2/bin/hadoop and HADOOP_CONF_DIR=
> MAHOUT-JOB: /usr/local/hadoop/mahout-distribution-0.8/mahout-examples-0.8-job.jar
> 13/12/10 10:29:56 WARN driver.MahoutDriver: No trainnb.props found on classpath, will use command-line arguments only
> 13/12/10 10:29:56 INFO common.AbstractJob: Command line arguments: {--alphaI=[1.0], --endPhase=[2147483647], --extractLabels=null, --input=[/tmp/mahout-work-root/20news-train-vectors], --labelIndex=[/tmp/mahout-work-root/labelindex], --output=[/tmp/mahout-work-root/model], --overwrite=null, --startPhase=[0], --tempDir=[temp], --trainComplementary=null}
> 13/12/10 10:29:56 INFO common.HadoopUtil: Deleting temp
> 13/12/10 10:29:57 INFO util.NativeCodeLoader: Loaded the native-hadoop library
> 13/12/10 10:29:57 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
> 13/12/10 10:29:57 INFO compress.CodecPool: Got brand-new decompressor
> 13/12/10 10:30:00 INFO input.FileInputFormat: Total input paths to process : 1
> 13/12/10 10:30:01 INFO mapred.JobClient: Running job: job_201312031842_0750
> 13/12/10 10:30:02 INFO mapred.JobClient:  map 0% reduce 0%
> 13/12/10 10:30:18 INFO mapred.JobClient:  map 100% reduce 0%
> 13/12/10 10:30:30 INFO mapred.JobClient:  map 100% reduce 100%
> 13/12/10 10:30:35 INFO mapred.JobClient: Job complete: job_201312031842_0750
> 13/12/10 10:30:35 INFO mapred.JobClient: Counters: 29
> 13/12/10 10:30:35 INFO mapred.JobClient:   Job Counters 
> 13/12/10 10:30:35 INFO mapred.JobClient:     Launched reduce tasks=1
> 13/12/10 10:30:35 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=12445
> 13/12/10 10:30:35 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
> 13/12/10 10:30:35 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
> 13/12/10 10:30:35 INFO mapred.JobClient:     Rack-local map tasks=1
> 13/12/10 10:30:35 INFO mapred.JobClient:     Launched map tasks=1
> 13/12/10 10:30:35 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=10355
> 13/12/10 10:30:35 INFO mapred.JobClient:   File Output Format Counters 
> 13/12/10 10:30:35 INFO mapred.JobClient:     Bytes Written=97
> 13/12/10 10:30:35 INFO mapred.JobClient:   FileSystemCounters
> 13/12/10 10:30:35 INFO mapred.JobClient:     FILE_BYTES_READ=119
> 13/12/10 10:30:35 INFO mapred.JobClient:     HDFS_BYTES_READ=270
> 13/12/10 10:30:35 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=45827
> 13/12/10 10:30:35 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=97
> 13/12/10 10:30:35 INFO mapred.JobClient:   File Input Format Counters 
> 13/12/10 10:30:35 INFO mapred.JobClient:     Bytes Read=133
> 13/12/10 10:30:35 INFO mapred.JobClient:   Map-Reduce Framework
> 13/12/10 10:30:35 INFO mapred.JobClient:     Map output materialized bytes=14
> 13/12/10 10:30:35 INFO mapred.JobClient:     Map input records=0
> 13/12/10 10:30:35 INFO mapred.JobClient:     Reduce shuffle bytes=0
> 13/12/10 10:30:35 INFO mapred.JobClient:     Spilled Records=0
> 13/12/10 10:30:35 INFO mapred.JobClient:     Map output bytes=0
> 13/12/10 10:30:35 INFO mapred.JobClient:     CPU time spent (ms)=2080
> 13/12/10 10:30:35 INFO mapred.JobClient:     Total committed heap usage (bytes)=1016594432
> 13/12/10 10:30:35 INFO mapred.JobClient:     Combine input records=0
> 13/12/10 10:30:35 INFO mapred.JobClient:     SPLIT_RAW_BYTES=137
> 13/12/10 10:30:35 INFO mapred.JobClient:     Reduce input records=0
> 13/12/10 10:30:35 INFO mapred.JobClient:     Reduce input groups=0
> 13/12/10 10:30:35 INFO mapred.JobClient:     Combine output records=0
> 13/12/10 10:30:35 INFO mapred.JobClient:     Physical memory (bytes) snapshot=313008128
> 13/12/10 10:30:35 INFO mapred.JobClient:     Reduce output records=0
> 13/12/10 10:30:35 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=2980098048
> 13/12/10 10:30:35 INFO mapred.JobClient:     Map output records=0
> 13/12/10 10:30:38 INFO input.FileInputFormat: Total input paths to process : 1
> 13/12/10 10:30:38 INFO mapred.JobClient: Running job: job_201312031842_0751
> 13/12/10 10:30:39 INFO mapred.JobClient:  map 0% reduce 0%
> 13/12/10 10:30:55 INFO mapred.JobClient: Task Id : attempt_201312031842_0751_m_000000_0, Status : FAILED
> java.lang.IllegalArgumentException
>         at com.google.common.base.Preconditions.checkArgument(Preconditions.java:76)
>         at org.apache.mahout.classifier.naivebayes.training.WeightsMapper.setup(WeightsMapper.java:44)
>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>         at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
>         at org.apache.hadoop.mapred.Child.main(Child.java:249)
> 13/12/10 10:31:04 INFO mapred.JobClient: Task Id : attempt_201312031842_0751_m_000000_1, Status : FAILED
> java.lang.IllegalArgumentException
>         at com.google.common.base.Preconditions.checkArgument(Preconditions.java:76)
>         at org.apache.mahout.classifier.naivebayes.training.WeightsMapper.setup(WeightsMapper.java:44)
>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>         at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
>         at org.apache.hadoop.mapred.Child.main(Child.java:249)
> 13/12/10 10:31:13 INFO mapred.JobClient: Task Id : attempt_201312031842_0751_m_000000_2, Status : FAILED
> java.lang.IllegalArgumentException
>         at com.google.common.base.Preconditions.checkArgument(Preconditions.java:76)
>         at org.apache.mahout.classifier.naivebayes.training.WeightsMapper.setup(WeightsMapper.java:44)
>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>         at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
>         at org.apache.hadoop.mapred.Child.main(Child.java:249)
> 13/12/10 10:31:28 INFO mapred.JobClient: Job complete: job_201312031842_0751
> 13/12/10 10:31:28 INFO mapred.JobClient: Counters: 8
> 13/12/10 10:31:28 INFO mapred.JobClient:   Job Counters 
> 13/12/10 10:31:28 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=26279
> 13/12/10 10:31:28 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
> 13/12/10 10:31:28 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
> 13/12/10 10:31:28 INFO mapred.JobClient:     Rack-local map tasks=3
> 13/12/10 10:31:28 INFO mapred.JobClient:     Launched map tasks=4
> 13/12/10 10:31:28 INFO mapred.JobClient:     Data-local map tasks=1
> 13/12/10 10:31:28 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
> 13/12/10 10:31:28 INFO mapred.JobClient:     Failed map tasks=1
> 13/12/10 10:31:28 INFO driver.MahoutDriver: Program took 92707 ms (Minutes: 1.5451166666666667)


