Posted to dev@mahout.apache.org by "Berttenfall M. (Created) (JIRA)" <ji...@apache.org> on 2011/12/18 14:52:30 UTC

[jira] [Created] (MAHOUT-932) RandomForest quits with ArrayIndexOutOfBoundsException while running sample

RandomForest quits with ArrayIndexOutOfBoundsException while running sample
---------------------------------------------------------------------------

                 Key: MAHOUT-932
                 URL: https://issues.apache.org/jira/browse/MAHOUT-932
             Project: Mahout
          Issue Type: Bug
          Components: Classification
    Affects Versions: 0.6
         Environment: Mac OS X, current Mac OS shipped Java version, latest checkout from 17.12.2011
                      Dual Core MacBook Pro 2009, 8 GB, SSD
            Reporter: Berttenfall M.
            Priority: Minor


Hello,

When running the example at https://cwiki.apache.org/MAHOUT/partial-implementation.html with the recommended data sets, several issues occur.
First, ARFF files no longer seem to be supported; I have been using the UCI format as recommended here (https://cwiki.apache.org/MAHOUT/breiman-example.html). With ARFF files, Mahout quits when creating the description file (wrong number of attributes in the string); with the UCI format it works.

The main error happens during the BuildForest step (I could not test TestForest because of the missing tree).
Running:
$MAHOUT_HOME/bin/mahout org.apache.mahout.classifier.df.mapreduce.BuildForest -Dmapred.max.split.size=1874231 -d convertedData/data.data -ds KDDTrain+.info -sl 5 -p -t 100 -o nsl-forest.

I tested different split.size values: 1874231, 187423, and 18742 all give the following error; 1874 does not finish on my machine (Dual Core MacBook Pro 2009, 8 GB, SSD).
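
For orientation, the number of map tasks each of these split sizes would produce can be estimated roughly. The sketch below is only an approximation: it assumes the input is about 20478569 bytes (the "Bytes Read" counter in the log of this report) and simply divides by the split size, ignoring block size and any rounding Hadoop applies to the last split. `SplitEstimate` and `estimateSplits` are hypothetical names, not Mahout or Hadoop API.

```java
// Rough estimate of how many input splits (and so map tasks) a given
// maximum split size produces. Illustration only, not Hadoop's exact logic.
final class SplitEstimate {

  // Ceiling division: every started chunk of maxSplitBytes counts as one split.
  static long estimateSplits(long inputBytes, long maxSplitBytes) {
    if (inputBytes < 0 || maxSplitBytes <= 0) {
      throw new IllegalArgumentException("sizes must be positive");
    }
    return (inputBytes + maxSplitBytes - 1) / maxSplitBytes;
  }

  public static void main(String[] args) {
    long inputBytes = 20478569L; // assumption: taken from the "Bytes Read" counter
    long[] splitSizes = {1874231L, 187423L, 18742L, 1874L};
    for (long size : splitSizes) {
      System.out.println("split.size=" + size + " -> about "
          + estimateSplits(inputBytes, size) + " splits");
    }
  }
}
```

Under these assumptions the four values correspond to roughly 11, 110, 1093 and 10928 splits, which may explain why the smallest value takes so long on a local runner.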

It quits after a while (map is almost done) with the following message:
11/12/17 16:23:24 INFO mapred.Task: Task 'attempt_local_0001_m_000998_0' done.
11/12/17 16:23:24 INFO mapred.Task: Task:attempt_local_0001_m_000999_0 is done. And is in the process of commiting
11/12/17 16:23:24 INFO mapred.LocalJobRunner: 
11/12/17 16:23:24 INFO mapred.Task: Task attempt_local_0001_m_000999_0 is allowed to commit now
11/12/17 16:23:24 INFO output.FileOutputCommitter: Saved output of task 'attempt_local_0001_m_000999_0' to file:/Users/martin/Documents/Studium/Master/LargeScaleProcessing/Repository/mahout_algorithms_evaluation/testingRandomForests/nsl-forest
11/12/17 16:23:27 INFO mapred.LocalJobRunner: 
11/12/17 16:23:27 INFO mapred.Task: Task 'attempt_local_0001_m_000999_0' done.
11/12/17 16:23:28 INFO mapred.JobClient:  map 100% reduce 0%
11/12/17 16:23:28 INFO mapred.JobClient: Job complete: job_local_0001
11/12/17 16:23:28 INFO mapred.JobClient: Counters: 8
11/12/17 16:23:28 INFO mapred.JobClient:   File Output Format Counters 
11/12/17 16:23:28 INFO mapred.JobClient:     Bytes Written=41869032
11/12/17 16:23:28 INFO mapred.JobClient:   FileSystemCounters
11/12/17 16:23:28 INFO mapred.JobClient:     FILE_BYTES_READ=37443033225
11/12/17 16:23:28 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=44946910704
11/12/17 16:23:28 INFO mapred.JobClient:   File Input Format Counters 
11/12/17 16:23:28 INFO mapred.JobClient:     Bytes Read=20478569
11/12/17 16:23:28 INFO mapred.JobClient:   Map-Reduce Framework
11/12/17 16:23:28 INFO mapred.JobClient:     Map input records=125973
11/12/17 16:23:28 INFO mapred.JobClient:     Spilled Records=0
11/12/17 16:23:28 INFO mapred.JobClient:     Map output records=100000
11/12/17 16:23:28 INFO mapred.JobClient:     SPLIT_RAW_BYTES=215000
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 100
	at org.apache.mahout.classifier.df.mapreduce.partial.PartialBuilder.processOutput(PartialBuilder.java:126)
	at org.apache.mahout.classifier.df.mapreduce.partial.PartialBuilder.parseOutput(PartialBuilder.java:89)
	at org.apache.mahout.classifier.df.mapreduce.Builder.build(Builder.java:303)
	at org.apache.mahout.classifier.df.mapreduce.BuildForest.buildForest(BuildForest.java:201)
	at org.apache.mahout.classifier.df.mapreduce.BuildForest.run(BuildForest.java:163)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
	at org.apache.mahout.classifier.df.mapreduce.BuildForest.main(BuildForest.java:225)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
	at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
	at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:156)


PS: I adjusted the class to .classifier.df. and removed -oop
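
One possible reading of the log above, offered as an assumption rather than a confirmed diagnosis: the map task attempts are numbered up to 000999, "Map output records" is 100000, and -t asked for 100 trees. That would be consistent with each map task emitting all 100 trees while the collecting code only reserved room for 100. The sketch below reproduces that pattern in isolation; `ForestOutputSketch` and its method are hypothetical and are not Mahout's PartialBuilder code.

```java
// Minimal illustration of the suspected failure pattern: an array sized for
// the requested number of trees is filled with one entry per emitted tree.
final class ForestOutputSketch {

  // Returns the first index that does not fit into the array, or -1 if
  // every emitted tree fits.
  static int firstOverflowIndex(int requestedTrees, int mapTasks, int treesPerMapTask) {
    int[] slots = new int[requestedTrees]; // sized from the -t value only
    int index = 0;
    try {
      for (int task = 0; task < mapTasks; task++) {
        for (int t = 0; t < treesPerMapTask; t++) {
          slots[index] = task; // throws once index reaches requestedTrees
          index++;
        }
      }
    } catch (ArrayIndexOutOfBoundsException e) {
      return index;
    }
    return -1;
  }

  public static void main(String[] args) {
    // Values assumed from the log: 100 requested trees, 1000 map tasks,
    // 100 trees emitted per task.
    System.out.println("first overflow index: " + firstOverflowIndex(100, 1000, 100));
  }
}
```

With the assumed values the first index that fails is 100, the same index reported in the exception message.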

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Issue Comment Edited] (MAHOUT-932) RandomForest quits with ArrayIndexOutOfBoundsException while running sample

Posted by "Ikumasa Mukai (Issue Comment Edited) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAHOUT-932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173992#comment-13173992 ] 

Ikumasa Mukai edited comment on MAHOUT-932 at 12/21/11 10:34 AM:
-----------------------------------------------------------------

I agree with your easiest solution: change the meaning of the "-t" option to how many trees should be built per mapper.
After all, this problem would vanish if we could get the correct number of mappers from "mapred.map.tasks".
I think this will be, or already is, fixed in the Hadoop framework. :)
                
      was (Author: ikumasa mukai):
    I agree with your easiest solution: the "-t" option means how many trees should be built per mapper.
After all, this problem would vanish if we could get the correct number of mappers from "mapred.map.tasks".
I think this will be, or already is, fixed in the Hadoop framework. :)
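
The two meanings of "-t" discussed in this comment can be compared with a small sketch. It is an illustration under assumptions only: `TreeOptionSketch` and both methods are hypothetical names, and the way the remainder is distributed is one plausible choice, not necessarily what Mahout does.

```java
// Compares two interpretations of the -t option.
final class TreeOptionSketch {

  // Interpretation A: -t is the total forest size, divided among the mappers.
  // Requires a correct mapper count; the first (totalTrees % numMappers)
  // partitions get one extra tree.
  static int treesForPartition(int totalTrees, int numMappers, int partition) {
    if (numMappers <= 0 || partition < 0 || partition >= numMappers) {
      throw new IllegalArgumentException("invalid mapper count or partition");
    }
    int base = totalTrees / numMappers;
    int remainder = totalTrees % numMappers;
    return partition < remainder ? base + 1 : base;
  }

  // Interpretation B: -t is the number of trees per mapper, so the forest
  // size depends on how many mappers actually run.
  static long totalWhenPerMapper(int treesPerMapper, int numMappers) {
    return (long) treesPerMapper * numMappers;
  }

  public static void main(String[] args) {
    int t = 100;
    int mappers = 1000; // assumed mapper count, for illustration
    long totalA = 0;
    for (int p = 0; p < mappers; p++) {
      totalA += treesForPartition(t, mappers, p);
    }
    System.out.println("total forest size, -t divided among mappers: " + totalA);
    System.out.println("total forest size, -t per mapper: " + totalWhenPerMapper(t, mappers));
  }
}
```

Interpretation B needs no knowledge of the mapper count inside each map task, which is presumably why it is called the easiest solution; the cost is that the final forest size grows with the number of splits.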
                  
> RandomForest quits with ArrayIndexOutOfBoundsException while running sample
> ---------------------------------------------------------------------------
>
>                 Key: MAHOUT-932
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-932
>             Project: Mahout
>          Issue Type: Bug
>          Components: Classification
>    Affects Versions: 0.6
>         Environment: Mac OS X, current Mac OS shipped Java version, latest checkout from 17.12.2011
> Dual Core MacBook Pro 2009, 8 Gb, SSD
>            Reporter: Berttenfall M.
>            Priority: Minor
>              Labels: Classifier, DecisionForest, RandomForest
>

[jira] [Commented] (MAHOUT-932) RandomForest quits with ArrayIndexOutOfBoundsException while running sample

Posted by "Berttenfall M. (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAHOUT-932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13172186#comment-13172186 ] 

Berttenfall M. commented on MAHOUT-932:
---------------------------------------

I just tested it with Hadoop 0.20.2 (I had been using the latest stable release, 0.20.203, before). It is still crashing during the BuildForest step.
                

[jira] [Commented] (MAHOUT-932) RandomForest quits with ArrayIndexOutOfBoundsException while running sample

Posted by "Berttenfall M. (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAHOUT-932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13172174#comment-13172174 ] 

Berttenfall M. commented on MAHOUT-932:
---------------------------------------

I tried with manually edited ARFF files (removed all @* lines), but there is still the ArrayIndexOutOfBoundsException during the BuildForest step.

Could anyone check that with the current checkout?
I haven't modified any sources, and I haven't seen any other Java problems on my Mac.
                

[jira] [Commented] (MAHOUT-932) RandomForest quits with ArrayIndexOutOfBoundsException while running sample

Posted by "Berttenfall M. (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAHOUT-932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171892#comment-13171892 ] 

Berttenfall M. commented on MAHOUT-932:
---------------------------------------

I am using the latest trunk. And as I wrote, I have already been using the UCI file format.

Btw, I think it is a bug when the program accepts ARFF files (and the samples advise using them) but cannot process them.
                

[jira] [Commented] (MAHOUT-932) RandomForest quits with ArrayIndexOutOfBoundsException while running sample

Posted by "Berttenfall M. (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAHOUT-932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13172620#comment-13172620 ] 

Berttenfall M. commented on MAHOUT-932:
---------------------------------------

Yeah, right.
                
> 	at org.apache.mahout.classifier.df.mapreduce.Builder.build(Builder.java:303)
> 	at org.apache.mahout.classifier.df.mapreduce.BuildForest.buildForest(BuildForest.java:201)
> 	at org.apache.mahout.classifier.df.mapreduce.BuildForest.run(BuildForest.java:163)
> 	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> 	at org.apache.mahout.classifier.df.mapreduce.BuildForest.main(BuildForest.java:225)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
> 	at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
> 	at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> PS: I adjusted the class to .classifier.df. and removed -oop

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAHOUT-932) RandomForest quits with ArrayIndexOutOfBoundsException while running sample

Posted by "Berttenfall M. (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAHOUT-932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171893#comment-13171893 ] 

Berttenfall M. commented on MAHOUT-932:
---------------------------------------

To clear that up: using ARFF files does not work at any step I tested (neither Describe nor BuildForest).

So I converted the files to the UCI format (which is basically a header-free CSV file), and the classifier appears to like this format, as long as the array stays within bounds. :-/
                

[jira] [Commented] (MAHOUT-932) RandomForest quits with ArrayIndexOutOfBoundsException while running sample

Posted by "Ikumasa Mukai (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAHOUT-932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173992#comment-13173992 ] 

Ikumasa Mukai commented on MAHOUT-932:
--------------------------------------

I agree with your easiest solution: the "-t" option would then mean how many trees should be built per mapper.
The problem would vanish if we could get the correct number of mappers from "mapred.map.tasks".
I think this will be, or already is, fixed in the Hadoop framework. :)
                

[jira] [Commented] (MAHOUT-932) RandomForest quits with ArrayIndexOutOfBoundsException while running sample

Posted by "Deneche A. Hakim (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAHOUT-932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173359#comment-13173359 ] 

Deneche A. Hakim commented on MAHOUT-932:
-----------------------------------------

When this code was first implemented, one had to check "mapred.map.tasks" to find out how many mappers were launched. But I soon found that this value was not set correctly when Hadoop is in standalone mode, so I added a quick hack to detect standalone mode and update "mapred.map.tasks" myself; this behavior later changed in a newer version of Hadoop. I don't know if there is an easy and reliable way to know how many mappers were launched so that each mapper can compute how many trees it must build.
The easiest solution when the user sets -t to 100 is to pass 100 to each mapper, so it's the user's responsibility to "know" how many mappers will be launched. Or maybe leave the current implementation and make it possible to grow more trees than expected when the number of mappers happens to be different. Which do you think is better?
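
To make the mismatch concrete, here is a minimal sketch of the arithmetic, not Mahout's actual code. The class and method names (TreeAllocationSketch, treesForPartition, totalTrees) are hypothetical, and giving the remainder to partition 0 is an assumption. The numbers come from the reported log: task attempt ids run up to m_000999, which suggests 1000 map tasks, and the job reports "Map output records=100000" with -t 100. That is consistent with every mapper believing it is the only one and building all 100 trees.

```java
public class TreeAllocationSketch {

    /**
     * Trees built by one mapper when numTrees is divided over numMaps mappers.
     * Assumption for illustration: the remainder goes to partition 0.
     */
    static int treesForPartition(int numTrees, int numMaps, int partition) {
        int trees = numTrees / numMaps;
        if (partition == 0) {
            trees += numTrees % numMaps;
        }
        return trees;
    }

    /** Total trees emitted when mappers believe assumedMaps exist but actualMaps really run. */
    static int totalTrees(int numTrees, int assumedMaps, int actualMaps) {
        int total = 0;
        for (int partition = 0; partition < actualMaps; partition++) {
            total += treesForPartition(numTrees, assumedMaps, partition);
        }
        return total;
    }

    public static void main(String[] args) {
        // Mappers know the real count: the forest has exactly 100 trees.
        System.out.println(totalTrees(100, 1000, 1000)); // prints 100
        // Mappers see a stale count of 1: each of the 1000 mappers builds all 100 trees.
        System.out.println(totalTrees(100, 1, 1000));    // prints 100000
    }
}
```

Under the "easiest solution" above, the second case becomes the documented behavior: -t is read as trees per mapper and the total is whatever the number of mappers makes it.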
                

[jira] [Commented] (MAHOUT-932) RandomForest quits with ArrayIndexOutOfBoundsException while running sample

Posted by "Berttenfall M. (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAHOUT-932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173340#comment-13173340 ] 

Berttenfall M. commented on MAHOUT-932:
---------------------------------------

So when the array is set to a size of 100, it is not directly based on the mapred.map.tasks variable?
Running it with mapred.map.tasks=1 might not really make sense for standalone test mode, but still, the array size should be based on it when the calculation (mapping) is done using mapred.map.tasks.

Or what am I missing here?
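
The question can be illustrated with a small sketch, assuming the result array is sized from the requested tree count while the mappers emit more records than that. This is not the code in PartialBuilder.processOutput; OutputCollectionSketch, collectFixed and collectGrowable are hypothetical names used only to show the failure mode and one way to tolerate it.

```java
import java.util.ArrayList;
import java.util.List;

public class OutputCollectionSketch {

    /** Copies emitted tree ids into an array sized from the requested tree count. */
    static int[] collectFixed(int requestedTrees, int emittedTrees) {
        int[] ids = new int[requestedTrees];
        for (int i = 0; i < emittedTrees; i++) {
            ids[i] = i; // throws ArrayIndexOutOfBoundsException once i reaches requestedTrees
        }
        return ids;
    }

    /** Collects into a list that grows with whatever the mappers actually emitted. */
    static List<Integer> collectGrowable(int emittedTrees) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < emittedTrees; i++) {
            ids.add(i);
        }
        return ids;
    }

    public static void main(String[] args) {
        try {
            collectFixed(100, 100000);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("fixed array overflowed");
        }
        System.out.println(collectGrowable(100000).size());
    }
}
```

Sizing the collection from the records actually read, rather than from -t or mapred.map.tasks, avoids the exception but means the forest may hold more trees than the user asked for.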
                

[jira] [Commented] (MAHOUT-932) RandomForest quits with ArrayIndexOutOfBoundsException while running sample

Posted by "Deneche A. Hakim (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAHOUT-932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13172195#comment-13172195 ] 

Deneche A. Hakim commented on MAHOUT-932:
-----------------------------------------

OK, I must admit I haven't tried the examples in standalone mode for some time now. I will try them again and see if I can reproduce the problem.
                
> RandomForest quits with ArrayIndexOutOfBoundsException while running sample
> ---------------------------------------------------------------------------
>
>                 Key: MAHOUT-932
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-932
>             Project: Mahout
>          Issue Type: Bug
>          Components: Classification
>    Affects Versions: 0.6
>         Environment: Mac OS X, current Mac OS shipped Java version, latest checkout from 17.12.2011
> Dual Core MacBook Pro 2009, 8 Gb, SSD
>            Reporter: Berttenfall M.
>            Priority: Minor
>              Labels: Classifier, DecisionForest, RandomForest
>
> Hello,
> when running the example under https://cwiki.apache.org/MAHOUT/partial-implementation.html with the recommended data sets several issues occur.
> First: ARFF files seem no longer to be supported, I've been using the UCI format as recommended here (https://cwiki.apache.org/MAHOUT/breiman-example.html). Using ARFF files, Mahout quits when creating the description file (wrong number of attributes in the string), using UCI format it works.
> The main error happens during the BuildForest step (I could not test TestForest, due to missing tree).
> Running:
> $MAHOUT_HOME/bin/mahout org.apache.mahout.classifier.df.mapreduce.BuildForest -Dmapred.max.split.size=1874231 -d convertedData/data.data -ds KDDTrain+.info -sl 5 -p -t 100 -o nsl-forest.
> I tested different split.size values. 1874231, 187423, 18742 give the following error. 1874 does not finish on my machine (Dual Core MacBook Pro 2009, 8 Gb, SSD).
> It quits after a while (map is almost done) with the following message:
> 11/12/17 16:23:24 INFO mapred.Task: Task 'attempt_local_0001_m_000998_0' done.
> 11/12/17 16:23:24 INFO mapred.Task: Task:attempt_local_0001_m_000999_0 is done. And is in the process of commiting
> 11/12/17 16:23:24 INFO mapred.LocalJobRunner: 
> 11/12/17 16:23:24 INFO mapred.Task: Task attempt_local_0001_m_000999_0 is allowed to commit now
> 11/12/17 16:23:24 INFO output.FileOutputCommitter: Saved output of task 'attempt_local_0001_m_000999_0' to file:/Users/martin/Documents/Studium/Master/LargeScaleProcessing/Repository/mahout_algorithms_evaluation/testingRandomForests/nsl-forest
> 11/12/17 16:23:27 INFO mapred.LocalJobRunner: 
> 11/12/17 16:23:27 INFO mapred.Task: Task 'attempt_local_0001_m_000999_0' done.
> 11/12/17 16:23:28 INFO mapred.JobClient:  map 100% reduce 0%
> 11/12/17 16:23:28 INFO mapred.JobClient: Job complete: job_local_0001
> 11/12/17 16:23:28 INFO mapred.JobClient: Counters: 8
> 11/12/17 16:23:28 INFO mapred.JobClient:   File Output Format Counters 
> 11/12/17 16:23:28 INFO mapred.JobClient:     Bytes Written=41869032
> 11/12/17 16:23:28 INFO mapred.JobClient:   FileSystemCounters
> 11/12/17 16:23:28 INFO mapred.JobClient:     FILE_BYTES_READ=37443033225
> 11/12/17 16:23:28 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=44946910704
> 11/12/17 16:23:28 INFO mapred.JobClient:   File Input Format Counters 
> 11/12/17 16:23:28 INFO mapred.JobClient:     Bytes Read=20478569
> 11/12/17 16:23:28 INFO mapred.JobClient:   Map-Reduce Framework
> 11/12/17 16:23:28 INFO mapred.JobClient:     Map input records=125973
> 11/12/17 16:23:28 INFO mapred.JobClient:     Spilled Records=0
> 11/12/17 16:23:28 INFO mapred.JobClient:     Map output records=100000
> 11/12/17 16:23:28 INFO mapred.JobClient:     SPLIT_RAW_BYTES=215000
> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 100
> 	at org.apache.mahout.classifier.df.mapreduce.partial.PartialBuilder.processOutput(PartialBuilder.java:126)
> 	at org.apache.mahout.classifier.df.mapreduce.partial.PartialBuilder.parseOutput(PartialBuilder.java:89)
> 	at org.apache.mahout.classifier.df.mapreduce.Builder.build(Builder.java:303)
> 	at org.apache.mahout.classifier.df.mapreduce.BuildForest.buildForest(BuildForest.java:201)
> 	at org.apache.mahout.classifier.df.mapreduce.BuildForest.run(BuildForest.java:163)
> 	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> 	at org.apache.mahout.classifier.df.mapreduce.BuildForest.main(BuildForest.java:225)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
> 	at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
> 	at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> PS: I adjusted the class to .classifier.df. and removed -oop

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Issue Comment Edited] (MAHOUT-932) RandomForest quits with ArrayIndexOutOfBoundsException while running sample

Posted by "Ikumasa Mukai (Issue Comment Edited) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAHOUT-932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13172839#comment-13172839 ] 

Ikumasa Mukai edited comment on MAHOUT-932 at 12/20/11 2:01 AM:
----------------------------------------------------------------

This problem may be caused by the following sequence.
{noformat}
1) The number of trees for our forest is 100 (-t 100).
2) Hadoop creates 10 mappers.
   This value is calculated as 18742306/1874231 = 10 (size of the data / -Dmapred.max.split.size).
3) Each mapper calculates how many trees it should build. This number should be 10 trees per mapper.
4) This calculation uses the value of mapred.map.tasks, which is provided by Hadoop.
   With v0.20.2, mapred.map.tasks is 10, so 100/10 = 10 trees are built per mapper.   <OK>
   With v0.20.204, mapred.map.tasks is 1, so 100/1 = 100 trees are built per mapper.  <NG>
5) After building the trees, we gather them from all mappers to make the forest.
   In this phase we get the exception with v0.20.204.
6) This is because the total number of trees we gather is 1000 (100 trees x 10 mappers),
   but the array prepared for gathering them (TreeID) has size 100.
{noformat}
To fix this, we could change the parameter used for the calculation in "4)", but I don't think that is a good approach.
What do you think?
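
The arithmetic in steps 1) to 6) can be sketched as a small, self-contained Java program. This is only an illustration, not Mahout's actual code: the class TreeCountSketch and its methods are hypothetical names, and the split count is simplified to a ceiling division rather than Hadoop's real split logic.

```java
public class TreeCountSketch {

    /** Simplified split count: ceil(dataSize / maxSplitSize). Hadoop's real logic differs in detail. */
    static int numSplits(long dataSizeBytes, long maxSplitSizeBytes) {
        return (int) ((dataSizeBytes + maxSplitSizeBytes - 1) / maxSplitSizeBytes);
    }

    /** Trees one mapper builds when it divides the forest size by the reported map-task count. */
    static int treesPerMapper(int forestSize, int reportedMapTasks) {
        return forestSize / reportedMapTasks;
    }

    /** Total trees produced by all mappers that actually ran. */
    static int totalTrees(int forestSize, int reportedMapTasks, int actualMappers) {
        return treesPerMapper(forestSize, reportedMapTasks) * actualMappers;
    }

    public static void main(String[] args) {
        int forestSize = 100;                           // -t 100
        int mappers = numSplits(18742306L, 1874231L);   // data size / -Dmapred.max.split.size
        int[] reportedMapTasks = {10, 1};               // values described for v0.20.2 and v0.20.204
        for (int reported : reportedMapTasks) {
            int total = totalTrees(forestSize, reported, mappers);
            // An array of length forestSize can only hold the trees if total <= forestSize.
            System.out.printf("mapred.map.tasks=%d total=%d fits=%b%n",
                    reported, total, total <= forestSize);
        }
    }
}
```

With a reported value of 1, the 10 mappers together produce 1000 trees for an array of length 100, so the first invalid index is 100, which matches the index in the reported ArrayIndexOutOfBoundsException.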

Regards,
                
> RandomForest quits with ArrayIndexOutOfBoundsException while running sample
> ---------------------------------------------------------------------------
>
>                 Key: MAHOUT-932
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-932
>             Project: Mahout
>          Issue Type: Bug
>          Components: Classification
>    Affects Versions: 0.6
>         Environment: Mac OS X, current Mac OS shipped Java version, latest checkout from 17.12.2011
> Dual Core MacBook Pro 2009, 8 Gb, SSD
>            Reporter: Berttenfall M.
>            Priority: Minor
>              Labels: Classifier, DecisionForest, RandomForest
>

[jira] [Commented] (MAHOUT-932) RandomForest quits with ArrayIndexOutOfBoundsException while running sample

Posted by "Deneche A. Hakim (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAHOUT-932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13172189#comment-13172189 ] 

Deneche A. Hakim commented on MAHOUT-932:
-----------------------------------------

Are you using Hadoop in standalone or pseudo-distributed mode?
                

[jira] [Commented] (MAHOUT-932) RandomForest quits with ArrayIndexOutOfBoundsException while running sample

Posted by "Berttenfall M. (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAHOUT-932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13172192#comment-13172192 ] 

Berttenfall M. commented on MAHOUT-932:
---------------------------------------

Default (stand alone) mode.
                

[jira] [Commented] (MAHOUT-932) RandomForest quits with ArrayIndexOutOfBoundsException while running sample

Posted by "Deneche A. Hakim (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAHOUT-932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171901#comment-13171901 ] 

Deneche A. Hakim commented on MAHOUT-932:
-----------------------------------------

ARFF isn't supported; you need to remove the header from the file first. It's not really a big deal to make it work with the sequential classifier, but it's much more complicated when it comes to the distributed one.

It's strange that BuildForest throws an exception; I tried that same example a week ago and it ran well. Which file are you using to train the classifier? KDDTrain+.ARFF?

Two more questions: which version of Hadoop are you using, and how is it installed: standalone, pseudo-distributed or fully distributed?
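
Removing the ARFF header, as mentioned above, could look roughly like the following sketch. It assumes the usual ARFF layout (declarations up to an "@data" line, comment lines starting with "%") and is not part of Mahout; the class name ArffHeaderStripper and the method stripHeader are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ArffHeaderStripper {

    /** Returns only the data rows that follow the "@data" marker, skipping comments and blank lines. */
    static List<String> stripHeader(List<String> arffLines) {
        List<String> dataRows = new ArrayList<>();
        boolean inData = false;
        for (String line : arffLines) {
            String trimmed = line.trim();
            if (!inData) {
                // Everything up to and including "@data" is header (@relation, @attribute, ...).
                if (trimmed.equalsIgnoreCase("@data")) {
                    inData = true;
                }
                continue;
            }
            if (trimmed.isEmpty() || trimmed.startsWith("%")) {
                continue; // skip blank lines and ARFF comments inside the data section
            }
            dataRows.add(trimmed);
        }
        return dataRows;
    }

    public static void main(String[] args) {
        // Tiny made-up input, for illustration only.
        List<String> sample = Arrays.asList(
                "@relation example",
                "@attribute duration numeric",
                "@attribute label {normal,anomaly}",
                "@data",
                "0,normal",
                "12,anomaly");
        for (String row : stripHeader(sample)) {
            System.out.println(row);
        }
    }
}
```

The remaining rows are plain comma-separated values, which is the form the UCI-style examples expect.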
                

[jira] [Commented] (MAHOUT-932) RandomForest quits with ArrayIndexOutOfBoundsException while running sample

Posted by "Berttenfall M. (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAHOUT-932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13172200#comment-13172200 ] 

Berttenfall M. commented on MAHOUT-932:
---------------------------------------

Ohh no! I am sorry, I forgot to refresh my .profile.

I was able to run the BuildForest step now with Hadoop 0.20.2.
The current stable release of Hadoop does not seem to work.

So I guess it is still a bug?


Sorry for the mix-up with my config. :-/
                
> 	at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> PS: I adjusted the class to .classifier.df. and removed -oop

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAHOUT-932) RandomForest quits with ArrayIndexOutOfBoundsException while running sample

Posted by "Deneche A. Hakim (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAHOUT-932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13172204#comment-13172204 ] 

Deneche A. Hakim commented on MAHOUT-932:
-----------------------------------------

So it didn't work with Hadoop 0.20.203 in standalone mode, right?
                

[jira] [Commented] (MAHOUT-932) RandomForest quits with ArrayIndexOutOfBoundsException while running sample

Posted by "Deneche A. Hakim (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAHOUT-932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171889#comment-13171889 ] 

Deneche A. Hakim commented on MAHOUT-932:
-----------------------------------------

First, make sure you are using the latest trunk; there were many bug fixes since 0.5.
Then make sure to load the files that end with .arff and remove all the lines that start with @. The .txt files contain a supplemental field that makes the descriptor throw an exception (wrong number of attributes).
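
A minimal Python sketch of that clean-up step, for illustration only. The function name strip_arff_header and the sample rows are made up for this example and are not part of Mahout or of the real data set; the only rule applied is the one described above (drop every line that starts with @).

```python
def strip_arff_header(lines):
    """Keep only the data rows: drop every line that starts with '@'."""
    return [line for line in lines if not line.lstrip().startswith("@")]


# Illustrative stand-in for the top of an .arff file (not real KDD rows).
sample = [
    "@relation 'KDDTrain'",
    "@attribute 'duration' real",
    "@data",
    "0,tcp,ftp_data,SF,491,normal",
    "0,udp,other,SF,146,normal",
]

data_rows = strip_arff_header(sample)
print("\n".join(data_rows))
```

For a real file, the same filter can be applied line by line while reading the .arff file and writing the remaining rows to the file that is later passed with -d.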
                

[jira] [Commented] (MAHOUT-932) RandomForest quits with ArrayIndexOutOfBoundsException while running sample

Posted by "Bernhard Lehner (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAHOUT-932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187691#comment-13187691 ] 

Bernhard Lehner commented on MAHOUT-932:
----------------------------------------

Hello,
I only started recently with MAHOUT, but hopefully I can contribute to this topic:
I have the same problem with the ArrayIndexOutOfBoundsException, but the exact behaviour differs in detail from version to version:
+++++++++++++++++++++++++++++++++++++++++
MAHOUT-0.5+HADOOP-0.20.2(running local)

The aforementioned KDDtrain(100%) dataset from the partial implementation example can be used for training without any problems whatsoever. But using a larger dataset (~10 times, approx. 180MB) leads to the exception (the index equals the -t parameter), regardless of the value of the -Dmapred.max.split.size parameter.
To avoid partitioning I even set the parameter to more than the actual input data size; nevertheless the PartialBuilder gets involved and leads to the exception. Only by not passing the parameter at all was I able to train the RF.
But this led to another interesting result: when I compared the performance of the resulting RF model with a model trained with WEKA, I noticed a ~2% drop in accuracy on every single one of my ten different training and test sets. Using slightly smaller datasets (approx. 150MB for training) didn't lead to such different behaviour between MAHOUT and WEKA.
It seems that partitioning is responsible for the 2% drop in accuracy, as is explicitly mentioned on

https://cwiki.apache.org/MAHOUT/partial-implementation.html

"...
IMPORTANT: using less partitions should give better classification results, but needs a lot of memory. So if the Jobs are failing, try increasing the number of partitions.
..."

The weird thing is, I certainly have the amount of memory available (8GB) to train the RF, but I didn't find a way to prevent partitioning...


+++++++++++++++++++++++++++++++++++++++++
MAHOUT-0.6-trunk+HADOOP-0.20.204 | HADOOP-1.0.0(running local)

Now I am able to reproduce the same results as WEKA by not using the -p option, hence invoking the InMem Mapred implementation (i.e. no more 2% drop in accuracy). But, of course, it takes some time to finish training...
However, I still haven't found a combination of -p option and -Dmapred.max.split.size parameter that doesn't lead to the Exception thrown by PartialBuilder with large datasets.

I agree with Ikumasa-san that there is some discrepancy regarding the expected number of subtrees. What I'm still not sure about is:
- Is this a problem with MAHOUT or HADOOP?
- Why doesn't the -Dmapred.max.split.size parameter have any impact on the behaviour, even when it is set so high that no partitioning should occur at all?

This behaviour should be easy to reproduce by concatenating the content of the KDDtrain file over and over. To speed up the whole process I would advise using a small number of trees and attributes (I used -t 5 -sl 10), since these have no effect on whether the exception is thrown.
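
A small Python sketch of the concatenation step described above, under stated assumptions: the helper name repeat_file, the file names and the two stand-in rows are all hypothetical, and the demo runs on a tiny temporary file rather than on the real KDDtrain file. The source file should end with a newline, otherwise the last row of one copy would merge with the first row of the next.

```python
import os
import shutil
import tempfile


def repeat_file(src_path, dst_path, times):
    """Write `times` back-to-back copies of src_path into dst_path."""
    with open(dst_path, "wb") as dst:
        for _ in range(times):
            with open(src_path, "rb") as src:
                shutil.copyfileobj(src, dst)


# Demo on a tiny stand-in file instead of the real training data.
workdir = tempfile.mkdtemp()
small = os.path.join(workdir, "small.data")
large = os.path.join(workdir, "large.data")
with open(small, "w") as f:
    f.write("0,tcp,normal\n1,udp,anomaly\n")

repeat_file(small, large, 10)
small_size = os.path.getsize(small)
large_size = os.path.getsize(large)
print(large_size // small_size)
```

With the real file, a factor of about 10 should give roughly the 180MB input mentioned above.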


Regarding the -t parameter:
From the user's point of view, I think it would be best to just give the total number of trees as the parameter, regardless of the number of mappers involved. Given that HADOOP delivers the number of invoked mappers, I would definitely vote for this solution.
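
One way to read this proposal, sketched in Python. This is not Mahout's implementation; the function name trees_per_mapper is hypothetical, and the sketch assumes the real number of mappers is known when the trees are divided up. The per-mapper counts always add up to the requested total, so the array used for gathering would never be overrun.

```python
def trees_per_mapper(total_trees, num_mappers):
    """Split total_trees across num_mappers so the counts sum to total_trees.

    The first (total_trees % num_mappers) mappers build one extra tree.
    """
    base, extra = divmod(total_trees, num_mappers)
    return [base + 1 if i < extra else base for i in range(num_mappers)]


even_split = trees_per_mapper(100, 10)   # -t 100 with 10 mappers
uneven_split = trees_per_mapper(5, 3)    # -t 5 with 3 mappers
print(even_split, sum(even_split))
print(uneven_split, sum(uneven_split))
```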





Best regards,
                

[jira] [Commented] (MAHOUT-932) RandomForest quits with ArrayIndexOutOfBoundsException while running sample

Posted by "Ikumasa Mukai (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAHOUT-932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13172839#comment-13172839 ] 

Ikumasa Mukai commented on MAHOUT-932:
--------------------------------------

Maybe this problem is caused by the following process.

1) The number of trees for our forest is 100 (-t 100).
2) Hadoop creates 10 mappers (calculated as 18742306/1874231 = 10, i.e. the data size of 18742306 divided by -Dmapred.max.split.size=1874231).
3) Each mapper calculates how many trees it should build. This number should be 10 trees per mapper.
4) For this calculation, the value of mapred.map.tasks provided by Hadoop is used.
  With v0.20.2, mapred.map.tasks is 10, so 100/10 = 10 trees are built per mapper.
  With v0.20.204, mapred.map.tasks is 1, so 100/1 = 100 trees are built per mapper.
5) After building the trees, we gather the trees from all mappers to make the forest.
  In this phase we get an exception with v0.20.204.
6) This is because the total number of trees we gather is 1000 (100 trees x 10 mappers),
  but the array prepared for gathering them (TreeID) has size 100.
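
The steps above can be replayed with a small Python simulation. It is only a sketch of the arithmetic, not Mahout's code: the function simulate_gather and its parameter names are hypothetical, and a plain Python list stands in for the TreeID array.

```python
import math


def simulate_gather(total_trees, data_size, split_size, reported_map_tasks):
    """Replay steps 1-6: return the number of trees gathered, or raise
    IndexError when the mappers produce more trees than the array can hold."""
    num_mappers = math.ceil(data_size / split_size)   # step 2
    trees_each = total_trees // reported_map_tasks    # steps 3 and 4
    slots = [None] * total_trees                      # array sized by -t
    next_id = 0
    for mapper in range(num_mappers):                 # step 5
        for _ in range(trees_each):
            slots[next_id] = mapper                   # fails at index total_trees
            next_id += 1
    return next_id


# mapred.map.tasks reported as 10 (the v0.20.2 case): 10 x 10 trees fit.
print(simulate_gather(100, 18742306, 1874231, 10))

# mapred.map.tasks reported as 1 (the v0.20.204 case): 10 x 100 trees do not.
try:
    simulate_gather(100, 18742306, 1874231, 1)
except IndexError as err:
    print("overflow while gathering trees:", err)
```

In the second call the first write past the end of the list happens at index 100, which matches the index in the reported ArrayIndexOutOfBoundsException.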

To fix this, we could change the parameter used for the calculation in "4)", but I don't think this is a good way.
What do you think?

Regards,
                

[jira] [Issue Comment Edited] (MAHOUT-932) RandomForest quits with ArrayIndexOutOfBoundsException while running sample

Posted by "Ikumasa Mukai (Issue Comment Edited) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAHOUT-932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13172839#comment-13172839 ] 

Ikumasa Mukai edited comment on MAHOUT-932 at 12/20/11 1:57 AM:
----------------------------------------------------------------

Maybe this problem is caused by the following process.
{noformat}
1) The number of trees for our forest is 100 (-t 100).
2) Hadoop creates 10 mappers.
  This value is calculated as 18742306/1874231 = 10 (the size of the data / -Dmapred.max.split.size).
3) Each mapper calculates how many trees it should build. This number should be 10 trees per mapper.
4) For this calculation, the value of mapred.map.tasks provided by Hadoop is used.
  With v0.20.2, mapred.map.tasks is 10, so 100/10 = 10 trees are built per mapper.
  With v0.20.204, mapred.map.tasks is 1, so 100/1 = 100 trees are built per mapper.
5) After building the trees, we gather the trees from all mappers to make the forest.
  In this phase we get an exception with v0.20.204.
6) This is because the total number of trees we gather is 1000 (100 trees x 10 mappers),
  but the array prepared for gathering them (TreeID) has size 100.
{noformat}
To fix this, we could change the parameter used for the calculation in "4)", but I don't think this is a good way.
What do you think?

Regards,
                


[jira] [Commented] (MAHOUT-932) RandomForest quits with ArrayIndexOutOfBoundsException while running sample

Posted by "Ikumasa Mukai (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAHOUT-932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13172178#comment-13172178 ] 

Ikumasa Mukai commented on MAHOUT-932:
--------------------------------------

Hi Berttenfall-san.
I had the same problem and was able to solve it by using an older Hadoop (0.20.2).

Are you using a newer Hadoop (maybe v0.20.204 or 205)?

This may be caused by newer Hadoop versions implicitly setting mapred.map.tasks to 1 when Hadoop runs in local mode.
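
One way to check this suspicion is to compare the reported mapred.map.tasks value with a rough estimate of the number of input splits. The following is a minimal sketch in plain Java, assuming the split count is simply the input size divided by the maximum split size, rounded up. Hadoop's real split computation may differ, and the class and method names are hypothetical. The figures used are the ones quoted elsewhere in this thread (data size 18742306 bytes, -Dmapred.max.split.size=1874231).

```java
// Illustrative sketch only; names are hypothetical. Hadoop's real split
// computation may differ from this simple ceiling division.
final class MapTaskCountSketch {

    // Rough number of splits: input size divided by max split size, rounded up.
    static long estimatedSplits(long inputBytes, long maxSplitBytes) {
        if (inputBytes < 0 || maxSplitBytes <= 0) {
            throw new IllegalArgumentException("sizes must be non-negative and split size positive");
        }
        return (inputBytes + maxSplitBytes - 1) / maxSplitBytes;
    }

    // True when the reported mapred.map.tasks value disagrees with the estimate.
    static boolean looksInconsistent(long reportedMapTasks, long inputBytes, long maxSplitBytes) {
        return reportedMapTasks != estimatedSplits(inputBytes, maxSplitBytes);
    }

    public static void main(String[] args) {
        long inputBytes = 18742306L;    // data size quoted in this thread
        long maxSplitBytes = 1874231L;  // -Dmapred.max.split.size
        System.out.println("estimated splits: " + estimatedSplits(inputBytes, maxSplitBytes));
        System.out.println("reported 1 looks inconsistent: "
            + looksInconsistent(1, inputBytes, maxSplitBytes));
    }
}
```

If the reported value is 1 while the estimate is larger, any per-mapper calculation based on mapred.map.tasks will be off by the same factor.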

Regards,

                
