
Posted to user@mahout.apache.org by deneche abdelhakim <a_...@yahoo.fr> on 2010/04/03 21:21:59 UTC

Re : Question about mahout Describe

Hi,

I just committed a new version of TestForest. If you add "-mr" to the command line, it should launch a Hadoop job to classify the data. This is a basic implementation that can't compute the confusion matrix, so using "-a" has no effect. It is also not well tested (it is a work in progress). If you want to test it, select a random subset of your test data, classify it with the sequential implementation (without "-mr"), then compare the predictions with those of the distributed implementation. The results won't be exactly the same (due to the random behavior of the classifier when it encounters ties), but 90% of the predictions should be the same.
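To make the comparison concrete, here is a rough sketch in Python of how the two sets of predictions could be checked against each other. It is not part of Mahout. It assumes that each run has been reduced to a text file with one predicted label per line, in the same order as the sampled test rows; that file layout and the function names read_predictions and agreement are hypothetical, so adapt them to whatever TestForest actually writes.

```python
def read_predictions(path):
    """Read one predicted label per line (assumed format), skipping blank lines."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def agreement(seq_predictions, mr_predictions):
    """Return the fraction of positions where the two runs predict the same label."""
    if len(seq_predictions) != len(mr_predictions):
        raise ValueError("the two runs produced a different number of predictions")
    if not seq_predictions:
        raise ValueError("no predictions to compare")
    matches = sum(1 for a, b in zip(seq_predictions, mr_predictions) if a == b)
    return matches / len(seq_predictions)
```

For example, agreement(read_predictions("seq.txt"), read_predictions("mr.txt")) gives the fraction of matching predictions (the file names are placeholders). A value well below 0.9 on a reasonably sized sample would suggest a problem beyond tie-breaking.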

Let me know what you think of it. I'm working on the confusion matrix, but it should take some time to finish.

--- On Fri 26.3.10, Yang Sun <so...@gmail.com> wrote:

> From: Yang Sun <so...@gmail.com>
> Subject: Question about mahout Describe
> To: mahout-user@lucene.apache.org
> Date: Friday, 26 March 2010, 22:16
> I was testing mahout recently. It runs great on small testing datasets.
> However, when I try to expand the dataset to a big dataset directory, I got
> the following error message:
> 
> [localhost]$ hjar examples/target/mahout-examples-0.4-SNAPSHOT.job org.apache.mahout.df.mapreduce.TestForest -i /user/fulltestdata/* -ds rf/testdata.info -m rf-testmodel-5-100 -a -o rf/fulltestprediction
> 
> Exception in thread "main" java.io.IOException: Cannot open filename /user/fulltestdata/*
>         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1474)
>         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1465)
>         at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:372)
>         at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:178)
>         at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:351)
>         at org.apache.mahout.df.mapreduce.TestForest.testForest(TestForest.java:190)
>         at org.apache.mahout.df.mapreduce.TestForest.run(TestForest.java:137)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>         at org.apache.mahout.df.mapreduce.TestForest.main(TestForest.java:228)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>
> My question is: can I use mahout on directories instead of single files? And how?
> 
> Thanks,
>

Re: Re : Question about mahout Describe

Posted by Yang Sun <so...@gmail.com>.

Thanks deneche, I will test it soon.

