Posted to user@mahout.apache.org by Svetlomir Kasabov <sk...@smail.inf.fh-brs.de> on 2011/06/02 21:55:26 UTC

Logistic Regression + non-CSV examples

Hello,

I have looked at the donut example for Logistic Regression and noticed
that a CSV file is used for the training and test examples. My problem
is that my test examples are usually POJOs (Plain Old Java Objects),
which usually have nothing to do with the CSV format. In other words, I
need to reproduce this probability-calculation code without using CSV:

LogisticModelParameters lmp =
    LogisticModelParameters.loadFrom(new File(LOGISTIC_MODEL_PATH));
CsvRecordFactory csv = lmp.getCsvRecordFactory();
OnlineLogisticRegression lr = lmp.createRegression();
...
Vector v = new SequentialAccessSparseVector(lmp.getNumFeatures());
int target = csv.processLine(line, v); // 'line' is one record from the CSV file
double score = lr.classifyScalar(v);

Is there a way to do this? Can you please provide some sample code?

Thanks a lot and best regards.

Svetlomir.

Re: Logistic Regression + non-CSV examples

Posted by Ted Dunning <te...@gmail.com>.
Look at the test examples for the classes that inherit from
FeatureVectorEncoder.  Also look at AdaptiveLogisticRegression and related
tests.
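Those encoder classes work by hashing a field name and value into vector positions, so a POJO's fields can be mapped onto a Vector without any CSV parsing. Below is a minimal self-contained sketch of that hashing idea in plain Java; it is an illustration only, not the actual Mahout API, and all class and field names in it are made up:

```java
// Illustrative sketch of hashed feature encoding (the idea behind
// Mahout's FeatureVectorEncoder subclasses), with no Mahout dependency.
public class HashedEncoderSketch {
    private final double[] vector;

    public HashedEncoderSketch(int numFeatures) {
        this.vector = new double[numFeatures];
    }

    // Categorical field: hash (field name, word) to an index, add weight 1.
    public void addWord(String fieldName, String word) {
        int idx = Math.floorMod((fieldName + ":" + word).hashCode(), vector.length);
        vector[idx] += 1.0;
    }

    // Continuous field: hash the field name to an index, add the value there.
    public void addContinuous(String fieldName, double value) {
        int idx = Math.floorMod(fieldName.hashCode(), vector.length);
        vector[idx] += value;
    }

    public double[] asArray() {
        return vector;
    }
}
```

With Mahout itself, the analogous calls are the addToVector methods of the encoder subclasses (for example StaticWordValueEncoder for categorical fields and ContinuousValueEncoder for numeric ones), writing into a sparse Vector that you then pass to lr.classifyScalar(v).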

On Thu, Jun 2, 2011 at 12:55 PM, Svetlomir Kasabov <
skasab2s@smail.inf.fh-brs.de> wrote:


Re: Logistic Regression + non-CSV examples

Posted by Hector Yee <he...@gmail.com>.
You can look at the unit tests to see how it is done, and create a Matrix
object directly for training.
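To make the training loop concrete: online logistic regression applies one SGD update per example, so any source of vectors (Matrix rows, POJO encodings) can drive it. The following is a self-contained stand-in with roughly the shape of OnlineLogisticRegression's API, for illustration only; it is not Mahout's implementation:

```java
// Minimal online (SGD) binary logistic regression on dense feature arrays.
public class OnlineLogRegSketch {
    private final double[] beta;       // model weights
    private final double learningRate;

    public OnlineLogRegSketch(int numFeatures, double learningRate) {
        this.beta = new double[numFeatures];
        this.learningRate = learningRate;
    }

    // Probability of the positive class: sigmoid(beta . x).
    public double classifyScalar(double[] x) {
        double dot = 0;
        for (int i = 0; i < beta.length; i++) {
            dot += beta[i] * x[i];
        }
        return 1.0 / (1.0 + Math.exp(-dot));
    }

    // One SGD step toward target in {0, 1}.
    public void train(int target, double[] x) {
        double err = target - classifyScalar(x);
        for (int i = 0; i < beta.length; i++) {
            beta[i] += learningRate * err * x[i];
        }
    }
}
```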

On Thu, Jun 2, 2011 at 12:55 PM, Svetlomir Kasabov <
skasab2s@smail.inf.fh-brs.de> wrote:



-- 
Yee Yang Li Hector
http://hectorgon.blogspot.com/ (tech + travel)
http://hectorgon.com (book reviews)

Re: Reg Random forest

Posted by deneche abdelhakim <a_...@yahoo.fr>.
The Hadoop 0.21 patch introduces a lot of changes that make the Random Forest code crash. I fixed the _SUCCESS file problem easily but was faced with another exception that is not that easy to fix.
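Until a proper fix lands, a common workaround is to skip Hadoop's underscore-prefixed marker files (_SUCCESS, _logs) when reading job output; with the Hadoop API that logic belongs in an org.apache.hadoop.fs.PathFilter. A self-contained sketch of the filtering logic (class and method names here are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

// Keep only real part files from a job-output listing, dropping Hadoop
// marker entries such as _SUCCESS and _logs.
public class OutputFileFilterSketch {
    public static List<String> partFilesOnly(List<String> names) {
        List<String> kept = new ArrayList<>();
        for (String name : names) {
            // Hadoop convention: names starting with '_' or '.' are not data.
            if (!name.startsWith("_") && !name.startsWith(".")) {
                kept.add(name);
            }
        }
        return kept;
    }
}
```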


________________________________
From: praneet mhatre <pr...@gmail.com>
To: user@mahout.apache.org
Sent: Friday, June 3, 2011, 6:04 PM
Subject: Re: Reg Random forest


Re: Reg Random forest

Posted by praneet mhatre <pr...@gmail.com>.
Hi,

Even I faced the exact same problem and had a long exchange of emails with
the Mahout folks regarding this. I'll link you to the mail archive to save
them the trouble of going through it all again:
http://search.lucidimagination.com/search/document/ecbfb35f9e05706b/partial_implementation_of_random_forest#98cc8b90d38c0423.
In a nutshell, CDH3 uses some patches from Hadoop 0.21 which create a
_SUCCESS file in the output path, and the current code does not know how to
deal with that file. I switched to an earlier version of Hadoop and
everything worked perfectly.

I don't know if this issue has been fixed yet. One of the developers could
throw some light on that.

Thanks,

On Fri, Jun 3, 2011 at 4:15 AM, <ex...@nokia.com> wrote:




-- 
Praneet Mhatre
Graduate Student
Donald Bren School of ICS
University of California, Irvine

Reg Random forest

Posted by ex...@nokia.com.
Hi,

I tried to run Random Forest on the KDD data in the Hadoop cluster (CDH version 3) and ended up with the following error during forest building:

Exception in thread "main" java.lang.IllegalStateException: java.io.EOFException
        at org.apache.mahout.common.iterator.sequencefile.SequenceFileIterable.iterator(SequenceFileIterable.java:63)
        at org.apache.mahout.df.mapreduce.partial.PartialBuilder.processOutput(PartialBuilder.java:173)
        at org.apache.mahout.df.mapreduce.partial.PartialBuilder.parseOutput(PartialBuilder.java:121)
        at org.apache.mahout.df.mapreduce.Builder.build(Builder.java:324)
        at org.apache.mahout.df.mapreduce.BuildForest.buildForest(BuildForest.java:195)
        at org.apache.mahout.df.mapreduce.BuildForest.run(BuildForest.java:159)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.mahout.df.mapreduce.BuildForest.main(BuildForest.java:239)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
Caused by: java.io.EOFException
        at java.io.DataInputStream.readFully(DataInputStream.java:180)
        at java.io.DataInputStream.readFully(DataInputStream.java:152)
        at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1457)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1435)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1424)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1419)
        at org.apache.mahout.common.iterator.sequencefile.SequenceFileIterator.<init>(SequenceFileIterator.java:58)
        at org.apache.mahout.common.iterator.sequencefile.SequenceFileIterable.iterator(SequenceFileIterable.java:61)
        ... 12 more

Any help in resolving the above issue is greatly appreciated.

Thanks and Regards,
Ranjit.C