
Posted to user@hadoop.apache.org by Rajesh Nikam <ra...@gmail.com> on 2012/10/16 15:46:45 UTC

WEKA logistic regression on hadoop

Hi,

I was looking for logistic regression algorithms on Hadoop.
Mahout is one good package to use on Hadoop; however, I have not been able
to get good results in my experiments.

WEKA provides logistic regression algorithms, which I have used on Windows.
I expect to be able to run these algorithms from the same JAR file on Linux.

java -classpath weka.jar weka.classifiers.functions.Logistic -R 1.0E-8 -M 6 -t lr.arff
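For reference, `-R 1.0E-8` sets the ridge (L2) penalty in the log-likelihood and `-M 6` limits the optimizer to at most 6 iterations (the default, -1, iterates until convergence). To illustrate what the ridge parameter does, here is a minimal sketch that fits an L2-penalized logistic regression by plain batch gradient descent. It is not Weka's implementation: Weka's Logistic uses a quasi-Newton optimizer, so its iteration counts are not comparable. The class name, learning rate, iteration count and toy data are assumptions made for this example, and the sketch leaves the intercept unpenalized.

```java
import java.util.Arrays;

/** Illustrative L2-penalized (ridge) logistic regression trained by batch gradient descent. */
class RidgeLogistic {
    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /**
     * Returns w, where w[0] is the intercept and w[j + 1] multiplies x[i][j].
     * Minimizes the negative log-likelihood plus ridge * (sum of squared weights),
     * leaving the intercept unpenalized. The gradient is divided by n, which only
     * rescales the step size and does not change the minimizer.
     */
    static double[] train(double[][] x, int[] y, double ridge, int maxIter, double learningRate) {
        int n = x.length;
        int d = x[0].length;
        double[] w = new double[d + 1];
        for (int iter = 0; iter < maxIter; iter++) {
            double[] grad = new double[d + 1];
            for (int i = 0; i < n; i++) {
                double z = w[0];
                for (int j = 0; j < d; j++) {
                    z += w[j + 1] * x[i][j];
                }
                double err = sigmoid(z) - y[i]; // d(-log-likelihood)/dz for this record
                grad[0] += err;
                for (int j = 0; j < d; j++) {
                    grad[j + 1] += err * x[i][j];
                }
            }
            for (int j = 1; j <= d; j++) {
                grad[j] += 2.0 * ridge * w[j]; // gradient of the ridge penalty
            }
            for (int j = 0; j <= d; j++) {
                w[j] -= learningRate * grad[j] / n;
            }
        }
        return w;
    }

    public static void main(String[] args) {
        double[][] x = {{0.0}, {1.0}, {2.0}, {3.0}, {4.0}, {5.0}};
        int[] y = {0, 0, 0, 1, 1, 1};
        System.out.println("ridge 1e-8: " + Arrays.toString(train(x, y, 1.0E-8, 5000, 0.5)));
        System.out.println("ridge 1.0:  " + Arrays.toString(train(x, y, 1.0, 5000, 0.5)));
    }
}
```

With a ridge of 1.0E-8 the penalty is negligible on this toy data, while a ridge of 1.0 visibly pulls the slope towards zero.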

Has anyone ported them to take advantage of Hadoop?

How should I interpret the output it generates? In particular, what are the
Coefficients and Odds Ratios, and how can they be used for classification?


Options: -R 1.0E-8 -M 6

Logistic Regression with ridge parameter of 1.0E-8
Coefficients...
                 Class
Variable       class_1
======================
a1                   0
a2                   0
a3                   0
a4              0.0082
a5              0.0151
a6             -0.1034
a7                   0
a8                   0
a9                   0
a10            -0.0397
a11            -0.0003
a13            -0.1195
a14            -0.1389
Intercept      -21.487
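A minimal sketch of how the Coefficients table can be used for classification. Weka's documentation for Logistic describes the model with the last class (here class_2) as the reference, so the single column gives P(class_1 | x) = 1 / (1 + exp(-(Intercept + sum of coef * x))) and P(class_2 | x) = 1 - P(class_1 | x). The sketch assumes the coefficients apply to the attribute values in their original units, uses the rounded values exactly as printed (so its probabilities only approximate Weka's own), and treats an attribute absent from the input map as 0, which is a simplification of this example. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Scores one instance with the rounded coefficients copied from the "Coefficients..." table. */
class WekaLogisticScorer {
    static final double INTERCEPT = -21.487;
    static final Map<String, Double> COEF = new LinkedHashMap<>();

    static {
        COEF.put("a4", 0.0082);
        COEF.put("a5", 0.0151);
        COEF.put("a6", -0.1034);
        COEF.put("a10", -0.0397);
        COEF.put("a11", -0.0003);
        COEF.put("a13", -0.1195);
        COEF.put("a14", -0.1389);
        // a1-a3 and a7-a9 are printed as 0 (after rounding), so they are omitted here.
    }

    /** Linear score: the log-odds of class_1 versus class_2. Absent attributes count as 0 (sketch only). */
    static double logOdds(Map<String, Double> instance) {
        double z = INTERCEPT;
        for (Map.Entry<String, Double> e : COEF.entrySet()) {
            z += e.getValue() * instance.getOrDefault(e.getKey(), 0.0);
        }
        return z;
    }

    static double probClass1(Map<String, Double> instance) {
        return 1.0 / (1.0 + Math.exp(-logOdds(instance)));
    }

    /** Picks the more probable class; P(class_1) >= 0.5 is the same as logOdds >= 0. */
    static String classify(Map<String, Double> instance) {
        return probClass1(instance) >= 0.5 ? "class_1" : "class_2";
    }

    public static void main(String[] args) {
        Map<String, Double> instance = new HashMap<>();
        instance.put("a4", 3000.0); // hypothetical attribute value, not from the original data
        System.out.println(classify(instance) + " p=" + probClass1(instance));
    }
}
```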


Odds Ratios...
                 Class
Variable       class_1
======================
a1                   1
a2                   1
a3                   1
a4              1.0083
a5              1.0152
a6              0.9018
a7                   1
a8                   1
a9                   1
a10              0.961
a11             0.9997
a13             0.8873
a14             0.8703
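Each odds ratio is exp(coefficient): the factor by which the odds of class_1 versus class_2, P / (1 - P), are multiplied when that attribute increases by one unit with the others held fixed. A ratio above 1 (a4, a5) pushes towards class_1, below 1 (a6, a10, a11, a13, a14) towards class_2, and exactly 1 means no effect. The sketch below recomputes the column from the rounded coefficients; the results agree with the table to within about 0.0001 (for example exp(0.0082) gives 1.0082 where the table prints 1.0083), presumably because Weka exponentiates the unrounded coefficients. The class and method names are hypothetical.

```java
/** Relates the "Odds Ratios..." column to the "Coefficients..." column. */
class OddsRatios {
    static double oddsRatio(double coefficient) {
        return Math.exp(coefficient);
    }

    static double odds(double probability) {
        return probability / (1.0 - probability);
    }

    /** P(class_1) for a one-attribute model, used to check the odds-ratio interpretation. */
    static double prob(double intercept, double coefficient, double x) {
        return 1.0 / (1.0 + Math.exp(-(intercept + coefficient * x)));
    }

    public static void main(String[] args) {
        String[] names = {"a4", "a5", "a6", "a10", "a11", "a13", "a14"};
        double[] coef = {0.0082, 0.0151, -0.1034, -0.0397, -0.0003, -0.1195, -0.1389};
        for (int i = 0; i < names.length; i++) {
            System.out.printf("%-4s %.4f%n", names[i], oddsRatio(coef[i]));
        }
    }
}
```

Because the odds ratio is multiplicative per unit, its practical size depends on the attribute's units: a ratio close to 1 can still matter for an attribute that spans a wide range.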

Time taken to build model: 6.39 seconds
Time taken to test model on training data: 1.86 seconds

=== Error on training data ===

Correctly Classified Instances       49528               99.9173 %
Incorrectly Classified Instances        41                0.0827 %
Kappa statistic                          0.9983
Mean absolute error                      0.0011
Root mean squared error                  0.0244
Relative absolute error                  0.2202 %
Root relative squared error              4.895  %
Total Number of Instances            49569


=== Confusion Matrix ===

     a     b   <-- classified as
 26526    37 |     a = class_1
     4 23002 |     b = class_2
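Both summary figures above can be recomputed from the confusion matrix (rows are the actual class, columns the predicted class): accuracy is the diagonal divided by the total, and the Kappa statistic is Cohen's kappa, (p_o - p_e) / (1 - p_e), where p_o is the observed accuracy and p_e the agreement expected by chance from the row and column totals. A small sketch with a hypothetical class name; for the training-data matrix it gives 99.9173 % and 0.9983, matching Weka's output, and applied to the cross-validation matrix it gives 99.8447 % and 0.9969.

```java
/** Accuracy and Cohen's kappa from a square confusion matrix (rows = actual, columns = predicted). */
class ConfusionStats {
    static long total(long[][] m) {
        long n = 0;
        for (long[] row : m) {
            for (long v : row) {
                n += v;
            }
        }
        return n;
    }

    static double accuracy(long[][] m) {
        long correct = 0;
        for (int i = 0; i < m.length; i++) {
            correct += m[i][i];
        }
        return (double) correct / total(m);
    }

    static double kappa(long[][] m) {
        double n = total(m);
        double expected = 0.0; // chance agreement: sum over classes of (row share * column share)
        for (int k = 0; k < m.length; k++) {
            double rowSum = 0.0;
            double colSum = 0.0;
            for (int j = 0; j < m.length; j++) {
                rowSum += m[k][j];
                colSum += m[j][k];
            }
            expected += (rowSum / n) * (colSum / n);
        }
        return (accuracy(m) - expected) / (1.0 - expected);
    }

    public static void main(String[] args) {
        long[][] training = {{26526, 37}, {4, 23002}};
        System.out.printf("accuracy=%.4f%% kappa=%.4f%n", 100 * accuracy(training), kappa(training));
    }
}
```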



=== Stratified cross-validation ===

Correctly Classified Instances       49492               99.8447 %
Incorrectly Classified Instances        77                0.1553 %
Kappa statistic                          0.9969
Mean absolute error                      0.0015
Root mean squared error                  0.0358
Relative absolute error                  0.3108 %
Root relative squared error              7.1718 %
Total Number of Instances            49569


=== Confusion Matrix ===

     a     b   <-- classified as
 26532    31 |     a = class_1
    46 22960 |     b = class_2
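Of the two blocks of figures, the stratified cross-validation ones are the more realistic estimate, because each instance is classified by a model that was not trained on it; the training-data figures are optimistic. Weka uses 10 folds unless `-x` says otherwise, and "stratified" means each fold keeps roughly the class proportions of the whole dataset. The sketch below shows one simple way to assign stratified folds (shuffle each class, then deal its members round-robin across the folds). It is an illustration, not Weka's exact procedure; the class name and seed are assumptions, and labels are expected as class indices 0..C-1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Assigns each instance to one of k folds so every fold has nearly the same class mix. */
class StratifiedFolds {
    static int[] assign(int[] labels, int k, long seed) {
        int numClasses = 0;
        for (int y : labels) {
            numClasses = Math.max(numClasses, y + 1);
        }
        List<List<Integer>> byClass = new ArrayList<>();
        for (int c = 0; c < numClasses; c++) {
            byClass.add(new ArrayList<>());
        }
        for (int i = 0; i < labels.length; i++) {
            byClass.get(labels[i]).add(i);
        }

        int[] fold = new int[labels.length];
        Random rnd = new Random(seed);
        int next = 0; // keep dealing across classes so the total fold sizes also stay balanced
        for (List<Integer> members : byClass) {
            Collections.shuffle(members, rnd);
            for (int idx : members) {
                fold[idx] = next;
                next = (next + 1) % k;
            }
        }
        return fold;
    }

    public static void main(String[] args) {
        int[] labels = {0, 0, 0, 0, 1, 1, 0, 1, 0, 1};
        System.out.println(Arrays.toString(assign(labels, 3, 42L)));
    }
}
```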

Thanks in advance.
Rajesh

Re: WEKA logistic regression on hadoop

Posted by Bertrand Dechoux <de...@gmail.com>.

Weka is indeed a more complete package of data mining solutions, but its aim
is not to support Hadoop, whereas that is the aim of Mahout.

The implemented methods are standard data mining methods. If you are
looking for Hadoop support, you should ask on the Mahout mailing list; if
you have questions about Weka itself, you should ask on the Weka mailing
list. Not all algorithms are easy to migrate to Hadoop, and many data mining
applications are fine without a Hadoop cluster: e.g. the Netflix Prize
provided a 'big' public dataset, but it was only about 1 GB.
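As an illustration of why some algorithms migrate to Hadoop more easily than others: for batch logistic regression, the gradient of the log-likelihood is a sum over records, so each map task could compute a partial sum over its input split and a reducer could add the partial sums. The plain-Java sketch below (no Hadoop API; contiguous slices of an array stand in for input splits, and all names are hypothetical) checks that the partitioned computation gives the same gradient as a single pass. An iterative optimizer would still need one such pass over the data per iteration.

```java
import java.util.ArrayList;
import java.util.List;

/** Shows that the logistic-regression gradient over all records equals the sum of per-partition gradients. */
class PartitionedGradient {
    /** "Map" side: gradient of the negative log-likelihood over records [from, to); w[0] is the intercept. */
    static double[] partialGradient(double[][] x, int[] y, int from, int to, double[] w) {
        double[] g = new double[w.length];
        for (int i = from; i < to; i++) {
            double z = w[0];
            for (int j = 0; j < x[i].length; j++) {
                z += w[j + 1] * x[i][j];
            }
            double err = 1.0 / (1.0 + Math.exp(-z)) - y[i];
            g[0] += err;
            for (int j = 0; j < x[i].length; j++) {
                g[j + 1] += err * x[i][j];
            }
        }
        return g;
    }

    /** "Reduce" side: element-wise sum of the partial gradients. */
    static double[] sum(List<double[]> parts) {
        double[] total = new double[parts.get(0).length];
        for (double[] p : parts) {
            for (int j = 0; j < total.length; j++) {
                total[j] += p[j];
            }
        }
        return total;
    }

    static double[] gradientInPartitions(double[][] x, int[] y, double[] w, int numPartitions) {
        List<double[]> parts = new ArrayList<>();
        int n = x.length;
        for (int p = 0; p < numPartitions; p++) {
            int from = p * n / numPartitions;
            int to = (p + 1) * n / numPartitions;
            parts.add(partialGradient(x, y, from, to, w));
        }
        return sum(parts);
    }

    public static void main(String[] args) {
        double[][] x = {{1.0, 2.0}, {0.5, -1.0}, {3.0, 0.0}, {-2.0, 1.0}};
        int[] y = {1, 0, 1, 0};
        double[] w = {0.1, -0.2, 0.3};
        System.out.println(java.util.Arrays.toString(gradientInPartitions(x, y, w, 2)));
    }
}
```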

Regards

Bertrand


-- 
Bertrand Dechoux

Re: WEKA logistic regression on hadoop

Posted by Rajesh Nikam <ra...@gmail.com>.

Hi Abhishek,

I have also tried WEKA's SMO, but it takes far too long to train on a set
of more than a million instances (I waited for more than 6 days).
Logistic regression, however, produced a model in 20 minutes.

This is pretty fast!

I could use the model as is for classification, but I would rather
understand and use the weights generated by training.

My question is how the Coefficients and Odds Ratios mentioned in my
previous post can be used for classification.
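One rough way to get at the weights outside Weka, for example to score records in a MapReduce job, is to parse them from the printed model text. A minimal sketch, assuming the two-column layout shown in the original post (one class column, a line of '=' characters before the rows, and a blank line after the Intercept row). The printed values are rounded to about four decimals, so weights recovered this way only approximate the trained model. The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Parses the "Variable  value" rows of a printed Coefficients table into a name -> weight map. */
class CoefficientTableParser {
    static Map<String, Double> parse(String text) {
        Map<String, Double> weights = new LinkedHashMap<>();
        boolean inTable = false;
        for (String raw : text.split("\\R")) {
            String line = raw.trim();
            if (line.startsWith("=====")) { // the table body starts after the rule line
                inTable = true;
                continue;
            }
            if (!inTable) {
                continue;
            }
            if (line.isEmpty()) { // a blank line ends the table
                break;
            }
            String[] parts = line.split("\\s+");
            if (parts.length != 2) {
                break; // not a two-column row: stop rather than guess
            }
            weights.put(parts[0], Double.parseDouble(parts[1]));
        }
        return weights;
    }

    public static void main(String[] args) {
        String table = "Coefficients...\n"
                + "                 Class\n"
                + "Variable       class_1\n"
                + "======================\n"
                + "a4              0.0082\n"
                + "Intercept      -21.487\n"
                + "\n";
        System.out.println(parse(table));
    }
}
```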

I hope nobody objects that this is not a Hadoop-only question :(

Thanks,
Rajesh

On Tue, Oct 16, 2012 at 7:37 PM, Abhishek Shivkumar <
abhisheksgumadi@gmail.com> wrote:

> As far as I know, Weka cannot be run on Hadoop directly.
> If your algorithm first generates a model from training data, what you
> can do is run the training offline on your laptop and serialize the
> trained model, i.e. write it to a file. Then put this model file on HDFS
> and read it inside the setup method of your MapReduce program.
>
> As you read each input record in your mapper method, you can use the
> trained model to make a decision such as a classification, or any other
> supervised machine learning decision.
>
> I did this for SVM and it worked.
> I am interested to know whether anyone else has tried an alternative
> method to port Weka algorithms to Hadoop.
>
> Thanks!
> With Regards,
> Abhishek S

Re: WEKA logistic regression on hadoop

Posted by Abhishek Shivkumar <ab...@gmail.com>.

As far as I know, Weka cannot be run on Hadoop directly.
If your algorithm first generates a model from training data, what you can do is run the training offline on your laptop and serialize the trained model, i.e. write it to a file. Then put this model file on HDFS and read it inside the setup method of your MapReduce program.
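The "train offline, serialize, load in setup" approach described here can be sketched with standard Java serialization. Weka classifiers implement java.io.Serializable, so a trained Weka model object can be written the same way; in this sketch a hypothetical LinearModel class (an intercept plus one weight per attribute) stands in for it, and an in-memory byte array stands in for the model file on HDFS.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;

class ModelFile {
    /** Hypothetical stand-in for a trained model: intercept plus one weight per attribute. */
    static class LinearModel implements Serializable {
        private static final long serialVersionUID = 1L;
        final double intercept;
        final double[] weights;

        LinearModel(double intercept, double[] weights) {
            this.intercept = intercept;
            this.weights = weights.clone();
        }
    }

    /** Offline step: write the trained model to a stream (a file in practice). */
    static void save(LinearModel model, OutputStream out) throws IOException {
        try (ObjectOutputStream oos = new ObjectOutputStream(out)) {
            oos.writeObject(model);
        }
    }

    /** Setup step: read the model back once before processing records. */
    static LinearModel load(InputStream in) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(in)) {
            return (LinearModel) ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream(); // stands in for a file on HDFS
        save(new LinearModel(-21.487, new double[] {0.0082, 0.0151}), bytes);
        LinearModel loaded = load(new ByteArrayInputStream(bytes.toByteArray()));
        System.out.println(loaded.intercept + " " + loaded.weights.length);
    }
}
```

Java serialization ties the file to the class definition, so the job that loads the model needs the same class (and, for a Weka model, a compatible weka.jar) on its classpath.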

As you read each input record in your mapper method, you can use the trained model to make a decision such as a classification, or any other supervised machine learning decision.
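A plain-Java imitation of that mapper pattern: setup() receives the weights once per task and map() classifies one comma-separated record. The class name and the input format "id,x1,...,xd" are hypothetical, no Hadoop classes are used, and the decision rule assumes a logistic-regression model, where a log-odds of at least 0 is the same as P(class_1) of at least 0.5.

```java
/** Imitates a mapper: setup() stores the model once, map() classifies one CSV record. */
class ScoringMapperSketch {
    private double intercept;
    private double[] weights;

    /** Would run once per task, e.g. after reading the model file from HDFS. */
    void setup(double intercept, double[] weights) {
        this.intercept = intercept;
        this.weights = weights.clone();
    }

    /** Input "id,x1,...,xd"; output "id<TAB>label". */
    String map(String record) {
        String[] fields = record.split(",");
        if (fields.length != weights.length + 1) {
            throw new IllegalArgumentException("expected " + (weights.length + 1) + " fields: " + record);
        }
        double z = intercept;
        for (int j = 0; j < weights.length; j++) {
            z += weights[j] * Double.parseDouble(fields[j + 1].trim());
        }
        String label = z >= 0.0 ? "class_1" : "class_2";
        return fields[0] + "\t" + label;
    }

    public static void main(String[] args) {
        ScoringMapperSketch mapper = new ScoringMapperSketch();
        mapper.setup(-21.487, new double[] {0.0082}); // hypothetical one-attribute model
        System.out.println(mapper.map("r1,0"));
        System.out.println(mapper.map("r2,3000"));
    }
}
```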

I did this for SVM and it worked.
I am interested to know whether anyone else has tried an alternative method to port Weka algorithms to Hadoop.

Thanks!
With Regards,
Abhishek S


Re: WEKA logistic regression on hadoop

Posted by Bertrand Dechoux <de...@gmail.com>.

Weka is indeed a more complete package of data mining solutions but its aim
is not to support Hadoop whereas it is the aim of Mahout.

The implemented methods are standard data mining methods. If you are
looking for Hadoop support you should ask the Mahout mailing list but if
you have question on Weka itself you should ask the Weka mailing list. Not
all algorithms are easy to migrate to Hadoop and lots of data mining
applications are fine without a Hadoop cluster eg the netflix prize
provided a 'big' public dataset but it was only about 1 GB.

Regards

Bertrand


On Tue, Oct 16, 2012 at 3:46 PM, Rajesh Nikam <ra...@gmail.com> wrote:

> Hi,
>
> I was looking for logistic regression algorithms on hadoop.
> mahout is one good package to use on hadoop, however I am not able to get
> could results with my experiments.
>
> There are logistic regression algorithms supported with WEKA which I have
> used on Windows.
> I guess I should be able to run these algos from JAR files as is on linux.
>
> java -classpath weka.jar weka.classifiers.functions.Logistic -R 1.0E-8 -M
> 6 -t lr.arff
>
> Have anyone ported them to take advantage of hadoop ?
>
> How to interpret the output generated from it like what is Coefficients
> and Odds Ratios that could be used for classification ?
>
>
> Options: -R 1.0E-8 -M 6
>
> Logistic Regression with ridge parameter of 1.0E-8
> Coefficients...
>                  Class
> Variable       class_1
> ======================
> a1                   0
> a2                   0
> a3                   0
> a4              0.0082
> a5              0.0151
> a6             -0.1034
> a7                   0
> a8                   0
> a9                   0
> a10            -0.0397
> a11            -0.0003
> a13            -0.1195
> a14            -0.1389
> Intercept      -21.487
>
>
> Odds Ratios...
>                  Class
> Variable       class_1
> ======================
> a1                   1
> a2                   1
> a3                   1
> a4              1.0083
> a5              1.0152
> a6              0.9018
> a7                   1
> a8                   1
> a9                   1
> a10              0.961
> a11             0.9997
> a13             0.8873
> a14             0.8703
>
> Time taken to build model: 6.39 seconds
> Time taken to test model on training data: 1.86 seconds
>
> === Error on training data ===
>
> Correctly Classified Instances       49528               99.9173 %
> Incorrectly Classified Instances        41                0.0827 %
> Kappa statistic                          0.9983
> Mean absolute error                      0.0011
> Root mean squared error                  0.0244
> Relative absolute error                  0.2202 %
> Root relative squared error              4.895  %
> Total Number of Instances            49569
>
>
> === Confusion Matrix ===
>
>      a     b   <-- classified as
>  26526    37 |     a = class_1
>      4 23002 |     b = class_2
>
>
>
> === Stratified cross-validation ===
>
> Correctly Classified Instances       49492               99.8447 %
> Incorrectly Classified Instances        77                0.1553 %
> Kappa statistic                          0.9969
> Mean absolute error                      0.0015
> Root mean squared error                  0.0358
> Relative absolute error                  0.3108 %
> Root relative squared error              7.1718 %
> Total Number of Instances            49569
>
>
> === Confusion Matrix ===
>
>      a     b   <-- classified as
>  26532    31 |     a = class_1
>     46 22960 |     b = class_2
>
> Thanks in advance.
> Rajesh
>



-- 
Bertrand Dechoux
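On the quoted question about reading the Coefficients and Odds Ratios tables: Weka's Logistic fits k-1 coefficient vectors for a k-class problem and uses the last class as the reference, so the single column here describes class_1 against class_2. A short worked sketch using the numbers above (the printed values are rounded to four decimals; attributes with a zero coefficient drop out):

```latex
z = \ln\frac{P(\text{class\_1}\mid\mathbf{x})}{P(\text{class\_2}\mid\mathbf{x})}
  = -21.487 + 0.0082\,a_4 + 0.0151\,a_5 - 0.1034\,a_6 - 0.0397\,a_{10}
    - 0.0003\,a_{11} - 0.1195\,a_{13} - 0.1389\,a_{14}

P(\text{class\_1}\mid\mathbf{x}) = \frac{1}{1 + e^{-z}}, \qquad
\text{predict class\_1 when } z > 0 \ (\text{i.e. } P > 0.5)

\text{Odds Ratio}_i = e^{\beta_i}, \qquad \text{e.g. } e^{-0.1034} \approx 0.9018 \text{ for } a_6
```

Read this way, a one-unit increase in a6, with the other attributes held fixed, multiplies the odds of class_1 by about 0.90 (roughly a 10% drop), while a5's ratio of 1.0152 raises them by about 1.5% per unit. A ratio of exactly 1 (a1 to a3, a7 to a9) means the attribute has no effect in this model. The 0.5 threshold is the usual two-class default, not something stated in the thread.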

Re: WEKA logistic regression on hadoop

Posted by Abhishek Shivkumar <ab...@gmail.com>.

As far as I know, Weka cannot be run on Hadoop directly.
What you can do, if your algorithm first generates a model from training data, is run the training offline on your laptop and serialize the trained model, i.e. write it to a file. Then put this model file on HDFS and read it in the setup() method of your MapReduce program.
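The train-offline, serialize, load-once pattern described above can be sketched without any Hadoop or Weka dependency. This is a minimal sketch assuming plain Java serialization: TrainedModel and ScoringTask are hypothetical stand-ins, ScoringTask.setup() plays the role of a Hadoop Mapper's setup() and map() the role of its per-record map(), and in a real job the model file would live on HDFS rather than in a local temp file.

```java
import java.io.*;
import java.nio.file.*;

// Hypothetical stand-in for a trained classifier (a linear model here).
class TrainedModel implements Serializable {
    private static final long serialVersionUID = 1L;
    final double intercept;
    final double[] weights;

    TrainedModel(double intercept, double[] weights) {
        this.intercept = intercept;
        this.weights = weights.clone();
    }

    // Linear score; class_1 is predicted when the score is positive.
    boolean predictsClass1(double[] x) {
        double z = intercept;
        for (int i = 0; i < weights.length; i++) z += weights[i] * x[i];
        return z > 0;
    }
}

// Hypothetical mapper-like task: setup() loads the model once,
// map() reuses the loaded model for every record.
class ScoringTask {
    private TrainedModel model;

    void setup(Path modelFile) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(modelFile))) {
            model = (TrainedModel) in.readObject();
        }
    }

    String map(double[] record) {
        return model.predictsClass1(record) ? "class_1" : "class_2";
    }
}

public class OfflineModelDemo {
    // "Offline" step: after training elsewhere, write the model to a file.
    static void save(TrainedModel m, Path file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(m);
        }
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("model", ".ser");
        save(new TrainedModel(-1.0, new double[] {2.0}), file);

        ScoringTask task = new ScoringTask();
        task.setup(file);                                   // once per task
        System.out.println(task.map(new double[] {1.0}));   // z = 1.0  -> class_1
        System.out.println(task.map(new double[] {0.0}));   // z = -1.0 -> class_2
        Files.delete(file);
    }
}
```

The same idea carries over to Weka, whose classifiers are Serializable (weka.core.SerializationHelper offers write and read methods for model files). Only deserialize model files you produced yourself, since Java deserialization of untrusted data is unsafe.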

As your mapper's map() method reads each input record, it can use the trained model to make a decision such as a classification, or any other supervised machine learning prediction.
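For the logistic model in this thread, the per-record decision only needs the intercept and the coefficients. A minimal sketch, assuming every record arrives as a map from attribute name to numeric value (a hypothetical format) and using the rounded coefficients printed in the quoted output, so scores may differ slightly from Weka's own; WekaLogisticScorer and exampleRecord are illustrative names, not Weka API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class WekaLogisticScorer {
    // Intercept and non-zero coefficients copied from the quoted Logistic output
    // (class_1 against the reference class class_2), rounded to 4 decimals.
    static final double INTERCEPT = -21.487;
    static final Map<String, Double> COEF = new LinkedHashMap<>();
    static {
        COEF.put("a4", 0.0082);
        COEF.put("a5", 0.0151);
        COEF.put("a6", -0.1034);
        COEF.put("a10", -0.0397);
        COEF.put("a11", -0.0003);
        COEF.put("a13", -0.1195);
        COEF.put("a14", -0.1389);
    }

    // Log-odds z = intercept + sum(coef * value); every attribute with a
    // non-zero coefficient must be present in the record.
    static double logOdds(Map<String, Double> record) {
        double z = INTERCEPT;
        for (Map.Entry<String, Double> e : COEF.entrySet()) {
            Double v = record.get(e.getKey());
            if (v == null) {
                throw new IllegalArgumentException("missing attribute " + e.getKey());
            }
            z += e.getValue() * v;
        }
        return z;
    }

    // P(class_1 | record) via the logistic (sigmoid) function.
    static double probClass1(Map<String, Double> record) {
        return 1.0 / (1.0 + Math.exp(-logOdds(record)));
    }

    static String classify(Map<String, Double> record) {
        return probClass1(record) > 0.5 ? "class_1" : "class_2";
    }

    // Hypothetical record: all attributes zero except a5.
    static Map<String, Double> exampleRecord(double a5) {
        Map<String, Double> rec = new LinkedHashMap<>();
        for (String name : COEF.keySet()) rec.put(name, 0.0);
        rec.put("a5", a5);
        return rec;
    }

    public static void main(String[] args) {
        Map<String, Double> rec = exampleRecord(1500.0);
        System.out.printf("z = %.3f, P(class_1) = %.3f, predicted %s%n",
                logOdds(rec), probClass1(rec), classify(rec));
    }
}
```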

I did this for SVM and it worked.
I am interested to know whether anyone else has tried an alternative method to port Weka algorithms to Hadoop.

Thanks!
With Regards,
Abhishek S

On Oct 16, 2012, at 7:16 PM, Rajesh Nikam <ra...@gmail.com> wrote:

> Hi,
> 
> I was looking for logistic regression algorithms on Hadoop.
> Mahout is one good package to use on Hadoop; however, I am not able to get good results with my experiments.
> 
> There are logistic regression algorithms supported by WEKA, which I have used on Windows.
> I guess I should be able to run these algorithms from the JAR files as is on Linux.
> 
> java -classpath weka.jar weka.classifiers.functions.Logistic -R 1.0E-8 -M 6 -t lr.arff
> 
> Has anyone ported them to take advantage of Hadoop?
> 
> How should I interpret the output it generates? For example, what do the Coefficients and Odds Ratios mean, and how can they be used for classification?
> 
> 
> Options: -R 1.0E-8 -M 6 
> 
> Logistic Regression with ridge parameter of 1.0E-8
> Coefficients...
>                  Class
> Variable       class_1
> ======================
> a1                   0
> a2                   0
> a3                   0
> a4              0.0082
> a5              0.0151
> a6             -0.1034
> a7                   0
> a8                   0
> a9                   0
> a10            -0.0397
> a11            -0.0003
> a13            -0.1195
> a14            -0.1389
> Intercept      -21.487
> 
> 
> Odds Ratios...
>                  Class
> Variable       class_1
> ======================
> a1                   1
> a2                   1
> a3                   1
> a4              1.0083
> a5              1.0152
> a6              0.9018
> a7                   1
> a8                   1
> a9                   1
> a10              0.961
> a11             0.9997
> a13             0.8873
> a14             0.8703
> 
> Time taken to build model: 6.39 seconds
> Time taken to test model on training data: 1.86 seconds
> 
> === Error on training data ===
> 
> Correctly Classified Instances       49528               99.9173 %
> Incorrectly Classified Instances        41                0.0827 %
> Kappa statistic                          0.9983
> Mean absolute error                      0.0011
> Root mean squared error                  0.0244
> Relative absolute error                  0.2202 %
> Root relative squared error              4.895  %
> Total Number of Instances            49569     
> 
> 
> === Confusion Matrix ===
> 
>      a     b   <-- classified as
>  26526    37 |     a = class_1
>      4 23002 |     b = class_2
> 
> 
> 
> === Stratified cross-validation ===
> 
> Correctly Classified Instances       49492               99.8447 %
> Incorrectly Classified Instances        77                0.1553 %
> Kappa statistic                          0.9969
> Mean absolute error                      0.0015
> Root mean squared error                  0.0358
> Relative absolute error                  0.3108 %
> Root relative squared error              7.1718 %
> Total Number of Instances            49569     
> 
> 
> === Confusion Matrix ===
> 
>      a     b   <-- classified as
>  26532    31 |     a = class_1
>     46 22960 |     b = class_2
> 
> Thanks in advance.
> Rajesh
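The summary lines in the quoted evaluation output follow directly from the confusion matrices, where rows are the actual class and columns the predicted class. A small sketch that recomputes accuracy and Cohen's kappa from them; ConfusionStats is a hypothetical helper, not part of Weka.

```java
public class ConfusionStats {
    // Fraction of instances on the diagonal (correctly classified).
    static double accuracy(long[][] m) {
        long correct = 0, total = 0;
        for (int i = 0; i < m.length; i++) {
            for (int j = 0; j < m[i].length; j++) {
                total += m[i][j];
                if (i == j) correct += m[i][j];
            }
        }
        return (double) correct / total;
    }

    // Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement),
    // where chance agreement comes from the row (actual) and column (predicted) totals.
    static double kappa(long[][] m) {
        int k = m.length;
        long total = 0;
        long[] rowSum = new long[k];
        long[] colSum = new long[k];
        for (int i = 0; i < k; i++) {
            for (int j = 0; j < k; j++) {
                rowSum[i] += m[i][j];
                colSum[j] += m[i][j];
                total += m[i][j];
            }
        }
        double po = accuracy(m);
        double pe = 0;
        for (int i = 0; i < k; i++) {
            pe += (double) rowSum[i] * colSum[i] / ((double) total * total);
        }
        return (po - pe) / (1 - pe);
    }

    public static void main(String[] args) {
        long[][] training = {{26526, 37}, {4, 23002}};
        long[][] crossVal = {{26532, 31}, {46, 22960}};
        System.out.printf("training:  accuracy %.4f%%, kappa %.4f%n",
                100 * accuracy(training), kappa(training));
        System.out.printf("cross-val: accuracy %.4f%%, kappa %.4f%n",
                100 * accuracy(crossVal), kappa(crossVal));
    }
}
```

For the two matrices above this reproduces Weka's figures: 99.9173 % and 0.9983 on the training data, 99.8447 % and 0.9969 under stratified cross-validation. The cross-validated numbers are the more trustworthy estimate, since the training-data figures score the model on the same instances it was fit to.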
