Posted to hdfs-user@hadoop.apache.org by Rajesh Nikam <ra...@gmail.com> on 2012/10/12 15:06:22 UTC

Logistic regression package on Hadoop

Hi,

Could you please suggest a logistic regression package that can be used on
Hadoop? I have a large dataset and am looking for an LR package with kernel
support.

Thanks
Rajesh

Re: Logistic regression package on Hadoop

Posted by Bertrand Dechoux <de...@gmail.com>.
Hi Rajesh,

You may want to use the Mahout mailing list for Mahout-related questions.
http://mahout.apache.org/mailinglists.html

Regards

Bertrand

On Mon, Oct 15, 2012 at 2:34 PM, Rajesh Nikam <ra...@gmail.com> wrote:

> Hi Harsh,
>
> Thanks for the link to SGD in Mahout.
>
> I have asked a question about an issue with using SGD; a description is below.
> Ted Dunning mentioned there may be some issue with the data encoding.
>
> However, I am not able to pinpoint the issue. Could you please let me know
> whether the problem is with the data format or with my usage?
>
> The input files I used are attached.
>
> I am using the Iris Plants Database from Michael Marshall; please find
> iris.arff attached. I converted it to a CSV file just by updating the header:
> iris-3-classes.csv
>
> mahout org.apache.mahout.classifier.sgd.TrainLogistic --input
> /usr/local/mahout/trunk/iris-3-classes.csv --features 4 --output
> /usr/local/mahout/trunk/iris-3-classes.model --target class --categories 3
> --predictors sepallength sepalwidth petallength petalwidth --types n
>
> >> It gave the following error:
> Exception in thread "main" java.lang.IllegalArgumentException: Can only
> call classifyScalar with two categories
>
> I then created a CSV with only 2 classes; please find iris-2-classes.csv
> attached.
>
> >> Trained on iris-2-classes.csv with SGD:
>
> mahout org.apache.mahout.classifier.sgd.TrainLogistic --input
> /usr/local/mahout/trunk/iris-2-classes.csv --features 4 --output
> /usr/local/mahout/trunk/iris-2-classes.model --target class --categories 2
> --predictors sepallength sepalwidth petallength petalwidth --types n
>
>
> mahout runlogistic --input /usr/local/mahout/trunk/iris-2-classes.csv
> --model /usr/local/mahout/trunk/iris-2-classes.model --auc --confusion
>
> AUC = 0.14
> confusion: [[50.0, 50.0], [0.0, 0.0]]
> entropy: [[-0.6, -0.3], [-0.8, -0.4]]
>
> >> The AUC seems too poor. I then changed --predictors:
>
> mahout org.apache.mahout.classifier.sgd.TrainLogistic --input
> /usr/local/mahout/trunk/iris-2-classes.csv --features 4 --output
> /usr/local/mahout/trunk/iris-2-classes.model --target class --categories 2
> --predictors sepalwidth petallength --types n n
>
> mahout runlogistic --input /usr/local/mahout/trunk/iris-2-classes.csv
> --model /usr/local/mahout/trunk/iris-2-classes.model --auc --confusion
> --scores
>
> AUC = 0.80
> confusion: [[50.0, 50.0], [0.0, 0.0]]
> entropy: [[-0.7, -0.3], [-0.7, -0.4]]
>
> The AUC improved; however, from the confusion matrix it seems everything is
> classified as class a.
>
> Below is the output.
>
> "target","model-output","log-likelihood"
> 0,0.492,-0.677017
> 0,0.493,-0.679192
> 0,0.493,-0.678355
> 0,0.493,-0.678724
> 0,0.492,-0.676583
> 0,0.491,-0.675182
> 0,0.492,-0.677452
> 0,0.492,-0.677419
> 0,0.493,-0.679628
> 0,0.493,-0.678724
> 0,0.491,-0.676116
> 0,0.492,-0.677386
> 0,0.493,-0.679192
> 0,0.493,-0.679291
> 0,0.491,-0.674912
> 0,0.490,-0.673081
> 0,0.491,-0.675313
> 0,0.492,-0.677017
> 0,0.491,-0.675616
> 0,0.491,-0.675682
> 0,0.492,-0.677353
> 0,0.491,-0.676116
> 0,0.492,-0.676714
> 0,0.492,-0.677788
> 0,0.492,-0.677287
> 0,0.493,-0.679126
> 0,0.492,-0.677386
> 0,0.492,-0.676984
> 0,0.492,-0.677452
> 0,0.492,-0.678256
> 0,0.493,-0.678691
> 0,0.492,-0.677419
> 0,0.491,-0.674381
> 0,0.490,-0.673980
> 0,0.493,-0.678724
> 0,0.493,-0.678387
> 0,0.492,-0.677050
> 0,0.493,-0.678724
> 0,0.493,-0.679225
> 0,0.492,-0.677419
> 0,0.492,-0.677050
> 0,0.495,-0.682279
> 0,0.493,-0.678355
> 0,0.492,-0.676951
> 0,0.491,-0.675550
> 0,0.493,-0.679192
> 0,0.491,-0.675649
> 0,0.493,-0.678322
> 0,0.491,-0.676116
> 0,0.492,-0.677887
> 1,0.492,-0.709316
> 1,0.492,-0.709248
> 1,0.492,-0.708935
> 1,0.494,-0.705048
> 1,0.493,-0.707488
> 1,0.493,-0.707454
> 1,0.492,-0.709765
> 1,0.494,-0.705258
> 1,0.493,-0.707936
> 1,0.493,-0.706803
> 1,0.495,-0.703539
> 1,0.493,-0.708249
> 1,0.494,-0.704601
> 1,0.493,-0.707970
> 1,0.493,-0.707597
> 1,0.492,-0.708765
> 1,0.492,-0.708351
> 1,0.493,-0.706871
> 1,0.494,-0.704770
> 1,0.494,-0.705908
> 1,0.492,-0.709350
> 1,0.493,-0.707285
> 1,0.493,-0.706247
> 1,0.493,-0.707522
> 1,0.493,-0.707835
> 1,0.492,-0.708317
> 1,0.493,-0.707556
> 1,0.492,-0.708520
> 1,0.493,-0.707902
> 1,0.494,-0.706220
> 1,0.494,-0.705427
> 1,0.494,-0.705393
> 1,0.493,-0.706803
> 1,0.493,-0.707210
> 1,0.492,-0.708351
> 1,0.492,-0.710146
> 1,0.492,-0.708867
> 1,0.494,-0.705183
> 1,0.493,-0.708215
> 1,0.494,-0.705942
> 1,0.493,-0.706525
> 1,0.492,-0.708385
> 1,0.493,-0.706389
> 1,0.494,-0.704811
> 1,0.493,-0.706905
> 1,0.493,-0.708249
> 1,0.493,-0.707801
> 1,0.493,-0.707835
> 1,0.494,-0.705604
> 1,0.493,-0.707319
>
> AUC = 0.80
> confusion: [[50.0, 50.0], [0.0, 0.0]]
> entropy: [[-0.7, -0.3], [-0.7, -0.4]]
>
>
> On Fri, Oct 12, 2012 at 10:51 PM, Ted Dunning <td...@maprtech.com> wrote:
>
>> Harsh,
>>
>> Thanks for the plug.  Rajesh has been talking to us.
>>
>>
>> On Fri, Oct 12, 2012 at 8:36 AM, Harsh J <ha...@cloudera.com> wrote:
>>
>>> Hi Rajesh,
>>>
>>> Please head over to the Apache Mahout project. See
>>> https://cwiki.apache.org/MAHOUT/logistic-regression.html
>>>
>>> Apache Mahout is homed at http://mahout.apache.org and works well with
>>> Hadoop MR, etc.
>>>
>>> On Fri, Oct 12, 2012 at 6:36 PM, Rajesh Nikam <ra...@gmail.com>
>>> wrote:
>>> > Hi,
>>> >
>>> > Could you please suggest a logistic regression package that can be used on
>>> > Hadoop? I have a large dataset and am looking for an LR package with kernel
>>> > support.
>>> >
>>> > Thanks
>>> > Rajesh
>>> >
>>> >
>>>
>>>
>>>
>>> --
>>> Harsh J
>>>
>>
>>
>


-- 
Bertrand Dechoux

Re: Logistic regression package on Hadoop

Posted by Rajesh Nikam <ra...@gmail.com>.
Hi Harsh,

Thanks for the link to SGD in Mahout.

I have asked a question about an issue with using SGD; a description is below.
Ted Dunning mentioned there may be some issue with the data encoding.

However, I am not able to pinpoint the issue. Could you please let me know
whether the problem is with the data format or with my usage?

The input files I used are attached.

I am using the Iris Plants Database from Michael Marshall; please find
iris.arff attached. I converted it to a CSV file just by updating the header:
iris-3-classes.csv
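
A header-only conversion of this kind can be sketched in Python as below. This
is a minimal sketch, assuming a simple ARFF file with unquoted attribute names
and dense comma-separated data rows; the function name arff_to_csv and the two
sample rows are illustrative and are not taken from the attached files.

```python
def arff_to_csv(arff_text):
    """Turn simple ARFF text into CSV text by replacing the header."""
    names = []
    rows = []
    in_data = False
    for raw in arff_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and ARFF comments
        lower = line.lower()
        if in_data:
            rows.append(line)  # data rows are already comma-separated
        elif lower.startswith("@attribute"):
            names.append(line.split()[1])  # second token is the attribute name
        elif lower.startswith("@data"):
            in_data = True
        # @relation and any other header line is dropped
    return "\n".join([",".join(names)] + rows) + "\n"


# Illustrative input only (hypothetical stand-in for iris.arff).
sample = """@RELATION iris
@ATTRIBUTE sepallength REAL
@ATTRIBUTE sepalwidth REAL
@ATTRIBUTE petallength REAL
@ATTRIBUTE petalwidth REAL
@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}
@DATA
5.1,3.5,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
"""

print(arff_to_csv(sample), end="")
```

The first output line becomes the CSV header, so its column names are what
--target and --predictors would have to match.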

mahout org.apache.mahout.classifier.sgd.TrainLogistic --input
/usr/local/mahout/trunk/iris-3-classes.csv --features 4 --output
/usr/local/mahout/trunk/iris-3-classes.model --target class --categories 3
--predictors sepallength sepalwidth petallength petalwidth --types n

>> It gave the following error:
Exception in thread "main" java.lang.IllegalArgumentException: Can only
call classifyScalar with two categories

I then created a CSV with only 2 classes; please find iris-2-classes.csv
attached.

>> Trained on iris-2-classes.csv with SGD:

mahout org.apache.mahout.classifier.sgd.TrainLogistic --input
/usr/local/mahout/trunk/iris-2-classes.csv --features 4 --output
/usr/local/mahout/trunk/iris-2-classes.model --target class --categories 2
--predictors sepallength sepalwidth petallength petalwidth --types n


mahout runlogistic --input /usr/local/mahout/trunk/iris-2-classes.csv
--model /usr/local/mahout/trunk/iris-2-classes.model --auc --confusion

AUC = 0.14
confusion: [[50.0, 50.0], [0.0, 0.0]]
entropy: [[-0.6, -0.3], [-0.8, -0.4]]

>> The AUC seems too poor. I then changed --predictors:

mahout org.apache.mahout.classifier.sgd.TrainLogistic --input
/usr/local/mahout/trunk/iris-2-classes.csv --features 4 --output
/usr/local/mahout/trunk/iris-2-classes.model --target class --categories 2
--predictors sepalwidth petallength --types n n

mahout runlogistic --input /usr/local/mahout/trunk/iris-2-classes.csv
--model /usr/local/mahout/trunk/iris-2-classes.model --auc --confusion
--scores

AUC = 0.80
confusion: [[50.0, 50.0], [0.0, 0.0]]
entropy: [[-0.7, -0.3], [-0.7, -0.4]]

The AUC improved; however, from the confusion matrix it seems everything is
classified as class a.

Below is the output.

"target","model-output","log-likelihood"
0,0.492,-0.677017
0,0.493,-0.679192
0,0.493,-0.678355
0,0.493,-0.678724
0,0.492,-0.676583
0,0.491,-0.675182
0,0.492,-0.677452
0,0.492,-0.677419
0,0.493,-0.679628
0,0.493,-0.678724
0,0.491,-0.676116
0,0.492,-0.677386
0,0.493,-0.679192
0,0.493,-0.679291
0,0.491,-0.674912
0,0.490,-0.673081
0,0.491,-0.675313
0,0.492,-0.677017
0,0.491,-0.675616
0,0.491,-0.675682
0,0.492,-0.677353
0,0.491,-0.676116
0,0.492,-0.676714
0,0.492,-0.677788
0,0.492,-0.677287
0,0.493,-0.679126
0,0.492,-0.677386
0,0.492,-0.676984
0,0.492,-0.677452
0,0.492,-0.678256
0,0.493,-0.678691
0,0.492,-0.677419
0,0.491,-0.674381
0,0.490,-0.673980
0,0.493,-0.678724
0,0.493,-0.678387
0,0.492,-0.677050
0,0.493,-0.678724
0,0.493,-0.679225
0,0.492,-0.677419
0,0.492,-0.677050
0,0.495,-0.682279
0,0.493,-0.678355
0,0.492,-0.676951
0,0.491,-0.675550
0,0.493,-0.679192
0,0.491,-0.675649
0,0.493,-0.678322
0,0.491,-0.676116
0,0.492,-0.677887
1,0.492,-0.709316
1,0.492,-0.709248
1,0.492,-0.708935
1,0.494,-0.705048
1,0.493,-0.707488
1,0.493,-0.707454
1,0.492,-0.709765
1,0.494,-0.705258
1,0.493,-0.707936
1,0.493,-0.706803
1,0.495,-0.703539
1,0.493,-0.708249
1,0.494,-0.704601
1,0.493,-0.707970
1,0.493,-0.707597
1,0.492,-0.708765
1,0.492,-0.708351
1,0.493,-0.706871
1,0.494,-0.704770
1,0.494,-0.705908
1,0.492,-0.709350
1,0.493,-0.707285
1,0.493,-0.706247
1,0.493,-0.707522
1,0.493,-0.707835
1,0.492,-0.708317
1,0.493,-0.707556
1,0.492,-0.708520
1,0.493,-0.707902
1,0.494,-0.706220
1,0.494,-0.705427
1,0.494,-0.705393
1,0.493,-0.706803
1,0.493,-0.707210
1,0.492,-0.708351
1,0.492,-0.710146
1,0.492,-0.708867
1,0.494,-0.705183
1,0.493,-0.708215
1,0.494,-0.705942
1,0.493,-0.706525
1,0.492,-0.708385
1,0.493,-0.706389
1,0.494,-0.704811
1,0.493,-0.706905
1,0.493,-0.708249
1,0.493,-0.707801
1,0.493,-0.707835
1,0.494,-0.705604
1,0.493,-0.707319

AUC = 0.80
confusion: [[50.0, 50.0], [0.0, 0.0]]
entropy: [[-0.7, -0.3], [-0.7, -0.4]]
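
Every model-output in the listing lies between 0.490 and 0.495, so any cutoff
at 0.5 puts all rows in one class, while AUC depends only on how the scores
rank. The Python sketch below illustrates that on a few rows from the listing.
It assumes a 0.5 cutoff and a matrix laid out as [predicted][actual]; both are
assumptions about how runlogistic reports its results, not something confirmed
in this thread, and the helper names are illustrative.

```python
def confusion_at_threshold(pairs, threshold=0.5):
    """Return a 2x2 matrix m[predicted][actual] for (target, score) pairs."""
    m = [[0, 0], [0, 0]]
    for target, score in pairs:
        predicted = 1 if score >= threshold else 0  # assumed 0.5 cutoff
        m[predicted][target] += 1
    return m


def auc(pairs):
    """Chance that a random positive outscores a random negative (ties = 0.5)."""
    pos = [s for t, s in pairs if t == 1]
    neg = [s for t, s in pairs if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# A few (target, model-output) rows copied from the listing above.
rows = [(0, 0.492), (0, 0.493), (0, 0.491), (0, 0.490),
        (1, 0.492), (1, 0.494), (1, 0.493), (1, 0.495)]

print(confusion_at_threshold(rows))  # [[4, 4], [0, 0]]: all predicted as 0
print(auc(rows))                     # 0.875 on this small sample
```

On this sample the ranking is mostly right even though no score crosses the
cutoff, which is the same pattern as an AUC of 0.80 next to a one-sided
confusion matrix.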


On Fri, Oct 12, 2012 at 10:51 PM, Ted Dunning <td...@maprtech.com> wrote:

> Harsh,
>
> Thanks for the plug.  Rajesh has been talking to us.
>
>
> On Fri, Oct 12, 2012 at 8:36 AM, Harsh J <ha...@cloudera.com> wrote:
>
>> Hi Rajesh,
>>
>> Please head over to the Apache Mahout project. See
>> https://cwiki.apache.org/MAHOUT/logistic-regression.html
>>
>> Apache Mahout is homed at http://mahout.apache.org and works well with
>> Hadoop MR, etc.
>>
>> On Fri, Oct 12, 2012 at 6:36 PM, Rajesh Nikam <ra...@gmail.com>
>> wrote:
>> > Hi,
>> >
>> > Could you please suggest a logistic regression package that can be used on
>> > Hadoop? I have a large dataset and am looking for an LR package with kernel
>> > support.
>> >
>> > Thanks
>> > Rajesh
>> >
>> >
>>
>>
>>
>> --
>> Harsh J
>>
>
>

Re: Logistic regression package on Hadoop

Posted by Rajesh Nikam <ra...@gmail.com>.
Hi Harsh,

Thanks for giving link for sgd from mahout.

I have asked question on issue with using sgd. Below is description of it.
Ted Dunning has mentioned their may be some issue with data encoding.

However I am not able to point issue. Could you please let me know what is
issue its format or usage.

Attached uses input files

I am using Iris Plants Database from Michael Marshall. PFA iris.arff.
Converted this to csv file just by updating header: iris-3-classes.csv

mahout org.apache.mahout.classifier.
sgd.TrainLogistic --input
/usr/local/mahout/trunk/*iris-3-classes.csv*--features 4 --output
/usr/local/mahout/trunk/
*iris-3-classes.model* --target class *--categories 3* --predictors
sepallength sepalwidth petallength petalwidth --types n

>> it gave following error.
Exception in thread "main" java.lang.IllegalArgumentException: Can only
call classifyScalar with two categories

Now created csv with only 2 classes. PFA iris-2-classes.csv

>> trained iris-2-classes.csv with sgd

mahout org.apache.mahout.classifier.sgd.TrainLogistic --input
/usr/local/mahout/trunk/*iris-2-classes.csv* --features 4 --output
/usr/local/mahout/trunk/*iris-2-classes.mode*l --target class *--categories
2* --predictors sepallength sepalwidth petallength petalwidth --types n


mahout runlogistic --input /usr/local/mahout/trunk/iris-2-classes.csv
--model /usr/local/mahout/trunk/iris-2-classes.model --auc --confusion

AUC = 0.14
confusion: [[50.0, 50.0], [0.0, 0.0]]
entropy: [[-0.6, -0.3], [-0.8, -0.4]]

>> AUC seems to poor. Now changed --predictors

mahout org.apache.mahout.classifier.sgd.TrainLogistic --input
/usr/local/mahout/trunk/*iris-2-classes.csv* --features 4 --output
/usr/local/mahout/trunk/*iris-2-classes.mode*l --target class *--categories
2* --predictors sepalwidth petallength --types n n

mahout runlogistic --input /usr/local/mahout/trunk/iris-2-classes.csv
--model /usr/local/mahout/trunk/iris-2-classes.model --auc --confusion
--scores

AUC = 0.80
confusion: [[50.0, 50.0], [0.0, 0.0]]
entropy: [[-0.7, -0.3], [-0.7, -0.4]]

AUC is improved, however from confusion matrix seems everything is
classified as class a.

Below is the output.

"target","model-output","log-likelihood"
0,0.492,-0.677017
0,0.493,-0.679192
0,0.493,-0.678355
0,0.493,-0.678724
0,0.492,-0.676583
0,0.491,-0.675182
0,0.492,-0.677452
0,0.492,-0.677419
0,0.493,-0.679628
0,0.493,-0.678724
0,0.491,-0.676116
0,0.492,-0.677386
0,0.493,-0.679192
0,0.493,-0.679291
0,0.491,-0.674912
0,0.490,-0.673081
0,0.491,-0.675313
0,0.492,-0.677017
0,0.491,-0.675616
0,0.491,-0.675682
0,0.492,-0.677353
0,0.491,-0.676116
0,0.492,-0.676714
0,0.492,-0.677788
0,0.492,-0.677287
0,0.493,-0.679126
0,0.492,-0.677386
0,0.492,-0.676984
0,0.492,-0.677452
0,0.492,-0.678256
0,0.493,-0.678691
0,0.492,-0.677419
0,0.491,-0.674381
0,0.490,-0.673980
0,0.493,-0.678724
0,0.493,-0.678387
0,0.492,-0.677050
0,0.493,-0.678724
0,0.493,-0.679225
0,0.492,-0.677419
0,0.492,-0.677050
0,0.495,-0.682279
0,0.493,-0.678355
0,0.492,-0.676951
0,0.491,-0.675550
0,0.493,-0.679192
0,0.491,-0.675649
0,0.493,-0.678322
0,0.491,-0.676116
0,0.492,-0.677887
1,0.492,-0.709316
1,0.492,-0.709248
1,0.492,-0.708935
1,0.494,-0.705048
1,0.493,-0.707488
1,0.493,-0.707454
1,0.492,-0.709765
1,0.494,-0.705258
1,0.493,-0.707936
1,0.493,-0.706803
1,0.495,-0.703539
1,0.493,-0.708249
1,0.494,-0.704601
1,0.493,-0.707970
1,0.493,-0.707597
1,0.492,-0.708765
1,0.492,-0.708351
1,0.493,-0.706871
1,0.494,-0.704770
1,0.494,-0.705908
1,0.492,-0.709350
1,0.493,-0.707285
1,0.493,-0.706247
1,0.493,-0.707522
1,0.493,-0.707835
1,0.492,-0.708317
1,0.493,-0.707556
1,0.492,-0.708520
1,0.493,-0.707902
1,0.494,-0.706220
1,0.494,-0.705427
1,0.494,-0.705393
1,0.493,-0.706803
1,0.493,-0.707210
1,0.492,-0.708351
1,0.492,-0.710146
1,0.492,-0.708867
1,0.494,-0.705183
1,0.493,-0.708215
1,0.494,-0.705942
1,0.493,-0.706525
1,0.492,-0.708385
1,0.493,-0.706389
1,0.494,-0.704811
1,0.493,-0.706905
1,0.493,-0.708249
1,0.493,-0.707801
1,0.493,-0.707835
1,0.494,-0.705604
1,0.493,-0.707319

AUC = 0.80
confusion: [[50.0, 50.0], [0.0, 0.0]]
entropy: [[-0.7, -0.3], [-0.7, -0.4]]


On Fri, Oct 12, 2012 at 10:51 PM, Ted Dunning <td...@maprtech.com> wrote:

> Harsh,
>
> THanks for the plug.  Rajesh has been talking to us.
>
>
> On Fri, Oct 12, 2012 at 8:36 AM, Harsh J <ha...@cloudera.com> wrote:
>
>> Hi Rajesh,
>>
>> Please head over to the Apache Mahout project. See
>> https://cwiki.apache.org/MAHOUT/logistic-regression.html
>>
>> Apache Mahout is homed at http://mahout.apache.org and works well with
>> Hadoop MR, etc.
>>
>> On Fri, Oct 12, 2012 at 6:36 PM, Rajesh Nikam <ra...@gmail.com> wrote:
>> > Hi,
>> >
>> > Could you please suggest a logistic regression package that could be
>> > used on Hadoop?
>> > I have large data and am looking for an LR package with kernel support.
>> >
>> > Thanks
>> > Rajesh
>> >
>> >
>>
>>
>>
>> --
>> Harsh J
>>
>
>

Re: Logistic regression package on Hadoop

Posted by Rajesh Nikam <ra...@gmail.com>.
Hi Harsh,

Thanks for the link to SGD in Mahout.

I have asked a question about an issue I hit when using SGD; the description
is below. Ted Dunning mentioned that there may be some issue with the data
encoding.

However, I am not able to pinpoint the issue. Could you please let me know
whether the problem is with the data format or with my usage?

The input files I used are attached.

I am using the Iris Plants Database from Michael Marshall. Please find
attached iris.arff. I converted it to a CSV file just by updating the header:
iris-3-classes.csv

mahout org.apache.mahout.classifier.sgd.TrainLogistic \
  --input /usr/local/mahout/trunk/iris-3-classes.csv \
  --features 4 \
  --output /usr/local/mahout/trunk/iris-3-classes.model \
  --target class --categories 3 \
  --predictors sepallength sepalwidth petallength petalwidth --types n

>> It gave the following error:
Exception in thread "main" java.lang.IllegalArgumentException: Can only call classifyScalar with two categories
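
For anyone reading the error above: the message itself says that the scalar
form of classification is only defined for two categories. The Python sketch
below illustrates the general idea only and is not Mahout's code. The function
names, the weights and the sample row are all made up for illustration. With
two categories a logistic model can report one probability; with three it
needs one score per category.

```python
import math

def classify_scalar(weights, x):
    """Binary case: a single probability p(class 1 | x) from one weight vector."""
    z = sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

def classify_full(weight_rows, x):
    """Multi-class case: one probability per category, via a softmax."""
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in weight_rows]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical four-feature row and hypothetical weights.
x = [5.1, 3.5, 1.4, 0.2]
p = classify_scalar([0.1, -0.2, 0.3, 0.0], x)
probs = classify_full(
    [[0.1, -0.2, 0.3, 0.0], [0.0, 0.1, -0.1, 0.2], [-0.1, 0.0, 0.2, 0.1]], x
)
print(p)      # one number: only meaningful with two categories
print(probs)  # three numbers that sum to 1
```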

I then created a CSV with only 2 classes. Please find attached
iris-2-classes.csv.

>> Trained on iris-2-classes.csv with SGD:

mahout org.apache.mahout.classifier.sgd.TrainLogistic \
  --input /usr/local/mahout/trunk/iris-2-classes.csv \
  --features 4 \
  --output /usr/local/mahout/trunk/iris-2-classes.model \
  --target class --categories 2 \
  --predictors sepallength sepalwidth petallength petalwidth --types n


mahout runlogistic \
  --input /usr/local/mahout/trunk/iris-2-classes.csv \
  --model /usr/local/mahout/trunk/iris-2-classes.model \
  --auc --confusion

AUC = 0.14
confusion: [[50.0, 50.0], [0.0, 0.0]]
entropy: [[-0.6, -0.3], [-0.8, -0.4]]

>> The AUC seems too poor. I then changed --predictors:

mahout org.apache.mahout.classifier.sgd.TrainLogistic \
  --input /usr/local/mahout/trunk/iris-2-classes.csv \
  --features 4 \
  --output /usr/local/mahout/trunk/iris-2-classes.model \
  --target class --categories 2 \
  --predictors sepalwidth petallength --types n n

mahout runlogistic \
  --input /usr/local/mahout/trunk/iris-2-classes.csv \
  --model /usr/local/mahout/trunk/iris-2-classes.model \
  --auc --confusion --scores

AUC = 0.80
confusion: [[50.0, 50.0], [0.0, 0.0]]
entropy: [[-0.7, -0.3], [-0.7, -0.4]]

The AUC has improved; however, from the confusion matrix it seems that
everything is classified as a single class.
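
One way to see how an AUC of 0.80 and a one-sided confusion matrix can
coexist: AUC depends only on how the scores rank the two classes, while a
confusion matrix depends on a cut-off. The Python sketch below is a generic
illustration, not Mahout's implementation. The 0.5 cut-off and the matrix
orientation (rows = predicted, columns = actual) are assumptions, and the six
scores are a small sample copied from the --scores listing in this message.

```python
def confusion(targets, scores, cutoff=0.5):
    """2x2 matrix with rows = predicted class, columns = actual class."""
    m = [[0, 0], [0, 0]]
    for t, s in zip(targets, scores):
        predicted = 1 if s >= cutoff else 0
        m[predicted][t] += 1
    return m

def auc(targets, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(targets, scores) if t == 1]
    neg = [s for t, s in zip(targets, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

targets = [0, 0, 0, 1, 1, 1]
scores = [0.492, 0.493, 0.491, 0.492, 0.494, 0.493]

print(confusion(targets, scores))      # every row lands in predicted class 0
print(round(auc(targets, scores), 3))  # ranking is still better than chance
```

Every score in the listing lies just below 0.5, so any rule of this shape
would put all rows in one class even though the ordering of the scores still
carries some signal.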

Below is the output.

"target","model-output","log-likelihood"
0,0.492,-0.677017
0,0.493,-0.679192
0,0.493,-0.678355
0,0.493,-0.678724
0,0.492,-0.676583
0,0.491,-0.675182
0,0.492,-0.677452
0,0.492,-0.677419
0,0.493,-0.679628
0,0.493,-0.678724
0,0.491,-0.676116
0,0.492,-0.677386
0,0.493,-0.679192
0,0.493,-0.679291
0,0.491,-0.674912
0,0.490,-0.673081
0,0.491,-0.675313
0,0.492,-0.677017
0,0.491,-0.675616
0,0.491,-0.675682
0,0.492,-0.677353
0,0.491,-0.676116
0,0.492,-0.676714
0,0.492,-0.677788
0,0.492,-0.677287
0,0.493,-0.679126
0,0.492,-0.677386
0,0.492,-0.676984
0,0.492,-0.677452
0,0.492,-0.678256
0,0.493,-0.678691
0,0.492,-0.677419
0,0.491,-0.674381
0,0.490,-0.673980
0,0.493,-0.678724
0,0.493,-0.678387
0,0.492,-0.677050
0,0.493,-0.678724
0,0.493,-0.679225
0,0.492,-0.677419
0,0.492,-0.677050
0,0.495,-0.682279
0,0.493,-0.678355
0,0.492,-0.676951
0,0.491,-0.675550
0,0.493,-0.679192
0,0.491,-0.675649
0,0.493,-0.678322
0,0.491,-0.676116
0,0.492,-0.677887
1,0.492,-0.709316
1,0.492,-0.709248
1,0.492,-0.708935
1,0.494,-0.705048
1,0.493,-0.707488
1,0.493,-0.707454
1,0.492,-0.709765
1,0.494,-0.705258
1,0.493,-0.707936
1,0.493,-0.706803
1,0.495,-0.703539
1,0.493,-0.708249
1,0.494,-0.704601
1,0.493,-0.707970
1,0.493,-0.707597
1,0.492,-0.708765
1,0.492,-0.708351
1,0.493,-0.706871
1,0.494,-0.704770
1,0.494,-0.705908
1,0.492,-0.709350
1,0.493,-0.707285
1,0.493,-0.706247
1,0.493,-0.707522
1,0.493,-0.707835
1,0.492,-0.708317
1,0.493,-0.707556
1,0.492,-0.708520
1,0.493,-0.707902
1,0.494,-0.706220
1,0.494,-0.705427
1,0.494,-0.705393
1,0.493,-0.706803
1,0.493,-0.707210
1,0.492,-0.708351
1,0.492,-0.710146
1,0.492,-0.708867
1,0.494,-0.705183
1,0.493,-0.708215
1,0.494,-0.705942
1,0.493,-0.706525
1,0.492,-0.708385
1,0.493,-0.706389
1,0.494,-0.704811
1,0.493,-0.706905
1,0.493,-0.708249
1,0.493,-0.707801
1,0.493,-0.707835
1,0.494,-0.705604
1,0.493,-0.707319

AUC = 0.80
confusion: [[50.0, 50.0], [0.0, 0.0]]
entropy: [[-0.7, -0.3], [-0.7, -0.4]]
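
The third column of the listing looks consistent with the usual Bernoulli
log-likelihood: log(p) when the target is 1 and log(1 - p) when the target is
0, where p is the model output. This is an inference from the printed numbers,
not something confirmed from Mahout's source. The printed p is rounded to
three decimals, so the agreement can only be approximate. A minimal Python
check on four rows copied from the listing:

```python
import math

def log_likelihood(target, p):
    """Bernoulli log-likelihood of a 0/1 target given predicted p(target = 1)."""
    return math.log(p) if target == 1 else math.log(1.0 - p)

# (target, model-output, reported log-likelihood) copied from the listing.
rows = [
    (0, 0.492, -0.677017),
    (0, 0.493, -0.679192),
    (1, 0.492, -0.709316),
    (1, 0.494, -0.705048),
]

for target, p, reported in rows:
    print(target, p, round(log_likelihood(target, p), 4), reported)
```

Since p is close to 0.5 for every row, each log-likelihood is close to
log(0.5), roughly -0.693, which is what a model that has learned very little
would produce.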


On Fri, Oct 12, 2012 at 10:51 PM, Ted Dunning <td...@maprtech.com> wrote:

> Harsh,
>
> Thanks for the plug. Rajesh has been talking to us.
>
>
> On Fri, Oct 12, 2012 at 8:36 AM, Harsh J <ha...@cloudera.com> wrote:
>
>> Hi Rajesh,
>>
>> Please head over to the Apache Mahout project. See
>> https://cwiki.apache.org/MAHOUT/logistic-regression.html
>>
>> Apache Mahout is homed at http://mahout.apache.org and works well with
>> Hadoop MR, etc.
>>
>> On Fri, Oct 12, 2012 at 6:36 PM, Rajesh Nikam <ra...@gmail.com> wrote:
>> > Hi,
>> >
>> > Could you please suggest a logistic regression package that could be
>> > used on Hadoop?
>> > I have large data and am looking for an LR package with kernel support.
>> >
>> > Thanks
>> > Rajesh
>> >
>> >
>>
>>
>>
>> --
>> Harsh J
>>
>
>

Re: Logistic regression package on Hadoop

Posted by Ted Dunning <td...@maprtech.com>.
Harsh,

Thanks for the plug. Rajesh has been talking to us.

On Fri, Oct 12, 2012 at 8:36 AM, Harsh J <ha...@cloudera.com> wrote:

> Hi Rajesh,
>
> Please head over to the Apache Mahout project. See
> https://cwiki.apache.org/MAHOUT/logistic-regression.html
>
> Apache Mahout is homed at http://mahout.apache.org and works well with
> Hadoop MR, etc.
>
> On Fri, Oct 12, 2012 at 6:36 PM, Rajesh Nikam <ra...@gmail.com> wrote:
> > Hi,
> >
> > Could you please suggest a logistic regression package that could be
> > used on Hadoop?
> > I have large data and am looking for an LR package with kernel support.
> >
> > Thanks
> > Rajesh
> >
> >
>
>
>
> --
> Harsh J
>

Re: Logistic regression package on Hadoop

Posted by Harsh J <ha...@cloudera.com>.
Hi Rajesh,

Please head over to the Apache Mahout project. See
https://cwiki.apache.org/MAHOUT/logistic-regression.html

Apache Mahout is homed at http://mahout.apache.org and works well with
Hadoop MR, etc.

On Fri, Oct 12, 2012 at 6:36 PM, Rajesh Nikam <ra...@gmail.com> wrote:
> Hi,
>
> Could you please suggest a logistic regression package that could be
> used on Hadoop?
> I have large data and am looking for an LR package with kernel support.
>
> Thanks
> Rajesh
>
>



-- 
Harsh J
