Posted to user@mahout.apache.org by Xiaobo Gu <gu...@gmail.com> on 2011/05/02 17:58:39 UTC

Re: Is any more detailed documentation aout the sgd logistic regression example.

In our environment, data is prepared inside the relational data
warehouse and then exported as CSV files, which is why the trainlogistic
command line works well for us. However, we will have both numeric and
categorical predictor variables. Does SGD support categorical variables,
and are there examples of this? I ask because I think the results below
do not apply to categorical variables:

color ~ -0.157*Intercept Term + -0.678*x + -0.416*y
Intercept Term -0.15655
x -0.67841
y -0.41587
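
For reference, output of this shape is normally read as a linear
predictor (log-odds) that is passed through the logistic function. The
following is a minimal sketch under that assumption. The class and
method names are hypothetical and not part of Mahout, and which value
of "color" the probability refers to depends on how the target was
coded, which the output above does not show.

```java
/** Hypothetical helper, not part of Mahout: scores a point with the coefficients printed above. */
final class LogisticScoreSketch {
    // Coefficients copied from the trainlogistic output above.
    static final double INTERCEPT = -0.15655;
    static final double BETA_X = -0.67841;
    static final double BETA_Y = -0.41587;

    /** Linear predictor (log-odds) for a point (x, y). */
    static double logOdds(double x, double y) {
        return INTERCEPT + BETA_X * x + BETA_Y * y;
    }

    /** The logistic function maps the log-odds to a probability in (0, 1). */
    static double probability(double x, double y) {
        return 1.0 / (1.0 + Math.exp(-logOdds(x, y)));
    }

    public static void main(String[] args) {
        System.out.printf("log-odds at (1, 1) = %.5f%n", logOdds(1, 1));
        System.out.printf("probability at (1, 1) = %.5f%n", probability(1, 1));
    }
}
```

Both coefficients are negative, so increasing x or y lowers the
predicted probability. A categorical predictor would not fit this
single-coefficient form; it would need one coefficient per category
value.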

On Fri, Apr 22, 2011 at 6:16 AM, Ted Dunning <te...@gmail.com> wrote:
> The trainlogistic command is (as Stanley says) only a simple example.
>
> You will need to write a program something like TrainNewsGroups for your
> modelers to use.
>
> I agree that the API oriented code in Mahout is not what those users need.
>  It was, however, what my users needed.
>
> It would be great if you would like to contribute a good command line for
> the more advanced SGD classifier training
> API.
>
> On Tue, Apr 19, 2011 at 10:51 PM, Stanley Xu <we...@gmail.com> wrote:
>
>> Hi Xiaobo,
>>
>> You could check the chapter 13-16 from <Mahout In Action>, it provided all
>> the parameters the command line tool of 'mahout trainlogistic' could use.
>> But the trainlogistic command is still only a simple example. If you wanted
>> to use that in a production environment, you still have to write the feature
>> encode code by yourself. The code you need to write is pretty easy, just
>> parse the input and put that in a Vector and let the LR train the data.
>>
>> Best wishes,
>> Stanley Xu
>>
>>
>>
>>
>> On Tue, Apr 19, 2011 at 9:09 PM, XiaoboGu <gu...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> Thanks for your reply, after some reading of the wiki pages, I think what
>>> I want is a Logistic Regression command-line, since the target users of
>>> Mahout are data analysts, who can't write Java code, a command line is more
>>> convenient. Some specific questions are :
>>> 1. What format should we apply when preparing data for logistic
>>> regression, can we use csv, and should we put the value for the target
>>> variable as the first column in every row the csv file.
>>> 2. What options can we support to the command line if there is one.
>>> 3. How can interpret the results.
>>>
>>> Because Logistic Regression is the working horse of credit scoring in
>>> industry, I think it will make Mahout friends of more analysts if LR support
>>> is smooth.
>>>
>>> Regards,
>>>
>>> Xiaobo Gu
>>>
>>> From: Ted Dunning [mailto:ted.dunning@gmail.com]
>>> Sent: Wednesday, April 13, 2011 1:02 AM
>>> To: user@mahout.apache.org
>>> Cc: Xiaobo Gu
>>> Subject: Re: Is any more detailed documentation aout the sgd logistic
>>> regression example.
>>>
>>> Can you be more specific about what you have and what you want?
>>>
>>> The book Mahout in Action provides quite a lot of details with sample code
>>> for a server farm.
>>>
>>> The TrainNewsGroups example provides code that you can copy.
>>>
>>> Do you have these resources?  Do you want more?  Did you want more theory?
>>>
>>> On Tue, Apr 12, 2011 at 9:11 AM, Xiaobo Gu <gu...@gmail.com>
>>> wrote:
>>> Hi,
>>> Documents about sgd logistic regression itself are welcome too.
>>> Regards,
>>>
>>> Xiaobo Gu
>>>
>>>
>>>
>>
>

Re: Is any more detailed documentation aout the sgd logistic regression example.

Posted by Ted Dunning <te...@gmail.com>.

For small problems, you can even retain the training data in memory for
maximum speed.


On Fri, Aug 5, 2011 at 9:59 PM, Xiaobo Gu <gu...@gmail.com> wrote:

> Hi Stanley,
> Can you help with this:
>
>  You might encode the
>  feature to vector and serialize them to the file system by MapReduce to
>  reduce cost on data parsing.
>
> And I have started a new thread on
>
>
> http://mail-archives.apache.org/mod_mbox/mahout-dev/201108.mbox/%3cCACOCgckzcAm4V8y3CQhnBWtUy9jVgAbKzE1R+z6zpQAF=8XLEg@mail.gmail.com%3e
>
> > Best wishes,
> > Stanley Xu
>

Re: Is any more detailed documentation aout the sgd logistic regression example.

Posted by Xiaobo Gu <gu...@gmail.com>.

Hi Stanley,
Can you help with this:

 You might encode the
 feature to vector and serialize them to the file system by MapReduce to
 reduce cost on data parsing.

And I have started a new thread on

http://mail-archives.apache.org/mod_mbox/mahout-dev/201108.mbox/%3cCACOCgckzcAm4V8y3CQhnBWtUy9jVgAbKzE1R+z6zpQAF=8XLEg@mail.gmail.com%3e
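
The quoted suggestion (encode once, serialize, then train from the
encoded form) can be sketched with JDK classes alone. This is only an
illustration under stated assumptions: the class, the all-numeric CSV
row layout "target,f1,f2,..." and the binary layout are hypothetical
and are not a Mahout format. In a Hadoop setting the container would
more likely be a SequenceFile of Mahout VectorWritable values, but the
thread does not confirm that choice.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch, not a Mahout class: parse CSV once, keep encoded examples in binary. */
final class EncodedExampleStoreSketch {

    /** One training example: target class index plus already-encoded features. */
    static final class Example {
        final int target;
        final double[] features;

        Example(int target, double[] features) {
            this.target = target;
            this.features = features;
        }
    }

    /** Parses a row of the assumed form "target,f1,f2,...", all columns numeric. */
    static Example parseCsvRow(String row) {
        String[] cols = row.split(",");
        double[] features = new double[cols.length - 1];
        for (int i = 1; i < cols.length; i++) {
            features[i - 1] = Double.parseDouble(cols[i].trim());
        }
        return new Example(Integer.parseInt(cols[0].trim()), features);
    }

    /** Writes: example count, then per example the target, feature count, and feature values. */
    static byte[] serialize(List<Example> examples) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(examples.size());
            for (Example e : examples) {
                out.writeInt(e.target);
                out.writeInt(e.features.length);
                for (double f : e.features) {
                    out.writeDouble(f);
                }
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return bytes.toByteArray();
    }

    /** Reads the layout written by serialize; no text parsing is needed on this path. */
    static List<Example> deserialize(byte[] data) {
        List<Example> examples = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                int target = in.readInt();
                double[] features = new double[in.readInt()];
                for (int j = 0; j < features.length; j++) {
                    features[j] = in.readDouble();
                }
                examples.add(new Example(target, features));
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return examples;
    }
}
```

The byte array stands in for a file. Wrapping a file or HDFS output
stream in the DataOutputStream instead would leave the rest unchanged,
and for small data the deserialized list can simply stay in memory
across training passes.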

> Best wishes,
> Stanley Xu

Re: Is any more detailed documentation aout the sgd logistic regression example.

Posted by Ted Dunning <te...@gmail.com>.

Please file a bug report at http://issues.apache.org/jira/browse/MAHOUT

Attach a diff file with the extension .patch.  Create the diff at the mahout
top directory.

On Tue, May 17, 2011 at 8:49 PM, Xiaobo Gu <gu...@gmail.com> wrote:

> I have write a command line program proto type for RunAdaptiveLogistic,
> 1. How can I make it invokeable from mahout
> 2. Can  you help to fine tune the AdaptiveLogisticRegression creating
> and settings to make it make sense.
>
>
>
> On Tue, May 10, 2011 at 11:30 PM, Ted Dunning <te...@gmail.com>
> wrote:
> > Great idea.  Why don't you implement something like what you need?
>  Others
> > will be happy to contribute improvements.
> >
> > On Tue, May 10, 2011 at 8:26 AM, XiaoboGu <gu...@gmail.com>
> wrote:
> >
> >> > There isn't a good command line for this, largely because it is
> difficult
> >> to
> >> > describe how to convert each CSV field.  There is some beginnings of
> >> efforts
> >> > on this, but the results are still limit.
> >>
> >> In common usages the predictor variables are almost number or category
> >> variables encoded into numbers, so an unify CSV file converter is
> possible
> >> for data with only these data types.
> >>
> >
>

Re: Is any more detailed documentation aout the sgd logistic regression example.

Posted by Xiaobo Gu <gu...@gmail.com>.

I have written a prototype command-line program, RunAdaptiveLogistic.
1. How can I make it invokable from mahout?
2. Can you help fine-tune how it creates and configures
AdaptiveLogisticRegression so that the settings make sense?



On Tue, May 10, 2011 at 11:30 PM, Ted Dunning <te...@gmail.com> wrote:
> Great idea.  Why don't you implement something like what you need?  Others
> will be happy to contribute improvements.
>
> On Tue, May 10, 2011 at 8:26 AM, XiaoboGu <gu...@gmail.com> wrote:
>
>> > There isn't a good command line for this, largely because it is difficult
>> to
>> > describe how to convert each CSV field.  There is some beginnings of
>> efforts
>> > on this, but the results are still limit.
>>
>> In common usages the predictor variables are almost number or category
>> variables encoded into numbers, so an unify CSV file converter is possible
>> for data with only these data types.
>>
>

Re: Is any more detailed documentation aout the sgd logistic regression example.

Posted by Ted Dunning <te...@gmail.com>.

Great idea.  Why don't you implement something like what you need?  Others
will be happy to contribute improvements.

On Tue, May 10, 2011 at 8:26 AM, XiaoboGu <gu...@gmail.com> wrote:

> > There isn't a good command line for this, largely because it is difficult
> to
> > describe how to convert each CSV field.  There is some beginnings of
> efforts
> > on this, but the results are still limit.
>
> In common usages the predictor variables are almost number or category
> variables encoded into numbers, so an unify CSV file converter is possible
> for data with only these data types.
>

RE: Is any more detailed documentation aout the sgd logistic regression example.

Posted by XiaoboGu <gu...@gmail.com>.


> -----Original Message-----
> From: Ted Dunning [mailto:ted.dunning@gmail.com]
> Sent: Thursday, May 05, 2011 11:22 PM
> To: user@mahout.apache.org
> Subject: Re: Is any more detailed documentation aout the sgd logistic regression example.
> 
> On Thu, May 5, 2011 at 7:48 AM, Xiaobo Gu <gu...@gmail.com> wrote:
> 
> > On Thu, May 5, 2011 at 10:40 PM, Stanley Xu <we...@gmail.com> wrote:
> > > 1. You could use the command line to add shape as category features, it
> > will
> > > hash categoryname=value as the feature and set the value as 1.0, it is
> > the
> > > standard way to convert a category feature to multiple numeric
> > > feature(convert to 0/1 feature)
> >
> > Can we just use "word" type for category predictor variables?
> >
> 
> Yes.
> 
> 
> > > 2. In production mode, don't use csv, you will find most of the time
> > spent
> > > are on parse the csv data and hash them to features. You might encode the
> > > feature to vector and serialize them to the file system by MapReduce to
> > > reduce cost on data parsing.
> >
> > Currentlly we are not familiar with Vectors, is there a standard way
> > (command line )to encode csv files into Vector and serialize them into
> > file system,
> >
> 
> There isn't a good command line for this, largely because it is difficult to
> describe how to convert each CSV field.  There is some beginnings of efforts
> on this, but the results are still limit.

In common usage the predictor variables are almost always numeric, or
categorical variables encoded as numbers, so a unified CSV file
converter is possible for data with only these data types.


> 
> > And what do you mean by "file system", local file system or HDFS,
> > because you mentioned MapReduce
> >
> 
> That shouldn't much matter.

Re: Is any more detailed documentation aout the sgd logistic regression example.

Posted by Ted Dunning <te...@gmail.com>.

There are a few others as well.

From the code, there are these:

 public void setInterval(int interval)
 public void setInterval(int minInterval, int maxInterval)
 public void setPoolSize(int poolSize)
 public void setThreadCount(int threadCount)
 public void setAucEvaluator(OnlineAuc auc)
 private void setupOptimizer(int poolSize)
 public void setBest(State<Wrapper, CrossFoldLearner> best)
 public void setRecord(int record)
 public void setBuffer(List<TrainingExample> buffer)
 public void setEp(EvolutionaryProcess<Wrapper, CrossFoldLearner> ep)
 public void setSeed(State<Wrapper, CrossFoldLearner> seed)
 public void setAveragingWindow(int averagingWindow)
 public void setFreezeSurvivors(boolean freezeSurvivors)


Aside from the ones you mention and the setAucEvaluator, most of these
should not be used.  There are also a number of other indirect knobs
available if you access, for instance, the underlying evolutionary
algorithm, or set options on the AUC evaluator or the prior.


On Thu, May 19, 2011 at 10:23 PM, Xiaobo Gu <gu...@gmail.com> wrote:

> Hi Ted,
>
> Are interval, averagingWindow, thread count, and prior function the
> only four tunable options of AdaptiveLogisticRegression?
>
> Regards,
>
> Xiaobo Gu
>
>
> On Tue, May 10, 2011 at 11:26 PM, Ted Dunning <te...@gmail.com>
> wrote:
> > In the meantime, look at building your own command line tool for
> > AdaptiveLogisticRegression.
> >
> > On Tue, May 10, 2011 at 8:25 AM, Ted Dunning <te...@gmail.com>
> wrote:
> >
> >> Go for it.
> >>
> >> Produce a JIRA and a patch.
> >>
> >>
> >> On Tue, May 10, 2011 at 8:19 AM, XiaoboGu <gu...@gmail.com>
> wrote:
> >>
> >>> Can you add a --algorithm option to the trainlogistic and runlogistic
> >>> program, and other options need by specific algorithms, such as using
> L1 or
> >>> L2 prior, then TL and RL will be production ready tool for us.
> >>
> >>
> >>
> >
>

Re: Is any more detailed documentation aout the sgd logistic regression example.

Posted by Xiaobo Gu <gu...@gmail.com>.

Hi Ted,

Are interval, averagingWindow, thread count, and prior function the
only four tunable options of AdaptiveLogisticRegression?

Regards,

Xiaobo Gu


On Tue, May 10, 2011 at 11:26 PM, Ted Dunning <te...@gmail.com> wrote:
> In the meantime, look at building your own command line tool for
> AdaptiveLogisticRegression.
>
> On Tue, May 10, 2011 at 8:25 AM, Ted Dunning <te...@gmail.com> wrote:
>
>> Go for it.
>>
>> Produce a JIRA and a patch.
>>
>>
>> On Tue, May 10, 2011 at 8:19 AM, XiaoboGu <gu...@gmail.com> wrote:
>>
>>> Can you add a --algorithm option to the trainlogistic and runlogistic
>>> program, and other options need by specific algorithms, such as using L1 or
>>> L2 prior, then TL and RL will be production ready tool for us.
>>
>>
>>
>

Re: Is any more detailed documentation aout the sgd logistic regression example.

Posted by Ted Dunning <te...@gmail.com>.

In the meantime, look at building your own command line tool for
AdaptiveLogisticRegression.

On Tue, May 10, 2011 at 8:25 AM, Ted Dunning <te...@gmail.com> wrote:

> Go for it.
>
> Produce a JIRA and a patch.
>
>
> On Tue, May 10, 2011 at 8:19 AM, XiaoboGu <gu...@gmail.com> wrote:
>
>> Can you add a --algorithm option to the trainlogistic and runlogistic
>> program, and other options need by specific algorithms, such as using L1 or
>> L2 prior, then TL and RL will be production ready tool for us.
>
>
>

Re: Is any more detailed documentation aout the sgd logistic regression example.

Posted by Ted Dunning <te...@gmail.com>.

Go for it.

Produce a JIRA and a patch.

On Tue, May 10, 2011 at 8:19 AM, XiaoboGu <gu...@gmail.com> wrote:

> Can you add a --algorithm option to the trainlogistic and runlogistic
> program, and other options need by specific algorithms, such as using L1 or
> L2 prior, then TL and RL will be production ready tool for us.

RE: Is any more detailed documentation aout the sgd logistic regression example.

Posted by XiaoboGu <gu...@gmail.com>.


> -----Original Message-----
> From: Ted Dunning [mailto:ted.dunning@gmail.com]
> Sent: Sunday, May 08, 2011 4:23 AM
> To: user@mahout.apache.org
> Subject: Re: Is any more detailed documentation aout the sgd logistic regression example.
> 
> You can't do that directly.
> 
> You can use the http address of the file in HDFS.


In our environment all users are CSV-oriented, and the CSV data to be analyzed is almost always generated inside HDFS, so it would save time if the programs could read from HDFS directly.
I think the open function of class TrainLogistic is the only piece of code that would need to change for this. Is that right?
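
A change of that kind might look roughly like the sketch below. It is
hypothetical and is not the actual TrainLogistic code: the class name
and the rule "treat anything containing :// as a URL" are assumptions
made for illustration. The JDK handles http: and file: URLs out of the
box; an hdfs:// URI would additionally need Hadoop's own client
classes, which this sketch leaves out.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical sketch, not Mahout code: open training input from a URL or a local path. */
final class InputOpenSketch {

    /**
     * Opens the input as UTF-8 text. Anything containing "://" is treated as a URL
     * (for example an HTTP address); everything else is treated as a local path.
     */
    static BufferedReader open(String spec) {
        try {
            InputStream in = spec.contains("://")
                    ? URI.create(spec).toURL().openStream()
                    : Files.newInputStream(Paths.get(spec));
            return new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        } catch (IOException e) {
            // Covers malformed or unsupported URLs as well as missing files.
            throw new UncheckedIOException(e);
        }
    }
}
```
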



> Note also that trainlogistic and runlogistic are intended pretty much only
> for simple demonstration purposes.

Can you add an --algorithm option to the trainlogistic and runlogistic programs, along with the other options needed by specific algorithms, such as choosing an L1 or L2 prior? Then TL and RL would be production-ready tools for us.


> For large scale data, you should use AdaptiveLogisticRegression
> 
> 2011/5/7 Xiaobo Gu <gu...@gmail.com>
> 
> > trainlogistic and runlogistic
> >
> > 2011/5/7, Ted Dunning <te...@gmail.com>:
> > > Huh?
> > >
> > > What program are you talking about?
> > >
> > >>
> > >> How can I specify a HDFS URI for the --input option
> > >
> >
> > --
> > Sent from my mobile device
> >

Re: Is any more detailed documentation aout the sgd logistic regression example.

Posted by Xiaobo Gu <gu...@gmail.com>.

On Sun, May 8, 2011 at 4:22 AM, Ted Dunning <te...@gmail.com> wrote:
> You can't do that directly.
>
> You can use the http address of the file in HDFS.

What would the HTTP URL be for an example file named /data/gpwext/data.csv?

HTTP://namenode:8020/data/gpwext/data.csv?

>
> Note also that trainlogistic and runlogistic are intended pretty much only
> for simple demonstration purposes.
>
> For large scale data, you should use AdaptiveLogisticRegression
>
> 2011/5/7 Xiaobo Gu <gu...@gmail.com>
>
>> trainlogistic and runlogistic
>>
>> 2011/5/7, Ted Dunning <te...@gmail.com>:
>> > Huh?
>> >
>> > What program are you talking about?
>> >
>> >>
>> >> How can I specify a HDFS URI for the --input option
>> >
>>
>> --
>> Sent from my mobile device
>>
>

Re: Is any more detailed documentation aout the sgd logistic regression example.

Posted by Ted Dunning <te...@gmail.com>.

You can't do that directly.

You can use the http address of the file in HDFS.

Note also that trainlogistic and runlogistic are intended pretty much only
for simple demonstration purposes.

For large scale data, you should use AdaptiveLogisticRegression

2011/5/7 Xiaobo Gu <gu...@gmail.com>

> trainlogistic and runlogistic
>
> 2011/5/7, Ted Dunning <te...@gmail.com>:
> > Huh?
> >
> > What program are you talking about?
> >
> >>
> >> How can I specify a HDFS URI for the --input option
> >
>
> --
> Sent from my mobile device
>

Re: Is any more detailed documentation aout the sgd logistic regression example.

Posted by Xiaobo Gu <gu...@gmail.com>.

trainlogistic and runlogistic

2011/5/7, Ted Dunning <te...@gmail.com>:
> Huh?
>
> What program are you talking about?
>
> On Fri, May 6, 2011 at 9:36 PM, Xiaobo Gu <gu...@gmail.com> wrote:
>
>> >> > 2. In production mode, don't use csv, you will find most of the time
>> >> spent
>> >> > are on parse the csv data and hash them to features. You might encode
>> the
>> >> > feature to vector and serialize them to the file system by MapReduce
>> to
>> >> > reduce cost on data parsing.
>> >>
>> >> Currentlly we are not familiar with Vectors, is there a standard way
>> >> (command line )to encode csv files into Vector and serialize them into
>> >> file system,
>> >>
>> >
>> > There isn't a good command line for this, largely because it is
>> > difficult
>> to
>> > describe how to convert each CSV field.  There is some beginnings of
>> efforts
>> > on this, but the results are still limit.
>> >
>> >
>> >> And what do you mean by "file system", local file system or HDFS,
>> >> because you mentioned MapReduce
>>
>> How can I specify a HDFS URI for the --input option
>

-- 
Sent from my mobile device

Re: Is any more detailed documentation aout the sgd logistic regression example.

Posted by Ted Dunning <te...@gmail.com>.

Huh?

What program are you talking about?

On Fri, May 6, 2011 at 9:36 PM, Xiaobo Gu <gu...@gmail.com> wrote:

> >> > 2. In production mode, don't use csv, you will find most of the time
> >> spent
> >> > are on parse the csv data and hash them to features. You might encode
> the
> >> > feature to vector and serialize them to the file system by MapReduce
> to
> >> > reduce cost on data parsing.
> >>
> >> Currentlly we are not familiar with Vectors, is there a standard way
> >> (command line )to encode csv files into Vector and serialize them into
> >> file system,
> >>
> >
> > There isn't a good command line for this, largely because it is difficult
> to
> > describe how to convert each CSV field.  There is some beginnings of
> efforts
> > on this, but the results are still limit.
> >
> >
> >> And what do you mean by "file system", local file system or HDFS,
> >> because you mentioned MapReduce
>
> How can I specify a HDFS URI for the --input option

Re: Is any more detailed documentation aout the sgd logistic regression example.

Posted by Xiaobo Gu <gu...@gmail.com>.

On Thu, May 5, 2011 at 11:21 PM, Ted Dunning <te...@gmail.com> wrote:
> On Thu, May 5, 2011 at 7:48 AM, Xiaobo Gu <gu...@gmail.com> wrote:
>
>> On Thu, May 5, 2011 at 10:40 PM, Stanley Xu <we...@gmail.com> wrote:
>> > 1. You could use the command line to add shape as category features, it
>> will
>> > hash categoryname=value as the feature and set the value as 1.0, it is
>> the
>> > standard way to convert a category feature to multiple numeric
>> > feature(convert to 0/1 feature)
>>
>> Can we just use "word" type for category predictor variables?
>>
>
> Yes.
>
>
>> > 2. In production mode, don't use csv, you will find most of the time
>> spent
>> > are on parse the csv data and hash them to features. You might encode the
>> > feature to vector and serialize them to the file system by MapReduce to
>> > reduce cost on data parsing.
>>
>> Currentlly we are not familiar with Vectors, is there a standard way
>> (command line )to encode csv files into Vector and serialize them into
>> file system,
>>
>
> There isn't a good command line for this, largely because it is difficult to
> describe how to convert each CSV field.  There is some beginnings of efforts
> on this, but the results are still limit.
>
>
>> And what do you mean by "file system", local file system or HDFS,
>> because you mentioned MapReduce

How can I specify an HDFS URI for the --input option?

>
> That shouldn't much matter.
>

Re: Is any more detailed documentation aout the sgd logistic regression example.

Posted by Ted Dunning <te...@gmail.com>.

On Thu, May 5, 2011 at 7:48 AM, Xiaobo Gu <gu...@gmail.com> wrote:

> On Thu, May 5, 2011 at 10:40 PM, Stanley Xu <we...@gmail.com> wrote:
> > 1. You could use the command line to add shape as category features, it
> will
> > hash categoryname=value as the feature and set the value as 1.0, it is
> the
> > standard way to convert a category feature to multiple numeric
> > feature(convert to 0/1 feature)
>
> Can we just use "word" type for category predictor variables?
>

Yes.
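
The encoding Stanley describes (hash "categoryname=value" to a slot and
set 1.0 there) can be sketched as follows. This is an illustration
only: the class is hypothetical, CRC32 is a stand-in hash chosen
because it ships with the JDK, and none of this is Mahout's own encoder
code.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

/** Hypothetical sketch, not a Mahout class: hashed encoding of categorical and numeric fields. */
final class HashedCategorySketch {

    /** Maps a key to a slot in [0, numFeatures). CRC32 is only a stand-in hash. */
    static int slot(String key, int numFeatures) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        // getValue() is a non-negative long, so the remainder is a valid index.
        return (int) (crc.getValue() % numFeatures);
    }

    /** Categorical field: hash "name=value" and add 1.0 at that slot (a 0/1 indicator). */
    static void addCategory(String name, String value, double[] vector) {
        vector[slot(name + "=" + value, vector.length)] += 1.0;
    }

    /** Numeric field: hash the name alone and add the numeric value at that slot. */
    static void addNumeric(String name, double value, double[] vector) {
        vector[slot(name, vector.length)] += value;
    }

    public static void main(String[] args) {
        double[] vector = new double[16];
        addCategory("shape", "circle", vector);
        addNumeric("x", 0.25, vector);
        System.out.println(java.util.Arrays.toString(vector));
    }
}
```

Each distinct value of a categorical variable gets its own slot, so the
model learns a separate weight per value. Two different keys can land
in the same slot; a larger vector makes such collisions less likely.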


> > 2. In production mode, don't use csv, you will find most of the time
> spent
> > are on parse the csv data and hash them to features. You might encode the
> > feature to vector and serialize them to the file system by MapReduce to
> > reduce cost on data parsing.
>
> Currentlly we are not familiar with Vectors, is there a standard way
> (command line )to encode csv files into Vector and serialize them into
> file system,
>

There isn't a good command line for this, largely because it is difficult to
describe how to convert each CSV field.  There are some beginnings of efforts
on this, but the results are still limited.


> And what do you mean by "file system", local file system or HDFS,
> because you mentioned MapReduce
>

That shouldn't much matter.

Fwd: Is any more detailed documentation aout the sgd logistic regression example.

Posted by Xiaobo Gu <gu...@gmail.com>.

On Thu, May 5, 2011 at 10:40 PM, Stanley Xu <we...@gmail.com> wrote:
> 1. You could use the command line to add shape as category features, it will
> hash categoryname=value as the feature and set the value as 1.0, it is the
> standard way to convert a category feature to multiple numeric
> feature(convert to 0/1 feature)

Can we just use the "word" type for categorical predictor variables?

> 2. In production mode, don't use csv, you will find most of the time spent
> are on parse the csv data and hash them to features. You might encode the
> feature to vector and serialize them to the file system by MapReduce to
> reduce cost on data parsing.

Currently we are not familiar with Vectors. Is there a standard way
(a command line) to encode CSV files into Vectors and serialize them to
the file system?
And what do you mean by "file system": the local file system or HDFS?
I ask because you mentioned MapReduce.



> Best wishes,
> Stanley Xu
>
>
>
> On Mon, May 2, 2011 at 11:58 PM, Xiaobo Gu <gu...@gmail.com> wrote:
>>
>> In our environments data will be prepared inside the relational data
>> warehouse, and then export as csv files, that's the trainlogistic
>> command line works well for us, but we will have both numeric and
>> category predictor variables, does SGD support category variables, and
>> are there examples about this? because I think the results bellow does
>> not apply for category variables,
>>
>> color ~ -0.157*Intercept Term + -0.678*x + -0.416*y
>> Intercept Term -0.15655
>> x -0.67841
>> y -0.41587
>>
>> On Fri, Apr 22, 2011 at 6:16 AM, Ted Dunning <te...@gmail.com>
>> wrote:
>> > The trainlogistic command is (as Stanley says) only a simple example.
>> >
>> > You will need to write a program something like TrainNewsGroups for your
>> > modelers to use.
>> >
>> > I agree that the API oriented code in Mahout is not what those users
>> > need.
>> >  It was, however, what my users needed.
>> >
>> > It would be great if you would like to contribute a good command line
>> > for
>> > the more advanced SGD classifier training
>> > API.
>> >
>> > On Tue, Apr 19, 2011 at 10:51 PM, Stanley Xu <we...@gmail.com>
>> > wrote:
>> >
>> >> Hi Xiaobo,
>> >>
>> >> You could check the chapter 13-16 from <Mahout In Action>, it provided
>> >> all
>> >> the parameters the command line tool of 'mahout trainlogistic' could
>> >> use.
>> >> But the trainlogistic command is still only a simple example. If you
>> >> wanted
>> >> to use that in a production environment, you still have to write the
>> >> feature
>> >> encode code by yourself. The code you need to write is pretty easy,
>> >> just
>> >> parse the input and put that in a Vector and let the LR train the data.
>> >>
>> >> Best wishes,
>> >> Stanley Xu
>> >>
>> >>
>> >>
>> >>
>> >> On Tue, Apr 19, 2011 at 9:09 PM, XiaoboGu <gu...@gmail.com>
>> >> wrote:
>> >>
>> >>> Hi,
>> >>>
>> >>> Thanks for your reply, after some reading of the wiki pages, I think
>> >>> what
>> >>> I want is a Logistic Regression command-line, since the target users
>> >>> of
>> >>> Mahout are data analysts, who can't write Java code, a command line is
>> >>> more
>> >>> convenient. Some specific questions are :
>> >>> 1. What format should we apply when preparing data for logistic
>> >>> regression, can we use csv, and should we put the value for the target
>> >>> variable as the first column in every row the csv file.
>> >>> 2. What options can we support to the command line if there is one.
>> >>> 3. How can interpret the results.
>> >>>
>> >>> Because Logistic Regression is the working horse of credit scoring in
>> >>> industry, I think it will make Mahout friends of more analysts if LR
>> >>> support
>> >>> is smooth.
>> >>>
>> >>> Regards,
>> >>>
>> >>> Xiaobo Gu
>> >>>
>> >>> From: Ted Dunning [mailto:ted.dunning@gmail.com]
>> >>> Sent: Wednesday, April 13, 2011 1:02 AM
>> >>> To: user@mahout.apache.org
>> >>> Cc: Xiaobo Gu
>> >>> Subject: Re: Is any more detailed documentation aout the sgd logistic
>> >>> regression example.
>> >>>
>> >>> Can you be more specific about what you have and what you want?
>> >>>
>> >>> The book Mahout in Action provides quite a lot of details with sample
>> >>> code
>> >>> for a server farm.
>> >>>
>> >>> The TrainNewsGroups example provides code that you can copy.
>> >>>
>> >>> Do you have these resources?  Do you want more?  Did you want more
>> >>> theory?
>> >>>
>> >>> On Tue, Apr 12, 2011 at 9:11 AM, Xiaobo Gu <gu...@gmail.com>
>> >>> wrote:
>> >>> Hi,
>> >>> Documents about sgd logistic regression itself are welcome too.
>> >>> Regards,
>> >>>
>> >>> Xiaobo Gu
>> >>>
>> >>>
>> >>>
>> >>
>> >
>
>