Posted to user@mahout.apache.org by XiaoboGu <gu...@gmail.com> on 2011/05/29 05:48:53 UTC

RE: Are the OnlineLogisticRegression s of a CrossFolderLearner object equal after training?

> 
> 
> > "Multinomial models" means the number n of distinct values the target is
> > more than 2, and they should be encoded as 0, 1, 2,......, n-1,
> >
> 
> Yes.
But the values of the color column of donut.csv are encoded as 1 and 2.



RE: Are the OnlineLogisticRegression s of a CrossFolderLearner object equal after training?

Posted by XiaoboGu <gu...@gmail.com>.
I see now.

> -----Original Message-----
> From: Ted Dunning [mailto:ted.dunning@gmail.com]
> Sent: Sunday, May 29, 2011 1:44 PM
> To: user@mahout.apache.org
> Subject: Re: Are the OnlineLogisticRegression s of a CrossFolderLearner object equal after
> training?
> 
> I think not.
> 
> When you present the first symbol to the Dictionary, dict.size() will be
> zero.  That value will be inserted into the table under that symbol.  Each
> new symbol will be inserted with the size of the table as it was *before*
> that symbol was inserted.
> 
> I have added a line to CsvRecordFactoryTest.testDictionaryOrder to
> demonstrate and enforce this.  It won't be committed until the current
> release goes out.


Re: Are the OnlineLogisticRegression s of a CrossFolderLearner object equal after training?

Posted by Stanley Xu <we...@gmail.com>.
Yes. What I meant was the number of rows, which is num_target - 1.
The number of features is the same as the vector length of the training
example.

Best wishes,
Stanley Xu
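
For readers following the shape discussion, here is a minimal sketch of how a
weight matrix with num_target - 1 rows can still produce probabilities for all
num_target categories. It is plain Java written for illustration, not Mahout's
code. The class name ReferenceCategoryLogistic, the method probabilities, and
the one-row-per-category layout are assumptions made for this example.

```java
import java.util.Arrays;

// Illustration only: multinomial logistic scoring with a reference category.
// Assumption: beta has one row per non-reference category (n - 1 rows) and
// one column per feature. Category 0 has implicit all-zero weights.
class ReferenceCategoryLogistic {

  // Returns n probabilities, where n = beta.length + 1.
  static double[] probabilities(double[][] beta, double[] x) {
    int n = beta.length + 1;
    double[] p = new double[n];
    p[0] = 1.0;            // exp(0) for the reference category
    double total = 1.0;
    for (int k = 0; k < beta.length; k++) {
      double dot = 0.0;
      for (int j = 0; j < x.length; j++) {
        dot += beta[k][j] * x[j];
      }
      p[k + 1] = Math.exp(dot);  // row k scores category k + 1
      total += p[k + 1];
    }
    for (int k = 0; k < n; k++) {
      p[k] /= total;       // normalize so the n values sum to one
    }
    return p;
  }

  public static void main(String[] args) {
    double[][] beta = {{0.5, -1.0}, {1.5, 0.25}};  // 3 targets, 2 features
    double[] x = {1.0, 2.0};
    System.out.println(Arrays.toString(probabilities(beta, x)));
  }
}
```

With three targets and two features the matrix above has two rows, and the
method still returns three probabilities because category 0 is scored with
zero weights.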



On Mon, May 30, 2011 at 12:25 PM, Ted Dunning <te...@gmail.com> wrote:

> Stanley is correct in his first point because the number of items has to
> match.
>
> The second point confuses me.  beta is a matrix and thus has rows and
> columns.  It should be NUM_FEATURES x (NUM_TARGETS - 1) in size.

Re: Are the OnlineLogisticRegression s of a CrossFolderLearner object equal after training?

Posted by Ted Dunning <te...@gmail.com>.
Stanley is correct in his first point because the number of items has to
match.

The second point confuses me.  beta is a matrix and thus has rows and
columns.  It should be NUM_FEATURES x (NUM_TARGETS - 1) in size.

On Sun, May 29, 2011 at 7:50 PM, Stanley Xu <we...@gmail.com> wrote:

> Nope. The target values from 1 to n-1 will be mapped to a list of target
> values from 0 to n-2.
> And in the beta matrix, we will use all 0 weights for the first row vector,
> so the weights generated would be beta[0] to beta[n-3].
>
> Best wishes,
> Stanley Xu

Re: Are the OnlineLogisticRegression s of a CrossFolderLearner object equal after training?

Posted by Stanley Xu <we...@gmail.com>.
Nope. The target values from 1 to n-1 will be mapped to a list of target
values from 0 to n-2.
And in the beta matrix, we will use all 0 weights for the first row vector,
so the weights generated would be beta[0] to beta[n-3].

Best wishes,
Stanley Xu



On Mon, May 30, 2011 at 10:16 AM, Xiaobo Gu <gu...@gmail.com> wrote:

> On Mon, May 30, 2011 at 2:30 AM, Ted Dunning <te...@gmail.com> wrote:
> > Target values 1 ... n-1 correspond to columns 0 ... n-2 of the beta matrix.
> >  classifyFull puts a synthetic result at location 0.
>
> I think Target values 1 ... n-1 correspond to row 0, ... n-1 of the beta
> matrix,
>
> is it ?

Re: Are the OnlineLogisticRegression s of a CrossFolderLearner object equal after training?

Posted by Xiaobo Gu <gu...@gmail.com>.
On Mon, May 30, 2011 at 2:30 AM, Ted Dunning <te...@gmail.com> wrote:
> Target values 1 ... n-1 correspond to columns 0 ... n-2 of the beta matrix.
>  classifyFull puts a synthetic result at location 0.

I think target values 1 ... n-1 correspond to rows 0 ... n-1 of the beta matrix.

Is that right?



>
> If you can afford the (very small) cost of allocating a larger vector, I
> recommend using classifyFull to make your life simpler.  I almost regret
> using the simpler name for the method that imposes complexity on the user.

Re: Are the OnlineLogisticRegression s of a CrossFolderLearner object equal after training?

Posted by Ted Dunning <te...@gmail.com>.
Target values 1 ... n-1 correspond to columns 0 ... n-2 of the beta matrix.
classifyFull puts a synthetic result at location 0.

If you can afford the (very small) cost of allocating a larger vector, I
recommend using classifyFull to make your life simpler.  I almost regret
using the simpler name for the method that imposes complexity on the user.
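
As a rough illustration of the difference between the shorter and the full
result, the sketch below expands n-1 scores for targets 1 ... n-1 into n
scores by synthesizing the entry at location 0. It is plain Java, not the
Mahout implementation. The names FullScores and expand are hypothetical, and
the sketch assumes the scores are probabilities that sum to one over all n
targets.

```java
import java.util.Arrays;

// Illustration only: turn a "short" score vector (targets 1 ... n-1) into a
// "full" one (targets 0 ... n-1). Assumes the scores are probabilities.
class FullScores {

  static double[] expand(double[] reduced) {
    double[] full = new double[reduced.length + 1];
    double sum = 0.0;
    for (int i = 0; i < reduced.length; i++) {
      full[i + 1] = reduced[i];  // entry i of the short vector is target i + 1
      sum += reduced[i];
    }
    full[0] = 1.0 - sum;         // synthetic entry for target 0
    return full;
  }

  public static void main(String[] args) {
    double[] reduced = {0.25, 0.5};  // scores for targets 1 and 2
    System.out.println(Arrays.toString(expand(reduced)));
  }
}
```

With the full vector, index i is simply the score for target i, which is the
convenience being recommended here at the cost of one extra element.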

On Sun, May 29, 2011 at 12:57 AM, XiaoboGu <gu...@gmail.com> wrote:

> Then which value is missed in the beta matrix of OnlineLogisticRegression,
> the last value of the target present to LR.train(), that is n - 1 is missed?

RE: Are the OnlineLogisticRegression s of a CrossFolderLearner object equal after training?

Posted by XiaoboGu <gu...@gmail.com>.
Then which value is missing from the beta matrix of OnlineLogisticRegression? Is it the last target value presented to LR.train(), that is, n - 1?


> -----Original Message-----
> From: Ted Dunning [mailto:ted.dunning@gmail.com]
> Sent: Sunday, May 29, 2011 1:44 PM
> To: user@mahout.apache.org
> Subject: Re: Are the OnlineLogisticRegression s of a CrossFolderLearner object equal after
> training?
> 
> I think not.
> 
> When you present the first symbol to the Dictionary, dict.size() will be
> zero.  That value will be inserted into the table under that symbol.  Each
> new symbol will be inserted with the size of the table as it was *before*
> that symbol was inserted.
> 
> I have added a line to CsvRecordFactoryTest.testDictionaryOrder to
> demonstrate and enforce this.  It won't be committed until the current
> release goes out.


Re: Are the OnlineLogisticRegression s of a CrossFolderLearner object equal after training?

Posted by Ted Dunning <te...@gmail.com>.
I think not.

When you present the first symbol to the Dictionary, dict.size() will be
zero.  That value will be inserted into the table under that symbol.  Each
new symbol will be inserted with the size of the table as it was *before*
that symbol was inserted.

I have added a line to CsvRecordFactoryTest.testDictionaryOrder to
demonstrate and enforce this.  It won't be committed until the current
release goes out.
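
A minimal stand-alone sketch of this behavior, for anyone who wants to try it
without a Mahout checkout. It copies the intern() logic quoted below but uses
java.util.LinkedHashMap in place of Guava's Maps.newLinkedHashMap(). The class
name DictionaryDemo is hypothetical, and the strings "1" and "2" only stand in
for the color values of donut.csv; the order in which they are presented here
is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only: same interning logic as the quoted Dictionary class.
class DictionaryDemo {
  private final Map<String, Integer> dict = new LinkedHashMap<>();

  // Returns the code for s, assigning the current table size on first sight.
  int intern(String s) {
    if (!dict.containsKey(s)) {
      dict.put(s, dict.size());
    }
    return dict.get(s);
  }

  public static void main(String[] args) {
    DictionaryDemo d = new DictionaryDemo();
    System.out.println(d.intern("1")); // 0: the table was empty
    System.out.println(d.intern("2")); // 1: the table held one entry
    System.out.println(d.intern("1")); // 0: already known, code unchanged
  }
}
```

So the strings in the file and the internal codes are independent: whichever
string is seen first gets code 0.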

On Sat, May 28, 2011 at 9:57 PM, XiaoboGu <gu...@gmail.com> wrote:
>
>
> Ok, then target values are always more than 0, I refer to this
>
> public class Dictionary {
>  private final Map<String, Integer> dict = Maps.newLinkedHashMap();
>
>  public int intern(String s) {
>    if (!dict.containsKey(s)) {
>      dict.put(s, dict.size());
>    }
>    return dict.get(s);
>   }
>
>
>

RE: Are the OnlineLogisticRegression s of a CrossFolderLearner object equal after training?

Posted by XiaoboGu <gu...@gmail.com>.

> -----Original Message-----
> From: Ted Dunning [mailto:ted.dunning@gmail.com]
> Sent: Sunday, May 29, 2011 12:48 PM
> To: user@mahout.apache.org
> Subject: Re: Are the OnlineLogisticRegression s of a CrossFolderLearner object equal after
> training?
> 
> The color column contains strings.  Internally, codes are assigned to those
> strings.
> 
> The book talks about the confusion that comes from using numerical values as
> codes for categorical values.  This is an example of that confusion.

OK, then are target values always greater than 0? I am referring to this:

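import java.util.Map;
import com.google.common.collect.Maps;

public class Dictionary {
  // Insertion-ordered map from each symbol to its integer code.
  private final Map<String, Integer> dict = Maps.newLinkedHashMap();

  // Returns the code for s; a new symbol gets the current map size as its code.
  public int intern(String s) {
    if (!dict.containsKey(s)) {
      dict.put(s, dict.size());
    }
    return dict.get(s);
  }
}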



> 
> On Sat, May 28, 2011 at 8:48 PM, XiaoboGu <gu...@gmail.com> wrote:
> 
> > >
> > >
> > > > "Multinomial models" means the number n of distinct values the target
> > is
> > > > more than 2, and they should be encoded as 0, 1, 2,......, n-1,
> > > >
> > >
> > > Yes.
> > But the values of the color column of donut.csv are encoded as 1 and 2.
> >
> >
> >


Re: Are the OnlineLogisticRegression s of a CrossFolderLearner object equal after training?

Posted by Ted Dunning <te...@gmail.com>.
The color column contains strings.  Internally, codes are assigned to those
strings.

The book talks about the confusion that comes from using numerical values as
codes for categorical values.  This is an example of that confusion.

On Sat, May 28, 2011 at 8:48 PM, XiaoboGu <gu...@gmail.com> wrote:

> >
> >
> > > "Multinomial models" means the number n of distinct values the target
> is
> > > more than 2, and they should be encoded as 0, 1, 2,......, n-1,
> > >
> >
> > Yes.
> But the values of the color column of donut.csv are encoded as 1 and 2.
>
>
>