Posted to user@mahout.apache.org by XiaoboGu <gu...@gmail.com> on 2011/06/04 17:31:48 UTC

How to get the predicted target label using CrossFoldLearner?

Hi,
	When doing multinomial logistic regression with CrossFoldLearner, the Vector classify(Vector) method returns a vector of scores for the target values. Max(scores.max, 1 - scores.zSum) gives the maximum score, from which we can determine the predicted target value. That value is encoded internally by the learner, though. Where is the map between the original target labels and the encoded codes?

Regards,

Xiaobo Gu

RE: How to get the predicted target label using CrossFoldLearner?

Posted by XiaoboGu <gu...@gmail.com>.

I have implemented remembering the mapping in CsvRecordFactory; see the latest MAHOUT-696.patch I uploaded to JIRA.



> -----Original Message-----
> From: Ted Dunning [mailto:ted.dunning@gmail.com]
> Sent: Monday, June 06, 2011 6:18 PM
> To: dev@mahout.apache.org
> Cc: user@mahout.apache.org
> Subject: Re: How to get the predicted target lable using CrossFolderLearner?
> 
> You have to remember that mapping.  You will have created it when you
> encoded the target variable.
> 
> This is occasionally a nasty problem.  I have considered adding the ability
> to record a dictionary in the classification models, but have not done so.
> 
> What interface would you like to see?
> 
> Hector, you might like a vote on this.  What do you think?
> 
> Jeff, what do you think about the impact on the clustering/classification
> unification?
> 
> On Sat, Jun 4, 2011 at 10:39 PM, XiaoboGu <gu...@gmail.com> wrote:
> 
> > How can I find the map between original target labels and the encoded
> > target codes?
> >

Re: How to get the predicted target label using CrossFoldLearner?

Posted by Lance Norskog <go...@gmail.com>.

Could the 'metadata model' be a separate file?

On Tue, Jun 7, 2011 at 12:22 AM, Hector Yee <he...@gmail.com> wrote:
> I've used systems before that kept the original mapping to the classifier
> specific mapping.
> It can be nice because you can add new features and an old model may still
> work because the new features would be out of range of the old mappings.
> It can also provide a place to store score statistics (such as min / max /
> avg / std dev) for classifiers that need to normalize their features, such
> as the linear models.
>
> It could be something like this
>
> FeatureInfo
>  int32 original_index
>  int32 internal_index
>  float min_value
>  float max_value
>
> FeatureSetInfo
>  repeated FeatureInfo
>
> The drawback is potentially adding 32-bytes per feature, which could be
> detrimental in terms of size, especially for high dimensional feature spaces
> (e.g. text).
> If the writable interface could make this optional it would work.
> Or we could make all classifiers have a fixed header that we write
> containing the common meta-data followed by the actual model itself.
>
> On Mon, Jun 6, 2011 at 3:17 AM, Ted Dunning <te...@gmail.com> wrote:
>
>> You have to remember that mapping.  You will have created it when you
>> encoded the target variable.
>>
>> This is occasionally a nasty problem.  I have considered adding the ability
>> to record a dictionary in the classification models, but have not done so.
>>
>> What interface would you like to see?
>>
>> Hector, you might like a vote on this.  What do you think?
>>
>> Jeff, what do you think about the impact on the clustering/classification
>> unification?
>>
>> On Sat, Jun 4, 2011 at 10:39 PM, XiaoboGu <gu...@gmail.com> wrote:
>>
>> > How can I find the map between original target labels and the encoded
>> > target codes?
>> >
>>
>
>
>
> --
> Yee Yang Li Hector
> http://hectorgon.blogspot.com/ (tech + travel)
> http://hectorgon.com (book reviews)
>



-- 
Lance Norskog
goksron@gmail.com

Re: How to get the predicted target label using CrossFoldLearner?

Posted by Hector Yee <he...@gmail.com>.

I've used systems before that kept the original mapping to the classifier
specific mapping.
It can be nice because you can add new features and an old model may still
work because the new features would be out of range of the old mappings.
It can also provide a place to store score statistics (such as min / max /
avg / std dev) for classifiers that need to normalize their features, such
as the linear models.

It could be something like this

FeatureInfo
  int32 original_index
  int32 internal_index
  float min_value
  float max_value

FeatureSetInfo
  repeated FeatureInfo

The drawback is potentially adding 32-bytes per feature, which could be
detrimental in terms of size, especially for high dimensional feature spaces
(e.g. text).
If the writable interface could make this optional it would work.
Or we could make all classifiers have a fixed header that we write
containing the common meta-data followed by the actual model itself.
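
A minimal Java sketch of the record proposed above, to make the size trade-off concrete. The class, its method names, and the byte layout are hypothetical and not part of Mahout; only the four field names come from the proposal. With just these four fields each record serializes to 16 bytes, and extra statistics such as average and standard deviation would add to that.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical per-feature metadata record; not an existing Mahout class. */
final class FeatureInfo {
    final int originalIndex;   // index in the caller's feature space
    final int internalIndex;   // index used inside the classifier
    final float minValue;      // smallest value seen for this feature
    final float maxValue;      // largest value seen for this feature

    FeatureInfo(int originalIndex, int internalIndex, float minValue, float maxValue) {
        this.originalIndex = originalIndex;
        this.internalIndex = internalIndex;
        this.minValue = minValue;
        this.maxValue = maxValue;
    }

    /** Serializes the four fields in declaration order (4 bytes each). */
    byte[] toBytes() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(originalIndex);
            out.writeInt(internalIndex);
            out.writeFloat(minValue);
            out.writeFloat(maxValue);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads a record back in the same field order used by toBytes(). */
    static FeatureInfo fromBytes(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return new FeatureInfo(in.readInt(), in.readInt(), in.readFloat(), in.readFloat());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Scales a raw value into [0, 1] using the stored range (0 if the range is empty). */
    float normalize(float raw) {
        float range = maxValue - minValue;
        return range == 0f ? 0f : (raw - minValue) / range;
    }

    public static void main(String[] args) {
        byte[] data = new FeatureInfo(42, 0, -1f, 3f).toBytes();
        System.out.println(data.length);                       // 16
        System.out.println(fromBytes(data).normalize(1f));     // 0.5
    }
}
```

A FeatureSetInfo would then be a list of such records, written either as an optional block or as part of a fixed header in front of the model.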

On Mon, Jun 6, 2011 at 3:17 AM, Ted Dunning <te...@gmail.com> wrote:

> You have to remember that mapping.  You will have created it when you
> encoded the target variable.
>
> This is occasionally a nasty problem.  I have considered adding the ability
> to record a dictionary in the classification models, but have not done so.
>
> What interface would you like to see?
>
> Hector, you might like a vote on this.  What do you think?
>
> Jeff, what do you think about the impact on the clustering/classification
> unification?
>
> On Sat, Jun 4, 2011 at 10:39 PM, XiaoboGu <gu...@gmail.com> wrote:
>
> > How can I find the map between original target labels and the encoded
> > target codes?
> >
>

-- 
Yee Yang Li Hector
http://hectorgon.blogspot.com/ (tech + travel)
http://hectorgon.com (book reviews)

RE: How to get the predicted target label using CrossFoldLearner?

Posted by XiaoboGu <gu...@gmail.com>.


> -----Original Message-----
> From: XiaoboGu [mailto:guxiaobo1982@gmail.com]
> Sent: Monday, June 06, 2011 6:22 PM
> To: 'user@mahout.apache.org'; 'dev@mahout.apache.org'
> Subject: RE: How to get the predicted target lable using CrossFolderLearner?
> 
> I have implemented remembering the mapping in CsvRecordFactory; see the latest
> MAHOUT-696.patch I uploaded to JIRA.


A use case is shown in the latest RunAdaptiveLogistic.java.


> 
> > -----Original Message-----
> > From: Ted Dunning [mailto:ted.dunning@gmail.com]
> > Sent: Monday, June 06, 2011 6:18 PM
> > To: dev@mahout.apache.org
> > Cc: user@mahout.apache.org
> > Subject: Re: How to get the predicted target lable using CrossFolderLearner?
> >
> > You have to remember that mapping.  You will have created it when you
> > encoded the target variable.
> >
> > This is occasionally a nasty problem.  I have considered adding the ability
> > to record a dictionary in the classification models, but have not done so.
> >
> > What interface would you like to see?
> >
> > Hector, you might like a vote on this.  What do you think?
> >
> > Jeff, what do you think about the impact on the clustering/classification
> > unification?
> >
> > On Sat, Jun 4, 2011 at 10:39 PM, XiaoboGu <gu...@gmail.com> wrote:
> >
> > > How can I find the map between original target labels and the encoded
> > > target codes?
> > >

Re: How to get the predicted target label using CrossFoldLearner?

Posted by Ted Dunning <te...@gmail.com>.

You have to remember that mapping.  You will have created it when you
encoded the target variable.
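
As a rough illustration of "remembering the mapping", the sketch below keeps a two-way dictionary that is filled while the target variable is encoded and consulted again after prediction. LabelDictionary and its methods are hypothetical names for illustration, not a Mahout class; it assumes codes are handed out in first-seen order starting at 0.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: remembers the label <-> code mapping used for training. */
final class LabelDictionary {
    private final Map<String, Integer> codes = new HashMap<>();
    private final List<String> labels = new ArrayList<>();

    /** Returns the code for a label, assigning the next free code on first sight. */
    int encode(String label) {
        Integer code = codes.get(label);
        if (code == null) {
            code = labels.size();
            codes.put(label, code);
            labels.add(label);
        }
        return code;
    }

    /** Returns the original label for a code previously returned by encode(). */
    String decode(int code) {
        return labels.get(code);
    }

    /** Number of distinct labels seen so far. */
    int size() {
        return labels.size();
    }

    public static void main(String[] args) {
        LabelDictionary dict = new LabelDictionary();
        // Training time: encode each target label before passing it to the learner.
        int setosa = dict.encode("setosa");
        int virginica = dict.encode("virginica");
        // Prediction time: translate the winning code back to a label.
        System.out.println(dict.decode(virginica));   // virginica
    }
}
```

The dictionary has to be saved alongside the model, because the codes only mean something relative to the order in which labels were first seen.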

This is occasionally a nasty problem.  I have considered adding the ability
to record a dictionary in the classification models, but have not done so.

What interface would you like to see?

Hector, you might like a vote on this.  What do you think?

Jeff, what do you think about the impact on the clustering/classification
unification?

On Sat, Jun 4, 2011 at 10:39 PM, XiaoboGu <gu...@gmail.com> wrote:

> How can I find the map between original target labels and the encoded
> target codes?
>

RE: How to get the predicted target label using CrossFoldLearner?

Posted by XiaoboGu <gu...@gmail.com>.

classifyFull followed by maxIndex on the vector does the same thing. My actual question is:

How can I find the map between original target labels and the encoded target codes?


> -----Original Message-----
> From: Hector Yee [mailto:hector.yee@gmail.com]
> Sent: Sunday, June 05, 2011 12:40 AM
> To: dev@mahout.apache.org
> Cc: <us...@mahout.apache.org>; <de...@mahout.apache.org>
> Subject: Re: How to get the predicted target lable using CrossFolderLearner?
> 
> You can use classifyFull and then vector maxIndex
> 
> Sent from my iPad
> 
> On Jun 4, 2011, at 8:31 AM, "XiaoboGu" <gu...@gmail.com> wrote:
> 
> > Hi,
> >    When dealing with multinomial logistic regression with CrossFoldLearner, the
> > Vector classify(Vector) method returns a vector of scores for all the target values,
> > Max( scores.max, 1 - scores.zSum) will return the max score, according to which we can
> > determine the predicted target value, which is encoded by the learner internally, but where is
> > the map between original target labels and the encoded codes?
> >
> > Regards,
> >
> > Xiaobo Gu
> >

Re: How to get the predicted target label using CrossFoldLearner?

Posted by Hector Yee <he...@gmail.com>.

You can use classifyFull and then vector maxIndex
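
A minimal sketch of that idea, with plain double[] arrays standing in for Mahout's Vector so it runs without the library. It assumes classify returns n-1 scores with code 0 as the omitted category, whose score is 1 minus the sum of the others; toFullScores and maxIndex here are illustrative helpers, not the Mahout methods themselves.

```java
/** Sketch only: double[] arrays stand in for Mahout's Vector. */
final class PredictedCode {
    /**
     * Expands n-1 scores into n scores, assuming code 0 is the omitted
     * category and its score is 1 - sum(other scores).
     */
    static double[] toFullScores(double[] partial) {
        double[] full = new double[partial.length + 1];
        double sum = 0.0;
        for (int i = 0; i < partial.length; i++) {
            full[i + 1] = partial[i];
            sum += partial[i];
        }
        full[0] = 1.0 - sum;
        return full;
    }

    /** Index of the largest score, i.e. the encoded target code. */
    static int maxIndex(double[] scores) {
        int best = 0;
        for (int i = 1; i < scores.length; i++) {
            if (scores[i] > scores[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] partial = {0.2, 0.7};            // scores for codes 1 and 2
        double[] full = toFullScores(partial);    // roughly {0.1, 0.2, 0.7}
        System.out.println(maxIndex(full));       // 2
    }
}
```

The result is still an encoded code, so it has to be translated back to a label with whatever mapping was used when the target was encoded.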

Sent from my iPad

On Jun 4, 2011, at 8:31 AM, "XiaoboGu" <gu...@gmail.com> wrote:

> Hi,
>    When dealing with multinomial logistic regression with CrossFoldLearner, the Vector classify(Vector) method returns a vector of scores for all the target values, Max( scores.max, 1 - scores.zSum) will return the max score, according to which we can determine the predicted target value, which is encoded by the learner internally, but where is the map between original target labels and the encoded codes?
> 
> Regards,
> 
> Xiaobo Gu
>
