
Posted to user@spark.apache.org by Sameer Tilak <ss...@live.com> on 2014/12/20 17:13:04 UTC

Interpreting MLLib's linear regression o/p

Hi All, I use the LIBSVM format to specify my input feature vectors, which uses 1-based indices. When I run regression, the output is 0-based. I have a master lookup file that maps these indices back to what they stand for. However, I need to add an offset of 2, not 1, to the regression output during the mapping. For example, to map index 800 from the regression output file, I look up 802 in my master lookup file, and then things make sense. I can understand adding an offset of 1, but I am not sure why an offset of 2 works. Have others seen something like this as well?

Re: Interpreting MLLib's linear regression o/p

Posted by Sean Owen <so...@cloudera.com>.

(In your libsvm example, your indices are not ascending.)

The first weight corresponds to the first feature, of course. An
indexing scheme doesn't change that or somehow make the first feature
map to the second (where would the last one go then?). You'll find the
first weight at offset 0 in an array, for example, but it corresponds to
the feature you called F1 in the input.
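
To make the mapping concrete, here is a minimal sketch in plain Python (no Spark involved). The helper names parse_libsvm_line and one_based_feature_index are made up for illustration; the parser only mimics the documented behaviour quoted below (one-based, ascending indices on input, zero-based after loading) and is not MLlib's actual loader.

```python
def parse_libsvm_line(line):
    """Parse 'label idx:val ...' into (label, {zero_based_index: value}).

    Hypothetical helper: it enforces the documented rule that indices are
    one-based and ascending, then shifts them to zero-based.
    """
    tokens = line.split()
    label = float(tokens[0])
    features = {}
    previous = 0
    for token in tokens[1:]:
        index_text, value_text = token.split(":")
        index = int(index_text)
        if index <= previous:
            raise ValueError(
                f"indices must be one-based and ascending: {index} follows {previous}"
            )
        previous = index
        features[index - 1] = float(value_text)  # one-based -> zero-based
    return label, features


def one_based_feature_index(weight_position):
    """Map a zero-based position in the weight array back to the input index."""
    return weight_position + 1


# First example row from the thread, with its indices sorted into ascending order.
label, features = parse_libsvm_line("1 4:0 8:1 10:1 20:0 24:1")

# The first three weights from the thread: position 0 belongs to F1, and so on.
weights = [0.11, 0.3445, 0.00005]
for position, weight in enumerate(weights):
    print(f"weights[{position}] -> F{one_based_feature_index(position)}: {weight}")
```

Under these assumptions the only offset between a weight's array position and the input index is 1. If a lookup needs 2, the extra shift most likely comes from somewhere outside the index conversion.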

On Tue, Dec 23, 2014 at 12:50 AM, Sameer Tilak <ss...@live.com> wrote:
> Hi,
>
> It is a text format in which each line represents a labeled sparse feature
> vector using the following format:
>
> label index1:value1 index2:value2 ...
>
> This was the confusing part in the documentation:
>
>
> "where the indices are one-based and in ascending order. After loading, the
> feature indices are converted to zero-based."
>
>
> Let us say that I have 40 features so I create an index file like this:
>
>
> Feature, index number:
>
> F1   1
>
> F2   2
>
> F3   3
>
> ...
>
> F40   40
>
>
> I then create my feature vectors and in the libsvm format something like:
>
> 1 10:1 20:0 8:1 4:0 24:1
>
> 1 1:1 40:0 2:1 8:0 9:1 23:1
>
> 0 23:1 18:0 13:1
>
> .....
>
>
>
> I run regression and get back models.weights which are 40 weights.
>
> Say I get
>
> 0.11
>
> 0.3445
>
> 0.00005
>
> ...
>
>
> In that case, does the first weight (0.11) correspond to index 1/F1, or does
> it correspond to index 2/F2, since the input is 1-based and the o/p is
> 0-based? Or is 0-based indexing only an internal representation, so that
> what you get back at the end of regression is essentially 1-based like your
> input, and 0.11 maps onto F1 and so on?
>
>
>
>
>> Date: Mon, 22 Dec 2014 16:31:57 -0800
>> Subject: Re: Interpreting MLLib's linear regression o/p
>> From: mengxr@gmail.com
>> To: sstilak@live.com
>> CC: user@spark.apache.org
>
>>
>> Did you check the indices in the LIBSVM data and the master file? Do
>> they match? -Xiangrui
>>

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org

RE: Interpreting MLLib's linear regression o/p

Posted by Sameer Tilak <ss...@live.com>.

Hi,

It is a text format in which each line represents a labeled sparse feature vector using the following format:

label index1:value1 index2:value2 ...

This was the confusing part in the documentation:

"where the indices are one-based and in ascending order. After loading, the feature indices are converted to zero-based."

Let us say that I have 40 features, so I create an index file like this:

Feature, index number:
F1   1
F2   2
F3   3
...
F40   40

I then create my feature vectors in the libsvm format, something like:

1 10:1 20:0 8:1 4:0 24:1
1 1:1 40:0 2:1 8:0 9:1 23:1
0 23:1 18:0 13:1
.....

I run regression and get back models.weights, which are 40 weights. Say I get

0.11
0.3445
0.00005
...

In that case, does the first weight (0.11) correspond to index 1/F1, or does it correspond to index 2/F2, since the input is 1-based and the o/p is 0-based? Or is 0-based indexing only an internal representation, so that what you get back at the end of regression is essentially 1-based like your input, and 0.11 maps onto F1 and so on?
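
As a rough sketch of how rows like the ones above could be written so that they follow the documented "one-based and in ascending order" rule, the plain-Python snippet below builds each line from a feature-name table. The function to_libsvm_line and the index_by_name table are hypothetical and exist only for this example. The table mirrors the F1..F40 index file described above.

```python
def to_libsvm_line(label, values_by_name, index_by_name):
    """Format one labeled row as 'label idx:val ...' with ascending indices.

    index_by_name is the (hypothetical) master table mapping a feature
    name to its one-based index.
    """
    pairs = sorted(
        (index_by_name[name], value) for name, value in values_by_name.items()
    )
    body = " ".join(f"{index}:{value}" for index, value in pairs)
    return f"{label} {body}"


# Master table for 40 features: F1 -> 1, F2 -> 2, ..., F40 -> 40.
index_by_name = {f"F{i}": i for i in range(1, 41)}

# Same features as the first example row, but emitted in ascending index order.
line = to_libsvm_line(
    1, {"F10": 1, "F20": 0, "F8": 1, "F4": 0, "F24": 1}, index_by_name
)
print(line)
```

Sorting by index at write time keeps the data file and the master table in step, so the weight at array position i can be read back as the feature with index i + 1.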

> Date: Mon, 22 Dec 2014 16:31:57 -0800
> Subject: Re: Interpreting MLLib's linear regression o/p
> From: mengxr@gmail.com
> To: sstilak@live.com
> CC: user@spark.apache.org
> 
> Did you check the indices in the LIBSVM data and the master file? Do
> they match? -Xiangrui
> 

Re: Interpreting MLLib's linear regression o/p

Posted by Xiangrui Meng <me...@gmail.com>.

Did you check the indices in the LIBSVM data and the master file? Do
they match? -Xiangrui
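
One way to run that check is sketched below in plain Python. The function names libsvm_indices and compare_indices are invented for this example, and the master file is assumed to have already been read into a set of integer indices. Adapt the loading step to the real file layout.

```python
def libsvm_indices(lines):
    """Collect every one-based feature index that appears in the LIBSVM rows."""
    seen = set()
    for line in lines:
        for token in line.split()[1:]:  # skip the label
            seen.add(int(token.split(":")[0]))
    return seen


def compare_indices(data_indices, master_indices):
    """Summarise how the indices used in the data line up with the master file."""
    return {
        "min_data_index": min(data_indices),
        "max_data_index": max(data_indices),
        "min_master_index": min(master_indices),
        "missing_from_master": sorted(data_indices - master_indices),
    }


# Example rows from the thread (indices sorted into ascending order).
data_lines = [
    "1 4:0 8:1 10:1 20:0 24:1",
    "1 1:1 2:1 8:0 9:1 23:1 40:0",
    "0 13:1 18:0 23:1",
]

# Assumed master table: indices 1..40, as in the F1..F40 index file.
master_indices = set(range(1, 41))

report = compare_indices(libsvm_indices(data_lines), master_indices)
print(report)
```

If the smallest index in the master file is not 1, or if the data uses indices the master file does not list, the two files are numbered differently, which is one place an unexpected extra offset could come from.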

On Sat, Dec 20, 2014 at 8:13 AM, Sameer Tilak <ss...@live.com> wrote:
> Hi All,
> I use LIBSVM format to specify my input feature vector, which used 1-based
> index. When I run regression the o/p is 0-indexed based. I have a master
> lookup file that maps back these indices to what they stand or. However, I
> need to add offset of 2 and not 1 to the regression outcome during the
> mapping. So for example to map the index of 800 from the regression output
> file, I look for 802 in my master lookup file and then things make sense. I
> can understand adding offset of 1, but not sure why adding offset 2 is
> working fine. Have others seem something like this as well?
>
