Posted to user@mahout.apache.org by Brian Krebs <bk...@tapheaven.com> on 2014/08/04 05:16:56 UTC

SGD logistic regression

Hi everyone,

I have a very basic question on the Apache Mahout SGD implementation. My training
set has about 50 features, most of which are categorical. Some of these
categories are binary, but others can have an unknown number of discrete
values (countries, cities, etc.).

Should I be encoding these with the ConstantValueEncoder? The
StaticWordValueEncoder?

Thanks,

*Brian Krebs*
CIO and Co-founder
TapHeaven
Mobile: 443.866.2137
Email: bkrebs@tapheaven.com
Twitter: @BKrebsTH
LinkedIn: www.linkedin.com/in/briankrebs
tapheaven.com

Re: SGD logistic regression

Posted by Brian Krebs <bk...@tapheaven.com>.
Thanks for the response Ted!

Re: SGD logistic regression

Posted by Ted Dunning <te...@gmail.com>.
The static word encoder is appropriate for categorical variables with an
unknown number of values.
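
To illustrate why a hashed word encoder copes with an open-ended set of category values, here is a minimal, standard-library-only Java sketch of the general hashing idea. It is not Mahout's implementation and does not use Mahout's API: the class HashedCategoryEncoder, its methods slot and encode, the CRC32 hash, and the vector size of 64 are all hypothetical choices made for this example. Each (feature name, value) pair is hashed to a slot in a fixed-size vector, so a country or city that was never seen before still gets a slot without any dictionary being built in advance.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.CRC32;

// Hypothetical illustration of hashed categorical encoding; not Mahout code.
class HashedCategoryEncoder {

    // Hash a (feature name, category value) pair to a slot in [0, size).
    // Including the feature name keeps "country=Paris" and "city=Paris" distinct.
    static int slot(String feature, String value, int size) {
        CRC32 crc = new CRC32();
        crc.update((feature + "=" + value).getBytes(StandardCharsets.UTF_8));
        return (int) (crc.getValue() % size); // getValue() is non-negative
    }

    // Encode one training row (feature name -> category value) as a
    // fixed-length vector. Colliding pairs simply add up in the same slot.
    static double[] encode(Map<String, String> row, int size) {
        double[] vector = new double[size];
        for (Map.Entry<String, String> e : row.entrySet()) {
            vector[slot(e.getKey(), e.getValue(), size)] += 1.0;
        }
        return vector;
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("country", "Brazil");
        row.put("city", "Recife");

        double[] vector = encode(row, 64); // 64 is an arbitrary size for the demo
        for (int i = 0; i < vector.length; i++) {
            if (vector[i] != 0.0) {
                System.out.println("slot " + i + " = " + vector[i]);
            }
        }
    }
}
```

The trade-off in this sketch is that two different values can collide in the same slot; a larger vector makes collisions less likely at the cost of more model weights.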