Posted to dev@spark.apache.org by Yanbo Liang <yb...@gmail.com> on 2016/07/01 11:00:16 UTC

Re: MinMaxScaler With features include category variables

You can combine the columns that need to be normalized into a vector
with VectorAssembler and normalize that vector.
Then assemble the columns that should not be normalized into a second
vector. Finally, assemble the two vectors into one vector as the feature
column and feed it into model training.
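
The flow above can be sketched in plain Python to show what each step
does to the data. This is only an illustration under stated assumptions: it
mimics the VectorAssembler and MinMaxScaler steps but does not call the
Spark ML API, and the function names (min_max_scale, assemble) and the
sample columns (age, income, city_index) are made up for the example.

```python
# Plain-Python sketch of the suggested flow. It mimics the Spark ML stages;
# it does not call them. All names and data here are hypothetical.


def min_max_scale(rows, lo=0.0, hi=1.0):
    """Rescale each column of `rows` to the range [lo, hi], column by column."""
    columns = list(zip(*rows))
    col_min = [min(c) for c in columns]
    col_max = [max(c) for c in columns]
    scaled = []
    for row in rows:
        out = []
        for x, e_min, e_max in zip(row, col_min, col_max):
            if e_max == e_min:
                # Constant column: map to the midpoint of the target range.
                out.append(0.5 * (lo + hi))
            else:
                out.append((x - e_min) / (e_max - e_min) * (hi - lo) + lo)
        scaled.append(out)
    return scaled


def assemble(*blocks):
    """Concatenate row-aligned blocks of columns into one vector per row."""
    return [[v for part in parts for v in part] for parts in zip(*blocks)]


# Step 1: columns to normalize (age, income), kept apart from the rest.
continuous = [[20.0, 1000.0], [30.0, 3000.0], [40.0, 5000.0]]
# Step 2: columns that must stay untouched (an already indexed category).
categorical = [[0.0], [2.0], [1.0]]

# Step 3: scale only the first block, then join both blocks per row.
features = assemble(min_max_scale(continuous), categorical)
for row in features:
    print(row)
# [0.0, 0.0, 0.0]
# [0.5, 0.5, 2.0]
# [1.0, 1.0, 1.0]
```

The categorical index passes through unchanged, which is the point of
assembling it separately from the scaled columns.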

Thanks
Yanbo

2016-06-25 21:16 GMT-07:00 段石石 <bu...@gmail.com>:

> Hi all:
>
>
>     I use MinMaxScaler for data normalization, but I found that the API
> only accepts a Vector column, so we must vectorize the features first.
> However, the features usually include two parts: one that needs to be
> normalized, and another that should not be normalized, such as categorical
> features. I want to add an API on DataFrame that normalizes only the
> columns we choose. Then we can assemble them into a vector and pass it to
> the ML model API for training. I think that would be very useful for
> developers doing machine learning.
>
>
>
> Best Regards
>
> Thanks
>

Re: MinMaxScaler With features include category variables

Posted by Yuhao Yang <hh...@gmail.com>.
You may also find VectorSlicer and SQLTransformer useful in your case. Just
out of curiosity, how do you typically handle categorical features, other
than with OneHotEncoder?

Regards,
Yuhao
