Posted to user@spark.apache.org by Mina Aslani <as...@gmail.com> on 2018/05/14 20:30:50 UTC

How to use StringIndexer for multiple input /output columns in Spark Java

Hi,

StringIndexer has no setInputCols/setOutputCols methods in the Spark Java API.
How can multiple input/output columns be specified then?

Regards,
Mina

Re: How to use StringIndexer for multiple input /output columns in Spark Java

Posted by Bryan Cutler <cu...@gmail.com>.
Yes, the workaround is to create multiple StringIndexers as you described.
OneHotEncoderEstimator was only introduced in Spark 2.3.0, so on 2.2.1 you
will have to use the plain OneHotEncoder.
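
For reference, a minimal sketch of the workaround in Java: one StringIndexer
per column, all set as stages of a single Pipeline. The class name, the helper
names, and the "_idx" output suffix are hypothetical choices for illustration,
not part of Spark. Only setInputCol, setOutputCol, and Pipeline.setStages come
from the Spark ML API.

```java
import org.apache.spark.ml.Pipeline;
import org.apache.spark.ml.PipelineStage;
import org.apache.spark.ml.feature.StringIndexer;

public class MultiColumnIndexing {

    // Builds one StringIndexer per input column. Each output column is the
    // input name plus a suffix (a naming convention assumed for this sketch).
    static PipelineStage[] indexerStages(String[] inputCols, String suffix) {
        PipelineStage[] stages = new PipelineStage[inputCols.length];
        for (int i = 0; i < inputCols.length; i++) {
            stages[i] = new StringIndexer()
                    .setInputCol(inputCols[i])
                    .setOutputCol(inputCols[i] + suffix);
        }
        return stages;
    }

    // Wraps all indexers in a single Pipeline so they can be fitted together.
    static Pipeline indexingPipeline(String[] inputCols, String suffix) {
        return new Pipeline().setStages(indexerStages(inputCols, suffix));
    }
}
```

Assuming df is an existing Dataset<Row> with string columns "color" and
"size", something like
indexingPipeline(new String[]{"color", "size"}, "_idx").fit(df).transform(df)
should add "color_idx" and "size_idx" columns. A OneHotEncoder per indexed
column could be appended as further stages in the same way.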

On Tue, May 15, 2018, 8:40 AM Mina Aslani <as...@gmail.com> wrote:

> Hi,
>
> So, what is the workaround? Should I create multiple indexers (one for each
> column), then create a pipeline and set its stages to include all the
> StringIndexers?
> I am using 2.2.1 as I cannot move to 2.3.0. It looks like
> OneHotEncoderEstimator is broken; please see the email I sent today with the
> subject:
> OneHotEncoderEstimator - java.lang.NoSuchMethodError: org.apache.spark.sql.Dataset.withColumns
>
> Regards,
> Mina

Re: How to use StringIndexer for multiple input /output columns in Spark Java

Posted by Mina Aslani <as...@gmail.com>.
Hi,

So, what is the workaround? Should I create multiple indexers (one for each
column), then create a pipeline and set its stages to include all the
StringIndexers?
I am using 2.2.1 as I cannot move to 2.3.0. It looks like
OneHotEncoderEstimator is broken; please see the email I sent today with the
subject:
OneHotEncoderEstimator - java.lang.NoSuchMethodError: org.apache.spark.sql.Dataset.withColumns

Regards,
Mina

On Tue, May 15, 2018 at 2:37 AM, Nick Pentreath <ni...@gmail.com>
wrote:

> Multi-column support for StringIndexer didn’t make it into Spark 2.3.0.
>
> I think the PR is still in progress; it should be available in 2.4.0.

Re: How to use StringIndexer for multiple input /output columns in Spark Java

Posted by Nick Pentreath <ni...@gmail.com>.
Multi-column support for StringIndexer didn’t make it into Spark 2.3.0.

I think the PR is still in progress; it should be available in 2.4.0.

On Mon, 14 May 2018 at 22:32, Mina Aslani <as...@gmail.com> wrote:

> Please take a look at the api doc:
> https://spark.apache.org/docs/2.3.0/api/java/org/apache/spark/ml/feature/StringIndexer.html

Re: How to use StringIndexer for multiple input /output columns in Spark Java

Posted by Mina Aslani <as...@gmail.com>.
Please take a look at the api doc:
https://spark.apache.org/docs/2.3.0/api/java/org/apache/spark/ml/feature/StringIndexer.html

On Mon, May 14, 2018 at 4:30 PM, Mina Aslani <as...@gmail.com> wrote:

> Hi,
>
> StringIndexer has no setInputCols/setOutputCols methods in the Spark Java API.
> How can multiple input/output columns be specified then?
>
> Regards,
> Mina
>