
Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2017/05/03 06:06:04 UTC

[jira] [Commented] (SPARK-20574) Allow Bucketizer to handle non-Double column

    [ https://issues.apache.org/jira/browse/SPARK-20574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15994315#comment-15994315 ] 

Apache Spark commented on SPARK-20574:
--------------------------------------

User 'actuaryzhang' has created a pull request for this issue:
https://github.com/apache/spark/pull/17840

> Allow Bucketizer to handle non-Double column
> --------------------------------------------
>
>                 Key: SPARK-20574
>                 URL: https://issues.apache.org/jira/browse/SPARK-20574
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>    Affects Versions: 2.1.0
>            Reporter: Wayne Zhang
>
> Bucketizer currently requires the input column to be of type Double, but its logic should work on any numeric data type. Many practical problems involve integer or float data, and manually casting such columns to Double before calling Bucketizer quickly becomes tedious. This transformer could be extended to handle all numeric types.
> The example below shows Bucketizer failing on integer data.
> {code}
> import org.apache.spark.ml.feature.Bucketizer
> import spark.implicits._ // for toDF; already imported in spark-shell
>
> val splits = Array(-3.0, 0.0, 3.0)
> val data: Array[Int] = Array(-2, -1, 0, 1, 2)
> val expectedBuckets = Array(0.0, 0.0, 1.0, 1.0, 1.0)
> val dataFrame = data.zip(expectedBuckets).toSeq.toDF("feature", "expected")
> val bucketizer = new Bucketizer()
>   .setInputCol("feature")
>   .setOutputCol("result")
>   .setSplits(splits)
> bucketizer.transform(dataFrame)
> // fails with:
> // java.lang.IllegalArgumentException: requirement failed: Column feature must be of type DoubleType but was actually IntegerType.
> {code}
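
The following is a minimal, self-contained Scala sketch that does not use Spark. It illustrates the claim that the bucketing logic itself is not specific to Double. It assumes Bucketizer's documented split semantics: each pair of adjacent splits x, y defines the bucket [x, y), and the last bucket also includes its upper bound. The names `NumericBucketSketch` and `bucket` are hypothetical, and this is not Spark's actual implementation. An implicit `Numeric[T]` widens any numeric input (Int, Long, Float, ...) to Double before a binary search over the splits.

```scala
// Hypothetical sketch, not Spark's implementation: Bucketizer-style bucketing
// for any numeric type. Bucket i covers [splits(i), splits(i + 1)), and the
// last bucket also includes its upper bound splits.last.
object NumericBucketSketch {
  def bucket[T](value: T, splits: Array[Double])(implicit num: Numeric[T]): Double = {
    val x = num.toDouble(value) // widen Int, Long, Float, ... to Double
    if (x == splits.last) {
      // The upper bound belongs to the last bucket.
      (splits.length - 2).toDouble
    } else {
      val idx = java.util.Arrays.binarySearch(splits, x)
      if (idx >= 0) {
        // x equals a split point, which is the lower bound of bucket idx.
        idx.toDouble
      } else {
        // binarySearch returns -(insertionPoint) - 1 when x is not found.
        val insertPos = -idx - 1
        // Values below the first split or above the last are rejected
        // (NaN is not handled specially and also ends up here).
        if (insertPos == 0 || insertPos == splits.length)
          throw new IllegalArgumentException(s"Value $x is out of bounds")
        (insertPos - 1).toDouble
      }
    }
  }

  def main(args: Array[String]): Unit = {
    val splits = Array(-3.0, 0.0, 3.0)
    val data: Array[Int] = Array(-2, -1, 0, 1, 2)
    val buckets = data.map(v => bucket(v, splits))
    println(buckets.mkString(", ")) // prints: 0.0, 0.0, 1.0, 1.0, 1.0
  }
}
```

On the integer data from the issue, this produces the expected buckets without any manual cast, because the widening to Double happens once per value inside `bucket`.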



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
