You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Xiangrui Meng (JIRA)" <ji...@apache.org> on 2016/06/23 16:32:16 UTC

[jira] [Created] (SPARK-16171) Filter UDFs in StringIndexer shouldn't throw exception

Xiangrui Meng created SPARK-16171:
-------------------------------------

             Summary: Filter UDFs in StringIndexer shouldn't throw exception
                 Key: SPARK-16171
                 URL: https://issues.apache.org/jira/browse/SPARK-16171
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.0.0
            Reporter: Xiangrui Meng


[~cmccubbin] reported a bug when he used StringIndexer in an ML pipeline with additional filters. It seems that during filter pushdown, we changed the ordering in the logical plan. I'm not sure whether we should treat this as a bug.

{code}
val df1 = (0 until 3).map(_.toString).toDF
val indexer = new StringIndexer()
  .setInputCol("value")
  .setOutputCol("idx")
  .setHandleInvalid("skip")
  .fit(df1)
val df2 = (0 until 5).map(_.toString).toDF
val predictions = indexer.transform(df2)
predictions.show() // this is okay
predictions.where('idx > 2).show() // this will throw an exception
{code}

Please see the notebook at https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/1233855/2159162931615821/588180/latest.html for error messages.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org