Posted to issues@spark.apache.org by "Dongjoon Hyun (JIRA)" <ji...@apache.org> on 2016/06/23 07:11:16 UTC

[jira] [Commented] (SPARK-16164) Filter pushdown should keep the ordering in the logical plan

    [ https://issues.apache.org/jira/browse/SPARK-16164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15345945#comment-15345945 ] 

Dongjoon Hyun commented on SPARK-16164:
---------------------------------------

Hi, [~mengxr].
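The root cause seems to be `CombineFilters`, which (as you mentioned) runs after predicate pushdown.
Currently, `CombineFilters` builds the merged condition as `parent condition && child condition`, so the pushed-down parent predicate is evaluated before the child filter that originally ran first.
We should swap the operands to `child condition && parent condition` to preserve the original ordering and solve this issue.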
I'll make a PR soon.
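Why the operand order matters can be illustrated with a small self-contained Scala model. This is a hypothetical sketch: `Expr`, `Pred`, `And`, `Filter`, `Rows`, `combine` and `execute` are made-up stand-ins, not Spark's Catalyst classes or the actual `CombineFilters` rule. It assumes the merged conjunction short-circuits left to right like Scala's `&&`. `keepSeen` is assumed to play the role of the filter that drops unseen labels under `setHandleInvalid("skip")`, and `idxGt2` the role of `'idx > 2`, which throws when it sees an unseen label.

```scala
// Hypothetical, simplified model of a filter plan. These types are NOT
// Spark's Catalyst classes; they only illustrate why operand order matters.
sealed trait Expr { def eval(row: String): Boolean }

final case class Pred(name: String, f: String => Boolean) extends Expr {
  def eval(row: String): Boolean = f(row)
}

// Assumed short-circuit semantics, like Scala's `&&`:
// `right` is evaluated only when `left` is true.
final case class And(left: Expr, right: Expr) extends Expr {
  def eval(row: String): Boolean = left.eval(row) && right.eval(row)
}

sealed trait Plan
final case class Rows(data: Seq[String]) extends Plan
final case class Filter(cond: Expr, child: Plan) extends Plan

object CombineFiltersSketch {
  // Merge adjacent filters. childFirst = false builds (parent && child), the
  // ordering described in the comment; childFirst = true builds the proposed
  // (child && parent).
  def combine(plan: Plan, childFirst: Boolean): Plan = plan match {
    case Filter(parent, Filter(child, grandChild)) =>
      val merged = if (childFirst) And(child, parent) else And(parent, child)
      combine(Filter(merged, grandChild), childFirst)
    case Filter(cond, child) => Filter(cond, combine(child, childFirst))
    case other => other
  }

  // Evaluate bottom-up: a child filter runs before its parent.
  def execute(plan: Plan): Seq[String] = plan match {
    case Rows(data)          => data
    case Filter(cond, child) => execute(child).filter(cond.eval)
  }

  // Labels seen when the (hypothetical) indexer was fitted.
  val labelToIndex: Map[String, Int] = Map("0" -> 0, "1" -> 1, "2" -> 2)

  // Stand-in for the "skip" filter that drops unseen labels.
  val keepSeen: Expr = Pred("seen", labelToIndex.contains)

  // Stand-in for 'idx > 2; looking up an unseen label throws.
  val idxGt2: Expr = Pred("idx > 2", row =>
    labelToIndex.getOrElse(row, throw new NoSuchElementException(s"Unseen label: $row")) > 2)

  // The user's filter sits on top of the skip filter, as in the report.
  val plan: Plan = Filter(idxGt2, Filter(keepSeen, Rows(Seq("0", "1", "2", "3", "4"))))

  def main(args: Array[String]): Unit = {
    println(s"unmerged:        ${execute(plan)}")
    println(s"child && parent: ${execute(combine(plan, childFirst = true))}")
    println(s"parent && child: ${scala.util.Try(execute(combine(plan, childFirst = false)))}")
  }
}
```

In this model, the unmerged plan and the child-first merge both return an empty result, while the parent-first merge evaluates `idxGt2` on label `"3"` before `keepSeen` can drop it and fails with `NoSuchElementException`.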

> Filter pushdown should keep the ordering in the logical plan
> ------------------------------------------------------------
>
>                 Key: SPARK-16164
>                 URL: https://issues.apache.org/jira/browse/SPARK-16164
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Xiangrui Meng
>
> [~cmccubbin] reported a bug when he used StringIndexer in an ML pipeline with additional filters. It seems that filter pushdown changed the order in which the filters in the logical plan are evaluated. I'm not sure whether we should treat this as a bug.
> {code}
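> import org.apache.spark.ml.feature.StringIndexer
> import spark.implicits._ // auto-imported in spark-shell; needed for toDF and 'idx
>
> // Fit on the labels "0", "1", "2"; rows with unseen labels should be skipped.
> val df1 = (0 until 3).map(_.toString).toDF
> val indexer = new StringIndexer()
>   .setInputCol("value")
>   .setOutputCol("idx")
>   .setHandleInvalid("skip")
>   .fit(df1)
> // df2 also contains the labels "3" and "4", which were not seen during fit.
> val df2 = (0 until 5).map(_.toString).toDF
> val predictions = indexer.transform(df2)
> predictions.show() // this is okay
> predictions.where('idx > 2).show() // this will throw an exception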
> {code}
> Please see the notebook at https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/1233855/2159162931615821/588180/latest.html for error messages.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
