Posted to issues@spark.apache.org by "Tathagata Das (JIRA)" <ji...@apache.org> on 2017/09/15 05:33:00 UTC

[jira] [Resolved] (SPARK-22018) Catalyst Optimizer does not preserve top-level metadata while collapsing projects

     [ https://issues.apache.org/jira/browse/SPARK-22018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tathagata Das resolved SPARK-22018.
-----------------------------------
       Resolution: Fixed
    Fix Version/s: 3.0.0

Issue resolved by pull request 19240
[https://github.com/apache/spark/pull/19240]

> Catalyst Optimizer does not preserve top-level metadata while collapsing projects
> ---------------------------------------------------------------------------------
>
>                 Key: SPARK-22018
>                 URL: https://issues.apache.org/jira/browse/SPARK-22018
>             Project: Spark
>          Issue Type: Bug
>          Components: Optimizer, Structured Streaming
>    Affects Versions: 2.1.1, 2.2.0
>            Reporter: Tathagata Das
>            Assignee: Tathagata Das
>             Fix For: 3.0.0
>
>
> Consider a plan with two stacked Project operators, as follows.
> {code}
> Project [a_with_metadata#27 AS b#26]
> +- Project [a#0 AS a_with_metadata#27]
>    +- LocalRelation <empty>, [a#0, b#1]
> {code}
> The child Project has an output column that carries metadata, and the parent Project has an alias that implicitly forwards that metadata, so the metadata is visible to operators higher in the plan. After the CollapseProject optimizer rule is applied, the metadata is not preserved.
> {code}
> Project [a#0 AS b#26]
> +- LocalRelation <empty>, [a#0, b#1]
> {code}
> This is incorrect: downstream operators that rely on column metadata to identify certain fields (e.g. the watermark column in Structured Streaming) can no longer find them.
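
The behaviour described above can be reproduced in a small toy model. The sketch below is not Catalyst code and is not the patch from pull request 19240. `Relation`, `Alias`, `Project`, `collapse_naive` and `collapse_preserving` are hypothetical names introduced only for illustration, and the metadata key is meant to mirror the one Spark uses for watermark columns, so treat it as an assumption. An alias without explicit metadata forwards the metadata of the column it renames, which is the implicit forwarding the report refers to.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union

Metadata = Dict[str, object]


@dataclass
class Relation:
    """Leaf of the toy plan: maps each column name to its metadata."""
    columns: Dict[str, Metadata]

    def output(self) -> Dict[str, Metadata]:
        return dict(self.columns)


@dataclass
class Alias:
    """`source AS name`. explicit_metadata=None means: forward the source's metadata."""
    source: str
    name: str
    explicit_metadata: Optional[Metadata] = None


@dataclass
class Project:
    aliases: List[Alias]
    child: Union["Project", Relation]

    def output(self) -> Dict[str, Metadata]:
        child_out = self.child.output()
        return {
            a.name: (a.explicit_metadata
                     if a.explicit_metadata is not None
                     else child_out[a.source])
            for a in self.aliases
        }


def collapse_naive(upper: Project) -> Project:
    """Inline the lower Project's aliases into the upper one, ignoring metadata."""
    lower = upper.child
    assert isinstance(lower, Project)
    by_name = {a.name: a for a in lower.aliases}
    return Project(
        [Alias(by_name[a.source].source, a.name, a.explicit_metadata)
         for a in upper.aliases],
        lower.child,
    )


def collapse_preserving(upper: Project) -> Project:
    """Same inlining, but pin each top-level column's metadata as seen before collapsing."""
    before = upper.output()
    collapsed = collapse_naive(upper)
    return Project(
        [Alias(a.source, a.name, before[a.name]) for a in collapsed.aliases],
        collapsed.child,
    )


# Assumed key name, modelled on the watermark marker used by Structured Streaming.
DELAY_KEY = "spark.watermarkDelayMs"

relation = Relation({"a": {}, "b": {}})
lower = Project([Alias("a", "a_with_metadata", {DELAY_KEY: 10000})], relation)
upper = Project([Alias("a_with_metadata", "b")], lower)

print("before collapse:    ", upper.output())
print("naive collapse:     ", collapse_naive(upper).output())
print("preserving collapse:", collapse_preserving(upper).output())
```

In this model the naive collapse yields a column `b` with empty metadata, while the preserving variant keeps the metadata that was visible at the top of the plan before the two projects were merged. Pinning the metadata on the surviving alias is one possible design; the actual fix may differ in detail.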


