Posted to issues@spark.apache.org by "Apache Spark (Jira)" <ji...@apache.org> on 2020/08/31 14:42:00 UTC

[jira] [Commented] (SPARK-32755) Maintain the order of expressions in AttributeSet and ExpressionSet

    [ https://issues.apache.org/jira/browse/SPARK-32755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17187779#comment-17187779 ] 

Apache Spark commented on SPARK-32755:
--------------------------------------

User 'dbaliafroozeh' has created a pull request for this issue:
https://github.com/apache/spark/pull/29598

> Maintain the order of expressions in AttributeSet and ExpressionSet 
> --------------------------------------------------------------------
>
>                 Key: SPARK-32755
>                 URL: https://issues.apache.org/jira/browse/SPARK-32755
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Ali Afroozeh
>            Priority: Major
>
> Expression identity is based on the ExprId, which is an auto-incremented number. This means that the same query can yield a query plan with different expression IDs in different runs. AttributeSet and ExpressionSet internally use a HashSet as the underlying data structure, and therefore cannot guarantee a fixed order of operations across different runs. This can be problematic in cases where we want to check for plan changes across different runs.
> We make the following changes to AttributeSet and ExpressionSet to maintain the insertion order of their elements:
>  * We change the underlying data structure of AttributeSet from HashSet to LinkedHashSet to maintain the insertion order.
>  * ExpressionSet already uses a list to keep track of the expressions; however, since it extends Scala's immutable.Set class, operations such as map and flatMap are delegated to immutable.Set itself. This means that the result of these operations is no longer an instance of ExpressionSet; rather, it is an implementation picked by the parent class. We therefore remove this inheritance from immutable.Set and implement the needed methods directly. ExpressionSet has very specific semantics, and it does not make sense for it to extend immutable.Set anyway.
>  * We change the PlanStabilitySuite so that it no longer sorts the attributes, allowing it to catch changes in the order of expressions across different runs.
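
Why a hash-based set is not stable across runs can be seen in a minimal Python sketch (an analogy, not Spark's Scala code). It assumes a hypothetical ExprIdGen counter whose starting value differs between runs, and models a hash set as simple buckets keyed by expr_id % num_buckets; the real HashSet and LinkedHashSet internals differ, but the ordering effect is the same in spirit. A dict stands in for LinkedHashSet because its keys keep insertion order.

```python
import itertools


class ExprIdGen:
    """Hypothetical stand-in for an auto-incremented ExprId counter."""

    def __init__(self, start):
        self._counter = itertools.count(start)

    def next_id(self):
        return next(self._counter)


def make_attributes(names, gen):
    # Each attribute is a (name, expr_id) pair; identity is the expr_id.
    return [(name, gen.next_id()) for name in names]


def hash_order(attrs, num_buckets=8):
    # Simplified hash set model: elements are visited bucket by bucket, where
    # the bucket is expr_id % num_buckets, so the order depends on the ids
    # rather than on insertion order.
    buckets = [[] for _ in range(num_buckets)]
    for name, expr_id in attrs:
        buckets[expr_id % num_buckets].append(name)
    return [name for bucket in buckets for name in bucket]


def insertion_order(attrs):
    # LinkedHashSet-like behavior: dict keys keep insertion order (Python 3.7+),
    # and a repeated expr_id is ignored after its first occurrence.
    seen = {}
    for name, expr_id in attrs:
        seen.setdefault(expr_id, name)
    return list(seen.values())


# The same "query" resolved in two runs whose id counters start at different values.
names = ["a", "b", "c"]
run1 = make_attributes(names, ExprIdGen(start=6))
run2 = make_attributes(names, ExprIdGen(start=0))

print(hash_order(run1))       # ['c', 'a', 'b']
print(hash_order(run2))       # ['a', 'b', 'c']
print(insertion_order(run1))  # ['a', 'b', 'c']
print(insertion_order(run2))  # ['a', 'b', 'c']
```

In this model only the insertion-ordered variant yields the same sequence in both runs, which is the property a plan-stability check relies on once it stops sorting attributes.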

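The ExpressionSet point can be illustrated with a rough Python analogy (again, not Spark's Scala implementation). In CPython, binary operations such as | on a subclass of the built-in set return a plain set, much as inherited map and flatMap on a Scala immutable.Set subclass return a generic Set. The sketch assumes a hypothetical canonicalize function (treating + as commutative) in place of Spark's expression canonicalization; ExprSet, InheritedExprSet, and flat_map are illustrative names only.

```python
def canonicalize(expr):
    # Hypothetical canonical form: '+' is treated as commutative,
    # so "b + a" and "a + b" map to the same key.
    return " + ".join(sorted(part.strip() for part in expr.split("+")))


class InheritedExprSet(set):
    """Anti-pattern: inherits from the built-in set."""


class ExprSet:
    """Composition instead of inheritance: canonical keys decide membership,
    and a list of originals keeps insertion order."""

    def __init__(self, exprs=()):
        self._keys = set()
        self._originals = []
        for expr in exprs:
            self.add(expr)

    def add(self, expr):
        key = canonicalize(expr)
        if key not in self._keys:
            self._keys.add(key)
            self._originals.append(expr)

    def __contains__(self, expr):
        return canonicalize(expr) in self._keys

    def __iter__(self):
        return iter(self._originals)

    def __len__(self):
        return len(self._originals)

    def map(self, f):
        # Implemented directly, so the result is still an ExprSet with the
        # same deduplication and ordering rules.
        return ExprSet(f(expr) for expr in self._originals)

    def flat_map(self, f):
        return ExprSet(out for expr in self._originals for out in f(expr))


s = ExprSet(["a + b", "b + a", "c"])
print(list(s))        # ['a + b', 'c']
print("b + a" in s)   # True

mapped = s.map(lambda e: "b + a" if e == "c" else e)
print(type(mapped).__name__, list(mapped))  # ExprSet ['a + b']

# With inheritance, a derived operation silently falls back to the parent type.
inherited = InheritedExprSet(["x"]) | InheritedExprSet(["y"])
print(type(inherited).__name__)  # set
```

The trade-off in this sketch is that every operation meant to preserve the set's semantics has to be written by hand, in exchange for never silently getting back a plain set.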


