Posted to issues@spark.apache.org by "David Vogelbacher (Jira)" <ji...@apache.org> on 2022/07/11 17:25:00 UTC

[jira] [Created] (SPARK-39746) Binary array operations can be faster if one side is a constant

David Vogelbacher created SPARK-39746:
-----------------------------------------

             Summary: Binary array operations can be faster if one side is a constant
                 Key: SPARK-39746
                 URL: https://issues.apache.org/jira/browse/SPARK-39746
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.3.0
            Reporter: David Vogelbacher


Array operations such as [ArraysOverlap|https://github.com/apache/spark/blob/79f133b7bbc1d9aa6a20dd8a34ec120902f96155/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L1367] are optimized to put all the elements of the smaller array into a HashSet when the element type properly supports equals.
However, if one of the arrays is a constant, we could do much better: instead of reconstructing the HashSet for each row, we could construct it just once and send it to all the executors. This would improve runtime by a constant factor.
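
To make the idea concrete, here is a minimal sketch in plain Python rather than Spark's Scala. It is a simplified model and not the actual ArraysOverlap implementation: the function names overlaps_per_row and overlaps_prebuilt are hypothetical, rows are modeled as plain lists, and SQL null handling is ignored entirely.

```python
from typing import Hashable, Iterable, List, Sequence


def overlaps_per_row(rows: Iterable[Sequence[Hashable]],
                     constant: Sequence[Hashable]) -> List[bool]:
    """Baseline model: hash the smaller array again for every row."""
    results = []
    for row in rows:
        if len(row) <= len(constant):
            smaller, larger = row, constant
        else:
            smaller, larger = constant, row
        lookup = set(smaller)  # rebuilt on every row
        results.append(any(x in lookup for x in larger))
    return results


def overlaps_prebuilt(rows: Iterable[Sequence[Hashable]],
                      constant: Sequence[Hashable]) -> List[bool]:
    """Proposed idea (hypothetical sketch): hash the constant side once."""
    lookup = frozenset(constant)  # built once, reused for all rows
    # Each row is only probed against the prebuilt set; nothing is rebuilt.
    return [any(x in lookup for x in row) for row in rows]


rows = [[1, 2, 3], [7, 8], []]
constant = [3, 4, 5]
per_row_result = overlaps_per_row(rows, constant)
prebuilt_result = overlaps_prebuilt(rows, constant)
```

Both functions return the same answers. The second does no set construction inside the per-row loop, which is where the constant-factor saving would come from. In Spark the prebuilt set would additionally have to be shipped to the executors, which this sketch does not model.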


