Posted to issues@spark.apache.org by "David Vogelbacher (Jira)" <ji...@apache.org> on 2022/07/11 17:25:00 UTC
[jira] [Created] (SPARK-39746) Binary array operations can be faster if one side is a constant
David Vogelbacher created SPARK-39746:
-----------------------------------------
Summary: Binary array operations can be faster if one side is a constant
Key: SPARK-39746
URL: https://issues.apache.org/jira/browse/SPARK-39746
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.3.0
Reporter: David Vogelbacher
Array operations such as [ArraysOverlap|https://github.com/apache/spark/blob/79f133b7bbc1d9aa6a20dd8a34ec120902f96155/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L1367] are optimized to put all the elements of the smaller array into a HashSet and then probe that set with the elements of the other array, provided the element type properly supports equals.
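To make the per-row cost concrete, here is a minimal Python sketch of that strategy. It is not Spark's Scala implementation, and the function name is hypothetical. It assumes elements are hashable with consistent equality (the analogue of "properly supports equals") and follows the documented arrays_overlap null semantics: true if a non-null element is shared; NULL (None) if nothing is shared, both arrays are non-empty, and either contains a null; false otherwise.

```python
from typing import Any, Optional, Sequence


def arrays_overlap_per_row(left: Sequence[Any], right: Sequence[Any]) -> Optional[bool]:
    """Hypothetical sketch: hash the smaller array, probe with the larger one.

    Returns True/False, or None (standing in for SQL NULL).
    """
    smaller, bigger = (left, right) if len(left) <= len(right) else (right, left)
    # This set is rebuilt on every call, i.e. once per row, even when one
    # of the two inputs is the same literal array for every row.
    lookup = {x for x in smaller if x is not None}
    has_null = any(x is None for x in smaller)
    for x in bigger:
        if x is None:
            has_null = True
        elif x in lookup:
            return True
    # No common non-null element: NULL only if both sides are non-empty
    # and at least one of them contained a null.
    if has_null and len(left) > 0 and len(right) > 0:
        return None
    return False
```

Each call pays O(len(smaller)) just to build `lookup`, which is the repeated work the proposal below targets.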
However, if one of the arrays is a constant, we could do much better: instead of rebuilding the HashSet for every row, we could build it just once and send it to all the executors. This would improve runtime by a constant factor.
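A minimal Python sketch of the proposed optimization follows. The class and helper names are hypothetical, and how Spark would actually ship the prebuilt set to executors (for example, as part of the serialized expression) is an assumption, not existing Spark code. The lookup set for the literal side is built once at construction time; each row then only iterates over its own elements. It keeps the same null semantics as `arrays_overlap`, assuming hashable elements.

```python
from typing import Any, List, Optional, Sequence


class ConstantArraysOverlap:
    """Hypothetical sketch: arrays_overlap where one argument is a literal array."""

    def __init__(self, constant: Sequence[Any]) -> None:
        # Everything about the constant side is computed a single time.
        # (Assumption: in Spark this state would travel to the executors
        # with the expression instead of being rebuilt per row.)
        self.constant_nonempty = len(constant) > 0
        self.constant_has_null = any(x is None for x in constant)
        self.lookup = frozenset(x for x in constant if x is not None)

    def eval(self, row_array: Sequence[Any]) -> Optional[bool]:
        # Per-row work is now proportional to the row's array only.
        has_null = self.constant_has_null
        for x in row_array:
            if x is None:
                has_null = True
            elif x in self.lookup:
                return True
        if has_null and self.constant_nonempty and len(row_array) > 0:
            return None  # stands in for SQL NULL
        return False


def evaluate_column(rows: Sequence[Sequence[Any]], constant: Sequence[Any]) -> List[Optional[bool]]:
    """Hypothetical helper: evaluate the expression over a column of arrays."""
    expr = ConstantArraysOverlap(constant)  # set built once, not once per row
    return [expr.eval(r) for r in rows]
```

For example, `evaluate_column([[1, 2], [5], [None], []], [2, 3])` yields `[True, False, None, False]`, while the constant's set is constructed only once for all four rows.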
--
This message was sent by Atlassian Jira
(v8.20.10#820010)