You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Nikolas Vanderhoof (JIRA)" <ji...@apache.org> on 2019/04/03 22:59:00 UTC
[jira] [Created] (SPARK-27359) Joins on some array functions can be
optimized
Nikolas Vanderhoof created SPARK-27359:
------------------------------------------
Summary: Joins on some array functions can be optimized
Key: SPARK-27359
URL: https://issues.apache.org/jira/browse/SPARK-27359
Project: Spark
Issue Type: Improvement
Components: Optimizer, SQL
Affects Versions: 3.0.0
Reporter: Nikolas Vanderhoof
I encounter these cases frequently, and implemented the optimization manually (as shown here). If others experience this as well, perhaps it would be good to add appropriate tree transformations into catalyst. I can create some rough draft implementations but expect I will need assistance when it comes to resolving the generating expressions in the logical plan.
h1. Case 1
A join like this:
{code:scala}
left.join(
right,
arrays_overlap(left("a"), right("b")) // Creates a cartesian product in the logical plan
)
{code}
will produce the same results as:
{code:scala}
{
val leftPrime = left.withColumn("exploded_a", explode(col("a")))
val rightPrime = right.withColumn("exploded_b", explode(col("b")))
leftPrime.join(
rightPrime,
leftPrime("exploded_a") === rightPrime("exploded_b")
// Equijoin doesn't produce cartesian product
).drop("exploded_a", "exploded_b").distinct
}
{code}
h1. Case 2
A join like this:
{code:scala}
left.join(
right,
array_contains(left("arr"), right("value")) // Cartesian product in logical plan
)
{code}
will produce the same results as:
{code:scala}
{
val leftPrime = left.withColumn("exploded_arr", explode(col("arr")))
leftPrime.join(
right,
leftPrime("exploded_arr") === right("value") // Fast equijoin
).drop("exploded_arr").distinct
}
{code}
h1. Case 3
A join like this:
{code:scala}
left.join(
right,
array_contains(right("arr"), left("value")) // Cartesian product in logical plan
)
{code}
will produce the same results as:
{code:scala}
{
val rightPrime = right.withColumn("exploded_arr", explode(col("arr")))
left.join(
rightPrime,
left("value") === rightPrime("exploded_arr") // Fast equijoin
).drop("exploded_arr").distinct
}
{code}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org