Posted to issues@spark.apache.org by "Emil Ejbyfeldt (Jira)" <ji...@apache.org> on 2023/10/18 13:35:00 UTC
[jira] [Created] (SPARK-45592) AQE and InMemoryTableScanExec correctness bug
Emil Ejbyfeldt created SPARK-45592:
--------------------------------------
Summary: AQE and InMemoryTableScanExec correctness bug
Key: SPARK-45592
URL: https://issues.apache.org/jira/browse/SPARK-45592
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 3.5.0
Reporter: Emil Ejbyfeldt
The following query should return 1000000:
{code:java}
import org.apache.spark.storage.StorageLevel

val df = spark.range(0, 1000000, 1, 5).map(l => (l, l))
val ee = df.select($"_1".as("src"), $"_2".as("dst"))
  .persist(StorageLevel.MEMORY_AND_DISK)
ee.count()
val minNbrs1 = ee
  .groupBy("src").agg(min(col("dst")).as("min_number"))
  .persist(StorageLevel.MEMORY_AND_DISK)
val join = ee.join(minNbrs1, "src")
join.count()
{code}
However, on Spark 3.5.0 a correctness bug causes it to return `104800` or some other smaller value.
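For reference, the expected count can be reasoned about without Spark: every `src` value is unique, so the aggregation yields one row per `src` and the inner join matches each input row exactly once. The sketch below is a plain Scala collections model of that reasoning, not Spark code, and it does not reproduce the bug. The small `n` is a hypothetical stand-in for the 1000000 rows above.
{code:scala}
// Plain Scala collections model of the query above (no Spark involved).
// n is a hypothetical small stand-in for the 1000000 rows in the report.
val n = 1000

// ee: one (src, dst) row per id, so every src value is unique.
val ee: Seq[(Long, Long)] = (0 until n).map(i => (i.toLong, i.toLong))

// minNbrs1: group by src and take min(dst); unique keys give exactly n groups.
val minNbrs1: Map[Long, Long] =
  ee.groupBy(_._1).map { case (src, rows) => src -> rows.map(_._2).min }

// Inner join on src: each ee row matches exactly one aggregated row.
val joined: Seq[(Long, Long, Long)] =
  ee.flatMap { case (src, dst) => minNbrs1.get(src).map(m => (src, dst, m)) }

// The join therefore has as many rows as the input.
assert(minNbrs1.size == n)
assert(joined.size == n)
{code}
With `n` set to 1000000 the same argument gives the 1000000 rows the Spark query is expected to return.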