Posted to issues@spark.apache.org by "Emil Ejbyfeldt (Jira)" <ji...@apache.org> on 2023/10/18 13:35:00 UTC

[jira] [Created] (SPARK-45592) AQE and InMemoryTableScanExec correctness bug

Emil Ejbyfeldt created SPARK-45592:
--------------------------------------

             Summary: AQE and InMemoryTableScanExec correctness bug
                 Key: SPARK-45592
                 URL: https://issues.apache.org/jira/browse/SPARK-45592
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.5.0
            Reporter: Emil Ejbyfeldt


The following query should return 1000000:
{code:java}
import org.apache.spark.sql.functions.{col, min}
import org.apache.spark.storage.StorageLevel
import spark.implicits._

val df = spark.range(0, 1000000, 1, 5).map(l => (l, l))
val ee = df.select($"_1".as("src"), $"_2".as("dst"))
  .persist(StorageLevel.MEMORY_AND_DISK)

ee.count()
val minNbrs1 = ee
  .groupBy("src").agg(min(col("dst")).as("min_number"))
  .persist(StorageLevel.MEMORY_AND_DISK)
val join = ee.join(minNbrs1, "src")
join.count(){code}
but on Spark 3.5.0 a correctness bug causes it to return `104800` or some other smaller value.
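
For reference, the expected value follows from the shape of the data: every `src` is unique, so the aggregation yields one row per `src`, and the inner join matches each row of `ee` exactly once. A minimal sketch of that reasoning on plain Scala collections (no Spark involved; the object and the helper `expectedJoinCount` are hypothetical names used only for illustration):

```scala
// Plain Scala collections stand-in for the query above (no Spark dependency).
object ExpectedJoinCount {
  // Hypothetical helper: models the reproduction on in-memory collections.
  def expectedJoinCount(n: Long): Long = {
    // ee: rows (src, dst) with src == dst, mirroring range(0, n).map(l => (l, l))
    val ee: Seq[(Long, Long)] = (0L until n).map(l => (l, l))
    // minNbrs1: groupBy("src").agg(min("dst")) gives one row per distinct src
    val minNbrs1: Map[Long, Long] =
      ee.groupBy(_._1).map { case (src, rows) => src -> rows.map(_._2).min }
    // Inner join on src: keys in minNbrs1 are unique, so each ee row matches at most once
    ee.count { case (src, _) => minNbrs1.contains(src) }.toLong
  }

  def main(args: Array[String]): Unit =
    println(expectedJoinCount(1000000L))
}
```

Under these assumptions the join count equals the number of input rows, so any smaller result from the Spark query indicates lost rows rather than a legitimate answer.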


