Posted to issues@spark.apache.org by "Dongjoon Hyun (Jira)" <ji...@apache.org> on 2020/03/02 19:53:00 UTC

[jira] [Updated] (SPARK-26709) OptimizeMetadataOnlyQuery does not correctly handle the files with zero record

     [ https://issues.apache.org/jira/browse/SPARK-26709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-26709:
----------------------------------
    Affects Version/s: 2.1.0

> OptimizeMetadataOnlyQuery does not correctly handle the files with zero record
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-26709
>                 URL: https://issues.apache.org/jira/browse/SPARK-26709
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.1.0, 2.1.3, 2.2.3, 2.3.2, 2.4.0
>            Reporter: Xiao Li
>            Assignee: Gengliang Wang
>            Priority: Blocker
>              Labels: correctness
>             Fix For: 2.3.3, 2.4.1, 3.0.0
>
>
> {code:scala}
> // Reproduction inside Spark's SQL test harness (withSQLConf, withTempPath
> // and checkAnswer come from SQLTestUtils / QueryTest).
> import org.apache.hadoop.fs.Path
> import org.apache.spark.sql.Row
> import org.apache.spark.sql.functions.lit
> import org.apache.spark.sql.internal.SQLConf
>
> withSQLConf(SQLConf.OPTIMIZER_METADATA_ONLY.key -> "true") {
>   withTempPath { path =>
>     val tabLocation = path.getAbsolutePath
>     val partLocation = new Path(path.getAbsolutePath, "partCol1=3")
>     // Write a zero-record Parquet file into the partition directory.
>     val df = spark.emptyDataFrame.select(lit(1).as("col1"))
>     df.write.parquet(partLocation.toString)
>     val readDF = spark.read.parquet(tabLocation)
>     // Both aggregates should return null because the table has no rows,
>     // but the metadata-only optimization answers max(partCol1) from the
>     // partition value instead.
>     checkAnswer(readDF.selectExpr("max(partCol1)"), Row(null))
>     checkAnswer(readDF.selectExpr("max(col1)"), Row(null))
>   }
> }
> {code}
> OptimizeMetadataOnlyQuery has a correctness bug when a partitioned table contains files with zero records: the rule answers partition-column aggregates from catalog metadata, so max(partCol1) returns the partition value 3 instead of null. The test above fails on 2.4, which can generate such an empty file, but the underlying issue in the read path also exists in 2.3, 2.2, and 2.1.
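> Until an upgrade to a fixed version is possible, the optimization can simply be disabled so the aggregate is computed through the normal read path. A minimal sketch, assuming a local SparkSession; the table name below is hypothetical:
> {code:scala}
> import org.apache.spark.sql.SparkSession
>
> object Spark26709Workaround {
>   def main(args: Array[String]): Unit = {
>     val spark = SparkSession.builder()
>       .appName("spark-26709-workaround")
>       .master("local[*]")
>       .getOrCreate()
>
>     // Turn the metadata-only optimization off so partition-column
>     // aggregates are computed from the actual rows rather than from
>     // catalog metadata, which is wrong for zero-record files.
>     spark.conf.set("spark.sql.optimizer.metadataOnly", "false")
>
>     // Hypothetical partitioned table registered elsewhere.
>     spark.sql("SELECT max(partCol1) FROM some_partitioned_table").show()
>
>     spark.stop()
>   }
> }
> {code}
> The key spark.sql.optimizer.metadataOnly is the config string behind SQLConf.OPTIMIZER_METADATA_ONLY used in the test above.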



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org