You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Wenchen Fan (Jira)" <ji...@apache.org> on 2022/10/31 06:01:00 UTC

[jira] [Resolved] (SPARK-40918) Mismatch between ParquetFileFormat and FileSourceScanExec in # columns for WSCG.isTooManyFields when using _metadata

     [ https://issues.apache.org/jira/browse/SPARK-40918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan resolved SPARK-40918.
---------------------------------
    Fix Version/s: 3.3.2
       Resolution: Fixed

Issue resolved by pull request 38431
[https://github.com/apache/spark/pull/38431]

> Mismatch between ParquetFileFormat and FileSourceScanExec in # columns for WSCG.isTooManyFields when using _metadata
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-40918
>                 URL: https://issues.apache.org/jira/browse/SPARK-40918
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Juliusz Sompolski
>            Assignee: Juliusz Sompolski
>            Priority: Major
>             Fix For: 3.3.2
>
>
> _metadata.columns are taken into account in FileSourceScanExec.supportColumnar, but not when the parquet reader is created. This can result in Parquet reader outputting columnar (because it has less columns than WSCG.isTooManyFields), whereas FileSourceScanExec wants row output (because with the extra metadata columns it hits the isTooManyFields limit).
> I have a fix forthcoming.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org