You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Wenchen Fan (Jira)" <ji...@apache.org> on 2022/12/12 10:17:00 UTC

[jira] [Assigned] (SPARK-40775) V2 file scans have duplicative descriptions

     [ https://issues.apache.org/jira/browse/SPARK-40775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan reassigned SPARK-40775:
-----------------------------------

    Assignee: Adam Binford

> V2 file scans have duplicative descriptions
> -------------------------------------------
>
>                 Key: SPARK-40775
>                 URL: https://issues.apache.org/jira/browse/SPARK-40775
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Adam Binford
>            Assignee: Adam Binford
>            Priority: Major
>
> V2 file scans have duplication in the description. This is because FileScan uses the metadata to create the description, but each file type overrides metadata and the description adding the same metadata.
> Example from a parquet agg pushdown explain:
> {{ *+- BatchScan parquet file:/...[min(_3)#814, max(_3)#815, min(_1)#816, max(_1)#817, count(*)#818L, count(_1)#819L, count(_2)#820L, count(_3)#821L] ParquetScan DataFilters: [], Format: parquet, Location: InMemoryFileIndex(1 paths)[file:/..., PartitionFilters: [], PushedAggregation: [MIN(_3), MAX(_3), MIN(_1), MAX(_1), COUNT(*), COUNT(_1), COUNT(_2), COUNT(_3)], PushedFilters: [], PushedGroupBy: [], ReadSchema: struct<min(_3):int,max(_3):int,min(_1):int,max(_1):int,count(*):bigint,count(_1):bigint,count(_2)..., PushedFilters: [], PushedAggregation: [MIN(_3), MAX(_3), MIN(_1), MAX(_1), COUNT(*), COUNT(_1), COUNT(_2), COUNT(_3)], PushedGroupBy: [] RuntimeFilters: []*}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org