Posted to issues@spark.apache.org by "Hyukjin Kwon (Jira)" <ji...@apache.org> on 2023/04/21 04:20:00 UTC

[jira] [Updated] (SPARK-43226) Define extractors for file-constant metadata columns

     [ https://issues.apache.org/jira/browse/SPARK-43226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-43226:
---------------------------------
    Target Version/s:   (was: 3.5.0)

> Define extractors for file-constant metadata columns
> ----------------------------------------------------
>
>                 Key: SPARK-43226
>                 URL: https://issues.apache.org/jira/browse/SPARK-43226
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>    Affects Versions: 3.4.0
>            Reporter: Ryan Johnson
>            Priority: Major
>
> File-source constant metadata columns are often derived indirectly from file-level metadata values rather than exposing those values directly. For example, {{_metadata.file_name}} is currently hard-coded in {{FileFormat.updateMetadataInternalRow}} as:
>  
> {code:java}
> UTF8String.fromString(filePath.getName){code}
>  
> We should add support for metadata extractors, functions that map from {{PartitionedFile}} to {{Literal}}, so that we can express such columns in a generic way instead of hard-coding them.
> We can't just add the derived values to the metadata map, because then they would have to be pre-computed for every file even if the query never selects that field.
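
As an illustration only, the extractor idea described above might be sketched as follows. This is not the implementation proposed in the ticket: {{PartitionedFileStub}}, {{LiteralStub}}, {{MetadataExtractorSketch}}, and the field names in the registry are hypothetical stand-ins, used so the example runs without a Spark dependency. The point it tries to show is that each metadata field is described by a function, and a function is only evaluated when the query selects its field.

```scala
// Hypothetical stand-ins for Spark's PartitionedFile and Literal (assumptions,
// not the real classes), so the sketch is self-contained.
final case class PartitionedFileStub(filePath: String, fileSize: Long)
final case class LiteralStub(value: Any)

object MetadataExtractorSketch {
  // An extractor maps a file to the literal value of one metadata field.
  type Extractor = PartitionedFileStub => LiteralStub

  // Registry keyed by metadata field name; the names here are illustrative.
  val extractors: Map[String, Extractor] = Map(
    "file_path" -> ((f: PartitionedFileStub) => LiteralStub(f.filePath)),
    // Derived value: the last path segment, computed on demand.
    "file_name" -> ((f: PartitionedFileStub) =>
      LiteralStub(f.filePath.substring(f.filePath.lastIndexOf('/') + 1))),
    "file_size" -> ((f: PartitionedFileStub) => LiteralStub(f.fileSize))
  )

  // Run only the extractors for the selected fields; unknown names are skipped.
  def constantMetadata(
      file: PartitionedFileStub,
      selected: Seq[String]): Map[String, LiteralStub] =
    selected.flatMap(name => extractors.get(name).map(e => name -> e(file))).toMap
}
```

With this shape, a query that selects only {{file_name}} never pays for computing the other fields, which is the drawback of pre-populating a metadata map that the description points out.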



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
