Posted to issues@spark.apache.org by "Hyukjin Kwon (Jira)" <ji...@apache.org> on 2023/04/21 04:20:00 UTC
[jira] [Updated] (SPARK-43226) Define extractors for file-constant metadata columns
[ https://issues.apache.org/jira/browse/SPARK-43226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon updated SPARK-43226:
---------------------------------
Target Version/s: (was: 3.5.0)
> Define extractors for file-constant metadata columns
> ----------------------------------------------------
>
> Key: SPARK-43226
> URL: https://issues.apache.org/jira/browse/SPARK-43226
> Project: Spark
> Issue Type: New Feature
> Components: Spark Core
> Affects Versions: 3.4.0
> Reporter: Ryan Johnson
> Priority: Major
>
> File-source constant metadata columns are often derived indirectly from file-level metadata values rather than exposing those values directly. For example, {{_metadata.file_name}} is currently hard-coded in {{FileFormat.updateMetadataInternalRow}} as:
>
> {code:java}
> UTF8String.fromString(filePath.getName){code}
>
> We should add support for metadata extractors, i.e. functions that map from {{PartitionedFile}} to {{Literal}}, so that we can express such columns in a generic way instead of hard-coding them.
> We can't simply add these values to the metadata map, because they would then have to be pre-computed even when the query does not select the corresponding field.
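
As a rough illustration of the proposal above, the following is a minimal, self-contained sketch of what an extractor registry could look like. It is not Spark's implementation: FileInfo and Constant are hypothetical stand-ins for PartitionedFile and Literal, and the names EXTRACTORS and extract are invented for this example. The point it tries to show is that an extractor runs only for the metadata fields a query actually selects, so nothing has to be pre-computed for unselected fields.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Sketch only: every type and name here is hypothetical, not part of Spark.
final class MetadataExtractorSketch {

    // Stand-in for PartitionedFile, carrying just the file-level values used below.
    static final class FileInfo {
        final String path;
        final long size;

        FileInfo(String path, long size) {
            this.path = path;
            this.size = size;
        }
    }

    // Stand-in for Literal: a wrapper around one constant value.
    static final class Constant {
        final Object value;

        Constant(Object value) {
            this.value = value;
        }
    }

    // One extractor per metadata column; each maps a file to a constant.
    static final Map<String, Function<FileInfo, Constant>> EXTRACTORS = new LinkedHashMap<>();

    static {
        EXTRACTORS.put("file_path", f -> new Constant(f.path));
        // Last path component, analogous to filePath.getName in the description.
        EXTRACTORS.put("file_name",
            f -> new Constant(f.path.substring(f.path.lastIndexOf('/') + 1)));
        EXTRACTORS.put("file_size", f -> new Constant(f.size));
    }

    // Runs only the extractors for the selected columns; others are never evaluated.
    static Map<String, Constant> extract(FileInfo file, List<String> selected) {
        Map<String, Constant> out = new LinkedHashMap<>();
        for (String name : selected) {
            Function<FileInfo, Constant> extractor = EXTRACTORS.get(name);
            if (extractor == null) {
                throw new IllegalArgumentException("Unknown metadata column: " + name);
            }
            out.put(name, extractor.apply(file));
        }
        return out;
    }

    public static void main(String[] args) {
        FileInfo file = new FileInfo("/data/part-0001.parquet", 1024L);
        Map<String, Constant> row = extract(file, List.of("file_name"));
        System.out.println(row.get("file_name").value);
    }
}
```

In this sketch, adding a new constant column means registering one more function rather than editing a hard-coded code path, and a query that selects only file_name never evaluates the file_size or file_path extractors.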
--
This message was sent by Atlassian Jira
(v8.20.10#820010)