Posted to issues@spark.apache.org by "Omer Ozarslan (Jira)" <ji...@apache.org> on 2021/02/11 15:55:00 UTC
[jira] [Commented] (SPARK-34423) Allow FileTable.fileIndex to be
reused for custom partition schema in DataSourceV2 read path
[ https://issues.apache.org/jira/browse/SPARK-34423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17283107#comment-17283107 ]
Omer Ozarslan commented on SPARK-34423:
---------------------------------------
If this sounds good, I can happily submit a PR. Thanks.
> Allow FileTable.fileIndex to be reused for custom partition schema in DataSourceV2 read path
> --------------------------------------------------------------------------------------------
>
> Key: SPARK-34423
> URL: https://issues.apache.org/jira/browse/SPARK-34423
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.0.1
> Reporter: Omer Ozarslan
> Priority: Minor
>
> It is currently possible to provide a custom partition schema in the DataSourceV2 read path by using custom implementations of PartitionAwareFileIndex/PartitionSpec and overriding fileIndex in a subclass of FileTable. However, since fileIndex is a lazy val, the subclass cannot reuse the parent's value (i.e. via super.fileIndex).
> [https://github.com/apache/spark/blob/e0053853c90d39ef6de9d59fb933525e20bae1fa/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileTable.scala#L44-L61]
> Duplicating this code in the subclass is possible but somewhat hacky, e.g. the DataSource globbing function is a private API. I was wondering if this logic could be refactored into something like this:
> {code:java}
> def createFileIndex(): PartitionAwareFileIndex = {
>   ...[current fileIndex logic]...
> }
> lazy val fileIndex: PartitionAwareFileIndex = createFileIndex()
> {code}
> This would allow reusing the fileIndex logic downstream by wrapping it in custom implementations.
> (Note that this proposed change covers custom partition schemas in the read path only. The write path is out of scope for this change.)
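
To illustrate the proposal above, here is a minimal, Spark-free sketch of the factory-method refactoring. Every type in it (FileIndexLike, DefaultIndex, CustomPartitionIndex, TableLike, CustomTable) is a hypothetical stand-in and not Spark's actual API; only the names createFileIndex and fileIndex come from the issue text. It assumes the usual Scala rule that super cannot be used to reach a lazy val, while a def can be called through super.

```scala
// Stand-in types only: simplified models, NOT Spark's real classes.
trait FileIndexLike {
  def rootPaths: Seq[String]
  def partitionColumns: Seq[String]
}

// Models the index that the base table builds today (no custom partitioning).
final case class DefaultIndex(rootPaths: Seq[String]) extends FileIndexLike {
  val partitionColumns: Seq[String] = Seq.empty
}

// Hypothetical wrapper: delegates file listing to a base index but
// reports its own partition columns.
final case class CustomPartitionIndex(
    base: FileIndexLike,
    partitionColumns: Seq[String]) extends FileIndexLike {
  def rootPaths: Seq[String] = base.rootPaths
}

// Simplified model of FileTable after the proposed refactoring.
class TableLike(paths: Seq[String]) {
  // Factory method holding what used to be the body of the lazy val.
  def createFileIndex(): FileIndexLike = DefaultIndex(paths)

  // Still computed at most once and cached, as before.
  lazy val fileIndex: FileIndexLike = createFileIndex()
}

// Downstream subclass: reuses the parent's logic and wraps the result.
class CustomTable(paths: Seq[String]) extends TableLike(paths) {
  // super.createFileIndex() is allowed because createFileIndex is a def;
  // the same call on the lazy val (super.fileIndex) would be rejected.
  override def createFileIndex(): FileIndexLike =
    CustomPartitionIndex(super.createFileIndex(), Seq("year", "month"))
}
```

With these stand-ins, new CustomTable(Seq("/data/events")).fileIndex returns the wrapped index, and repeated accesses return the same cached instance because fileIndex remains a lazy val in the base class.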