Posted to issues@spark.apache.org by "Omer Ozarslan (Jira)" <ji...@apache.org> on 2021/02/11 15:55:00 UTC

[jira] [Commented] (SPARK-34423) Allow FileTable.fileIndex to be reused for custom partition schema in DataSourceV2 read path

    [ https://issues.apache.org/jira/browse/SPARK-34423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17283107#comment-17283107 ] 

Omer Ozarslan commented on SPARK-34423:
---------------------------------------

If this sounds good, I can happily submit a PR. Thanks.

> Allow FileTable.fileIndex to be reused for custom partition schema in DataSourceV2 read path
> --------------------------------------------------------------------------------------------
>
>                 Key: SPARK-34423
>                 URL: https://issues.apache.org/jira/browse/SPARK-34423
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.1
>            Reporter: Omer Ozarslan
>            Priority: Minor
>
> It is currently possible to provide a custom partition schema in the DataSourceV2 read path by implementing custom PartitioningAwareFileIndex/PartitionSpec classes and overriding fileIndex in a subclass of FileTable. However, because fileIndex is a lazy val, the subclass cannot reuse the parent's implementation (i.e. super.fileIndex is not allowed).
> [https://github.com/apache/spark/blob/e0053853c90d39ef6de9d59fb933525e20bae1fa/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileTable.scala#L44-L61]
> Duplicating this code in the subclass is possible but somewhat hacky; for example, the DataSource globbing function is a private API. I was wondering whether this logic could be refactored into something like this:
> {code:java}
> def createFileIndex(): PartitioningAwareFileIndex = {
>   ...[current fileIndex logic]...
> }
> lazy val fileIndex: PartitioningAwareFileIndex = createFileIndex()
> {code}
> This would allow subclasses to reuse the existing fileIndex logic by wrapping its result in a custom implementation.
> (Note that this proposed change covers a custom partition schema in the read path only. The write path is out of scope for this change.)
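
To illustrate the idea, here is a minimal, self-contained sketch of the pattern. It does not use Spark: FileIndex, DefaultIndex, CustomPartitionIndex, BaseTable and CustomTable are hypothetical stand-ins invented for this example, with BaseTable playing the role of FileTable after the proposed refactoring. It only shows that an overridable factory method can be reached through super, whereas Scala 2 (which Spark 3.0.x is built with) rejects super on a lazy val.

```scala
// Simplified stand-ins for illustration only; these are NOT the real Spark classes.
trait FileIndex {
  def partitionColumns: Seq[String]
  def files: Seq[String]
}

// Plays the role of the index that the base table builds today.
class DefaultIndex(paths: Seq[String]) extends FileIndex {
  def partitionColumns: Seq[String] = Seq.empty
  def files: Seq[String] = paths
}

// Hypothetical wrapper: reuses the underlying file listing
// but reports its own partition columns.
class CustomPartitionIndex(underlying: FileIndex, cols: Seq[String]) extends FileIndex {
  def partitionColumns: Seq[String] = cols
  def files: Seq[String] = underlying.files
}

// Plays the role of FileTable after the proposed refactoring.
class BaseTable(paths: Seq[String]) {
  // The construction logic lives in an overridable method ...
  def createFileIndex(): FileIndex = new DefaultIndex(paths)
  // ... and the lazy val only caches whatever that method returns.
  lazy val fileIndex: FileIndex = createFileIndex()
}

class CustomTable(paths: Seq[String]) extends BaseTable(paths) {
  // super.createFileIndex() is allowed because createFileIndex is a def;
  // super.fileIndex would be a compile error in Scala 2.
  override def createFileIndex(): FileIndex =
    new CustomPartitionIndex(super.createFileIndex(), Seq("year", "month"))
}

object Demo {
  def main(args: Array[String]): Unit = {
    val table = new CustomTable(Seq("/data/a.parquet", "/data/b.parquet"))
    println(table.fileIndex.partitionColumns)
    println(table.fileIndex.files)
  }
}
```

Because fileIndex stays a lazy val in the base class, existing callers would keep reading table.fileIndex unchanged; only subclasses that want a custom partition schema would override createFileIndex().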


