Posted to issues@spark.apache.org by "Andreas Chatzistergiou (Jira)" <ji...@apache.org> on 2022/01/18 16:54:00 UTC

[jira] [Commented] (SPARK-37955) PartitioningAwareFileIndex->basePath incorrectly contains the partition filters

    [ https://issues.apache.org/jira/browse/SPARK-37955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17478032#comment-17478032 ] 

Andreas Chatzistergiou commented on SPARK-37955:
------------------------------------------------

I am working on a PR for this and will post it here as soon as it is ready for review.

> PartitioningAwareFileIndex->basePath incorrectly contains the partition filters
> -------------------------------------------------------------------------------
>
>                 Key: SPARK-37955
>                 URL: https://issues.apache.org/jira/browse/SPARK-37955
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.2.0
>            Reporter: Andreas Chatzistergiou
>            Priority: Minor
>
> The PartitioningAwareFileIndex.getBasePath method returns paths that still contain the partitioning directories. This violates the definition of basePath in FileIndex: the parent directory of a file path with all partitioning directories stripped off.
> This PR fixes the issue by separating the notions of partitioningPaths and basePaths in PartitioningAwareFileIndex. The basePaths are derived from the partitioningPaths by removing any partitioning-column directories, with the aid of the PartitioningSchema.
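
As a rough illustration of the derivation described above (not the code from the PR, and not Spark's API), the sketch below strips trailing `column=value` directories from a partitioning path to obtain a base path. The function name `strip_partition_dirs` and the example paths are hypothetical, and the partitioning schema is assumed to be a plain list of column names.

```python
def strip_partition_dirs(path, partition_columns):
    """Drop trailing `column=value` directories whose column is in the schema.

    Hypothetical helper for illustration only; Spark's actual logic lives in
    PartitioningAwareFileIndex and may differ.
    """
    columns = set(partition_columns)
    parts = path.rstrip("/").split("/")
    # Walk up from the deepest directory while it looks like a partition dir.
    while parts and "=" in parts[-1] and parts[-1].split("=", 1)[0] in columns:
        parts.pop()
    return "/".join(parts)


# A partitioning path: still contains the partition directories.
partitioning_path = "/data/events/year=2022/month=01"

# The base path: the same location with the partition directories removed.
base_path = strip_partition_dirs(partitioning_path, ["year", "month"])
print(base_path)
```

Directories that merely contain `=` but do not match a partitioning column are left in place, which is why the schema is consulted rather than stripping on the `=` character alone.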



--
This message was sent by Atlassian Jira
(v8.20.1#820001)