Posted to mapreduce-issues@hadoop.apache.org by "Dongjoon Hyun (Jira)" <ji...@apache.org> on 2022/09/09 16:07:00 UTC

[jira] [Commented] (MAPREDUCE-7403) Support spark dynamic partitioning in the Manifest Committer

    [ https://issues.apache.org/jira/browse/MAPREDUCE-7403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17602434#comment-17602434 ] 

Dongjoon Hyun commented on MAPREDUCE-7403:
------------------------------------------

Could you update the Fix Version? Currently, it seems to have 3.3.9, [~stevel@apache.org].

> Support spark dynamic partitioning in the Manifest Committer
> ------------------------------------------------------------
>
>                 Key: MAPREDUCE-7403
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7403
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 3.3.9
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.0, 3.3.9
>
>
> Currently the Spark integration with PathOutputCommitters rejects attempts to instantiate them if dynamic partitioning is enabled. That is because the Spark partitioning code assumes that
> # file rename works as a fast and safe commit algorithm
> # the working directory is in the same FS as the final directory
> Assumption #1 doesn't hold on s3a, and #2 isn't true for the staging committers.
> The new abfs/gcs manifest committer and the target stores do meet both requirements, so we no longer need to reject the operation, provided the Spark-side binding code can identify when all is good.
> Proposed: add a new hasCapability() probe which, if a committer implements StreamCapabilities, can be used to see if the committer will work. ManifestCommitter will declare that it has the capability. As the API has existed since 2.10, it will be immediately available.
> Spark's PathOutputCommitProtocol is to query the committer in setupCommitter, and fail if dynamicPartitionOverwrite is requested but not available.
> BindingParquetOutputCommitter is to implement and forward StreamCapabilities.hasCapability.
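
For illustration only, the probe proposed in the description might look roughly like the sketch below. It is a self-contained approximation, not the actual Hadoop or Spark code. The nested StreamCapabilities interface is a local stand-in for org.apache.hadoop.fs.StreamCapabilities, the capability string is a placeholder because the issue text does not give the real name, and the committer classes and the setupCommitter helper are hypothetical.

```java
public class CapabilityProbeSketch {

    /** Local stand-in for org.apache.hadoop.fs.StreamCapabilities (same method shape). */
    interface StreamCapabilities {
        boolean hasCapability(String capability);
    }

    /** Placeholder capability name; the real constant is not stated in the issue. */
    static final String DYNAMIC_PARTITIONING = "example.committer.dynamic.partitioning";

    /** Hypothetical committer that declares the capability, as ManifestCommitter is proposed to. */
    static class ManifestLikeCommitter implements StreamCapabilities {
        @Override
        public boolean hasCapability(String capability) {
            return DYNAMIC_PARTITIONING.equals(capability);
        }
    }

    /** Hypothetical committer that does not implement the probe at all. */
    static class LegacyCommitter { }

    /** Hypothetical wrapper that forwards the probe, in the spirit of BindingParquetOutputCommitter. */
    static class ForwardingCommitter implements StreamCapabilities {
        private final Object inner;

        ForwardingCommitter(Object inner) {
            this.inner = inner;
        }

        @Override
        public boolean hasCapability(String capability) {
            // Forward only if the wrapped committer can answer the probe.
            return inner instanceof StreamCapabilities
                && ((StreamCapabilities) inner).hasCapability(capability);
        }
    }

    /** True only if the committer implements the probe and declares the capability. */
    static boolean supportsDynamicPartitioning(Object committer) {
        return committer instanceof StreamCapabilities
            && ((StreamCapabilities) committer).hasCapability(DYNAMIC_PARTITIONING);
    }

    /** Fails fast when dynamic partition overwrite is requested but the committer cannot do it. */
    static void setupCommitter(Object committer, boolean dynamicPartitionOverwrite) {
        if (dynamicPartitionOverwrite && !supportsDynamicPartitioning(committer)) {
            throw new IllegalStateException(
                "Committer " + committer.getClass().getSimpleName()
                + " does not support dynamic partition overwrite");
        }
    }

    public static void main(String[] args) {
        setupCommitter(new ManifestLikeCommitter(), true);                          // accepted
        setupCommitter(new ForwardingCommitter(new ManifestLikeCommitter()), true); // accepted via forwarding
        setupCommitter(new LegacyCommitter(), false);                               // nothing requested, accepted
        try {
            setupCommitter(new LegacyCommitter(), true);
        } catch (IllegalStateException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

A committer that does not implement the interface is treated as lacking the capability, so existing committers keep being rejected without any change on their side.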



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: mapreduce-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-help@hadoop.apache.org