Posted to issues@spark.apache.org by "Nicolas Laduguie (Jira)" <ji...@apache.org> on 2019/09/24 07:21:00 UTC
[jira] [Updated] (SPARK-28558) DatasetWriter partitionBy is changing the group file permissions in 2.4 for parquets
[ https://issues.apache.org/jira/browse/SPARK-28558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Nicolas Laduguie updated SPARK-28558:
-------------------------------------
Attachment: image-2019-09-24-09-20-07-225.png
> DatasetWriter partitionBy is changing the group file permissions in 2.4 for parquets
> ------------------------------------------------------------------------------------
>
> Key: SPARK-28558
> URL: https://issues.apache.org/jira/browse/SPARK-28558
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.4.3
> Environment: Hadoop 2.7
> Scala 2.11
> Tested:
> * Spark 2.3.3 - Works
> * Spark 2.4.x - All have the same issue
> Reporter: Stephen Pearson
> Priority: Minor
> Attachments: image-2019-09-24-09-20-07-225.png
>
>
> When writing Parquet output using partitionBy, the group permissions on the partition directories are changed, as shown below: the directories keep the setgid bit but lose group execute (drwxrwS--- instead of drwxrws---). This causes members of the group to get "org.apache.hadoop.security.AccessControlException: Open failed for file.... error: Permission denied (13)".
> This worked in 2.3. I found a workaround, which was to set "spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2"; this gives the correct behaviour.
>
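
The workaround described above can also be applied when the session is built rather than on the command line. The following is a minimal sketch, assuming a standalone application launched with spark-submit; the object name, the application name, and the `outputPath` argument are hypothetical (the report elides the real path). Note that committer algorithm version 2 moves task output to the destination at task commit, so a failed job may leave partial output behind.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical wrapper object; only the config key and the write chain come from the report.
object CommitterV2Workaround {
  def main(args: Array[String]): Unit = {
    // Assumption: the output location is passed as the first argument,
    // because the original report does not give the path.
    val outputPath = args(0)

    val spark = SparkSession.builder()
      .appName("SPARK-28558-workaround")
      // Workaround from the report: use FileOutputCommitter algorithm version 2.
      .config("spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version", "2")
      .getOrCreate()
    import spark.implicits._

    // Same reproduction data and write chain as in the report.
    Seq(("H", 1), ("I", 2))
      .toDF("Letter", "Number")
      .write
      .partitionBy("Letter")
      .parquet(outputPath)

    spark.stop()
  }
}
```

The same key can be passed as `--conf spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2` to spark-submit or spark-shell, which avoids changing application code.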
> Code I used to reproduce issue:
> {quote}Seq(("H", 1), ("I", 2))
> .toDF("Letter", "Number")
> .write
> .partitionBy("Letter")
> .parquet(...){quote}
>
> {quote}sparktesting$ tree -dp
> ├── [drwxrws---] letter_testing2.3-defaults
> │   ├── [drwxrws---] Letter=H
> │   └── [drwxrws---] Letter=I
> ├── [drwxrws---] letter_testing2.4-defaults
> │   ├── [drwxrwS---] Letter=H
> │   └── [drwxrwS---] Letter=I
> └── [drwxrws---] letter_testing2.4-file-writer2
>     ├── [drwxrws---] Letter=H
>     └── [drwxrws---] Letter=I
> {quote}
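
The difference between the listings above is a single character: `s` in the group execute slot means setgid plus group execute, while `S` means setgid without group execute, so group members cannot traverse the directory. As an illustration only, here is a small sketch that reads that slot from a symbolic mode string; `ModeString` and `groupCanTraverse` are hypothetical names, not part of Spark or Hadoop.

```scala
// Hypothetical helper for reading `ls`/`tree -p` style mode strings.
object ModeString {
  /** True when a symbolic mode such as "drwxrws---" grants group execute (search). */
  def groupCanTraverse(mode: String): Boolean = {
    require(mode.length == 10, s"expected a 10-character mode string, got: $mode")
    // Index 0 is the file type, 1-3 the owner bits, 4-6 the group bits.
    // In the group execute slot, 'x' or 's' means execute is set;
    // '-' or 'S' means it is not ('S' = setgid set, execute missing).
    val groupExec = mode.charAt(6)
    groupExec == 'x' || groupExec == 's'
  }
}
```

Applied to the listing, `ModeString.groupCanTraverse("drwxrws---")` is `true` for the 2.3 and algorithm-version-2 directories, and `ModeString.groupCanTraverse("drwxrwS---")` is `false` for the 2.4 default directories, which is consistent with the "Permission denied (13)" error reported for group members.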
--
This message was sent by Atlassian Jira
(v8.3.4#803005)