Posted to issues@spark.apache.org by "Sanket Reddy (Jira)" <ji...@apache.org> on 2020/01/07 05:40:00 UTC

[jira] [Comment Edited] (SPARK-30411) saveAsTable does not honor spark.hadoop.hive.warehouse.subdir.inherit.perms

    [ https://issues.apache.org/jira/browse/SPARK-30411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17009381#comment-17009381 ] 

Sanket Reddy edited comment on SPARK-30411 at 1/7/20 5:39 AM:
--------------------------------------------------------------

[~yumwang]  [PR-22078|https://github.com/apache/spark/pull/22078#issuecomment-458851287] makes sense; however, it is a fix for Hive 3.0.0 and is not a backward-compatible change, afaik.

My concern is the inconsistency between the APIs (saveAsTable/insertInto): they should either both preserve permissions or both not preserve them, and the behavior needs to be documented for DDL and DML operations, imho.

It would be useful for users not to have to manually change permissions on the file system or use umask as a workaround.
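
For illustration only, the manual workaround mentioned above could look roughly like the sketch below: it converts a permission string as printed by {{hdfs dfs -ls}} into an octal mode and builds the {{hdfs dfs -chmod}} command that would restore it after a write. The helper names {{mode_from_ls}} and {{chmod_command}} are hypothetical and are not part of Spark or Hadoop.

```python
STICKY = 0o1000  # sticky bit, printed as 't' or 'T' in the last position


def mode_from_ls(perm):
    """Convert an ls-style string such as 'drwxr-x--T' to an integer mode.

    Only r/w/x and the sticky bit are handled; this sketch does not check
    that r, w and x appear in their usual positions.
    """
    if len(perm) != 10:
        raise ValueError("expected 10 characters, got %r" % (perm,))
    mode = 0
    for i, ch in enumerate(perm[1:]):  # skip the file-type character
        bit = 1 << (8 - i)
        if ch in "rwx":
            mode |= bit
        elif i == 8 and ch == "t":     # sticky, others may execute
            mode |= STICKY | bit
        elif i == 8 and ch == "T":     # sticky, others may not execute
            mode |= STICKY
        elif ch != "-":
            raise ValueError("unsupported character %r in %r" % (ch, perm))
    return mode


def chmod_command(path, perm):
    """Build the argument list that would reset `path` to `perm`."""
    return ["hdfs", "dfs", "-chmod", format(mode_from_ls(perm), "o"), path]


before = "drwxr-x--T"  # permissions before the write, from the report
after = "drwx------"   # permissions after saveAsTable, from the report
if mode_from_ls(before) != mode_from_ls(after):
    print(" ".join(chmod_command("/tmp/my_databases/example", before)))
```

The resulting list could be passed to {{subprocess.run(cmd, check=True)}} after the write, assuming the {{hdfs}} CLI is on the PATH; this only papers over the inconsistency and does not fix it.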

[~hyukjin.kwon] Sure, I will try the Hive implementation and get back to you, though I doubt it will work. I will give it a try; thanks for the quick reply.


> saveAsTable does not honor spark.hadoop.hive.warehouse.subdir.inherit.perms
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-30411
>                 URL: https://issues.apache.org/jira/browse/SPARK-30411
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.4
>            Reporter: Sanket Reddy
>            Priority: Minor
>
> {code}
> -bash-4.2$ hdfs dfs -ls /tmp | grep my_databases
>  drwxr-x--T - redsanket users 0 2019-12-04 20:15 /tmp/my_databases
> {code}
> {code}
> >>> spark.sql("CREATE TABLE redsanket_db.example(bcookie string, ip int) STORED AS orc");
> {code}
> {code}
> -bash-4.2$ hdfs dfs -ls /tmp/my_databases | grep example
>  drwxr-x--T - redsanket users 0 2019-12-04 20:20 /tmp/my_databases/example
> {code}
> Now, after {{saveAsTable}}:
> {code}
>  >>> data = [('First', 1), ('Second', 2), ('Third', 3), ('Fourth', 4), ('Fifth', 5)]
>  >>> df = spark.createDataFrame(data)
>  >>> df.write.format("orc").mode('overwrite').saveAsTable('redsanket_db.example')
> {code}
> {code}
> -bash-4.2$ hdfs dfs -ls /tmp/my_databases | grep example
>  drwx------ - redsanket users 0 2019-12-04 20:23 /tmp/my_databases/example
> {code}
> {{saveAsTable}} overwrites the permissions of the table directory (drwxr-x--T becomes drwx------).
> {{insertInto}}, in contrast, preserves the parent directory's permissions:
> {code}
>  >>> spark.sql("DROP table redsanket_db.example");
>  DataFrame[]
>  >>> spark.sql("CREATE TABLE redsanket_db.example(bcookie string, ip int) STORED AS orc");
>  DataFrame[]
>  >>> df.write.format("orc").insertInto('redsanket_db.example')
> {code}
> {code}
> -bash-4.2$ hdfs dfs -ls /tmp/my_databases | grep example
>  drwxr-x--T - redsanket users 0 2019-12-04 20:43 /tmp/my_databases/example
> {code}
> Either this is a limitation of the API that depends on the write mode, in which case the behavior has to be documented, or it is a bug that needs to be fixed.


