Posted to issues@spark.apache.org by "Wenchen Fan (Jira)" <ji...@apache.org> on 2020/05/15 08:23:00 UTC

[jira] [Commented] (SPARK-31675) Failure to insert data into a table with a remote location, caused by Hive's encryption check

    [ https://issues.apache.org/jira/browse/SPARK-31675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17108073#comment-17108073 ] 

Wenchen Fan commented on SPARK-31675:
-------------------------------------

This is not a new bug in 3.0, so it shouldn't be marked as a blocker. I'm changing the priority to Major.

> Failure to insert data into a table with a remote location, caused by Hive's encryption check
> ---------------------------------------------------------------------------------------------
>
>                 Key: SPARK-31675
>                 URL: https://issues.apache.org/jira/browse/SPARK-31675
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.6, 3.0.0, 3.1.0
>            Reporter: Kent Yao
>            Priority: Blocker
>
> Before https://issues.apache.org/jira/browse/HIVE-14380 was fixed in Hive 2.2.0, Hive would, when moving files from the staging dir to the final table dir, run an encryption check on the source and destination paths:
> {code:java}
> // Excerpt from Hive's file-move logic (the trailing else branch is elided)
> if (!isSrcLocal) {
>   // For a non-local src file, rename the file
>   if (hdfsEncryptionShim != null
>       && (hdfsEncryptionShim.isPathEncrypted(srcf) || hdfsEncryptionShim.isPathEncrypted(destf))
>       && !hdfsEncryptionShim.arePathsOnSameEncryptionZone(srcf, destf)) {
>     LOG.info("Copying source " + srcf + " to " + destf + " because HDFS encryption zones are different.");
>     success = FileUtils.copy(srcf.getFileSystem(conf), srcf, destf.getFileSystem(conf), destf,
>         true,    // delete source
>         replace, // overwrite destination
>         conf);
>   } else {
> {code}
> The hdfsEncryptionShim instance holds a global FileSystem instance belonging to the default file system, so the check fails for any path that belongs to a remote file system.
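> A minimal sketch of the failure mode (the cluster names are taken from the example below): a Hadoop FileSystem handle bound to the default file system rejects paths that carry a different authority.
> {code:java}
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
>
> Configuration conf = new Configuration();
> conf.set("fs.defaultFS", "hdfs://cluster1");
>
> // This handle is permanently bound to hdfs://cluster1, just like the
> // FileSystem held inside hdfsEncryptionShim.
> FileSystem defaultFs = FileSystem.get(conf);
>
> // Any operation on a cluster2 path fails with
> // java.lang.IllegalArgumentException: Wrong FS: ..., expected: hdfs://cluster1
> Path remote = new Path("hdfs://cluster2/user/warehouse/bdms_hzyaoqin_test.db/abc");
> defaultFs.getFileStatus(remote);
> {code}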
> For example, consider the following table:
> {code:sql}
> key	int	NULL
> # Detailed Table Information
> Database	bdms_hzyaoqin_test_2
> Table	abc
> Owner	bdms_hzyaoqin
> Created Time	Mon May 11 15:14:15 CST 2020
> Last Access	Thu Jan 01 08:00:00 CST 1970
> Created By	Spark 2.4.3
> Type	MANAGED
> Provider	hive
> Table Properties	[transient_lastDdlTime=1589181255]
> Location	hdfs://cluster2/user/warehouse/bdms_hzyaoqin_test.db/abc
> Serde Library	org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> InputFormat	org.apache.hadoop.mapred.TextInputFormat
> OutputFormat	org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Storage Properties	[serialization.format=1]
> Partition Provider	Catalog
> Time taken: 0.224 seconds, Fetched 18 row(s)
> {code}
> The table abc lives on the remote HDFS cluster 'hdfs://cluster2'. When we run the command below in a Spark SQL job whose default file system is 'hdfs://cluster1':
> {code:sql}
> insert into bdms_hzyaoqin_test_2.abc values(1);
> {code}
> the insert fails with:
> {code:java}
> Error in query: java.lang.IllegalArgumentException: Wrong FS: hdfs://cluster2/user/warehouse/bdms_hzyaoqin_test.db/abc/.hive-staging_hive_2020-05-11_17-10-27_123_6306294638950056285-1/-ext-10000/part-00000-badf2a31-ab36-4b60-82a1-0848774e4af5-c000, expected: hdfs://cluster1
> {code}
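> HIVE-14380 (referenced above) addresses this on the Hive side by making the encryption check tolerate remote paths. As a rough sketch of the underlying idea, not the actual patch (variable names mirror the excerpt above), each path's own FileSystem can be resolved instead of reusing the shim's default handle:
> {code:java}
> // Sketch only, not the actual HIVE-14380 patch: resolve each Path's own
> // FileSystem instead of reusing the handle bound to fs.defaultFS.
> FileSystem srcFs = srcf.getFileSystem(conf);   // fs of the staging dir
> FileSystem destFs = destf.getFileSystem(conf); // fs of the table location
>
> // Paths on different file systems can never share an encryption zone,
> // so the zone comparison only makes sense when both URIs match.
> boolean sameFs = srcFs.getUri().equals(destFs.getUri());
> boolean doEncryptionCheck = sameFs && "hdfs".equalsIgnoreCase(srcFs.getUri().getScheme());
> {code}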


