Posted to issues@spark.apache.org by "Apache Spark (Jira)" <ji...@apache.org> on 2021/05/21 14:29:00 UTC
[jira] [Commented] (SPARK-34558) warehouse path should be resolved ahead of populating and use
[ https://issues.apache.org/jira/browse/SPARK-34558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17349295#comment-17349295 ]
Apache Spark commented on SPARK-34558:
--------------------------------------
User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/32622
> warehouse path should be resolved ahead of populating and use
> -------------------------------------------------------------
>
> Key: SPARK-34558
> URL: https://issues.apache.org/jira/browse/SPARK-34558
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.0.2, 3.1.2
> Reporter: Kent Yao
> Assignee: Kent Yao
> Priority: Major
> Fix For: 3.2.0
>
>
> Currently, the warehouse path only gets fully qualified on the caller side when creating a database, table, partition, etc. The unqualified path is what gets populated into the Spark and Hadoop confs, which leads to inconsistent API behaviors. We should resolve it to a fully qualified path ahead of time, before populating and using it.
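> A minimal sketch of the idea (not the actual Spark patch, which goes through Hadoop's path APIs; this uses plain `java.nio` and a hypothetical helper name for illustration):
> {code:java}
> import java.nio.file.Paths;
>
> public class QualifyWarehousePath {
>     // Hypothetical helper: resolve a possibly-relative warehouse dir
>     // to a fully qualified URI string once, up front, before it is
>     // written into any Spark/Hadoop configuration.
>     static String qualify(String dir) {
>         return Paths.get(dir).toAbsolutePath().normalize().toUri().toString();
>     }
>
>     public static void main(String[] args) {
>         // A relative value like "datalake" becomes an absolute file: URI,
>         // e.g. file:///home/user/datalake, instead of leaking through
>         // unqualified and later producing "Relative path in absolute URI".
>         System.out.println(qualify("datalake"));
>     }
> }
> {code}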
> Take a relative value such as `spark.sql.warehouse.dir=datalake`, for example.
> If the default database is absent at runtime, the app fails with:
> {code:java}
> Caused by: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: file:./datalake
> at org.apache.hadoop.fs.Path.initialize(Path.java:263)
> at org.apache.hadoop.fs.Path.<init>(Path.java:254)
> at org.apache.hadoop.hive.metastore.Warehouse.getDnsPath(Warehouse.java:133)
> at org.apache.hadoop.hive.metastore.Warehouse.getDnsPath(Warehouse.java:137)
> at org.apache.hadoop.hive.metastore.Warehouse.getWhRoot(Warehouse.java:150)
> at org.apache.hadoop.hive.metastore.Warehouse.getDefaultDatabasePath(Warehouse.java:163)
> at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB_core(HiveMetaStore.java:636)
> at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:655)
> at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:431)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148)
> at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
> at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:79)
> ... 73 more
> {code}
> If the default database is present at runtime, the app works, and a database we create does get a fully qualified location, for example:
> {code:sql}
> spark-sql> create database test2 location 'datalake';
> 21/02/26 21:52:57 WARN ObjectStore: Failed to get database test2, returning NoSuchObjectException
> Time taken: 0.052 seconds
> spark-sql> desc database test;
> Database Name test
> Comment
> Location file:/Users/kentyao/Downloads/spark/spark-3.2.0-SNAPSHOT-bin-20210226/datalake/test.db
> Owner kentyao
> Time taken: 0.023 seconds, Fetched 4 row(s)
> {code}
> Another problem is that the log output becomes ambiguous, for example:
> {code:java}
> 21/02/27 13:54:17 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('datalake').
> 21/02/27 13:54:17 INFO SharedState: Warehouse path is 'datalake'.
> {code}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org