Posted to commits@spark.apache.org by we...@apache.org on 2020/08/06 13:33:44 UTC
[spark] branch branch-3.0 updated: [SPARK-32546][SQL][3.0] Get table names directly from Hive tables
This is an automated email from the ASF dual-hosted git repository.
wenchen pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-3.0 by this push:
new d3eea05 [SPARK-32546][SQL][3.0] Get table names directly from Hive tables
d3eea05 is described below
commit d3eea05f5ffe626bd1390723795e133f912cf8f9
Author: Max Gekk <ma...@gmail.com>
AuthorDate: Thu Aug 6 13:30:42 2020 +0000
[SPARK-32546][SQL][3.0] Get table names directly from Hive tables
### What changes were proposed in this pull request?
Get table names directly from the sequence of Hive tables in `HiveClientImpl.listTablesByType()`, skipping the conversion of Hive tables to Catalog tables.
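The change boils down to filtering raw Hive tables by their Hive table type and projecting their names, without ever building Catalog tables. A minimal standalone sketch of that filter-and-map pattern (the `RawHiveTable` case class below is a hypothetical stand-in for Hive's metastore `Table`, not Spark's actual shim type):

```scala
// Hypothetical stand-in for Hive's raw metastore Table; not Spark's real shim type.
case class RawHiveTable(name: String, tableType: String)

object ListByTypeSketch {
  // Mirrors the fixed fallback path: compare the raw Hive table type and return
  // only the names, so no SerDe class is ever resolved for the matched tables.
  def listTablesByType(tables: Seq[RawHiveTable], hiveTableType: String): Seq[String] =
    tables.filter(_.tableType == hiveTableType).map(_.name)
}
```

For example, with `Seq(RawHiveTable("v1", "VIRTUAL_VIEW"), RawHiveTable("t1", "MANAGED_TABLE"))` and type `"VIRTUAL_VIEW"`, this returns `Seq("v1")`.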
### Why are the changes needed?
A Hive metastore can be shared across many clients. A client can create tables using a SerDe which is not available on other clients, for instance `ROW FORMAT SERDE "com.ibm.spss.hive.serde2.xml.XmlSerDe"`. In the current implementation, other clients get the following exception while getting views:
```
java.lang.RuntimeException: MetaException(message:java.lang.ClassNotFoundException Class com.ibm.spss.hive.serde2.xml.XmlSerDe not found)
```
when `com.ibm.spss.hive.serde2.xml.XmlSerDe` is not available.
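The root cause is that converting a Hive table to a Catalog table resolves the table's SerDe class eagerly, and when that class is missing from the classpath the JVM lookup fails. A minimal illustration of the underlying class-loading behavior (reusing the SerDe class name from the example above, which is assumed to be absent locally):

```scala
object SerdeLoadSketch {
  // Attempts to resolve a SerDe class by name, as the Hive-to-Catalog
  // conversion does; surfaces the failure message when the class is absent.
  def tryLoad(className: String): Either[String, Class[_]] =
    try Right(Class.forName(className))
    catch { case e: ClassNotFoundException => Left(s"Class not found: ${e.getMessage}") }
}
```

Listing only the raw table names, as this patch does, never calls into this resolution path, so the missing SerDe is harmless for `SHOW VIEWS`.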
### Does this PR introduce _any_ user-facing change?
Yes. For example, `SHOW VIEWS` returns a list of views instead of throwing an exception.
### How was this patch tested?
- By existing test suites like:
```
$ build/sbt -Phive-2.3 "test:testOnly org.apache.spark.sql.hive.client.VersionsSuite"
```
- And manually:
1. Build Spark with Hive 1.2: `./build/sbt package -Phive-1.2 -Phive -Dhadoop.version=2.8.5`
2. Run spark-shell with a custom Hive SerDe, for instance download [json-serde-1.3.8-jar-with-dependencies.jar](https://github.com/cdamak/Twitter-Hive/blob/master/json-serde-1.3.8-jar-with-dependencies.jar) from https://github.com/cdamak/Twitter-Hive:
```
$ ./bin/spark-shell --jars ../Downloads/json-serde-1.3.8-jar-with-dependencies.jar
```
3. Create a Hive table using this SerDe:
```scala
scala> :paste
// Entering paste mode (ctrl-D to finish)
sql(s"""
|CREATE TABLE json_table2(page_id INT NOT NULL)
|ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
|""".stripMargin)
// Exiting paste mode, now interpreting.
res0: org.apache.spark.sql.DataFrame = []
scala> sql("SHOW TABLES").show
+--------+-----------+-----------+
|database| tableName|isTemporary|
+--------+-----------+-----------+
| default|json_table2| false|
+--------+-----------+-----------+
scala> sql("SHOW VIEWS").show
+---------+--------+-----------+
|namespace|viewName|isTemporary|
+---------+--------+-----------+
+---------+--------+-----------+
```
4. Quit from the current `spark-shell` and run it without jars:
```
$ ./bin/spark-shell
```
5. Show views. Without the fix, it throws the exception:
```scala
scala> sql("SHOW VIEWS").show
20/08/06 10:53:36 ERROR log: error in initSerDe: java.lang.ClassNotFoundException Class org.openx.data.jsonserde.JsonSerDe not found
java.lang.ClassNotFoundException: Class org.openx.data.jsonserde.JsonSerDe not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2273)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:385)
at org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:276)
at org.apache.hadoop.hive.ql.metadata.Table.getDeserializer(Table.java:258)
at org.apache.hadoop.hive.ql.metadata.Table.getCols(Table.java:605)
```
After the fix:
```scala
scala> sql("SHOW VIEWS").show
+---------+--------+-----------+
|namespace|viewName|isTemporary|
+---------+--------+-----------+
+---------+--------+-----------+
```
Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit dc96f2f8d6e08c4bc30bc11d6b29109d2aeb604b)
Signed-off-by: Max Gekk <max.gekk@gmail.com>
Closes #29377 from MaxGekk/fix-listTablesByType-for-views-3.0.
Authored-by: Max Gekk <ma...@gmail.com>
Signed-off-by: Wenchen Fan <we...@databricks.com>
---
.../scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala
index 6ad5e9d..d131a6b 100644
--- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala
+++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala
@@ -759,15 +759,17 @@ private[hive] class HiveClientImpl(
dbName: String,
pattern: String,
tableType: CatalogTableType): Seq[String] = withHiveState {
+ val hiveTableType = toHiveTableType(tableType)
try {
// Try with Hive API getTablesByType first, it's supported from Hive 2.3+.
- shim.getTablesByType(client, dbName, pattern, toHiveTableType(tableType))
+ shim.getTablesByType(client, dbName, pattern, hiveTableType)
} catch {
case _: UnsupportedOperationException =>
// Fallback to filter logic if getTablesByType not supported.
val tableNames = client.getTablesByPattern(dbName, pattern).asScala
- val tables = getTablesByName(dbName, tableNames).filter(_.tableType == tableType)
- tables.map(_.identifier.table)
+ getRawTablesByName(dbName, tableNames)
+ .filter(_.getTableType == hiveTableType)
+ .map(_.getTableName)
}
}