Posted to user@spark.apache.org by Ricardo Martinelli de Oliveira <rm...@redhat.com> on 2020/09/29 13:38:06 UTC
Should SHOW TABLES statement return a hive-compatible output?
Hello,
I came across an issue[1] in PyHive involving the SHOW TABLES output
from the Thrift Server.
When you run a SHOW TABLES statement in beeline against Spark's Thrift
Server, it returns a table with three columns: (i) the schema/database
name, (ii) the table name, and (iii) a temporary-table flag.
This output differs from Hive's, which returns a single column
containing the table names.
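To make the difference concrete, here is a small Python sketch of the two
result shapes a DB-API client would fetch (the literal table and database
names are illustrative, not captured from a real server):

```python
# Hive's HiveServer2: one column per row, the table name.
hive_rows = [("orders",), ("customers",)]

# Spark's Thrift Server: three columns: database, tableName, isTemporary.
spark_rows = [("default", "orders", False), ("default", "customers", False)]

# A client written against Hive takes column 0 as the table name...
hive_names = [row[0] for row in hive_rows]    # ['orders', 'customers']

# ...but against Spark the same code yields the database name instead.
spark_col0 = [row[0] for row in spark_rows]   # ['default', 'default']
```

This is exactly how a library like PyHive, which indexes column 0, ends up
reporting wrong table names against Spark.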
According to the Spark docs[2]: "The Thrift JDBC/ODBC server implemented
here corresponds to the HiveServer2 in built-in Hive." Given that claim,
this particular statement has a compatibility issue, because it breaks
libraries like PyHive.
Now my questions:
1) Is the Thrift Server expected to be 100% Hive compatible?
2) If the answer to the previous question is yes, is this a bug in Spark?
3) What problems could it bring to Spark if we made SHOW TABLES return
exactly what Hive returns, and had the Thrift Server resolve a SHOW
TABLES EXTENDED statement to return what Spark SQL currently returns?
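Until this is settled, affected clients can normalize on the number of
columns. A minimal sketch (assuming the two column layouts described
above; `table_names` is a hypothetical helper, not part of PyHive):

```python
def table_names(rows):
    """Return plain table names from SHOW TABLES rows of either server.

    Assumes Hive returns 1-column rows (tableName,) and Spark's Thrift
    Server returns 3-column rows (database, tableName, isTemporary).
    """
    return [row[0] if len(row) == 1 else row[1] for row in rows]
```

For example, `table_names([("default", "orders", False)])` and
`table_names([("orders",)])` both yield `["orders"]`.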
[1] https://github.com/dropbox/PyHive/issues/146
[2] https://spark.apache.org/docs/latest/sql-distributed-sql-engine.html
--
Ricardo Martinelli De Oliveira
Data Engineer, AI CoE
Red Hat Brazil <https://www.redhat.com/>