Posted to user@spark.apache.org by Ricardo Martinelli de Oliveira <rm...@redhat.com> on 2020/09/29 13:38:06 UTC

Should SHOW TABLES statement return a hive-compatible output?

Hello,

I came across an issue[1] in PyHive which involves the SHOW TABLES output
from Thrift Server.

When you run a SHOW TABLES statement in beeline against the Spark Thrift
Server, it returns a table with three columns: (i) the schema/database name,
(ii) the table name, and (iii) a temporary-table flag.

This differs from Hive, whose SHOW TABLES returns a single column containing
only the table names.
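
To illustrate (the table name below is made up, and the exact column names
vary across Spark versions), beeline against the Spark Thrift Server shows
something like:

  +-----------+------------+--------------+
  | database  | tableName  | isTemporary  |
  +-----------+------------+--------------+
  | default   | my_table   | false        |
  +-----------+------------+--------------+

while the same statement against HiveServer2 shows a single column:

  +-----------+
  | tab_name  |
  +-----------+
  | my_table  |
  +-----------+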

From the Spark docs[2]: "The Thrift JDBC/ODBC server implemented here
corresponds to the HiveServer2 in built-in Hive." Given that, this particular
statement has a compatibility issue: its output breaks libraries like PyHive
that expect Hive's format.
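
As a rough sketch of where that breaks (host, port and results below are just
placeholders for the example): a client that assumes Hive's single-column
output typically takes the first field of each row, which is no longer a
table name when talking to Spark:

  from pyhive import hive

  # Connect to the Spark Thrift Server as if it were HiveServer2
  # (host/port are placeholders for this example).
  conn = hive.connect(host="localhost", port=10000)
  cur = conn.cursor()
  cur.execute("SHOW TABLES")

  # Against Hive, each row looks like ('my_table',), so row[0] is a table name.
  # Against Spark Thrift Server, each row looks like ('default', 'my_table', False),
  # so row[0] is actually the database name.
  tables = [row[0] for row in cur.fetchall()]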

Now my questions:

1) Is it expected for Thrift Server to be 100% Hive compatible?
2) If the answer to the previous question is yes, is this a bug in Spark?
3) What problems could it bring to Spark if we made SHOW TABLES return the
same single-column output as Hive, and had Thrift Server resolve a SHOW
TABLES EXTENDED statement to return what Spark SQL returns today? (A
client-side workaround sketch follows below.)
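
In the meantime, a possible client-side workaround (just a sketch, reusing
the cursor from the example above; the Spark column name may be "tableName"
or differ by version) is to locate the table-name column through the cursor
metadata instead of assuming it is the first column:

  cur.execute("SHOW TABLES")
  # DB-API cursor.description exposes the result column names, e.g.
  # ['database', 'tableName', 'isTemporary'] on Spark vs. ['tab_name'] on Hive.
  cols = [d[0] for d in cur.description]
  name_idx = cols.index("tableName") if "tableName" in cols else 0
  tables = [row[name_idx] for row in cur.fetchall()]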


[1] https://github.com/dropbox/PyHive/issues/146
[2] https://spark.apache.org/docs/latest/sql-distributed-sql-engine.html

-- 

Ricardo Martinelli De Oliveira

Data Engineer, AI CoE

Red Hat Brazil <https://www.redhat.com/>