Posted to issues@spark.apache.org by "Nithin (Jira)" <ji...@apache.org> on 2020/05/18 16:05:00 UTC

[jira] [Created] (SPARK-31751) spark serde property path overwrites table property location

Nithin created SPARK-31751:
------------------------------

             Summary: spark serde property path overwrites table property location
                 Key: SPARK-31751
                 URL: https://issues.apache.org/jira/browse/SPARK-31751
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.3.1
            Reporter: Nithin


This issue has caused us many data errors.

1) Using Spark (with Hive context enabled)

df = spark.createDataFrame([{"a": "x", "b": "y", "c": "3"}])
df.write.format("orc").option("compression", "ZLIB").mode("overwrite").saveAsTable("test_spark")

 

2) From Hive

alter table test_spark rename to test_spark2

 

3) From the spark-sql command line (note: not pyspark or spark-shell)

select * from test_spark2

 

This gives the output:

NULL NULL NULL
Time taken: 0.334 seconds, Fetched 1 row(s)

 

The query returns NULL because the pyspark write API adds a serde property called path to the Hive metastore. When Hive renames the table, it does not understand this serde property and hence leaves it unchanged. When spark-sql then tries to read the table, it honors the serde property first and tries to read from the now non-existent HDFS location. An error would have been acceptable, but silently returning NULL causes applications to fail badly. Spark claims to support Hive tables, so it should respect the Hive metastore location property rather than the Spark serde property when reading a table. This cannot be classified as expected behaviour.
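
To spot affected tables, one option is to compare the table location with the path serde property as Hive reports them. The sketch below is only an illustration: it assumes the tab-separated text layout that the Hive CLI typically prints for DESCRIBE FORMATTED (a "Location:" row and a "path" row under "Storage Desc Params:"), which may differ between Hive versions. The function name find_path_mismatch is hypothetical and not part of Spark or Hive.

```python
def find_path_mismatch(describe_text):
    """Return (location, serde_path) when the two differ, otherwise None.

    describe_text is the raw text of Hive's DESCRIBE FORMATTED <table>.
    Assumption: fields are tab-separated, the table location is on a
    "Location:" row, and the serde property appears as a "path" row
    inside the "Storage Desc Params:" section.
    """
    location = None
    serde_path = None
    in_storage_params = False

    for line in describe_text.splitlines():
        # Split on tabs, trim the padding, and drop empty fields.
        fields = [f.strip() for f in line.split("\t")]
        fields = [f for f in fields if f]
        if not fields:
            continue

        first = fields[0]
        if first == "Storage Desc Params:":
            in_storage_params = True
        elif first.startswith("#") or first.endswith(":"):
            # Any other header row ends the storage-params section.
            in_storage_params = False
            if first == "Location:" and len(fields) > 1:
                location = fields[1]
        elif in_storage_params and first == "path" and len(fields) > 1:
            serde_path = fields[1]

    if location and serde_path:
        # Plain string comparison, ignoring a trailing slash only.
        if location.rstrip("/") != serde_path.rstrip("/"):
            return location, serde_path
    return None
```

The comparison is deliberately a plain string check, so two URIs that name the same directory in different forms (for example with and without a port) would be reported as a mismatch and need a manual look.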



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
