Posted to user@spark.apache.org by dominic kim <yo...@linecorp.com> on 2020/03/16 09:44:58 UTC
pyspark(sparksql-v 2.4) cannot read hive table which is created
I set the related Spark config values below, but they do not work (they did work in Spark 2.1.1):
spark.hive.mapred.supports.subdirectories=true
spark.hive.supports.subdirectories=true
spark.mapred.input.dir.recursive=true
And when I query, I also set the related Hive configs, but they do not work either:
mapred.input.dir.recursive=true
hive.mapred.supports.subdirectories=true
I already know that loading a path like '/user/test/warehouse/somedb.db/dt=20200312/*/' as a DataFrame in PySpark works. But for complex business logic I need to use spark.sql().
Please advise.
Thanks!
* Code
from pyspark import SparkConf, SparkContext
from pyspark.sql import HiveContext, SparkSession

if __name__ == "__main__":
    spark = SparkSession \
        .builder \
        .appName("Sub-Directory Test") \
        .enableHiveSupport() \
        .getOrCreate()
    spark.sql("select * from somedb.table where dt = '20200301' limit 10").show()
* Hive table directory path
/user/test/warehouse/somedb.db/dt=20200312/1/000000_0
/user/test/warehouse/somedb.db/dt=20200312/1/000000_1
.
.
/user/test/warehouse/somedb.db/dt=20200312/2/000000_0
/user/test/warehouse/somedb.db/dt=20200312/3/000000_0
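The layout above puts the data files one directory below the partition path, which is why a non-recursive reader sees the partition as empty. A small pure-Python sketch of the difference (no Spark required; the local temp directory only mimics the HDFS layout shown above):

```python
import glob
import os
import tempfile

# Recreate the Hive partition layout from the listing above:
# <warehouse>/somedb.db/dt=20200312/<subdir>/000000_N
root = tempfile.mkdtemp()
for subdir, files in {"1": ["000000_0", "000000_1"],
                      "2": ["000000_0"],
                      "3": ["000000_0"]}.items():
    d = os.path.join(root, "somedb.db", "dt=20200312", subdir)
    os.makedirs(d)
    for name in files:
        open(os.path.join(d, name), "w").close()

# Listing the partition path directly finds only subdirectories,
# no data files -- this is what a non-recursive reader sees:
direct = glob.glob(os.path.join(root, "somedb.db", "dt=20200312", "*"))
files_direct = [p for p in direct if os.path.isfile(p)]

# The one-level wildcard from the question ('dt=20200312/*/') reaches them:
nested = glob.glob(os.path.join(root, "somedb.db", "dt=20200312", "*", "*"))
files_nested = [p for p in nested if os.path.isfile(p)]

print(len(files_direct))  # 0 -- only subdirectories at the partition level
print(len(files_nested))  # 4 -- all data files one level down
```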
--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org
Re: pyspark(sparksql-v 2.4) cannot read hive table which is created
Posted by dominic kim <yo...@linecorp.com>.
I solved the problem with the options below:
spark.sql("SET spark.hadoop.metastore.catalog.default=hive")
spark.sql("SET spark.sql.hive.convertMetastoreOrc=false")
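For reference, the same settings can also be supplied at submit time instead of per-session (a sketch; `job.py` is a placeholder for your application script):

```shell
spark-submit \
  --conf spark.hadoop.metastore.catalog.default=hive \
  --conf spark.sql.hive.convertMetastoreOrc=false \
  job.py
```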