Posted to user@spark.apache.org by dominic kim <yo...@linecorp.com> on 2020/03/16 09:44:58 UTC
pyspark(sparksql-v 2.4) cannot read hive table which is created
I set the related Spark config values below, but they do not work (they did work in Spark 2.1.1):
spark.hive.mapred.supports.subdirectories=true
spark.hive.supports.subdirectories=true
spark.mapred.input.dir.recursive=true
And when I query, I also set the related Hive configs, but they do not work either:
mapred.input.dir.recursive=true
hive.mapred.supports.subdirectories=true
I already know that loading a path like '/user/test/warehouse/somedb.db/dt=20200312/*/' as a DataFrame in PySpark works. But for complex business logic I need to use spark.sql().
Please advise.
Thanks!
* Code
from pyspark import SparkConf, SparkContext
from pyspark.sql import HiveContext, SparkSession

if __name__ == "__main__":
    spark = SparkSession \
        .builder \
        .appName("Sub-Directory Test") \
        .enableHiveSupport() \
        .getOrCreate()
    spark.sql("select * from somedb.table where dt = '20200301' limit 10").show()
* Hive table directory path
/user/test/warehouse/somedb.db/dt=20200312/1/000000_0
/user/test/warehouse/somedb.db/dt=20200312/1/000000_1
.
.
/user/test/warehouse/somedb.db/dt=20200312/2/000000_0
/user/test/warehouse/somedb.db/dt=20200312/3/000000_0
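The layout above puts the data files one directory below the partition path, which is why a non-recursive reader sees the partition as empty. A small pure-Python sketch of the difference (no Spark required; the local temp directory only mimics the HDFS layout shown above):

```python
import glob
import os
import tempfile

# Recreate the Hive partition layout from the listing above:
# <warehouse>/somedb.db/dt=20200312/<subdir>/000000_N
root = tempfile.mkdtemp()
for subdir, files in {"1": ["000000_0", "000000_1"],
                      "2": ["000000_0"],
                      "3": ["000000_0"]}.items():
    d = os.path.join(root, "somedb.db", "dt=20200312", subdir)
    os.makedirs(d)
    for name in files:
        open(os.path.join(d, name), "w").close()

# Listing the partition path directly finds only subdirectories,
# no data files -- this is what a non-recursive reader sees:
direct = glob.glob(os.path.join(root, "somedb.db", "dt=20200312", "*"))
files_direct = [p for p in direct if os.path.isfile(p)]

# The one-level wildcard from the question ('dt=20200312/*/') reaches them:
nested = glob.glob(os.path.join(root, "somedb.db", "dt=20200312", "*", "*"))
files_nested = [p for p in nested if os.path.isfile(p)]

print(len(files_direct))  # 0 -- only subdirectories at the partition level
print(len(files_nested))  # 4 -- all data files one level down
```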
--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org
Re: pyspark(sparksql-v 2.4) cannot read hive table which is created
Posted by dominic kim <yo...@linecorp.com>.
I solved the problem with the options below:
spark.sql("SET spark.hadoop.metastore.catalog.default=hive")
spark.sql("SET spark.sql.hive.convertMetastoreOrc=false")
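For reference, the same settings can also be supplied at submit time instead of per-session (a sketch; `job.py` is a placeholder for your application script):

```shell
spark-submit \
  --conf spark.hadoop.metastore.catalog.default=hive \
  --conf spark.sql.hive.convertMetastoreOrc=false \
  job.py
```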