Posted to issues@spark.apache.org by "Douglas Drinka (JIRA)" <ji...@apache.org> on 2019/06/18 20:14:00 UTC
[jira] [Created] (SPARK-28098) Native ORC reader doesn't support subdirectories with Hive tables
Douglas Drinka created SPARK-28098:
--------------------------------------
Summary: Native ORC reader doesn't support subdirectories with Hive tables
Key: SPARK-28098
URL: https://issues.apache.org/jira/browse/SPARK-28098
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 2.4.3
Reporter: Douglas Drinka
The Hive ORC reader supports recursive directory reads from S3. Spark's native ORC reader also supports recursive directory reads, but not when it is used to read a Hive table: with {{spark.sql.hive.convertMetastoreOrc=true}}, files in subdirectories of the table's location are silently skipped.
{code:java}
import org.apache.spark.sql.SaveMode
import spark.implicits._

// Write a small ORC file two directories below the table's root location
val testData = List(1, 2, 3, 4, 5)
val dataFrame = testData.toDF()
dataFrame
  .coalesce(1)
  .write
  .mode(SaveMode.Overwrite)
  .format("orc")
  .option("compression", "zlib")
  .save("s3://ddrinka.sparkbug/dirTest/dir1/dir2/")

// Create an external table pointing at the root, above the nested directories
spark.sql("DROP TABLE IF EXISTS ddrinka_sparkbug.dirTest")
spark.sql("CREATE EXTERNAL TABLE ddrinka_sparkbug.dirTest (val INT) STORED AS ORC LOCATION 's3://ddrinka.sparkbug/dirTest/'")

// Enable recursive directory traversal
spark.conf.set("hive.mapred.supports.subdirectories", "true")
spark.conf.set("mapred.input.dir.recursive", "true")
spark.conf.set("mapreduce.input.fileinputformat.input.dir.recursive", "true")

// Native ORC reader: finds no rows
spark.conf.set("spark.sql.hive.convertMetastoreOrc", "true")
println(spark.sql("SELECT * FROM ddrinka_sparkbug.dirTest").count)
//0

// Hive ORC reader: finds all five rows
spark.conf.set("spark.sql.hive.convertMetastoreOrc", "false")
println(spark.sql("SELECT * FROM ddrinka_sparkbug.dirTest").count)
//5{code}
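One possible workaround, besides falling back to the Hive reader as above, is to bypass the Hive table entirely and point the native ORC reader at the files. This is a sketch, not verified against this bucket layout; the glob assumes the nesting depth shown in the repro, and {{recursiveFileLookup}} only exists in Spark 3.0 and later:

{code:java}
// Workaround sketch: read the nested ORC files directly with the native reader.

// Option 1: a path glob matching the known nesting depth (dir1/dir2)
val viaGlob = spark.read.orc("s3://ddrinka.sparkbug/dirTest/*/*/")
println(viaGlob.count) // should count the 5 rows written above

// Option 2 (Spark 3.0+): recurse to any depth without knowing the layout
val viaRecursive = spark.read
  .option("recursiveFileLookup", "true")
  .orc("s3://ddrinka.sparkbug/dirTest/")
println(viaRecursive.count)
{code}

Neither option picks up partition columns from the metastore, so this only helps for unpartitioned external tables like the one in the repro.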
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)