Posted to issues@spark.apache.org by "Douglas Drinka (JIRA)" <ji...@apache.org> on 2019/06/18 20:14:00 UTC

[jira] [Created] (SPARK-28098) Native ORC reader doesn't support subdirectories with Hive tables

Douglas Drinka created SPARK-28098:
--------------------------------------

             Summary: Native ORC reader doesn't support subdirectories with Hive tables
                 Key: SPARK-28098
                 URL: https://issues.apache.org/jira/browse/SPARK-28098
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.4.3
            Reporter: Douglas Drinka


The Hive ORC reader supports recursive directory reads from S3. Spark's native ORC reader also supports recursive directory reads when given a path directly, but not when reading a Hive table whose data files live in subdirectories under the table location.

 
{code:scala}
import org.apache.spark.sql.SaveMode
import spark.implicits._

// Write a single ORC file into a nested subdirectory under the table location
val testData = List(1, 2, 3, 4, 5)
val dataFrame = testData.toDF()
dataFrame
  .coalesce(1)
  .write
  .mode(SaveMode.Overwrite)
  .format("orc")
  .option("compression", "zlib")
  .save("s3://ddrinka.sparkbug/dirTest/dir1/dir2/")

// Point an external Hive table at the parent directory
spark.sql("DROP TABLE IF EXISTS ddrinka_sparkbug.dirTest")
spark.sql("CREATE EXTERNAL TABLE ddrinka_sparkbug.dirTest (val INT) STORED AS ORC LOCATION 's3://ddrinka.sparkbug/dirTest/'")

// Request recursive directory reads
spark.conf.set("hive.mapred.supports.subdirectories", "true")
spark.conf.set("mapred.input.dir.recursive", "true")
spark.conf.set("mapreduce.input.fileinputformat.input.dir.recursive", "true")

// Native ORC reader: files in the subdirectory are not found
spark.conf.set("spark.sql.hive.convertMetastoreOrc", "true")
println(spark.sql("SELECT * FROM ddrinka_sparkbug.dirTest").count)
//0

// Hive ORC reader: files in the subdirectory are read correctly
spark.conf.set("spark.sql.hive.convertMetastoreOrc", "false")
println(spark.sql("SELECT * FROM ddrinka_sparkbug.dirTest").count)
//5
{code}
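A possible workaround, sketched below and not part of the original report: bypass the Hive table and point Spark's native ORC reader at the files directly, since the native reader does recurse when given a path. The paths reuse those from the reproduction above; the `recursiveFileLookup` option is only available in Spark 3.0+, not in the affected 2.4.3.

{code:scala}
// Workaround sketch (assumes a running SparkSession and the data written above).

// Option 1: point the native reader at the leaf directory containing the files
val direct = spark.read.orc("s3://ddrinka.sparkbug/dirTest/dir1/dir2/")
println(direct.count)

// Option 2 (Spark 3.0+ only): recurse from the table root
val recursive = spark.read
  .option("recursiveFileLookup", "true")
  .orc("s3://ddrinka.sparkbug/dirTest/")
println(recursive.count)
{code}

Neither option goes through the metastore, so table-level metadata (partitions, SerDe properties) is not applied; both simply read the ORC files found under the given path.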



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
