Posted to issues@spark.apache.org by "Apache Spark (Jira)" <ji...@apache.org> on 2021/04/16 05:57:00 UTC

[jira] [Assigned] (SPARK-28098) Native ORC reader doesn't support subdirectories with Hive tables

     [ https://issues.apache.org/jira/browse/SPARK-28098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-28098:
------------------------------------

    Assignee: Apache Spark

> Native ORC reader doesn't support subdirectories with Hive tables
> -----------------------------------------------------------------
>
>                 Key: SPARK-28098
>                 URL: https://issues.apache.org/jira/browse/SPARK-28098
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Douglas Drinka
>            Assignee: Apache Spark
>            Priority: Major
>
> The Hive ORC reader supports recursive directory reads from S3. Spark's native ORC reader also supports recursive directory reads, but not when it is used to read a Hive table (that is, when spark.sql.hive.convertMetastoreOrc is true).
>  
> {code:java}
> import org.apache.spark.sql.SaveMode
> import spark.implicits._ // provides toDF(); already in scope in spark-shell
>
> // Write five rows as a single ORC file two directory levels below the table root.
> val testData = List(1, 2, 3, 4, 5)
> val dataFrame = testData.toDF()
> dataFrame
>   .coalesce(1)
>   .write
>   .mode(SaveMode.Overwrite)
>   .format("orc")
>   .option("compression", "zlib")
>   .save("s3://ddrinka.sparkbug/dirTest/dir1/dir2/")
>
> // Create an external Hive table located at the parent directory.
> spark.sql("DROP TABLE IF EXISTS ddrinka_sparkbug.dirTest")
> spark.sql("CREATE EXTERNAL TABLE ddrinka_sparkbug.dirTest (val INT) STORED AS ORC LOCATION 's3://ddrinka.sparkbug/dirTest/'")
>
> // Enable recursive directory reads.
> spark.conf.set("hive.mapred.supports.subdirectories", "true")
> spark.conf.set("mapred.input.dir.recursive", "true")
> spark.conf.set("mapreduce.input.fileinputformat.input.dir.recursive", "true")
>
> // Native ORC reader: the rows in the subdirectories are not found.
> spark.conf.set("spark.sql.hive.convertMetastoreOrc", "true")
> println(spark.sql("SELECT * FROM ddrinka_sparkbug.dirTest").count)
> // 0
>
> // Hive ORC reader: all five rows are found.
> spark.conf.set("spark.sql.hive.convertMetastoreOrc", "false")
> println(spark.sql("SELECT * FROM ddrinka_sparkbug.dirTest").count)
> // 5
> {code}
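
The difference between the two counts in the reproduction can be pictured with a small local sketch. It is only an analogy and not Spark's actual file-listing code: it assumes the native reader ends up seeing only files directly under the table location, while the Hive reader descends into subdirectories. The object name ListingSketch, its helper functions, and the file name part-00000.orc are hypothetical and chosen for illustration.

```scala
import java.nio.file.{Files, Path}

// Illustration only: local directories stand in for the S3 layout
// dirTest/dir1/dir2/<data file>. All names here are hypothetical.
object ListingSketch {
  // Count regular files directly under `root` (non-recursive listing).
  def countDirectFiles(root: Path): Long = {
    val stream = Files.list(root)
    try stream.filter(p => Files.isRegularFile(p)).count()
    finally stream.close()
  }

  // Count regular files anywhere below `root` (recursive listing).
  def countAllFiles(root: Path): Long = {
    val stream = Files.walk(root)
    try stream.filter(p => Files.isRegularFile(p)).count()
    finally stream.close()
  }

  // Build a temporary directory tree with one data file two levels down.
  def buildLayout(): Path = {
    val root = Files.createTempDirectory("dirTest")
    val leaf = Files.createDirectories(root.resolve("dir1").resolve("dir2"))
    Files.createFile(leaf.resolve("part-00000.orc"))
    root
  }

  def main(args: Array[String]): Unit = {
    val root = buildLayout()
    println(countDirectFiles(root)) // 0: only the directory dir1 sits at the root
    println(countAllFiles(root))    // 1: the data file under dir1/dir2 is reached
  }
}
```

Under that assumption, a non-recursive listing of the table location finds no data files at all, which would be consistent with the count of 0 reported when spark.sql.hive.convertMetastoreOrc is true.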



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
