
Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/06/20 04:19:00 UTC

[jira] [Commented] (SPARK-28098) Native ORC reader doesn't support subdirectories with Hive tables

    [ https://issues.apache.org/jira/browse/SPARK-28098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16868250#comment-16868250 ] 

Hyukjin Kwon commented on SPARK-28098:
--------------------------------------

[~ddrinka], would you mind checking the same points I raised in https://issues.apache.org/jira/browse/SPARK-28099?focusedCommentId=16868249&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16868249?

> Native ORC reader doesn't support subdirectories with Hive tables
> -----------------------------------------------------------------
>
>                 Key: SPARK-28098
>                 URL: https://issues.apache.org/jira/browse/SPARK-28098
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.4.3
>            Reporter: Douglas Drinka
>            Priority: Major
>
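> The Hive ORC reader supports recursive directory reads from S3.  Spark's native ORC reader also supports recursive directory reads, but not when it is used to read a Hive table: with spark.sql.hive.convertMetastoreOrc set to true, the query below returns 0 rows instead of 5.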
>  
> {code:java}
> import org.apache.spark.sql.SaveMode
> import spark.implicits._ // provides toDF(); already in scope in spark-shell
>
> // Write five rows as a single ORC file two directory levels below the table location.
> val testData = List(1, 2, 3, 4, 5)
> val dataFrame = testData.toDF()
> dataFrame
>   .coalesce(1)
>   .write
>   .mode(SaveMode.Overwrite)
>   .format("orc")
>   .option("compression", "zlib")
>   .save("s3://ddrinka.sparkbug/dirTest/dir1/dir2/")
>
> // Point an external Hive table at the top-level directory.
> spark.sql("DROP TABLE IF EXISTS ddrinka_sparkbug.dirTest")
> spark.sql("CREATE EXTERNAL TABLE ddrinka_sparkbug.dirTest (val INT) STORED AS ORC LOCATION 's3://ddrinka.sparkbug/dirTest/'")
>
> // Enable recursive subdirectory reads.
> spark.conf.set("hive.mapred.supports.subdirectories", "true")
> spark.conf.set("mapred.input.dir.recursive", "true")
> spark.conf.set("mapreduce.input.fileinputformat.input.dir.recursive", "true")
>
> // Native ORC reader: the file in the nested directory is not found.
> spark.conf.set("spark.sql.hive.convertMetastoreOrc", "true")
> println(spark.sql("SELECT * FROM ddrinka_sparkbug.dirTest").count)
> // 0
>
> // Hive ORC reader: all five rows are read.
> spark.conf.set("spark.sql.hive.convertMetastoreOrc", "false")
> println(spark.sql("SELECT * FROM ddrinka_sparkbug.dirTest").count)
> // 5
> {code}


