Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/06/20 04:19:00 UTC
[jira] [Commented] (SPARK-28098) Native ORC reader doesn't support
subdirectories with Hive tables
[ https://issues.apache.org/jira/browse/SPARK-28098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16868250#comment-16868250 ]
Hyukjin Kwon commented on SPARK-28098:
--------------------------------------
[~ddrinka], would you mind checking here the same things requested in this SPARK-28099 comment? https://issues.apache.org/jira/browse/SPARK-28099?focusedCommentId=16868249&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16868249
> Native ORC reader doesn't support subdirectories with Hive tables
> -----------------------------------------------------------------
>
> Key: SPARK-28098
> URL: https://issues.apache.org/jira/browse/SPARK-28098
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.4.3
> Reporter: Douglas Drinka
> Priority: Major
>
> The Hive ORC reader supports recursive directory reads from S3. Spark's native ORC reader also supports recursive directory reads, but not when it is used to read a Hive table (spark.sql.hive.convertMetastoreOrc=true).
>
> {code:java}
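> import org.apache.spark.sql.SaveMode
> import spark.implicits._ // needed for toDF(); assumes a SparkSession named spark, as in spark-shell
>
> // Write five rows as a single ORC file two levels below the table location.
> val testData = List(1,2,3,4,5)
> val dataFrame = testData.toDF()
> dataFrame
>   .coalesce(1)
>   .write
>   .mode(SaveMode.Overwrite)
>   .format("orc")
>   .option("compression", "zlib")
>   .save("s3://ddrinka.sparkbug/dirTest/dir1/dir2/")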
> spark.sql("DROP TABLE IF EXISTS ddrinka_sparkbug.dirTest")
> spark.sql("CREATE EXTERNAL TABLE ddrinka_sparkbug.dirTest (val INT) STORED AS ORC LOCATION 's3://ddrinka.sparkbug/dirTest/'")
> spark.conf.set("hive.mapred.supports.subdirectories","true")
> spark.conf.set("mapred.input.dir.recursive","true")
> spark.conf.set("mapreduce.input.fileinputformat.input.dir.recursive","true")
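> // Native ORC reader (convertMetastoreOrc=true): the nested file is not found.
> spark.conf.set("spark.sql.hive.convertMetastoreOrc", "true")
> println(spark.sql("SELECT * FROM ddrinka_sparkbug.dirTest").count)
> // prints 0
> // Hive ORC reader (convertMetastoreOrc=false): all five rows are read.
> spark.conf.set("spark.sql.hive.convertMetastoreOrc", "false")
> println(spark.sql("SELECT * FROM ddrinka_sparkbug.dirTest").count)
> // prints 5
> {code}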
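The two counts in the report are what one would expect if one read path lists only the immediate children of the table location while the other descends into subdirectories. The sketch below illustrates that difference on a local temporary directory, without Spark. It is an analogy only and makes no claim about how Spark's native ORC reader actually lists files; the object name ListingSketch, its methods, and the file name part-00000.orc are made up for illustration.

```scala
import java.nio.file.{Files, Path}
import scala.collection.JavaConverters._

// Hypothetical helper, not part of Spark: contrasts flat and recursive listing.
object ListingSketch {
  // Regular files directly under `root`; subdirectories are not entered.
  def listFlat(root: Path): List[Path] = {
    val stream = Files.list(root)
    try stream.iterator().asScala.filter(p => Files.isRegularFile(p)).toList
    finally stream.close()
  }

  // Regular files under `root` at any depth.
  def listRecursive(root: Path): List[Path] = {
    val stream = Files.walk(root)
    try stream.iterator().asScala.filter(p => Files.isRegularFile(p)).toList
    finally stream.close()
  }

  // Builds a layout shaped like dirTest/dir1/dir2/<file> from the report
  // and returns (files found by flat listing, files found by recursive listing).
  def demo(): (Int, Int) = {
    val root = Files.createTempDirectory("dirTest")
    val nested = Files.createDirectories(root.resolve("dir1").resolve("dir2"))
    Files.createFile(nested.resolve("part-00000.orc")) // made-up file name
    (listFlat(root).size, listRecursive(root).size)
  }
}
```

With the only data file two levels down, the flat listing finds nothing under the table location and the recursive listing finds the file, which matches the shape of the 0 versus 5 row counts above.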