Posted to issues@spark.apache.org by "Dongjoon Hyun (Jira)" <ji...@apache.org> on 2020/03/16 22:51:08 UTC
[jira] [Updated] (SPARK-28098) Native ORC reader doesn't support subdirectories with Hive tables
[ https://issues.apache.org/jira/browse/SPARK-28098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dongjoon Hyun updated SPARK-28098:
----------------------------------
Affects Version/s: 3.1.0 (was: 3.0.0)
> Native ORC reader doesn't support subdirectories with Hive tables
> -----------------------------------------------------------------
>
> Key: SPARK-28098
> URL: https://issues.apache.org/jira/browse/SPARK-28098
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.1.0
> Reporter: Douglas Drinka
> Priority: Major
>
> The Hive ORC reader supports recursive directory reads from S3. Spark's native ORC reader also supports recursive directory reads, but not when reading a Hive table: with spark.sql.hive.convertMetastoreOrc enabled, files stored in subdirectories of the table location are not picked up, so the repro below counts 0 rows instead of 5.
>
> {code:java}
> import org.apache.spark.sql.SaveMode
> import spark.implicits._ // required for toDF()
>
> // Write a small ORC file two directory levels below the table location.
> val testData = List(1, 2, 3, 4, 5)
> val dataFrame = testData.toDF()
> dataFrame
>   .coalesce(1)
>   .write
>   .mode(SaveMode.Overwrite)
>   .format("orc")
>   .option("compression", "zlib")
>   .save("s3://ddrinka.sparkbug/dirTest/dir1/dir2/")
>
> // Point an external Hive table at the parent directory.
> spark.sql("DROP TABLE IF EXISTS ddrinka_sparkbug.dirTest")
> spark.sql("CREATE EXTERNAL TABLE ddrinka_sparkbug.dirTest (val INT) STORED AS ORC LOCATION 's3://ddrinka.sparkbug/dirTest/'")
>
> // Ask for recursive directory listing via the Hive/Hadoop settings.
> spark.conf.set("hive.mapred.supports.subdirectories", "true")
> spark.conf.set("mapred.input.dir.recursive", "true")
> spark.conf.set("mapreduce.input.fileinputformat.input.dir.recursive", "true")
>
> // Native ORC reader: the files under dir1/dir2/ are not found.
> spark.conf.set("spark.sql.hive.convertMetastoreOrc", "true")
> println(spark.sql("SELECT * FROM ddrinka_sparkbug.dirTest").count)
> // 0
>
> // Hive ORC reader: the same query finds all five rows.
> spark.conf.set("spark.sql.hive.convertMetastoreOrc", "false")
> println(spark.sql("SELECT * FROM ddrinka_sparkbug.dirTest").count)
> // 5
> {code}
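>
> A possible workaround, as a minimal sketch assuming Spark 3.0 or later: skip the metastore entirely and point the native reader at the table location with the recursiveFileLookup read option, which makes the file listing descend into dir1/dir2/. The path below is the same one used in the repro above.
>
> {code:java}
> // Workaround sketch (assumes Spark 3.0+): read the ORC files directly,
> // letting the native reader descend into subdirectories recursively.
> val direct = spark.read
>   .option("recursiveFileLookup", "true")
>   .orc("s3://ddrinka.sparkbug/dirTest/")
> println(direct.count)
> // 5
> {code}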
--
This message was sent by Atlassian Jira
(v8.3.4#803005)