Posted to issues@spark.apache.org by "Bo Hai (JIRA)" <ji...@apache.org> on 2019/02/24 09:23:00 UTC

[jira] [Comment Edited] (SPARK-26932) Orc compatibility between hive and spark

    [ https://issues.apache.org/jira/browse/SPARK-26932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16776181#comment-16776181 ] 

Bo Hai edited comment on SPARK-26932 at 2/24/19 9:22 AM:
---------------------------------------------------------

To reproduce this issue, create an ORC table with Spark 2.3.2/2.4 and read it with Hive 2.1.1, for example:

spark-sql -e 'CREATE TABLE tmp.orcTable2 USING orc AS SELECT * FROM tmp.orcTable1 limit 10;'

hive -e 'select * from tmp.orcTable2;'

Hive will throw the exception shown below:

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 6
at org.apache.orc.OrcFile$WriterVersion.from(OrcFile.java:145)
at org.apache.orc.impl.OrcTail.getWriterVersion(OrcTail.java:74)
at org.apache.orc.impl.ReaderImpl.<init>(ReaderImpl.java:385)
at org.apache.orc.OrcFile.createReader(OrcFile.java:222)
at org.apache.orc.tools.FileDump.getReader(FileDump.java:255)
at org.apache.orc.tools.FileDump.printMetaDataImpl(FileDump.java:328)
at org.apache.orc.tools.FileDump.printMetaData(FileDump.java:307)
at org.apache.orc.tools.FileDump.main(FileDump.java:154)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
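
The immediate cause: apache/orc writers (ORC 1.4+, used by Spark 2.3/2.4's native ORC support) appear to record writer version ORC_135 (id 6) in the file footer, while the ORC code bundled with Hive 2.1.1 predates that version, so the OrcFile$WriterVersion.from lookup runs past the end of its table and fails with the ArrayIndexOutOfBoundsException: 6 above. As a hedged way to confirm which writer version a Spark-written file carries (the jar version and file path below are placeholders, not from the original report), a newer orc-tools uber jar can dump the footer:

java -jar orc-tools-1.5.5-uber.jar meta /tmp/part-00000.orc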



> Orc compatibility between hive and spark
> ----------------------------------------
>
>                 Key: SPARK-26932
>                 URL: https://issues.apache.org/jira/browse/SPARK-26932
>             Project: Spark
>          Issue Type: Documentation
>          Components: Documentation
>    Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.4.0
>            Reporter: Bo Hai
>            Priority: Minor
>
> As of Spark 2.3 and Hive 2.3, both support using apache/orc as the ORC reader and writer. Older versions of Hive implement their own ORC reader, which is not forward-compatible.
> So Hive 2.2 and older cannot read ORC tables created by Spark 2.3 and newer, which use apache/orc instead of the Hive ORC implementation.
> I think we should add this information to the Spark 2.4 ORC documentation: https://spark.apache.org/docs/2.4.0/sql-data-sources-orc.html
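
For reference, the Spark 2.4 page cited above also documents spark.sql.orc.impl: setting it to "hive" makes Spark write ORC files with the older Hive 1.2.1-based implementation, which Hive 2.2 and older can read. A minimal workaround sketch, reusing the tables from the reproduction above (tmp.orcTable3 is a placeholder name, not from the original report):

spark-sql --conf spark.sql.orc.impl=hive -e 'CREATE TABLE tmp.orcTable3 USING orc AS SELECT * FROM tmp.orcTable1 LIMIT 10;'

hive -e 'select * from tmp.orcTable3;'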


