Posted to user@hive.apache.org by Mich Talebzadeh <mi...@gmail.com> on 2019/07/19 15:11:48 UTC

Error: java.io.IOException: java.lang.RuntimeException: ORC split generation failed with exception: java.lang.NoSuchMethodError

Just upgraded Hive from 3.0 to 3.1.1.

Connected to: Apache Hive (version 3.1.1)
Driver: Hive JDBC (version 3.1.1)

Created an ORC table through Spark as below:

sql("use accounts")
//
// Drop and create table ll_18740868
//
sql("DROP TABLE IF EXISTS accounts.ll_18740868")
var sqltext = ""
sqltext = """
CREATE TABLE accounts.ll_18740868 (
TransactionDate            DATE
,TransactionType           String
,SortCode                  String
,AccountNumber             String
,TransactionDescription    String
,DebitAmount               Double
,CreditAmount              Double
,Balance                   Double
)
COMMENT 'from csv file from excel sheet'
STORED AS ORC
TBLPROPERTIES ( "orc.compress"="ZLIB" )
"""
sql(sqltext)

The table is created OK and populated from CSV files on HDFS.
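
For completeness, DataFrame "a" used below is just the CSV data read in from HDFS, roughly along these lines (the path, read options and any explicit schema handling here are placeholders, not the exact ones used):

val a = spark.read.
          option("header", "true").
          option("inferSchema", "true").
          csv("hdfs://rhes75:9000/data/ll_18740868/*.csv")  // placeholder path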

Data is inserted through a Spark temp table registered on DataFrame "a", as
below:

a.toDF.registerTempTable("tmp")

INSERT INTO TABLE accounts.ll_18740868
SELECT
……...
FROM tmp
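
For completeness, a minimal sketch of how that insert is run end to end from the Spark shell (createOrReplaceTempView being the non-deprecated equivalent of registerTempTable; the column list is assumed to simply mirror the table definition above):

a.toDF.createOrReplaceTempView("tmp")  // same effect as registerTempTable("tmp")

sql("""
INSERT INTO TABLE accounts.ll_18740868
SELECT TransactionDate, TransactionType, SortCode, AccountNumber,
       TransactionDescription, DebitAmount, CreditAmount, Balance
FROM tmp
""")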

So the data is there, as I can select rows from the ORC table:

// example

scala> sql("Select TransactionDate, DebitAmount, CreditAmount, Balance from
ll_18740868 limit 3 ").collect.foreach(println)
[2011-12-30,50.0,null,304.89]
[2011-12-30,19.01,null,354.89]
[2011-12-29,80.1,null,373.9]

However, this select does not work from beeline:

0: jdbc:hive2://rhes75:10099/default> Beeline version 3.1.1 by Apache Hive
0: jdbc:hive2://rhes75:10099/default> use accounts;
No rows affected (0.011 seconds)
0: jdbc:hive2://rhes75:10099/default> Select TransactionDate, DebitAmount, CreditAmount, Balance from ll_18740868 limit 3;
Error: java.io.IOException: java.lang.RuntimeException: ORC split
generation failed with exception: java.lang.NoSuchMethodError:
org.apache.hadoop.fs.FileStatus.compareTo(Lorg/apache/hadoop/fs/FileStatus;)I
(state=,code=0)

I thought this problem would have gone away in this release?

So it works through Spark, because it uses the Spark Tungsten optimiser, but
not through Hive!

explain Select TransactionDate, DebitAmount, CreditAmount, Balance from ll_18740868 limit 3;
+----------------------------------------------------+
|                      Explain                       |
+----------------------------------------------------+
| STAGE DEPENDENCIES:                                |
|   Stage-0 is a root stage                          |
|                                                    |
| STAGE PLANS:                                       |
|   Stage: Stage-0                                   |
|     Fetch Operator                                 |
|       limit: 3                                     |
|       Processor Tree:                              |
|         TableScan                                  |
|           alias: ll_18740868                       |
|           Statistics: Num rows: 80 Data size: 53535 Basic stats: COMPLETE Column stats: NONE |
|           Select Operator                          |
|             expressions: transactiondate (type: date), debitamount (type: double), creditamount (type: double), balance (type: double) |
|             outputColumnNames: _col0, _col1, _col2, _col3 |
|             Statistics: Num rows: 80 Data size: 53535 Basic stats: COMPLETE Column stats: NONE |
|             Limit                                  |
|               Number of rows: 3                    |
|               Statistics: Num rows: 3 Data size: 2007 Basic stats: COMPLETE Column stats: NONE |
|               ListSink                             |
|                                                    |
+----------------------------------------------------+
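
My guess is that the NoSuchMethodError means the hadoop-common jar that HiveServer2 picks up at runtime does not match the Hadoop version Hive 3.1.1 was built against (that compareTo(FileStatus) signature only exists in some hadoop-common releases). As a rough cross-check, the Hadoop version the Spark shell sees can be printed with the standard VersionInfo class and compared against what "hadoop version" reports on the HiveServer2 host:

scala> // version of hadoop-common on the Spark classpath
scala> org.apache.hadoop.util.VersionInfo.getVersion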

BTW, I tried different settings for

set hive.exec.orc.split.strategy

None of them worked.
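
The settings tried were along these lines in beeline before re-running the query (HYBRID is the default strategy, BI and ETL being the alternatives):

set hive.exec.orc.split.strategy=HYBRID;
set hive.exec.orc.split.strategy=BI;
set hive.exec.orc.split.strategy=ETL;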

Thanks