You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by "Philippe Verhaeghe (JIRA)" <ji...@apache.org> on 2015/04/13 16:38:12 UTC

[jira] [Created] (HIVE-10316) same query works with TEXTFILE and fails with ORC

Philippe Verhaeghe created HIVE-10316:
-----------------------------------------

             Summary: same query works with TEXTFILE and fails with ORC
                 Key: HIVE-10316
                 URL: https://issues.apache.org/jira/browse/HIVE-10316
             Project: Hive
          Issue Type: Bug
          Components: Compression
    Affects Versions: 0.14.0
         Environment: hortonworks HDP 2.2 running on Linux
            Reporter: Philippe Verhaeghe


See also related answer in mailing list : 
http://mail-archives.apache.org/mod_mbox/hive-user/201504.mbox/%3CD15184D6.27779%25gopal%40hortonworks.com%3E

I’m getting an error in Hive when executing a query on a table in ORC format.
After several trials, I succeeded to run the same query on the same table in TEXTFILE format.
I ‘ve been able to reproduce the error with the simple sql script below.
I create the same table in TEXFILE and in ORC and I run a SELECT …GROUP BY on the tables.
The first SELECT issued on the TEXTFILE table succeeds.

The second SELECT issued on the ORC table fails.
NB : There is a CONCAT in the query. If I remove the CONCAT the query is running ok with both tables …

Example script to reproduce the error :

USE pvr_temp;
DROP TABLE IF EXISTS students_text;
CREATE TABLE students_text (name VARCHAR(64), age INT, datetime TIMESTAMP, gpa DECIMAL(3, 2)) STORED AS TEXTFILE;
INSERT INTO TABLE students_text VALUES ('fred flintstone', 35, '2015-04-13 13:40:00', 1.28), ('barney rubble', 32, '2015-04-13 13:40:00', 2.32);
SELECT CONCAT(TO_DATE(datetime), '-'),   SUM(gpa)  FROM students_text GROUP BY CONCAT(TO_DATE(datetime), '-');
DROP TABLE IF EXISTS students_orc;
CREATE TABLE students_orc (name VARCHAR(64), age INT, datetime TIMESTAMP, gpa DECIMAL(3, 2)) STORED AS ORC;
INSERT INTO TABLE students_orc VALUES ('fred flintstone', 35, '2015-04-13 SELECT CONCAT(TO_DATE(datetime), '-'),   SUM(gpa)  FROM students_orc GROUP BY CONCAT(TO_DATE(datetime), '-');
13:40:00', 1.28), ('barney rubble', 32, '2015-04-13 13:40:00', 2.32);


Log where you can see the error :

[pvr@tpcalr01s ~]$ cat test.log
scan complete in 9ms
Connecting to jdbc:hive2://tpcrmm03s:10000
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/2.2.0.0-2041/hadoop/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.2.0.0-2041/hive/lib/hive-jdbc-0.14.0.2.2.0.0-2041-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Connected to: Apache Hive (version 0.14.0.2.2.0.0-2041)
Driver: Hive JDBC (version 0.14.0.2.2.0.0-2041)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://tpcrmm03s:10000> USE pvr_temp;
No rows affected (0.061 seconds)
0: jdbc:hive2://tpcrmm03s:10000> DROP TABLE IF EXISTS students_text;
No rows affected (0.12 seconds)
0: jdbc:hive2://tpcrmm03s:10000> CREATE TABLE students_text (name VARCHAR(64), age INT, datetime TIMESTAMP, gpa DECIMAL(3, 2)) STORED AS TEXTFILE;
No rows affected (0.057 seconds)
0: jdbc:hive2://tpcrmm03s:10000> INSERT INTO TABLE students_text VALUES ('fred flintstone', 35, '2015-04-13 13:40:00', 1.28), ('barney rubble', 32, '2015-04-13 13:40:00', 2.32);
INFO  : Tez session hasn't been created yet. Opening session
INFO  :

INFO  : Status: Running (Executing on YARN cluster with App id application_1428656093356_0047)

INFO  : Map 1: -/-
INFO  : Map 1: 0/1
No rows affected (14.134 seconds)
INFO  : Map 1: 0/1
INFO  : Map 1: 0(+1)/1
INFO  : Map 1: 0(+1)/1
INFO  : Map 1: 1/1
INFO  : Loading data to table pvr_temp.students_text from hdfs://tpcrmm01s.priv.atos.fr:8020/tmp/hive/hive/bf19c354-de67-45ae-a3e4-cd57d81acd71/hive_2015-04-13_14-15-08_445_2811483497310651606-20/-ext-10000
INFO  : Table pvr_temp.students_text stats: [numFiles=1, numRows=2, totalSize=86, rawDataSize=84]
0: jdbc:hive2://tpcrmm03s:10000> SELECT CONCAT(TO_DATE(datetime), '-'),   SUM(gpa)  FROM students_text GROUP BY CONCAT(TO_DATE(datetime), '-');
INFO  : Session is already open
INFO  :

INFO  : Status: Running (Executing on YARN cluster with App id application_1428656093356_0047)

INFO  : Map 1: -/-      Reducer 2: 0/1
INFO  : Map 1: 0/1      Reducer 2: 0/1
INFO  : Map 1: 0(+1)/1  Reducer 2: 0/1
INFO  : Map 1: 1/1      Reducer 2: 0(+1)/1
INFO  : Map 1: 1/1      Reducer 2: 1/1
+--------------+------+--+
|     _c0      | _c1  |
+--------------+------+--+
| 2015-04-13-  | 3.6  |
+--------------+------+--+
1 row selected (3.258 seconds)
0: jdbc:hive2://tpcrmm03s:10000> DROP TABLE IF EXISTS students_orc;
No rows affected (0.109 seconds)
0: jdbc:hive2://tpcrmm03s:10000> CREATE TABLE students_orc (name VARCHAR(64), age INT, datetime TIMESTAMP, gpa DECIMAL(3, 2)) STORED AS ORC;
No rows affected (0.063 seconds)
0: jdbc:hive2://tpcrmm03s:10000> INSERT INTO TABLE students_orc VALUES ('fred flintstone', 35, '2015-04-13 13:40:00', 1.28), ('barney rubble', 32, '2015-04-13 13:40:00', 2.32);
No rows affected (2.125 seconds)
INFO  : Session is already open
INFO  :

INFO  : Status: Running (Executing on YARN cluster with App id application_1428656093356_0047)

INFO  : Map 1: 0/1
INFO  : Map 1: 0(+1)/1
INFO  : Map 1: 1/1
INFO  : Loading data to table pvr_temp.students_orc from hdfs://tpcrmm01s.priv.atos.fr:8020/tmp/hive/hive/bf19c354-de67-45ae-a3e4-cd57d81acd71/hive_2015-04-13_14-15-26_056_1247475009666467472-20/-ext-10000
INFO  : Table pvr_temp.students_orc stats: [numFiles=1, numRows=2, totalSize=590, rawDataSize=508]
0: jdbc:hive2://tpcrmm03s:10000> SELECT CONCAT(TO_DATE(datetime), '-'),   SUM(gpa)  FROM students_orc GROUP BY CONCAT(TO_DATE(datetime), '-');
INFO  : Session is already open
INFO  :

INFO  : Status: Running (Executing on YARN cluster with App id application_1428656093356_0047)

INFO  : Map 1: -/-      Reducer 2: 0/1
INFO  : Map 1: 0(+1)/1  Reducer 2: 0/1
INFO  : Map 1: 0(+1,-1)/1       Reducer 2: 0/1
INFO  : Map 1: 0(+1,-1)/1       Reducer 2: 0/1
INFO  : Map 1: 0(+1,-2)/1       Reducer 2: 0/1
INFO  : Map 1: 0(+1,-2)/1       Reducer 2: 0/1
INFO  : Map 1: 0(+1,-3)/1       Reducer 2: 0/1
INFO  : Map 1: 0(+1,-3)/1       Reducer 2: 0/1
ERROR : Status: Failed
ERROR : Vertex failed, vertexName=Map 1, vertexId=vertex_1428656093356_0047_4_00, diagnostics=[Task failed, taskId=task_1428656093356_0047_4_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: Map operator initialization failed
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:186)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:138)
        at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Map operator initialization failed
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:232)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:162)
        ... 13 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unsuported vector output type: StringGroup
        at org.apache.hadoop.hive.ql.exec.vector.VectorColumnSetInfo.addKey(VectorColumnSetInfo.java:139)
        at org.apache.hadoop.hive.ql.exec.vector.VectorHashKeyWrapperBatch.compileKeyWrapperBatch(VectorHashKeyWrapperBatch.java:521)
        at org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.initializeOp(VectorGroupByOperator.java:786)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:469)
        at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:425)
        at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.initializeOp(VectorSelectOperator.java:105)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:469)
        at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:425)
        at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:193)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
        at org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:427)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:201)
        ... 14 more
], TaskAttempt 1 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: Map operator initialization failed
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:186)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:138)
        at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Map operator initialization failed
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:232)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:162)
        ... 13 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unsuported vector output type: StringGroup
        at org.apache.hadoop.hive.ql.exec.vector.VectorColumnSetInfo.addKey(VectorColumnSetInfo.java:139)
        at org.apache.hadoop.hive.ql.exec.vector.VectorHashKeyWrapperBatch.compileKeyWrapperBatch(VectorHashKeyWrapperBatch.java:521)
        at org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.initializeOp(VectorGroupByOperator.java:786)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:469)
        at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:425)
        at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.initializeOp(VectorSelectOperator.java:105)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:469)
        at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:425)
        at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:193)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
        at org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:427)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:201)
        ... 14 more
], TaskAttempt 2 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: Map operator initialization failed
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:186)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:138)
        at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Map operator initialization failed
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:232)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:162)
        ... 13 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unsuported vector output type: StringGroup
        at org.apache.hadoop.hive.ql.exec.vector.VectorColumnSetInfo.addKey(VectorColumnSetInfo.java:139)
        at org.apache.hadoop.hive.ql.exec.vector.VectorHashKeyWrapperBatch.compileKeyWrapperBatch(VectorHashKeyWrapperBatch.java:521)
        at org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.initializeOp(VectorGroupByOperator.java:786)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:469)
        at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:425)
        at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.initializeOp(VectorSelectOperator.java:105)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:469)
        at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:425)
        at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:193)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
        at org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:427)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:201)
        ... 14 more
], TaskAttempt 3 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: Map operator initialization failed
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:186)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:138)
        at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Map operator initialization failed
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:232)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:162)
        ... 13 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unsuported vector output type: StringGroup
        at org.apache.hadoop.hive.ql.exec.vector.VectorColumnSetInfo.addKey(VectorColumnSetInfo.java:139)
        at org.apache.hadoop.hive.ql.exec.vector.VectorHashKeyWrapperBatch.compileKeyWrapperBatch(VectorHashKeyWrapperBatch.java:521)
        at org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.initializeOp(VectorGroupByOperator.java:786)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:469)
        at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:425)
        at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.initializeOp(VectorSelectOperator.java:105)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:469)
        at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:425)
        at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:193)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
        at org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:427)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:201)
        ... 14 more
]], Vertex failed as one or more tasks failed. failedTasks:1, Vertex vertex_1428656093356_0047_4_00 [Map 1] killed/failed due to:null]
ERROR : Vertex killed, vertexName=Reducer 2, vertexId=vertex_1428656093356_0047_4_01, diagnostics=[Vertex received Kill while in RUNNING state., Vertex killed as other vertex failed. failedTasks:0, Vertex vertex_1428656093356_0047_4_01 [Reducer 2] killed/failed due to:null]
ERROR : DAG failed due to vertex failure. failedVertices:1 killedVertices:1
Error: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask (state=08S01,code=2)

Closing: 0: jdbc:hive2://tpcrmm03s:10000




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)