Posted to user@hive.apache.org by Verhaeghe Philippe <Ph...@worldline.com> on 2015/04/13 14:27:36 UTC

same query works with TEXTFILE and fails with ORC

I'm getting an error in Hive when executing a query on a table in ORC format.
After several attempts, I managed to run the same query on the same table in TEXTFILE format.
I have been able to reproduce the error with the simple SQL script below.
I create the same table in TEXTFILE and in ORC and run a SELECT ... GROUP BY on both tables.
The first SELECT, issued on the TEXTFILE table, succeeds.
The second SELECT, issued on the ORC table, fails.
NB: There is a CONCAT in the query. If I remove the CONCAT, the query runs fine on both tables.

Example script to reproduce the error :

USE pvr_temp;
DROP TABLE IF EXISTS students_text;
CREATE TABLE students_text (name VARCHAR(64), age INT, datetime TIMESTAMP, gpa DECIMAL(3, 2)) STORED AS TEXTFILE;
INSERT INTO TABLE students_text VALUES ('fred flintstone', 35, '2015-04-13 13:40:00', 1.28), ('barney rubble', 32, '2015-04-13 13:40:00', 2.32);
SELECT CONCAT(TO_DATE(datetime), '-'),   SUM(gpa)  FROM students_text GROUP BY CONCAT(TO_DATE(datetime), '-');
DROP TABLE IF EXISTS students_orc;
CREATE TABLE students_orc (name VARCHAR(64), age INT, datetime TIMESTAMP, gpa DECIMAL(3, 2)) STORED AS ORC;
INSERT INTO TABLE students_orc VALUES ('fred flintstone', 35, '2015-04-13 13:40:00', 1.28), ('barney rubble', 32, '2015-04-13 13:40:00', 2.32);
SELECT CONCAT(TO_DATE(datetime), '-'),   SUM(gpa)  FROM students_orc GROUP BY CONCAT(TO_DATE(datetime), '-');


Log where you can see the error :

[pvr@tpcalr01s ~]$ cat test.log
scan complete in 9ms
Connecting to jdbc:hive2://tpcrmm03s:10000
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/2.2.0.0-2041/hadoop/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.2.0.0-2041/hive/lib/hive-jdbc-0.14.0.2.2.0.0-2041-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Connected to: Apache Hive (version 0.14.0.2.2.0.0-2041)
Driver: Hive JDBC (version 0.14.0.2.2.0.0-2041)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://tpcrmm03s:10000> USE pvr_temp;
No rows affected (0.061 seconds)
0: jdbc:hive2://tpcrmm03s:10000> DROP TABLE IF EXISTS students_text;
No rows affected (0.12 seconds)
0: jdbc:hive2://tpcrmm03s:10000> CREATE TABLE students_text (name VARCHAR(64), age INT, datetime TIMESTAMP, gpa DECIMAL(3, 2)) STORED AS TEXTFILE;
No rows affected (0.057 seconds)
0: jdbc:hive2://tpcrmm03s:10000> INSERT INTO TABLE students_text VALUES ('fred flintstone', 35, '2015-04-13 13:40:00', 1.28), ('barney rubble', 32, '2015-04-13 13:40:00', 2.32);
INFO  : Tez session hasn't been created yet. Opening session
INFO  :

INFO  : Status: Running (Executing on YARN cluster with App id application_1428656093356_0047)

INFO  : Map 1: -/-
INFO  : Map 1: 0/1
No rows affected (14.134 seconds)
INFO  : Map 1: 0/1
INFO  : Map 1: 0(+1)/1
INFO  : Map 1: 0(+1)/1
INFO  : Map 1: 1/1
INFO  : Loading data to table pvr_temp.students_text from hdfs://tpcrmm01s.priv.atos.fr:8020/tmp/hive/hive/bf19c354-de67-45ae-a3e4-cd57d81acd71/hive_2015-04-13_14-15-08_445_2811483497310651606-20/-ext-10000
INFO  : Table pvr_temp.students_text stats: [numFiles=1, numRows=2, totalSize=86, rawDataSize=84]
0: jdbc:hive2://tpcrmm03s:10000> SELECT CONCAT(TO_DATE(datetime), '-'),   SUM(gpa)  FROM students_text GROUP BY CONCAT(TO_DATE(datetime), '-');
INFO  : Session is already open
INFO  :

INFO  : Status: Running (Executing on YARN cluster with App id application_1428656093356_0047)

INFO  : Map 1: -/-      Reducer 2: 0/1
INFO  : Map 1: 0/1      Reducer 2: 0/1
INFO  : Map 1: 0(+1)/1  Reducer 2: 0/1
INFO  : Map 1: 1/1      Reducer 2: 0(+1)/1
INFO  : Map 1: 1/1      Reducer 2: 1/1
+--------------+------+--+
|     _c0      | _c1  |
+--------------+------+--+
| 2015-04-13-  | 3.6  |
+--------------+------+--+
1 row selected (3.258 seconds)
0: jdbc:hive2://tpcrmm03s:10000> DROP TABLE IF EXISTS students_orc;
No rows affected (0.109 seconds)
0: jdbc:hive2://tpcrmm03s:10000> CREATE TABLE students_orc (name VARCHAR(64), age INT, datetime TIMESTAMP, gpa DECIMAL(3, 2)) STORED AS ORC;
No rows affected (0.063 seconds)
0: jdbc:hive2://tpcrmm03s:10000> INSERT INTO TABLE students_orc VALUES ('fred flintstone', 35, '2015-04-13 13:40:00', 1.28), ('barney rubble', 32, '2015-04-13 13:40:00', 2.32);
No rows affected (2.125 seconds)
INFO  : Session is already open
INFO  :

INFO  : Status: Running (Executing on YARN cluster with App id application_1428656093356_0047)

INFO  : Map 1: 0/1
INFO  : Map 1: 0(+1)/1
INFO  : Map 1: 1/1
INFO  : Loading data to table pvr_temp.students_orc from hdfs://tpcrmm01s.priv.atos.fr:8020/tmp/hive/hive/bf19c354-de67-45ae-a3e4-cd57d81acd71/hive_2015-04-13_14-15-26_056_1247475009666467472-20/-ext-10000
INFO  : Table pvr_temp.students_orc stats: [numFiles=1, numRows=2, totalSize=590, rawDataSize=508]
0: jdbc:hive2://tpcrmm03s:10000> SELECT CONCAT(TO_DATE(datetime), '-'),   SUM(gpa)  FROM students_orc GROUP BY CONCAT(TO_DATE(datetime), '-');
INFO  : Session is already open
INFO  :

INFO  : Status: Running (Executing on YARN cluster with App id application_1428656093356_0047)

INFO  : Map 1: -/-      Reducer 2: 0/1
INFO  : Map 1: 0(+1)/1  Reducer 2: 0/1
INFO  : Map 1: 0(+1,-1)/1       Reducer 2: 0/1
INFO  : Map 1: 0(+1,-1)/1       Reducer 2: 0/1
INFO  : Map 1: 0(+1,-2)/1       Reducer 2: 0/1
INFO  : Map 1: 0(+1,-2)/1       Reducer 2: 0/1
INFO  : Map 1: 0(+1,-3)/1       Reducer 2: 0/1
INFO  : Map 1: 0(+1,-3)/1       Reducer 2: 0/1
ERROR : Status: Failed
ERROR : Vertex failed, vertexName=Map 1, vertexId=vertex_1428656093356_0047_4_00, diagnostics=[Task failed, taskId=task_1428656093356_0047_4_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: Map operator initialization failed
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:186)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:138)
        at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Map operator initialization failed
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:232)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:162)
        ... 13 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unsuported vector output type: StringGroup
        at org.apache.hadoop.hive.ql.exec.vector.VectorColumnSetInfo.addKey(VectorColumnSetInfo.java:139)
        at org.apache.hadoop.hive.ql.exec.vector.VectorHashKeyWrapperBatch.compileKeyWrapperBatch(VectorHashKeyWrapperBatch.java:521)
        at org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.initializeOp(VectorGroupByOperator.java:786)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:469)
        at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:425)
        at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.initializeOp(VectorSelectOperator.java:105)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:469)
        at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:425)
        at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:193)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
        at org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:427)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:201)
        ... 14 more
], TaskAttempt 1 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: Map operator initialization failed
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:186)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:138)
        at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Map operator initialization failed
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:232)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:162)
        ... 13 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unsuported vector output type: StringGroup
        at org.apache.hadoop.hive.ql.exec.vector.VectorColumnSetInfo.addKey(VectorColumnSetInfo.java:139)
        at org.apache.hadoop.hive.ql.exec.vector.VectorHashKeyWrapperBatch.compileKeyWrapperBatch(VectorHashKeyWrapperBatch.java:521)
        at org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.initializeOp(VectorGroupByOperator.java:786)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:469)
        at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:425)
        at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.initializeOp(VectorSelectOperator.java:105)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:469)
        at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:425)
        at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:193)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
        at org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:427)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:201)
        ... 14 more
], TaskAttempt 2 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: Map operator initialization failed
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:186)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:138)
        at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Map operator initialization failed
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:232)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:162)
        ... 13 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unsuported vector output type: StringGroup
        at org.apache.hadoop.hive.ql.exec.vector.VectorColumnSetInfo.addKey(VectorColumnSetInfo.java:139)
        at org.apache.hadoop.hive.ql.exec.vector.VectorHashKeyWrapperBatch.compileKeyWrapperBatch(VectorHashKeyWrapperBatch.java:521)
        at org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.initializeOp(VectorGroupByOperator.java:786)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:469)
        at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:425)
        at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.initializeOp(VectorSelectOperator.java:105)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:469)
        at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:425)
        at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:193)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
        at org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:427)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:201)
        ... 14 more
], TaskAttempt 3 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: Map operator initialization failed
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:186)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:138)
        at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Map operator initialization failed
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:232)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:162)
        ... 13 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unsuported vector output type: StringGroup
        at org.apache.hadoop.hive.ql.exec.vector.VectorColumnSetInfo.addKey(VectorColumnSetInfo.java:139)
        at org.apache.hadoop.hive.ql.exec.vector.VectorHashKeyWrapperBatch.compileKeyWrapperBatch(VectorHashKeyWrapperBatch.java:521)
        at org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.initializeOp(VectorGroupByOperator.java:786)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:469)
        at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:425)
        at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.initializeOp(VectorSelectOperator.java:105)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:469)
        at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:425)
        at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:193)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
        at org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:427)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:201)
        ... 14 more
]], Vertex failed as one or more tasks failed. failedTasks:1, Vertex vertex_1428656093356_0047_4_00 [Map 1] killed/failed due to:null]
ERROR : Vertex killed, vertexName=Reducer 2, vertexId=vertex_1428656093356_0047_4_01, diagnostics=[Vertex received Kill while in RUNNING state., Vertex killed as other vertex failed. failedTasks:0, Vertex vertex_1428656093356_0047_4_01 [Reducer 2] killed/failed due to:null]
ERROR : DAG failed due to vertex failure. failedVertices:1 killedVertices:1
Error: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask (state=08S01,code=2)

Closing: 0: jdbc:hive2://tpcrmm03s:10000

________________________________

This e-mail and the documents attached are confidential and intended solely for the addressee; it may also be privileged. If you receive this e-mail in error, please notify the sender immediately and destroy it. As its integrity cannot be secured on the Internet, the Worldline liability cannot be triggered for the message content. Although the sender endeavours to maintain a computer virus-free network, the sender does not warrant that this transmission is virus-free and will not be liable for any damages resulting from any virus transmitted.

RE: same query works with TEXTFILE and fails with ORC

Posted by Verhaeghe Philippe <Ph...@worldline.com>.
Bug created in JIRA as HIVE-10316

-----Original Message-----
From: Gopal Vijayaraghavan [mailto:gopal@hortonworks.com] On Behalf Of Gopal Vijayaraghavan
Sent: Monday, April 13, 2015 11:46 PM
To: user@hive.apache.org
Subject: Re: same query works with TEXTFILE and fails with ORC

> I'm getting an error in Hive when executing a query on a table in ORC format.

This is not an ORC bug, this looks like a vectorization issue.

Can you try comparing both query plans ("explain <query>") for the "Execution mode: vectorized" markers?

TextFile queries are not vectorized today, since you cannot find if any column is marked as isRepeating=true in a row-major format.

> SELECT CONCAT(TO_DATE(datetime), '-'),   SUM(gpa)  FROM students_orc GROUP BY CONCAT(TO_DATE(datetime), '-');

...
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unsuported vector output type: StringGroup
>        at org.apache.hadoop.hive.ql.exec.vector.VectorColumnSetInfo.addKey(VectorColumnSetInfo.java:139)
>        at org.apache.hadoop.hive.ql.exec.vector.VectorHashKeyWrapperBatch.compileKeyWrapperBatch(VectorHashKeyWrapperBatch.java:521)
>        at org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.initializeOp(VectorGroupByOperator.java:786)

The correct fix would be to handle this query pattern for vectorization (or automatically disable vectorization, like it has to do for Unions).

Can you log a bug on Apache JIRA against the correct version of hive which threw this error up?

Cheers,
Gopal


Re: same query works with TEXTFILE and fails with ORC

Posted by Gopal Vijayaraghavan <go...@apache.org>.
> I'm getting an error in Hive when executing a query on a table in ORC format.

This is not an ORC bug, this looks like a vectorization issue.

Can you try comparing both query plans ("explain <query>") for the
"Execution mode: vectorized" markers?

TextFile queries are not vectorized today, since you cannot find if any
column is marked as isRepeating=true in a row-major format.
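
A minimal sketch of that comparison, using the table names from the original report. The exact plan layout differs between Hive versions, so where the marker appears in the output is an assumption; the expectation, based on the explanation above, is that only the ORC plan carries it.

```sql
-- Plan for the TEXTFILE table: per the explanation above, the Map 1 section
-- is not expected to contain an "Execution mode: vectorized" line.
EXPLAIN
SELECT CONCAT(TO_DATE(datetime), '-'), SUM(gpa)
FROM students_text
GROUP BY CONCAT(TO_DATE(datetime), '-');

-- Plan for the ORC table: look for "Execution mode: vectorized" in the
-- Map 1 section (assumed location; it may vary by version).
EXPLAIN
SELECT CONCAT(TO_DATE(datetime), '-'), SUM(gpa)
FROM students_orc
GROUP BY CONCAT(TO_DATE(datetime), '-');
```

If the marker shows up only in the second plan, that is consistent with the failure coming from the vectorized code path rather than from the ORC reader.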

> SELECT CONCAT(TO_DATE(datetime), '-'),   SUM(gpa)  FROM students_orc GROUP BY CONCAT(TO_DATE(datetime), '-');

...
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unsuported vector output type: StringGroup
>        at org.apache.hadoop.hive.ql.exec.vector.VectorColumnSetInfo.addKey(VectorColumnSetInfo.java:139)
>        at org.apache.hadoop.hive.ql.exec.vector.VectorHashKeyWrapperBatch.compileKeyWrapperBatch(VectorHashKeyWrapperBatch.java:521)
>        at org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.initializeOp(VectorGroupByOperator.java:786)

The correct fix would be to handle this query pattern for vectorization
(or automatically disable vectorization, like it has to do for Unions).
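
Until such a fix exists, vectorization can be switched off by hand for the affected session. The sketch below is a possible workaround, not something confirmed in this thread: it assumes that the failure disappears once the query runs on the non-vectorized path, and that vectorization was enabled beforehand (hence the final SET back to true).

```sql
-- Possible workaround (assumption, not verified in this thread):
-- disable vectorized execution for the current session only.
SET hive.vectorized.execution.enabled=false;

-- Same query as in the report, now planned without vectorized operators.
SELECT CONCAT(TO_DATE(datetime), '-'), SUM(gpa)
FROM students_orc
GROUP BY CONCAT(TO_DATE(datetime), '-');

-- Restore the previous setting; "true" is assumed to be the earlier value.
SET hive.vectorized.execution.enabled=true;
```

Setting the property per session keeps other queries vectorized, at the cost of having to remember the switch for every query that uses this pattern.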

Can you log a bug on Apache JIRA against the correct version of hive which
threw this error up?

Cheers,
Gopal