Posted to issues@hive.apache.org by "Eugene Koifman (JIRA)" <ji...@apache.org> on 2017/02/21 18:12:44 UTC

[jira] [Commented] (HIVE-15756) Update/deletes on ACID table throws ArrayIndexOutOfBoundsException

    [ https://issues.apache.org/jira/browse/HIVE-15756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15876413#comment-15876413 ] 

Eugene Koifman commented on HIVE-15756:
---------------------------------------

This is caused by running with hive.enforce.bucketing=false, which is not supported for ACID tables. In fact, this property doesn't even exist in Hive 2.2.
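
For reference, a minimal sketch of the supported setup (table and column names are trimmed from the report below; the SET lines are the standard ACID prerequisites). On Hive 1.x, hive.enforce.bucketing must be set to true before writing to ACID tables; Hive 2.x removed the property and always enforces bucketing:

{noformat}
-- Standard ACID prerequisites:
SET hive.support.concurrency=true;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
-- Hive 1.x only; the property was removed in Hive 2.x:
SET hive.enforce.bucketing=true;

-- ACID tables must be bucketed and stored as ORC with transactional=true:
CREATE TABLE customer_acid (
  c_custkey INT,
  c_comment STRING
)
CLUSTERED BY (c_custkey) INTO 8 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

UPDATE customer_acid SET c_comment = 'foo bar' WHERE c_custkey % 100 = 1;
{noformat}

Rows written while enforcement was off likely carry ROW__ID bucketids that no longer line up with the table's declared bucket_count (8 here), so the update's FileSinkOperator can index past its per-bucket writer array; that would match the ArrayIndexOutOfBoundsException: 1 at FileSinkOperator.process in the trace below. A quick, if informal, way to check for such a mismatch from the Hive CLI (customer_acid and its warehouse path are taken from the report):

{noformat}
-- Declared bucketing: look for "Num Buckets: 8" and the CLUSTERED BY column
DESCRIBE FORMATTED customer_acid;

-- On-disk layout: delta/base directories of a correctly bucketed ACID table
-- contain files named bucket_00000 .. bucket_00007 for buckets that got rows
dfs -ls -R /apps/hive/warehouse/tpch.db/customer_acid;
{noformat}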

> Update/deletes on ACID table throws ArrayIndexOutOfBoundsException
> ------------------------------------------------------------------
>
>                 Key: HIVE-15756
>                 URL: https://issues.apache.org/jira/browse/HIVE-15756
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>    Affects Versions: 2.0.0
>            Reporter: Kavan Suresh
>            Assignee: Eugene Koifman
>            Priority: Critical
>
> Update and delete queries on ACID tables fail, throwing an ArrayIndexOutOfBoundsException.
> {noformat}
> hive> update customer_acid set c_comment = 'foo bar' where c_custkey % 100 = 1;
> Query ID = cstm-hdfs_20170128005823_efa1cdb7-2ad2-4371-ac80-0e35868ad17c
> Total jobs = 1
> Launching Job 1 out of 1
> Tez session was closed. Reopening...
> Session re-established.
> Status: Running (Executing on YARN cluster with App id application_1485331877667_0036)
> --------------------------------------------------------------------------------
>         VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
> --------------------------------------------------------------------------------
> Map 1 ..........   SUCCEEDED     14         14        0        0       0       0
> Reducer 2             FAILED      1          0        0        1       1       0
> --------------------------------------------------------------------------------
> VERTICES: 01/02  [========================>>--] 93%   ELAPSED TIME: 23.68 s    
> --------------------------------------------------------------------------------
> Status: Failed
> Vertex failed, vertexName=Reducer 2, vertexId=vertex_1485331877667_0036_1_01, diagnostics=[Task failed, taskId=task_1485331877667_0036_1_01_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":{"transactionid":72,"bucketid":1,"rowid":0}},"value":{"_col0":103601,"_col1":"Customer#000103601","_col2":"3cYSrJtAA36vth35 emuIk","_col3":20,"_col4":"30-526-248-3190","_col5":8047.21,"_col6":"MACHINERY "}}
> 	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)
> 	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
> 	at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347)
> 	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194)
> 	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:415)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
> 	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185)
> 	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181)
> 	at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":{"transactionid":72,"bucketid":1,"rowid":0}},"value":{"_col0":103601,"_col1":"Customer#000103601","_col2":"3cYSrJtAA36vth35 emuIk","_col3":20,"_col4":"30-526-248-3190","_col5":8047.21,"_col6":"MACHINERY "}}
> 	at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:284)
> 	at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:252)
> 	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150)
> 	... 14 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":{"transactionid":72,"bucketid":1,"rowid":0}},"value":{"_col0":103601,"_col1":"Customer#000103601","_col2":"3cYSrJtAA36vth35 emuIk","_col3":20,"_col4":"30-526-248-3190","_col5":8047.21,"_col6":"MACHINERY "}}
> 	at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:352)
> 	at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:274)
> 	... 16 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
> 	at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:780)
> 	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838)
> 	at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
> 	at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:343)
> 	... 17 more
> ]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex vertex_1485331877667_0036_1_01 [Reducer 2] killed/failed due to:OWN_TASK_FAILURE]
> DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0
> FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Reducer 2, vertexId=vertex_1485331877667_0036_1_01, diagnostics=[Task failed, taskId=task_1485331877667_0036_1_01_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":{"transactionid":72,"bucketid":1,"rowid":0}},"value":{"_col0":103601,"_col1":"Customer#000103601","_col2":"3cYSrJtAA36vth35 emuIk","_col3":20,"_col4":"30-526-248-3190","_col5":8047.21,"_col6":"MACHINERY "}}
> 	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)
> 	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
> 	at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347)
> 	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194)
> 	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:415)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
> 	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185)
> 	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181)
> 	at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":{"transactionid":72,"bucketid":1,"rowid":0}},"value":{"_col0":103601,"_col1":"Customer#000103601","_col2":"3cYSrJtAA36vth35 emuIk","_col3":20,"_col4":"30-526-248-3190","_col5":8047.21,"_col6":"MACHINERY "}}
> 	at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:284)
> 	at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:252)
> 	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150)
> 	... 14 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":{"transactionid":72,"bucketid":1,"rowid":0}},"value":{"_col0":103601,"_col1":"Customer#000103601","_col2":"3cYSrJtAA36vth35 emuIk","_col3":20,"_col4":"30-526-248-3190","_col5":8047.21,"_col6":"MACHINERY "}}
> 	at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:352)
> 	at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:274)
> 	... 16 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
> 	at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:780)
> 	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838)
> 	at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
> 	at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:343)
> 	... 17 more
> ]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex vertex_1485331877667_0036_1_01 [Reducer 2] killed/failed due to:OWN_TASK_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0
> {noformat}
> {noformat}
> hive> explain extended update customer_acid set c_comment = 'foo bar' where c_custkey % 100 = 1;
> OK
> ABSTRACT SYNTAX TREE:
>   
> TOK_UPDATE_TABLE
>    TOK_TABNAME
>       customer_acid
>    TOK_SET_COLUMNS_CLAUSE
>       =
>          TOK_TABLE_OR_COL
>             c_comment
>          'foo bar'
>    TOK_WHERE
>       =
>          %
>             TOK_TABLE_OR_COL
>                c_custkey
>             100
>          1
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-2 depends on stages: Stage-1
>   Stage-0 depends on stages: Stage-2
>   Stage-3 depends on stages: Stage-0
> STAGE PLANS:
>   Stage: Stage-1
>     Tez
>       DagId: cstm-hdfs_20170128012834_4d41e184-1e40-443c-9990-147cfdc6ea15:5
>       Edges:
>         Reducer 2 <- Map 1 (SIMPLE_EDGE)
>       DagName: 
>       Vertices:
>         Map 1 
>             Map Operator Tree:
>                 TableScan
>                   alias: customer_acid
>                   filterExpr: ((c_custkey % 100) = 1) (type: boolean)
>                   Statistics: Num rows: 25219 Data size: 8700894 Basic stats: COMPLETE Column stats: NONE
>                   GatherStats: false
>                   Filter Operator
>                     isSamplingPred: false
>                     predicate: ((c_custkey % 100) = 1) (type: boolean)
>                     Statistics: Num rows: 12609 Data size: 4350274 Basic stats: COMPLETE Column stats: NONE
>                     Select Operator
>                       expressions: ROW__ID (type: struct<transactionid:bigint,bucketid:int,rowid:bigint>), c_custkey (type: int), c_name (type: string), c_address (type: string), c_nationkey (type: int), c_phone (type: char(15)), c_acctbal (type: decimal(15,2)), c_mktsegment (type: char(10))
>                       outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7
>                       Statistics: Num rows: 12609 Data size: 4350274 Basic stats: COMPLETE Column stats: NONE
>                       Reduce Output Operator
>                         key expressions: _col0 (type: struct<transactionid:bigint,bucketid:int,rowid:bigint>)
>                         sort order: +
>                         Statistics: Num rows: 12609 Data size: 4350274 Basic stats: COMPLETE Column stats: NONE
>                         tag: -1
>                         value expressions: _col1 (type: int), _col2 (type: string), _col3 (type: string), _col4 (type: int), _col5 (type: char(15)), _col6 (type: decimal(15,2)), _col7 (type: char(10))
>                         auto parallelism: true
>             Path -> Alias:
>               hdfs://hive-acid-upgrade-issue-5.openstacklocal:8020/apps/hive/warehouse/tpch.db/customer_acid [customer_acid]
>             Path -> Partition:
>               hdfs://hive-acid-upgrade-issue-5.openstacklocal:8020/apps/hive/warehouse/tpch.db/customer_acid 
>                 Partition
>                   base file name: customer_acid
>                   input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
>                   output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
>                   properties:
>                     bucket_count 8
>                     bucket_field_name c_custkey
>                     columns c_custkey,c_name,c_address,c_nationkey,c_phone,c_acctbal,c_mktsegment,c_comment
>                     columns.comments 
>                     columns.types int:string:string:int:char(15):decimal(15,2):char(10):string
>                     file.inputformat org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
>                     file.outputformat org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
>                     location hdfs://hive-acid-upgrade-issue-5.openstacklocal:8020/apps/hive/warehouse/tpch.db/customer_acid
>                     name tpch.customer_acid
>                     numFiles 12
>                     numRows 0
>                     rawDataSize 0
>                     serialization.ddl struct customer_acid { i32 c_custkey, string c_name, string c_address, i32 c_nationkey, char(15) c_phone, decimal(15,2) c_acctbal, char(10) c_mktsegment, string c_comment}
>                     serialization.format 1
>                     serialization.lib org.apache.hadoop.hive.ql.io.orc.OrcSerde
>                     totalSize 8700894
>                     transactional true
>                     transient_lastDdlTime 1485548417
>                   serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
>                 
>                     input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
>                     output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
>                     properties:
>                       bucket_count 8
>                       bucket_field_name c_custkey
>                       columns c_custkey,c_name,c_address,c_nationkey,c_phone,c_acctbal,c_mktsegment,c_comment
>                       columns.comments 
>                       columns.types int:string:string:int:char(15):decimal(15,2):char(10):string
>                       file.inputformat org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
>                       file.outputformat org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
>                       location hdfs://hive-acid-upgrade-issue-5.openstacklocal:8020/apps/hive/warehouse/tpch.db/customer_acid
>                       name tpch.customer_acid
>                       numFiles 12
>                       numRows 0
>                       rawDataSize 0
>                       serialization.ddl struct customer_acid { i32 c_custkey, string c_name, string c_address, i32 c_nationkey, char(15) c_phone, decimal(15,2) c_acctbal, char(10) c_mktsegment, string c_comment}
>                       serialization.format 1
>                       serialization.lib org.apache.hadoop.hive.ql.io.orc.OrcSerde
>                       totalSize 8700894
>                       transactional true
>                       transient_lastDdlTime 1485548417
>                     serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
>                     name: tpch.customer_acid
>                   name: tpch.customer_acid
>             Truncated Path -> Alias:
>               /tpch.db/customer_acid [customer_acid]
>         Reducer 2 
>             Needs Tagging: false
>             Reduce Operator Tree:
>               Select Operator
>                 expressions: KEY.reducesinkkey0 (type: struct<transactionid:bigint,bucketid:int,rowid:bigint>), VALUE._col0 (type: int), VALUE._col1 (type: string), VALUE._col2 (type: string), VALUE._col3 (type: int), VALUE._col4 (type: char(15)), VALUE._col5 (type: decimal(15,2)), VALUE._col6 (type: char(10)), 'foo bar' (type: string)
>                 outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
>                 Statistics: Num rows: 12609 Data size: 4350274 Basic stats: COMPLETE Column stats: NONE
>                 File Output Operator
>                   compressed: false
>                   GlobalTableId: 1
>                   directory: hdfs://hive-acid-upgrade-issue-5.openstacklocal:8020/apps/hive/warehouse/tpch.db/customer_acid/.hive-staging_hive_2017-01-28_01-28-34_547_5091220054599015088-1/-ext-10000
>                   NumFilesPerFileSink: 1
>                   Statistics: Num rows: 12609 Data size: 4350274 Basic stats: COMPLETE Column stats: NONE
>                   Stats Publishing Key Prefix: hdfs://hive-acid-upgrade-issue-5.openstacklocal:8020/apps/hive/warehouse/tpch.db/customer_acid/.hive-staging_hive_2017-01-28_01-28-34_547_5091220054599015088-1/-ext-10000/
>                   table:
>                       input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
>                       output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
>                       properties:
>                         bucket_count 8
>                         bucket_field_name c_custkey
>                         columns c_custkey,c_name,c_address,c_nationkey,c_phone,c_acctbal,c_mktsegment,c_comment
>                         columns.comments 
>                         columns.types int:string:string:int:char(15):decimal(15,2):char(10):string
>                         file.inputformat org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
>                         file.outputformat org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
>                         location hdfs://hive-acid-upgrade-issue-5.openstacklocal:8020/apps/hive/warehouse/tpch.db/customer_acid
>                         name tpch.customer_acid
>                         numFiles 12
>                         numRows 0
>                         rawDataSize 0
>                         serialization.ddl struct customer_acid { i32 c_custkey, string c_name, string c_address, i32 c_nationkey, char(15) c_phone, decimal(15,2) c_acctbal, char(10) c_mktsegment, string c_comment}
>                         serialization.format 1
>                         serialization.lib org.apache.hadoop.hive.ql.io.orc.OrcSerde
>                         totalSize 8700894
>                         transactional true
>                         transient_lastDdlTime 1485548417
>                       serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
>                       name: tpch.customer_acid
>                   TotalFiles: 1
>                   GatherStats: true
>                   MultiFileSpray: false
>   Stage: Stage-2
>     Dependency Collection
>   Stage: Stage-0
>     Move Operator
>       tables:
>           replace: false
>           source: hdfs://hive-acid-upgrade-issue-5.openstacklocal:8020/apps/hive/warehouse/tpch.db/customer_acid/.hive-staging_hive_2017-01-28_01-28-34_547_5091220054599015088-1/-ext-10000
>           table:
>               input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
>               output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
>               properties:
>                 bucket_count 8
>                 bucket_field_name c_custkey
>                 columns c_custkey,c_name,c_address,c_nationkey,c_phone,c_acctbal,c_mktsegment,c_comment
>                 columns.comments 
>                 columns.types int:string:string:int:char(15):decimal(15,2):char(10):string
>                 file.inputformat org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
>                 file.outputformat org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
>                 location hdfs://hive-acid-upgrade-issue-5.openstacklocal:8020/apps/hive/warehouse/tpch.db/customer_acid
>                 name tpch.customer_acid
>                 numFiles 12
>                 numRows 0
>                 rawDataSize 0
>                 serialization.ddl struct customer_acid { i32 c_custkey, string c_name, string c_address, i32 c_nationkey, char(15) c_phone, decimal(15,2) c_acctbal, char(10) c_mktsegment, string c_comment}
>                 serialization.format 1
>                 serialization.lib org.apache.hadoop.hive.ql.io.orc.OrcSerde
>                 totalSize 8700894
>                 transactional true
>                 transient_lastDdlTime 1485548417
>               serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
>               name: tpch.customer_acid
>   Stage: Stage-3
>     Stats-Aggr Operator
>       Stats Aggregation Key Prefix: hdfs://hive-acid-upgrade-issue-5.openstacklocal:8020/apps/hive/warehouse/tpch.db/customer_acid/.hive-staging_hive_2017-01-28_01-28-34_547_5091220054599015088-1/-ext-10000/
> Time taken: 0.422 seconds, Fetched: 189 row(s)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)