Posted to issues@hive.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2023/04/19 00:21:00 UTC

[jira] [Work logged] (HIVE-27006) ParallelEdgeFixer inserts misconfigured operator and does not connect it in Tez DAG

     [ https://issues.apache.org/jira/browse/HIVE-27006?focusedWorklogId=857767&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-857767 ]

ASF GitHub Bot logged work on HIVE-27006:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 19/Apr/23 00:20
            Start Date: 19/Apr/23 00:20
    Worklog Time Spent: 10m 
      Work Description: github-actions[bot] commented on PR #4043:
URL: https://github.com/apache/hive/pull/4043#issuecomment-1513946248

   This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the dev@hive.apache.org list if the patch is in need of reviews.




Issue Time Tracking
-------------------

    Worklog Id:     (was: 857767)
    Time Spent: 40m  (was: 0.5h)

> ParallelEdgeFixer inserts misconfigured operator and does not connect it in Tez DAG
> -----------------------------------------------------------------------------------
>
>                 Key: HIVE-27006
>                 URL: https://issues.apache.org/jira/browse/HIVE-27006
>             Project: Hive
>          Issue Type: Sub-task
>    Affects Versions: 4.0.0-alpha-2
>            Reporter: Seonggon Namgung
>            Assignee: Seonggon Namgung
>            Priority: Major
>              Labels: hive-4.0.0-must, pull-request-available
>         Attachments: after.PEF.png, tez-dag.png
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> Hive fails to run the query below on a 1TB ORC-formatted TPC-DS dataset because of a runtime error in one operator.
> I found that the problematic operator is inserted by ParallelEdgeFixer.
> I also observed that the corresponding vertex has no descendant vertex, even though its ReduceSinkOperator has a SemiJoin edge to a TableScanOperator.
> (I attached figures of the Tez DAG and the OperatorGraph. They show that Cluster6 and Cluster7 are connected, while Reducer4 and Map7 are not.)
>  
> Query
> {code:sql}
> set hive.optimize.shared.work=true;
> set hive.optimize.shared.work.parallel.edge.support=true;
> with
>   inv00 as (select inv_item_sk, inv_warehouse_sk from inventory, date_dim where inv_date_sk = d_date_sk and d_year = 2000),
>   inv01 as (select inv_item_sk, inv_warehouse_sk from inventory, date_dim where inv_date_sk = d_date_sk and d_year = 2001),
>   inv02 as (select inv_item_sk, inv_warehouse_sk from inventory, date_dim where inv_date_sk = d_date_sk and d_year = 2002),
>   sd00 as (select inv_item_sk id, w_zip zip from inv00 full outer join warehouse on inv_warehouse_sk = w_warehouse_sk where w_state = 'SD'),
>   sd01 as (select inv_item_sk id, w_zip zip from inv01 full outer join warehouse on inv_warehouse_sk = w_warehouse_sk where w_state = 'SD'),
>   sd02 as (select inv_item_sk id, w_zip zip from inv02 full outer join warehouse on inv_warehouse_sk = w_warehouse_sk where w_state = 'SD')
> select * from sd00, sd01, sd02 where sd00.id = sd01.id and sd00.id = sd02.id; {code}
>  
> Error message
> {code:java}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
>         at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:385)
>         at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:301)
>         ... 18 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: cannot find field _col0 from []
>         at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.process(ReduceSinkOperator.java:384)
>         at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888)
>         at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:94)
>         at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:370)
>         ... 19 more
> Caused by: java.lang.RuntimeException: cannot find field _col0 from []
>         at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:550)
>         at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:153)
>         at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:56)
>         at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:1073)
>         at org.apache.hadoop.hive.ql.exec.Operator.initEvaluatorsAndReturnStruct(Operator.java:1099)
>         at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.process(ReduceSinkOperator.java:305)
>         ... 22 more {code}
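
In the trace, a SelectOperator forwards a row to a ReduceSinkOperator, which then tries to bind the column `_col0` while initializing its evaluators. The struct it receives lists no fields (`[]`), so the lookup fails. The sketch below is a hypothetical, simplified model of that lookup, not Hive's ObjectInspector code (the real path goes through `ObjectInspectorUtils.getStandardStructFieldRef`). `EmptySchemaLookup` and `resolveField` are illustrative names only.

```java
import java.util.List;

/**
 * Hypothetical, simplified model (not Hive code): an operator's key expression
 * references "_col0", but the row schema it receives from its parent is empty.
 */
class EmptySchemaLookup {

    /** Resolves a column name against the parent's output schema; fails if absent. */
    static int resolveField(String fieldName, List<String> parentSchema) {
        int idx = parentSchema.indexOf(fieldName);
        if (idx < 0) {
            throw new RuntimeException("cannot find field " + fieldName + " from " + parentSchema);
        }
        return idx;
    }

    public static void main(String[] args) {
        // A correctly configured parent exposes its output columns.
        System.out.println(resolveField("_col0", List.of("_col0", "_col1"))); // prints 0

        // A misconfigured parent with an empty output schema exposes nothing.
        try {
            resolveField("_col0", List.of());
        } catch (RuntimeException e) {
            System.out.println(e.getMessage()); // prints "cannot find field _col0 from []"
        }
    }
}
```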

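The report notes that Cluster6 and Cluster7 are connected in the OperatorGraph, while Reducer4 and Map7 are not connected in the Tez DAG. The sketch below is a hypothetical consistency check (not part of Hive) that flags operator-level edges with no matching vertex edge. It assumes, from the report's wording, that Cluster6 runs in Reducer4 and Cluster7 runs in Map7. `DagConsistencyCheck`, `findUnbackedEdges`, and the string-based "from->to" edge encoding are illustrative only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical check (not Hive code): every operator-graph edge between two clusters
 * should be backed by an edge between the corresponding Tez vertices.
 */
class DagConsistencyCheck {

    /** Returns cluster edges whose mapped vertices have no "from->to" edge in the DAG. */
    static List<String> findUnbackedEdges(Map<String, String> clusterToVertex,
                                          List<String[]> clusterEdges,
                                          Set<String> vertexEdges) {
        List<String> missing = new ArrayList<>();
        for (String[] e : clusterEdges) {
            String fromVertex = clusterToVertex.get(e[0]);
            String toVertex = clusterToVertex.get(e[1]);
            // An operator edge crossing clusters must map to a vertex edge in the DAG.
            if (!vertexEdges.contains(fromVertex + "->" + toVertex)) {
                missing.add(e[0] + "->" + e[1] + " (" + fromVertex + "->" + toVertex + ")");
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Assumed mapping, inferred from the report: Cluster6 -> Reducer4, Cluster7 -> Map7.
        Map<String, String> clusterToVertex = new HashMap<>();
        clusterToVertex.put("Cluster6", "Reducer4");
        clusterToVertex.put("Cluster7", "Map7");

        // SemiJoin edge: the ReduceSinkOperator in Cluster6 feeds the TableScanOperator in Cluster7.
        List<String[]> clusterEdges = new ArrayList<>();
        clusterEdges.add(new String[] {"Cluster6", "Cluster7"});

        // The reported Tez DAG has no Reducer4 -> Map7 edge.
        Set<String> vertexEdges = new HashSet<>();

        System.out.println(findUnbackedEdges(clusterToVertex, clusterEdges, vertexEdges));
        // prints [Cluster6->Cluster7 (Reducer4->Map7)]
    }
}
```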


--
This message was sent by Atlassian Jira
(v8.20.10#820010)