Posted to dev@hive.apache.org by "Jason Dere (JIRA)" <ji...@apache.org> on 2017/03/31 01:03:41 UTC

[jira] [Created] (HIVE-16341) Tez Task Execution Summary has incorrect input record counts on some operators

Jason Dere created HIVE-16341:
---------------------------------

             Summary: Tez Task Execution Summary has incorrect input record counts on some operators
                 Key: HIVE-16341
                 URL: https://issues.apache.org/jira/browse/HIVE-16341
             Project: Hive
          Issue Type: Bug
          Components: Tez
            Reporter: Jason Dere
            Assignee: Jason Dere


{noformat}
Task Execution Summary
--------------------------------------------------------------------------------------------------------------------------------
  VERTICES  TOTAL_TASKS  FAILED_ATTEMPTS  KILLED_TASKS   DURATION(ms)  CPU_TIME(ms)  GC_TIME(ms)  INPUT_RECORDS  OUTPUT_RECORDS
--------------------------------------------------------------------------------------------------------------------------------
     Map 1          167                0             0       17640.00     2,109,200       23,068    150,000,004      11,995,136
    Map 11            5                0             0       10559.00        71,960          633      4,023,690         799,900
    Map 13            1                0             0        2244.00         6,090           29             25               3
     Map 3            1                0             0        2849.00         7,080           99             25               3
     Map 5          271                0             0       55834.00    12,934,890      358,376  1,500,000,001   1,500,000,161
     Map 7          241                0             0       91243.00     5,020,860       71,182  1,827,250,341     652,413,443
Reducer 10            1                0             0        1010.00         1,900            0              4               0
Reducer 12            1                0             0        3854.00         1,320            0        799,900               1
Reducer 14            1                0             0        1420.00         3,790           45              3               1
 Reducer 2            1                0             0        9720.00         6,220          122     11,995,136               1
 Reducer 4            1                0             0         810.00         2,100          105              3               1
 Reducer 6            1                0             0       24863.00         3,260            5  1,500,000,161               1
 Reducer 8          412                0             0       88215.00    17,106,440      184,524  2,165,208,640           1,864
 Reducer 9            2                0             0       29752.00         3,980            0          1,864               4
--------------------------------------------------------------------------------------------------------------------
{noformat}

Seeing this on queries that use runtime filtering. The INPUT_RECORDS values look incorrect for the reducers responsible for aggregating the min/max/bloom filter (Reducers 12, 14, 2, 6). For example, Reducer 2 shows about 12M input records (11,995,136), but the task logs for Reducer 2 show only 167 input records.

It looks like Map 1 has two output vertices (Reducer 2 and Reducer 8), but Map 1's total output rows, rather than just the rows going to each specific vertex, are being counted as input rows for both Reducer 2 and Reducer 8.
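
A rough sketch of the suspected miscount, not the actual Hive/Tez summary code. The per-edge split is an assumption: 167 comes from the Reducer 2 task logs, and the Reducer 8 share is simply Map 1's reported OUTPUT_RECORDS minus 167. The function names and the edge dictionary are hypothetical. The last check assumes Map 1, Map 11, Map 5 and Map 7 are the inputs of Reducer 8, which the summary does not state but which matches the reported number exactly.

```python
# Hypothetical model: records written per edge (source vertex, destination vertex).
MAP_1_TOTAL_OUTPUT = 11_995_136  # OUTPUT_RECORDS reported for Map 1

edge_output = {
    ("Map 1", "Reducer 2"): 167,                       # from the Reducer 2 task logs
    ("Map 1", "Reducer 8"): MAP_1_TOTAL_OUTPUT - 167,  # assumed remainder
}


def input_records_suspected(vertex, edges):
    """Suspected behavior: add each upstream vertex's TOTAL output,
    regardless of which edge the records were written to."""
    total = 0
    for (src, dst) in edges:
        if dst == vertex:
            total += sum(n for (s, _), n in edges.items() if s == src)
    return total


def input_records_per_edge(vertex, edges):
    """Expected behavior: count only the records on edges ending at this vertex."""
    return sum(n for (_, dst), n in edges.items() if dst == vertex)


suspected_r2 = input_records_suspected("Reducer 2", edge_output)
expected_r2 = input_records_per_edge("Reducer 2", edge_output)

# Cross-check against the summary table: total OUTPUT_RECORDS of
# Map 1, Map 11, Map 5 and Map 7 (assumed to be Reducer 8's inputs).
reducer_8_from_totals = 11_995_136 + 799_900 + 1_500_000_161 + 652_413_443

print(suspected_r2, expected_r2, reducer_8_from_totals)
```

Under these assumptions the suspected logic reproduces the 11,995,136 shown for Reducer 2, per-edge counting gives the 167 seen in the task logs, and the sum of the four map totals equals the 2,165,208,640 shown for Reducer 8.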


