Posted to issues@hive.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2021/04/29 07:02:00 UTC

[jira] [Work logged] (HIVE-25071) Number of reducers limited to fixed 1 when updating/deleting

     [ https://issues.apache.org/jira/browse/HIVE-25071?focusedWorklogId=590862&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-590862 ]

ASF GitHub Bot logged work on HIVE-25071:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 29/Apr/21 07:01
            Start Date: 29/Apr/21 07:01
    Worklog Time Spent: 10m 
      Work Description: kasakrisz opened a new pull request #2231:
URL: https://github.com/apache/hive/pull/2231


   ### What changes were proposed in this pull request?
   When creating the Reducer for bucketing:
   1. Remove the limitation on the number of reducers when updating/deleting
   2. Add ROWID to the sort columns
   
   ### Why are the changes needed?
   Limiting the number of reducers may lead to performance degradation. See the Jira for details.
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   ### How was this patch tested?
   ```
   mvn test -Dtest=TestTxnCommands#testDeleteIn -pl ql -Drat.skip
   ```
   ```
   mvn test -DskipSparkTests -Dtest=TestMiniLlapLocalCliDriver -Dqfile=llap_acid.q -pl itests/qtest -Pitests
   ```
   Search for log entries like
   ```
   2021-04-28T23:40:53,335 DEBUG [ec46860c-7238-4cbd-b14d-80b2e12fc54e main] parse.ParseDriver: Parsing command: 
   update orc_llap_n1 set cbigint = 2 where cint = 1
   ...
   2021-04-28T23:40:53,394  INFO [ec46860c-7238-4cbd-b14d-80b2e12fc54e main] optimizer.SetReducerParallelism: Number of reducers determined to be: 2
   ```
   in `hive.log`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

            Worklog Id:     (was: 590862)
    Remaining Estimate: 0h
            Time Spent: 10m

> Number of reducers limited to fixed 1 when updating/deleting
> ------------------------------------------------------------
>
>                 Key: HIVE-25071
>                 URL: https://issues.apache.org/jira/browse/HIVE-25071
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Krisztian Kasa
>            Assignee: Krisztian Kasa
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> When updating/deleting bucketed tables, an extra ReduceSink operator is created to enforce bucketing. After HIVE-22538 the number of reducers is limited to a fixed 1 in these RS operators.
> This can lead to performance degradation.
> Prior to HIVE-22538, multiple reducers were available in such cases. The reason for limiting the number of reducers is to ensure ascending RowId order in the delete delta files produced by update/delete statements.
> This is the plan of a delete statement like:
> {code}
> DELETE FROM t1 WHERE a = 1;
> {code}
> {code}
> TS[0]-FIL[8]-SEL[2]-RS[3]-SEL[4]-RS[5]-SEL[6]-FS[7]
> {code}
> RowId order is ensured by RS[3] and bucketing is enforced by RS[5]: the number of reducers was limited to the number of buckets in the table or hive.exec.reducers.max. However, RS[5] does not provide any ordering, so the above plan may generate unsorted delete deltas, which leads to corrupted data reads.
> Prior to HIVE-22538, these RS operators were merged by ReduceSinkDeduplication and the resulting RS kept the ordering while enabling multiple reducers. It could do so because ReduceSinkDeduplication was prepared for ACID writes. This was removed by HIVE-22538 to obtain a more generic ReduceSinkDeduplication.
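> As an illustration (hypothetical table and column names, not taken from the patch), a delete against a bucketed ACID table of the kind affected here might look like:
> {code}
> CREATE TABLE t1 (a INT, b STRING)
> CLUSTERED BY (a) INTO 4 BUCKETS
> STORED AS ORC
> TBLPROPERTIES ('transactional'='true');
>
> DELETE FROM t1 WHERE a = 1;
> {code}
> With the proposed change, such a statement is no longer forced onto a single reducer; adding ROWID to the sort columns is what keeps the delete deltas in order even when multiple reducers write them.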



--
This message was sent by Atlassian Jira
(v8.3.4#803005)