Posted to issues@hive.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2021/04/09 02:27:00 UTC

[jira] [Work logged] (HIVE-24854) Incremental Materialized view refresh in presence of update/delete operations

     [ https://issues.apache.org/jira/browse/HIVE-24854?focusedWorklogId=579697&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-579697 ]

ASF GitHub Bot logged work on HIVE-24854:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 09/Apr/21 02:26
            Start Date: 09/Apr/21 02:26
    Worklog Time Spent: 10m 
      Work Description: jcamachor commented on a change in pull request #2119:
URL: https://github.com/apache/hive/pull/2119#discussion_r610292866



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveTableScan.java
##########
@@ -263,6 +279,10 @@ public boolean isInsideView() {
     return insideView;
   }
 
+  public boolean isFetchDeletedRows() {
+    return fetchDeletedRows;
+  }
+
   // We need to include isInsideView inside digest to differentiate direct
   // tables and tables inside view. Otherwise, Calcite will treat them as the same.
   // Also include partition list key to trigger cost evaluation even if an

Review comment:
       The `computeDigest` method should include `fetchDeletedRows` in the computed digest.
   Otherwise, if a plan contains two TS operators on the same table, one with `fetchDeletedRows` set to `true` and the other with it set to `false`, the planner may treat them as equivalent, merge them, and keep only one of them.
   Maybe you can try to reproduce it with a custom test and add it to the patch.
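
The concern above can be illustrated with a minimal, self-contained sketch. It does not use Calcite or Hive classes: `DigestDemo`, `ToyScan`, and `distinctScans` are hypothetical names, and the digest-keyed map is only an assumed stand-in for how a planner might deduplicate operators by digest.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Toy model only: NOT the real HiveTableScan or the Calcite planner.
final class DigestDemo {

  // Hypothetical stand-in for a table scan (TS) operator.
  static final class ToyScan {
    final String table;
    final boolean fetchDeletedRows;

    ToyScan(String table, boolean fetchDeletedRows) {
      this.table = table;
      this.fetchDeletedRows = fetchDeletedRows;
    }

    // Digest that ignores the flag: scans differing only in the flag collide.
    String digestWithoutFlag() {
      return "TableScan(table=" + table + ")";
    }

    // Digest that includes the flag: the two scans stay distinct.
    String digestWithFlag() {
      return digestWithoutFlag() + "[fetchDeletedRows=" + fetchDeletedRows + "]";
    }
  }

  // Assumed planner behavior: operators are registered by digest, first one wins.
  static int distinctScans(List<ToyScan> scans, Function<ToyScan, String> digest) {
    Map<String, ToyScan> registry = new LinkedHashMap<>();
    for (ToyScan scan : scans) {
      registry.putIfAbsent(digest.apply(scan), scan);
    }
    return registry.size();
  }

  public static void main(String[] args) {
    List<ToyScan> scans = Arrays.asList(new ToyScan("t1", true), new ToyScan("t1", false));
    System.out.println(distinctScans(scans, ToyScan::digestWithoutFlag)); // 1: scans merged
    System.out.println(distinctScans(scans, ToyScan::digestWithFlag));    // 2: scans kept apart
  }
}
```

A custom test along these lines would presumably build a plan that scans the same table twice with different `fetchDeletedRows` values and check that both scans survive planning.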






Issue Time Tracking
-------------------

    Worklog Id:     (was: 579697)
    Time Spent: 20m  (was: 10m)

> Incremental Materialized view refresh in presence of update/delete operations
> -----------------------------------------------------------------------------
>
>                 Key: HIVE-24854
>                 URL: https://issues.apache.org/jira/browse/HIVE-24854
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Krisztian Kasa
>            Assignee: Krisztian Kasa
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> The current implementation of incremental materialized view rebuild cannot be used if any of the materialized view's source tables has had an update or delete operation since the last rebuild. In such cases a full rebuild has to be performed.
> Steps to enable incremental rebuild:
> 1. Introduce a new virtual column to mark a row deleted
> 2. Execute the query in the view definition 
> 2.a. Add a filter to each table scan so that it pulls only the rows of each source table that have a higher writeId than the writeId of the last rebuild - this is already implemented by the current incremental rebuild
> 2.b. Add the "row is deleted" virtual column to each table scan. In join nodes, if any of the branches has a deleted row, the result row is also deleted.
> We should distinguish two types of view definition queries: with and without Aggregate.
> 3.a No aggregate path:
> Rewrite the plan of the full rebuild to a multi-insert statement with two insert branches: one branch to insert new rows into the materialized view table and a second one to insert deleted rows into the materialized view delete delta.
> 3.b Aggregate path: TBD
> Prerequisite:
> source tables have not been compacted since the last MV rebuild
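
Steps 2.a, 2.b and 3.a of the description can be sketched in plain Java as follows. This is only an illustration under stated assumptions, not Hive code: `IncrementalRebuildSketch`, `Row`, `changedSince`, `join`, and `route` are hypothetical names, rows are modeled as in-memory objects, and the writeId given to a joined row is an arbitrary choice made for the sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model only; none of these names exist in Hive.
final class IncrementalRebuildSketch {

  // A source row carrying its writeId and the "row is deleted" marker (step 1).
  static final class Row {
    final long writeId;
    final String value;
    final boolean deleted;

    Row(long writeId, String value, boolean deleted) {
      this.writeId = writeId;
      this.value = value;
      this.deleted = deleted;
    }
  }

  // Step 2.a: keep only rows written after the last rebuild.
  static List<Row> changedSince(List<Row> rows, long lastRebuildWriteId) {
    List<Row> out = new ArrayList<>();
    for (Row r : rows) {
      if (r.writeId > lastRebuildWriteId) {
        out.add(r);
      }
    }
    return out;
  }

  // Step 2.b: a joined row is deleted if either input row is deleted.
  static Row join(Row left, Row right) {
    long writeId = Math.max(left.writeId, right.writeId); // arbitrary choice for the sketch
    return new Row(writeId, left.value + "|" + right.value, left.deleted || right.deleted);
  }

  // Step 3.a: two insert branches, one for new rows and one for the delete delta.
  static void route(List<Row> rows, List<Row> inserts, List<Row> deleteDelta) {
    for (Row r : rows) {
      (r.deleted ? deleteDelta : inserts).add(r);
    }
  }

  public static void main(String[] args) {
    Row live = new Row(7, "a", false);
    Row gone = new Row(8, "b", true);
    List<Row> inserts = new ArrayList<>();
    List<Row> deleteDelta = new ArrayList<>();
    List<Row> joined = new ArrayList<>();
    joined.add(join(live, live));
    joined.add(join(live, gone));
    route(joined, inserts, deleteDelta);
    System.out.println(inserts.size() + " insert(s), " + deleteDelta.size() + " delete(s)");
  }
}
```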


