You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hive.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2020/07/08 14:24:00 UTC

[jira] [Work logged] (HIVE-23817) Pushing TopN Key operator PKFK inner joins

     [ https://issues.apache.org/jira/browse/HIVE-23817?focusedWorklogId=456166&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-456166 ]

ASF GitHub Bot logged work on HIVE-23817:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 08/Jul/20 14:23
            Start Date: 08/Jul/20 14:23
    Worklog Time Spent: 10m 
      Work Description: zeroflag opened a new pull request #1228:
URL: https://github.com/apache/hive/pull/1228


   ## NOTICE (work in progress)
   
   ### Pushing the TopNKey operator through PK-FK inner joins.
   
   Example: 
   
   Customer table:
   
   ID (PK) | LAST_NAME
   -- | --
   1 | Robinson
   2 | Jones
   3 | Smith
   4 | Heisenberg
   
   Order table:
   
   CUSTOMER_ID (FK) | AMOUNT
   -- | --
   1 | 100
   1 | 50
   2 | 200
   3 | 30
   3 | 40
   
   #### Requirements for doing TopN Key pushdown.
   
   * The PRIMARY KEY constraint on Customer.ID that forbids NULL and duplicate values.
   * The NOT_NULL constraint on Order.CUSTOMER_ID that forbids NULL values.
   * Plus the FOREIGN KEY constraint between Customer.ID and Order.CUSTOMER_ID ensures that exactly one row exists in the Customer table for any given row in the Order table.
   
   In general if the first n of the order by columns are coming from the child table (FK) then we can copy the TopNKey operator with the first n columns and put it before the join. If all columns are coming from the child table we can move the TopNKey operator without keeping the original.
   
   ```
   SELECT * FROM Customer, Order 
   WHERE Customer.ID = Order.CUSTOMER_ID 
   ORDER BY Order.AMOUNT, [Order.*], [Customer.*] LIMIT 3;
   ```
   
   Result:
   
   CUSTOMER.ID (PK) | CUSTOMER.LAST_NAME | ORDER.AMOUNT
   -- | -- | --
   3 | Smith | 30
   3 | Smith | 40
   1 | Robinson | 50
   1 | Robinson | 100
   2 | Jones | 200
   
   Plan
   
   ```
   Top N Key Operator
             sort order: +
             keys: ORDER.AMOUNT, [ORDER.*]
             top n: 3
     Select Operator (Order)
     [...]
     Join
       [...]
        Top N Key Operator
             sort order: +
             keys: ORDER.AMOUNT, [ORDER.*], [Customer.*]
             top n: 3
    
   ```
   
   #### Implementation notes
   
   PkFk join information is extracted on the calcite side and it is attached (child table index & name) to the AST as a query hint.
   At the physical plan level we make use of this information to decide if we can push through the topn key operator. We also need to get the origins of the columns (in the order by) to see if they're coming from the child table.
   
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

            Worklog Id:     (was: 456166)
    Remaining Estimate: 0h
            Time Spent: 10m

> Pushing TopN Key operator PKFK inner joins
> ------------------------------------------
>
>                 Key: HIVE-23817
>                 URL: https://issues.apache.org/jira/browse/HIVE-23817
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Attila Magyar
>            Assignee: Attila Magyar
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> If there is primary key foreign key relationship between the tables we can push the topnkey operator through the join.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)