Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2022/06/27 03:48:09 UTC

[GitHub] [doris] EmmyMiao87 opened a new pull request, #10437: [fix](vectorized) Support outer join for vectorized exec engine

EmmyMiao87 opened a new pull request, #10437:
URL: https://github.com/apache/doris/pull/10437

   # Proposed changes
   
   Issue Number: close #xxx
   
   ## Problem Summary:
   
   This PR mainly merges the contents of two PRs:
   1.  [fix](vectorized) Support outer join for vectorized exec engine (https://github.com/apache/doris/pull/10323)
   2. [Fix](Join) Fix the bug of outer join function under vectorization #9954
   
   The first PR is described below.
   In the vectorized engine, the query plan generates a new tuple for the join node.
   This tuple describes the output schema of the join node.
   It is needed because the input schema of the join node can differ from its output schema.
   For example:
   1. Columns on the null side of an outer join are converted to nullable.
   2. The projection of the outer tuple.
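
The idea of a separate output tuple can be illustrated with a minimal sketch. This is not the Doris implementation: `Slot`, `compute_output_slots`, and the join-type strings are hypothetical names chosen for illustration. It only shows how an output schema could differ from the input schema once null-side columns become nullable.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Slot:
    """Hypothetical stand-in for a slot descriptor: a column name plus its nullable flag."""
    name: str
    nullable: bool


def compute_output_slots(left: List[Slot], right: List[Slot], join_type: str) -> List[Slot]:
    """Build the join's output slots; slots on the null side become nullable."""
    left_is_null_side = join_type in ("RIGHT OUTER", "FULL OUTER")
    right_is_null_side = join_type in ("LEFT OUTER", "FULL OUTER")
    output = [Slot(s.name, s.nullable or left_is_null_side) for s in left]
    output += [Slot(s.name, s.nullable or right_is_null_side) for s in right]
    return output


# Input schema: both columns are declared NOT NULL.
left_input = [Slot("tmp.k1", False)]
right_input = [Slot("b.k1", False)]

# Output schema of a right outer join: the left side is the null side.
result = compute_output_slots(left_input, right_input, "RIGHT OUTER")
```

Here `result` marks `tmp.k1` as nullable while `b.k1` stays non-nullable, so the output schema no longer matches the input schema.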
   
   The second PR is described below.
   It mainly fixes the following problems:
   1. Queries that combine an inline view with an outer join. After a tuple is added to the join operator, the position of the `tupleisnull` function is inconsistent with the row-based engine. The vectorized `tupleisnull` is now calculated in the HashJoinNode.computeOutputTuple() function.
   2. Incorrect column nullable properties. Now, whenever an outer join occurs, the columns on the null side are planned as nullable in the semantic analysis stage.
   
   ### About 1:
   For example:
   ```
   select * from (select a as k1 from test) tmp right join b on tmp.k1=b.k1
   ```
   In this case, the nullable property of column k1 in the `tmp` inline view should be true.
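
To see why k1 must be nullable here, the query can be tried against a small in-memory database. The sketch below uses Python's `sqlite3` module rather than Doris, with made-up sample data, and rewrites the right join as the equivalent left join because SQLite versions before 3.39 do not support `RIGHT JOIN`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test (a INTEGER NOT NULL);
    CREATE TABLE b (k1 INTEGER NOT NULL);
    INSERT INTO test VALUES (1), (2);
    INSERT INTO b VALUES (2), (3);
""")

# Same result set as:
#   select * from (select a as k1 from test) tmp right join b on tmp.k1 = b.k1
# written as a left join with the sides swapped.
rows = conn.execute("""
    SELECT tmp.k1, b.k1
    FROM b LEFT JOIN (SELECT a AS k1 FROM test) tmp ON tmp.k1 = b.k1
    ORDER BY b.k1
""").fetchall()

for row in rows:
    print(row)
conn.close()
```

Although `test.a` is declared `NOT NULL`, the row of b with no match in tmp carries a NULL in the `tmp.k1` position, which is exactly the case the nullable property has to cover.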
   
   In the vectorized code, the virtual `tableRef` of tmp is used to construct the output tuple of HashJoinNode (specifically, in the function HashJoinNode.computeOutputTuple()). So the **correctness** of the column nullable property of this tableRef is very important.
   In the above case, the tmp table is right-joined with the b table, so tmp is the null side, and the columns involved in the tmp table must be changed to nullable.
   
   In non-vectorized code, the virtual tableRef tmp is not used at all; the `TupleIsNull` function in `outputSmap` ensures data correctness.
   That is to say, column a of the original table test remains non-nullable, and this does not affect the correctness of the result.
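
The row-based behaviour described above can be sketched roughly as follows. This assumes the wrapping has the shape `if(TupleIsNull(tuple), NULL, expr)`; the row layout and the helper names `tuple_is_null` and `wrap_with_tuple_is_null` are hypothetical and are not taken from the Doris code.

```python
from typing import Callable, Dict, Optional

# Hypothetical row layout: tuple name -> column values, or None when the
# whole tuple is missing because the outer join found no match.
Row = Dict[str, Optional[Dict[str, int]]]
Expr = Callable[[Row], Optional[int]]


def tuple_is_null(row: Row, tuple_name: str) -> bool:
    """True when the named tuple was null-filled by the outer join."""
    return row[tuple_name] is None


def wrap_with_tuple_is_null(tuple_name: str, expr: Expr) -> Expr:
    """Return an expression that yields NULL whenever the source tuple is null."""
    def wrapped(row: Row) -> Optional[int]:
        if tuple_is_null(row, tuple_name):
            return None
        return expr(row)
    return wrapped


# tmp.k1 is defined as test.a, which is non-nullable in the base table.
k1 = wrap_with_tuple_is_null("test", lambda row: row["test"]["a"])

matched: Row = {"test": {"a": 2}, "b": {"k1": 2}}
unmatched: Row = {"test": None, "b": {"k1": 3}}

matched_value = k1(matched)
unmatched_value = k1(unmatched)
```

Under this assumption the NULL comes from the tuple-level check, so column a never needs its own nullable flag in the row-based path.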
   
   ### About 2:
   The vectorized engine has very strict requirements on the nullable attribute.
   An outer join changes the nullable attribute of the join columns, which in turn changes the nullable attribute of the columns in the upper operators layer by layer.
   However, FE has no mechanism to modify the nullable attribute in the upper operator tuples layer by layer after the analyzer has run.
   So at present, we can only preset the attributes before the lower join as nullable in advance, in the analyzer stage, so as to avoid the problem.
   (At the same time, BE also contains some defensive code to deal with the problem of null to non-null.)
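
The ordering problem can be illustrated with a toy model. It assumes, for illustration only, that an upper operator copies the nullable flag of a lower slot once at analysis time and never revisits it; `Slot`, `derive_upper_slot`, and `analyze` are hypothetical names, not FE classes or methods.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Slot:
    """Hypothetical slot descriptor: a column name plus its nullable flag."""
    name: str
    nullable: bool


def derive_upper_slot(lower: Slot) -> Slot:
    """An upper operator snapshots the lower slot's nullable flag once."""
    return Slot(name=lower.name, nullable=lower.nullable)


def analyze(preset_null_side: bool) -> Tuple[Slot, Slot]:
    lower = Slot("tmp.k1", nullable=False)
    if preset_null_side:
        # Preset: mark the null-side slot nullable before anything depends on it.
        lower.nullable = True
    upper = derive_upper_slot(lower)
    if not preset_null_side:
        # Late change: the join flips the flag after the upper slot was derived,
        # and nothing propagates the change upward.
        lower.nullable = True
    return lower, upper


late_lower, late_upper = analyze(preset_null_side=False)
preset_lower, preset_upper = analyze(preset_null_side=True)
```

With the late change, `late_upper` still claims non-nullable while `late_lower` is nullable; with the preset, both layers agree.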
   
   
   ## Checklist(Required)
   
   1. Does it affect the original behavior: (Yes/No/I Don't know)
   2. Has unit tests been added: (Yes/No/No Need)
   4. Has document been added or modified: (Yes/No/No Need)
   5. Does it need to update dependencies: (Yes/No)
   6. Are there any changes that cannot be rolled back: (Yes/No)
   
   ## Further comments
   
   If this is a relatively large or complex change, kick off the discussion at [dev@doris.apache.org](mailto:dev@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc...
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] morningman closed pull request #10437: [fix](vectorized) Support outer join for vectorized exec engine

Posted by GitBox <gi...@apache.org>.
morningman closed pull request #10437: [fix](vectorized) Support outer join for vectorized exec engine
URL: https://github.com/apache/doris/pull/10437

