You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2021/09/25 10:35:45 UTC

[GitHub] [incubator-doris] EmmyMiao87 opened a new issue #6746: [Feature] Lateral view

EmmyMiao87 opened a new issue #6746:
URL: https://github.com/apache/incubator-doris/issues/6746


   ### Search before asking
   
   - [X] I had searched in the [issues](https://github.com/apache/incubator-doris/issues?q=is%3Aissue) and found no similar issues.
   
   
   ### Description
   
   The functions in SQL can be divided into the following categories according to the number of input and output rows:
   
   1. Scalar function
   2. Aggregate function
   3. Table Function: One line of input corresponds to multiple lines of output.
   
   Doris currently supports the first two functions, but does not support the third.
   
   Table Function is usually used in operations such as column to row. For example, the following table data:
   
   | id | names |
   |---|---|
   | 1 | a, b, c |
   | 2 | d, e, f |
   
   If you want to output multiple comma-separated data in the names column in rows as follows:
   
   | id | name |
   |---|---|
   | 1 | a |
   | 1 | b |
   | 1 | c |
   | 2 | d |
   | 2 | e |
   | 2 | f |
   
   You need to use the Table Function, which converts one row of data into multiple rows.
   
   Common Table Function, such as explode in hive, or unnest in presto, can convert array or list types into multi-line output. like:
   
   `select explode(array_col1) from tbl;`
   
   # Design
   
   ## Syntax
   
   ```
   lateralView: LATERAL VIEW udtf(expression) tableAlias AS columnAlias (',' columnAlias)
   fromClause: FROM baseTable (lateralView)
   ```
   
   ### Multiple Lateral Views
   
   A FROM clause can have multiple LATERAL VIEW clauses. Subsequent LATERAL VIEWS can reference columns from any of the tables appearing to the left of the LATERAL VIEW.
   
   For example, the following could be a valid query:
   
   ```
   SELECT * FROM exampleTable
   LATERAL VIEW explode_split(col1, ",") myTable1 AS myCol1
   LATERAL VIEW explode(col2, ",") myTable2 AS myCol2;
   ```
   
   ## TableFunctionNode
   
   The result of Table Function is multiple rows, which can also be regarded as a relational table. So we need to implement a new TableFunctionNode. It contains the TableFunction that needs to be processed. For each row of TableFunctionNode input, execute the corresponding TableFunction, and then perform the correlation operation and output.
   
   ```
   MySQL [test]> explain select k1, e1, e2 from test_explode lateral view explode_split (col1, ",") tmp as e1  lateral view explode_split (col2, ",") tmp as e2;
   +---------------------------------------------------------------------------+
   | Explain String                                                            |
   +---------------------------------------------------------------------------+
   | PLAN FRAGMENT 0                                                           |
   |  OUTPUT EXPRS:`k1` | `e1` | 'e2'                                                 |
   |                                                                           |
   |   RESULT SINK                                                             |
   |                                                                           |
   |   1:TABLE FUNCTION NODE                                                   |
   |   |  table function: explode_split(`col1`, ','), explode_split('col2', ',')                             |
   |   |                                                                       |
   |   0:OlapScanNode                                                          |
   |      TABLE: test_explode                                                  |
   +---------------------------------------------------------------------------+
   ```
   
   
   ### Use case
   
   _No response_
   
   ### Related issues
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] morningman commented on issue #6746: [Feature] Lateral view

Posted by GitBox <gi...@apache.org>.
morningman commented on issue #6746:
URL: https://github.com/apache/incubator-doris/issues/6746#issuecomment-982366698


   ## Execution of Table Function Node
   
   Table Function Node (TFN) contains one or more Table Functions, and its main logic is to expand the data received from the child nodes into multiple rows through the Table Function and return the data to the upper layer. The main execution process is as follows:
   
   1. Get a row of data from the child node child row.
   2. Pass the child row into each table function, and each table function will calculate and get a result set: S1, S2,...
   3. Do the Cartesian product of child row and each result set and send it to the upper layer.
   
   for example. Suppose the child row has 3 columns, k1, v1, v2:
   
   | k1 | v1 | v2 |
   |---|---|---|
   | 1 | "a,b,c" | "4,5,6" |
   
   Two Table Functions: `explod_split(v1,',')` and `explode_split(v2,',')` respectively produce the following result sets:
   
   | `explod_split(v1,',')` |
   |---|
   | "a" |
   | "b" |
   | "c" |
   
   | `explode_split(v2,',')` |
   |---|
   | "4" |
   | "5" |
   | "6" |
   
   The final Cartesian product result is:
   
   | k1 | `explod_split(v1,',')` | `explode_split(v2,',')` |
   |---|---|---|
   | 1 | "a" | "4" |
   | 1 | "a" | "5" |
   | 1 | "a" | "6" |
   | 1 | "b" | "4" |
   | 1 | "b" | "5" |
   | 1 | "b" | "6" |
   | 1 | "c" | "4" |
   | 1 | "c" | "5" |
   | 1 | "c" | "6" |
   
   ### Table Function Interface Design
   
   Because Doris does not currently support complex data types (such as Array), and Table Function is essentially an expression that returns an array type. So in this implementation, we will do special treatment for Table Function.
   
   1. DummyTableFunctions
   
       This is a deception class. Its main purpose is to generate the scalar function signature of the table function on the BE side to facilitate query planning on the FE side, and to use the existing scalar function framework when the BE performs parameter expression calculations. In other words, in the planning and execution preparation stages of the entire query, Table Function is treated as a scalar function.
   
   2. TableFunctionFactory
   
       The factory class of Table Function returns real Table Function instances based on the function name. Currently only supports matching Function by function name.
       
   3. TableFunction
   
       The actual Table Function implementation class. Provide the following interfaces:
       
       1. prepare()/open()
       
           Some preparations, such as calculation of constant expressions, memory allocation for intermediate result sets, and so on.
       
       3. process(row)
   
           According to the incoming data (row), calculate the Table Function result set.
       
       4. reset()
   
           Because of the Cartesian product relationship between multiple Table Functions, all the result sets of a Function may be traversed multiple times. This method will set the cursor of the result set to the initial position in order to continue the traversal.
       
       5. get_value()
   
           Get the value of the position pointed by the current cursor.
       
       6. forward()
   
           Move the cursor forward, then you can call get_value() to get the next value
       
       6. close()
   
           The cleanup work after Function execution.
           
       The subclasses of TableFunction are concrete implementations of each Table Function. The following three functions are implemented in this issue:
       
       1. `explode_split(str, delimiter)`
   
           Split str into multiple strings according to delimiter.
           
       2. `explode_json_array_xxx(json_str)`
   
           Split a json array. According to the type of elements in the json array, xxx can be string, int or double
           
       3. `explode_bitmap(bitmap)`
   
           Expand a bitmap and return the value of each element in the bitmap.
           
   ### Table Function Node Interface Design
   
    Table Function Node inherits from Exec Node. There are the following interfaces:
    
    1. init()
   
       Some initialization work, including obtaining Table Function objects, etc.
       
   2. prepare()/open()
   
       Some preparations. For example, prepare()/open() of the call expression
       
   3. get_next()
   
       Get a batch of results. Here, get_next() of the child node will be called to get the child node data first, then calculate the result of the Table Function, and return the data after the association.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] EmmyMiao87 commented on issue #6746: [Feature] Lateral view

Posted by GitBox <gi...@apache.org>.
EmmyMiao87 commented on issue #6746:
URL: https://github.com/apache/incubator-doris/issues/6746#issuecomment-966963086


   TODO:
   1. table function from subquery 
   ```
   select * from (select k1, bitmap_union_string(v1) as new_v1 from table) lateral view explode_split(new_v1) t as tb_v1;
   ```
   2. int result type of explode_split(string, ",")
   3. reduce memory copy in output tuple of table function


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] yangzhg closed issue #6746: [Feature] Lateral view

Posted by GitBox <gi...@apache.org>.
yangzhg closed issue #6746:
URL: https://github.com/apache/incubator-doris/issues/6746


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] yangzhg closed issue #6746: [Feature] Lateral view

Posted by GitBox <gi...@apache.org>.
yangzhg closed issue #6746:
URL: https://github.com/apache/incubator-doris/issues/6746


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org