Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2021/11/30 07:45:17 UTC

[GitHub] [incubator-doris] morningman commented on issue #6746: [Feature] Lateral view

morningman commented on issue #6746:
URL: https://github.com/apache/incubator-doris/issues/6746#issuecomment-982366698


   ## Execution of Table Function Node
   
   A Table Function Node (TFN) contains one or more Table Functions. Its main job is to expand each row received from its child node into multiple rows by applying the Table Functions, and to return the expanded rows to the upper layer. The main execution process is as follows:
   
   1. Get a row of data (the child row) from the child node.
   2. Pass the child row into each table function. Each table function computes a result set: S1, S2, ...
   3. Compute the Cartesian product of the child row and all result sets, and send it to the upper layer.
   
   For example, suppose the child row has 3 columns, k1, v1, and v2:
   
   | k1 | v1 | v2 |
   |---|---|---|
   | 1 | "a,b,c" | "4,5,6" |
   
   The two Table Functions `explode_split(v1,',')` and `explode_split(v2,',')` produce the following result sets, respectively:
   
   | `explode_split(v1,',')` |
   |---|
   | "a" |
   | "b" |
   | "c" |
   
   | `explode_split(v2,',')` |
   |---|
   | "4" |
   | "5" |
   | "6" |
   
   The final Cartesian product result is:
   
   | k1 | `explode_split(v1,',')` | `explode_split(v2,',')` |
   |---|---|---|
   | 1 | "a" | "4" |
   | 1 | "a" | "5" |
   | 1 | "a" | "6" |
   | 1 | "b" | "4" |
   | 1 | "b" | "5" |
   | 1 | "b" | "6" |
   | 1 | "c" | "4" |
   | 1 | "c" | "5" |
   | 1 | "c" | "6" |
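
   The worked example above can be reproduced with a short sketch. This is only an illustration of the three execution steps, not the Doris implementation: `expand_row` and the dictionary-based row are hypothetical names, and the Python `explode_split` here is a stand-in built on `str.split`.

```python
from itertools import product


def explode_split(value, delimiter):
    """Hypothetical stand-in for explode_split: split a string into parts."""
    return value.split(delimiter)


def expand_row(child_row, table_functions):
    """Expand one child row into the Cartesian product of all result sets."""
    # Step 2: every table function computes its own result set S1, S2, ...
    result_sets = [fn(child_row) for fn in table_functions]
    # Step 3: combine k1 with one value taken from each result set.
    return [(child_row["k1"], *combo) for combo in product(*result_sets)]


# Step 1: the row received from the child node.
child_row = {"k1": 1, "v1": "a,b,c", "v2": "4,5,6"}
table_functions = [
    lambda row: explode_split(row["v1"], ","),
    lambda row: explode_split(row["v2"], ","),
]

rows = expand_row(child_row, table_functions)
for row in rows:
    print(row)
```

   The nine printed tuples follow the same order as the table above, because `itertools.product` varies the last result set fastest.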
   
   ### Table Function Interface Design
   
   Doris does not currently support complex data types (such as Array), while a Table Function is essentially an expression that returns an array type. So in this implementation, Table Functions receive special treatment.
   
   1. DummyTableFunctions
   
       This is a placeholder class. Its main purpose is to generate the scalar function signature of each table function on the BE side, which facilitates query planning on the FE side and lets the BE use the existing scalar function framework when it evaluates the parameter expressions. In other words, during the planning and execution-preparation stages of a query, a Table Function is treated as a scalar function.
   
   2. TableFunctionFactory
   
       The factory class for Table Functions. It returns a real Table Function instance based on the function name. Currently, functions can only be matched by name.
       
   3. TableFunction
   
       The actual Table Function implementation class. It provides the following interfaces:
       
       1. prepare()/open()
       
           Some preparations, such as calculation of constant expressions, memory allocation for intermediate result sets, and so on.
       
       2. process(row)

           Calculate the Table Function result set from the incoming data (row).

       3. reset()

           Because the result sets of multiple Table Functions are combined as a Cartesian product, the result set of a function may be traversed multiple times. This method moves the cursor of the result set back to the initial position so that the traversal can start again.

       4. get_value()

           Get the value at the position the cursor currently points to.

       5. forward()

           Move the cursor forward by one position. After that, get_value() returns the next value.

       6. close()

           Cleanup work after the function has finished executing.
           
       The subclasses of TableFunction are concrete implementations of each Table Function. The following three functions are implemented in this issue:
       
       1. `explode_split(str, delimiter)`
   
           Split str into multiple strings according to delimiter.
           
       2. `explode_json_array_xxx(json_str)`
   
           Expand a JSON array. Depending on the type of the elements in the JSON array, xxx can be string, int, or double.
           
       3. `explode_bitmap(bitmap)`
   
           Expand a bitmap and return the value of each element in the bitmap.
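
   To make the cursor-style interface more concrete, here is a minimal sketch of a factory plus two of the functions. It is an illustration under assumptions rather than the BE code: the `eos()` helper, the `drain()` function, the registry dictionary, and passing the already evaluated arguments as a tuple to `process()` are all hypothetical, and `explode_bitmap` is left out.

```python
import json


class TableFunction:
    """Hypothetical base class mirroring the interface listed above."""

    def __init__(self):
        self._values = []  # result set computed for the current input row
        self._cursor = 0   # current position inside the result set

    def prepare(self):
        pass  # e.g. evaluate constant arguments (nothing to do in this sketch)

    def open(self):
        pass

    def process(self, row):
        raise NotImplementedError

    def reset(self):
        self._cursor = 0

    def eos(self):
        """Assumed helper (not in the list above): cursor is past the end."""
        return self._cursor >= len(self._values)

    def get_value(self):
        return self._values[self._cursor]

    def forward(self):
        self._cursor += 1

    def close(self):
        self._values = []
        self._cursor = 0


class ExplodeSplit(TableFunction):
    def process(self, row):
        text, delimiter = row  # assumed: arguments arrive already evaluated
        self._values = text.split(delimiter)
        self._cursor = 0


class ExplodeJsonArrayInt(TableFunction):
    def process(self, row):
        (json_str,) = row
        self._values = [int(v) for v in json.loads(json_str)]
        self._cursor = 0


class TableFunctionFactory:
    # Functions are looked up by name only, as described above.
    _registry = {
        "explode_split": ExplodeSplit,
        "explode_json_array_int": ExplodeJsonArrayInt,
    }

    @classmethod
    def create(cls, name):
        return cls._registry[name]()


def drain(fn):
    """Collect every value from the cursor's current position to the end."""
    values = []
    while not fn.eos():
        values.append(fn.get_value())
        fn.forward()
    return values


split_fn = TableFunctionFactory.create("explode_split")
split_fn.process(("a,b,c", ","))
first_pass = drain(split_fn)
split_fn.reset()  # rewind so the same result set can be traversed again
second_pass = drain(split_fn)

json_fn = TableFunctionFactory.create("explode_json_array_int")
json_fn.process(("[1, 2, 3]",))
json_values = drain(json_fn)
```

   Calling `reset()` between the two passes is what allows one result set to be replayed for every value of another function.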
           
   ### Table Function Node Interface Design
   
   Table Function Node inherits from Exec Node. It has the following interfaces:
   
   1. init()
   
       Some initialization work, including obtaining Table Function objects, etc.
       
   2. prepare()/open()
   
       Some preparations, for example calling prepare()/open() on the expressions.
       
   3. get_next()
   
       Get a batch of results. This first calls get_next() on the child node to get the child data, then calculates the Table Function results, and finally returns the rows produced by combining the two.
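
   One possible shape for get_next() is sketched below. It walks the Cartesian product like an odometer, using only process(), reset(), get_value() and forward(), so no result set has to be materialized more than once. Everything here is an assumption made for illustration: the `ListTableFunction` stand-in, the `eos()` helper, the `batch_size` and `key_column` parameters, and the choice to skip child rows for which some function returns an empty result set.

```python
class ListTableFunction:
    """Minimal stand-in for a TableFunction: splits one column of the row."""

    def __init__(self, column, delimiter=","):
        self.column = column
        self.delimiter = delimiter
        self.values = []
        self.cursor = 0

    def process(self, row):
        self.values = row[self.column].split(self.delimiter)
        self.cursor = 0

    def reset(self):
        self.cursor = 0

    def eos(self):  # assumed helper, not part of the listed interface
        return self.cursor >= len(self.values)

    def get_value(self):
        return self.values[self.cursor]

    def forward(self):
        self.cursor += 1


class TableFunctionNode:
    def __init__(self, child_rows, functions, key_column, batch_size):
        self.child = iter(child_rows)  # stands in for the child node
        self.functions = functions
        self.key_column = key_column
        self.batch_size = batch_size
        self.current_row = None

    def _load_next_child_row(self):
        """Fetch child rows until one yields a non-empty product (assumption)."""
        for row in self.child:
            for fn in self.functions:
                fn.process(row)
            if all(not fn.eos() for fn in self.functions):
                self.current_row = row
                return True
        self.current_row = None
        return False

    def _advance(self):
        """Odometer step: forward the last function, carry left on overflow."""
        for i in reversed(range(len(self.functions))):
            self.functions[i].forward()
            if not self.functions[i].eos():
                return True
            self.functions[i].reset()
        return False  # every combination for the current row was emitted

    def get_next(self):
        """Return up to batch_size rows; an empty list means end of stream."""
        batch = []
        while len(batch) < self.batch_size:
            if self.current_row is None and not self._load_next_child_row():
                break
            values = tuple(fn.get_value() for fn in self.functions)
            batch.append((self.current_row[self.key_column], *values))
            if not self._advance():
                self.current_row = None
        return batch


node = TableFunctionNode(
    child_rows=[{"k1": 1, "v1": "a,b,c", "v2": "4,5,6"}],
    functions=[ListTableFunction("v1"), ListTableFunction("v2")],
    key_column="k1",
    batch_size=4,
)

batches = []
while True:
    batch = node.get_next()
    if not batch:
        break
    batches.append(batch)
```

   With a batch size of 4, the nine output rows of the earlier example arrive in three batches, and the cursor state kept between calls lets each get_next() resume where the previous one stopped.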


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


