Posted to commits@spark.apache.org by ya...@apache.org on 2020/08/24 00:50:36 UTC

[spark] branch branch-3.0 updated: [SPARK-32552][SQL][DOCS] Complete the documentation for Table-valued Function

This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new da60de5  [SPARK-32552][SQL][DOCS] Complete the documentation for Table-valued Function
da60de5 is described below

commit da60de563a92bb85902681fb0569b43bbc489559
Author: Huaxin Gao <hu...@us.ibm.com>
AuthorDate: Mon Aug 24 09:43:41 2020 +0900

    [SPARK-32552][SQL][DOCS] Complete the documentation for Table-valued Function
    
    ### What changes were proposed in this pull request?
    There are two types of TVF, but only one type was documented. This PR adds the documentation for the second type.
    
    ### Why are the changes needed?
    Completes the table-valued function documentation.
    
    ### Does this PR introduce _any_ user-facing change?
    <img width="1099" alt="Screen Shot 2020-08-06 at 5 30 25 PM" src="https://user-images.githubusercontent.com/13592258/89595926-c5eae680-d80a-11ea-918b-0c3646f9930e.png">
    
    <img width="1100" alt="Screen Shot 2020-08-06 at 5 30 49 PM" src="https://user-images.githubusercontent.com/13592258/89595929-c84d4080-d80a-11ea-9803-30eb502ccd05.png">
    
    <img width="1101" alt="Screen Shot 2020-08-06 at 5 31 19 PM" src="https://user-images.githubusercontent.com/13592258/89595931-ca170400-d80a-11ea-8812-2f009746edac.png">
    
    <img width="1100" alt="Screen Shot 2020-08-06 at 5 31 40 PM" src="https://user-images.githubusercontent.com/13592258/89595934-cb483100-d80a-11ea-9e18-9357aa9f2c5c.png">
    
    ### How was this patch tested?
    Built the docs manually and checked the result.
    
    Closes #29355 from huaxingao/tvf.
    
    Authored-by: Huaxin Gao <hu...@us.ibm.com>
    Signed-off-by: Takeshi Yamamuro <ya...@apache.org>
    (cherry picked from commit db74fd0d3320f120540133094a9975963941b98c)
    Signed-off-by: Takeshi Yamamuro <ya...@apache.org>
---
 docs/sql-ref-syntax-qry-select-tvf.md | 99 ++++++++++++++++++++++++++++-------
 1 file changed, 80 insertions(+), 19 deletions(-)

diff --git a/docs/sql-ref-syntax-qry-select-tvf.md b/docs/sql-ref-syntax-qry-select-tvf.md
index cc8d7c34..b04e2f5 100644
--- a/docs/sql-ref-syntax-qry-select-tvf.md
+++ b/docs/sql-ref-syntax-qry-select-tvf.md
@@ -21,28 +21,14 @@ license: |
 
 ### Description
 
-A table-valued function (TVF) is a function that returns a relation or a set of rows.
-
-### Syntax
-
-```sql
-function_name ( expression [ , ... ] ) [ table_alias ]
-```
-
-### Parameters
-
-* **expression**
-
-    Specifies a combination of one or more values, operators and SQL functions that results in a value.
-
-* **table_alias**
-
-    Specifies a temporary name with an optional column name list.
-
-    **Syntax:** `[ AS ] table_name [ ( column_name [ , ... ] ) ]`
+A table-valued function (TVF) is a function that returns a relation or a set of rows. There are two types of TVFs in Spark SQL:
+1. a TVF that can be specified in a FROM clause, e.g. range;
+2. a TVF that can be specified in SELECT/LATERAL VIEW clauses, e.g. explode.
 
 ### Supported Table-valued Functions
 
+#### TVFs that can be specified in a FROM clause:
+
 |Function|Argument Type(s)|Description|
 |--------|----------------|-----------|
 |**range** ( *end* )|Long|Creates a table with a single *LongType* column named *id*, <br/> containing rows in a range from 0 to *end* (exclusive) with step value 1.|
@@ -50,6 +36,20 @@ function_name ( expression [ , ... ] ) [ table_alias ]
 |**range** ( *start, end, step* )|Long, Long, Long|Creates a table with a single *LongType* column named *id*, <br/> containing rows in a range from *start* to *end* (exclusive) with *step* value.|
 |**range** ( *start, end, step, numPartitions* )|Long, Long, Long, Int|Creates a table with a single *LongType* column named *id*, <br/> containing rows in a range from *start* to *end* (exclusive) with *step* value, with partition number *numPartitions* specified.|
 
+#### TVFs that can be specified in SELECT/LATERAL VIEW clauses:
+
+|Function|Argument Type(s)|Description|
+|--------|----------------|-----------|
+|**explode** ( *expr* )|Array/Map|Separates the elements of array *expr* into multiple rows, or the elements of map *expr* into multiple rows and columns. Unless specified otherwise, uses the default column name col for elements of the array or key and value for the elements of the map.|
+|**explode_outer** <br> ( *expr* )|Array/Map|Separates the elements of array *expr* into multiple rows, or the elements of map *expr* into multiple rows and columns. Unlike **explode**, produces a row of null values rather than no rows when *expr* is null or empty. Unless specified otherwise, uses the default column name col for elements of the array or key and value for the elements of the map.|
+|**inline** ( *expr* )|Expression|Explodes an array of structs into a table. Uses column names col1, col2, etc. by default unless specified otherwise.|
+|**inline_outer** <br> ( *expr* )|Expression|Explodes an array of structs into a table. Unlike **inline**, produces a row of null values rather than no rows when *expr* is null or empty. Uses column names col1, col2, etc. by default unless specified otherwise.|
+|**posexplode** <br> ( *expr* )|Array/Map|Separates the elements of array *expr* into multiple rows with positions, or the elements of map *expr* into multiple rows and columns with positions. Unless specified otherwise, uses the column name pos for position, col for elements of the array or key and value for elements of the map.|
+|**posexplode_outer** ( *expr* )|Array/Map|Separates the elements of array *expr* into multiple rows with positions, or the elements of map *expr* into multiple rows and columns with positions. Unlike **posexplode**, produces a row of null values rather than no rows when *expr* is null or empty. Unless specified otherwise, uses the column name pos for position, col for elements of the array or key and value for elements of the map.|
+|**stack** ( *n, expr1, ..., exprk* )|Seq[Expression]|Separates *expr1, ..., exprk* into *n* rows. Uses column names col0, col1, etc. by default unless specified otherwise.|
+|**json_tuple** <br> ( *jsonStr, p1, p2, ..., pn* )|Seq[Expression]|Returns a tuple like the function *get_json_object*, but it takes multiple names. All the input parameters and output column types are string.|
+|**parse_url** <br> ( *url, partToExtract[, key]* )|Seq[Expression]|Extracts a part from a URL.|
+
 ### Examples
 
 ```sql
@@ -98,8 +98,69 @@ SELECT * FROM range(5, 8) AS test;
 |  6|
 |  7|
 +---+
+
+SELECT explode(array(10, 20));
++---+
+|col|
++---+
+| 10|
+| 20|
++---+
+
+SELECT inline(array(struct(1, 'a'), struct(2, 'b')));
++----+----+
+|col1|col2|
++----+----+
+|   1|   a|
+|   2|   b|
++----+----+
+
+SELECT posexplode(array(10,20));
++---+---+
+|pos|col|
++---+---+
+|  0| 10|
+|  1| 20|
++---+---+
+
+SELECT stack(2, 1, 2, 3);
++----+----+
+|col0|col1|
++----+----+
+|   1|   2|
+|   3|null|
++----+----+
+
+SELECT json_tuple('{"a":1, "b":2}', 'a', 'b');
++---+---+
+| c0| c1|
++---+---+
+|  1|  2|
++---+---+
+
+SELECT parse_url('http://spark.apache.org/path?query=1', 'HOST');
++-----------------------------------------------------+
+|parse_url(http://spark.apache.org/path?query=1, HOST)|
++-----------------------------------------------------+
+|                                     spark.apache.org|
++-----------------------------------------------------+
+
+-- Use explode in a LATERAL VIEW clause
+CREATE TABLE test (c1 INT);
+INSERT INTO test VALUES (1);
+INSERT INTO test VALUES (2);
+SELECT * FROM test LATERAL VIEW explode (ARRAY(3,4)) AS c2;
++--+--+
+|c1|c2|
++--+--+
+| 1| 3|
+| 1| 4|
+| 2| 3|
+| 2| 4|
++--+--+
 ```
 
 ### Related Statements
 
 * [SELECT](sql-ref-syntax-qry-select.html)
+* [LATERAL VIEW Clause](sql-ref-syntax-qry-select-lateral-view.html)
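
The queries below are a minimal sketch, not part of the commit, of two points about the functions documented in the diff above: overriding the default generator column names, and how an `_outer` variant differs from its plain counterpart on NULL input. The column aliases `idx` and `elem` and the table alias `t` are made up for illustration, and the `test` table is assumed to be the one created in the LATERAL VIEW example in the diff.

```sql
-- Override the default column names (pos, col) with a multi-column alias.
SELECT posexplode(array(10, 20)) AS (idx, elem);

-- The same renaming in a LATERAL VIEW clause; t is the alias of the generated table.
SELECT c1, idx, elem
FROM test LATERAL VIEW posexplode(array(3, 4)) t AS idx, elem;

-- explode returns no rows when its input array is NULL ...
SELECT explode(CAST(NULL AS ARRAY<INT>));

-- ... while explode_outer returns a single row containing NULL.
SELECT explode_outer(CAST(NULL AS ARRAY<INT>));
```

Output tables are omitted on purpose, since these queries were not run against the branch-3.0 build this commit targets.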

