You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@spark.apache.org by do...@apache.org on 2020/03/11 23:54:09 UTC
[spark] branch branch-3.0 updated: [SPARK-31110][DOCS][SQL] refine sql doc for SELECT

This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new ffcc4a2  [SPARK-31110][DOCS][SQL] refine sql doc for SELECT
ffcc4a2 is described below

commit ffcc4a27041abe97991f4bd14d0b5abf3c50a542
Author: Wenchen Fan <we...@databricks.com>
AuthorDate: Wed Mar 11 16:52:40 2020 -0700

    [SPARK-31110][DOCS][SQL] refine sql doc for SELECT
    
    ### What changes were proposed in this pull request?
    
    A few improvements to the sql ref SELECT doc:
    1. correct the syntax of SELECT query
    2. correct the default of null sort order
    3. correct the GROUP BY syntax
    4. several minor fixes
    
    ### Why are the changes needed?
    
    refine document
    
    ### Does this PR introduce any user-facing change?
    
    N/A
    
    ### How was this patch tested?
    
    N/A
    
    Closes #27866 from cloud-fan/doc.
    
    Authored-by: Wenchen Fan <we...@databricks.com>
    Signed-off-by: Dongjoon Hyun <do...@apache.org>
    (cherry picked from commit 0f0ccdadb123d5839c34244e25a4ee17dde0fcdc)
    Signed-off-by: Dongjoon Hyun <do...@apache.org>
---
 docs/sql-ref-syntax-qry-select-clusterby.md     | 18 ++++----
 docs/sql-ref-syntax-qry-select-distribute-by.md | 18 ++++----
 docs/sql-ref-syntax-qry-select-groupby.md       | 48 ++++++++++-----------
 docs/sql-ref-syntax-qry-select-having.md        | 12 +++---
 docs/sql-ref-syntax-qry-select-limit.md         | 23 +++++++----
 docs/sql-ref-syntax-qry-select-orderby.md       | 24 +++++------
 docs/sql-ref-syntax-qry-select-sortby.md        | 28 ++++++-------
 docs/sql-ref-syntax-qry-select-where.md         | 10 ++---
 docs/sql-ref-syntax-qry-select.md               | 55 ++++++++++++++-----------
 docs/sql-ref-syntax-qry.md                      |  8 ++--
 10 files changed, 126 insertions(+), 118 deletions(-)

diff --git a/docs/sql-ref-syntax-qry-select-clusterby.md b/docs/sql-ref-syntax-qry-select-clusterby.md
index bb60e8b..8f1dc59 100644
--- a/docs/sql-ref-syntax-qry-select-clusterby.md
+++ b/docs/sql-ref-syntax-qry-select-clusterby.md
@@ -9,9 +9,9 @@ license: |
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at
- 
+
      http://www.apache.org/licenses/LICENSE-2.0
- 
+
   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
@@ -41,20 +41,20 @@ CLUSTER BY { expression [ , ... ] }
 ### Examples
 {% highlight sql %}
 CREATE TABLE person (name STRING, age INT);
-INSERT INTO person VALUES 
-    ('Zen Hui', 25), 
-    ('Anil B', 18), 
-    ('Shone S', 16), 
+INSERT INTO person VALUES
+    ('Zen Hui', 25),
+    ('Anil B', 18),
+    ('Shone S', 16),
     ('Mike A', 25),
-    ('John A', 18), 
+    ('John A', 18),
     ('Jack N', 16);
 
 -- Reduce the number of shuffle partitions to 2 to illustrate the behavior of `CLUSTER BY`.
 -- It's easier to see the clustering and sorting behavior with less number of partitions.
 SET spark.sql.shuffle.partitions = 2;
-                        
+
 -- Select the rows with no ordering. Please note that without any sort directive, the results
--- of the query is not deterministic. It's included here to show the difference in behavior 
+-- of the query is not deterministic. It's included here to show the difference in behavior
 -- of a query when `CLUSTER BY` is not used vs when it's used. The query below produces rows
 -- where age column is not sorted.
 SELECT age, name FROM person;
diff --git a/docs/sql-ref-syntax-qry-select-distribute-by.md b/docs/sql-ref-syntax-qry-select-distribute-by.md
index 5ade9c1..957df9c 100644
--- a/docs/sql-ref-syntax-qry-select-distribute-by.md
+++ b/docs/sql-ref-syntax-qry-select-distribute-by.md
@@ -9,9 +9,9 @@ license: |
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at
- 
+
      http://www.apache.org/licenses/LICENSE-2.0
- 
+
   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
@@ -20,7 +20,7 @@ license: |
 ---
 The <code>DISTRIBUTE BY</code> clause is used to repartition the data based
 on the input expressions. Unlike the [CLUSTER BY](sql-ref-syntax-qry-select-clusterby.html)
-clause, this does not sort the data within each partition. 
+clause, this does not sort the data within each partition.
 
 ### Syntax
 {% highlight sql %}
@@ -39,19 +39,19 @@ DISTRIBUTE BY { expression [ , ... ] }
 {% highlight sql %}
 CREATE TABLE person (name STRING, age INT);
 INSERT INTO person VALUES
-    ('Zen Hui', 25), 
-    ('Anil B', 18), 
-    ('Shone S', 16), 
+    ('Zen Hui', 25),
+    ('Anil B', 18),
+    ('Shone S', 16),
     ('Mike A', 25),
-    ('John A', 18), 
+    ('John A', 18),
     ('Jack N', 16);
 
 -- Reduce the number of shuffle partitions to 2 to illustrate the behavior of `DISTRIBUTE BY`.
 -- It's easier to see the clustering and sorting behavior with less number of partitions.
 SET spark.sql.shuffle.partitions = 2;
-                        
+
 -- Select the rows with no ordering. Please note that without any sort directive, the result
--- of the query is not deterministic. It's included here to just contrast it with the 
+-- of the query is not deterministic. It's included here to just contrast it with the
 -- behavior of `DISTRIBUTE BY`. The query below produces rows where age columns are not
 -- clustered together.
 SELECT age, name FROM person;
diff --git a/docs/sql-ref-syntax-qry-select-groupby.md b/docs/sql-ref-syntax-qry-select-groupby.md
index ab1c5d6..49a11ca 100644
--- a/docs/sql-ref-syntax-qry-select-groupby.md
+++ b/docs/sql-ref-syntax-qry-select-groupby.md
@@ -9,42 +9,43 @@ license: |
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at
- 
+
      http://www.apache.org/licenses/LICENSE-2.0
- 
+
   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
 ---
-The <code>GROUP BY</code> clause is used to group the rows based on a set of specified grouping expressions and compute aggregations on 
-the group of rows based on one or more specified aggregate functions. Spark also supports advanced aggregations to do multiple 
+The <code>GROUP BY</code> clause is used to group the rows based on a set of specified grouping expressions and compute aggregations on
+the group of rows based on one or more specified aggregate functions. Spark also supports advanced aggregations to do multiple
 aggregations for the same input record set via `GROUPING SETS`, `CUBE`, `ROLLUP` clauses.
 
 ### Syntax
 {% highlight sql %}
-GROUP BY [ GROUPING SETS grouping_sets ] group_expression [ , group_expression [ , ... ] ]
-    [ ( WITH ROLLUP | WITH CUBE | GROUPING SETS grouping_sets ) ) ]
+GROUP BY group_expression [ , group_expression [ , ... ] ]
+  [ { WITH ROLLUP | WITH CUBE | GROUPING SETS (grouping_set [ , ...]) } ]
+
+GROUP BY GROUPING SETS (grouping_set [ , ...])
 {% endhighlight %}
 
 ### Parameters
 <dl>
   <dt><code><em>GROUPING SETS</em></code></dt>
   <dd>
-    Groups the rows for each subset of the expressions specified in the grouping sets. For example, 
+    Groups the rows for each subset of the expressions specified in the grouping sets. For example,
     <code>GROUP BY GROUPING SETS (warehouse, product)</code> is semantically equivalent
     to union of results of <code>GROUP BY warehouse</code> and <code>GROUP BY product</code>. This clause
-    is shorthand for a <code>UNION ALL</code> where each leg of the <code>UNION ALL</code> 
+    is a shorthand for a <code>UNION ALL</code> where each leg of the <code>UNION ALL</code>
     operator performs aggregation of subset of the columns specified in the <code>GROUPING SETS</code> clause.
   </dd>
-  <dt><code><em>grouping_sets</em></code></dt>
+  <dt><code><em>grouping_set</em></code></dt>
   <dd>
-    Specifies one of more groupings based on which the <code>GROUP BY</code> clause performs aggregations. A grouping
-    set is specified by a list of comma-separated expressions in parentheses.<br><br>
+    A grouping set is specified by zero or more comma-separated expressions in parentheses.<br><br>
     <b>Syntax:</b>
       <code>
-        ( () | ( expression [ , ... ] ) )
+        ([expression [, ...]])
       </code>
   </dd>
   <dt><code><em>grouping_expression</em></code></dt>
@@ -55,17 +56,18 @@ GROUP BY [ GROUPING SETS grouping_sets ] group_expression [ , group_expression [
   </dd>
   <dt><code><em>ROLLUP</em></code></dt>
   <dd>
-    Specifies multiple levels of aggregations in a single statement. This clause is used to compute aggregations 
-    based on multiple grouping sets. <code>ROLLUP</code> is shorthand for <code>GROUPING SETS</code>. For example,
-    GROUP BY warehouse, product  WITH ROLLUP is equivalent to GROUP BY <code>warehouse, product</code> GROUPING SETS
-    <code> ((warehouse, product), (warehouse), ())</code>.
+    Specifies multiple levels of aggregations in a single statement. This clause is used to compute aggregations
+    based on multiple grouping sets. <code>ROLLUP</code> is a shorthand for <code>GROUPING SETS</code>. For example,
+    <code>GROUP BY warehouse, product WITH ROLLUP</code> is equivalent to <code>GROUP BY GROUPING SETS
+    ((warehouse, product), (warehouse), ())</code>.
     The N elements of a <code>ROLLUP</code> specification results in N+1 <code>GROUPING SETS</code>.
   </dd>
   <dt><code><em>CUBE</em></code></dt>
   <dd>
-    <code>CUBE</code> clause is used to perform aggregations based on combination of grouping columns specified in the 
-    <code>GROUP BY</code> clause. For example, <code>GROUP BY warehouse, product  WITH CUBE</code> is equivalent 
-    to GROUP BY <code>warehouse, product</code> GROUPING SETS <code>((warehouse, product), (warehouse), (product), ())</code>.
+    <code>CUBE</code> clause is used to perform aggregations based on combination of grouping columns specified in the
+    <code>GROUP BY</code> clause. <code>CUBE</code> is a shorthand for <code>GROUPING SETS</code>. For example,
+    <code>GROUP BY warehouse, product WITH CUBE</code> is equivalent to <code>GROUP BY GROUPING SETS
+    ((warehouse, product), (warehouse), (product), ())</code>.
     The N elements of a <code>CUBE</code> specification results in 2^N <code>GROUPING SETS</code>.
   </dd>
 </dl>
@@ -73,7 +75,7 @@ GROUP BY [ GROUPING SETS grouping_sets ] group_expression [ , group_expression [
 ### Examples
 {% highlight sql %}
 CREATE TABLE dealer (id INT, city STRING, car_model STRING, quantity INT);
-INSERT INTO dealer VALUES 
+INSERT INTO dealer VALUES
     (100, 'Fremont', 'Honda Civic', 10),
     (100, 'Fremont', 'Honda Accord', 15),
     (100, 'Fremont', 'Honda CRV', 7),
@@ -107,7 +109,7 @@ SELECT id, sum(quantity) FROM dealer GROUP BY 1 ORDER BY 1;
 
 -- Multiple aggregations.
 -- 1. Sum of quantity per dealership.
--- 2. Max quantity per dealership. 
+-- 2. Max quantity per dealership.
 SELECT id, sum(quantity) AS sum, max(quantity) AS max FROM dealer GROUP BY id ORDER BY id;
 
   +---+---+---+
@@ -148,7 +150,7 @@ SELECT city, car_model, sum(quantity) AS sum FROM dealer
   |San Jose|Honda Civic |5  |
   +--------+------------+---+
 
--- Alternate syntax for `GROUPING SETS` in which both `GROUP BY` and `GROUPING SETS` 
+-- Alternate syntax for `GROUPING SETS` in which both `GROUP BY` and `GROUPING SETS`
 -- specifications are present.
 SELECT city, car_model, sum(quantity) AS sum FROM dealer
    GROUP BY city, car_model GROUPING SETS ((city, car_model), (city), (car_model), ())
@@ -200,7 +202,7 @@ SELECT city, car_model, sum(quantity) AS sum FROM dealer
 -- Group by processing with `CUBE` clause.
 -- Equivalent GROUP BY GROUPING SETS ((city, car_model), (city), (car_model), ())
 SELECT city, car_model, sum(quantity) AS sum FROM dealer
-   GROUP BY city, car_model WITH CUBE 
+   GROUP BY city, car_model WITH CUBE
    ORDER BY city, car_model;
 
   +--------+------------+---+
diff --git a/docs/sql-ref-syntax-qry-select-having.md b/docs/sql-ref-syntax-qry-select-having.md
index 94d9be6..dee1e3c 100644
--- a/docs/sql-ref-syntax-qry-select-having.md
+++ b/docs/sql-ref-syntax-qry-select-having.md
@@ -9,9 +9,9 @@ license: |
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at
- 
+
      http://www.apache.org/licenses/LICENSE-2.0
- 
+
   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
@@ -33,7 +33,7 @@ HAVING boolean_expression
   <dt><code><em>boolean_expression</em></code></dt>
   <dd>
     Specifies any expression that evaluates to a result type <code>boolean</code>. Two or
-    more expressions may be combined together using the logical 
+    more expressions may be combined together using the logical
     operators ( <code>AND</code>, <code>OR</code> ).<br><br>
 
     <b>Note</b><br>
@@ -70,7 +70,7 @@ SELECT city, sum(quantity) AS sum FROM dealer GROUP BY city HAVING city = 'Fremo
 
 -- `HAVING` clause referring to aggregate function.
 SELECT city, sum(quantity) AS sum FROM dealer GROUP BY city HAVING sum(quantity) > 15;
- 
+
   +-------+---+
   |   city|sum|
   +-------+---+
@@ -100,7 +100,7 @@ SELECT city, sum(quantity) AS sum FROM dealer GROUP BY city HAVING max(quantity)
 
 -- `HAVING` clause referring to constant expression.
 SELECT city, sum(quantity) AS sum FROM dealer GROUP BY city HAVING 1 > 0 ORDER BY city;
-  
+
   +--------+---+
   |    city|sum|
   +--------+---+
@@ -116,7 +116,7 @@ SELECT sum(quantity) AS sum FROM dealer HAVING sum(quantity) > 10;
   +---+
   | 78|
   +---+
- 
+
 {% endhighlight %}
 
 ### Related Clauses
diff --git a/docs/sql-ref-syntax-qry-select-limit.md b/docs/sql-ref-syntax-qry-select-limit.md
index 06925e6..356930c 100644
--- a/docs/sql-ref-syntax-qry-select-limit.md
+++ b/docs/sql-ref-syntax-qry-select-limit.md
@@ -9,9 +9,9 @@ license: |
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at
- 
+
      http://www.apache.org/licenses/LICENSE-2.0
- 
+
   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
@@ -37,7 +37,7 @@ LIMIT { ALL | integer_expression }
   </dd>
   <dt><code><em>integer_expression</em></code></dt>
   <dd>
-    Specifies an expression that returns an integer. 
+    Specifies a foldable expression that returns an integer.
   </dd>
 </dl>
 
@@ -45,13 +45,13 @@ LIMIT { ALL | integer_expression }
 {% highlight sql %}
 CREATE TABLE person (name STRING, age INT);
 INSERT INTO person VALUES
-    ('Zen Hui', 25), 
-    ('Anil B', 18), 
-    ('Shone S', 16), 
+    ('Zen Hui', 25),
+    ('Anil B', 18),
+    ('Shone S', 16),
     ('Mike A', 25),
-    ('John A', 18), 
+    ('John A', 18),
     ('Jack N', 16);
-                        
+
 -- Select the first two rows.
 SELECT name, age FROM person ORDER BY name LIMIT 2;
 
@@ -76,7 +76,7 @@ SELECT name, age FROM person ORDER BY name LIMIT ALL;
   |Zen Hui|25 |
   +-------+---+
 
--- A function expression as an input to limit.
+-- A function expression as an input to LIMIT.
 SELECT name, age FROM person ORDER BY name LIMIT length('SPARK')
 
   +-------+---+
@@ -88,6 +88,11 @@ SELECT name, age FROM person ORDER BY name LIMIT length('SPARK')
   | Mike A| 25|
   |Shone S| 16|
   +-------+---+
+
+-- A non-foldable expression as an input to LIMIT is not allowed.
+SELECT name, age FROM person ORDER BY name LIMIT length(name)
+
+org.apache.spark.sql.AnalysisException: The limit expression must evaluate to a constant value ...
 {% endhighlight %}
 
 ### Related Clauses
diff --git a/docs/sql-ref-syntax-qry-select-orderby.md b/docs/sql-ref-syntax-qry-select-orderby.md
index c4b4ced..eb99dbb 100644
--- a/docs/sql-ref-syntax-qry-select-orderby.md
+++ b/docs/sql-ref-syntax-qry-select-orderby.md
@@ -9,9 +9,9 @@ license: |
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at
- 
+
      http://www.apache.org/licenses/LICENSE-2.0
- 
+
   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
@@ -20,7 +20,7 @@ license: |
 ---
 The <code>ORDER BY</code> clause is used to return the result rows in a sorted manner
 in the user specified order. Unlike the [SORT BY](sql-ref-syntax-qry-select-sortby.html)
-clause, this clause guarantees a total order in the output. 
+clause, this clause guarantees a total order in the output.
 
 ### Syntax
 {% highlight sql %}
@@ -47,20 +47,18 @@ ORDER BY { expression [ sort_direction | nulls_sort_oder ] [ , ... ] }
   </dd>
   <dt><code><em>nulls_sort_order</em></code></dt>
   <dd>
-    Optionally specifies whether NULL values are returned before/after non-NULL values, based on the 
-    sort direction. In Spark, NULL values are considered to be lower than any non-NULL values by default.
-    Therefore the ordering of NULL values depend on the sort direction. If <code>null_sort_order</code> is
-    not specified, then NULLs sort first if sort order is <code>ASC</code> and NULLS sort last if 
-    sort order is <code>DESC</code>.<br><br>
+    Optionally specifies whether NULL values are returned before/after non-NULL values. If
+    <code>null_sort_order</code> is not specified, then NULLs sort first if sort order is
+    <code>ASC</code> and NULLS sort last if sort order is <code>DESC</code>.<br><br>
     <ol>
-      <li> If <code>NULLS FIRST</code> (the default) is specified, then NULL values are returned first 
+      <li> If <code>NULLS FIRST</code> is specified, then NULL values are returned first
            regardless of the sort order.</li>
       <li>If <code>NULLS LAST</code> is specified, then NULL values are returned last regardless of
            the sort order. </li>
     </ol><br>
     <b>Syntax:</b>
     <code>
-       [ NULLS { FIRST | LAST } ] 
+       [ NULLS { FIRST | LAST } ]
     </code>
   </dd>
 </dl>
@@ -75,7 +73,7 @@ INSERT INTO person VALUES
     (400, 'Jerry', NULL),
     (500, 'Dan',  50);
 
--- Sort rows by age. By default rows are sorted in ascending manner.
+-- Sort rows by age. By default rows are sorted in ascending manner with NULL FIRST.
 SELECT name, age FROM person ORDER BY age;
 
   +-----+----+
@@ -101,9 +99,9 @@ SELECT name, age FROM person ORDER BY age NULLS LAST;
   |Jerry|null|
   +-----+----+
 
--- Sort rows by age in descending manner.
+-- Sort rows by age in descending manner, which defaults to NULL LAST.
 SELECT name, age FROM person ORDER BY age DESC;
- 
+
   +-----+----+
   |name |age |
   +-----+----+
diff --git a/docs/sql-ref-syntax-qry-select-sortby.md b/docs/sql-ref-syntax-qry-select-sortby.md
index 1818a69..9b52738 100644
--- a/docs/sql-ref-syntax-qry-select-sortby.md
+++ b/docs/sql-ref-syntax-qry-select-sortby.md
@@ -9,9 +9,9 @@ license: |
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at
- 
+
      http://www.apache.org/licenses/LICENSE-2.0
- 
+
   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
@@ -49,20 +49,18 @@ SORT BY { expression [ sort_direction | nulls_sort_order ] [ , ... ] }
   </dd>
   <dt><code><em>nulls_sort_order</em></code></dt>
   <dd>
-    Optionally specifies whether NULL values are returned before/after non-NULL values, based on the 
-    sort direction. In Spark, NULL values are considered to be lower than any non-NULL values by default.
-    Therefore the ordering of NULL values depend on the sort direction. If <code>null_sort_order</code> is
-    not specified, then NULLs sort first if sort order is <code>ASC</code> and NULLS sort last if 
-    sort order is <code>DESC</code>.<br><br>
+    Optionally specifies whether NULL values are returned before/after non-NULL values. If
+    <code>null_sort_order</code> is not specified, then NULLs sort first if sort order is
+    <code>ASC</code> and NULLS sort last if sort order is <code>DESC</code>.<br><br>
     <ol>
-      <li> If <code>NULLS FIRST</code> (the default) is specified, then NULL values are returned first 
+      <li> If <code>NULLS FIRST</code> is specified, then NULL values are returned first
            regardless of the sort order.</li>
       <li>If <code>NULLS LAST</code> is specified, then NULL values are returned last regardless of
            the sort order. </li>
     </ol><br>
     <b>Syntax:</b>
     <code>
-       [ NULLS { FIRST | LAST } ] 
+       [ NULLS { FIRST | LAST } ]
     </code>
   </dd>
 </dl>
@@ -71,15 +69,15 @@ SORT BY { expression [ sort_direction | nulls_sort_order ] [ , ... ] }
 {% highlight sql %}
 CREATE TABLE person (zip_code INT, name STRING, age INT);
 INSERT INTO person VALUES
-    (94588, 'Zen Hui', 50), 
-    (94588, 'Dan Li', 18), 
+    (94588, 'Zen Hui', 50),
+    (94588, 'Dan Li', 18),
     (94588, 'Anil K', 27),
     (94588, 'John V', NULL),
     (94511, 'David K', 42),
     (94511, 'Aryan B.', 18),
     (94511, 'Lalit B.', NULL);
 
--- Use `REPARTITION` hint to partition the data by `zip_code` to 
+-- Use `REPARTITION` hint to partition the data by `zip_code` to
 -- examine the `SORT BY` behavior. This is used in rest of the
 -- examples.
 
@@ -128,9 +126,9 @@ SELECT /*+ REPARTITION(zip_code) */ age, name, zip_code FROM person SORT BY age
   |null|Lalit B.|94511   |
   +----+--------+--------+
 
--- Sort rows by age within each partition in descending manner.
+-- Sort rows by age within each partition in descending manner, which defaults to NULL LAST.
 SELECT /*+ REPARTITION(zip_code) */ age, name, zip_code FROM person SORT BY age DESC;
- 
+
   +----+--------+--------+
   |age |name    |zip_code|
   +----+--------+--------+
@@ -143,7 +141,7 @@ SELECT /*+ REPARTITION(zip_code) */ age, name, zip_code FROM person SORT BY age
   |null|Lalit B.|94511   |
   +----+--------+--------+
 
--- Sort rows by age within each partition in ascending manner keeping null values to be first.
+-- Sort rows by age within each partition in descending manner keeping null values to be first.
 SELECT /*+ REPARTITION(zip_code) */ age, name, zip_code FROM person SORT BY age DESC NULLS FIRST;
 
   +----+--------+--------+
diff --git a/docs/sql-ref-syntax-qry-select-where.md b/docs/sql-ref-syntax-qry-select-where.md
index a493623..106053d 100644
--- a/docs/sql-ref-syntax-qry-select-where.md
+++ b/docs/sql-ref-syntax-qry-select-where.md
@@ -9,9 +9,9 @@ license: |
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at
- 
+
      http://www.apache.org/licenses/LICENSE-2.0
- 
+
   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
@@ -31,7 +31,7 @@ WHERE boolean_expression
   <dt><code><em>boolean_expression</em></code></dt>
   <dd>
     Specifies any expression that evaluates to a result type <code>boolean</code>. Two or
-    more expressions may be combined together using the logical 
+    more expressions may be combined together using the logical
     operators ( <code>AND</code>, <code>OR</code> ).
   </dd>
 </dl>
@@ -99,8 +99,8 @@ SELECT * FROM person WHERE age > (SELECT avg(age) FROM person);
   |300|Mike|80 |
   +---+----+---+
 
--- Correlated column reference in `WHERE` clause of subquery.
-SELECT * FROM person AS parent 
+-- Correlated Subquery in `WHERE` clause.
+SELECT * FROM person AS parent
 WHERE EXISTS (
               SELECT 1 FROM person AS child
               WHERE parent.id = child.id AND child.age IS NULL
diff --git a/docs/sql-ref-syntax-qry-select.md b/docs/sql-ref-syntax-qry-select.md
index 80b930f..e87c4a5 100644
--- a/docs/sql-ref-syntax-qry-select.md
+++ b/docs/sql-ref-syntax-qry-select.md
@@ -1,7 +1,7 @@
 ---
 layout: global
 title: SELECT
-displayTitle: SELECT 
+displayTitle: SELECT
 license: |
   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements.  See the NOTICE file distributed with
@@ -9,9 +9,9 @@ license: |
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at
- 
+
      http://www.apache.org/licenses/LICENSE-2.0
- 
+
   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
@@ -19,39 +19,43 @@ license: |
   limitations under the License.
 ---
 Spark supports a `SELECT` statement and conforms to the ANSI SQL standard. Queries are
-used to retrieve result sets from one or more tables. The following section 
+used to retrieve result sets from one or more tables. The following section
 describes the overall query syntax and the sub-sections cover different constructs
-of a query along with examples. 
+of a query along with examples.
 
 ### Syntax
 {% highlight sql %}
 [ WITH with_query [ , ... ] ]
+select_statement [ { UNION | INTERSECT | EXCEPT } [ ALL | DISTINCT ] select_statement, ... ]
+[ ORDER BY { expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [ , ...] } ]
+[ SORT BY { expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [ , ...] } ]
+[ CLUSTER BY { expression [ , ...] } ]
+[ DISTRIBUTE BY { expression [, ...] } ]
+[ WINDOW { named_window [ , WINDOW named_window, ... ] } ]
+[ LIMIT { ALL | expression } ]
+{% endhighlight %}
+
+While `select_statement` is defined as
+{% highlight sql %}
 SELECT [ hints , ... ] [ ALL | DISTINCT ] { named_expression [ , ... ] }
   FROM { from_item [ , ...] }
   [ WHERE boolean_expression ]
   [ GROUP BY expression [ , ...] ]
   [ HAVING boolean_expression ]
-  [ ORDER BY { expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [ , ...] } ]
-  [ SORT  BY { expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [ , ...] } ]
-  [ CLUSTER BY { expression [ , ...] } ]
-  [ DISTRIBUTE BY { expression [, ...] } ]
-  { UNION | INTERSECT | EXCEPT } [ ALL | DISTINCT ] select ]
-  [ WINDOW { named_window [ , WINDOW named_window, ... ] } ]
-  [ LIMIT { ALL | expression } ]
 {% endhighlight %}
 
 ### Parameters
 <dl>
   <dt><code><em>with_query</em></code></dt>
   <dd>
-    Specifies the common table expressions (CTEs) before the main <code>SELECT</code> query block.
-    These table expressions are allowed to be referenced later in the main query. This is useful to abstract
-    out repeated subquery blocks in the main query and improves readability of the query.
+    Specifies the common table expressions (CTEs) before the main query block.
+    These table expressions are allowed to be referenced later in the FROM clause. This is useful to abstract
+    out repeated subquery blocks in the FROM clause and improves readability of the query.
   </dd>
   <dt><code><em>hints</em></code></dt>
   <dd>
     Hints can be specified to help spark optimizer make better planning decisions. Currently spark supports hints
-    that influence selection of join strategies and repartitioning of the data. 
+    that influence selection of join strategies and repartitioning of the data.
   </dd>
   <dt><code><em>ALL</em></code></dt>
   <dd>
@@ -77,7 +81,7 @@ SELECT [ hints , ... ] [ ALL | DISTINCT ] { named_expression [ , ... ] }
       <li>Join relation</li>
       <li>Table valued function</li>
       <li>Inlined table</li>
-      <li>Subquery</li>    
+      <li>Subquery</li>
     </ol>
   </dd>
   <dt><code><em>WHERE</em></code></dt>
@@ -87,12 +91,13 @@ SELECT [ hints , ... ] [ ALL | DISTINCT ] { named_expression [ , ... ] }
   <dt><code><em>GROUP BY</em></code></dt>
   <dd>
     Specifies the expressions that are used to group the rows. This is used in conjunction with aggregate functions
-    (MIN, MAX, COUNT, SUM, AVG) to group rows based on the grouping expressions.
+    (MIN, MAX, COUNT, SUM, AVG, etc.) to group rows based on the grouping expressions and aggregate values in each group.
   </dd>
   <dt><code><em>HAVING</em></code></dt>
   <dd>
     Specifies the predicates by which the rows produced by GROUP BY are filtered. The HAVING clause is used to
-    filter rows after the grouping is performed.
+    filter rows after the grouping is performed. If HAVING is specified without GROUP BY, it indicates a GROUP BY
+    without grouping expressions (global aggregate).
   </dd>
   <dt><code><em>ORDER BY</em></code></dt>
   <dd>
@@ -108,17 +113,17 @@ SELECT [ hints , ... ] [ ALL | DISTINCT ] { named_expression [ , ... ] }
   <dt><code><em>CLUSTER BY</em></code></dt>
   <dd>
     Specifies a set of expressions that is used to repartition and sort the rows. Using this clause has
-    the same effect of using <code>DISTRIBUTE BY</code> and <code>SORT BY</code> together. 
+    the same effect of using <code>DISTRIBUTE BY</code> and <code>SORT BY</code> together.
   </dd>
   <dt><code><em>DISTRIBUTE BY</em></code></dt>
   <dd>
-    Specifies a set of expressions by which the result rows are repartitioned. This parameter is mutually 
-    exclusive with <code>ORDER BY</code> and <code>CLUSTER BY</code> and can not be specified together. 
+    Specifies a set of expressions by which the result rows are repartitioned. This parameter is mutually
+    exclusive with <code>ORDER BY</code> and <code>CLUSTER BY</code> and can not be specified together.
   </dd>
   <dt><code><em>LIMIT</em></code></dt>
   <dd>
-    Specifies the maximum number of rows that can be returned by a statement or subquery. This clause 
-    is mostly used in the conjunction with <code>ORDER BY</code> to produce a deterministic result. 
+    Specifies the maximum number of rows that can be returned by a statement or subquery. This clause
+    is mostly used in the conjunction with <code>ORDER BY</code> to produce a deterministic result.
   </dd>
   <dt><code><em>boolean_expression</em></code></dt>
   <dd>
@@ -130,7 +135,7 @@ SELECT [ hints , ... ] [ ALL | DISTINCT ] { named_expression [ , ... ] }
   </dd>
   <dt><code><em>named_window</em></code></dt>
   <dd>
-    Specifies aliases for one or more source window specifications. The source window specifications can 
+    Specifies aliases for one or more source window specifications. The source window specifications can
     be referenced in the widow definitions in the query.
   </dd>
 </dl>
diff --git a/docs/sql-ref-syntax-qry.md b/docs/sql-ref-syntax-qry.md
index cd7c0ff..37414ac 100644
--- a/docs/sql-ref-syntax-qry.md
+++ b/docs/sql-ref-syntax-qry.md
@@ -9,9 +9,9 @@ license: |
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at
- 
+
      http://www.apache.org/licenses/LICENSE-2.0
- 
+
   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
@@ -19,14 +19,14 @@ license: |
   limitations under the License.
 ---
 
-Spark supports <code>SELECT</code> statement that is  used to retrieve rows
+Spark supports <code>SELECT</code> statement that is used to retrieve rows
 from one or more tables according to the specified clauses. The full syntax
 and brief description of supported clauses are explained in
 [SELECT](sql-ref-syntax-qry-select.html) section. Spark also provides the
 ability to generate logical and physical plan for a given query using
 [EXPLAIN](sql-ref-syntax-qry-explain.html) statement.
 
- 
+
 - [WHERE Clause](sql-ref-syntax-qry-select-where.html)
 - [GROUP BY Clause](sql-ref-syntax-qry-select-groupby.html)
 - [HAVING Clause](sql-ref-syntax-qry-select-having.html)


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@spark.apache.org
For additional commands, e-mail: commits-help@spark.apache.org