You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by gatorsmile <gi...@git.apache.org> on 2016/07/03 06:05:20 UTC

[GitHub] spark pull request #14034: [16355] [16354] [SQL] Fix Bugs When LIMIT/TABLESA...

GitHub user gatorsmile opened a pull request:

    https://github.com/apache/spark/pull/14034

    [16355] [16354] [SQL] Fix Bugs When LIMIT/TABLESAMPLE is Zero or Negative

    #### What changes were proposed in this pull request?
    **Issue 1:** When a query containing LIMIT/TABLESAMPLE 0, the statistics could be zero. Results are correct but it could cause a huge performance regression. For example,
    ```Scala
    Seq(("one", 1), ("two", 2), ("three", 3), ("four", 4)).toDF("k", "v")
      .createOrReplaceTempView("test")
    val df1 = spark.table("test")
    val df2 = spark.table("test").limit(0)
    val df = df1.join(df2, Seq("k"), "left")
    ```
    The statistics of both `df` and `df2` are zero. The statistics values should never be zero; otherwise `sizeInBytes` of `BinaryNode` will also be zero (product of children). This PR is to increase it to `1` when the num of rows is equal to 0.
    
    **Issue 2:** When a query containing LIMIT/TABLESAMPLE is negative, we should issue exceptions. Negative values could break the assumption of multiple parts. For example, statistics calculation.  Below is the example query.
    ```SQL
    SELECT * FROM testData TABLESAMPLE (-1 rows)
    SELECT * FROM testData LIMIT -1
    ``` 
    This PR is to issue an appropriate exception in this case.
    
    #### How was this patch tested?
    Added test cases.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark limit

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14034.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #14034
    
----
commit 1255968908454bfb01b247567f796e10ca6e6d30
Author: gatorsmile <ga...@gmail.com>
Date:   2016-07-03T05:46:44Z

    fix

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14034#discussion_r69599041
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -46,6 +46,20 @@ trait CheckAnalysis extends PredicateHelper {
         }).length > 1
       }
     
    +  private def checkLimitClause(limitExpr: Expression): Unit = {
    +    if (!limitExpr.foldable) {
    --- End diff --
    
    Are you saying we should change the SQL parser? You know, if so, we also need to change the `TABLESAMPLE n ROWS`.
    
    FYI, even Hive does not support foldable expressions.
    ```
    hive> select * from t1 limit 1 + 1;
    FAILED: ParseException line 1:25 missing EOF at '+' near '1'
    hive> select * from t1 limit (1+1);
    FAILED: ParseException line 1:23 extraneous input '(' expecting Number near '1'
    ```
    I found that Impala supports it. 
    http://www.cloudera.com/documentation/archive/impala/2-x/2-1-x/topics/impala_limit.html


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14034
  
    **[Test build #61963 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61963/consoleFull)** for PR 14034 at commit [`f600ba4`](https://github.com/apache/spark/commit/f600ba4f4455890b17080bc1cd3f647ec9023799).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/14034
  
    uh... Thanks! Let me do it now.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14034: [16355] [16354] [SQL] Fix Bugs When LIMIT/TABLESAMPLE is...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14034
  
    **[Test build #61678 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61678/consoleFull)** for PR 14034 at commit [`bdf4e56`](https://github.com/apache/spark/commit/bdf4e56f3478bd99d1e92d338a984dba869363dc).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14034
  
    **[Test build #61963 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61963/consoleFull)** for PR 14034 at commit [`f600ba4`](https://github.com/apache/spark/commit/f600ba4f4455890b17080bc1cd3f647ec9023799).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14034: [16355] [16354] [SQL] Fix Bugs When LIMIT/TABLESAMPLE is...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14034
  
    **[Test build #61678 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61678/consoleFull)** for PR 14034 at commit [`bdf4e56`](https://github.com/apache/spark/commit/bdf4e56f3478bd99d1e92d338a984dba869363dc).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/14034
  
    retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14034
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62080/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14034
  
    **[Test build #62002 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62002/consoleFull)** for PR 14034 at commit [`3036847`](https://github.com/apache/spark/commit/3036847f8436ab72b05dafcf6566a96dc766dca1).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14034
  
    **[Test build #62073 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62073/consoleFull)** for PR 14034 at commit [`2e6f8d8`](https://github.com/apache/spark/commit/2e6f8d8c8b5007302415b7fd984a38fc51be44bf).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14034#discussion_r69498807
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -46,6 +46,15 @@ trait CheckAnalysis extends PredicateHelper {
         }).length > 1
       }
     
    +  private def checkLimitClause(limitExpr: Expression): Unit = {
    +    val numRows = limitExpr.eval().asInstanceOf[Int]
    --- End diff --
    
    ah, but it's still foldable. Is it possible it's non-foldable?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14034
  
    **[Test build #61761 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61761/consoleFull)** for PR 14034 at commit [`a2a828f`](https://github.com/apache/spark/commit/a2a828ff3b2a944ceec48ee99163392267311646).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14034
  
    **[Test build #62069 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62069/consoleFull)** for PR 14034 at commit [`dec5ad9`](https://github.com/apache/spark/commit/dec5ad95bdd003fe58e92d1245388fa4758d8f49).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14034
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61746/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14034
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62027/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14034
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14034
  
    **[Test build #62006 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62006/consoleFull)** for PR 14034 at commit [`d135b77`](https://github.com/apache/spark/commit/d135b77b2d3dd2f9f6ae160bfb24b73df34d3995).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14034
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14034#discussion_r70183882
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/StatisticsSuite.scala ---
    @@ -31,4 +33,46 @@ class StatisticsSuite extends QueryTest with SharedSQLContext {
           spark.sessionState.conf.autoBroadcastJoinThreshold)
       }
     
    +  test("estimates the size of limit") {
    +    withTempTable("test") {
    +      Seq(("one", 1), ("two", 2), ("three", 3), ("four", 4)).toDF("k", "v")
    +        .createOrReplaceTempView("test")
    +      Seq((0, 1), (1, 24), (2, 48)).foreach { case (limit, expected) =>
    +        val df = sql(s"""SELECT * FROM test limit $limit""")
    +
    +        val sizesGlobalLimit = df.queryExecution.analyzed.collect { case g: GlobalLimit =>
    +          g.statistics.sizeInBytes
    +        }
    +        assert(sizesGlobalLimit.size === 1, s"Size wrong for:\n ${df.queryExecution}")
    +        assert(sizesGlobalLimit.head === BigInt(expected),
    +          s"expected exact size 24 for table 'test', got: ${sizesGlobalLimit.head}")
    --- End diff --
    
    why we hardcode `24` here?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14034#discussion_r70167835
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -2045,6 +2047,21 @@ object EliminateUnions extends Rule[LogicalPlan] {
     }
     
     /**
    + * Converts foldable numeric expressions to integers in [[GlobalLimit]] and [[LocalLimit]] operators
    + */
    +object ResolveLimits extends Rule[LogicalPlan] {
    --- End diff --
    
    Sure, let me revert it back. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14034
  
    **[Test build #62080 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62080/consoleFull)** for PR 14034 at commit [`d66870b`](https://github.com/apache/spark/commit/d66870bb274a206f16d33f56214246b17953a90e).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14034
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14034: [16355] [16354] [SQL] Fix Bugs When LIMIT/TABLESAMPLE is...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/14034
  
    cc @cloud-fan @davies Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14034
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62069/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14034
  
    **[Test build #62027 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62027/consoleFull)** for PR 14034 at commit [`028aa79`](https://github.com/apache/spark/commit/028aa79e3c0b57cf887d4d735a387a5043612281).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14034
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14034
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14034#discussion_r70183976
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/StatisticsSuite.scala ---
    @@ -31,4 +33,46 @@ class StatisticsSuite extends QueryTest with SharedSQLContext {
           spark.sessionState.conf.autoBroadcastJoinThreshold)
       }
     
    +  test("estimates the size of limit") {
    +    withTempTable("test") {
    +      Seq(("one", 1), ("two", 2), ("three", 3), ("four", 4)).toDF("k", "v")
    +        .createOrReplaceTempView("test")
    +      Seq((0, 1), (1, 24), (2, 48)).foreach { case (limit, expected) =>
    +        val df = sql(s"""SELECT * FROM test limit $limit""")
    +
    +        val sizesGlobalLimit = df.queryExecution.analyzed.collect { case g: GlobalLimit =>
    +          g.statistics.sizeInBytes
    +        }
    +        assert(sizesGlobalLimit.size === 1, s"Size wrong for:\n ${df.queryExecution}")
    +        assert(sizesGlobalLimit.head === BigInt(expected),
    +          s"expected exact size 24 for table 'test', got: ${sizesGlobalLimit.head}")
    +
    +        val sizesLocalLimit = df.queryExecution.analyzed.collect { case l: LocalLimit =>
    +          l.statistics.sizeInBytes
    +        }
    +        assert(sizesLocalLimit.size === 1, s"Size wrong for:\n ${df.queryExecution}")
    +        assert(sizesLocalLimit.head === BigInt(expected),
    +          s"expected exact size 24 for table 'test', got: ${sizesLocalLimit.head}")
    +      }
    +    }
    +  }
    +
    +  test("estimates the size of a limit 0 on outer join") {
    +    withTempTable("test") {
    +      Seq(("one", 1), ("two", 2), ("three", 3), ("four", 4)).toDF("k", "v")
    +        .createOrReplaceTempView("test")
    +      val df1 = spark.table("test")
    +      val df2 = spark.table("test").limit(0)
    +      val df = df1.join(df2, Seq("k"), "left")
    +
    +      val sizes = df.queryExecution.analyzed.collect { case g: Join =>
    +        g.statistics.sizeInBytes
    +      }
    +
    +      assert(sizes.size === 1, s"Size wrong for:\n ${df.queryExecution}")
    --- End diff --
    
    how about `number of Join nodes is wrong`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14034
  
    **[Test build #62069 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62069/consoleFull)** for PR 14034 at commit [`dec5ad9`](https://github.com/apache/spark/commit/dec5ad95bdd003fe58e92d1245388fa4758d8f49).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14034#discussion_r70035305
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -46,6 +46,21 @@ trait CheckAnalysis extends PredicateHelper {
         }).length > 1
       }
     
    +  private def checkLimitClause(limitExpr: Expression): Unit = {
    +    if (!limitExpr.foldable) {
    +      failAnalysis(
    +        "The argument to the LIMIT clause must evaluate to a constant value. " +
    +        s"Limit:${limitExpr.sql}")
    +    }
    +    limitExpr.eval() match {
    +      case o: Int if o >= 0 => // OK
    +      case o: Int => failAnalysis(
    +        s"number_rows in limit clause must be equal to or greater than 0. number_rows:$o")
    +      case o => failAnalysis(
    --- End diff --
    
    do we allow other integral type? e.g. byte, long, etc.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14034#discussion_r70183835
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
    @@ -660,18 +660,51 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
     
       test("limit") {
         checkAnswer(
    -      sql("SELECT * FROM testData LIMIT 10"),
    +      sql("SELECT * FROM testData LIMIT 9 + 1"),
           testData.take(10).toSeq)
     
         checkAnswer(
    -      sql("SELECT * FROM arrayData LIMIT 1"),
    +      sql("SELECT * FROM arrayData LIMIT CAST(1 AS Integer)"),
           arrayData.collect().take(1).map(Row.fromTuple).toSeq)
     
         checkAnswer(
           sql("SELECT * FROM mapData LIMIT 1"),
           mapData.collect().take(1).map(Row.fromTuple).toSeq)
       }
     
    +  test("non-foldable expressions in LIMIT") {
    +    val e = intercept[AnalysisException] {
    +      sql("SELECT * FROM testData LIMIT key > 3")
    +    }.getMessage
    +    assert(e.contains("The argument to the LIMIT clause must evaluate to a constant value. " +
    +      "Limit:(testdata.`key` > 3)"))
    +  }
    +
    +  test("Limit: unable to evaluate and cast expressions in limit clauses to Int") {
    --- End diff --
    
    We should also update the test name, we don't try to cast the limit expression to int type.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14034#discussion_r70178966
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -46,6 +46,21 @@ trait CheckAnalysis extends PredicateHelper {
         }).length > 1
       }
     
    +  private def checkLimitClause(limitExpr: Expression): Unit = {
    +    if (!limitExpr.foldable) {
    +      failAnalysis(
    +        "The argument to the LIMIT clause must evaluate to a constant value. " +
    +        s"Limit:${limitExpr.sql}")
    +    }
    +    limitExpr.eval() match {
    +      case o: Int if o >= 0 => // OK
    +      case o: Int => failAnalysis(
    +        s"number_rows in limit clause must be equal to or greater than 0. number_rows:$o")
    +      case o => failAnalysis(
    +        s"""number_rows in limit clause cannot be cast to integer:\"$o\".""")
    --- End diff --
    
    `cannot be cast to integer` -> `must be integer`? e.g. byte is castable to int.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/14034
  
    ping @cloud-fan : )


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14034
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61761/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14034
  
    **[Test build #61974 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61974/consoleFull)** for PR 14034 at commit [`1abdbb9`](https://github.com/apache/spark/commit/1abdbb9e6b03a01184825c9785a5b49c3724e315).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14034#discussion_r70029842
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
    @@ -660,18 +660,39 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
     
       test("limit") {
         checkAnswer(
    -      sql("SELECT * FROM testData LIMIT 10"),
    +      sql("SELECT * FROM testData LIMIT 9 + 1"),
           testData.take(10).toSeq)
     
         checkAnswer(
    -      sql("SELECT * FROM arrayData LIMIT 1"),
    +      sql("SELECT * FROM arrayData LIMIT CAST(1 AS Integer)"),
           arrayData.collect().take(1).map(Row.fromTuple).toSeq)
     
         checkAnswer(
           sql("SELECT * FROM mapData LIMIT 1"),
           mapData.collect().take(1).map(Row.fromTuple).toSeq)
       }
     
    +  test("non-foldable expressions in LIMIT") {
    +    val e = intercept[AnalysisException] {
    +      sql("SELECT * FROM testData LIMIT key > 3")
    --- End diff --
    
    what will happen if the type is wrong? e.g. `LIMIT true`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14034
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61973/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

Posted by jaceklaskowski <gi...@git.apache.org>.
Github user jaceklaskowski commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14034#discussion_r69528969
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/StatisticsSuite.scala ---
    @@ -31,4 +33,46 @@ class StatisticsSuite extends QueryTest with SharedSQLContext {
           spark.sessionState.conf.autoBroadcastJoinThreshold)
       }
     
    +  test("estimates the size of limit") {
    +    withTempTable("test") {
    +      Seq(("one", 1), ("two", 2), ("three", 3), ("four", 4)).toDF("k", "v")
    +        .createOrReplaceTempView("test")
    +      Seq((0, 1), (1, 24), (2, 48)).foreach { case (limit, expected) =>
    +        val df = sql(s"""SELECT * FROM test limit $limit""")
    +
    +        val sizesGlobalLimit = df.queryExecution.analyzed.collect { case g: GlobalLimit =>
    +          g.statistics.sizeInBytes
    +        }
    +        assert(sizesGlobalLimit.size === 1, s"Size wrong for:\n ${df.queryExecution}")
    +        assert(sizesGlobalLimit(0).equals(BigInt(expected)),
    +          s"expected exact size 24 for table 'test', got: ${sizesGlobalLimit(0)}")
    +
    +        val sizesLocalLimit = df.queryExecution.analyzed.collect { case l: LocalLimit =>
    +          l.statistics.sizeInBytes
    +        }
    +        assert(sizesLocalLimit.size === 1, s"Size wrong for:\n ${df.queryExecution}")
    +        assert(sizesLocalLimit(0).equals(BigInt(expected)),
    +          s"expected exact size 24 for table 'test', got: ${sizesLocalLimit(0)}")
    +      }
    +    }
    +  }
    +
    +  test("estimates the size of a limit 0 on outer join") {
    +    withTempTable("test") {
    +      Seq(("one", 1), ("two", 2), ("three", 3), ("four", 4)).toDF("k", "v")
    +        .createOrReplaceTempView("test")
    +      val df1 = spark.table("test")
    +      val df2 = spark.table("test").limit(0)
    +      val df = df1.join(df2, Seq("k"), "left")
    +
    +      val sizes = df.queryExecution.analyzed.collect { case g: Join =>
    +        g.statistics.sizeInBytes
    +      }
    +
    +      assert(sizes.size === 1, s"Size wrong for:\n ${df.queryExecution}")
    +      assert(sizes(0).equals(BigInt(96)),
    --- End diff --
    
    Why do you `equals`? Would `===` not work here?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14034
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14034#discussion_r69499666
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -46,6 +46,15 @@ trait CheckAnalysis extends PredicateHelper {
         }).length > 1
       }
     
    +  private def checkLimitClause(limitExpr: Expression): Unit = {
    +    val numRows = limitExpr.eval().asInstanceOf[Int]
    --- End diff --
    
    - Oracle: http://docs.oracle.com/javadb/10.5.3.0/ref/rrefsqljoffsetfetch.html
    - DB2 z/OS: https://www.ibm.com/support/knowledgecenter/SSEPEK_10.0.0/com.ibm.db2z10.doc.sqlref/src/tpc/db2z_sql_fetchfirstclause.html
    - MySQL: http://dev.mysql.com/doc/refman/5.7/en/select.html
    - PostgreSQL: https://www.postgresql.org/docs/8.1/static/queries-limit.html
    
    It sounds like nobody supports it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14034
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14034#discussion_r70025602
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -46,6 +46,20 @@ trait CheckAnalysis extends PredicateHelper {
         }).length > 1
       }
     
    +  private def checkLimitClause(limitExpr: Expression): Unit = {
    +    if (!limitExpr.foldable) {
    --- End diff --
    
    I'm saying maybe we can declare Limit as `case class GlobalLimit(limit: Int, child: LogicalPlan)`, but it's not a small change, we could think about it later.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/14034
  
    thanks, merging to master and 2.0!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14034
  
    **[Test build #62002 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62002/consoleFull)** for PR 14034 at commit [`3036847`](https://github.com/apache/spark/commit/3036847f8436ab72b05dafcf6566a96dc766dca1).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14034
  
    **[Test build #61997 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61997/consoleFull)** for PR 14034 at commit [`3036847`](https://github.com/apache/spark/commit/3036847f8436ab72b05dafcf6566a96dc766dca1).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14034
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61963/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

Posted by jaceklaskowski <gi...@git.apache.org>.
Github user jaceklaskowski commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14034#discussion_r69488260
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -251,6 +251,22 @@ trait CheckAnalysis extends PredicateHelper {
                     s"but one table has '${firstError.output.length}' columns and another table has " +
                     s"'${s.children.head.output.length}' columns")
     
    +          case l: GlobalLimit =>
    +            val numRows = l.limitExpr.eval().asInstanceOf[Int]
    +            if (numRows < 0) {
    +              failAnalysis(
    +                s"number_rows in limit clause must be equal to or greater than 0. " +
    +                  s"number_rows:$numRows")
    +            }
    +
    +          case l: LocalLimit =>
    --- End diff --
    
    What do you think about merging the two cases to `case l @ (_: LocalLimit | _: GlobalLimit) =>` to remove the duplication (or at least introduce a local method).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14034#discussion_r70156160
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -2045,6 +2047,21 @@ object EliminateUnions extends Rule[LogicalPlan] {
     }
     
     /**
    + * Converts foldable numeric expressions to integers of [[GlobalLimit]] and [[LocalLimit]] operators
    + */
    +object ResolveLimits extends Rule[LogicalPlan] {
    +  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +    case g @ GlobalLimit(limitExpr, _) if limitExpr.foldable && isNumeric(limitExpr.eval()) =>
    +      g.copy(limitExpr = Literal(limitExpr.eval().asInstanceOf[Number].intValue(), IntegerType))
    --- End diff --
    
    Another way is like
    ```Scala
    g.copy(limitExpr = Literal(Cast(limitExpr, IntegerType).eval(), IntegerType))
    ```
    Not sure which one is better here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14034
  
    **[Test build #62078 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62078/consoleFull)** for PR 14034 at commit [`2e6f8d8`](https://github.com/apache/spark/commit/2e6f8d8c8b5007302415b7fd984a38fc51be44bf).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14034#discussion_r70183958
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -46,6 +46,20 @@ trait CheckAnalysis extends PredicateHelper {
         }).length > 1
       }
     
    +  private def checkLimitClause(limitExpr: Expression): Unit = {
    +    if (!limitExpr.foldable) {
    --- End diff --
    
    this may be more readable:
    ```
    limitExpr match {
      case e if !e.foldable => fail(...)
      case e if e.dataType != IntegerType => fail("the limit expression must be int type, but got ${e.dataType.simpleString}")
      case e if e.eval() < 0 => fail(...)
      case e =>
    }
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14034#discussion_r70109475
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -46,6 +46,21 @@ trait CheckAnalysis extends PredicateHelper {
         }).length > 1
       }
     
    +  private def checkLimitClause(limitExpr: Expression): Unit = {
    +    if (!limitExpr.foldable) {
    +      failAnalysis(
    +        "The argument to the LIMIT clause must evaluate to a constant value. " +
    +        s"Limit:${limitExpr.sql}")
    +    }
    +    limitExpr.eval() match {
    +      case o: Int if o >= 0 => // OK
    +      case o: Int => failAnalysis(
    +        s"number_rows in limit clause must be equal to or greater than 0. number_rows:$o")
    +      case o => failAnalysis(
    --- End diff --
    
    You know, we can do it, but we need to fix the other parts. Let me try it. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14034#discussion_r69499892
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -46,6 +46,15 @@ trait CheckAnalysis extends PredicateHelper {
         }).length > 1
       }
     
    +  private def checkLimitClause(limitExpr: Expression): Unit = {
    +    val numRows = limitExpr.eval().asInstanceOf[Int]
    --- End diff --
    
    We do not support non-foldable limit clauses.
    https://github.com/apache/spark/blob/d063898bebaaf4ec2aad24c3ac70aabdbf97a190/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala#L67-L89
    https://github.com/apache/spark/blob/d063898bebaaf4ec2aad24c3ac70aabdbf97a190/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala#L398-L401
    
    But,,, we do not issue an exception if users do it. Thus, the error we got is strange:
    ```
    assertion failed: No plan for GlobalLimit (_nondeterministic#203 > 0.2)
    +- Project [key#11, value#12, rand(-1441968339187861415) AS _nondeterministic#203]
       +- LocalLimit (_nondeterministic#202 > 0.2)
          +- Project [key#11, value#12, rand(-1308350387169017676) AS _nondeterministic#202]
             +- LogicalRDD [key#11, value#12]
    
    java.lang.AssertionError: assertion failed: No plan for GlobalLimit (_nondeterministic#203 > 0.2)
    +- Project [key#11, value#12, rand(-1441968339187861415) AS _nondeterministic#203]
       +- LocalLimit (_nondeterministic#202 > 0.2)
          +- Project [key#11, value#12, rand(-1308350387169017676) AS _nondeterministic#202]
             +- LogicalRDD [key#11, value#12]
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14034
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61974/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14034
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14034
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62073/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14034
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14034#discussion_r69599120
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/StatisticsSuite.scala ---
    @@ -31,4 +33,46 @@ class StatisticsSuite extends QueryTest with SharedSQLContext {
           spark.sessionState.conf.autoBroadcastJoinThreshold)
       }
     
    +  test("estimates the size of limit") {
    +    withTempTable("test") {
    +      Seq(("one", 1), ("two", 2), ("three", 3), ("four", 4)).toDF("k", "v")
    +        .createOrReplaceTempView("test")
    +      Seq((0, 1), (1, 24), (2, 48)).foreach { case (limit, expected) =>
    +        val df = sql(s"""SELECT * FROM test limit $limit""")
    +
    +        val sizesGlobalLimit = df.queryExecution.analyzed.collect { case g: GlobalLimit =>
    +          g.statistics.sizeInBytes
    +        }
    +        assert(sizesGlobalLimit.size === 1, s"Size wrong for:\n ${df.queryExecution}")
    +        assert(sizesGlobalLimit(0).equals(BigInt(expected)),
    +          s"expected exact size 24 for table 'test', got: ${sizesGlobalLimit(0)}")
    +
    +        val sizesLocalLimit = df.queryExecution.analyzed.collect { case l: LocalLimit =>
    +          l.statistics.sizeInBytes
    +        }
    +        assert(sizesLocalLimit.size === 1, s"Size wrong for:\n ${df.queryExecution}")
    +        assert(sizesLocalLimit(0).equals(BigInt(expected)),
    +          s"expected exact size 24 for table 'test', got: ${sizesLocalLimit(0)}")
    +      }
    +    }
    +  }
    +
    +  test("estimates the size of a limit 0 on outer join") {
    +    withTempTable("test") {
    +      Seq(("one", 1), ("two", 2), ("three", 3), ("four", 4)).toDF("k", "v")
    +        .createOrReplaceTempView("test")
    +      val df1 = spark.table("test")
    +      val df2 = spark.table("test").limit(0)
    +      val df = df1.join(df2, Seq("k"), "left")
    +
    +      val sizes = df.queryExecution.analyzed.collect { case g: Join =>
    +        g.statistics.sizeInBytes
    +      }
    +
    +      assert(sizes.size === 1, s"Size wrong for:\n ${df.queryExecution}")
    +      assert(sizes(0).equals(BigInt(96)),
    --- End diff --
    
    This is just following what the other test cases did. Sure, we can change all of them. Let me do it. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14034
  
    **[Test build #62027 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62027/consoleFull)** for PR 14034 at commit [`028aa79`](https://github.com/apache/spark/commit/028aa79e3c0b57cf887d4d735a387a5043612281).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14034
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14034
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14034
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62006/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14034
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61997/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14034#discussion_r70166568
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -2045,6 +2047,21 @@ object EliminateUnions extends Rule[LogicalPlan] {
     }
     
     /**
    + * Converts foldable numeric expressions to integers in [[GlobalLimit]] and [[LocalLimit]] operators
    + */
    +object ResolveLimits extends Rule[LogicalPlan] {
    --- End diff --
    
    Thanks for trying this out! Yea looks like it's doable. Let's remove it first and do it in follow-ups, to make this PR surgical.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14034
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62078/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14034
  
    **[Test build #62073 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62073/consoleFull)** for PR 14034 at commit [`2e6f8d8`](https://github.com/apache/spark/commit/2e6f8d8c8b5007302415b7fd984a38fc51be44bf).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14034
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62002/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14034#discussion_r70166579
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -2045,6 +2047,21 @@ object EliminateUnions extends Rule[LogicalPlan] {
     }
     
     /**
    + * Converts foldable numeric expressions to integers in [[GlobalLimit]] and [[LocalLimit]] operators
    + */
    +object ResolveLimits extends Rule[LogicalPlan] {
    --- End diff --
    
    We can create a JIRA first and discuss with others to decide if it's useful to support all integral types in limit.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14034#discussion_r69586217
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -46,6 +46,20 @@ trait CheckAnalysis extends PredicateHelper {
         }).length > 1
       }
     
    +  private def checkLimitClause(limitExpr: Expression): Unit = {
    +    if (!limitExpr.foldable) {
    --- End diff --
    
    If it must be foldable, can we just use `int` as limit instead of expression?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14034
  
    **[Test build #61973 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61973/consoleFull)** for PR 14034 at commit [`8fd72f6`](https://github.com/apache/spark/commit/8fd72f650af0653c96cc0702a8a96743ed73e08f).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/14034


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14034: [16355] [16354] [SQL] Fix Bugs When LIMIT/TABLESAMPLE is...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14034
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14034
  
    **[Test build #62080 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62080/consoleFull)** for PR 14034 at commit [`d66870b`](https://github.com/apache/spark/commit/d66870bb274a206f16d33f56214246b17953a90e).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14034#discussion_r69499913
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -46,6 +46,15 @@ trait CheckAnalysis extends PredicateHelper {
         }).length > 1
       }
     
    +  private def checkLimitClause(limitExpr: Expression): Unit = {
    +    val numRows = limitExpr.eval().asInstanceOf[Int]
    --- End diff --
    
    Let me do it in this PR. Thank you for your review! : ) 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14034
  
    **[Test build #61736 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61736/consoleFull)** for PR 14034 at commit [`3c402d3`](https://github.com/apache/spark/commit/3c402d304883fa83712f07cd09a3bbe765b1f071).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/14034
  
    you missed one comment : https://github.com/apache/spark/pull/14034/files#r70183958 :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14034
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61736/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14034
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14034
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14034
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14034#discussion_r70024905
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---
    @@ -660,7 +660,12 @@ case class GlobalLimit(limitExpr: Expression, child: LogicalPlan) extends UnaryN
       }
       override lazy val statistics: Statistics = {
         val limit = limitExpr.eval().asInstanceOf[Int]
    -    val sizeInBytes = (limit: Long) * output.map(a => a.dataType.defaultSize).sum
    +    var sizeInBytes = (limit: Long) * output.map(a => a.dataType.defaultSize).sum
    +    if (sizeInBytes == 0) {
    --- End diff --
    
    I think it's a special case for limit. How about we make it more clear? e.g.
    ```
    val sizeInBytes = if (limit == 0) {
      1
    } else {
      ...
    }
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14034#discussion_r69498258
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -46,6 +46,15 @@ trait CheckAnalysis extends PredicateHelper {
         }).length > 1
       }
     
    +  private def checkLimitClause(limitExpr: Expression): Unit = {
    +    val numRows = limitExpr.eval().asInstanceOf[Int]
    --- End diff --
    
    Nope. Users can input an expression here. For example, 
    https://github.com/apache/spark/blob/e5d703bca85c65ce329b1e202283cfa35d109146/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala#L234


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14034#discussion_r70194595
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/StatisticsSuite.scala ---
    @@ -31,4 +33,46 @@ class StatisticsSuite extends QueryTest with SharedSQLContext {
           spark.sessionState.conf.autoBroadcastJoinThreshold)
       }
     
    +  test("estimates the size of limit") {
    +    withTempTable("test") {
    +      Seq(("one", 1), ("two", 2), ("three", 3), ("four", 4)).toDF("k", "v")
    +        .createOrReplaceTempView("test")
    +      Seq((0, 1), (1, 24), (2, 48)).foreach { case (limit, expected) =>
    +        val df = sql(s"""SELECT * FROM test limit $limit""")
    +
    +        val sizesGlobalLimit = df.queryExecution.analyzed.collect { case g: GlobalLimit =>
    +          g.statistics.sizeInBytes
    +        }
    +        assert(sizesGlobalLimit.size === 1, s"Size wrong for:\n ${df.queryExecution}")
    +        assert(sizesGlobalLimit.head === BigInt(expected),
    +          s"expected exact size 24 for table 'test', got: ${sizesGlobalLimit.head}")
    --- End diff --
    
    : ) Forgot to change it. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14034
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62058/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14034
  
    **[Test build #62006 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62006/consoleFull)** for PR 14034 at commit [`d135b77`](https://github.com/apache/spark/commit/d135b77b2d3dd2f9f6ae160bfb24b73df34d3995).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14034#discussion_r69498054
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -46,6 +46,15 @@ trait CheckAnalysis extends PredicateHelper {
         }).length > 1
       }
     
    +  private def checkLimitClause(limitExpr: Expression): Unit = {
    +    val numRows = limitExpr.eval().asInstanceOf[Int]
    --- End diff --
    
    is the limit expression guaranteed to be literal?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14034
  
    **[Test build #61736 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61736/consoleFull)** for PR 14034 at commit [`3c402d3`](https://github.com/apache/spark/commit/3c402d304883fa83712f07cd09a3bbe765b1f071).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/14034
  
    LGTM, pending jenkins


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14034
  
    **[Test build #61974 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61974/consoleFull)** for PR 14034 at commit [`1abdbb9`](https://github.com/apache/spark/commit/1abdbb9e6b03a01184825c9785a5b49c3724e315).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14034#discussion_r69494036
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -251,6 +251,22 @@ trait CheckAnalysis extends PredicateHelper {
                     s"but one table has '${firstError.output.length}' columns and another table has " +
                     s"'${s.children.head.output.length}' columns")
     
    +          case l: GlobalLimit =>
    +            val numRows = l.limitExpr.eval().asInstanceOf[Int]
    +            if (numRows < 0) {
    +              failAnalysis(
    +                s"number_rows in limit clause must be equal to or greater than 0. " +
    +                  s"number_rows:$numRows")
    +            }
    +
    +          case l: LocalLimit =>
    --- End diff --
    
    I do not think we can merge them, but, yeah, we can create a local function for it. Thank you!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14034#discussion_r70118783
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -46,6 +46,21 @@ trait CheckAnalysis extends PredicateHelper {
         }).length > 1
       }
     
    +  private def checkLimitClause(limitExpr: Expression): Unit = {
    +    if (!limitExpr.foldable) {
    +      failAnalysis(
    +        "The argument to the LIMIT clause must evaluate to a constant value. " +
    +        s"Limit:${limitExpr.sql}")
    +    }
    +    limitExpr.eval() match {
    +      case o: Int if o >= 0 => // OK
    +      case o: Int => failAnalysis(
    +        s"number_rows in limit clause must be equal to or greater than 0. number_rows:$o")
    +      case o => failAnalysis(
    --- End diff --
    
    Tried to do it in different ways, but the best way I found is to introduce an Analyzer rule. 
    
    We have one rule in Optimizer and one rule in Planner. Both assume the `limitExpr` is an `Integer` type. It looks ugly to convert the type everywhere. To support different numeric types, we need to convert these foldable numeric values to integer in `Analzyer`, I think. Let me upload a fix at first. You can judge whether this is a good way or not. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14034
  
    **[Test build #62058 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62058/consoleFull)** for PR 14034 at commit [`01137dc`](https://github.com/apache/spark/commit/01137dcf739e75be31ba8836e342537f66971aa3).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/14034
  
    Thank you!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14034
  
    **[Test build #62058 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62058/consoleFull)** for PR 14034 at commit [`01137dc`](https://github.com/apache/spark/commit/01137dcf739e75be31ba8836e342537f66971aa3).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14034#discussion_r70028525
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---
    @@ -660,7 +660,12 @@ case class GlobalLimit(limitExpr: Expression, child: LogicalPlan) extends UnaryN
       }
       override lazy val statistics: Statistics = {
         val limit = limitExpr.eval().asInstanceOf[Int]
    -    val sizeInBytes = (limit: Long) * output.map(a => a.dataType.defaultSize).sum
    +    var sizeInBytes = (limit: Long) * output.map(a => a.dataType.defaultSize).sum
    +    if (sizeInBytes == 0) {
    --- End diff --
    
    Thanks! Let me fix it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14034
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14034
  
    **[Test build #61997 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61997/consoleFull)** for PR 14034 at commit [`3036847`](https://github.com/apache/spark/commit/3036847f8436ab72b05dafcf6566a96dc766dca1).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14034#discussion_r70028351
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -46,6 +46,20 @@ trait CheckAnalysis extends PredicateHelper {
         }).length > 1
       }
     
    +  private def checkLimitClause(limitExpr: Expression): Unit = {
    +    if (!limitExpr.foldable) {
    --- End diff --
    
    uh, I see. Thank you! 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14034: [16355] [16354] [SQL] Fix Bugs When LIMIT/TABLESAMPLE is...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14034
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61678/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14034
  
    **[Test build #61973 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61973/consoleFull)** for PR 14034 at commit [`8fd72f6`](https://github.com/apache/spark/commit/8fd72f650af0653c96cc0702a8a96743ed73e08f).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14034
  
    **[Test build #61746 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61746/consoleFull)** for PR 14034 at commit [`5b36fbc`](https://github.com/apache/spark/commit/5b36fbcd1b0417c0e5796299ffb6b538d322fed6).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14034
  
    **[Test build #61761 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61761/consoleFull)** for PR 14034 at commit [`a2a828f`](https://github.com/apache/spark/commit/a2a828ff3b2a944ceec48ee99163392267311646).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14034#discussion_r70183276
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -46,6 +46,21 @@ trait CheckAnalysis extends PredicateHelper {
         }).length > 1
       }
     
    +  private def checkLimitClause(limitExpr: Expression): Unit = {
    +    if (!limitExpr.foldable) {
    +      failAnalysis(
    +        "The argument to the LIMIT clause must evaluate to a constant value. " +
    +        s"Limit:${limitExpr.sql}")
    +    }
    +    limitExpr.eval() match {
    +      case o: Int if o >= 0 => // OK
    +      case o: Int => failAnalysis(
    +        s"number_rows in limit clause must be equal to or greater than 0. number_rows:$o")
    +      case o => failAnalysis(
    +        s"""number_rows in limit clause cannot be cast to integer:\"$o\".""")
    --- End diff --
    
    Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14034#discussion_r70194607
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
    @@ -660,18 +660,51 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
     
       test("limit") {
         checkAnswer(
    -      sql("SELECT * FROM testData LIMIT 10"),
    +      sql("SELECT * FROM testData LIMIT 9 + 1"),
           testData.take(10).toSeq)
     
         checkAnswer(
    -      sql("SELECT * FROM arrayData LIMIT 1"),
    +      sql("SELECT * FROM arrayData LIMIT CAST(1 AS Integer)"),
           arrayData.collect().take(1).map(Row.fromTuple).toSeq)
     
         checkAnswer(
           sql("SELECT * FROM mapData LIMIT 1"),
           mapData.collect().take(1).map(Row.fromTuple).toSeq)
       }
     
    +  test("non-foldable expressions in LIMIT") {
    +    val e = intercept[AnalysisException] {
    +      sql("SELECT * FROM testData LIMIT key > 3")
    +    }.getMessage
    +    assert(e.contains("The argument to the LIMIT clause must evaluate to a constant value. " +
    +      "Limit:(testdata.`key` > 3)"))
    +  }
    +
    +  test("Limit: unable to evaluate and cast expressions in limit clauses to Int") {
    --- End diff --
    
    Yeah, will change it. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/14034
  
    retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/14034
  
    LGTM except some style comments, thanks for working on it!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14034#discussion_r70194599
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/StatisticsSuite.scala ---
    @@ -31,4 +33,46 @@ class StatisticsSuite extends QueryTest with SharedSQLContext {
           spark.sessionState.conf.autoBroadcastJoinThreshold)
       }
     
    +  test("estimates the size of limit") {
    +    withTempTable("test") {
    +      Seq(("one", 1), ("two", 2), ("three", 3), ("four", 4)).toDF("k", "v")
    +        .createOrReplaceTempView("test")
    +      Seq((0, 1), (1, 24), (2, 48)).foreach { case (limit, expected) =>
    +        val df = sql(s"""SELECT * FROM test limit $limit""")
    +
    +        val sizesGlobalLimit = df.queryExecution.analyzed.collect { case g: GlobalLimit =>
    +          g.statistics.sizeInBytes
    +        }
    +        assert(sizesGlobalLimit.size === 1, s"Size wrong for:\n ${df.queryExecution}")
    +        assert(sizesGlobalLimit.head === BigInt(expected),
    +          s"expected exact size 24 for table 'test', got: ${sizesGlobalLimit.head}")
    +
    +        val sizesLocalLimit = df.queryExecution.analyzed.collect { case l: LocalLimit =>
    +          l.statistics.sizeInBytes
    +        }
    +        assert(sizesLocalLimit.size === 1, s"Size wrong for:\n ${df.queryExecution}")
    +        assert(sizesLocalLimit.head === BigInt(expected),
    +          s"expected exact size 24 for table 'test', got: ${sizesLocalLimit.head}")
    +      }
    +    }
    +  }
    +
    +  test("estimates the size of a limit 0 on outer join") {
    +    withTempTable("test") {
    +      Seq(("one", 1), ("two", 2), ("three", 3), ("four", 4)).toDF("k", "v")
    +        .createOrReplaceTempView("test")
    +      val df1 = spark.table("test")
    +      val df2 = spark.table("test").limit(0)
    +      val df = df1.join(df2, Seq("k"), "left")
    +
    +      val sizes = df.queryExecution.analyzed.collect { case g: Join =>
    +        g.statistics.sizeInBytes
    +      }
    +
    +      assert(sizes.size === 1, s"Size wrong for:\n ${df.queryExecution}")
    --- End diff --
    
    Sure


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14034
  
    **[Test build #61746 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61746/consoleFull)** for PR 14034 at commit [`5b36fbc`](https://github.com/apache/spark/commit/5b36fbcd1b0417c0e5796299ffb6b538d322fed6).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14034
  
    **[Test build #62078 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62078/consoleFull)** for PR 14034 at commit [`2e6f8d8`](https://github.com/apache/spark/commit/2e6f8d8c8b5007302415b7fd984a38fc51be44bf).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14034#discussion_r70033419
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
    @@ -660,18 +660,39 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
     
       test("limit") {
         checkAnswer(
    -      sql("SELECT * FROM testData LIMIT 10"),
    +      sql("SELECT * FROM testData LIMIT 9 + 1"),
           testData.take(10).toSeq)
     
         checkAnswer(
    -      sql("SELECT * FROM arrayData LIMIT 1"),
    +      sql("SELECT * FROM arrayData LIMIT CAST(1 AS Integer)"),
           arrayData.collect().take(1).map(Row.fromTuple).toSeq)
     
         checkAnswer(
           sql("SELECT * FROM mapData LIMIT 1"),
           mapData.collect().take(1).map(Row.fromTuple).toSeq)
       }
     
    +  test("non-foldable expressions in LIMIT") {
    +    val e = intercept[AnalysisException] {
    +      sql("SELECT * FROM testData LIMIT key > 3")
    --- End diff --
    
    A good question! : ) Now, the exception we issued is not good:
    ```
    java.lang.Boolean cannot be cast to java.lang.Integer
    java.lang.ClassCastException: java.lang.Boolean cannot be cast to java.lang.Integer
    	at scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:101)
    ```
    
    Let me fix it and throw a more reasonable exception:
    ```
    number_rows in limit clause cannot be cast to integer:true;
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org