Posted to reviews@spark.apache.org by rberenguel <gi...@git.apache.org> on 2017/02/28 13:46:09 UTC

[GitHub] spark pull request #17100: [SPARK-13947][PYTHON] PySpark DataFrames: The err...

GitHub user rberenguel opened a pull request:

    https://github.com/apache/spark/pull/17100

    [SPARK-13947][PYTHON] PySpark DataFrames: The error message from using an invalid table reference is not clear

    ## What changes were proposed in this pull request?
    
    Rewritten error message for clarity. Added extra information in the case of an attribute name collision, hinting that the user should double-check whether two different tables are being referenced.
    
    ## How was this patch tested?
    
    No functional changes; only the final error message has changed. It has been tested manually against the situation described in the JIRA ticket, and the automated tests in the repository pass.
    
    This PR is my original work and I license it to the Spark project.
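    
    For illustration, a minimal sketch of the kind of situation this targets (the SparkSession setup, DataFrame names, and values here are made up, not taken from the JIRA ticket): selecting a column that resolved against a different DataFrame fails analysis with the "resolved attribute(s) ... missing from ..." error this PR rewords, and the new hint fires because both DataFrames happen to have a `value` column.
    
    ```scala
    import org.apache.spark.sql.SparkSession
    
    val spark = SparkSession.builder().master("local[*]").appName("spark-13947-sketch").getOrCreate()
    import spark.implicits._
    
    val left  = Seq((1, "a"), (2, "b")).toDF("id", "value")
    val right = Seq((1, "x"), (2, "y")).toDF("id", "value")
    
    // `right("value")` is an attribute of `right`'s plan, not `left`'s, so this
    // fails analysis; the missing attribute shares the name `value` with one of
    // `left`'s columns, which is exactly the case the new hint is meant to flag.
    left.select(right("value"))
    ```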

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rberenguel/spark SPARK-13947-error-message

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17100.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17100
    
----
commit 981224bd9631a0ac633b6967e639197794894fcf
Author: Ruben Berenguel Montoro <ru...@mostlymaths.net>
Date:   2017-02-28T13:40:47Z

    SPARK-13947 Rewritten error message for clarity. Added extra information in case of attribute name collision

----


---


[GitHub] spark issue #17100: [SPARK-13947][SQL] The error message from using an inval...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    Merged build finished. Test PASSed.


---


[GitHub] spark pull request #17100: [SPARK-13947][SQL] The error message from using a...

Posted by rberenguel <gi...@git.apache.org>.
Github user rberenguel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17100#discussion_r146667191
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -270,12 +270,25 @@ trait CheckAnalysis extends PredicateHelper {
     
             operator match {
               case o if o.children.nonEmpty && o.missingInput.nonEmpty =>
    +            val resolver = plan.conf.resolver
    +            val attrsWithSameName = o.missingInput.filter { missing =>
    +              o.inputSet.exists(input => resolver(missing.name, input.name))
    +            }
    +            val repeatedNameHint = if (attrsWithSameName.nonEmpty) {
    +              val sameNames = attrsWithSameName.map(_.name).mkString(",")
    +              s"""Attribute(s) with the same name appear in the operation: `$sameNames`.
    +                  |Please check if the right attribute(s) are used.""".stripMargin
    +            } else {
    +              ""
    +            }
    +
                 val missingAttributes = o.missingInput.mkString(",")
                 val input = o.inputSet.mkString(",")
     
    -            failAnalysis(
    -              s"resolved attribute(s) $missingAttributes missing from $input " +
    -                s"in operator ${operator.simpleString}")
    +            val msg = s"""Resolved attribute(s) $missingAttributes missing from $input
    +                          |in operator ${operator.simpleString}.""".stripMargin
    +
    +            failAnalysis(if (repeatedNameHint.nonEmpty) msg + "\n" + repeatedNameHint else msg)
    --- End diff --
    
    Yup, thanks, it's more orderly this way.


---


[GitHub] spark pull request #17100: [SPARK-13947][SQL] The error message from using a...

Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17100#discussion_r146257154
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala ---
    @@ -408,16 +408,23 @@ class AnalysisErrorSuite extends AnalysisTest {
         // CheckAnalysis should throw AnalysisException when Aggregate contains missing attribute(s)
         // Since we manually construct the logical plan at here and Sum only accept
         // LongType, DoubleType, and DecimalType. We use LongType as the type of a.
    -    val plan =
    -      Aggregate(
    +    val attrA = AttributeReference("a", LongType)(exprId = ExprId(1))
    +    val otherA = AttributeReference("a", LongType)(exprId = ExprId(2))
    +    val bAlias = Alias(sum(attrA), "b")() :: Nil
    +    val plan = Aggregate(
    --- End diff --
    
    We can make the test case stronger by adding another attribute `c` that is missing from the input but doesn't share a name with any input attribute.
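    
    For what it's worth, a rough sketch of what that strengthened plan could look like (purely illustrative: it assumes the helpers already imported in AnalysisErrorSuite, and the extra name/ExprId are made up, not from the final patch):
    
    ```scala
    val attrA  = AttributeReference("a", LongType)(exprId = ExprId(1))
    val otherA = AttributeReference("a", LongType)(exprId = ExprId(2))
    val attrC  = AttributeReference("c", LongType)(exprId = ExprId(3)) // missing, but no name clash
    
    val aggExprs = Alias(sum(attrA), "b")() :: Alias(sum(attrC), "d")() :: Nil
    val plan = Aggregate(Nil, aggExprs, LocalRelation(otherA))
    
    // Both a#1L and c#3L are missing from the input a#2L, but only `a` should show up
    // in the same-name hint, since `c` does not collide with any input attribute name.
    ```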


---


[GitHub] spark issue #17100: [SPARK-13947][PYTHON][SQL] PySpark DataFrames: The error...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    **[Test build #73700 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73700/testReport)** for PR 17100 at commit [`ea07688`](https://github.com/apache/spark/commit/ea07688ae1e09505fef58de834c35e08324cb83b).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark pull request #17100: [SPARK-13947][SQL] The error message from using a...

Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17100#discussion_r146255861
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -270,12 +270,25 @@ trait CheckAnalysis extends PredicateHelper {
     
             operator match {
               case o if o.children.nonEmpty && o.missingInput.nonEmpty =>
    +            val resolver = plan.conf.resolver
    +            val attrsWithSameName = o.missingInput.filter { missing =>
    +              o.inputSet.exists(input => resolver(missing.name, input.name))
    +            }
    +            val repeatedNameHint = if (attrsWithSameName.nonEmpty) {
    +              val commonNames = attrsWithSameName.map(_.name).mkString(",")
    +              s"""|Please check attribute(s) `$commonNames`, they seem to appear in two
    +                  |different input operators, with the same name.""".stripMargin
    --- End diff --
    
    The `two different` part of the message could be wrong (as in the test case). How about changing the hint message to:
    ```
    Attribute(s) with the same name are found in the operation: `$sameNames`. Please check if the right attribute(s) are used.
    ```


---


[GitHub] spark pull request #17100: [SPARK-13947][SQL] The error message from using a...

Posted by rberenguel <gi...@git.apache.org>.
Github user rberenguel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17100#discussion_r146625735
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -270,12 +270,25 @@ trait CheckAnalysis extends PredicateHelper {
     
             operator match {
               case o if o.children.nonEmpty && o.missingInput.nonEmpty =>
    +            val resolver = plan.conf.resolver
    +            val attrsWithSameName = o.missingInput.filter { missing =>
    +              o.inputSet.exists(input => resolver(missing.name, input.name))
    +            }
    +            val repeatedNameHint = if (attrsWithSameName.nonEmpty) {
    +              val sameNames = attrsWithSameName.map(_.name).mkString(",")
    +              s"""Attribute(s) with the same name appear in the operation: `$sameNames`.
    --- End diff --
    
    You mean wrapping each column name in backticks?


---


[GitHub] spark pull request #17100: [SPARK-13947][SQL] The error message from using a...

Posted by rberenguel <gi...@git.apache.org>.
Github user rberenguel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17100#discussion_r146176609
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -270,12 +271,25 @@ trait CheckAnalysis extends PredicateHelper {
     
             operator match {
               case o if o.children.nonEmpty && o.missingInput.nonEmpty =>
    +            val resolver = plan.conf.resolver
    +            val attrsWithSameName = o.missingInput.filter(x =>
    +              o.inputSet.exists(y => resolver(x.name, y.name)))
                 val missingAttributes = o.missingInput.mkString(",")
    -            val input = o.inputSet.mkString(",")
    +            val availableAttributes = o.inputSet.mkString(",")
    +            val repeatedNameHint = if (attrsWithSameName.size > 0) {
    +              val commonNames = attrsWithSameName.map(_.name).mkString(",")
    +              s"""\n|Attribute(s) `$commonNames` seem to appear in two
    +                  |different datasets, with the same name."""
    --- End diff --
    
    What do you suggest instead?


---


[GitHub] spark pull request #17100: [SPARK-13947][SQL] The error message from using a...

Posted by rberenguel <gi...@git.apache.org>.
Github user rberenguel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17100#discussion_r146667141
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -270,12 +270,25 @@ trait CheckAnalysis extends PredicateHelper {
     
             operator match {
               case o if o.children.nonEmpty && o.missingInput.nonEmpty =>
    +            val resolver = plan.conf.resolver
    +            val attrsWithSameName = o.missingInput.filter { missing =>
    +              o.inputSet.exists(input => resolver(missing.name, input.name))
    +            }
    +            val repeatedNameHint = if (attrsWithSameName.nonEmpty) {
    +              val sameNames = attrsWithSameName.map(_.name).mkString(",")
    +              s"""Attribute(s) with the same name appear in the operation: `$sameNames`.
    --- End diff --
    
    Ok


---


[GitHub] spark pull request #17100: [SPARK-13947][PYTHON][SQL] PySpark DataFrames: Th...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17100#discussion_r107325822
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -337,11 +337,29 @@ trait CheckAnalysis extends PredicateHelper {
             operator match {
               case o if o.children.nonEmpty && o.missingInput.nonEmpty =>
                 val missingAttributes = o.missingInput.mkString(",")
    -            val input = o.inputSet.mkString(",")
    +            val availableAttributes = o.inputSet.mkString(",")
    +            val missingAttributesHelper = o.missingInput.map(a => (a.name, a.exprId.id))
    +
    +            val availableAttributesHelper = o.inputSet.map(a => (a.name, a.exprId.id))
    +
    +            val common = (missingAttributesHelper ++ availableAttributesHelper).groupBy(_._1)
    +            val commonNames = common.map(x => (x._1, x._2.toSet))
    +              .filter(_._2.size > 1)
    +              .map(_._1)
    +
    +            val repeatedNameHint = if (commonNames.size > 0) {
    +              s"\n|Observe that attribute(s) ${commonNames.mkString(",")} appear in your " +
    +                "query with at least two different hashes, but same name."
    --- End diff --
    
    Users do not understand what hashes mean. 


---


[GitHub] spark pull request #17100: [SPARK-13947][SQL] The error message from using a...

Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17100#discussion_r146205865
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -270,12 +271,25 @@ trait CheckAnalysis extends PredicateHelper {
     
             operator match {
               case o if o.children.nonEmpty && o.missingInput.nonEmpty =>
    +            val resolver = plan.conf.resolver
    +            val attrsWithSameName = o.missingInput.filter(x =>
    +              o.inputSet.exists(y => resolver(x.name, y.name)))
                 val missingAttributes = o.missingInput.mkString(",")
    -            val input = o.inputSet.mkString(",")
    +            val availableAttributes = o.inputSet.mkString(",")
    +            val repeatedNameHint = if (attrsWithSameName.size > 0) {
    --- End diff --
    
    Yea, that's what I meant :)


---


[GitHub] spark issue #17100: [SPARK-13947][SQL] The error message from using an inval...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    **[Test build #82980 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82980/testReport)** for PR 17100 at commit [`33a8a71`](https://github.com/apache/spark/commit/33a8a71092b699f6e402c67684fe4bfab6d2ccf8).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark issue #17100: [SPARK-13947][SQL] PySpark DataFrames: The error message...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    **[Test build #77502 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77502/testReport)** for PR 17100 at commit [`0cb9825`](https://github.com/apache/spark/commit/0cb9825a1cfebe67f08ce344e67a06c53e4a78be).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark pull request #17100: [SPARK-13947][SQL] The error message from using a...

Posted by rberenguel <gi...@git.apache.org>.
Github user rberenguel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17100#discussion_r146234832
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -270,12 +270,27 @@ trait CheckAnalysis extends PredicateHelper {
     
             operator match {
               case o if o.children.nonEmpty && o.missingInput.nonEmpty =>
    +            val resolver = plan.conf.resolver
    +            val attrsWithSameName = o.missingInput.filter(missing =>
    +              o.inputSet.exists(input => resolver(missing.name, input.name)))
    +            val repeatedNameHint = if (attrsWithSameName.nonEmpty) {
    +              val commonNames = attrsWithSameName.map(_.name).mkString(",")
    +              s"""|Please check attribute(s) `$commonNames`, they seem to appear in two
    +                  |different input operators, with the same name.""".stripMargin
    +            } else {
    +              ""
    +            }
    +
                 val missingAttributes = o.missingInput.mkString(",")
    -            val input = o.inputSet.mkString(",")
    +            val availableAttributes = o.inputSet.mkString(",")
     
                 failAnalysis(
    -              s"resolved attribute(s) $missingAttributes missing from $input " +
    -                s"in operator ${operator.simpleString}")
    +              s"""Some resolved attribute(s) are not present among the available attributes
    +                |for a query.
    +                |$missingAttributes is not in $availableAttributes.
    +                |$repeatedNameHint
    +                |The failed query was for operator
    +                |${operator.simpleString}""".stripMargin)
    --- End diff --
    
    Hmmm, seems reasonable indeed, and it avoids that horrible trailing newline. Thanks a lot :)


---


[GitHub] spark issue #17100: [SPARK-13947][PYTHON][SQL] PySpark DataFrames: The error...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    **[Test build #73714 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73714/testReport)** for PR 17100 at commit [`65b9596`](https://github.com/apache/spark/commit/65b9596c229ac2b62ecdfeb98e541d2ea92e078d).


---


[GitHub] spark issue #17100: [SPARK-13947][SQL] The error message from using an inval...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    **[Test build #82981 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82981/testReport)** for PR 17100 at commit [`e3cc455`](https://github.com/apache/spark/commit/e3cc4555b7a14299b93e8e4e0ae6d7f482582f9d).


---


[GitHub] spark pull request #17100: [SPARK-13947][SQL] PySpark DataFrames: The error ...

Posted by rberenguel <gi...@git.apache.org>.
Github user rberenguel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17100#discussion_r146128078
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -270,12 +273,25 @@ trait CheckAnalysis extends PredicateHelper {
     
             operator match {
               case o if o.children.nonEmpty && o.missingInput.nonEmpty =>
    +            val resolver = conf.resolver
    --- End diff --
    
    Oh, much handier. Thanks!
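    
    As a side note, a standalone sketch (plain Scala, not the actual Spark internals) of what the resolver-based filter boils down to; the case-insensitive comparison here is just an assumption matching the default `spark.sql.caseSensitive=false`:
    
    ```scala
    // The resolver is just a name-comparison function (String, String) => Boolean.
    val resolver: (String, String) => Boolean = (a, b) => a.equalsIgnoreCase(b)
    
    val missingNames = Seq("a", "B") // stand-ins for the names in o.missingInput
    val inputNames   = Seq("a", "c") // stand-ins for the names in o.inputSet
    
    // Same shape as the filter in the diff: keep the missing attributes whose name
    // also appears among the input attributes.
    val sameName = missingNames.filter(m => inputNames.exists(i => resolver(m, i)))
    // sameName == Seq("a")
    ```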


---


[GitHub] spark issue #17100: [SPARK-13947][PYTHON][SQL] PySpark DataFrames: The error...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73714/
    Test PASSed.


---


[GitHub] spark issue #17100: [SPARK-13947][SQL] The error message from using an inval...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    Merged build finished. Test FAILed.


---


[GitHub] spark pull request #17100: [SPARK-13947][SQL] The error message from using a...

Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17100#discussion_r146257308
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala ---
    @@ -408,16 +408,23 @@ class AnalysisErrorSuite extends AnalysisTest {
         // CheckAnalysis should throw AnalysisException when Aggregate contains missing attribute(s)
         // Since we manually construct the logical plan at here and Sum only accept
         // LongType, DoubleType, and DecimalType. We use LongType as the type of a.
    -    val plan =
    -      Aggregate(
    +    val attrA = AttributeReference("a", LongType)(exprId = ExprId(1))
    +    val otherA = AttributeReference("a", LongType)(exprId = ExprId(2))
    +    val bAlias = Alias(sum(attrA), "b")() :: Nil
    +    val plan = Aggregate(
             Nil,
    -        Alias(sum(AttributeReference("a", LongType)(exprId = ExprId(1))), "b")() :: Nil,
    -        LocalRelation(
    -          AttributeReference("a", LongType)(exprId = ExprId(2))))
    +        bAlias,
    +        LocalRelation(otherA))
     
         assert(plan.resolved)
     
    -    assertAnalysisError(plan, "resolved attribute(s) a#1L missing from a#2L" :: Nil)
    +    val errorMsg = s"""Resolved attribute(s) ${attrA.toString} missing from ${otherA.toString}
    +                     |in operator !Aggregate [${bAlias.mkString("#")}].
    +                     |Please check attribute(s) `a`, they seem to appear in two
    +                     |different input operators, with the same name.""".stripMargin
    +
    +
    --- End diff --
    
    Only one new line is needed here.


---


[GitHub] spark issue #17100: [SPARK-13947][PYTHON][SQL] PySpark DataFrames: The error...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    Merged build finished. Test FAILed.


---


[GitHub] spark issue #17100: [SPARK-13947][SQL] PySpark DataFrames: The error message...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    cc @wzhfy Could you review this PR?


---


[GitHub] spark issue #17100: [SPARK-13947][PYTHON] PySpark DataFrames: The error mess...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    Merged build finished. Test FAILed.


---


[GitHub] spark issue #17100: [SPARK-13947][PYTHON][SQL] PySpark DataFrames: The error...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    **[Test build #73700 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73700/testReport)** for PR 17100 at commit [`ea07688`](https://github.com/apache/spark/commit/ea07688ae1e09505fef58de834c35e08324cb83b).


---


[GitHub] spark pull request #17100: [SPARK-13947][SQL] The error message from using a...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17100#discussion_r146627111
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -270,12 +270,25 @@ trait CheckAnalysis extends PredicateHelper {
     
             operator match {
               case o if o.children.nonEmpty && o.missingInput.nonEmpty =>
    +            val resolver = plan.conf.resolver
    +            val attrsWithSameName = o.missingInput.filter { missing =>
    +              o.inputSet.exists(input => resolver(missing.name, input.name))
    +            }
    +            val repeatedNameHint = if (attrsWithSameName.nonEmpty) {
    +              val sameNames = attrsWithSameName.map(_.name).mkString(",")
    +              s"""Attribute(s) with the same name appear in the operation: `$sameNames`.
    +                  |Please check if the right attribute(s) are used.""".stripMargin
    +            } else {
    +              ""
    +            }
    +
                 val missingAttributes = o.missingInput.mkString(",")
                 val input = o.inputSet.mkString(",")
     
    -            failAnalysis(
    -              s"resolved attribute(s) $missingAttributes missing from $input " +
    -                s"in operator ${operator.simpleString}")
    +            val msg = s"""Resolved attribute(s) $missingAttributes missing from $input
    +                          |in operator ${operator.simpleString}.""".stripMargin
    +
    +            failAnalysis(if (repeatedNameHint.nonEmpty) msg + "\n" + repeatedNameHint else msg)
    --- End diff --
    
    How about 
    ```
                val missingAttributes = o.missingInput.mkString(",")
                val input = o.inputSet.mkString(",")
                val msgForMissingAttributes = s"Resolved attribute(s) $missingAttributes missing " +
                  s"from $input in operator ${operator.simpleString}."
    
                val resolver = plan.conf.resolver
                val attrsWithSameName = o.missingInput.filter { missing =>
                  o.inputSet.exists(input => resolver(missing.name, input.name))
                }
    
                val msg = if (attrsWithSameName.nonEmpty) {
                  val sameNames = attrsWithSameName.map(_.name).mkString(",")
                  s"$msgForMissingAttributes. Attribute(s) with the same name appear in the " +
                    s"operation: $sameNames. Please check if the right attribute(s) are used."
                } else {
                  msgForMissingAttributes
                }
    
                failAnalysis(msg)
    ```


---


[GitHub] spark issue #17100: [SPARK-13947][SQL] The error message from using an inval...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82979/
    Test PASSed.


---


[GitHub] spark pull request #17100: [SPARK-13947][SQL] The error message from using a...

Posted by rberenguel <gi...@git.apache.org>.
Github user rberenguel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17100#discussion_r146469083
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -270,12 +270,25 @@ trait CheckAnalysis extends PredicateHelper {
     
             operator match {
               case o if o.children.nonEmpty && o.missingInput.nonEmpty =>
    +            val resolver = plan.conf.resolver
    +            val attrsWithSameName = o.missingInput.filter { missing =>
    +              o.inputSet.exists(input => resolver(missing.name, input.name))
    +            }
    +            val repeatedNameHint = if (attrsWithSameName.nonEmpty) {
    +              val commonNames = attrsWithSameName.map(_.name).mkString(",")
    +              s"""|Please check attribute(s) `$commonNames`, they seem to appear in two
    +                  |different input operators, with the same name.""".stripMargin
    --- End diff --
    
    I have changed it; it now looks better no matter how many attributes appear.


---


[GitHub] spark issue #17100: [SPARK-13947][SQL] The error message from using an inval...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    **[Test build #82970 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82970/testReport)** for PR 17100 at commit [`6e8ab42`](https://github.com/apache/spark/commit/6e8ab429a229df5cb075239c69fe90404734ef64).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark pull request #17100: [SPARK-13947][SQL] PySpark DataFrames: The error ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17100#discussion_r146120800
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -125,7 +128,7 @@ trait CheckAnalysis extends PredicateHelper {
                 }
     
               case s: SubqueryExpression =>
    -            checkSubqueryExpression(operator, s)
    +            checkAnalysisWithConf(s.plan, conf)
    --- End diff --
    
    This is wrong. Could you revert it?


---


[GitHub] spark pull request #17100: [SPARK-13947][SQL] PySpark DataFrames: The error ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17100#discussion_r146121917
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -270,12 +273,25 @@ trait CheckAnalysis extends PredicateHelper {
     
             operator match {
               case o if o.children.nonEmpty && o.missingInput.nonEmpty =>
    +            val resolver = conf.resolver
    +            val commonAttrs = o.missingInput.filter(x =>
    +              o.inputSet.exists(y => resolver(x.name, y.name)))
                 val missingAttributes = o.missingInput.mkString(",")
    -            val input = o.inputSet.mkString(",")
    +            val availableAttributes = o.inputSet.mkString(",")
    +            val repeatedNameHint = if (commonAttrs.size > 0) {
    +              val commonNames = commonAttrs.map(_.name).mkString(",")
    +              s"""\n|Observe that attribute(s) $commonNames appear in two
    --- End diff --
    
    Is the new line `\n` intentional?


---


[GitHub] spark issue #17100: [SPARK-13947][SQL] PySpark DataFrames: The error message...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    **[Test build #76434 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76434/testReport)** for PR 17100 at commit [`c2dfe11`](https://github.com/apache/spark/commit/c2dfe11c4c5da3ec8fb0983377400aa999213c35).


---


[GitHub] spark pull request #17100: [SPARK-13947][SQL] PySpark DataFrames: The error ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17100#discussion_r146121907
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -270,12 +273,25 @@ trait CheckAnalysis extends PredicateHelper {
     
             operator match {
               case o if o.children.nonEmpty && o.missingInput.nonEmpty =>
    +            val resolver = conf.resolver
    +            val commonAttrs = o.missingInput.filter(x =>
    +              o.inputSet.exists(y => resolver(x.name, y.name)))
                 val missingAttributes = o.missingInput.mkString(",")
    -            val input = o.inputSet.mkString(",")
    +            val availableAttributes = o.inputSet.mkString(",")
    +            val repeatedNameHint = if (commonAttrs.size > 0) {
    +              val commonNames = commonAttrs.map(_.name).mkString(",")
    +              s"""\n|Observe that attribute(s) $commonNames appear in two
    +                  |different datasets, with the same name"""
    --- End diff --
    
    Missing a `.` at the end of the sentence.


---


[GitHub] spark issue #17100: [SPARK-13947][SQL] The error message from using an inval...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    **[Test build #82980 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82980/testReport)** for PR 17100 at commit [`33a8a71`](https://github.com/apache/spark/commit/33a8a71092b699f6e402c67684fe4bfab6d2ccf8).


---


[GitHub] spark pull request #17100: [SPARK-13947][SQL] The error message from using a...

Posted by rberenguel <gi...@git.apache.org>.
Github user rberenguel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17100#discussion_r146211291
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -270,12 +271,25 @@ trait CheckAnalysis extends PredicateHelper {
     
             operator match {
               case o if o.children.nonEmpty && o.missingInput.nonEmpty =>
    +            val resolver = plan.conf.resolver
    +            val attrsWithSameName = o.missingInput.filter(x =>
    +              o.inputSet.exists(y => resolver(x.name, y.name)))
                 val missingAttributes = o.missingInput.mkString(",")
    -            val input = o.inputSet.mkString(",")
    +            val availableAttributes = o.inputSet.mkString(",")
    +            val repeatedNameHint = if (attrsWithSameName.size > 0) {
    +              val commonNames = attrsWithSameName.map(_.name).mkString(",")
    +              s"""\n|Attribute(s) `$commonNames` seem to appear in two
    +                  |different datasets, with the same name."""
    +            } else {
    +              ""
    +            }
     
                 failAnalysis(
    -              s"resolved attribute(s) $missingAttributes missing from $input " +
    -                s"in operator ${operator.simpleString}")
    +              s"""Some resolved attribute(s) are not present among the available attributes
    +                |for a query.
    +                |$missingAttributes is not in $availableAttributes.$repeatedNameHint
    --- End diff --
    
    Dang, I knew there was a reason for the weird new line back when I wrote this. I'm putting the `\n` back; do you have an idea that doesn't look as weird as this does, though?


---


[GitHub] spark pull request #17100: [SPARK-13947][SQL] The error message from using a...

Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17100#discussion_r146229748
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -270,12 +270,27 @@ trait CheckAnalysis extends PredicateHelper {
     
             operator match {
               case o if o.children.nonEmpty && o.missingInput.nonEmpty =>
    +            val resolver = plan.conf.resolver
    +            val attrsWithSameName = o.missingInput.filter(missing =>
    +              o.inputSet.exists(input => resolver(missing.name, input.name)))
    +            val repeatedNameHint = if (attrsWithSameName.nonEmpty) {
    +              val commonNames = attrsWithSameName.map(_.name).mkString(",")
    +              s"""|Please check attribute(s) `$commonNames`, they seem to appear in two
    +                  |different input operators, with the same name.""".stripMargin
    +            } else {
    +              ""
    +            }
    +
                 val missingAttributes = o.missingInput.mkString(",")
    -            val input = o.inputSet.mkString(",")
    +            val availableAttributes = o.inputSet.mkString(",")
     
                 failAnalysis(
    -              s"resolved attribute(s) $missingAttributes missing from $input " +
    -                s"in operator ${operator.simpleString}")
    +              s"""Some resolved attribute(s) are not present among the available attributes
    +                |for a query.
    +                |$missingAttributes is not in $availableAttributes.
    +                |$repeatedNameHint
    +                |The failed query was for operator
    +                |${operator.simpleString}""".stripMargin)
    --- End diff --
    
    the final message could be:
    ```
    if (repeatedNameHint.nonEmpty) msg + " " + repeatedNameHint else msg
    ```
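    
    A tiny runnable sketch (with made-up message strings, not the real code) of the blank line that shows up when an empty hint sits on its own margin line, which is what the conditional above avoids:
    
    ```scala
    val repeatedNameHint = ""
    val msg = s"""a#1L is not in a#2L.
                 |$repeatedNameHint
                 |The failed query was for operator X""".stripMargin
    // With an empty hint, the middle line becomes a blank line in the message.
    println(msg)
    ```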


---


[GitHub] spark pull request #17100: [SPARK-13947][SQL] The error message from using a...

Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17100#discussion_r146208370
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -270,12 +271,25 @@ trait CheckAnalysis extends PredicateHelper {
     
             operator match {
               case o if o.children.nonEmpty && o.missingInput.nonEmpty =>
    +            val resolver = plan.conf.resolver
    +            val attrsWithSameName = o.missingInput.filter(x =>
    +              o.inputSet.exists(y => resolver(x.name, y.name)))
                 val missingAttributes = o.missingInput.mkString(",")
    -            val input = o.inputSet.mkString(",")
    +            val availableAttributes = o.inputSet.mkString(",")
    +            val repeatedNameHint = if (attrsWithSameName.size > 0) {
    +              val commonNames = attrsWithSameName.map(_.name).mkString(",")
    +              s"""\n|Attribute(s) `$commonNames` seem to appear in two
    +                  |different datasets, with the same name."""
    +            } else {
    +              ""
    +            }
     
                 failAnalysis(
    -              s"resolved attribute(s) $missingAttributes missing from $input " +
    -                s"in operator ${operator.simpleString}")
    +              s"""Some resolved attribute(s) are not present among the available attributes
    +                |for a query.
    +                |$missingAttributes is not in $availableAttributes.$repeatedNameHint
    --- End diff --
    
    This may add an empty line if `repeatedNameHint` is an empty string?


---


[GitHub] spark pull request #17100: [SPARK-13947][SQL] PySpark DataFrames: The error ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17100#discussion_r146121901
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -270,12 +273,25 @@ trait CheckAnalysis extends PredicateHelper {
     
             operator match {
               case o if o.children.nonEmpty && o.missingInput.nonEmpty =>
    +            val resolver = conf.resolver
    +            val commonAttrs = o.missingInput.filter(x =>
    +              o.inputSet.exists(y => resolver(x.name, y.name)))
                 val missingAttributes = o.missingInput.mkString(",")
    -            val input = o.inputSet.mkString(",")
    +            val availableAttributes = o.inputSet.mkString(",")
    +            val repeatedNameHint = if (commonAttrs.size > 0) {
    +              val commonNames = commonAttrs.map(_.name).mkString(",")
    +              s"""\n|Observe that attribute(s) $commonNames appear in two
    --- End diff --
    
    I'm not sure this is always the case; I mean, other failures in the query plan analysis can also cause missing attributes.


---


[GitHub] spark issue #17100: [SPARK-13947][SQL] The error message from using an inval...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    **[Test build #82960 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82960/testReport)** for PR 17100 at commit [`75be390`](https://github.com/apache/spark/commit/75be3900f65e64bd083a91e5c63978489a61ccfe).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark pull request #17100: [SPARK-13947][SQL] The error message from using a...

Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17100#discussion_r146153590
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -270,12 +271,25 @@ trait CheckAnalysis extends PredicateHelper {
     
             operator match {
               case o if o.children.nonEmpty && o.missingInput.nonEmpty =>
    +            val resolver = plan.conf.resolver
    +            val attrsWithSameName = o.missingInput.filter(x =>
    +              o.inputSet.exists(y => resolver(x.name, y.name)))
                 val missingAttributes = o.missingInput.mkString(",")
    -            val input = o.inputSet.mkString(",")
    +            val availableAttributes = o.inputSet.mkString(",")
    +            val repeatedNameHint = if (attrsWithSameName.size > 0) {
    --- End diff --
    
    `attrsWithSameName.size > 0` -> `attrsWithSameName.nonEmpty`


---


[GitHub] spark pull request #17100: [SPARK-13947][SQL] The error message from using a...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17100#discussion_r146152895
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala ---
    @@ -408,16 +408,28 @@ class AnalysisErrorSuite extends AnalysisTest {
         // CheckAnalysis should throw AnalysisException when Aggregate contains missing attribute(s)
         // Since we manually construct the logical plan at here and Sum only accept
         // LongType, DoubleType, and DecimalType. We use LongType as the type of a.
    -    val plan =
    -      Aggregate(
    +    val attrA = AttributeReference("a", LongType)(exprId = ExprId(1))
    +    val aId = Array[String](attrA.name, attrA.exprId.id.toString)
    +    val otherA = AttributeReference("a", LongType)(exprId = ExprId(2))
    +    val otherAId = Array[String](otherA.name, otherA.exprId.id.toString)
    +    val bAlias = Alias(sum(attrA), "b")() :: Nil
    +    val plan = Aggregate(
             Nil,
    -        Alias(sum(AttributeReference("a", LongType)(exprId = ExprId(1))), "b")() :: Nil,
    -        LocalRelation(
    -          AttributeReference("a", LongType)(exprId = ExprId(2))))
    +        bAlias,
    +        LocalRelation(otherA))
     
         assert(plan.resolved)
     
    -    assertAnalysisError(plan, "resolved attribute(s) a#1L missing from a#2L" :: Nil)
    +    val errorMsg = s"""Some resolved attribute(s) are not present among the available attributes
    +                     |for a query.
    +                     |${aId.mkString("#")}L is not in ${otherAId.mkString("#")}L.
    --- End diff --
    
    Seems this is still unchanged?


---


[GitHub] spark pull request #17100: [SPARK-13947][SQL] PySpark DataFrames: The error ...

Posted by rberenguel <gi...@git.apache.org>.
Github user rberenguel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17100#discussion_r146128090
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -125,7 +128,7 @@ trait CheckAnalysis extends PredicateHelper {
                 }
     
               case s: SubqueryExpression =>
    -            checkSubqueryExpression(operator, s)
    +            checkAnalysisWithConf(s.plan, conf)
    --- End diff --
    
    Yup, sorry. I was too eager to merge master back in.


---


[GitHub] spark pull request #17100: [SPARK-13947][SQL] The error message from using a...

Posted by rberenguel <gi...@git.apache.org>.
Github user rberenguel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17100#discussion_r146176760
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -270,12 +271,25 @@ trait CheckAnalysis extends PredicateHelper {
     
             operator match {
               case o if o.children.nonEmpty && o.missingInput.nonEmpty =>
    +            val resolver = plan.conf.resolver
    +            val attrsWithSameName = o.missingInput.filter(x =>
    +              o.inputSet.exists(y => resolver(x.name, y.name)))
    --- End diff --
    
    Thanks


---


[GitHub] spark issue #17100: [SPARK-13947][SQL] PySpark DataFrames: The error message...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    **[Test build #76401 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76401/testReport)** for PR 17100 at commit [`4ac8143`](https://github.com/apache/spark/commit/4ac8143cf63f5b4777a66f236824671b0bb05933).


---


[GitHub] spark issue #17100: [SPARK-13947][SQL] PySpark DataFrames: The error message...

Posted by rberenguel <gi...@git.apache.org>.
Github user rberenguel commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    @viirya Hi, I based the title of the PR on the original ticket (https://issues.apache.org/jira/browse/SPARK-13947), but it indeed has nothing to do with Python. I see the ticket has now been "moved" to SQL, so I'll rewrite the title when I send the fixes (it might take me a while to re-run the test suite).


---


[GitHub] spark issue #17100: [SPARK-13947][SQL] PySpark DataFrames: The error message...

Posted by rberenguel <gi...@git.apache.org>.
Github user rberenguel commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    Can you take a look at the changes, @gatorsmile, when you have a few spare moments? Also, I'm not sure how the conflict appeared: it was merging cleanly when the CI test suite ran (I will obviously fix it, but first I want to confirm this is the right way of doing it now).


---


[GitHub] spark issue #17100: [SPARK-13947][SQL] The error message from using an inval...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    **[Test build #82979 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82979/testReport)** for PR 17100 at commit [`72b72cf`](https://github.com/apache/spark/commit/72b72cfb0fecc169e0bc289f7f319bdb2b947de5).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #17100: [SPARK-13947][SQL] The error message from using an inval...

Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    @rberenguel Could you fix [this comment](https://github.com/apache/spark/pull/17100#pullrequestreview-71023434)?


---



[GitHub] spark pull request #17100: [SPARK-13947][SQL] The error message from using a...

Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17100#discussion_r146153288
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -270,12 +271,25 @@ trait CheckAnalysis extends PredicateHelper {
     
             operator match {
               case o if o.children.nonEmpty && o.missingInput.nonEmpty =>
    +            val resolver = plan.conf.resolver
    +            val attrsWithSameName = o.missingInput.filter(x =>
    +              o.inputSet.exists(y => resolver(x.name, y.name)))
                 val missingAttributes = o.missingInput.mkString(",")
    -            val input = o.inputSet.mkString(",")
    +            val availableAttributes = o.inputSet.mkString(",")
    +            val repeatedNameHint = if (attrsWithSameName.size > 0) {
    --- End diff --
    
    `repeatedNameHint` doesn't need `missingAttributes` or `availableAttributes`; could you move its definition up, right after `attrsWithSameName`?
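    
    For illustration, a self-contained sketch of the ordering I mean (simplified stand-ins, not the real Catalyst classes; `Attr`, the resolver and the sample attributes are made up for this example only):
    
    ```
    object ReorderSketch {
      // Simplified stand-in for Catalyst's AttributeReference.
      final case class Attr(name: String, id: Int) {
        override def toString: String = s"$name#$id"
      }
    
      def main(args: Array[String]): Unit = {
        // Assumed: a case-sensitive resolver and two attributes sharing a name.
        val resolver: (String, String) => Boolean = _ == _
        val missingInput = Seq(Attr("a", 1))
        val inputSet = Seq(Attr("a", 2))
    
        // 1. Detect the name collisions first.
        val attrsWithSameName = missingInput.filter { missing =>
          inputSet.exists(input => resolver(missing.name, input.name))
        }
    
        // 2. The hint only needs attrsWithSameName, so it can be built right here.
        val repeatedNameHint = if (attrsWithSameName.nonEmpty) {
          val sameNames = attrsWithSameName.map(_.name).mkString(",")
          s"Please check attribute(s) `$sameNames`, they seem to appear in two " +
            "different input operators, with the same name."
        } else {
          ""
        }
    
        // 3. The mkString summaries are only needed for the final message.
        val missingAttributes = missingInput.mkString(",")
        val input = inputSet.mkString(",")
        println(s"Resolved attribute(s) $missingAttributes missing from $input. $repeatedNameHint")
      }
    }
    ```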


---



[GitHub] spark pull request #17100: [SPARK-13947][SQL] The error message from using a...

Posted by rberenguel <gi...@git.apache.org>.
Github user rberenguel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17100#discussion_r146177196
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -270,12 +271,25 @@ trait CheckAnalysis extends PredicateHelper {
     
             operator match {
               case o if o.children.nonEmpty && o.missingInput.nonEmpty =>
    +            val resolver = plan.conf.resolver
    +            val attrsWithSameName = o.missingInput.filter(x =>
    +              o.inputSet.exists(y => resolver(x.name, y.name)))
                 val missingAttributes = o.missingInput.mkString(",")
    -            val input = o.inputSet.mkString(",")
    +            val availableAttributes = o.inputSet.mkString(",")
    +            val repeatedNameHint = if (attrsWithSameName.size > 0) {
    --- End diff --
    
    I don't understand where I should move `missingAttributes` or `availableAttributes`; aren't they already right after `attrsWithSameName`? In any case, I have moved them after `repeatedNameHint`, since they are needed only for the `failAnalysis` argument.


---



[GitHub] spark issue #17100: [SPARK-13947][SQL] The error message from using an inval...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82981/
    Test PASSed.


---



[GitHub] spark issue #17100: [SPARK-13947][SQL] PySpark DataFrames: The error message...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    **[Test build #76401 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76401/testReport)** for PR 17100 at commit [`4ac8143`](https://github.com/apache/spark/commit/4ac8143cf63f5b4777a66f236824671b0bb05933).
     * This patch passes all tests.
     * This patch **does not merge cleanly**.
     * This patch adds no public classes.


---


[GitHub] spark issue #17100: [SPARK-13947][SQL] PySpark DataFrames: The error message...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    re-ping @gatorsmile?


---



[GitHub] spark issue #17100: [SPARK-13947][PYTHON][SQL] PySpark DataFrames: The error...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75013/
    Test FAILed.


---


[GitHub] spark issue #17100: [SPARK-13947][SQL] PySpark DataFrames: The error message...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76401/
    Test PASSed.


---


[GitHub] spark issue #17100: [SPARK-13947][SQL] PySpark DataFrames: The error message...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    Build finished. Test PASSed.


---


[GitHub] spark issue #17100: [SPARK-13947][SQL] PySpark DataFrames: The error message...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    Merged build finished. Test PASSed.


---


[GitHub] spark issue #17100: [SPARK-13947][PYTHON][SQL] PySpark DataFrames: The error...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73700/
    Test FAILed.


---


[GitHub] spark issue #17100: [SPARK-13947][PYTHON] PySpark DataFrames: The error mess...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    Jenkins, ok to test.


---


[GitHub] spark issue #17100: [SPARK-13947][SQL] PySpark DataFrames: The error message...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76402/
    Test FAILed.


---


[GitHub] spark pull request #17100: [SPARK-13947][SQL] The error message from using a...

Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17100#discussion_r146230295
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -270,12 +270,27 @@ trait CheckAnalysis extends PredicateHelper {
     
             operator match {
               case o if o.children.nonEmpty && o.missingInput.nonEmpty =>
    +            val resolver = plan.conf.resolver
    +            val attrsWithSameName = o.missingInput.filter(missing =>
    +              o.inputSet.exists(input => resolver(missing.name, input.name)))
    +            val repeatedNameHint = if (attrsWithSameName.nonEmpty) {
    +              val commonNames = attrsWithSameName.map(_.name).mkString(",")
    +              s"""|Please check attribute(s) `$commonNames`, they seem to appear in two
    +                  |different input operators, with the same name.""".stripMargin
    +            } else {
    +              ""
    +            }
    +
                 val missingAttributes = o.missingInput.mkString(",")
    -            val input = o.inputSet.mkString(",")
    +            val availableAttributes = o.inputSet.mkString(",")
    --- End diff --
    
    Could you revert this change? The original `input` is OK for me.


---



[GitHub] spark pull request #17100: [SPARK-13947][SQL] PySpark DataFrames: The error ...

Posted by rberenguel <gi...@git.apache.org>.
Github user rberenguel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17100#discussion_r146128109
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -270,12 +273,25 @@ trait CheckAnalysis extends PredicateHelper {
     
             operator match {
               case o if o.children.nonEmpty && o.missingInput.nonEmpty =>
    +            val resolver = conf.resolver
    +            val commonAttrs = o.missingInput.filter(x =>
    +              o.inputSet.exists(y => resolver(x.name, y.name)))
                 val missingAttributes = o.missingInput.mkString(",")
    -            val input = o.inputSet.mkString(",")
    +            val availableAttributes = o.inputSet.mkString(",")
    +            val repeatedNameHint = if (commonAttrs.size > 0) {
    +              val commonNames = commonAttrs.map(_.name).mkString(",")
    +              s"""\n|Observe that attribute(s) $commonNames appear in two
    --- End diff --
    
    I'll rewrite the wording a bit so it hints at this as the cause (I suspect it's the most common one). The newline is there for readability of the error message; I could remove it.


---



[GitHub] spark pull request #17100: [SPARK-13947][SQL] The error message from using a...

Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17100#discussion_r146211140
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -270,12 +270,27 @@ trait CheckAnalysis extends PredicateHelper {
     
             operator match {
               case o if o.children.nonEmpty && o.missingInput.nonEmpty =>
    +            val resolver = plan.conf.resolver
    +            val attrsWithSameName = o.missingInput.filter(missing =>
    +              o.inputSet.exists(input => resolver(missing.name, input.name)))
    +            val repeatedNameHint = if (attrsWithSameName.nonEmpty) {
    +              val commonNames = attrsWithSameName.map(_.name).mkString(",")
    +              s"""|Please check attribute(s) `$commonNames`, they seem to appear in two
    +                  |different input operators, with the same name.""".stripMargin
    +            } else {
    +              ""
    +            }
    +
                 val missingAttributes = o.missingInput.mkString(",")
    -            val input = o.inputSet.mkString(",")
    +            val availableAttributes = o.inputSet.mkString(",")
     
                 failAnalysis(
    -              s"resolved attribute(s) $missingAttributes missing from $input " +
    -                s"in operator ${operator.simpleString}")
    +              s"""Some resolved attribute(s) are not present among the available attributes
    +                |for a query.
    +                |$missingAttributes is not in $availableAttributes.
    +                |$repeatedNameHint
    +                |The failed query was for operator
    +                |${operator.simpleString}""".stripMargin)
    --- End diff --
    
    Actually I agree with @viirya, the original error message is OK for me. If you really want to improve it, I think just adding `repeatedNameHint` at the end would be good enough:
    ```
    s"Resolved attribute(s) $missingAttributes missing from $input in operator ${operator.simpleString}. $repeatedNameHint"
    ```
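    
    For example, for the duplicated-name case in AnalysisErrorSuite (a#1L missing while a#2L is available), the combined one-liner could render roughly as below. This is just a sketch with hand-written sample values; the real ones come from `o.missingInput`, `o.inputSet` and `operator.simpleString`:
    
    ```
    object MessageSketch {
      def main(args: Array[String]): Unit = {
        // Hand-written placeholders, not actual Spark output.
        val missingAttributes = "a#1L"
        val input = "a#2L"
        val operatorString = "!Aggregate [sum(a#1L) AS b#0L]"
        val repeatedNameHint =
          "Please check attribute(s) `a`, they seem to appear in two " +
            "different input operators, with the same name."
        println(
          s"Resolved attribute(s) $missingAttributes missing from $input " +
            s"in operator $operatorString. $repeatedNameHint")
      }
    }
    ```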
    What do you think?


---



[GitHub] spark pull request #17100: [SPARK-13947][SQL] The error message from using a...

Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17100#discussion_r146256023
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -270,12 +270,25 @@ trait CheckAnalysis extends PredicateHelper {
     
             operator match {
               case o if o.children.nonEmpty && o.missingInput.nonEmpty =>
    +            val resolver = plan.conf.resolver
    +            val attrsWithSameName = o.missingInput.filter { missing =>
    +              o.inputSet.exists(input => resolver(missing.name, input.name))
    +            }
    +            val repeatedNameHint = if (attrsWithSameName.nonEmpty) {
    +              val commonNames = attrsWithSameName.map(_.name).mkString(",")
    --- End diff --
    
    `commonNames` -> `sameNames`?


---



[GitHub] spark issue #17100: [SPARK-13947][SQL] The error message from using an inval...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    **[Test build #83020 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83020/testReport)** for PR 17100 at commit [`b8fbdea`](https://github.com/apache/spark/commit/b8fbdea3b35ea69e6dc74ea9c11e257ac88dc765).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #17100: [SPARK-13947][SQL] The error message from using a...

Posted by rberenguel <gi...@git.apache.org>.
Github user rberenguel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17100#discussion_r146233808
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -270,12 +270,27 @@ trait CheckAnalysis extends PredicateHelper {
     
             operator match {
               case o if o.children.nonEmpty && o.missingInput.nonEmpty =>
    +            val resolver = plan.conf.resolver
    +            val attrsWithSameName = o.missingInput.filter(missing =>
    +              o.inputSet.exists(input => resolver(missing.name, input.name)))
    --- End diff --
    
    Done. Do you know if there is any style rule for when to use parentheses vs. braces in this kind of lambda function?


---



[GitHub] spark issue #17100: [SPARK-13947][PYTHON] PySpark DataFrames: The error mess...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    @rberenguel: how about adding the "[SQL]" tag to this? While the feature request comes out of PySpark, it's changing the SQL code.


---


[GitHub] spark issue #17100: [SPARK-13947][SQL] The error message from using an inval...

Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    LGTM except one minor comment. ping @gatorsmile 


---



[GitHub] spark issue #17100: [SPARK-13947][SQL] PySpark DataFrames: The error message...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    In the title, why specify "PySpark" DataFrames? I don't see any Python code change.


---



[GitHub] spark pull request #17100: [SPARK-13947][SQL] The error message from using a...

Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17100#discussion_r146230050
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -270,12 +270,27 @@ trait CheckAnalysis extends PredicateHelper {
     
             operator match {
               case o if o.children.nonEmpty && o.missingInput.nonEmpty =>
    +            val resolver = plan.conf.resolver
    +            val attrsWithSameName = o.missingInput.filter(missing =>
    +              o.inputSet.exists(input => resolver(missing.name, input.name)))
    --- End diff --
    
    could you change the format?
    ```
    o.missingInput.filter { missing =>
      o.inputSet.exists(input => resolver(missing.name, input.name))
    }
    ```


---



[GitHub] spark issue #17100: [SPARK-13947][SQL] The error message from using an inval...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #17100: [SPARK-13947][SQL] The error message from using a...

Posted by rberenguel <gi...@git.apache.org>.
Github user rberenguel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17100#discussion_r146177263
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -270,12 +271,25 @@ trait CheckAnalysis extends PredicateHelper {
     
             operator match {
               case o if o.children.nonEmpty && o.missingInput.nonEmpty =>
    +            val resolver = plan.conf.resolver
    +            val attrsWithSameName = o.missingInput.filter(x =>
    +              o.inputSet.exists(y => resolver(x.name, y.name)))
                 val missingAttributes = o.missingInput.mkString(",")
    -            val input = o.inputSet.mkString(",")
    +            val availableAttributes = o.inputSet.mkString(",")
    +            val repeatedNameHint = if (attrsWithSameName.size > 0) {
    +              val commonNames = attrsWithSameName.map(_.name).mkString(",")
    +              s"""\n|Attribute(s) `$commonNames` seem to appear in two
    +                  |different datasets, with the same name."""
    --- End diff --
    
    I have changed it. I agree "datasets" may not be the best word for it.


---



[GitHub] spark issue #17100: [SPARK-13947][PYTHON][SQL] PySpark DataFrames: The error...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    **[Test build #73714 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73714/testReport)** for PR 17100 at commit [`65b9596`](https://github.com/apache/spark/commit/65b9596c229ac2b62ecdfeb98e541d2ea92e078d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark pull request #17100: [SPARK-13947][SQL] The error message from using a...

Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17100#discussion_r146251261
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -270,12 +270,27 @@ trait CheckAnalysis extends PredicateHelper {
     
             operator match {
               case o if o.children.nonEmpty && o.missingInput.nonEmpty =>
    +            val resolver = plan.conf.resolver
    +            val attrsWithSameName = o.missingInput.filter(missing =>
    +              o.inputSet.exists(input => resolver(missing.name, input.name)))
    --- End diff --
    
    Usually parentheses are used for one-line lambdas, while curly braces are used for multi-line ones. For the complete Scala style guide, please check: https://github.com/databricks/scala-style-guide
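    
    A small self-contained illustration of that convention (the values and names here are made up, just to show the two shapes):
    
    ```
    object LambdaStyleSketch {
      def main(args: Array[String]): Unit = {
        val words = Seq("spark", "sql", "catalyst")
    
        // One-line lambda: parentheses read fine.
        val nonEmpty = words.filter(w => w.nonEmpty)
    
        // Multi-line body: curly braces, with the parameter on the first line.
        val longWords = words.filter { w =>
          val trimmed = w.trim
          trimmed.nonEmpty && trimmed.length > 3
        }
    
        println(nonEmpty)  // List(spark, sql, catalyst)
        println(longWords) // List(spark, catalyst)
      }
    }
    ```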


---



[GitHub] spark pull request #17100: [SPARK-13947][SQL] The error message from using a...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17100#discussion_r146627509
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -270,12 +270,25 @@ trait CheckAnalysis extends PredicateHelper {
     
             operator match {
               case o if o.children.nonEmpty && o.missingInput.nonEmpty =>
    +            val resolver = plan.conf.resolver
    +            val attrsWithSameName = o.missingInput.filter { missing =>
    +              o.inputSet.exists(input => resolver(missing.name, input.name))
    +            }
    +            val repeatedNameHint = if (attrsWithSameName.nonEmpty) {
    +              val sameNames = attrsWithSameName.map(_.name).mkString(",")
    +              s"""Attribute(s) with the same name appear in the operation: `$sameNames`.
    --- End diff --
    
    Let us follow what we did for `missingAttributes`. No need to do it. 


---



[GitHub] spark issue #17100: [SPARK-13947][SQL] The error message from using an inval...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    Congratulations on your change @rberenguel :D :)


---



[GitHub] spark pull request #17100: [SPARK-13947][SQL] The error message from using a...

Posted by rberenguel <gi...@git.apache.org>.
Github user rberenguel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17100#discussion_r146234085
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -270,12 +270,27 @@ trait CheckAnalysis extends PredicateHelper {
     
             operator match {
               case o if o.children.nonEmpty && o.missingInput.nonEmpty =>
    +            val resolver = plan.conf.resolver
    +            val attrsWithSameName = o.missingInput.filter(missing =>
    +              o.inputSet.exists(input => resolver(missing.name, input.name)))
    +            val repeatedNameHint = if (attrsWithSameName.nonEmpty) {
    +              val commonNames = attrsWithSameName.map(_.name).mkString(",")
    +              s"""|Please check attribute(s) `$commonNames`, they seem to appear in two
    +                  |different input operators, with the same name.""".stripMargin
    +            } else {
    +              ""
    +            }
    +
                 val missingAttributes = o.missingInput.mkString(",")
    -            val input = o.inputSet.mkString(",")
    +            val availableAttributes = o.inputSet.mkString(",")
    --- End diff --
    
    Okies


---



[GitHub] spark pull request #17100: [SPARK-13947][SQL] PySpark DataFrames: The error ...

Posted by rberenguel <gi...@git.apache.org>.
Github user rberenguel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17100#discussion_r146128114
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -270,12 +273,25 @@ trait CheckAnalysis extends PredicateHelper {
     
             operator match {
               case o if o.children.nonEmpty && o.missingInput.nonEmpty =>
    +            val resolver = conf.resolver
    +            val commonAttrs = o.missingInput.filter(x =>
    --- End diff --
    
    Indeed!


---



[GitHub] spark issue #17100: [SPARK-13947][SQL] The error message from using an inval...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    **[Test build #82970 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82970/testReport)** for PR 17100 at commit [`6e8ab42`](https://github.com/apache/spark/commit/6e8ab429a229df5cb075239c69fe90404734ef64).


---



[GitHub] spark issue #17100: [SPARK-13947][SQL] The error message from using an inval...

Posted by rberenguel <gi...@git.apache.org>.
Github user rberenguel commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    Thanks @gatorsmile @holdenk @viirya @wzhfy for all the help


---



[GitHub] spark issue #17100: [SPARK-13947][SQL] The error message from using an inval...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #17100: [SPARK-13947][SQL] The error message from using an inval...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    Thanks! Merged to master.


---



[GitHub] spark issue #17100: [SPARK-13947][SQL] PySpark DataFrames: The error message...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82950/
    Test FAILed.


---



[GitHub] spark pull request #17100: [SPARK-13947][SQL] The error message from using a...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17100#discussion_r146153805
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -270,12 +271,25 @@ trait CheckAnalysis extends PredicateHelper {
     
             operator match {
               case o if o.children.nonEmpty && o.missingInput.nonEmpty =>
    +            val resolver = plan.conf.resolver
    +            val attrsWithSameName = o.missingInput.filter(x =>
    +              o.inputSet.exists(y => resolver(x.name, y.name)))
                 val missingAttributes = o.missingInput.mkString(",")
    -            val input = o.inputSet.mkString(",")
    +            val availableAttributes = o.inputSet.mkString(",")
    +            val repeatedNameHint = if (attrsWithSameName.size > 0) {
    +              val commonNames = attrsWithSameName.map(_.name).mkString(",")
    +              s"""\n|Attribute(s) `$commonNames` seem to appear in two
    +                  |different datasets, with the same name."""
    --- End diff --
    
    I'm +0 on this additional error message. I'm not sure it is actually clearer to end users.


---



[GitHub] spark issue #17100: [SPARK-13947][SQL] PySpark DataFrames: The error message...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    **[Test build #76434 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76434/testReport)** for PR 17100 at commit [`c2dfe11`](https://github.com/apache/spark/commit/c2dfe11c4c5da3ec8fb0983377400aa999213c35).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark pull request #17100: [SPARK-13947][SQL] PySpark DataFrames: The error ...

Posted by rberenguel <gi...@git.apache.org>.
Github user rberenguel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17100#discussion_r107354055
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -337,11 +337,29 @@ trait CheckAnalysis extends PredicateHelper {
             operator match {
               case o if o.children.nonEmpty && o.missingInput.nonEmpty =>
                 val missingAttributes = o.missingInput.mkString(",")
    -            val input = o.inputSet.mkString(",")
    +            val availableAttributes = o.inputSet.mkString(",")
    +            val missingAttributesHelper = o.missingInput.map(a => (a.name, a.exprId.id))
    +
    +            val availableAttributesHelper = o.inputSet.map(a => (a.name, a.exprId.id))
    +
    +            val common = (missingAttributesHelper ++ availableAttributesHelper).groupBy(_._1)
    +            val commonNames = common.map(x => (x._1, x._2.toSet))
    +              .filter(_._2.size > 1)
    +              .map(_._1)
    +
    +            val repeatedNameHint = if (commonNames.size > 0) {
    +              s"\n|Observe that attribute(s) ${commonNames.mkString(",")} appear in your " +
    +                "query with at least two different hashes, but same name."
    --- End diff --
    
    Good point!


---


[GitHub] spark issue #17100: [SPARK-13947][SQL] The error message from using an inval...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    **[Test build #82981 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82981/testReport)** for PR 17100 at commit [`e3cc455`](https://github.com/apache/spark/commit/e3cc4555b7a14299b93e8e4e0ae6d7f482582f9d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #17100: [SPARK-13947][PYTHON] PySpark DataFrames: The error mess...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73604/
    Test FAILed.


---


[GitHub] spark pull request #17100: [SPARK-13947][PYTHON][SQL] PySpark DataFrames: Th...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17100#discussion_r104455583
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala ---
    @@ -398,16 +398,22 @@ class AnalysisErrorSuite extends AnalysisTest {
         // CheckAnalysis should throw AnalysisException when Aggregate contains missing attribute(s)
         // Since we manually construct the logical plan at here and Sum only accept
         // LongType, DoubleType, and DecimalType. We use LongType as the type of a.
    -    val plan =
    -      Aggregate(
    +    val attrA = AttributeReference("a", LongType)(exprId = ExprId(1))
    +    val aId = Array[String](attrA.name, attrA.exprId.id.toString)
    +    val otherA = AttributeReference("a", LongType)(exprId = ExprId(2))
    +    val otherAId = Array[String](otherA.name, otherA.exprId.id.toString)
    +    val plan = Aggregate(
             Nil,
    -        Alias(sum(AttributeReference("a", LongType)(exprId = ExprId(1))), "b")() :: Nil,
    -        LocalRelation(
    -          AttributeReference("a", LongType)(exprId = ExprId(2))))
    +        Alias(sum(attrA), "b")() :: Nil,
    +        LocalRelation(otherA))
     
         assert(plan.resolved)
     
    -    assertAnalysisError(plan, "resolved attribute(s) a#1L missing from a#2L" :: Nil)
    +    assertAnalysisError(plan,
    +                        "Some resolved attribute(s) are not present among available " +
    --- End diff --
    
    This is a weird mixing of string formats.


---


[GitHub] spark issue #17100: [SPARK-13947][PYTHON][SQL] PySpark DataFrames: The error...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    Merged build finished. Test FAILed.


---


[GitHub] spark issue #17100: [SPARK-13947][SQL] PySpark DataFrames: The error message...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    **[Test build #76402 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76402/testReport)** for PR 17100 at commit [`766a033`](https://github.com/apache/spark/commit/766a033f19602dbd7da6eff96947236c8c0fd2a2).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark pull request #17100: [SPARK-13947][SQL] The error message from using a...

Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17100#discussion_r146152572
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -24,6 +24,7 @@ import org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression
     import org.apache.spark.sql.catalyst.optimizer.BooleanSimplification
     import org.apache.spark.sql.catalyst.plans._
     import org.apache.spark.sql.catalyst.plans.logical._
    +import org.apache.spark.sql.internal.SQLConf
    --- End diff --
    
    unused import


---



[GitHub] spark issue #17100: [SPARK-13947][SQL] The error message from using an inval...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    @rberenguel Thank you for your contribution!


---



[GitHub] spark pull request #17100: [SPARK-13947][SQL] The error message from using a...

Posted by rberenguel <gi...@git.apache.org>.
Github user rberenguel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17100#discussion_r146468977
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -270,12 +270,25 @@ trait CheckAnalysis extends PredicateHelper {
     
             operator match {
               case o if o.children.nonEmpty && o.missingInput.nonEmpty =>
    +            val resolver = plan.conf.resolver
    +            val attrsWithSameName = o.missingInput.filter { missing =>
    +              o.inputSet.exists(input => resolver(missing.name, input.name))
    +            }
    +            val repeatedNameHint = if (attrsWithSameName.nonEmpty) {
    +              val commonNames = attrsWithSameName.map(_.name).mkString(",")
    --- End diff --
    
    Done


---



[GitHub] spark issue #17100: [SPARK-13947][PYTHON][SQL] PySpark DataFrames: The error...

Posted by rberenguel <gi...@git.apache.org>.
Github user rberenguel commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    @holdenk added! I also suspect I have a local issue with coursier when running the tests... They supposedly work (as in, no errors or failures reported) about one time in three; the other two runs hit coursier cache misses. I think the "no errors" run is actually the suite giving up rather than passing. I'll look into this (since it's pretty bad that I can't rely on the local test suite!) and fix the failing test, which is pretty straightforward.


---


[GitHub] spark issue #17100: [SPARK-13947][SQL] The error message from using an inval...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #17100: [SPARK-13947][SQL] The error message from using an inval...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82980/
    Test PASSed.


---



[GitHub] spark issue #17100: [SPARK-13947][SQL] PySpark DataFrames: The error message...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    Will review it this weekend. Thanks!


---



[GitHub] spark pull request #17100: [SPARK-13947][SQL] The error message from using a...

Posted by rberenguel <gi...@git.apache.org>.
Github user rberenguel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17100#discussion_r146177319
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -270,12 +271,25 @@ trait CheckAnalysis extends PredicateHelper {
     
             operator match {
               case o if o.children.nonEmpty && o.missingInput.nonEmpty =>
    +            val resolver = plan.conf.resolver
    +            val attrsWithSameName = o.missingInput.filter(x =>
    +              o.inputSet.exists(y => resolver(x.name, y.name)))
                 val missingAttributes = o.missingInput.mkString(",")
    -            val input = o.inputSet.mkString(",")
    +            val availableAttributes = o.inputSet.mkString(",")
    +            val repeatedNameHint = if (attrsWithSameName.size > 0) {
    +              val commonNames = attrsWithSameName.map(_.name).mkString(",")
    +              s"""\n|Attribute(s) `$commonNames` seem to appear in two
    +                  |different datasets, with the same name."""
    +            } else {
    +              ""
    +            }
     
                 failAnalysis(
    -              s"resolved attribute(s) $missingAttributes missing from $input " +
    -                s"in operator ${operator.simpleString}")
    +              s"""Some resolved attribute(s) are not present among the available attributes
    +                |for a query.
    +                |$missingAttributes is not in $availableAttributes.$repeatedNameHint
    --- End diff --
    
    Oh, thanks. I didn't see it


---



[GitHub] spark pull request #17100: [SPARK-13947][SQL] The error message from using a...

Posted by rberenguel <gi...@git.apache.org>.
Github user rberenguel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17100#discussion_r146252936
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -270,12 +270,27 @@ trait CheckAnalysis extends PredicateHelper {
     
             operator match {
               case o if o.children.nonEmpty && o.missingInput.nonEmpty =>
    +            val resolver = plan.conf.resolver
    +            val attrsWithSameName = o.missingInput.filter(missing =>
    +              o.inputSet.exists(input => resolver(missing.name, input.name)))
    --- End diff --
    
    Thanks @wzhfy 


---



[GitHub] spark pull request #17100: [SPARK-13947][SQL] The error message from using a...

Posted by rberenguel <gi...@git.apache.org>.
Github user rberenguel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17100#discussion_r146469410
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala ---
    @@ -408,16 +408,23 @@ class AnalysisErrorSuite extends AnalysisTest {
         // CheckAnalysis should throw AnalysisException when Aggregate contains missing attribute(s)
         // Since we manually construct the logical plan at here and Sum only accept
         // LongType, DoubleType, and DecimalType. We use LongType as the type of a.
    -    val plan =
    -      Aggregate(
    +    val attrA = AttributeReference("a", LongType)(exprId = ExprId(1))
    +    val otherA = AttributeReference("a", LongType)(exprId = ExprId(2))
    +    val bAlias = Alias(sum(attrA), "b")() :: Nil
    +    val plan = Aggregate(
             Nil,
    -        Alias(sum(AttributeReference("a", LongType)(exprId = ExprId(1))), "b")() :: Nil,
    -        LocalRelation(
    -          AttributeReference("a", LongType)(exprId = ExprId(2))))
    +        bAlias,
    +        LocalRelation(otherA))
     
         assert(plan.resolved)
     
    -    assertAnalysisError(plan, "resolved attribute(s) a#1L missing from a#2L" :: Nil)
    +    val errorMsg = s"""Resolved attribute(s) ${attrA.toString} missing from ${otherA.toString}
    +                     |in operator !Aggregate [${bAlias.mkString("#")}].
    +                     |Please check attribute(s) `a`, they seem to appear in two
    +                     |different input operators, with the same name.""".stripMargin
    +
    +
    --- End diff --
    
    Fixed


---



[GitHub] spark pull request #17100: [SPARK-13947][SQL] The error message from using a...

Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17100#discussion_r146153739
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -270,12 +271,25 @@ trait CheckAnalysis extends PredicateHelper {
     
             operator match {
               case o if o.children.nonEmpty && o.missingInput.nonEmpty =>
    +            val resolver = plan.conf.resolver
    +            val attrsWithSameName = o.missingInput.filter(x =>
    +              o.inputSet.exists(y => resolver(x.name, y.name)))
                 val missingAttributes = o.missingInput.mkString(",")
    -            val input = o.inputSet.mkString(",")
    +            val availableAttributes = o.inputSet.mkString(",")
    +            val repeatedNameHint = if (attrsWithSameName.size > 0) {
    +              val commonNames = attrsWithSameName.map(_.name).mkString(",")
    +              s"""\n|Attribute(s) `$commonNames` seem to appear in two
    +                  |different datasets, with the same name."""
    --- End diff --
    
    The `datasets` concept is a little strange here; would `inputs` or `input operators` be better?


---



[GitHub] spark issue #17100: [SPARK-13947][SQL] The error message from using an inval...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82994/
    Test PASSed.


---



[GitHub] spark issue #17100: [SPARK-13947][PYTHON] PySpark DataFrames: The error mess...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    **[Test build #73604 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73604/testReport)** for PR 17100 at commit [`981224b`](https://github.com/apache/spark/commit/981224bd9631a0ac633b6967e639197794894fcf).


---


[GitHub] spark pull request #17100: [SPARK-13947][SQL] PySpark DataFrames: The error ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17100#discussion_r146120797
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -270,12 +273,25 @@ trait CheckAnalysis extends PredicateHelper {
     
             operator match {
               case o if o.children.nonEmpty && o.missingInput.nonEmpty =>
    +            val resolver = conf.resolver
    --- End diff --
    
    We do not need to pass conf now. You can directly call `plan.conf.resolver`


---



[GitHub] spark pull request #17100: [SPARK-13947][SQL] The error message from using a...

Posted by rberenguel <gi...@git.apache.org>.
Github user rberenguel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17100#discussion_r146469160
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala ---
    @@ -408,16 +408,23 @@ class AnalysisErrorSuite extends AnalysisTest {
         // CheckAnalysis should throw AnalysisException when Aggregate contains missing attribute(s)
         // Since we manually construct the logical plan at here and Sum only accept
         // LongType, DoubleType, and DecimalType. We use LongType as the type of a.
    -    val plan =
    -      Aggregate(
    +    val attrA = AttributeReference("a", LongType)(exprId = ExprId(1))
    +    val otherA = AttributeReference("a", LongType)(exprId = ExprId(2))
    +    val bAlias = Alias(sum(attrA), "b")() :: Nil
    +    val plan = Aggregate(
    --- End diff --
    
    Added another attribute


---



[GitHub] spark issue #17100: [SPARK-13947][SQL] PySpark DataFrames: The error message...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    Merged build finished. Test FAILed.


---



[GitHub] spark pull request #17100: [SPARK-13947][PYTHON][SQL] PySpark DataFrames: Th...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17100#discussion_r104455520
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -338,11 +338,29 @@ trait CheckAnalysis extends PredicateHelper {
             operator match {
               case o if o.children.nonEmpty && o.missingInput.nonEmpty =>
                 val missingAttributes = o.missingInput.mkString(",")
    -            val input = o.inputSet.mkString(",")
    +            val availableAttributes = o.inputSet.mkString(",")
    +            val missingAttributesHelper = o.missingInput.map(a => (a.name, a.exprId.id))
    +
    +            val availableAttributesHelper = o.inputSet.map(a => (a.name, a.exprId.id))
    +
    +            val common = (missingAttributesHelper ++ availableAttributesHelper).groupBy(_._1)
    +            val commonNames = common.map(x => (x._1, x._2.toSet))
    +              .filter(_._2.size > 1)
    +              .map(_._1)
    +
    +            val repeatedNameHint = if (commonNames.size > 0) {
    +              s"\n|Observe that attribute(s) ${commonNames.mkString(",")} appear in your " +
    +                "query with at least two different hashes, but same name."
    +            } else {
    +              ""
    +            }
     
                 failAnalysis(
    -              s"resolved attribute(s) $missingAttributes missing from $input " +
    -                s"in operator ${operator.simpleString}")
    +              s"|Some resolved attribute(s) are not present among available attributes " +
    --- End diff --
    
    This is really long; maybe `"""` + `stripMargin` would be easier to write/read?
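    As a rough, hypothetical sketch of that style (the attribute strings below are placeholders, not the real values):
    ```scala
    // A triple-quoted interpolator plus stripMargin keeps each message line
    // readable in the source; '|' marks where each output line starts.
    val missingAttributes = "a#1L"      // placeholder
    val availableAttributes = "a#2L"    // placeholder
    val message =
      s"""Some resolved attribute(s) are not present among the available attributes.
         |$missingAttributes is not in $availableAttributes.""".stripMargin
    println(message)
    ```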


---


[GitHub] spark issue #17100: [SPARK-13947][SQL] PySpark DataFrames: The error message...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77502/
    Test PASSed.


---


[GitHub] spark pull request #17100: [SPARK-13947][SQL] The error message from using a...

Posted by rberenguel <gi...@git.apache.org>.
Github user rberenguel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17100#discussion_r146177284
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -24,6 +24,7 @@ import org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression
     import org.apache.spark.sql.catalyst.optimizer.BooleanSimplification
     import org.apache.spark.sql.catalyst.plans._
     import org.apache.spark.sql.catalyst.plans.logical._
    +import org.apache.spark.sql.internal.SQLConf
    --- End diff --
    
    Thanks


---



[GitHub] spark pull request #17100: [SPARK-13947][SQL] PySpark DataFrames: The error ...

Posted by rberenguel <gi...@git.apache.org>.
Github user rberenguel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17100#discussion_r146128111
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -270,12 +273,25 @@ trait CheckAnalysis extends PredicateHelper {
     
             operator match {
               case o if o.children.nonEmpty && o.missingInput.nonEmpty =>
    +            val resolver = conf.resolver
    +            val commonAttrs = o.missingInput.filter(x =>
    +              o.inputSet.exists(y => resolver(x.name, y.name)))
                 val missingAttributes = o.missingInput.mkString(",")
    -            val input = o.inputSet.mkString(",")
    +            val availableAttributes = o.inputSet.mkString(",")
    +            val repeatedNameHint = if (commonAttrs.size > 0) {
    +              val commonNames = commonAttrs.map(_.name).mkString(",")
    +              s"""\n|Observe that attribute(s) $commonNames appear in two
    +                  |different datasets, with the same name"""
    --- End diff --
    
    Thanks!


---



[GitHub] spark issue #17100: [SPARK-13947][SQL] The error message from using an inval...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82970/
    Test FAILed.


---



[GitHub] spark issue #17100: [SPARK-13947][PYTHON] PySpark DataFrames: The error mess...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    **[Test build #73604 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73604/testReport)** for PR 17100 at commit [`981224b`](https://github.com/apache/spark/commit/981224bd9631a0ac633b6967e639197794894fcf).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark issue #17100: [SPARK-13947][SQL] PySpark DataFrames: The error message...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76434/
    Test PASSed.


---


[GitHub] spark issue #17100: [SPARK-13947][SQL] PySpark DataFrames: The error message...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    **[Test build #82960 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82960/testReport)** for PR 17100 at commit [`75be390`](https://github.com/apache/spark/commit/75be3900f65e64bd083a91e5c63978489a61ccfe).


---



[GitHub] spark issue #17100: [SPARK-13947][SQL] PySpark DataFrames: The error message...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    **[Test build #76402 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76402/testReport)** for PR 17100 at commit [`766a033`](https://github.com/apache/spark/commit/766a033f19602dbd7da6eff96947236c8c0fd2a2).


---


[GitHub] spark issue #17100: [SPARK-13947][SQL] PySpark DataFrames: The error message...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    Merged build finished. Test FAILed.


---


[GitHub] spark issue #17100: [SPARK-13947][PYTHON][SQL] PySpark DataFrames: The error...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    **[Test build #75013 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75013/testReport)** for PR 17100 at commit [`7bb6f35`](https://github.com/apache/spark/commit/7bb6f35d9153ea522b8ccb4bcd44bf1c70cc64f0).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark issue #17100: [SPARK-13947][PYTHON][SQL] PySpark DataFrames: The error...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    **[Test build #75013 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75013/testReport)** for PR 17100 at commit [`7bb6f35`](https://github.com/apache/spark/commit/7bb6f35d9153ea522b8ccb4bcd44bf1c70cc64f0).


---


[GitHub] spark pull request #17100: [SPARK-13947][SQL] The error message from using a...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17100#discussion_r146623577
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -270,12 +270,25 @@ trait CheckAnalysis extends PredicateHelper {
     
             operator match {
               case o if o.children.nonEmpty && o.missingInput.nonEmpty =>
    +            val resolver = plan.conf.resolver
    +            val attrsWithSameName = o.missingInput.filter { missing =>
    +              o.inputSet.exists(input => resolver(missing.name, input.name))
    +            }
    +            val repeatedNameHint = if (attrsWithSameName.nonEmpty) {
    +              val sameNames = attrsWithSameName.map(_.name).mkString(",")
    +              s"""Attribute(s) with the same name appear in the operation: `$sameNames`.
    --- End diff --
    
    Normally, we do not wrap multiple column names inside a single pair of quotes.
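    For illustration only (not necessarily the wording the patch settles on), quoting each name separately could look like:
    ```scala
    // Hypothetical: backtick-quote each attribute name on its own instead of
    // wrapping the whole comma-separated list in one pair of backticks.
    val names = Seq("a", "b")
    val sameNames = names.map(n => s"`$n`").mkString(",")   // "`a`,`b`"
    ```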


---



[GitHub] spark pull request #17100: [SPARK-13947][SQL] The error message from using a...

Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17100#discussion_r146153863
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -270,12 +271,25 @@ trait CheckAnalysis extends PredicateHelper {
     
             operator match {
               case o if o.children.nonEmpty && o.missingInput.nonEmpty =>
    +            val resolver = plan.conf.resolver
    +            val attrsWithSameName = o.missingInput.filter(x =>
    +              o.inputSet.exists(y => resolver(x.name, y.name)))
                 val missingAttributes = o.missingInput.mkString(",")
    -            val input = o.inputSet.mkString(",")
    +            val availableAttributes = o.inputSet.mkString(",")
    +            val repeatedNameHint = if (attrsWithSameName.size > 0) {
    +              val commonNames = attrsWithSameName.map(_.name).mkString(",")
    +              s"""\n|Attribute(s) `$commonNames` seem to appear in two
    +                  |different datasets, with the same name."""
    --- End diff --
    
    `\n|Please check attribute(s) `$commonNames`, they seem to appear in two...`


---



[GitHub] spark pull request #17100: [SPARK-13947][SQL] The error message from using a...

Posted by rberenguel <gi...@git.apache.org>.
Github user rberenguel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17100#discussion_r146544863
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -270,12 +270,25 @@ trait CheckAnalysis extends PredicateHelper {
     
             operator match {
               case o if o.children.nonEmpty && o.missingInput.nonEmpty =>
    +            val resolver = plan.conf.resolver
    +            val attrsWithSameName = o.missingInput.filter { missing =>
    +              o.inputSet.exists(input => resolver(missing.name, input.name))
    +            }
    +            val repeatedNameHint = if (attrsWithSameName.nonEmpty) {
    +              val sameNames = attrsWithSameName.map(_.name).mkString(",")
    +              s"""Attribute(s) with the same name appear in the operation: `$sameNames`.
    +                  |Please check if the right attribute(s) are used.""".stripMargin
    --- End diff --
    
    I don't have a strong case for keeping two lines here, and I see the point of making it a single line. I guess the best way to get it onto one line is to split it into 
    ```scala
    val attributeRepetitionMsg = s"Attribute(s) with the same name appear in the operation: `$sameNames`"
    val checkAttributesMsg = s"Please check if the right attribute(s) are used."
    ```
    to avoid hitting the < 100 chars linting rule?


---



[GitHub] spark pull request #17100: [SPARK-13947][SQL] The error message from using a...

Posted by rberenguel <gi...@git.apache.org>.
Github user rberenguel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17100#discussion_r146176626
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -24,6 +24,7 @@ import org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression
     import org.apache.spark.sql.catalyst.optimizer.BooleanSimplification
     import org.apache.spark.sql.catalyst.plans._
     import org.apache.spark.sql.catalyst.plans.logical._
    +import org.apache.spark.sql.internal.SQLConf
    --- End diff --
    
    Thanks


---



[GitHub] spark pull request #17100: [SPARK-13947][SQL] The error message from using a...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17100#discussion_r146152740
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -24,6 +24,7 @@ import org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression
     import org.apache.spark.sql.catalyst.optimizer.BooleanSimplification
     import org.apache.spark.sql.catalyst.plans._
     import org.apache.spark.sql.catalyst.plans.logical._
    +import org.apache.spark.sql.internal.SQLConf
    --- End diff --
    
    No need to import SQLConf now.


---



[GitHub] spark pull request #17100: [SPARK-13947][SQL] The error message from using a...

Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17100#discussion_r146207532
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala ---
    @@ -408,16 +408,28 @@ class AnalysisErrorSuite extends AnalysisTest {
         // CheckAnalysis should throw AnalysisException when Aggregate contains missing attribute(s)
         // Since we manually construct the logical plan at here and Sum only accept
         // LongType, DoubleType, and DecimalType. We use LongType as the type of a.
    -    val plan =
    -      Aggregate(
    +    val attrA = AttributeReference("a", LongType)(exprId = ExprId(1))
    +    val aId = Array[String](attrA.name, attrA.exprId.id.toString)
    +    val otherA = AttributeReference("a", LongType)(exprId = ExprId(2))
    +    val otherAId = Array[String](otherA.name, otherA.exprId.id.toString)
    +    val bAlias = Alias(sum(attrA), "b")() :: Nil
    +    val plan = Aggregate(
             Nil,
    -        Alias(sum(AttributeReference("a", LongType)(exprId = ExprId(1))), "b")() :: Nil,
    -        LocalRelation(
    -          AttributeReference("a", LongType)(exprId = ExprId(2))))
    +        bAlias,
    +        LocalRelation(otherA))
     
         assert(plan.resolved)
     
    -    assertAnalysisError(plan, "resolved attribute(s) a#1L missing from a#2L" :: Nil)
    +    val errorMsg = s"""Some resolved attribute(s) are not present among the available attributes
    +                     |for a query.
    +                     |${aId.mkString("#")}L is not in ${otherAId.mkString("#")}L.
    --- End diff --
    
    Can you change `${aId.mkString("#")}L` to `attrA.toString`? And also remove `val aId` and `val otherAId`.


---



[GitHub] spark pull request #17100: [SPARK-13947][SQL] PySpark DataFrames: The error ...

Posted by rberenguel <gi...@git.apache.org>.
Github user rberenguel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17100#discussion_r146128092
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala ---
    @@ -408,16 +408,28 @@ class AnalysisErrorSuite extends AnalysisTest {
         // CheckAnalysis should throw AnalysisException when Aggregate contains missing attribute(s)
         // Since we manually construct the logical plan at here and Sum only accept
         // LongType, DoubleType, and DecimalType. We use LongType as the type of a.
    -    val plan =
    -      Aggregate(
    +    val attrA = AttributeReference("a", LongType)(exprId = ExprId(1))
    +    val aId = Array[String](attrA.name, attrA.exprId.id.toString)
    +    val otherA = AttributeReference("a", LongType)(exprId = ExprId(2))
    +    val otherAId = Array[String](otherA.name, otherA.exprId.id.toString)
    +    val bAlias = Alias(sum(attrA), "b")() :: Nil
    +    val plan = Aggregate(
             Nil,
    -        Alias(sum(AttributeReference("a", LongType)(exprId = ExprId(1))), "b")() :: Nil,
    -        LocalRelation(
    -          AttributeReference("a", LongType)(exprId = ExprId(2))))
    +        bAlias,
    +        LocalRelation(otherA))
     
         assert(plan.resolved)
     
    -    assertAnalysisError(plan, "resolved attribute(s) a#1L missing from a#2L" :: Nil)
    +    val errorMsg = s"""Some resolved attribute(s) are not present among the available attributes
    +                     |for a query.
    +                     |${aId.mkString("#")}L is not in ${otherAId.mkString("#")}L.
    --- End diff --
    
    Yes, thanks. I discovered this recently but didn't come back to this PR. Will fix.


---



[GitHub] spark issue #17100: [SPARK-13947][PYTHON] PySpark DataFrames: The error mess...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    Can one of the admins verify this patch?


---


[GitHub] spark issue #17100: [SPARK-13947][SQL] PySpark DataFrames: The error message...

Posted by rberenguel <gi...@git.apache.org>.
Github user rberenguel commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    @gatorsmile Thanks for the pointers; I finally found some time to come back to this. I'm not sure whether the approach I took in my current local changes to get the `SQLConf` into `checkAnalysis` is the right one (since it seems to change a possible API endpoint). I renamed the current implementation in the trait to `def checkAnalysisWithConf(plan: LogicalPlan, conf: SQLConf): Unit` and added an abstract method `def checkAnalysis(plan: LogicalPlan): Unit` that is then implemented in `Analyzer` (where we have a `conf` we can pass around). I haven't fixed all the rest yet; I was puzzled enough by the correctness of this part for now ;) Thanks
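    A rough, self-contained sketch of that shape (stub types stand in for the real `LogicalPlan`/`SQLConf`, so this is illustrative only, not the actual Spark code):
    ```scala
    // Illustrative stubs only; the real LogicalPlan/SQLConf live in catalyst.
    final case class SQLConf(caseSensitiveAnalysis: Boolean)
    final case class LogicalPlan(description: String)

    trait CheckAnalysis {
      // abstract entry point that takes no conf
      def checkAnalysis(plan: LogicalPlan): Unit

      // the existing logic, now taking the conf it needs explicitly
      def checkAnalysisWithConf(plan: LogicalPlan, conf: SQLConf): Unit = {
        // rules that need case-sensitivity information would use `conf` here
        println(s"checking ${plan.description} (caseSensitive=${conf.caseSensitiveAnalysis})")
      }
    }

    class Analyzer(conf: SQLConf) extends CheckAnalysis {
      // Analyzer already holds a conf, so it can satisfy the abstract method
      override def checkAnalysis(plan: LogicalPlan): Unit = checkAnalysisWithConf(plan, conf)
    }
    ```
    For what it's worth, the later diffs in this thread end up reading the conf off the plan itself via `plan.conf.resolver`, which avoids threading an extra parameter through.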


---


[GitHub] spark issue #17100: [SPARK-13947][SQL] PySpark DataFrames: The error message...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    Merged build finished. Test PASSed.


---


[GitHub] spark issue #17100: [SPARK-13947][PYTHON][SQL] PySpark DataFrames: The error...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    Merged build finished. Test PASSed.


---


[GitHub] spark pull request #17100: [SPARK-13947][SQL] PySpark DataFrames: The error ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17100#discussion_r146121699
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala ---
    @@ -408,16 +408,28 @@ class AnalysisErrorSuite extends AnalysisTest {
         // CheckAnalysis should throw AnalysisException when Aggregate contains missing attribute(s)
         // Since we manually construct the logical plan at here and Sum only accept
         // LongType, DoubleType, and DecimalType. We use LongType as the type of a.
    -    val plan =
    -      Aggregate(
    +    val attrA = AttributeReference("a", LongType)(exprId = ExprId(1))
    +    val aId = Array[String](attrA.name, attrA.exprId.id.toString)
    +    val otherA = AttributeReference("a", LongType)(exprId = ExprId(2))
    +    val otherAId = Array[String](otherA.name, otherA.exprId.id.toString)
    +    val bAlias = Alias(sum(attrA), "b")() :: Nil
    +    val plan = Aggregate(
             Nil,
    -        Alias(sum(AttributeReference("a", LongType)(exprId = ExprId(1))), "b")() :: Nil,
    -        LocalRelation(
    -          AttributeReference("a", LongType)(exprId = ExprId(2))))
    +        bAlias,
    +        LocalRelation(otherA))
     
         assert(plan.resolved)
     
    -    assertAnalysisError(plan, "resolved attribute(s) a#1L missing from a#2L" :: Nil)
    +    val errorMsg = s"""Some resolved attribute(s) are not present among the available attributes
    +                     |for a query.
    +                     |${aId.mkString("#")}L is not in ${otherAId.mkString("#")}L.
    --- End diff --
    
    Isn't `${aId.mkString("#")}L` just `attrA.toString`?
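    For context, assuming a spark-catalyst classpath, the attribute's `toString` already carries the name, expression id and type suffix, which is why the manual concatenation is redundant:
    ```scala
    // Assumes spark-catalyst on the classpath; shown only to illustrate the point above.
    import org.apache.spark.sql.catalyst.expressions.{AttributeReference, ExprId}
    import org.apache.spark.sql.types.LongType

    val attrA = AttributeReference("a", LongType)(exprId = ExprId(1))
    // Expected to print something like "a#1L": name, expression id, type suffix.
    println(attrA.toString)
    ```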


---



[GitHub] spark pull request #17100: [SPARK-13947][SQL] PySpark DataFrames: The error ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17100#discussion_r146127775
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -270,12 +273,25 @@ trait CheckAnalysis extends PredicateHelper {
     
             operator match {
               case o if o.children.nonEmpty && o.missingInput.nonEmpty =>
    +            val resolver = conf.resolver
    +            val commonAttrs = o.missingInput.filter(x =>
    --- End diff --
    
    Not quite common attrs actually. Maybe `attrsWithSameName`.


---



[GitHub] spark pull request #17100: [SPARK-13947][SQL] The error message from using a...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17100#discussion_r146177053
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -270,12 +271,25 @@ trait CheckAnalysis extends PredicateHelper {
     
             operator match {
               case o if o.children.nonEmpty && o.missingInput.nonEmpty =>
    +            val resolver = plan.conf.resolver
    +            val attrsWithSameName = o.missingInput.filter(x =>
    +              o.inputSet.exists(y => resolver(x.name, y.name)))
                 val missingAttributes = o.missingInput.mkString(",")
    -            val input = o.inputSet.mkString(",")
    +            val availableAttributes = o.inputSet.mkString(",")
    +            val repeatedNameHint = if (attrsWithSameName.size > 0) {
    +              val commonNames = attrsWithSameName.map(_.name).mkString(",")
    +              s"""\n|Attribute(s) `$commonNames` seem to appear in two
    +                  |different datasets, with the same name."""
    --- End diff --
    
    The original error msg actually looks good to me, but this change seems OK as well.


---



[GitHub] spark issue #17100: [SPARK-13947][SQL] The error message from using an inval...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    **[Test build #82979 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82979/testReport)** for PR 17100 at commit [`72b72cf`](https://github.com/apache/spark/commit/72b72cfb0fecc169e0bc289f7f319bdb2b947de5).


---



[GitHub] spark issue #17100: [SPARK-13947][SQL] The error message from using an inval...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    **[Test build #82994 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82994/testReport)** for PR 17100 at commit [`34753b5`](https://github.com/apache/spark/commit/34753b513323f5076edd4c5006983a9c9d3d97d7).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #17100: [SPARK-13947][SQL] The error message from using an inval...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83020/
    Test PASSed.


---



[GitHub] spark issue #17100: [SPARK-13947][PYTHON][SQL] PySpark DataFrames: The error...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    To detect duplicate names, you need to pass the SQLConf `conf` into `checkAnalysis` to support case sensitivity in name resolution. For each missing attribute, you can use `conf.resolver` to check whether a column with the same name exists in `o.inputSet`. After detecting the duplicate names, you can output a message saying that the missing attributes come from a Dataset different from the input set.
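    A minimal, self-contained sketch of that detection (a stub `Attr` type stands in for catalyst's attributes, so this only shows the shape of the `resolver` check, not the real API):
    ```scala
    // Stub standing in for a catalyst attribute (name + expression id).
    final case class Attr(name: String, exprId: Long)

    // In catalyst, a Resolver is just a (String, String) => Boolean:
    // case-insensitive equality by default, exact equality when
    // spark.sql.caseSensitive is enabled.
    val caseInsensitiveResolution: (String, String) => Boolean =
      (a, b) => a.equalsIgnoreCase(b)

    def duplicateNameHint(
        missingInput: Seq[Attr],
        inputSet: Seq[Attr],
        resolver: (String, String) => Boolean): String = {
      // A missing attribute whose name also exists among the input attributes
      // usually means the column came from a different Dataset than the one
      // being operated on (same name, different expression id).
      val attrsWithSameName = missingInput.filter { missing =>
        inputSet.exists(input => resolver(missing.name, input.name))
      }
      if (attrsWithSameName.nonEmpty) {
        "Attribute(s) with the same name appear in the operation: " +
          attrsWithSameName.map(_.name).mkString(",")
      } else {
        ""
      }
    }

    // a#1 is missing while a#2 is available, so the hint fires:
    println(duplicateNameHint(Seq(Attr("a", 1)), Seq(Attr("a", 2)), caseInsensitiveResolution))
    ```
    This is essentially the `attrsWithSameName` filter that appears in the diffs elsewhere in this thread.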


---


[GitHub] spark issue #17100: [SPARK-13947][PYTHON][SQL] PySpark DataFrames: The error...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    Can you remove `[PYTHON]` from the PR title? Thanks!


---


[GitHub] spark issue #17100: [SPARK-13947][SQL] PySpark DataFrames: The error message...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    **[Test build #82950 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82950/testReport)** for PR 17100 at commit [`6bbe496`](https://github.com/apache/spark/commit/6bbe496f9d8bf48953ba15518d5f80f0b4140ed5).


---



[GitHub] spark issue #17100: [SPARK-13947][SQL] PySpark DataFrames: The error message...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    **[Test build #77502 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77502/testReport)** for PR 17100 at commit [`0cb9825`](https://github.com/apache/spark/commit/0cb9825a1cfebe67f08ce344e67a06c53e4a78be).


---


[GitHub] spark pull request #17100: [SPARK-13947][SQL] The error message from using a...

Posted by rberenguel <gi...@git.apache.org>.
Github user rberenguel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17100#discussion_r146208143
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala ---
    @@ -408,16 +408,28 @@ class AnalysisErrorSuite extends AnalysisTest {
         // CheckAnalysis should throw AnalysisException when Aggregate contains missing attribute(s)
         // Since we manually construct the logical plan at here and Sum only accept
         // LongType, DoubleType, and DecimalType. We use LongType as the type of a.
    -    val plan =
    -      Aggregate(
    +    val attrA = AttributeReference("a", LongType)(exprId = ExprId(1))
    +    val aId = Array[String](attrA.name, attrA.exprId.id.toString)
    +    val otherA = AttributeReference("a", LongType)(exprId = ExprId(2))
    +    val otherAId = Array[String](otherA.name, otherA.exprId.id.toString)
    +    val bAlias = Alias(sum(attrA), "b")() :: Nil
    +    val plan = Aggregate(
             Nil,
    -        Alias(sum(AttributeReference("a", LongType)(exprId = ExprId(1))), "b")() :: Nil,
    -        LocalRelation(
    -          AttributeReference("a", LongType)(exprId = ExprId(2))))
    +        bAlias,
    +        LocalRelation(otherA))
     
         assert(plan.resolved)
     
    -    assertAnalysisError(plan, "resolved attribute(s) a#1L missing from a#2L" :: Nil)
    +    val errorMsg = s"""Some resolved attribute(s) are not present among the available attributes
    +                     |for a query.
    +                     |${aId.mkString("#")}L is not in ${otherAId.mkString("#")}L.
    --- End diff --
    
    Oh, I did it wrong, so it had no effect (I used `aId.toString` and got the unreadable default Java array representation). I'm changing it, and also regenerating the golden files, since that seemed to be what was making the test fail.
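    Plain-Scala illustration of that pitfall, for anyone reading along:
    ```scala
    // Array inherits java.lang.Object's toString, so interpolating it directly
    // prints the JVM's default representation rather than the elements.
    val aId = Array("a", "1")
    println(s"$aId")            // something like [Ljava.lang.String;@1b6d3586
    println(aId.mkString("#"))  // a#1
    ```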


---



[GitHub] spark issue #17100: [SPARK-13947][SQL] The error message from using an inval...

Posted by rberenguel <gi...@git.apache.org>.
Github user rberenguel commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    Applied all changes @viirya & @gatorsmile 


---



[GitHub] spark pull request #17100: [SPARK-13947][SQL] The error message from using a...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/17100


---



[GitHub] spark pull request #17100: [SPARK-13947][SQL] The error message from using a...

Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17100#discussion_r146543299
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -270,12 +270,25 @@ trait CheckAnalysis extends PredicateHelper {
     
             operator match {
               case o if o.children.nonEmpty && o.missingInput.nonEmpty =>
    +            val resolver = plan.conf.resolver
    +            val attrsWithSameName = o.missingInput.filter { missing =>
    +              o.inputSet.exists(input => resolver(missing.name, input.name))
    +            }
    +            val repeatedNameHint = if (attrsWithSameName.nonEmpty) {
    +              val sameNames = attrsWithSameName.map(_.name).mkString(",")
    +              s"""Attribute(s) with the same name appear in the operation: `$sameNames`.
    +                  |Please check if the right attribute(s) are used.""".stripMargin
    --- End diff --
    
    Personally I prefer a one-line message because it's a parameter of `AnalysisException`, but that's not a strong preference.


---



[GitHub] spark pull request #17100: [SPARK-13947][SQL] The error message from using a...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17100#discussion_r146152918
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -270,12 +271,25 @@ trait CheckAnalysis extends PredicateHelper {
     
             operator match {
               case o if o.children.nonEmpty && o.missingInput.nonEmpty =>
    +            val resolver = plan.conf.resolver
    +            val attrsWithSameName = o.missingInput.filter(x =>
    +              o.inputSet.exists(y => resolver(x.name, y.name)))
                 val missingAttributes = o.missingInput.mkString(",")
    -            val input = o.inputSet.mkString(",")
    +            val availableAttributes = o.inputSet.mkString(",")
    +            val repeatedNameHint = if (attrsWithSameName.size > 0) {
    +              val commonNames = attrsWithSameName.map(_.name).mkString(",")
    +              s"""\n|Attribute(s) `$commonNames` seem to appear in two
    +                  |different datasets, with the same name."""
    +            } else {
    +              ""
    +            }
     
                 failAnalysis(
    -              s"resolved attribute(s) $missingAttributes missing from $input " +
    -                s"in operator ${operator.simpleString}")
    +              s"""Some resolved attribute(s) are not present among the available attributes
    +                |for a query.
    +                |$missingAttributes is not in $availableAttributes.$repeatedNameHint
    --- End diff --
    
    Move `$repeatedNameHint` in next line as:
    ```scala
    |$missingAttributes is not in $availableAttributes.
    |$repeatedNameHint
    ```
    So you can remove the new line above?


---



[GitHub] spark issue #17100: [SPARK-13947][SQL] The error message from using an inval...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82960/
    Test PASSed.


---



[GitHub] spark pull request #17100: [SPARK-13947][SQL] The error message from using a...

Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17100#discussion_r146153187
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -270,12 +271,25 @@ trait CheckAnalysis extends PredicateHelper {
     
             operator match {
               case o if o.children.nonEmpty && o.missingInput.nonEmpty =>
    +            val resolver = plan.conf.resolver
    +            val attrsWithSameName = o.missingInput.filter(x =>
    +              o.inputSet.exists(y => resolver(x.name, y.name)))
    --- End diff --
    
    nit:
    ```
    val attrsWithSameName = o.missingInput.filter { missing =>
      o.inputSet.exists(input => resolver(missing.name, input.name))
    }
    ```


---



[GitHub] spark issue #17100: [SPARK-13947][SQL] The error message from using an inval...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    **[Test build #83020 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83020/testReport)** for PR 17100 at commit [`b8fbdea`](https://github.com/apache/spark/commit/b8fbdea3b35ea69e6dc74ea9c11e257ac88dc765).


---



[GitHub] spark issue #17100: [SPARK-13947][SQL] The error message from using an inval...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #17100: [SPARK-13947][SQL] PySpark DataFrames: The error message...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    **[Test build #82950 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82950/testReport)** for PR 17100 at commit [`6bbe496`](https://github.com/apache/spark/commit/6bbe496f9d8bf48953ba15518d5f80f0b4140ed5).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #17100: [SPARK-13947][SQL] The error message from using an inval...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #17100: [SPARK-13947][SQL] The error message from using an inval...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17100
  
    **[Test build #82994 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82994/testReport)** for PR 17100 at commit [`34753b5`](https://github.com/apache/spark/commit/34753b513323f5076edd4c5006983a9c9d3d97d7).


---



[GitHub] spark pull request #17100: [SPARK-13947][SQL] The error message from using a...

Posted by rberenguel <gi...@git.apache.org>.
Github user rberenguel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17100#discussion_r146211895
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -270,12 +270,27 @@ trait CheckAnalysis extends PredicateHelper {
     
             operator match {
               case o if o.children.nonEmpty && o.missingInput.nonEmpty =>
    +            val resolver = plan.conf.resolver
    +            val attrsWithSameName = o.missingInput.filter(missing =>
    +              o.inputSet.exists(input => resolver(missing.name, input.name)))
    +            val repeatedNameHint = if (attrsWithSameName.nonEmpty) {
    +              val commonNames = attrsWithSameName.map(_.name).mkString(",")
    +              s"""|Please check attribute(s) `$commonNames`, they seem to appear in two
    +                  |different input operators, with the same name.""".stripMargin
    +            } else {
    +              ""
    +            }
    +
                 val missingAttributes = o.missingInput.mkString(",")
    -            val input = o.inputSet.mkString(",")
    +            val availableAttributes = o.inputSet.mkString(",")
     
                 failAnalysis(
    -              s"resolved attribute(s) $missingAttributes missing from $input " +
    -                s"in operator ${operator.simpleString}")
    +              s"""Some resolved attribute(s) are not present among the available attributes
    +                |for a query.
    +                |$missingAttributes is not in $availableAttributes.
    +                |$repeatedNameHint
    +                |The failed query was for operator
    +                |${operator.simpleString}""".stripMargin)
    --- End diff --
    
    This is good for me; having the `repeatedNameHint` is the whole point, after all.


---
