Posted to reviews@spark.apache.org by chenghao-intel <gi...@git.apache.org> on 2015/04/30 04:36:36 UTC

[GitHub] spark pull request: [SPARK-7269] [SQL] Incorrect analysis for aggr...

GitHub user chenghao-intel opened a pull request:

    https://github.com/apache/spark/pull/5798

    [SPARK-7269] [SQL] Incorrect analysis for aggregation

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/chenghao-intel/spark analysis

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/5798.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5798
    
----
commit 1280cdadcac5730eb2763e75a76ee1eed6c12947
Author: Cheng Hao <ha...@intel.com>
Date:   2015-04-30T02:21:53Z

    Incorrect analysis for aggregation

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7269] [SQL] Incorrect analysis for aggr...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the pull request:

    https://github.com/apache/spark/pull/5798#issuecomment-97695741
  
    It looks to me like the apparent cause of this failure is that `groupingExprs.contains(e)` mistakenly returns `false`. Why not simply change the `equals` method in `AttributeReference` to not compare `name`? `AttributeReference.hashCode` doesn't use `name` either. Sorry if I missed something.
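
The mismatch described above can be reproduced in isolation. The following is a minimal sketch, assuming a hypothetical `Attr` class (not Spark's `AttributeReference`) whose `equals` compares both `name` and `exprId` while `hashCode` uses only `exprId`, as the comment says of `AttributeReference.hashCode`. A grouping attribute resolved as `B` and a select-list reference written as `b` share an `exprId`, yet the `contains` check fails.

```scala
object ContainsDemo {
  // Hypothetical, simplified stand-in for an attribute reference (not Spark's class).
  final class Attr(val name: String, val exprId: Long) {
    // equals compares name AND exprId, so "B"#1 and "b"#1 are not equal.
    override def equals(other: Any): Boolean = other match {
      case that: Attr => name == that.name && exprId == that.exprId
      case _          => false
    }
    // hashCode ignores name, so both objects still land in the same hash bucket.
    override def hashCode(): Int = exprId.hashCode()
    override def toString(): String = s"$name#$exprId"
  }

  // Grouping expression as resolved from the table schema (column "B").
  val groupingExprs: Seq[Attr] = Seq(new Attr("B", 1L))
  // The same column referenced in the SELECT list as "b", with the same exprId.
  val selected: Attr = new Attr("b", 1L)

  // Name-sensitive equality: the lookup that returns false.
  def containsByEquals: Boolean = groupingExprs.contains(selected)

  // Comparing by exprId only, in the spirit of the suggestion above.
  def containsByExprId: Boolean = groupingExprs.exists(_.exprId == selected.exprId)
}
```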


[GitHub] spark pull request: [SPARK-7269] [SQL] Incorrect analysis for aggr...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5798#discussion_r29403423
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveResolutionSuite.scala ---
    @@ -81,7 +81,7 @@ class HiveResolutionSuite extends HiveComparisonTest {
           .toDF().registerTempTable("caseSensitivityTest")
     
         val query = sql("SELECT a, b, A, B, n.a, n.b, n.A, n.B FROM caseSensitivityTest")
    -    assert(query.schema.fields.map(_.name) === Seq("a", "b", "A", "B", "a", "b", "A", "B"),
    +    assert(query.schema.fields.map(_.name) === Seq("a", "B", "a", "B", "a", "B", "a", "B"),
    --- End diff --
    
    Oh, yes, I will update the code.


[GitHub] spark pull request: [SPARK-7269] [SQL] Incorrect analysis for aggr...

Posted by scwf <gi...@git.apache.org>.
Github user scwf commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5798#discussion_r29403917
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveResolutionSuite.scala ---
    @@ -81,9 +81,13 @@ class HiveResolutionSuite extends HiveComparisonTest {
           .toDF().registerTempTable("caseSensitivityTest")
     
         val query = sql("SELECT a, b, A, B, n.a, n.b, n.A, n.B FROM caseSensitivityTest")
    -    assert(query.schema.fields.map(_.name) === Seq("a", "b", "A", "B", "a", "b", "A", "B"),
    +    assert(query.schema.fields.map(_.name) === Seq("a", "b", "a", "b", "a", "b", "a", "b"),
           "The output schema did not preserve the case of the query.")
    --- End diff --
    
    Should the output schema be lower case?


[GitHub] spark pull request: [SPARK-7269] [SQL] Incorrect analysis for aggr...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5798#discussion_r29411449
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveResolutionSuite.scala ---
    @@ -81,9 +81,11 @@ class HiveResolutionSuite extends HiveComparisonTest {
           .toDF().registerTempTable("caseSensitivityTest")
     
         val query = sql("SELECT a, b, A, B, n.a, n.b, n.A, n.B FROM caseSensitivityTest")
    -    assert(query.schema.fields.map(_.name) === Seq("a", "b", "A", "B", "a", "b", "A", "B"),
    +    assert(query.schema.fields.map(_.name) === Seq("a", "B", "a", "B", "a", "b", "A", "B"),
    --- End diff --
    
    I'm not sure what we really want here. When a user runs `SELECT b FROM t` and `t` has a column `B`, which one should we use in the result schema: `b` or `B`? cc @marmbrus 


[GitHub] spark pull request: [SPARK-7269] [SQL] Incorrect analysis for aggr...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5798#discussion_r29407261
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/package.scala ---
    @@ -29,12 +29,23 @@ package object analysis {
     
       /**
        * Resolver should return true if the first string refers to the same entity as the second string.
    -   * For example, by using case insensitive equality.
    +   * For example, by using case insensitive equality. Besides, Resolver also provides the ability
    +   * to normalize the string according to its semantic.
        */
    -  type Resolver = (String, String) => Boolean
    +  trait Resolver {
    +    def apply(a: String, b: String): Boolean
    +    def apply(a: String): String
    +  }
    +
    +  val caseInsensitiveResolution = new Resolver {
    +    override def apply(a: String, b: String): Boolean = a.equalsIgnoreCase(b)
    +    override def apply(a: String): String = a.toLowerCase // as Hive does
    --- End diff --
    
    If we want to add this, I think we should call it normalize. Maybe change the first apply to something else in the future.
    
    I'm not sure if we need to add this though. I will let @marmbrus comment on that.
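
A sketch of the naming suggested above, assuming a trait that keeps the two-argument `apply` for matching and exposes the one-argument form as `normalize`. The trait and both instances here are illustrative, not Spark's API; the case-sensitive counterpart and the use of `Locale.ROOT` are additions made for the example.

```scala
import java.util.Locale

object ResolverSketch {
  // Hypothetical shape: the two-argument apply decides whether names match,
  // and normalize returns the canonical form of a single name.
  trait Resolver {
    def apply(a: String, b: String): Boolean
    def normalize(a: String): String
  }

  val caseInsensitiveResolution: Resolver = new Resolver {
    def apply(a: String, b: String): Boolean = a.equalsIgnoreCase(b)
    // Lower-casing mirrors the "as Hive does" note in the diff.
    def normalize(a: String): String = a.toLowerCase(Locale.ROOT)
  }

  // Illustrative counterpart: exact matching, names left as written.
  val caseSensitiveResolution: Resolver = new Resolver {
    def apply(a: String, b: String): Boolean = a == b
    def normalize(a: String): String = a
  }
}
```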



[GitHub] spark pull request: [SPARK-7269] [SQL] Incorrect analysis for aggr...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5798#issuecomment-97702179
  
    Merged build started.


[GitHub] spark pull request: [SPARK-7269] [SQL] Incorrect analysis for aggr...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5798#issuecomment-97702165
  
     Merged build triggered.


[GitHub] spark pull request: [SPARK-7269] [SQL] Incorrect analysis for aggr...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/5798#issuecomment-97696378
  
    @cloud-fan I considered that too, but I don't think it's a good idea to override the `equals` method of a case class like that. That's why we have the helper class `AttributeEquals`.
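
To illustrate the helper-class approach without changing a case class's `equals`, here is a simplified wrapper in the spirit of `AttributeEquals`. It is a hypothetical stand-in, not Spark's implementation: the wrapper defines equality and hashing by `exprId` alone, so lookups through it ignore name casing while `Attr.equals` stays untouched.

```scala
object WrapperSketch {
  // Hypothetical attribute; the generated case-class equals compares name and exprId.
  final case class Attr(name: String, exprId: Long)

  // Wrapper whose equality and hash depend only on exprId.
  final class ByExprId(val a: Attr) {
    override def hashCode(): Int = a.exprId.hashCode()
    override def equals(other: Any): Boolean = other match {
      case that: ByExprId => a.exprId == that.a.exprId
      case _              => false
    }
  }

  // Membership check performed through the wrapper, not through Attr.equals.
  def containsByExprId(exprs: Seq[Attr], target: Attr): Boolean =
    exprs.map(new ByExprId(_)).toSet.contains(new ByExprId(target))
}
```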


[GitHub] spark pull request: [SPARK-7269] [SQL] Incorrect analysis for aggr...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5798#issuecomment-97651498
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31372/
    Test FAILed.


[GitHub] spark pull request: [SPARK-7269] [SQL] Incorrect analysis for aggr...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5798#discussion_r29412354
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveResolutionSuite.scala ---
    @@ -81,9 +81,11 @@ class HiveResolutionSuite extends HiveComparisonTest {
           .toDF().registerTempTable("caseSensitivityTest")
     
         val query = sql("SELECT a, b, A, B, n.a, n.b, n.A, n.B FROM caseSensitivityTest")
    -    assert(query.schema.fields.map(_.name) === Seq("a", "b", "A", "B", "a", "b", "A", "B"),
    +    assert(query.schema.fields.map(_.name) === Seq("a", "B", "a", "B", "a", "b", "A", "B"),
    --- End diff --
    
    Does that matter for a case-insensitive system? 
    We do, however, need to keep the attribute name identical along the reference chain. This is a workaround to fix the bug; in the long term, we probably need to refactor how `AttributeReference` equality handles the name (or pass the `Resolver` in?).
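
One reading of "pass the Resolver in" is an existence check that compares `exprId` and delegates the name comparison to the resolver. A minimal sketch, assuming a hypothetical `Attr` case class and a hypothetical `semanticContains` helper; `Resolver` here has the same `(String, String) => Boolean` shape as the original type alias in the diff.

```scala
object ResolverAwareContains {
  // Same shape as the original `type Resolver = (String, String) => Boolean`.
  type Resolver = (String, String) => Boolean

  // Hypothetical attribute stand-in.
  final case class Attr(name: String, exprId: Long)

  val caseInsensitive: Resolver = (a, b) => a.equalsIgnoreCase(b)
  val caseSensitive: Resolver = (a, b) => a == b

  // Hypothetical helper: same exprId, and names the resolver considers equal.
  def semanticContains(exprs: Seq[Attr], target: Attr, resolver: Resolver): Boolean =
    exprs.exists(e => e.exprId == target.exprId && resolver(e.name, target.name))
}
```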



[GitHub] spark pull request: [SPARK-7269] [SQL] Incorrect analysis for aggr...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5798#discussion_r29408488
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveResolutionSuite.scala ---
    @@ -81,9 +81,13 @@ class HiveResolutionSuite extends HiveComparisonTest {
           .toDF().registerTempTable("caseSensitivityTest")
     
         val query = sql("SELECT a, b, A, B, n.a, n.b, n.A, n.B FROM caseSensitivityTest")
    -    assert(query.schema.fields.map(_.name) === Seq("a", "b", "A", "B", "a", "b", "A", "B"),
    +    assert(query.schema.fields.map(_.name) === Seq("a", "b", "a", "b", "a", "b", "a", "b"),
           "The output schema did not preserve the case of the query.")
    --- End diff --
    
    OK, I see your point. I will keep the change minimal.


[GitHub] spark pull request: [SPARK-7269] [SQL] Incorrect analysis for aggr...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/5798#issuecomment-97637937
  
    This is a workaround fix. In the long term, however, we should refactor the `Resolver` to support attribute-name normalization (to either upper or lower case); otherwise, we will keep running into this bug whenever we check whether an expression exists in an expression set.
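
The normalization approach proposed above can be sketched as follows, assuming lower-casing as the canonical form (as in the `// as Hive does` comment in the diff) and a hypothetical `Attr` case class. Once every attribute carries its normalized name, ordinary equality and `Set` lookups agree with case-insensitive resolution; the cost is that the user's original casing no longer reaches the output schema.

```scala
import java.util.Locale

object NormalizeThenCompare {
  // Hypothetical attribute stand-in with generated (name, exprId) equality.
  final case class Attr(name: String, exprId: Long)

  // Canonical form for a case-insensitive resolver: lower-cased name.
  def normalize(a: Attr): Attr = a.copy(name = a.name.toLowerCase(Locale.ROOT))

  // With both sides normalized, plain Set membership behaves case-insensitively.
  def contains(exprs: Set[Attr], target: Attr): Boolean =
    exprs.map(normalize).contains(normalize(target))
}
```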


[GitHub] spark pull request: [SPARK-7269] [SQL] Incorrect analysis for aggr...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5798#issuecomment-97662156
  
    Merged build started.


[GitHub] spark pull request: [SPARK-7269] [SQL] Incorrect analysis for aggr...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5798#issuecomment-97679301
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31383/
    Test PASSed.


[GitHub] spark pull request: [SPARK-7269] [SQL] Incorrect analysis for aggr...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5798#issuecomment-97637178
  
     Merged build triggered.


[GitHub] spark pull request: [SPARK-7269] [SQL] Incorrect analysis for aggr...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5798#issuecomment-97728970
  
    Merged build finished. Test PASSed.


[GitHub] spark pull request: [SPARK-7269] [SQL] Incorrect analysis for aggr...

Posted by scwf <gi...@git.apache.org>.
Github user scwf commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5798#discussion_r29406960
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveResolutionSuite.scala ---
    @@ -81,9 +81,13 @@ class HiveResolutionSuite extends HiveComparisonTest {
           .toDF().registerTempTable("caseSensitivityTest")
     
         val query = sql("SELECT a, b, A, B, n.a, n.b, n.A, n.B FROM caseSensitivityTest")
    -    assert(query.schema.fields.map(_.name) === Seq("a", "b", "A", "B", "a", "b", "A", "B"),
    +    assert(query.schema.fields.map(_.name) === Seq("a", "b", "a", "b", "a", "b", "a", "b"),
           "The output schema did not preserve the case of the query.")
    --- End diff --
    
    Yes, I think in the case-insensitive case we should normalize both the table name and the attribute name.


[GitHub] spark pull request: [SPARK-7269] [SQL] Incorrect analysis for aggr...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5798#issuecomment-97728964
  
      [Test build #31403 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31403/consoleFull) for   PR 5798 at commit [`1f0ed92`](https://github.com/apache/spark/commit/1f0ed9236527bf1071f2cc4a5815f5f705f85dc5).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class Param[T] (val parent: Params, val name: String, val doc: String, val isValid: T => Boolean)`
      * `class DoubleParam(parent: Params, name: String, doc: String, isValid: Double => Boolean)`
      * `class IntParam(parent: Params, name: String, doc: String, isValid: Int => Boolean)`
      * `class FloatParam(parent: Params, name: String, doc: String, isValid: Float => Boolean)`
      * `class LongParam(parent: Params, name: String, doc: String, isValid: Long => Boolean)`
      * `class BooleanParam(parent: Params, name: String, doc: String) // No need for isValid`
      * `case class ParamPair[T](param: Param[T], value: T) `
      * `class KMeansModel (`
      * `trait PMMLExportable `
      * `case class Sample(`
      * `case class Sample(`
    
     * This patch **adds the following new dependencies:**
       * `jaxb-api-2.2.7.jar`
       * `jaxb-core-2.2.7.jar`
       * `jaxb-impl-2.2.7.jar`
       * `pmml-agent-1.1.15.jar`
       * `pmml-model-1.1.15.jar`
       * `pmml-schema-1.1.15.jar`
    
     * This patch **removes the following dependencies:**
       * `activation-1.1.jar`
       * `jaxb-api-2.2.2.jar`
       * `jaxb-impl-2.2.3-1.jar`



[GitHub] spark pull request: [SPARK-7269] [SQL] Incorrect analysis for aggr...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5798#issuecomment-97702224
  
      [Test build #31403 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31403/consoleFull) for   PR 5798 at commit [`1f0ed92`](https://github.com/apache/spark/commit/1f0ed9236527bf1071f2cc4a5815f5f705f85dc5).


[GitHub] spark pull request: [SPARK-7269] [SQL] Incorrect analysis for aggr...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5798#issuecomment-97728973
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31403/
    Test PASSed.


[GitHub] spark pull request: [SPARK-7269] [SQL] Incorrect analysis for aggr...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5798#discussion_r29472499
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala ---
    @@ -158,7 +158,7 @@ abstract class LogicalPlan extends QueryPlan[LogicalPlan] with Logging {
           resolver: Resolver,
           attribute: Attribute): Option[(Attribute, List[String])] = {
         if (resolver(attribute.name, nameParts.head)) {
    -      Option((attribute.withName(nameParts.head), nameParts.tail.toList))
    +      Option((attribute, nameParts.tail.toList))
    --- End diff --
    
    This is incorrect.  Spark SQL is case insensitive but case preserving.  This behavior is important because we interface with systems that are case sensitive (think DataFrames in python) and otherwise it is very confusing to the user.
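
Case-insensitive but case-preserving resolution, as described above, means matching names case-insensitively while reporting the name as the user wrote it, which is what the removed `attribute.withName(nameParts.head)` line does. A simplified sketch, assuming a hypothetical single-part `resolve` helper and `Attr` case class; the real `LogicalPlan` method also handles qualified and nested names.

```scala
object CasePreserving {
  // Hypothetical attribute stand-in; withName keeps exprId and swaps the name.
  final case class Attr(name: String, exprId: Long) {
    def withName(newName: String): Attr = copy(name = newName)
  }

  type Resolver = (String, String) => Boolean
  val caseInsensitive: Resolver = (a, b) => a.equalsIgnoreCase(b)

  // Match case-insensitively, but carry the query's spelling into the output.
  def resolve(input: Seq[Attr], name: String, resolver: Resolver): Option[Attr] =
    input.find(a => resolver(a.name, name)).map(_.withName(name))
}
```

With this behavior, `SELECT b FROM t` over a column `B` yields an output column named `b` that still points at the same underlying attribute.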


[GitHub] spark pull request: [SPARK-7269] [SQL] Incorrect analysis for aggr...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5798#discussion_r29402793
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveResolutionSuite.scala ---
    @@ -81,7 +81,7 @@ class HiveResolutionSuite extends HiveComparisonTest {
           .toDF().registerTempTable("caseSensitivityTest")
     
         val query = sql("SELECT a, b, A, B, n.a, n.b, n.A, n.B FROM caseSensitivityTest")
    -    assert(query.schema.fields.map(_.name) === Seq("a", "b", "A", "B", "a", "b", "A", "B"),
    +    assert(query.schema.fields.map(_.name) === Seq("a", "B", "a", "B", "a", "B", "a", "B"),
    --- End diff --
    
    I have a unit test that explains this. This is actually a workaround to fix the bug; we should normalize attribute names during analysis, but I'll leave that for a future improvement.


[GitHub] spark pull request: [SPARK-7269] [SQL] Incorrect analysis for aggr...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5798#discussion_r29406952
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/package.scala ---
    @@ -29,12 +29,23 @@ package object analysis {
     
       /**
        * Resolver should return true if the first string refers to the same entity as the second string.
    -   * For example, by using case insensitive equality.
    +   * For example, by using case insensitive equality. Besides, Resolver also provides the ability
    +   * to normalize the string according to its semantic.
        */
    -  type Resolver = (String, String) => Boolean
    +  trait Resolver {
    +    def apply(a: String, b: String): Boolean
    +    def apply(a: String): String
    +  }
    +
    +  val caseInsensitiveResolution = new Resolver {
    +    override def apply(a: String, b: String): Boolean = a.equalsIgnoreCase(b)
    +    override def apply(a: String): String = a.toLowerCase // as Hive does
    --- End diff --
    
    I'd like to keep the first `apply` as it was, because I don't want to impact a lot of existing code. I agree we should rename the second `apply` to `normalize`.
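
Keeping the first `apply` source-compatible is possible if the trait extends `Function2`, so existing `resolver(a, b)` call sites and parameters typed `(String, String) => Boolean` keep compiling. This is a hypothetical sketch of that design, not the PR's actual code; `legacyLookup` is an invented stand-in for existing callers.

```scala
import java.util.Locale

object CompatibleResolver {
  // Extending Function2 means a Resolver can be passed anywhere a
  // (String, String) => Boolean is expected, with normalize added on top.
  trait Resolver extends Function2[String, String, Boolean] {
    def normalize(name: String): String
  }

  val caseInsensitiveResolution: Resolver = new Resolver {
    def apply(a: String, b: String): Boolean = a.equalsIgnoreCase(b)
    def normalize(name: String): String = name.toLowerCase(Locale.ROOT)
  }

  // Existing-style code that only knows about the plain function type.
  def legacyLookup(names: Seq[String], target: String, r: (String, String) => Boolean): Option[String] =
    names.find(r(_, target))
}
```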


[GitHub] spark pull request: [SPARK-7269] [SQL] Incorrect analysis for aggr...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5798#discussion_r29411128
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveResolutionSuite.scala ---
    @@ -81,9 +81,11 @@ class HiveResolutionSuite extends HiveComparisonTest {
           .toDF().registerTempTable("caseSensitivityTest")
     
         val query = sql("SELECT a, b, A, B, n.a, n.b, n.A, n.B FROM caseSensitivityTest")
    -    assert(query.schema.fields.map(_.name) === Seq("a", "b", "A", "B", "a", "b", "A", "B"),
    +    assert(query.schema.fields.map(_.name) === Seq("a", "B", "a", "B", "a", "b", "A", "B"),
    --- End diff --
    
    I'm not sure what we really want here. When a user runs `SELECT b FROM t` and `t` has a column `B`, which one should we use in the result schema: `b` or `B`?


[GitHub] spark pull request: [SPARK-7269] [SQL] Incorrect analysis for aggr...

Posted by scwf <gi...@git.apache.org>.
Github user scwf commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5798#discussion_r29407019
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/package.scala ---
    @@ -29,12 +29,23 @@ package object analysis {
     
       /**
        * Resolver should return true if the first string refers to the same entity as the second string.
    -   * For example, by using case insensitive equality.
    +   * For example, by using case insensitive equality. Besides, Resolver also provides the ability
    +   * to normalize the string according to its semantic.
        */
    -  type Resolver = (String, String) => Boolean
    +  trait Resolver {
    +    def apply(a: String, b: String): Boolean
    +    def apply(a: String): String
    +  }
    +
    +  val caseInsensitiveResolution = new Resolver {
    +    override def apply(a: String, b: String): Boolean = a.equalsIgnoreCase(b)
    +    override def apply(a: String): String = a.toLowerCase // as Hive does
    --- End diff --
    
    /cc @rxin, who may have concerns about this.



[GitHub] spark pull request: [SPARK-7269] [SQL] Incorrect analysis for aggr...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5798#discussion_r29406469
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveResolutionSuite.scala ---
    @@ -81,9 +81,13 @@ class HiveResolutionSuite extends HiveComparisonTest {
           .toDF().registerTempTable("caseSensitivityTest")
     
         val query = sql("SELECT a, b, A, B, n.a, n.b, n.A, n.B FROM caseSensitivityTest")
    -    assert(query.schema.fields.map(_.name) === Seq("a", "b", "A", "B", "a", "b", "A", "B"),
    +    assert(query.schema.fields.map(_.name) === Seq("a", "b", "a", "b", "a", "b", "a", "b"),
           "The output schema did not preserve the case of the query.")
    --- End diff --
    
    Supporting normalization is good. However, when the case is specified explicitly in the query, shouldn't we preserve it instead of normalizing it like this?



[GitHub] spark pull request: [SPARK-7269] [SQL] Incorrect analysis for aggr...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5798#issuecomment-97662343
  
      [Test build #31383 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31383/consoleFull) for   PR 5798 at commit [`c00f1ad`](https://github.com/apache/spark/commit/c00f1adda402f139b41d63fbe20dd9f1b4d6677e).



[GitHub] spark pull request: [SPARK-7269] [SQL] Incorrect analysis for aggr...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5798#issuecomment-97679293
  
      [Test build #31383 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31383/consoleFull) for   PR 5798 at commit [`c00f1ad`](https://github.com/apache/spark/commit/c00f1adda402f139b41d63fbe20dd9f1b4d6677e).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `  trait Resolver `
    
     * This patch does not change any dependencies.



[GitHub] spark pull request: [SPARK-7269] [SQL] Incorrect analysis for aggr...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/5798#issuecomment-97975817
  
    Definitely do not change the equality function for `AttributeReference`.  I did this in an early version of catalyst and the result can be quite confusing.  `equals()` should always be exact and consider _all_ properties of a case class.
    
    Instead, use an `AttributeSet` whenever you are looking for reference `equals` or `contains` operations.  Really it would be awesome if we could add a linter rule that warned for `Seq/Set[Attribute].contains()`, since this is often incorrect.
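
    The following is a simplified, self-contained sketch of why `Seq[Attribute].contains()` can go wrong here. It has no Spark dependency. `Attr` and `AttrSet` are hypothetical stand-ins for `AttributeReference` and `AttributeSet`. Real attributes carry more fields, such as data type, nullability and qualifiers. The sketch assumes that two references to the same column share an `exprId` but may differ in the case of `name`.

    ```scala
    object AttributeSetSketch {
      // Stand-in for an attribute reference: case-class equality is exact and
      // compares every field, including `name`.
      final case class Attr(name: String, exprId: Long)

      // Stand-in for an attribute set: membership is decided by exprId only, so
      // references that differ just in the case of their name still match.
      final class AttrSet private (byId: Map[Long, Attr]) {
        def contains(a: Attr): Boolean = byId.contains(a.exprId)
      }

      object AttrSet {
        def apply(attrs: Attr*): AttrSet =
          new AttrSet(attrs.map(a => a.exprId -> a).toMap)
      }
    }
    ```

    This keeps `equals()` exact on the case class, as suggested above, and moves the reference-equality semantics into the collection used for the lookup.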



[GitHub] spark pull request: [SPARK-7269] [SQL] Incorrect analysis for aggr...

Posted by scwf <gi...@git.apache.org>.
Github user scwf commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5798#discussion_r29403863
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/package.scala ---
    @@ -29,12 +29,23 @@ package object analysis {
     
       /**
        * Resolver should return true if the first string refers to the same entity as the second string.
    -   * For example, by using case insensitive equality.
    +   * For example, by using case insensitive equality. Besides, Resolver also provides the ability
    +   * to normalize the string according to its semantic.
        */
    -  type Resolver = (String, String) => Boolean
    +  trait Resolver {
    +    def apply(a: String, b: String): Boolean
    +    def apply(a: String): String
    +  }
    +
    +  val caseInsensitiveResolution = new Resolver {
    +    override def apply(a: String, b: String): Boolean = a.equalsIgnoreCase(b)
    +    override def apply(a: String): String = a.toLowerCase // as Hive does
    --- End diff --
    
    How about renaming the first `apply` to `resolve` and the second to `normalize`?



[GitHub] spark pull request: [SPARK-7269] [SQL] Incorrect analysis for aggr...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5798#issuecomment-97651460
  
      [Test build #31372 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31372/consoleFull) for   PR 5798 at commit [`1280cda`](https://github.com/apache/spark/commit/1280cdadcac5730eb2763e75a76ee1eed6c12947).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.
     * This patch does not change any dependencies.



[GitHub] spark pull request: [SPARK-7269] [SQL] Incorrect analysis for aggr...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5798#discussion_r29401536
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveResolutionSuite.scala ---
    @@ -81,7 +81,7 @@ class HiveResolutionSuite extends HiveComparisonTest {
           .toDF().registerTempTable("caseSensitivityTest")
     
         val query = sql("SELECT a, b, A, B, n.a, n.b, n.A, n.B FROM caseSensitivityTest")
    -    assert(query.schema.fields.map(_.name) === Seq("a", "b", "A", "B", "a", "b", "A", "B"),
    +    assert(query.schema.fields.map(_.name) === Seq("a", "B", "a", "B", "a", "B", "a", "B"),
    --- End diff --
    
    Why don't we preserve the case of the query? 



[GitHub] spark pull request: [SPARK-7269] [SQL] Incorrect analysis for aggr...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5798#issuecomment-97637192
  
    Merged build started.



[GitHub] spark pull request: [SPARK-7269] [SQL] Incorrect analysis for aggr...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5798#issuecomment-97651491
  
    Merged build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-7269] [SQL] Incorrect analysis for aggr...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5798#issuecomment-97662117
  
    Merged build triggered.



[GitHub] spark pull request: [SPARK-7269] [SQL] Incorrect analysis for aggr...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5798#issuecomment-97679300
  
    Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-7269] [SQL] Incorrect analysis for aggr...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5798#discussion_r29407552
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveResolutionSuite.scala ---
    @@ -81,9 +81,13 @@ class HiveResolutionSuite extends HiveComparisonTest {
           .toDF().registerTempTable("caseSensitivityTest")
     
         val query = sql("SELECT a, b, A, B, n.a, n.b, n.A, n.B FROM caseSensitivityTest")
    -    assert(query.schema.fields.map(_.name) === Seq("a", "b", "A", "B", "a", "b", "A", "B"),
    +    assert(query.schema.fields.map(_.name) === Seq("a", "b", "a", "b", "a", "b", "a", "b"),
           "The output schema did not preserve the case of the query.")
    --- End diff --
    
    There is no problem with normalizing table and column names. What I meant is: when the user explicitly specifies the case in the query, as in this test, don't we need to preserve it in the schema of the returned DataFrame? For example, some databases can be configured to be case-sensitive. If we connect to one through a JDBC driver, could this cause problems?



[GitHub] spark pull request: [SPARK-7269] [SQL] Incorrect analysis for aggr...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the pull request:

    https://github.com/apache/spark/pull/5798#issuecomment-97704895
  
    @chenghao-intel, how about changing `groupingExprs.contains(e)` to use `AttributeEquals`? That way we don't need to touch `AttributeReference.equals`.
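
    The following is a sketch of this suggestion under a simplified model. It assumes the grouping expressions are plain attributes; in Catalyst they are arbitrary expressions. `Attr`, `AttrEquals` and `containsByExprId` are hypothetical stand-ins, not the Catalyst classes. The wrapper's equality and hash code depend only on `exprId`, so the membership check ignores the attribute's name.

    ```scala
    object AttributeEqualsSketch {
      // Stand-in for an attribute; case-class equality compares all fields.
      final case class Attr(name: String, exprId: Long)

      // Wrapper whose equals/hashCode look only at exprId, in the spirit of the
      // `AttributeEquals` wrapper mentioned above.
      final class AttrEquals(val a: Attr) {
        override def equals(other: Any): Boolean = other match {
          case o: AttrEquals => o.a.exprId == a.exprId
          case _             => false
        }
        override def hashCode(): Int = a.exprId.hashCode()
      }

      // Membership check that leaves Attr's own equals untouched.
      def containsByExprId(groupingExprs: Seq[Attr], e: Attr): Boolean =
        groupingExprs.map(new AttrEquals(_)).contains(new AttrEquals(e))
    }
    ```

    The design choice is to leave the exact equality of the attribute class alone and opt into reference equality only at the call site that needs it.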



[GitHub] spark pull request: [SPARK-7269] [SQL] Incorrect analysis for aggr...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5798#issuecomment-97637651
  
      [Test build #31372 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31372/consoleFull) for   PR 5798 at commit [`1280cda`](https://github.com/apache/spark/commit/1280cdadcac5730eb2763e75a76ee1eed6c12947).



[GitHub] spark pull request: [SPARK-7269] [SQL] Incorrect analysis for aggr...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/5798#issuecomment-97701616
  
    Thank you for the comments. I've updated the code to preserve the attribute name. Attribute name normalization still seems to need some discussion, so let's leave it for a future improvement.



[GitHub] spark pull request: [SPARK-7269] [SQL] Incorrect analysis for aggr...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5798#discussion_r29406841
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveResolutionSuite.scala ---
    @@ -81,9 +81,13 @@ class HiveResolutionSuite extends HiveComparisonTest {
           .toDF().registerTempTable("caseSensitivityTest")
     
         val query = sql("SELECT a, b, A, B, n.a, n.b, n.A, n.B FROM caseSensitivityTest")
    -    assert(query.schema.fields.map(_.name) === Seq("a", "b", "A", "B", "a", "b", "A", "B"),
    +    assert(query.schema.fields.map(_.name) === Seq("a", "b", "a", "b", "a", "b", "a", "b"),
           "The output schema did not preserve the case of the query.")
    --- End diff --
    
    In Hive
    ```
    hive> create table ddDD as select Key, valUe from src;
    hive> desc extended dddd;
    OK
    key                 	string              	                    
    value               	string              	                    
    	 	 
    Detailed Table Information	Table(tableName:dddd, dbName:default, owner:hcheng, createTime:1430368423, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:key, type:string, comment:null), FieldSchema(name:value, type:string, comment:null)], location:file:/home/hcheng/warehouse/dddd, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{numFiles=1, COLUMN_STATS_ACCURATE=true, transient_lastDdlTime=1430368423, numRows=0, totalSize=5824, rawDataSize=0}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE)	
    Time taken: 0.111 seconds, Fetched: 4 row(s)
    ```
    You can see that both the table name and the column names are normalized (to lower case), so I think preserving the case is probably not necessary (the normalized name is what we want, isn't it?).
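
    The following is a minimal sketch of the normalization visible in the Hive transcript above: `create table ddDD as select Key, valUe` is stored as table `dddd` with columns `key` and `value`. `normalize` and `storedSchema` are hypothetical helpers that model only the lower-casing, not the metastore. `Locale.ROOT` is an assumption that keeps the result independent of the JVM's default locale.

    ```scala
    import java.util.Locale

    object HiveStyleNormalizationSketch {
      // Hypothetical helper: lower-cases an identifier the way the transcript
      // reports `ddDD` as `dddd` and `valUe` as `value`.
      def normalize(identifier: String): String = identifier.toLowerCase(Locale.ROOT)

      // Hypothetical model of CREATE TABLE ... AS SELECT: the table name and the
      // selected column names as they would be stored after normalization.
      def storedSchema(tableName: String, selectedColumns: Seq[String]): (String, Seq[String]) =
        (normalize(tableName), selectedColumns.map(normalize))
    }
    ```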



[GitHub] spark pull request: [SPARK-7269] [SQL] Incorrect analysis for aggr...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5798#discussion_r29403253
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveResolutionSuite.scala ---
    @@ -81,7 +81,7 @@ class HiveResolutionSuite extends HiveComparisonTest {
           .toDF().registerTempTable("caseSensitivityTest")
     
         val query = sql("SELECT a, b, A, B, n.a, n.b, n.A, n.B FROM caseSensitivityTest")
    -    assert(query.schema.fields.map(_.name) === Seq("a", "b", "A", "B", "a", "b", "A", "B"),
    +    assert(query.schema.fields.map(_.name) === Seq("a", "B", "a", "B", "a", "B", "a", "B"),
    --- End diff --
    
    I meant that it looks like we preserved the case before, so why don't we want to preserve it now?
    This test checks that the case of the query is preserved, so if you modify it like that, the test is no longer meaningful.

