You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by marmbrus <gi...@git.apache.org> on 2014/07/24 02:50:30 UTC

[GitHub] spark pull request: [SPARK-2659][SQL] Fix division semantics for h...

GitHub user marmbrus opened a pull request:

    https://github.com/apache/spark/pull/1557

    [SPARK-2659][SQL] Fix division semantics for hive

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/marmbrus/spark fixDivision

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1557.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1557
    
----
commit 0c29ae8eeed1a657c192e9d2b989df7c5e82fe9d
Author: Michael Armbrust <mi...@databricks.com>
Date:   2014-07-24T00:48:57Z

    Fix division semantics for hive

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2659][SQL] Fix division semantics for h...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1557#issuecomment-49955634
  
    QA tests have started for PR 1557. This patch merges cleanly. <br>View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17076/consoleFull


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2659][SQL] Fix division semantics for h...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/1557#issuecomment-49959903
  
    LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2659][SQL] Fix division semantics for h...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1557#discussion_r15326651
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala ---
    @@ -306,6 +307,23 @@ trait HiveTypeCoercion {
       }
     
       /**
    +   * Hive only performs integral division with the DIV operator. The arguments to / are always
    +   * converted to fractional types.
    +   */
    +  object Division extends Rule[LogicalPlan] {
    +    def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions {
    +      // Skip nodes who's children have not been resolved yet.
    +      case e if !e.childrenResolved => e
    +
    +      // Decimal and Double remain the same
    +      case d: Divide if d.dataType == DoubleType => d
    +      case d: Divide if d.dataType == DecimalType => d
    --- End diff --
    
    Also add `case d: Divide if d.dataType == FloatType => d`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2659][SQL] Fix division semantics for h...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1557#issuecomment-50058832
  
    QA tests have started for PR 1557. This patch merges cleanly. <br>View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17126/consoleFull


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2659][SQL] Fix division semantics for h...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1557#discussion_r15326955
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala ---
    @@ -306,6 +307,23 @@ trait HiveTypeCoercion {
       }
     
       /**
    +   * Hive only performs integral division with the DIV operator. The arguments to / are always
    +   * converted to fractional types.
    +   */
    +  object Division extends Rule[LogicalPlan] {
    +    def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions {
    +      // Skip nodes who's children have not been resolved yet.
    +      case e if !e.childrenResolved => e
    +
    +      // Decimal and Double remain the same
    +      case d: Divide if d.dataType == DoubleType => d
    +      case d: Divide if d.dataType == DecimalType => d
    --- End diff --
    
    Please ignore my comment. I think we should not add `case d: Divide if d.dataType == FloatType => d`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2659][SQL] Fix division semantics for h...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/1557#issuecomment-49956150
  
    When you run ""DESCRIBE FUNCTION div"" in Hive, the returned message is `a div b - Divide a by b rounded to the long integer`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2659][SQL] Fix division semantics for h...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/1557#issuecomment-49959379
  
    We already whitelist [udf_div](https://github.com/marmbrus/spark/blob/fixDivision/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveCompatibilitySuite.scala#L733)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2659][SQL] Fix division semantics for h...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1557#issuecomment-49961214
  
    QA results for PR 1557:<br>- This patch FAILED unit tests.<br>- This patch merges cleanly<br>- This patch adds no public classes<br><br>For more information see test ouptut:<br>https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17076/consoleFull


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2659][SQL] Fix division semantics for h...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/1557


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2659][SQL] Fix division semantics for h...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/1557#issuecomment-50220680
  
    Thanks for reviewing! Merged into master.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2659][SQL] Fix division semantics for h...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/1557#issuecomment-49959260
  
    There is a hive unit test called `udf_div.q` and it is very simple. The content is...
    ```
    DESCRIBE FUNCTION div;
    DESCRIBE FUNCTION EXTENDED div;
    
    SELECT 3 DIV 2 FROM SRC LIMIT 1;
    ```
    I think we can ignore it. But, to keep track how many hive unit tests we can pass, maybe it is better to record it in `HiveCompatibilitySuite`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2659][SQL] Fix division semantics for h...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1557#issuecomment-50070956
  
    QA results for PR 1557:<br>- This patch PASSES unit tests.<br>- This patch merges cleanly<br>- This patch adds no public classes<br><br>For more information see test ouptut:<br>https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17126/consoleFull


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---