Posted to reviews@spark.apache.org by tejasapatil <gi...@git.apache.org> on 2017/02/25 02:34:37 UTC

[GitHub] spark pull request #17062: [SPARK-17495] [SQL] Support date, timestamp and i...

GitHub user tejasapatil opened a pull request:

    https://github.com/apache/spark/pull/17062

    [SPARK-17495] [SQL] Support date, timestamp and interval types in Hive hash

    ## What changes were proposed in this pull request?
    
    - Timestamp hashing is done as per [TimestampWritable.hashCode()](https://github.com/apache/hive/blob/ff67cdda1c538dc65087878eeba3e165cf3230f4/serde/src/java/org/apache/hadoop/hive/serde2/io/TimestampWritable.java#L406) in Hive (see the sketch below).
    - Interval hashing is done as per [HiveIntervalDayTime.hashCode()](https://github.com/apache/hive/blob/ff67cdda1c538dc65087878eeba3e165cf3230f4/storage-api/src/java/org/apache/hadoop/hive/common/type/HiveIntervalDayTime.java#L178). Note that there are inherent differences in how Hive and Spark store intervals under the hood, which limits the ability to stay completely in sync with Hive's hashing function. I have explained this in the method doc.
    - Date type was already supported. This PR adds tests for it.
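    For reference, a minimal Scala sketch of the timestamp logic (a sketch of the approach with illustrative names, not the exact patch code). Hive packs the epoch seconds and the nanosecond part into a single long and folds it down to 32 bits; dates need no such packing because their hash is simply the underlying days-since-epoch value (which is why 1970-01-01 hashes to 0):
    
    ```
    // Sketch only: mirrors Hive's TimestampWritable.hashCode(), taking the
    // timestamp as microseconds since epoch (Spark's internal representation).
    def hiveHashTimestamp(micros: Long): Int = {
      val seconds = micros / 1000000           // whole seconds since epoch
      val nanos = (micros % 1000000) * 1000    // sub-second part, as nanoseconds
      var result = seconds
      result <<= 30                            // the nanosecond part fits in 30 bits
      result |= nanos
      ((result >>> 32) ^ result).toInt         // fold 64 bits down to 32
    }
    ```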
    
    ## How was this patch tested?
    
    Added unit tests

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tejasapatil/spark SPARK-17495_time_related_types

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17062.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17062
    
----
commit cc359fc45547b7ba3fd4c1d11d3dcfbaf71ea66a
Author: Tejas Patil <te...@fb.com>
Date:   2017-02-25T00:18:03Z

    [SPARK-17495] [SQL] Support date, timestamp datatypes in Hive hash

commit 332475c1641f61080aa41dda9f1ceec237351d75
Author: Tejas Patil <te...@fb.com>
Date:   2017-02-25T02:23:41Z

    minor refac

----




[GitHub] spark pull request #17062: [SPARK-17495] [SQL] Support date, timestamp and i...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17062#discussion_r104282564
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/HashExpressionsSuite.scala ---
    @@ -169,6 +171,96 @@ class HashExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
         // scalastyle:on nonascii
       }
     
    +  test("hive-hash for date type") {
    +    def checkHiveHashForDateType(dateString: String, expected: Long): Unit = {
    +      checkHiveHash(
    +        DateTimeUtils.stringToDate(UTF8String.fromString(dateString)).get,
    +        DateType,
    +        expected)
    +    }
    +
    +    // basic case
    +    checkHiveHashForDateType("2017-01-01", 17167)
    +
    +    // boundary cases
    +    checkHiveHashForDateType("0000-01-01", -719530)
    +    checkHiveHashForDateType("9999-12-31", 2932896)
    +
    +    // epoch
    +    checkHiveHashForDateType("1970-01-01", 0)
    +
    +    // before epoch
    +    checkHiveHashForDateType("1800-01-01", -62091)
    +
    +    // Invalid input: bad date string. Hive returns 0 for such cases
    +    intercept[NoSuchElementException](checkHiveHashForDateType("0-0-0", 0))
    +    intercept[NoSuchElementException](checkHiveHashForDateType("-1212-01-01", 0))
    +    intercept[NoSuchElementException](checkHiveHashForDateType("2016-99-99", 0))
    +
    +    // Invalid input: Empty string. Hive returns 0 for this case
    +    intercept[NoSuchElementException](checkHiveHashForDateType("", 0))
    +
    +    // Invalid input: February 30th in a leap year. Hive supports this but Spark doesn't
    +    intercept[NoSuchElementException](checkHiveHashForDateType("2016-02-30", 16861))
    +  }
    +
    +  test("hive-hash for timestamp type") {
    +    def checkHiveHashForTimestampType(
    +        timestamp: String,
    +        expected: Long,
    +        timeZone: TimeZone = TimeZone.getTimeZone("UTC")): Unit = {
    +      checkHiveHash(
    +        DateTimeUtils.stringToTimestamp(UTF8String.fromString(timestamp), timeZone).get,
    +        TimestampType,
    +        expected)
    +    }
    +
    +    // basic case
    +    checkHiveHashForTimestampType("2017-02-24 10:56:29", 1445725271)
    +
    +    // with higher precision
    +    checkHiveHashForTimestampType("2017-02-24 10:56:29.111111", 1353936655)
    +
    +    // with different timezone
    +    checkHiveHashForTimestampType("2017-02-24 10:56:29", 1445732471,
    +      TimeZone.getTimeZone("US/Pacific"))
    +
    +    // boundary cases
    +    checkHiveHashForTimestampType("0001-01-01 00:00:00", 1645926784)
    +    checkHiveHashForTimestampType("9999-01-01 00:00:00", -1081818240)
    +
    +    // epoch
    +    checkHiveHashForTimestampType("1970-01-01 00:00:00", 0)
    +
    +    // before epoch
    +    checkHiveHashForTimestampType("1800-01-01 03:12:45", -267420885)
    +
    +    // Invalid input: bad timestamp string. Hive returns 0 for such cases
    +    intercept[NoSuchElementException](checkHiveHashForTimestampType("0-0-0 0:0:0", 0))
    +    intercept[NoSuchElementException](checkHiveHashForTimestampType("-99-99-99 99:99:45", 0))
    +    intercept[NoSuchElementException](checkHiveHashForTimestampType("555555-55555-5555", 0))
    +
    +    // Invalid input: Empty string. Hive returns 0 for this case
    +    intercept[NoSuchElementException](checkHiveHashForTimestampType("", 0))
    +
    +    // Invalid input: February 30th in a leap year. Hive supports this but Spark doesn't
    +    intercept[NoSuchElementException](checkHiveHashForTimestampType("2016-02-30 00:00:00", 0))
    +
    +    // Invalid input: Hive accepts up to 9 decimal places of precision but Spark uses up to 6
    +    intercept[TestFailedException](checkHiveHashForTimestampType("2017-02-24 10:56:29.11111111", 0))
    +  }
    +
    +  test("hive-hash for CalendarInterval type") {
    +    def checkHiveHashForTimestampType(interval: String, expected: Long): Unit = {
    +      checkHiveHash(CalendarInterval.fromString(interval), CalendarIntervalType, expected)
    +    }
    +
    +    checkHiveHashForTimestampType("interval 1 day", 3220073)
    +    checkHiveHashForTimestampType("interval 6 day 15 hour", 21202073)
    +    checkHiveHashForTimestampType("interval -23 day 56 hour -1111113 minute 9898989 second",
    +      -2128468593)
    --- End diff --
    
    Could you add more test cases?
    
    ```
        checkHiveHashForTimestampType("interval 0 day 0 hour 0 minute 0 second", 23273)
        checkHiveHashForTimestampType("interval 0 day 0 hour", 23273)
        checkHiveHashForTimestampType("interval -1 day", 3220036)
    ```




[GitHub] spark issue #17062: [SPARK-17495] [SQL] Support date, timestamp and interval...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17062
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #17062: [SPARK-17495] [SQL] Support date, timestamp and interval...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17062
  
    **[Test build #73459 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73459/testReport)** for PR 17062 at commit [`332475c`](https://github.com/apache/spark/commit/332475c1641f61080aa41dda9f1ceec237351d75).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #17062: [SPARK-17495] [SQL] Support date, timestamp and i...

Posted by tejasapatil <gi...@git.apache.org>.
Github user tejasapatil commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17062#discussion_r103357588
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/HashExpressionsSuite.scala ---
    @@ -169,6 +171,96 @@ class HashExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
         // scalastyle:on nonascii
       }
     
    +  test("hive-hash for date type") {
    +    def checkHiveHashForDateType(dateString: String, expected: Long): Unit = {
    +      checkHiveHash(
    +        DateTimeUtils.stringToDate(UTF8String.fromString(dateString)).get,
    +        DateType,
    +        expected)
    +    }
    +
    +    // basic case
    +    checkHiveHashForDateType("2017-01-01", 17167)
    +
    +    // boundary cases
    +    checkHiveHashForDateType("0000-01-01", -719530)
    +    checkHiveHashForDateType("9999-12-31", 2932896)
    +
    +    // epoch
    +    checkHiveHashForDateType("1970-01-01", 0)
    +
    +    // before epoch
    +    checkHiveHashForDateType("1800-01-01", -62091)
    +
    +    // Invalid input: bad date string. Hive returns 0 for such cases
    +    intercept[NoSuchElementException](checkHiveHashForDateType("0-0-0", 0))
    +    intercept[NoSuchElementException](checkHiveHashForDateType("-1212-01-01", 0))
    +    intercept[NoSuchElementException](checkHiveHashForDateType("2016-99-99", 0))
    +
    +    // Invalid input: Empty string. Hive returns 0 for this case
    +    intercept[NoSuchElementException](checkHiveHashForDateType("", 0))
    +
    +    // Invalid input: February 30th in a leap year. Hive supports this but Spark doesn't
    +    intercept[NoSuchElementException](checkHiveHashForDateType("2016-02-30", 16861))
    +  }
    +
    +  test("hive-hash for timestamp type") {
    +    def checkHiveHashForTimestampType(
    +        timestamp: String,
    +        expected: Long,
    +        timeZone: TimeZone = TimeZone.getTimeZone("UTC")): Unit = {
    +      checkHiveHash(
    +        DateTimeUtils.stringToTimestamp(UTF8String.fromString(timestamp), timeZone).get,
    +        TimestampType,
    +        expected)
    +    }
    +
    +    // basic case
    +    checkHiveHashForTimestampType("2017-02-24 10:56:29", 1445725271)
    +
    +    // with higher precision
    +    checkHiveHashForTimestampType("2017-02-24 10:56:29.111111", 1353936655)
    +
    +    // with different timezone
    +    checkHiveHashForTimestampType("2017-02-24 10:56:29", 1445732471,
    +      TimeZone.getTimeZone("US/Pacific"))
    +
    +    // boundary cases
    +    checkHiveHashForTimestampType("0001-01-01 00:00:00", 1645926784)
    +    checkHiveHashForTimestampType("9999-01-01 00:00:00", -1081818240)
    +
    +    // epoch
    +    checkHiveHashForTimestampType("1970-01-01 00:00:00", 0)
    +
    +    // before epoch
    +    checkHiveHashForTimestampType("1800-01-01 03:12:45", -267420885)
    +
    +    // Invalid input: bad timestamp string. Hive returns 0 for such cases
    +    intercept[NoSuchElementException](checkHiveHashForTimestampType("0-0-0 0:0:0", 0))
    +    intercept[NoSuchElementException](checkHiveHashForTimestampType("-99-99-99 99:99:45", 0))
    +    intercept[NoSuchElementException](checkHiveHashForTimestampType("555555-55555-5555", 0))
    +
    +    // Invalid input: Empty string. Hive returns 0 for this case
    +    intercept[NoSuchElementException](checkHiveHashForTimestampType("", 0))
    +
    +    // Invalid input: February 30th in a leap year. Hive supports this but Spark doesn't
    +    intercept[NoSuchElementException](checkHiveHashForTimestampType("2016-02-30 00:00:00", 0))
    +
    +    // Invalid input: Hive accepts up to 9 decimal places of precision but Spark uses up to 6
    +    intercept[TestFailedException](checkHiveHashForTimestampType("2017-02-24 10:56:29.11111111", 0))
    +  }
    +
    +  test("hive-hash for CalendarInterval type") {
    +    def checkHiveHashForTimestampType(interval: String, expected: Long): Unit = {
    +      checkHiveHash(CalendarInterval.fromString(interval), CalendarIntervalType, expected)
    +    }
    +
    +    checkHiveHashForTimestampType("interval 1 day", 3220073)
    +    checkHiveHashForTimestampType("interval 6 day 15 hour", 21202073)
    +    checkHiveHashForTimestampType("interval -23 day 56 hour -1111113 minute 9898989 second",
    --- End diff --
    
     SELECT HASH ( INTERVAL '-23' DAY + INTERVAL '56' HOUR + INTERVAL '-1111113' MINUTE + INTERVAL '9898989' SECOND );




[GitHub] spark issue #17062: [SPARK-17495] [SQL] Support date, timestamp and interval...

Posted by tejasapatil <gi...@git.apache.org>.
Github user tejasapatil commented on the issue:

    https://github.com/apache/spark/pull/17062
  
    Jenkins retest this please




[GitHub] spark issue #17062: [SPARK-17495] [SQL] Support date, timestamp and interval...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17062
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #17062: [SPARK-17495] [SQL] Support date, timestamp and interval...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/17062
  
    @tejasapatil Thanks for your work! Could you add a comment in the `hash` function? The caller of `hash` needs to check the validity of input values. 
    
    LGTM pending test. 
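    For illustration, a sketch of the kind of doc comment being requested, assuming the `hash(value, dataType, seed)` signature of Spark's interpreted hash functions (the exact wording that went in may differ):
    
    ```
    /**
     * Computes the hash of the given `value` of type `dataType` using `seed`.
     *
     * Note: this method does NOT validate its input. The caller must ensure that
     * the value is valid for `dataType`; unlike Hive, which hashes invalid values
     * as if they were NULL (hash 0), invalid inputs fail earlier in Spark.
     */
    def hash(value: Any, dataType: DataType, seed: Long): Long = ???
    ```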
    





[GitHub] spark pull request #17062: [SPARK-17495] [SQL] Support date, timestamp and i...

Posted by tejasapatil <gi...@git.apache.org>.
Github user tejasapatil commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17062#discussion_r103300013
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/HashExpressionsSuite.scala ---
    @@ -169,6 +171,96 @@ class HashExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
         // scalastyle:on nonascii
       }
     
    +  test("hive-hash for date type") {
    +    def checkHiveHashForDateType(dateString: String, expected: Long): Unit = {
    +      checkHiveHash(
    +        DateTimeUtils.stringToDate(UTF8String.fromString(dateString)).get,
    +        DateType,
    +        expected)
    +    }
    +
    +    // basic case
    +    checkHiveHashForDateType("2017-01-01", 17167)
    +
    +    // boundary cases
    +    checkHiveHashForDateType("0000-01-01", -719530)
    +    checkHiveHashForDateType("9999-12-31", 2932896)
    +
    +    // epoch
    +    checkHiveHashForDateType("1970-01-01", 0)
    +
    +    // before epoch
    +    checkHiveHashForDateType("1800-01-01", -62091)
    +
    +    // Invalid input: bad date string. Hive returns 0 for such cases
    --- End diff --
    
    Spark does not allow creating a `Date` that does not fit its spec and throws an exception. Hive does not fail in such cases but falls back to `null` and returns `0` as the hash value.
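    For illustration, a minimal sketch of where the `NoSuchElementException` in these tests comes from (the invalid date string is taken from the test code):
    
    ```
    import org.apache.spark.sql.catalyst.util.DateTimeUtils
    import org.apache.spark.unsafe.types.UTF8String
    
    // stringToDate returns None for an invalid date string, so calling .get
    // throws NoSuchElementException before any hashing happens.
    val parsed = DateTimeUtils.stringToDate(UTF8String.fromString("2016-99-99"))
    assert(parsed.isEmpty)
    parsed.get // throws NoSuchElementException
    ```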




[GitHub] spark issue #17062: [SPARK-17495] [SQL] Support date, timestamp and interval...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17062
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74414/
    Test PASSed.




[GitHub] spark issue #17062: [SPARK-17495] [SQL] Support date, timestamp and interval...

Posted by tejasapatil <gi...@git.apache.org>.
Github user tejasapatil commented on the issue:

    https://github.com/apache/spark/pull/17062
  
    @gatorsmile : Thanks for the review :) Added method doc for `hash()` with the comment as suggested.




[GitHub] spark pull request #17062: [SPARK-17495] [SQL] Support date, timestamp and i...

Posted by tejasapatil <gi...@git.apache.org>.
Github user tejasapatil commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17062#discussion_r105261953
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/HashExpressionsSuite.scala ---
    @@ -169,6 +171,96 @@ class HashExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
         // scalastyle:on nonascii
       }
     
    +  test("hive-hash for date type") {
    +    def checkHiveHashForDateType(dateString: String, expected: Long): Unit = {
    +      checkHiveHash(
    +        DateTimeUtils.stringToDate(UTF8String.fromString(dateString)).get,
    +        DateType,
    +        expected)
    +    }
    +
    +    // basic case
    +    checkHiveHashForDateType("2017-01-01", 17167)
    +
    +    // boundary cases
    +    checkHiveHashForDateType("0000-01-01", -719530)
    +    checkHiveHashForDateType("9999-12-31", 2932896)
    +
    +    // epoch
    +    checkHiveHashForDateType("1970-01-01", 0)
    +
    +    // before epoch
    +    checkHiveHashForDateType("1800-01-01", -62091)
    +
    +    // Invalid input: bad date string. Hive returns 0 for such cases
    +    intercept[NoSuchElementException](checkHiveHashForDateType("0-0-0", 0))
    +    intercept[NoSuchElementException](checkHiveHashForDateType("-1212-01-01", 0))
    +    intercept[NoSuchElementException](checkHiveHashForDateType("2016-99-99", 0))
    +
    +    // Invalid input: Empty string. Hive returns 0 for this case
    +    intercept[NoSuchElementException](checkHiveHashForDateType("", 0))
    +
    +    // Invalid input: February 30th in a leap year. Hive supports this but Spark doesn't
    +    intercept[NoSuchElementException](checkHiveHashForDateType("2016-02-30", 16861))
    +  }
    +
    +  test("hive-hash for timestamp type") {
    +    def checkHiveHashForTimestampType(
    +        timestamp: String,
    +        expected: Long,
    +        timeZone: TimeZone = TimeZone.getTimeZone("UTC")): Unit = {
    +      checkHiveHash(
    +        DateTimeUtils.stringToTimestamp(UTF8String.fromString(timestamp), timeZone).get,
    +        TimestampType,
    +        expected)
    +    }
    +
    +    // basic case
    +    checkHiveHashForTimestampType("2017-02-24 10:56:29", 1445725271)
    +
    +    // with higher precision
    +    checkHiveHashForTimestampType("2017-02-24 10:56:29.111111", 1353936655)
    +
    +    // with different timezone
    +    checkHiveHashForTimestampType("2017-02-24 10:56:29", 1445732471,
    +      TimeZone.getTimeZone("US/Pacific"))
    +
    +    // boundary cases
    +    checkHiveHashForTimestampType("0001-01-01 00:00:00", 1645926784)
    +    checkHiveHashForTimestampType("9999-01-01 00:00:00", -1081818240)
    +
    +    // epoch
    +    checkHiveHashForTimestampType("1970-01-01 00:00:00", 0)
    +
    +    // before epoch
    +    checkHiveHashForTimestampType("1800-01-01 03:12:45", -267420885)
    +
    +    // Invalid input: bad timestamp string. Hive returns 0 for such cases
    +    intercept[NoSuchElementException](checkHiveHashForTimestampType("0-0-0 0:0:0", 0))
    +    intercept[NoSuchElementException](checkHiveHashForTimestampType("-99-99-99 99:99:45", 0))
    +    intercept[NoSuchElementException](checkHiveHashForTimestampType("555555-55555-5555", 0))
    +
    +    // Invalid input: Empty string. Hive returns 0 for this case
    +    intercept[NoSuchElementException](checkHiveHashForTimestampType("", 0))
    +
    +    // Invalid input: February 30th in a leap year. Hive supports this but Spark doesn't
    +    intercept[NoSuchElementException](checkHiveHashForTimestampType("2016-02-30 00:00:00", 0))
    +
    +    // Invalid input: Hive accepts up to 9 decimal places of precision but Spark uses up to 6
    +    intercept[TestFailedException](checkHiveHashForTimestampType("2017-02-24 10:56:29.11111111", 0))
    +  }
    +
    +  test("hive-hash for CalendarInterval type") {
    +    def checkHiveHashForTimestampType(interval: String, expected: Long): Unit = {
    +      checkHiveHash(CalendarInterval.fromString(interval), CalendarIntervalType, expected)
    +    }
    +
    +    checkHiveHashForTimestampType("interval 1 day", 3220073)
    +    checkHiveHashForTimestampType("interval 6 day 15 hour", 21202073)
    +    checkHiveHashForTimestampType("interval -23 day 56 hour -1111113 minute 9898989 second",
    +      -2128468593)
    --- End diff --
    
    added




[GitHub] spark pull request #17062: [SPARK-17495] [SQL] Support date, timestamp and i...

Posted by tejasapatil <gi...@git.apache.org>.
Github user tejasapatil commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17062#discussion_r103300592
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/HashExpressionsSuite.scala ---
    @@ -169,6 +171,96 @@ class HashExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
         // scalastyle:on nonascii
       }
     
    +  test("hive-hash for date type") {
    +    def checkHiveHashForDateType(dateString: String, expected: Long): Unit = {
    +      checkHiveHash(
    +        DateTimeUtils.stringToDate(UTF8String.fromString(dateString)).get,
    +        DateType,
    +        expected)
    +    }
    +
    +    // basic case
    +    checkHiveHashForDateType("2017-01-01", 17167)
    +
    +    // boundary cases
    +    checkHiveHashForDateType("0000-01-01", -719530)
    +    checkHiveHashForDateType("9999-12-31", 2932896)
    +
    +    // epoch
    +    checkHiveHashForDateType("1970-01-01", 0)
    +
    +    // before epoch
    +    checkHiveHashForDateType("1800-01-01", -62091)
    +
    +    // Invalid input: bad date string. Hive returns 0 for such cases
    +    intercept[NoSuchElementException](checkHiveHashForDateType("0-0-0", 0))
    +    intercept[NoSuchElementException](checkHiveHashForDateType("-1212-01-01", 0))
    +    intercept[NoSuchElementException](checkHiveHashForDateType("2016-99-99", 0))
    +
    +    // Invalid input: Empty string. Hive returns 0 for this case
    +    intercept[NoSuchElementException](checkHiveHashForDateType("", 0))
    +
    +    // Invalid input: February 30th in a leap year. Hive supports this but Spark doesn't
    +    intercept[NoSuchElementException](checkHiveHashForDateType("2016-02-30", 16861))
    +  }
    +
    +  test("hive-hash for timestamp type") {
    +    def checkHiveHashForTimestampType(
    +        timestamp: String,
    +        expected: Long,
    +        timeZone: TimeZone = TimeZone.getTimeZone("UTC")): Unit = {
    +      checkHiveHash(
    +        DateTimeUtils.stringToTimestamp(UTF8String.fromString(timestamp), timeZone).get,
    +        TimestampType,
    +        expected)
    +    }
    +
    +    // basic case
    +    checkHiveHashForTimestampType("2017-02-24 10:56:29", 1445725271)
    +
    +    // with higher precision
    +    checkHiveHashForTimestampType("2017-02-24 10:56:29.111111", 1353936655)
    +
    +    // with different timezone
    +    checkHiveHashForTimestampType("2017-02-24 10:56:29", 1445732471,
    +      TimeZone.getTimeZone("US/Pacific"))
    +
    +    // boundary cases
    +    checkHiveHashForTimestampType("0001-01-01 00:00:00", 1645926784)
    +    checkHiveHashForTimestampType("9999-01-01 00:00:00", -1081818240)
    +
    +    // epoch
    +    checkHiveHashForTimestampType("1970-01-01 00:00:00", 0)
    +
    +    // before epoch
    +    checkHiveHashForTimestampType("1800-01-01 03:12:45", -267420885)
    +
    +    // Invalid input: bad timestamp string. Hive returns 0 for such cases
    --- End diff --
    
    Same as with `Date`, invalid timestamp values are not allowed in Spark and parsing them will fail. Hive does not fail but falls back to `null` and returns `0` as the hash value.




[GitHub] spark pull request #17062: [SPARK-17495] [SQL] Support date, timestamp and i...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/17062




[GitHub] spark issue #17062: [SPARK-17495] [SQL] Support date, timestamp and interval...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17062
  
    **[Test build #74278 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74278/testReport)** for PR 17062 at commit [`fd0330d`](https://github.com/apache/spark/commit/fd0330d09551b69770a9fc586145b5547d3d44dc).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #17062: [SPARK-17495] [SQL] Support date, timestamp and interval...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17062
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #17062: [SPARK-17495] [SQL] Support date, timestamp and interval...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17062
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73459/
    Test FAILed.




[GitHub] spark pull request #17062: [SPARK-17495] [SQL] Support date, timestamp and i...

Posted by tejasapatil <gi...@git.apache.org>.
Github user tejasapatil commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17062#discussion_r105265211
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/HashExpressionsSuite.scala ---
    @@ -168,6 +170,208 @@ class HashExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
         // scalastyle:on nonascii
       }
     
    +  test("hive-hash for date type") {
    +    def checkHiveHashForDateType(dateString: String, expected: Long): Unit = {
    +      checkHiveHash(
    +        DateTimeUtils.stringToDate(UTF8String.fromString(dateString)).get,
    +        DateType,
    +        expected)
    +    }
    +
    +    // basic case
    +    checkHiveHashForDateType("2017-01-01", 17167)
    +
    +    // boundary cases
    +    checkHiveHashForDateType("0000-01-01", -719530)
    +    checkHiveHashForDateType("9999-12-31", 2932896)
    +
    +    // epoch
    +    checkHiveHashForDateType("1970-01-01", 0)
    +
    +    // before epoch
    +    checkHiveHashForDateType("1800-01-01", -62091)
    +
    +    // Invalid input: bad date string. Hive returns 0 for such cases
    +    intercept[NoSuchElementException](checkHiveHashForDateType("0-0-0", 0))
    +    intercept[NoSuchElementException](checkHiveHashForDateType("-1212-01-01", 0))
    +    intercept[NoSuchElementException](checkHiveHashForDateType("2016-99-99", 0))
    +
    +    // Invalid input: Empty string. Hive returns 0 for this case
    +    intercept[NoSuchElementException](checkHiveHashForDateType("", 0))
    +
    +    // Invalid input: February 30th in a leap year. Hive supports this but Spark doesn't
    +    intercept[NoSuchElementException](checkHiveHashForDateType("2016-02-30", 16861))
    +  }
    +
    +  test("hive-hash for timestamp type") {
    +    def checkHiveHashForTimestampType(
    +        timestamp: String,
    +        expected: Long,
    +        timeZone: TimeZone = TimeZone.getTimeZone("UTC")): Unit = {
    +      checkHiveHash(
    +        DateTimeUtils.stringToTimestamp(UTF8String.fromString(timestamp), timeZone).get,
    +        TimestampType,
    +        expected)
    +    }
    +
    +    // basic case
    +    checkHiveHashForTimestampType("2017-02-24 10:56:29", 1445725271)
    +
    +    // with higher precision
    +    checkHiveHashForTimestampType("2017-02-24 10:56:29.111111", 1353936655)
    +
    +    // with different timezone
    +    checkHiveHashForTimestampType("2017-02-24 10:56:29", 1445732471,
    +      TimeZone.getTimeZone("US/Pacific"))
    +
    +    // boundary cases
    +    checkHiveHashForTimestampType("0001-01-01 00:00:00", 1645926784)
    +    checkHiveHashForTimestampType("9999-01-01 00:00:00", -1081818240)
    +
    +    // epoch
    +    checkHiveHashForTimestampType("1970-01-01 00:00:00", 0)
    +
    +    // before epoch
    +    checkHiveHashForTimestampType("1800-01-01 03:12:45", -267420885)
    +
    +    // Invalid input: bad timestamp string. Hive returns 0 for such cases
    +    intercept[NoSuchElementException](checkHiveHashForTimestampType("0-0-0 0:0:0", 0))
    +    intercept[NoSuchElementException](checkHiveHashForTimestampType("-99-99-99 99:99:45", 0))
    +    intercept[NoSuchElementException](checkHiveHashForTimestampType("555555-55555-5555", 0))
    +
    +    // Invalid input: Empty string. Hive returns 0 for this case
    +    intercept[NoSuchElementException](checkHiveHashForTimestampType("", 0))
    +
    +    // Invalid input: February 30th in a leap year. Hive supports this but Spark doesn't
    +    intercept[NoSuchElementException](checkHiveHashForTimestampType("2016-02-30 00:00:00", 0))
    +
    +    // Invalid input: Hive accepts up to 9 decimal places of precision but Spark uses up to 6
    +    intercept[TestFailedException](checkHiveHashForTimestampType("2017-02-24 10:56:29.11111111", 0))
    +  }
    +
    +  test("hive-hash for CalendarInterval type") {
    --- End diff --
    
    Hive queries for all the tests below. Expected outputs were generated by running them against Hive 1.2.1.
    
    ```
    -- ----- MICROSEC -----
    SELECT HASH(interval_day_time("0 0:0:0.000001") );
    SELECT HASH(interval_day_time("-0 0:0:0.000001") );
    SELECT HASH(interval_day_time("0 0:0:0.000000") );
    SELECT HASH(interval_day_time("0 0:0:0.000999") );
    SELECT HASH(interval_day_time("-0 0:0:0.000999") );
    
    -- ----- MILLISEC -----
    SELECT HASH(interval_day_time("0 0:0:0.001") );
    SELECT HASH(interval_day_time("-0 0:0:0.001") );
    SELECT HASH(interval_day_time("0 0:0:0.000") );
    SELECT HASH(interval_day_time("0 0:0:0.999") );
    SELECT HASH(interval_day_time("-0 0:0:0.999") );
    
    -- ----- SECOND -----
    SELECT HASH( INTERVAL '1' SECOND);
    SELECT HASH( INTERVAL '-1' SECOND);
    SELECT HASH( INTERVAL '0' SECOND);
    SELECT HASH( INTERVAL '2147483647' SECOND);
    SELECT HASH( INTERVAL '-2147483648' SECOND);
    
    -- ----- MINUTE -----
    SELECT HASH( INTERVAL '1' MINUTE);
    SELECT HASH( INTERVAL '-1' MINUTE);
    SELECT HASH( INTERVAL '0' MINUTE);
    SELECT HASH( INTERVAL '2147483647' MINUTE);
    SELECT HASH( INTERVAL '-2147483648' MINUTE);
    
    -- ----- HOUR -----
    SELECT HASH( INTERVAL '1' HOUR);
    SELECT HASH( INTERVAL '-1' HOUR);
    SELECT HASH( INTERVAL '0' HOUR);
    SELECT HASH( INTERVAL '2147483647' HOUR);
    SELECT HASH( INTERVAL '-2147483648' HOUR);
    
    -- ----- DAY -----
    SELECT HASH( INTERVAL '1' DAY);
    SELECT HASH( INTERVAL '-1' DAY);
    SELECT HASH( INTERVAL '0' DAY);
    SELECT HASH( INTERVAL '106751991' DAY);
    SELECT HASH( INTERVAL '-106751991' DAY);
    
    -- ----- MIX -----
    SELECT HASH( INTERVAL '0' DAY );
    SELECT HASH( INTERVAL '0' DAY + INTERVAL '0' HOUR );
    SELECT HASH( INTERVAL '0' DAY + INTERVAL '0' HOUR + INTERVAL '0' MINUTE);
    SELECT HASH( INTERVAL '0' DAY + INTERVAL '0' HOUR + INTERVAL '0' MINUTE + INTERVAL '0' SECOND);
    SELECT HASH(interval_day_time("0 0:0:0.000") );
    SELECT HASH(interval_day_time("0 0:0:0.000000") );
    
    SELECT HASH( INTERVAL '6' DAY + INTERVAL '15' HOUR );
    SELECT HASH( INTERVAL '5' DAY + INTERVAL '4' HOUR + INTERVAL '8' MINUTE);
    SELECT HASH ( INTERVAL '-23' DAY + INTERVAL '56' HOUR + INTERVAL '-1111113' MINUTE + INTERVAL '9898989' SECOND );
    SELECT HASH(interval_day_time("66 12:39:23.987") );
    SELECT HASH(interval_day_time("66 12:39:23.987123") );
    ```
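    
    For context, a Scala sketch of the interval hashing these queries exercise, mirroring `HiveIntervalDayTime.hashCode()` (Hive uses commons-lang's `HashCodeBuilder`, i.e. seed 17 and multiplier 37 over the total-seconds and nanos fields). Names here are illustrative and the input is Spark's microsecond representation:
    
    ```
    // Sketch only: ((17 * 37 + fold(totalSeconds)) * 37) + nanos, where fold
    // XORs the two 32-bit halves of the long, as HashCodeBuilder.append(long) does.
    def hiveHashInterval(micros: Long): Int = {
      val totalSeconds = micros / 1000000
      val nanos = ((micros % 1000000) * 1000).toInt
      val folded = (totalSeconds ^ (totalSeconds >>> 32)).toInt
      (17 * 37 + folded) * 37 + nanos
    }
    // Sanity check against the tests: 0 micros gives 23273 and one day
    // (86400 * 1000000 micros) gives 3220073, matching "interval 1 day".
    ```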




[GitHub] spark issue #17062: [SPARK-17495] [SQL] Support date, timestamp and interval...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17062
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73472/
    Test PASSed.




[GitHub] spark issue #17062: [SPARK-17495] [SQL] Support date, timestamp and interval...

Posted by tejasapatil <gi...@git.apache.org>.
Github user tejasapatil commented on the issue:

    https://github.com/apache/spark/pull/17062
  
    cc @gatorsmile 




[GitHub] spark issue #17062: [SPARK-17495] [SQL] Support date, timestamp and interval...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17062
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74416/
    Test PASSed.




[GitHub] spark issue #17062: [SPARK-17495] [SQL] Support date, timestamp and interval...

Posted by tejasapatil <gi...@git.apache.org>.
Github user tejasapatil commented on the issue:

    https://github.com/apache/spark/pull/17062
  
    Updated the comments with the corresponding Hive queries used to generate the expected outputs.




[GitHub] spark issue #17062: [SPARK-17495] [SQL] Support date, timestamp and interval...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17062
  
    **[Test build #73459 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73459/testReport)** for PR 17062 at commit [`332475c`](https://github.com/apache/spark/commit/332475c1641f61080aa41dda9f1ceec237351d75).




[GitHub] spark issue #17062: [SPARK-17495] [SQL] Support date, timestamp and interval...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17062
  
    **[Test build #74414 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74414/testReport)** for PR 17062 at commit [`332686a`](https://github.com/apache/spark/commit/332686acd902c2b05bf48b848ece0860de172355).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #17062: [SPARK-17495] [SQL] Support date, timestamp and interval...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17062
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request #17062: [SPARK-17495] [SQL] Support date, timestamp and i...

Posted by tejasapatil <gi...@git.apache.org>.
Github user tejasapatil commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17062#discussion_r103300293
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/HashExpressionsSuite.scala ---
    @@ -169,6 +171,96 @@ class HashExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
         // scalastyle:on nonascii
       }
     
    +  test("hive-hash for date type") {
    +    def checkHiveHashForDateType(dateString: String, expected: Long): Unit = {
    +      checkHiveHash(
    +        DateTimeUtils.stringToDate(UTF8String.fromString(dateString)).get,
    +        DateType,
    +        expected)
    +    }
    +
    +    // basic case
    +    checkHiveHashForDateType("2017-01-01", 17167)
    +
    +    // boundary cases
    +    checkHiveHashForDateType("0000-01-01", -719530)
    +    checkHiveHashForDateType("9999-12-31", 2932896)
    +
    +    // epoch
    +    checkHiveHashForDateType("1970-01-01", 0)
    +
    +    // before epoch
    +    checkHiveHashForDateType("1800-01-01", -62091)
    +
    +    // Invalid input: bad date string. Hive returns 0 for such cases
    +    intercept[NoSuchElementException](checkHiveHashForDateType("0-0-0", 0))
    +    intercept[NoSuchElementException](checkHiveHashForDateType("-1212-01-01", 0))
    +    intercept[NoSuchElementException](checkHiveHashForDateType("2016-99-99", 0))
    +
    +    // Invalid input: Empty string. Hive returns 0 for this case
    +    intercept[NoSuchElementException](checkHiveHashForDateType("", 0))
    +
    +    // Invalid input: February 30th in a leap year. Hive supports this but Spark doesn't
    +    intercept[NoSuchElementException](checkHiveHashForDateType("2016-02-30", 16861))
    +  }
    +
    +  test("hive-hash for timestamp type") {
    +    def checkHiveHashForTimestampType(
    +        timestamp: String,
    +        expected: Long,
    +        timeZone: TimeZone = TimeZone.getTimeZone("UTC")): Unit = {
    +      checkHiveHash(
    +        DateTimeUtils.stringToTimestamp(UTF8String.fromString(timestamp), timeZone).get,
    +        TimestampType,
    +        expected)
    +    }
    +
    +    // basic case
    +    checkHiveHashForTimestampType("2017-02-24 10:56:29", 1445725271)
    --- End diff --
    
    Corresponding Hive query.
    ```
    select HASH(CAST("2017-02-24 10:56:29" AS TIMESTAMP));
    ```
    
    Note that this is with the system's timezone set to UTC (`export TZ=/usr/share/zoneinfo/UTC`).
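    
    For example, a sketch of reproducing such a value from a shell (assuming a local Hive 1.2 CLI; the zoneinfo path may differ per system):
    
    ```
    export TZ=UTC   # or TZ=/usr/share/zoneinfo/UTC as above
    hive -e 'select HASH(CAST("2017-02-24 10:56:29" AS TIMESTAMP));'
    ```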




[GitHub] spark issue #17062: [SPARK-17495] [SQL] Support date, timestamp and interval...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17062
  
    **[Test build #73472 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73472/testReport)** for PR 17062 at commit [`e050a50`](https://github.com/apache/spark/commit/e050a5000d0bb4620d9eb7890da5206e4e36ce09).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #17062: [SPARK-17495] [SQL] Support date, timestamp and interval...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/17062
  
    Will review it tonight. Thanks!




[GitHub] spark pull request #17062: [SPARK-17495] [SQL] Support date, timestamp and i...

Posted by tejasapatil <gi...@git.apache.org>.
Github user tejasapatil commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17062#discussion_r103281696
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/HashExpressionsSuite.scala ---
    @@ -169,6 +171,96 @@ class HashExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
         // scalastyle:on nonascii
       }
     
    +  test("hive-hash for date type") {
    +    def checkHiveHashForDateType(dateString: String, expected: Long): Unit = {
    +      checkHiveHash(
    +        DateTimeUtils.stringToDate(UTF8String.fromString(dateString)).get,
    +        DateType,
    +        expected)
    +    }
    +
    +    // basic case
    +    checkHiveHashForDateType("2017-01-01", 17167)
    --- End diff --
    
    Expected values were computed with Hive 1.2 using:
    
    ```
    SELECT HASH( CAST( "2017-01-01" AS DATE) )
    ```




[GitHub] spark pull request #17062: [SPARK-17495] [SQL] Support date, timestamp and i...

Posted by tejasapatil <gi...@git.apache.org>.
Github user tejasapatil commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17062#discussion_r103357472
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/HashExpressionsSuite.scala ---
    @@ -169,6 +171,96 @@ class HashExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
         // scalastyle:on nonascii
       }
     
    +  test("hive-hash for date type") {
    +    def checkHiveHashForDateType(dateString: String, expected: Long): Unit = {
    +      checkHiveHash(
    +        DateTimeUtils.stringToDate(UTF8String.fromString(dateString)).get,
    +        DateType,
    +        expected)
    +    }
    +
    +    // basic case
    +    checkHiveHashForDateType("2017-01-01", 17167)
    +
    +    // boundary cases
    +    checkHiveHashForDateType("0000-01-01", -719530)
    +    checkHiveHashForDateType("9999-12-31", 2932896)
    +
    +    // epoch
    +    checkHiveHashForDateType("1970-01-01", 0)
    +
    +    // before epoch
    +    checkHiveHashForDateType("1800-01-01", -62091)
    +
    +    // Invalid input: bad date string. Hive returns 0 for such cases
    +    intercept[NoSuchElementException](checkHiveHashForDateType("0-0-0", 0))
    +    intercept[NoSuchElementException](checkHiveHashForDateType("-1212-01-01", 0))
    +    intercept[NoSuchElementException](checkHiveHashForDateType("2016-99-99", 0))
    +
    +    // Invalid input: Empty string. Hive returns 0 for this case
    +    intercept[NoSuchElementException](checkHiveHashForDateType("", 0))
    +
    +    // Invalid input: February 30th in a leap year. Hive supports this but Spark doesn't
    +    intercept[NoSuchElementException](checkHiveHashForDateType("2016-02-30", 16861))
    +  }
    +
    +  test("hive-hash for timestamp type") {
    +    def checkHiveHashForTimestampType(
    +        timestamp: String,
    +        expected: Long,
    +        timeZone: TimeZone = TimeZone.getTimeZone("UTC")): Unit = {
    +      checkHiveHash(
    +        DateTimeUtils.stringToTimestamp(UTF8String.fromString(timestamp), timeZone).get,
    +        TimestampType,
    +        expected)
    +    }
    +
    +    // basic case
    +    checkHiveHashForTimestampType("2017-02-24 10:56:29", 1445725271)
    +
    +    // with higher precision
    +    checkHiveHashForTimestampType("2017-02-24 10:56:29.111111", 1353936655)
    +
    +    // with different timezone
    +    checkHiveHashForTimestampType("2017-02-24 10:56:29", 1445732471,
    +      TimeZone.getTimeZone("US/Pacific"))
    +
    +    // boundary cases
    +    checkHiveHashForTimestampType("0001-01-01 00:00:00", 1645926784)
    +    checkHiveHashForTimestampType("9999-01-01 00:00:00", -1081818240)
    +
    +    // epoch
    +    checkHiveHashForTimestampType("1970-01-01 00:00:00", 0)
    +
    +    // before epoch
    +    checkHiveHashForTimestampType("1800-01-01 03:12:45", -267420885)
    +
    +    // Invalid input: bad timestamp string. Hive returns 0 for such cases
    +    intercept[NoSuchElementException](checkHiveHashForTimestampType("0-0-0 0:0:0", 0))
    +    intercept[NoSuchElementException](checkHiveHashForTimestampType("-99-99-99 99:99:45", 0))
    +    intercept[NoSuchElementException](checkHiveHashForTimestampType("555555-55555-5555", 0))
    +
    +    // Invalid input: Empty string. Hive returns 0 for this case
    +    intercept[NoSuchElementException](checkHiveHashForTimestampType("", 0))
    +
    +    // Invalid input: February 30th for a leap year. Hive supports this but Spark doesn't
    +    intercept[NoSuchElementException](checkHiveHashForTimestampType("2016-02-30 00:00:00", 0))
    +
    +    // Invalid input: Hive accepts up to 9 decimal places of precision but Spark uses up to 6
    +    intercept[TestFailedException](checkHiveHashForTimestampType("2017-02-24 10:56:29.11111111", 0))
    +  }
    +
    +  test("hive-hash for CalendarInterval type") {
    +    def checkHiveHashForTimestampType(interval: String, expected: Long): Unit = {
    +      checkHiveHash(CalendarInterval.fromString(interval), CalendarIntervalType, expected)
    +    }
    +
    +    checkHiveHashForTimestampType("interval 1 day", 3220073)
    +    checkHiveHashForTimestampType("interval 6 day 15 hour", 21202073)
    --- End diff --
    
    SELECT HASH ( INTERVAL '6' DAY + INTERVAL '15' HOUR );




[GitHub] spark pull request #17062: [SPARK-17495] [SQL] Support date, timestamp and i...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17062#discussion_r105569790
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/HashExpressionsSuite.scala ---
    @@ -168,6 +170,208 @@ class HashExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
         // scalastyle:on nonascii
       }
     
    +  test("hive-hash for date type") {
    +    def checkHiveHashForDateType(dateString: String, expected: Long): Unit = {
    +      checkHiveHash(
    +        DateTimeUtils.stringToDate(UTF8String.fromString(dateString)).get,
    +        DateType,
    +        expected)
    +    }
    +
    +    // basic case
    +    checkHiveHashForDateType("2017-01-01", 17167)
    +
    +    // boundary cases
    +    checkHiveHashForDateType("0000-01-01", -719530)
    +    checkHiveHashForDateType("9999-12-31", 2932896)
    +
    +    // epoch
    +    checkHiveHashForDateType("1970-01-01", 0)
    +
    +    // before epoch
    +    checkHiveHashForDateType("1800-01-01", -62091)
    +
    +    // Invalid input: bad date string. Hive returns 0 for such cases
    +    intercept[NoSuchElementException](checkHiveHashForDateType("0-0-0", 0))
    +    intercept[NoSuchElementException](checkHiveHashForDateType("-1212-01-01", 0))
    +    intercept[NoSuchElementException](checkHiveHashForDateType("2016-99-99", 0))
    +
    +    // Invalid input: Empty string. Hive returns 0 for this case
    +    intercept[NoSuchElementException](checkHiveHashForDateType("", 0))
    +
    +    // Invalid input: February 30th for a leap year. Hive supports this but Spark doesn't
    +    intercept[NoSuchElementException](checkHiveHashForDateType("2016-02-30", 16861))
    +  }
    +
    +  test("hive-hash for timestamp type") {
    +    def checkHiveHashForTimestampType(
    +        timestamp: String,
    +        expected: Long,
    +        timeZone: TimeZone = TimeZone.getTimeZone("UTC")): Unit = {
    +      checkHiveHash(
    +        DateTimeUtils.stringToTimestamp(UTF8String.fromString(timestamp), timeZone).get,
    +        TimestampType,
    +        expected)
    +    }
    +
    +    // basic case
    +    checkHiveHashForTimestampType("2017-02-24 10:56:29", 1445725271)
    +
    +    // with higher precision
    +    checkHiveHashForTimestampType("2017-02-24 10:56:29.111111", 1353936655)
    +
    +    // with different timezone
    +    checkHiveHashForTimestampType("2017-02-24 10:56:29", 1445732471,
    +      TimeZone.getTimeZone("US/Pacific"))
    +
    +    // boundary cases
    +    checkHiveHashForTimestampType("0001-01-01 00:00:00", 1645926784)
    +    checkHiveHashForTimestampType("9999-01-01 00:00:00", -1081818240)
    +
    +    // epoch
    +    checkHiveHashForTimestampType("1970-01-01 00:00:00", 0)
    +
    +    // before epoch
    +    checkHiveHashForTimestampType("1800-01-01 03:12:45", -267420885)
    +
    +    // Invalid input: bad timestamp string. Hive returns 0 for such cases
    +    intercept[NoSuchElementException](checkHiveHashForTimestampType("0-0-0 0:0:0", 0))
    +    intercept[NoSuchElementException](checkHiveHashForTimestampType("-99-99-99 99:99:45", 0))
    +    intercept[NoSuchElementException](checkHiveHashForTimestampType("555555-55555-5555", 0))
    +
    +    // Invalid input: Empty string. Hive returns 0 for this case
    +    intercept[NoSuchElementException](checkHiveHashForTimestampType("", 0))
    +
    +    // Invalid input: February 30th in a leap year. Hive supports this but Spark doesn't
    +    intercept[NoSuchElementException](checkHiveHashForTimestampType("2016-02-30 00:00:00", 0))
    +
    +    // Invalid input: Hive accepts up to 9 decimal places of precision but Spark uses up to 6
    +    intercept[TestFailedException](checkHiveHashForTimestampType("2017-02-24 10:56:29.11111111", 0))
    +  }
    +
    +  test("hive-hash for CalendarInterval type") {
    +    def checkHiveHashForIntervalType(interval: String, expected: Long): Unit = {
    +      checkHiveHash(CalendarInterval.fromString(interval), CalendarIntervalType, expected)
    +    }
    +
    +    // ----- MICROSEC -----
    +
    +    // basic case
    +    checkHiveHashForIntervalType("interval 1 microsecond", 24273)
    +
    +    // negative
    +    checkHiveHashForIntervalType("interval -1 microsecond", 22273)
    +
    +    // edge / boundary cases
    +    checkHiveHashForIntervalType("interval 0 microsecond", 23273)
    +    checkHiveHashForIntervalType("interval 999 microsecond", 1022273)
    +    checkHiveHashForIntervalType("interval -999 microsecond", -975727)
    +
    +    // ----- MILLISEC -----
    +
    +    // basic case
    +    checkHiveHashForIntervalType("interval 1 millisecond", 1023273)
    +
    +    // negative
    +    checkHiveHashForIntervalType("interval -1 millisecond", -976727)
    +
    +    // edge / boundary cases
    +    checkHiveHashForIntervalType("interval 0 millisecond", 23273)
    +    checkHiveHashForIntervalType("interval 999 millisecond", 999023273)
    +    checkHiveHashForIntervalType("interval -999 millisecond", -998976727)
    +
    +    // ----- SECOND -----
    +
    +    // basic case
    +    checkHiveHashForIntervalType("interval 1 second", 23310)
    +
    +    // negative
    +    checkHiveHashForIntervalType("interval -1 second", 23273)
    +
    +    // edge / boundary cases
    +    checkHiveHashForIntervalType("interval 0 second", 23273)
    +    checkHiveHashForIntervalType("interval 2147483647 second", -2147460412)
    +    checkHiveHashForIntervalType("interval -2147483648 second", -2147460412)
    +
    +    // Out of range for both Hive and Spark
    +    // Hive throws an exception. Spark overflows and returns wrong output
    +    // checkHiveHashForIntervalType("interval 9999999999 day", -4767228)
    --- End diff --
    
    Should we fix it before this PR?




[GitHub] spark pull request #17062: [SPARK-17495] [SQL] Support date, timestamp and i...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17062#discussion_r105570229
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/HashExpressionsSuite.scala ---
    @@ -168,6 +170,208 @@ class HashExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
         // scalastyle:on nonascii
       }
     
    +  test("hive-hash for date type") {
    +    def checkHiveHashForDateType(dateString: String, expected: Long): Unit = {
    +      checkHiveHash(
    +        DateTimeUtils.stringToDate(UTF8String.fromString(dateString)).get,
    +        DateType,
    +        expected)
    +    }
    +
    +    // basic case
    +    checkHiveHashForDateType("2017-01-01", 17167)
    +
    +    // boundary cases
    +    checkHiveHashForDateType("0000-01-01", -719530)
    +    checkHiveHashForDateType("9999-12-31", 2932896)
    +
    +    // epoch
    +    checkHiveHashForDateType("1970-01-01", 0)
    +
    +    // before epoch
    +    checkHiveHashForDateType("1800-01-01", -62091)
    +
    +    // Invalid input: bad date string. Hive returns 0 for such cases
    +    intercept[NoSuchElementException](checkHiveHashForDateType("0-0-0", 0))
    +    intercept[NoSuchElementException](checkHiveHashForDateType("-1212-01-01", 0))
    +    intercept[NoSuchElementException](checkHiveHashForDateType("2016-99-99", 0))
    +
    +    // Invalid input: Empty string. Hive returns 0 for this case
    +    intercept[NoSuchElementException](checkHiveHashForDateType("", 0))
    +
    +    // Invalid input: February 30th for a leap year. Hive supports this but Spark doesn't
    +    intercept[NoSuchElementException](checkHiveHashForDateType("2016-02-30", 16861))
    +  }
    +
    +  test("hive-hash for timestamp type") {
    +    def checkHiveHashForTimestampType(
    +        timestamp: String,
    +        expected: Long,
    +        timeZone: TimeZone = TimeZone.getTimeZone("UTC")): Unit = {
    +      checkHiveHash(
    +        DateTimeUtils.stringToTimestamp(UTF8String.fromString(timestamp), timeZone).get,
    +        TimestampType,
    +        expected)
    +    }
    +
    +    // basic case
    +    checkHiveHashForTimestampType("2017-02-24 10:56:29", 1445725271)
    +
    +    // with higher precision
    +    checkHiveHashForTimestampType("2017-02-24 10:56:29.111111", 1353936655)
    +
    +    // with different timezone
    +    checkHiveHashForTimestampType("2017-02-24 10:56:29", 1445732471,
    +      TimeZone.getTimeZone("US/Pacific"))
    +
    +    // boundary cases
    +    checkHiveHashForTimestampType("0001-01-01 00:00:00", 1645926784)
    +    checkHiveHashForTimestampType("9999-01-01 00:00:00", -1081818240)
    +
    +    // epoch
    +    checkHiveHashForTimestampType("1970-01-01 00:00:00", 0)
    +
    +    // before epoch
    +    checkHiveHashForTimestampType("1800-01-01 03:12:45", -267420885)
    +
    +    // Invalid input: bad timestamp string. Hive returns 0 for such cases
    +    intercept[NoSuchElementException](checkHiveHashForTimestampType("0-0-0 0:0:0", 0))
    +    intercept[NoSuchElementException](checkHiveHashForTimestampType("-99-99-99 99:99:45", 0))
    +    intercept[NoSuchElementException](checkHiveHashForTimestampType("555555-55555-5555", 0))
    +
    +    // Invalid input: Empty string. Hive returns 0 for this case
    +    intercept[NoSuchElementException](checkHiveHashForTimestampType("", 0))
    +
    +    // Invalid input: February 30th in a leap year. Hive supports this but Spark doesn't
    +    intercept[NoSuchElementException](checkHiveHashForTimestampType("2016-02-30 00:00:00", 0))
    +
    +    // Invalid input: Hive accepts up to 9 decimal places of precision but Spark uses up to 6
    +    intercept[TestFailedException](checkHiveHashForTimestampType("2017-02-24 10:56:29.11111111", 0))
    +  }
    +
    +  test("hive-hash for CalendarInterval type") {
    +    def checkHiveHashForIntervalType(interval: String, expected: Long): Unit = {
    +      checkHiveHash(CalendarInterval.fromString(interval), CalendarIntervalType, expected)
    +    }
    +
    +    // ----- MICROSEC -----
    +
    +    // basic case
    +    checkHiveHashForIntervalType("interval 1 microsecond", 24273)
    +
    +    // negative
    +    checkHiveHashForIntervalType("interval -1 microsecond", 22273)
    +
    +    // edge / boundary cases
    +    checkHiveHashForIntervalType("interval 0 microsecond", 23273)
    +    checkHiveHashForIntervalType("interval 999 microsecond", 1022273)
    +    checkHiveHashForIntervalType("interval -999 microsecond", -975727)
    +
    +    // ----- MILLISEC -----
    +
    +    // basic case
    +    checkHiveHashForIntervalType("interval 1 millisecond", 1023273)
    +
    +    // negative
    +    checkHiveHashForIntervalType("interval -1 millisecond", -976727)
    +
    +    // edge / boundary cases
    +    checkHiveHashForIntervalType("interval 0 millisecond", 23273)
    +    checkHiveHashForIntervalType("interval 999 millisecond", 999023273)
    +    checkHiveHashForIntervalType("interval -999 millisecond", -998976727)
    +
    +    // ----- SECOND -----
    +
    +    // basic case
    +    checkHiveHashForIntervalType("interval 1 second", 23310)
    +
    +    // negative
    +    checkHiveHashForIntervalType("interval -1 second", 23273)
    +
    +    // edge / boundary cases
    +    checkHiveHashForIntervalType("interval 0 second", 23273)
    +    checkHiveHashForIntervalType("interval 2147483647 second", -2147460412)
    +    checkHiveHashForIntervalType("interval -2147483648 second", -2147460412)
    +
    +    // Out of range for both Hive and Spark
    +    // Hive throws an exception. Spark overflows and returns wrong output
    +    // checkHiveHashForIntervalType("interval 9999999999 day", -4767228)
    +
    +    // ----- MINUTE -----
    +
    +    // basic cases
    +    checkHiveHashForIntervalType("interval 1 minute", 25493)
    +
    +    // negative
    +    checkHiveHashForIntervalType("interval -1 minute", 25456)
    +
    +    // edge / boundary cases
    +    checkHiveHashForIntervalType("interval 0 minute", 23273)
    +    checkHiveHashForIntervalType("interval 2147483647 minute", 21830)
    +    checkHiveHashForIntervalType("interval -2147483648 minute", 22163)
    +
    +    // Out of range for both Hive and Spark
    +    // Hive throws an exception. Spark overflows and returns wrong output
    +    // checkHiveHashForIntervalType("interval 9999999999 day", -4767228)
    --- End diff --
    
    The unit sounds incorrect. The same applies to the other case.




[GitHub] spark issue #17062: [SPARK-17495] [SQL] Support date, timestamp and interval...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/17062
  
    It sounds like no test case covers nanoseconds for INTERVAL.




[GitHub] spark issue #17062: [SPARK-17495] [SQL] Support date, timestamp and interval...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17062
  
    **[Test build #74414 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74414/testReport)** for PR 17062 at commit [`332686a`](https://github.com/apache/spark/commit/332686acd902c2b05bf48b848ece0860de172355).




[GitHub] spark pull request #17062: [SPARK-17495] [SQL] Support date, timestamp and i...

Posted by tejasapatil <gi...@git.apache.org>.
Github user tejasapatil commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17062#discussion_r105570430
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/HashExpressionsSuite.scala ---
    @@ -168,6 +170,208 @@ class HashExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
         // scalastyle:on nonascii
       }
     
    +  test("hive-hash for date type") {
    +    def checkHiveHashForDateType(dateString: String, expected: Long): Unit = {
    +      checkHiveHash(
    +        DateTimeUtils.stringToDate(UTF8String.fromString(dateString)).get,
    +        DateType,
    +        expected)
    +    }
    +
    +    // basic case
    +    checkHiveHashForDateType("2017-01-01", 17167)
    +
    +    // boundary cases
    +    checkHiveHashForDateType("0000-01-01", -719530)
    +    checkHiveHashForDateType("9999-12-31", 2932896)
    +
    +    // epoch
    +    checkHiveHashForDateType("1970-01-01", 0)
    +
    +    // before epoch
    +    checkHiveHashForDateType("1800-01-01", -62091)
    +
    +    // Invalid input: bad date string. Hive returns 0 for such cases
    +    intercept[NoSuchElementException](checkHiveHashForDateType("0-0-0", 0))
    +    intercept[NoSuchElementException](checkHiveHashForDateType("-1212-01-01", 0))
    +    intercept[NoSuchElementException](checkHiveHashForDateType("2016-99-99", 0))
    +
    +    // Invalid input: Empty string. Hive returns 0 for this case
    +    intercept[NoSuchElementException](checkHiveHashForDateType("", 0))
    +
    +    // Invalid input: February 30th for a leap year. Hive supports this but Spark doesn't
    +    intercept[NoSuchElementException](checkHiveHashForDateType("2016-02-30", 16861))
    +  }
    +
    +  test("hive-hash for timestamp type") {
    +    def checkHiveHashForTimestampType(
    +        timestamp: String,
    +        expected: Long,
    +        timeZone: TimeZone = TimeZone.getTimeZone("UTC")): Unit = {
    +      checkHiveHash(
    +        DateTimeUtils.stringToTimestamp(UTF8String.fromString(timestamp), timeZone).get,
    +        TimestampType,
    +        expected)
    +    }
    +
    +    // basic case
    +    checkHiveHashForTimestampType("2017-02-24 10:56:29", 1445725271)
    +
    +    // with higher precision
    +    checkHiveHashForTimestampType("2017-02-24 10:56:29.111111", 1353936655)
    +
    +    // with different timezone
    +    checkHiveHashForTimestampType("2017-02-24 10:56:29", 1445732471,
    +      TimeZone.getTimeZone("US/Pacific"))
    +
    +    // boundary cases
    +    checkHiveHashForTimestampType("0001-01-01 00:00:00", 1645926784)
    +    checkHiveHashForTimestampType("9999-01-01 00:00:00", -1081818240)
    +
    +    // epoch
    +    checkHiveHashForTimestampType("1970-01-01 00:00:00", 0)
    +
    +    // before epoch
    +    checkHiveHashForTimestampType("1800-01-01 03:12:45", -267420885)
    +
    +    // Invalid input: bad timestamp string. Hive returns 0 for such cases
    +    intercept[NoSuchElementException](checkHiveHashForTimestampType("0-0-0 0:0:0", 0))
    +    intercept[NoSuchElementException](checkHiveHashForTimestampType("-99-99-99 99:99:45", 0))
    +    intercept[NoSuchElementException](checkHiveHashForTimestampType("555555-55555-5555", 0))
    +
    +    // Invalid input: Empty string. Hive returns 0 for this case
    +    intercept[NoSuchElementException](checkHiveHashForTimestampType("", 0))
    +
    +    // Invalid input: February 30th in a leap year. Hive supports this but Spark doesn't
    +    intercept[NoSuchElementException](checkHiveHashForTimestampType("2016-02-30 00:00:00", 0))
    +
    +    // Invalid input: Hive accepts up to 9 decimal places of precision but Spark uses up to 6
    +    intercept[TestFailedException](checkHiveHashForTimestampType("2017-02-24 10:56:29.11111111", 0))
    +  }
    +
    +  test("hive-hash for CalendarInterval type") {
    +    def checkHiveHashForIntervalType(interval: String, expected: Long): Unit = {
    +      checkHiveHash(CalendarInterval.fromString(interval), CalendarIntervalType, expected)
    +    }
    +
    +    // ----- MICROSEC -----
    +
    +    // basic case
    +    checkHiveHashForIntervalType("interval 1 microsecond", 24273)
    +
    +    // negative
    +    checkHiveHashForIntervalType("interval -1 microsecond", 22273)
    +
    +    // edge / boundary cases
    +    checkHiveHashForIntervalType("interval 0 microsecond", 23273)
    +    checkHiveHashForIntervalType("interval 999 microsecond", 1022273)
    +    checkHiveHashForIntervalType("interval -999 microsecond", -975727)
    +
    +    // ----- MILLISEC -----
    +
    +    // basic case
    +    checkHiveHashForIntervalType("interval 1 millisecond", 1023273)
    +
    +    // negative
    +    checkHiveHashForIntervalType("interval -1 millisecond", -976727)
    +
    +    // edge / boundary cases
    +    checkHiveHashForIntervalType("interval 0 millisecond", 23273)
    +    checkHiveHashForIntervalType("interval 999 millisecond", 999023273)
    +    checkHiveHashForIntervalType("interval -999 millisecond", -998976727)
    +
    +    // ----- SECOND -----
    +
    +    // basic case
    +    checkHiveHashForIntervalType("interval 1 second", 23310)
    +
    +    // negative
    +    checkHiveHashForIntervalType("interval -1 second", 23273)
    +
    +    // edge / boundary cases
    +    checkHiveHashForIntervalType("interval 0 second", 23273)
    +    checkHiveHashForIntervalType("interval 2147483647 second", -2147460412)
    +    checkHiveHashForIntervalType("interval -2147483648 second", -2147460412)
    +
    +    // Out of range for both Hive and Spark
    +    // Hive throws an exception. Spark overflows and returns wrong output
    +    // checkHiveHashForIntervalType("interval 9999999999 day", -4767228)
    --- End diff --
    
    In the case of Spark SQL, the query fails with an exception (see below). The test case, however, bypasses that check because it constructs the raw interval object directly.
    
    ```
    scala> hc.sql("SELECT interval 9999999999 day ").show
    org.apache.spark.sql.catalyst.parser.ParseException:
    Error parsing interval string: day 9999999999 outside range [-106751991, 106751991](line 1, pos 16)
    
    == SQL ==
    SELECT interval 9999999999 day
    ```
    
    ```
    scala> df.select("INTERVAL 9999999999 day").show()
    org.apache.spark.sql.AnalysisException: cannot resolve '`INTERVAL 9999999999 day`' given input columns: [key, value];;
    'Project ['INTERVAL 9999999999 day]
    ```
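    
    (Note that the second failure is a column-resolution error rather than an interval parse error: `select` with a plain string resolves it as a column name. Going through the expression parser, e.g. via `selectExpr`, would presumably hit the same out-of-range check as the SQL example above.)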




[GitHub] spark issue #17062: [SPARK-17495] [SQL] Support date, timestamp and interval...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/17062
  
    Thanks! Merging to master.




[GitHub] spark pull request #17062: [SPARK-17495] [SQL] Support date, timestamp and i...

Posted by tejasapatil <gi...@git.apache.org>.
Github user tejasapatil commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17062#discussion_r105261981
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/hash.scala ---
    @@ -732,6 +741,38 @@ object HiveHashFunction extends InterpretedHashFunction {
         HiveHasher.hashUnsafeBytes(base, offset, len)
       }
     
    +  /**
    +   * Mimics TimestampWritable.hashCode() in Hive
    +   */
    +  def hashTimestamp(timestamp: Long): Long = {
    +    val timestampInSeconds = timestamp / 1000000
    +    val nanoSecondsPortion = (timestamp % 1000000) * 1000
    +
    +    var result = timestampInSeconds
    +    result <<= 30 // the nanosecond part fits in 30 bits
    +    result |= nanoSecondsPortion
    +    ((result >>> 32) ^ result).toInt
    +  }
    +
    +  /**
    +   * Hive allows input intervals to be defined using units below but the intervals
    +   * have to be from the same category:
    +   * - year, month (stored as HiveIntervalYearMonth)
    +   * - day, hour, minute, second, nanosecond (stored as HiveIntervalDayTime)
    +   *
    +   * eg. (INTERVAL '30' YEAR + INTERVAL '-23' DAY) fails in Hive
    +   *
    +   * This method mimics HiveIntervalDayTime.hashCode() in Hive. If the `INTERVAL` is backed as
    +   * HiveIntervalYearMonth in Hive, then this method will not produce Hive compatible result.
    +   * The reason being Spark's representation of calendar does not have such categories based on
    +   * the interval and is unified.
    +   */
    +  def hashCalendarInterval(calendarInterval: CalendarInterval): Long = {
    +    val totalSeconds = calendarInterval.milliseconds() / 1000
    --- End diff --
    
    Spark's CalendarInterval has precision up to microseconds while Hive can have precision up to nanoseconds, so there is no way for us to support that in the hashing function. I have documented this in the PR.
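    
    To make the gap concrete, here is a minimal sketch of the same seconds/nanos fold that `hashTimestamp` uses above (the helper name is hypothetical, not code from this PR):
    
    ```
    // Minimal sketch: pack seconds and a 30-bit nanosecond part into a
    // Long, then XOR-fold the two 32-bit halves.
    def foldSecondsAndNanos(totalSeconds: Long, nanos: Long): Int = {
      var result = totalSeconds
      result <<= 30                 // the nanosecond part fits in 30 bits
      result |= nanos
      ((result >>> 32) ^ result).toInt
    }
    
    // Spark's CalendarInterval tracks microseconds, so the nanos it can
    // supply are always multiples of 1000; finer-grained Hive values can
    // never be reproduced from the Spark side.
    val fromSpark = foldSecondsAndNanos(1L, 123456L * 1000)  // 123456 us
    val fromHive  = foldSecondsAndNanos(1L, 123456789L)      // 123456789 ns
    assert(fromSpark != fromHive)
    ```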




[GitHub] spark pull request #17062: [SPARK-17495] [SQL] Support date, timestamp and i...

Posted by tejasapatil <gi...@git.apache.org>.
Github user tejasapatil commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17062#discussion_r105570634
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/HashExpressionsSuite.scala ---
    @@ -168,6 +170,208 @@ class HashExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
         // scalastyle:on nonascii
       }
     
    +  test("hive-hash for date type") {
    +    def checkHiveHashForDateType(dateString: String, expected: Long): Unit = {
    +      checkHiveHash(
    +        DateTimeUtils.stringToDate(UTF8String.fromString(dateString)).get,
    +        DateType,
    +        expected)
    +    }
    +
    +    // basic case
    +    checkHiveHashForDateType("2017-01-01", 17167)
    +
    +    // boundary cases
    +    checkHiveHashForDateType("0000-01-01", -719530)
    +    checkHiveHashForDateType("9999-12-31", 2932896)
    +
    +    // epoch
    +    checkHiveHashForDateType("1970-01-01", 0)
    +
    +    // before epoch
    +    checkHiveHashForDateType("1800-01-01", -62091)
    +
    +    // Invalid input: bad date string. Hive returns 0 for such cases
    +    intercept[NoSuchElementException](checkHiveHashForDateType("0-0-0", 0))
    +    intercept[NoSuchElementException](checkHiveHashForDateType("-1212-01-01", 0))
    +    intercept[NoSuchElementException](checkHiveHashForDateType("2016-99-99", 0))
    +
    +    // Invalid input: Empty string. Hive returns 0 for this case
    +    intercept[NoSuchElementException](checkHiveHashForDateType("", 0))
    +
    +    // Invalid input: February 30th for a leap year. Hive supports this but Spark doesn't
    +    intercept[NoSuchElementException](checkHiveHashForDateType("2016-02-30", 16861))
    +  }
    +
    +  test("hive-hash for timestamp type") {
    +    def checkHiveHashForTimestampType(
    +        timestamp: String,
    +        expected: Long,
    +        timeZone: TimeZone = TimeZone.getTimeZone("UTC")): Unit = {
    +      checkHiveHash(
    +        DateTimeUtils.stringToTimestamp(UTF8String.fromString(timestamp), timeZone).get,
    +        TimestampType,
    +        expected)
    +    }
    +
    +    // basic case
    +    checkHiveHashForTimestampType("2017-02-24 10:56:29", 1445725271)
    +
    +    // with higher precision
    +    checkHiveHashForTimestampType("2017-02-24 10:56:29.111111", 1353936655)
    +
    +    // with different timezone
    +    checkHiveHashForTimestampType("2017-02-24 10:56:29", 1445732471,
    +      TimeZone.getTimeZone("US/Pacific"))
    +
    +    // boundary cases
    +    checkHiveHashForTimestampType("0001-01-01 00:00:00", 1645926784)
    +    checkHiveHashForTimestampType("9999-01-01 00:00:00", -1081818240)
    +
    +    // epoch
    +    checkHiveHashForTimestampType("1970-01-01 00:00:00", 0)
    +
    +    // before epoch
    +    checkHiveHashForTimestampType("1800-01-01 03:12:45", -267420885)
    +
    +    // Invalid input: bad timestamp string. Hive returns 0 for such cases
    +    intercept[NoSuchElementException](checkHiveHashForTimestampType("0-0-0 0:0:0", 0))
    +    intercept[NoSuchElementException](checkHiveHashForTimestampType("-99-99-99 99:99:45", 0))
    +    intercept[NoSuchElementException](checkHiveHashForTimestampType("555555-55555-5555", 0))
    +
    +    // Invalid input: Empty string. Hive returns 0 for this case
    +    intercept[NoSuchElementException](checkHiveHashForTimestampType("", 0))
    +
    +    // Invalid input: February 30th in a leap year. Hive supports this but Spark doesn't
    +    intercept[NoSuchElementException](checkHiveHashForTimestampType("2016-02-30 00:00:00", 0))
    +
    +    // Invalid input: Hive accepts upto 9 decimal place precision but Spark uses upto 6
    +    intercept[TestFailedException](checkHiveHashForTimestampType("2017-02-24 10:56:29.11111111", 0))
    +  }
    +
    +  test("hive-hash for CalendarInterval type") {
    +    def checkHiveHashForIntervalType(interval: String, expected: Long): Unit = {
    +      checkHiveHash(CalendarInterval.fromString(interval), CalendarIntervalType, expected)
    +    }
    +
    +    // ----- MICROSEC -----
    +
    +    // basic case
    +    checkHiveHashForIntervalType("interval 1 microsecond", 24273)
    +
    +    // negative
    +    checkHiveHashForIntervalType("interval -1 microsecond", 22273)
    +
    +    // edge / boundary cases
    +    checkHiveHashForIntervalType("interval 0 microsecond", 23273)
    +    checkHiveHashForIntervalType("interval 999 microsecond", 1022273)
    +    checkHiveHashForIntervalType("interval -999 microsecond", -975727)
    +
    +    // ----- MILLISEC -----
    +
    +    // basic case
    +    checkHiveHashForIntervalType("interval 1 millisecond", 1023273)
    +
    +    // negative
    +    checkHiveHashForIntervalType("interval -1 millisecond", -976727)
    +
    +    // edge / boundary cases
    +    checkHiveHashForIntervalType("interval 0 millisecond", 23273)
    +    checkHiveHashForIntervalType("interval 999 millisecond", 999023273)
    +    checkHiveHashForIntervalType("interval -999 millisecond", -998976727)
    +
    +    // ----- SECOND -----
    +
    +    // basic case
    +    checkHiveHashForIntervalType("interval 1 second", 23310)
    +
    +    // negative
    +    checkHiveHashForIntervalType("interval -1 second", 23273)
    +
    +    // edge / boundary cases
    +    checkHiveHashForIntervalType("interval 0 second", 23273)
    +    checkHiveHashForIntervalType("interval 2147483647 second", -2147460412)
    +    checkHiveHashForIntervalType("interval -2147483648 second", -2147460412)
    +
    +    // Out of range for both Hive and Spark
    +    // Hive throws an exception. Spark overflows and returns wrong output
    +    // checkHiveHashForIntervalType("interval 9999999999 day", -4767228)
    +
    +    // ----- MINUTE -----
    +
    +    // basic cases
    +    checkHiveHashForIntervalType("interval 1 minute", 25493)
    +
    +    // negative
    +    checkHiveHashForIntervalType("interval -1 minute", 25456)
    +
    +    // edge / boundary cases
    +    checkHiveHashForIntervalType("interval 0 minute", 23273)
    +    checkHiveHashForIntervalType("interval 2147483647 minute", 21830)
    +    checkHiveHashForIntervalType("interval -2147483648 minute", 22163)
    +
    +    // Out of range for both Hive and Spark
    +    // Hive throws an exception. Spark overflows and returns wrong output
    +    // checkHiveHashForIntervalType("interval 9999999999 day", -4767228)
    --- End diff --
    
    fixed




[GitHub] spark issue #17062: [SPARK-17495] [SQL] Support date, timestamp and interval...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17062
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74278/
    Test PASSed.




[GitHub] spark issue #17062: [SPARK-17495] [SQL] Support date, timestamp and interval...

Posted by tejasapatil <gi...@git.apache.org>.
Github user tejasapatil commented on the issue:

    https://github.com/apache/spark/pull/17062
  
    @gatorsmile : can you please review this PR?




[GitHub] spark issue #17062: [SPARK-17495] [SQL] Support date, timestamp and interval...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17062
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request #17062: [SPARK-17495] [SQL] Support date, timestamp and i...

Posted by tejasapatil <gi...@git.apache.org>.
Github user tejasapatil commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17062#discussion_r103357272
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/HashExpressionsSuite.scala ---
    @@ -169,6 +171,96 @@ class HashExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
         // scalastyle:on nonascii
       }
     
    +  test("hive-hash for date type") {
    +    def checkHiveHashForDateType(dateString: String, expected: Long): Unit = {
    +      checkHiveHash(
    +        DateTimeUtils.stringToDate(UTF8String.fromString(dateString)).get,
    +        DateType,
    +        expected)
    +    }
    +
    +    // basic case
    +    checkHiveHashForDateType("2017-01-01", 17167)
    +
    +    // boundary cases
    +    checkHiveHashForDateType("0000-01-01", -719530)
    +    checkHiveHashForDateType("9999-12-31", 2932896)
    +
    +    // epoch
    +    checkHiveHashForDateType("1970-01-01", 0)
    +
    +    // before epoch
    +    checkHiveHashForDateType("1800-01-01", -62091)
    +
    +    // Invalid input: bad date string. Hive returns 0 for such cases
    +    intercept[NoSuchElementException](checkHiveHashForDateType("0-0-0", 0))
    +    intercept[NoSuchElementException](checkHiveHashForDateType("-1212-01-01", 0))
    +    intercept[NoSuchElementException](checkHiveHashForDateType("2016-99-99", 0))
    +
    +    // Invalid input: Empty string. Hive returns 0 for this case
    +    intercept[NoSuchElementException](checkHiveHashForDateType("", 0))
    +
    +    // Invalid input: February 30th for a leap year. Hive supports this but Spark doesn't
    +    intercept[NoSuchElementException](checkHiveHashForDateType("2016-02-30", 16861))
    +  }
    +
    +  test("hive-hash for timestamp type") {
    +    def checkHiveHashForTimestampType(
    +        timestamp: String,
    +        expected: Long,
    +        timeZone: TimeZone = TimeZone.getTimeZone("UTC")): Unit = {
    +      checkHiveHash(
    +        DateTimeUtils.stringToTimestamp(UTF8String.fromString(timestamp), timeZone).get,
    +        TimestampType,
    +        expected)
    +    }
    +
    +    // basic case
    +    checkHiveHashForTimestampType("2017-02-24 10:56:29", 1445725271)
    +
    +    // with higher precision
    +    checkHiveHashForTimestampType("2017-02-24 10:56:29.111111", 1353936655)
    +
    +    // with different timezone
    +    checkHiveHashForTimestampType("2017-02-24 10:56:29", 1445732471,
    +      TimeZone.getTimeZone("US/Pacific"))
    +
    +    // boundary cases
    +    checkHiveHashForTimestampType("0001-01-01 00:00:00", 1645926784)
    +    checkHiveHashForTimestampType("9999-01-01 00:00:00", -1081818240)
    +
    +    // epoch
    +    checkHiveHashForTimestampType("1970-01-01 00:00:00", 0)
    +
    +    // before epoch
    +    checkHiveHashForTimestampType("1800-01-01 03:12:45", -267420885)
    +
    +    // Invalid input: bad timestamp string. Hive returns 0 for such cases
    +    intercept[NoSuchElementException](checkHiveHashForTimestampType("0-0-0 0:0:0", 0))
    +    intercept[NoSuchElementException](checkHiveHashForTimestampType("-99-99-99 99:99:45", 0))
    +    intercept[NoSuchElementException](checkHiveHashForTimestampType("555555-55555-5555", 0))
    +
    +    // Invalid input: Empty string. Hive returns 0 for this case
    +    intercept[NoSuchElementException](checkHiveHashForTimestampType("", 0))
    +
    +    // Invalid input: February 30th for a leap year. Hive supports this but Spark doesn't
    +    intercept[NoSuchElementException](checkHiveHashForTimestampType("2016-02-30 00:00:00", 0))
    +
    +    // Invalid input: Hive accepts up to 9 decimal places of precision but Spark uses up to 6
    +    intercept[TestFailedException](checkHiveHashForTimestampType("2017-02-24 10:56:29.11111111", 0))
    +  }
    +
    +  test("hive-hash for CalendarInterval type") {
    +    def checkHiveHashForTimestampType(interval: String, expected: Long): Unit = {
    +      checkHiveHash(CalendarInterval.fromString(interval), CalendarIntervalType, expected)
    +    }
    +
    +    checkHiveHashForTimestampType("interval 1 day", 3220073)
    --- End diff --
    
    SELECT HASH ( INTERVAL '1' DAY );
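    
    The other day-time unit categories in the suite can presumably be cross-checked the same way, e.g. (untested):
    
    ```
    SELECT HASH ( INTERVAL '1' SECOND );
    SELECT HASH ( INTERVAL '1' MINUTE );
    ```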




[GitHub] spark issue #17062: [SPARK-17495] [SQL] Support date, timestamp and interval...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17062
  
    **[Test build #74416 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74416/testReport)** for PR 17062 at commit [`8a5f200`](https://github.com/apache/spark/commit/8a5f200427d616913ea506e34a11a5a698a20c8a).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #17062: [SPARK-17495] [SQL] Support date, timestamp and interval...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17062
  
    **[Test build #74416 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74416/testReport)** for PR 17062 at commit [`8a5f200`](https://github.com/apache/spark/commit/8a5f200427d616913ea506e34a11a5a698a20c8a).




[GitHub] spark issue #17062: [SPARK-17495] [SQL] Support date, timestamp and interval...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/17062
  
    I did the same check. The results of Hive 2.0 exactly match the hard-coded values. 




[GitHub] spark issue #17062: [SPARK-17495] [SQL] Support date, timestamp and interval...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17062
  
    **[Test build #74278 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74278/testReport)** for PR 17062 at commit [`fd0330d`](https://github.com/apache/spark/commit/fd0330d09551b69770a9fc586145b5547d3d44dc).




[GitHub] spark issue #17062: [SPARK-17495] [SQL] Support date, timestamp and interval...

Posted by tejasapatil <gi...@git.apache.org>.
Github user tejasapatil commented on the issue:

    https://github.com/apache/spark/pull/17062
  
    ok to test




[GitHub] spark issue #17062: [SPARK-17495] [SQL] Support date, timestamp and interval...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17062
  
    **[Test build #73472 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73472/testReport)** for PR 17062 at commit [`e050a50`](https://github.com/apache/spark/commit/e050a5000d0bb4620d9eb7890da5206e4e36ce09).




[GitHub] spark pull request #17062: [SPARK-17495] [SQL] Support date, timestamp and i...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17062#discussion_r104282934
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/hash.scala ---
    @@ -732,6 +741,38 @@ object HiveHashFunction extends InterpretedHashFunction {
         HiveHasher.hashUnsafeBytes(base, offset, len)
       }
     
    +  /**
    +   * Mimics TimestampWritable.hashCode() in Hive
    +   */
    +  def hashTimestamp(timestamp: Long): Long = {
    +    val timestampInSeconds = timestamp / 1000000
    +    val nanoSecondsPortion = (timestamp % 1000000) * 1000
    +
    +    var result = timestampInSeconds
    +    result <<= 30 // the nanosecond part fits in 30 bits
    +    result |= nanoSecondsPortion
    +    ((result >>> 32) ^ result).toInt
    +  }
    +
    +  /**
    +   * Hive allows input intervals to be defined using units below but the intervals
    +   * have to be from the same category:
    +   * - year, month (stored as HiveIntervalYearMonth)
    +   * - day, hour, minute, second, nanosecond (stored as HiveIntervalDayTime)
    +   *
    +   * eg. (INTERVAL '30' YEAR + INTERVAL '-23' DAY) fails in Hive
    +   *
    +   * This method mimics HiveIntervalDayTime.hashCode() in Hive. If the `INTERVAL` is backed as
    +   * HiveIntervalYearMonth in Hive, then this method will not produce Hive compatible result.
    +   * The reason being Spark's representation of calendar does not have such categories based on
    +   * the interval and is unified.
    +   */
    +  def hashCalendarInterval(calendarInterval: CalendarInterval): Long = {
    +    val totalSeconds = calendarInterval.milliseconds() / 1000
    --- End diff --
    
    How does Hive deal with nanoseconds, if we divide it by 1000? 

