You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by cloud-fan <gi...@git.apache.org> on 2018/01/30 11:40:51 UTC

[GitHub] spark pull request #20438: [SPARK-23272][SQL] add calendar interval type sup...

GitHub user cloud-fan opened a pull request:

    https://github.com/apache/spark/pull/20438

    [SPARK-23272][SQL] add calendar interval type support to ColumnVector

    ## What changes were proposed in this pull request?
    
    `ColumnVector` is aimed to support all the data types, but `CalendarIntervalType` is missing. Actually we do support interval type for inner fields, e.g. `ColumnarRow`, `ColumnarArray` both supports interval type, it's weird if we don't support interval type at the top level.
    
    This PR adds the interval type support.
    
    ## How was this patch tested?
    
    a new test.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark interval

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20438.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20438
    
----
commit 4e538913901de6cb439e9919e958a60ab698fe24
Author: Wenchen Fan <we...@...>
Date:   2018-01-30T11:30:34Z

    add calendar interval type support to ColumnVector

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20438: [SPARK-23272][SQL] add calendar interval type support to...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20438
  
    **[Test build #86825 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86825/testReport)** for PR 20438 at commit [`2f23a1d`](https://github.com/apache/spark/commit/2f23a1d4a6f6968b1c1209b94ca340ca25cc67e1).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20438: [SPARK-23272][SQL] add calendar interval type sup...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20438#discussion_r164802008
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/vectorized/ColumnVector.java ---
    @@ -236,9 +238,29 @@ public MapData getMap(int ordinal) {
       public abstract byte[] getBinary(int rowId);
     
       /**
    -   * Returns the ordinal's child column vector.
    +   * Returns the calendar interval type value for rowId.
    +   *
    +   * In Spark, calendar interval type value is basically an integer value representing the number of
    +   * months in this interval, and a long value representing the number of microseconds in this
    +   * interval. An interval type vector is the same as a struct type vector with 2 fields: `months`
    +   * and `microseconds`.
    +   *
    +   * To support interval type, implementations must implement {@link #getChild(int)} and define 2
    --- End diff --
    
    It's a little annoying to type `calendar interval type` all the time...


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20438: [SPARK-23272][SQL] add calendar interval type sup...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20438#discussion_r164769822
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/vectorized/ColumnVector.java ---
    @@ -236,9 +238,29 @@ public MapData getMap(int ordinal) {
       public abstract byte[] getBinary(int rowId);
     
       /**
    -   * Returns the ordinal's child column vector.
    +   * Returns the calendar interval type value for rowId.
    +   *
    +   * In Spark, calendar interval type value is basically an integer value representing the number of
    +   * months in this interval, and a long value representing the number of microseconds in this
    +   * interval. An interval type vector is the same as a struct type vector with 2 fields: `months`
    +   * and `microseconds`.
    +   *
    +   * To support interval type, implementations must implement {@link #getChild(int)} and define 2
    --- End diff --
    
    nit: `interval type` -> `calender interval type`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20438: [SPARK-23272][SQL] add calendar interval type support to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20438
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20438: [SPARK-23272][SQL] add calendar interval type support to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20438
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20438: [SPARK-23272][SQL] add calendar interval type sup...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20438#discussion_r164719793
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/vectorized/ColumnarArray.java ---
    @@ -135,9 +135,7 @@ public UTF8String getUTF8String(int ordinal) {
     
       @Override
       public CalendarInterval getInterval(int ordinal) {
    -    int month = data.getChild(0).getInt(offset + ordinal);
    -    long microseconds = data.getChild(1).getLong(offset + ordinal);
    -    return new CalendarInterval(month, microseconds);
    +    return data.getInterval(offset + ordinal);
    --- End diff --
    
    We should insert `if (data.isNullAt(offset + ordinal)) return null;` to be consistent with other `ColumnarXxx`s?
    Or I guess we can remove these null-checks from all other `ColumnarXxx`s and leave it to `ColumnVector.getInterval()`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20438: [SPARK-23272][SQL] add calendar interval type support to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20438
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86825/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20438: [SPARK-23272][SQL] add calendar interval type support to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20438
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/381/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20438: [SPARK-23272][SQL] add calendar interval type sup...

Posted by jaceklaskowski <gi...@git.apache.org>.
Github user jaceklaskowski commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20438#discussion_r164729429
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/vectorized/ColumnVector.java ---
    @@ -235,10 +237,30 @@ public MapData getMap(int ordinal) {
        */
       public abstract byte[] getBinary(int rowId);
     
    +  /**
    +   * Returns the calendar interval type value for rowId.
    +   *
    +   * In Spark, calendar interval type value is basically an integer value representing the number of
    +   * months in this interval, and a long value representing the number of microseconds in this
    +   * interval. A interval type vector is same as a struct type vector with 2 fields: `months` and
    +   * `microseconds`.
    +   *
    +   * To support interval type, implementations must implement {@link #getChild(int)} and define 2
    +   * child vectors: the first child vector is a int type vector, containing all the month values of
    +   * all the interval values in this vector. The second child vector is a long type vector,
    +   * containing all the microsecond values of all the interval values in this vector.
    +   */
    +  public final CalendarInterval getInterval(int rowId) {
    +    if (isNullAt(rowId)) return null;
    +    final int months = getChild(0).getInt(rowId);
    --- End diff --
    
    What's the purpose of `final` keyword here?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20438: [SPARK-23272][SQL] add calendar interval type sup...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20438#discussion_r164944493
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/MutableColumnarRow.java ---
    @@ -146,9 +146,7 @@ public UTF8String getUTF8String(int ordinal) {
       @Override
       public CalendarInterval getInterval(int ordinal) {
         if (columns[ordinal].isNullAt(rowId)) return null;
    --- End diff --
    
    see https://github.com/apache/spark/pull/20438#discussion_r164719793 . In this PR I just fixed the returning null issue for `getStruct` and `getInterval`, because they are non-abstract. There should be a follow up to clearly document that `ColumnVector.getBinary/getUTF8String/...` should return null if this slot is null. Then we can remove these null checks here. I appreciate it if you have time to take this :)


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20438: [SPARK-23272][SQL] add calendar interval type support to...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/20438
  
    thanks, merging to master/2.3!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20438: [SPARK-23272][SQL] add calendar interval type sup...

Posted by jaceklaskowski <gi...@git.apache.org>.
Github user jaceklaskowski commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20438#discussion_r164729250
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/vectorized/ColumnVector.java ---
    @@ -235,10 +237,30 @@ public MapData getMap(int ordinal) {
        */
       public abstract byte[] getBinary(int rowId);
     
    +  /**
    +   * Returns the calendar interval type value for rowId.
    +   *
    +   * In Spark, calendar interval type value is basically an integer value representing the number of
    +   * months in this interval, and a long value representing the number of microseconds in this
    +   * interval. A interval type vector is same as a struct type vector with 2 fields: `months` and
    +   * `microseconds`.
    +   *
    +   * To support interval type, implementations must implement {@link #getChild(int)} and define 2
    +   * child vectors: the first child vector is a int type vector, containing all the month values of
    --- End diff --
    
    is **an** int type vector


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20438: [SPARK-23272][SQL] add calendar interval type sup...

Posted by jaceklaskowski <gi...@git.apache.org>.
Github user jaceklaskowski commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20438#discussion_r164729086
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/vectorized/ColumnVector.java ---
    @@ -235,10 +237,30 @@ public MapData getMap(int ordinal) {
        */
       public abstract byte[] getBinary(int rowId);
     
    +  /**
    +   * Returns the calendar interval type value for rowId.
    +   *
    +   * In Spark, calendar interval type value is basically an integer value representing the number of
    +   * months in this interval, and a long value representing the number of microseconds in this
    +   * interval. A interval type vector is same as a struct type vector with 2 fields: `months` and
    --- End diff --
    
    "**An** interval" + "is **the** same as"


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20438: [SPARK-23272][SQL] add calendar interval type sup...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20438#discussion_r164943754
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/MutableColumnarRow.java ---
    @@ -146,9 +146,7 @@ public UTF8String getUTF8String(int ordinal) {
       @Override
       public CalendarInterval getInterval(int ordinal) {
         if (columns[ordinal].isNullAt(rowId)) return null;
    --- End diff --
    
    Do we still need this null check?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20438: [SPARK-23272][SQL] add calendar interval type sup...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20438#discussion_r164717772
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/vectorized/ColumnVector.java ---
    @@ -195,6 +196,7 @@
        * struct field.
        */
       public final ColumnarRow getStruct(int rowId) {
    +    if (isNullAt(rowId)) return null;
    --- End diff --
    
    Good catch!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20438: [SPARK-23272][SQL] add calendar interval type sup...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20438#discussion_r164943981
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/vectorized/ColumnarRow.java ---
    @@ -139,9 +139,7 @@ public UTF8String getUTF8String(int ordinal) {
       @Override
       public CalendarInterval getInterval(int ordinal) {
         if (data.getChild(ordinal).isNullAt(rowId)) return null;
    --- End diff --
    
    Can we remove this null check now?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20438: [SPARK-23272][SQL] add calendar interval type support to...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20438
  
    **[Test build #86819 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86819/testReport)** for PR 20438 at commit [`4e53891`](https://github.com/apache/spark/commit/4e538913901de6cb439e9919e958a60ab698fe24).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20438: [SPARK-23272][SQL] add calendar interval type support to...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20438
  
    **[Test build #86819 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86819/testReport)** for PR 20438 at commit [`4e53891`](https://github.com/apache/spark/commit/4e538913901de6cb439e9919e958a60ab698fe24).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20438: [SPARK-23272][SQL] add calendar interval type sup...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20438#discussion_r164806777
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/vectorized/ColumnVector.java ---
    @@ -236,9 +238,29 @@ public MapData getMap(int ordinal) {
       public abstract byte[] getBinary(int rowId);
     
       /**
    -   * Returns the ordinal's child column vector.
    +   * Returns the calendar interval type value for rowId.
    +   *
    +   * In Spark, calendar interval type value is basically an integer value representing the number of
    +   * months in this interval, and a long value representing the number of microseconds in this
    +   * interval. An interval type vector is the same as a struct type vector with 2 fields: `months`
    +   * and `microseconds`.
    +   *
    +   * To support interval type, implementations must implement {@link #getChild(int)} and define 2
    +   * child vectors: the first child vector is an int type vector, containing all the month values of
    +   * all the interval values in this vector. The second child vector is a long type vector,
    +   * containing all the microsecond values of all the interval values in this vector.
    +   */
    +  public final CalendarInterval getInterval(int rowId) {
    +    if (isNullAt(rowId)) return null;
    +    final int months = getChild(0).getInt(rowId);
    +    final long microseconds = getChild(1).getLong(rowId);
    +    return new CalendarInterval(months, microseconds);
    +  }
    +
    +  /**
    +   * @return child [[ColumnVector]] at the given ordinal.
        */
    -  public abstract ColumnVector getChild(int ordinal);
    +  protected abstract ColumnVector getChild(int ordinal);
    --- End diff --
    
    Oh, I see. Now, it became `protected`.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20438: [SPARK-23272][SQL] add calendar interval type support to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20438
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20438: [SPARK-23272][SQL] add calendar interval type sup...

Posted by jaceklaskowski <gi...@git.apache.org>.
Github user jaceklaskowski commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20438#discussion_r164729684
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/vectorized/ColumnVector.java ---
    @@ -235,10 +237,30 @@ public MapData getMap(int ordinal) {
        */
       public abstract byte[] getBinary(int rowId);
     
    +  /**
    +   * Returns the calendar interval type value for rowId.
    +   *
    +   * In Spark, calendar interval type value is basically an integer value representing the number of
    +   * months in this interval, and a long value representing the number of microseconds in this
    +   * interval. A interval type vector is same as a struct type vector with 2 fields: `months` and
    +   * `microseconds`.
    +   *
    +   * To support interval type, implementations must implement {@link #getChild(int)} and define 2
    +   * child vectors: the first child vector is a int type vector, containing all the month values of
    +   * all the interval values in this vector. The second child vector is a long type vector,
    +   * containing all the microsecond values of all the interval values in this vector.
    +   */
    +  public final CalendarInterval getInterval(int rowId) {
    +    if (isNullAt(rowId)) return null;
    +    final int months = getChild(0).getInt(rowId);
    +    final long microseconds = getChild(1).getLong(rowId);
    +    return new CalendarInterval(months, microseconds);
    +  }
    +
       /**
        * Returns the ordinal's child column vector.
    --- End diff --
    
    `@return [[ColumnVector]] at the ordinal`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20438: [SPARK-23272][SQL] add calendar interval type support to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20438
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/386/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20438: [SPARK-23272][SQL] add calendar interval type support to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20438
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20438: [SPARK-23272][SQL] add calendar interval type support to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20438
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86819/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20438: [SPARK-23272][SQL] add calendar interval type sup...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20438#discussion_r164730300
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/vectorized/ColumnarArray.java ---
    @@ -135,9 +135,7 @@ public UTF8String getUTF8String(int ordinal) {
     
       @Override
       public CalendarInterval getInterval(int ordinal) {
    -    int month = data.getChild(0).getInt(offset + ordinal);
    -    long microseconds = data.getChild(1).getLong(offset + ordinal);
    -    return new CalendarInterval(month, microseconds);
    +    return data.getInterval(offset + ordinal);
    --- End diff --
    
    Good catch! I'm going to require `ColumnVector.getXXX` to return null if the value is null, but I'll do it in another PR, to update all the documents and define the behavior of batched `getXXX` methods.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20438: [SPARK-23272][SQL] add calendar interval type support to...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on the issue:

    https://github.com/apache/spark/pull/20438
  
    LGTM.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20438: [SPARK-23272][SQL] add calendar interval type support to...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/20438
  
    cc @sameeragarwal @viirya @kiszk @ueshin @gatorsmile 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20438: [SPARK-23272][SQL] add calendar interval type sup...

Posted by jaceklaskowski <gi...@git.apache.org>.
Github user jaceklaskowski commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20438#discussion_r164728359
  
    --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/types/CalendarInterval.java ---
    @@ -87,7 +87,7 @@ public static CalendarInterval fromString(String s) {
         }
       }
     
    -  public static long toLongWithRange(String fieldName,
    +  private static long toLongWithRange(String fieldName,
    --- End diff --
    
    Why?! It's much harder (if at all possible) to test `private` methods (been bitten few times this week and remember the pain).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20438: [SPARK-23272][SQL] add calendar interval type sup...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20438#discussion_r164807808
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/vectorized/ColumnVector.java ---
    @@ -236,9 +238,29 @@ public MapData getMap(int ordinal) {
       public abstract byte[] getBinary(int rowId);
     
       /**
    -   * Returns the ordinal's child column vector.
    +   * Returns the calendar interval type value for rowId.
    +   *
    +   * In Spark, calendar interval type value is basically an integer value representing the number of
    +   * months in this interval, and a long value representing the number of microseconds in this
    +   * interval. An interval type vector is the same as a struct type vector with 2 fields: `months`
    +   * and `microseconds`.
    +   *
    +   * To support interval type, implementations must implement {@link #getChild(int)} and define 2
    +   * child vectors: the first child vector is an int type vector, containing all the month values of
    +   * all the interval values in this vector. The second child vector is a long type vector,
    +   * containing all the microsecond values of all the interval values in this vector.
    +   */
    +  public final CalendarInterval getInterval(int rowId) {
    +    if (isNullAt(rowId)) return null;
    +    final int months = getChild(0).getInt(rowId);
    +    final long microseconds = getChild(1).getLong(rowId);
    +    return new CalendarInterval(months, microseconds);
    +  }
    +
    +  /**
    +   * @return child [[ColumnVector]] at the given ordinal.
        */
    -  public abstract ColumnVector getChild(int ordinal);
    +  protected abstract ColumnVector getChild(int ordinal);
    --- End diff --
    
    Since `ColumnVector` is public, could you add some description in PR description for this kind of visibility change?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20438: [SPARK-23272][SQL] add calendar interval type sup...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20438#discussion_r164942783
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/vectorized/ColumnVector.java ---
    @@ -236,9 +238,29 @@ public MapData getMap(int ordinal) {
       public abstract byte[] getBinary(int rowId);
     
       /**
    -   * Returns the ordinal's child column vector.
    +   * Returns the calendar interval type value for rowId.
    +   *
    +   * In Spark, calendar interval type value is basically an integer value representing the number of
    +   * months in this interval, and a long value representing the number of microseconds in this
    +   * interval. An interval type vector is the same as a struct type vector with 2 fields: `months`
    +   * and `microseconds`.
    +   *
    +   * To support interval type, implementations must implement {@link #getChild(int)} and define 2
    +   * child vectors: the first child vector is an int type vector, containing all the month values of
    +   * all the interval values in this vector. The second child vector is a long type vector,
    +   * containing all the microsecond values of all the interval values in this vector.
    +   */
    +  public final CalendarInterval getInterval(int rowId) {
    +    if (isNullAt(rowId)) return null;
    +    final int months = getChild(0).getInt(rowId);
    +    final long microseconds = getChild(1).getLong(rowId);
    +    return new CalendarInterval(months, microseconds);
    +  }
    +
    +  /**
    +   * @return child [[ColumnVector]] at the given ordinal.
        */
    -  public abstract ColumnVector getChild(int ordinal);
    +  protected abstract ColumnVector getChild(int ordinal);
    --- End diff --
    
    added


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20438: [SPARK-23272][SQL] add calendar interval type support to...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20438
  
    **[Test build #86825 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86825/testReport)** for PR 20438 at commit [`2f23a1d`](https://github.com/apache/spark/commit/2f23a1d4a6f6968b1c1209b94ca340ca25cc67e1).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20438: [SPARK-23272][SQL] add calendar interval type sup...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20438#discussion_r164801042
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/vectorized/ColumnVector.java ---
    @@ -235,10 +237,30 @@ public MapData getMap(int ordinal) {
        */
       public abstract byte[] getBinary(int rowId);
     
    +  /**
    +   * Returns the calendar interval type value for rowId.
    +   *
    +   * In Spark, calendar interval type value is basically an integer value representing the number of
    +   * months in this interval, and a long value representing the number of microseconds in this
    +   * interval. A interval type vector is same as a struct type vector with 2 fields: `months` and
    +   * `microseconds`.
    +   *
    +   * To support interval type, implementations must implement {@link #getChild(int)} and define 2
    +   * child vectors: the first child vector is a int type vector, containing all the month values of
    +   * all the interval values in this vector. The second child vector is a long type vector,
    +   * containing all the microsecond values of all the interval values in this vector.
    +   */
    +  public final CalendarInterval getInterval(int rowId) {
    +    if (isNullAt(rowId)) return null;
    +    final int months = getChild(0).getInt(rowId);
    --- End diff --
    
    It's from the previous code, probably it tries to make the JVM happy and run the code faster.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20438: [SPARK-23272][SQL] add calendar interval type sup...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/20438


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20438: [SPARK-23272][SQL] add calendar interval type sup...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20438#discussion_r164971509
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/MutableColumnarRow.java ---
    @@ -146,9 +146,7 @@ public UTF8String getUTF8String(int ordinal) {
       @Override
       public CalendarInterval getInterval(int ordinal) {
         if (columns[ordinal].isNullAt(rowId)) return null;
    --- End diff --
    
    Let me try to prepare a PR tonight.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org