Posted to reviews@spark.apache.org by nongli <gi...@git.apache.org> on 2016/01/28 00:46:08 UTC

[GitHub] spark pull request: [SPARK-13043][SQL] Implement remaining catalys...

GitHub user nongli opened a pull request:

    https://github.com/apache/spark/pull/10961

    [SPARK-13043][SQL] Implement remaining catalyst types in ColumnarBatch.

    This includes: float, boolean, short, decimal and calendar interval.
    
    Decimal is mapped to long or byte array depending on the size and calendar
    interval is mapped to a struct of int and long.
    
    The only remaining type is map. The schema mapping is straightforward but
    we might want to revisit how we deal with this in the rest of the execution
    engine.
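
    The size-based decimal mapping described above can be sketched as follows. This is a minimal illustration, not the PR's code: `DecimalStorage` and `storageFor` are hypothetical names, and the 18-digit cutoff mirrors Spark's `Decimal.MAX_LONG_DIGITS()` (the largest precision whose unscaled value fits in a long). The calendar-interval side of the mapping is a struct of an int (months) and a long (microseconds).

    ```java
    // Sketch of the decimal-to-physical-type mapping described above.
    // Assumption: an unscaled long holds up to 18 decimal digits, matching
    // Spark's Decimal.MAX_LONG_DIGITS(); all names here are illustrative.
    public class DecimalStorage {
        static final int MAX_LONG_DIGITS = 18;

        // Decide the physical column representation for a given precision:
        // small decimals store the unscaled value in a long column, larger
        // ones fall back to a byte-array (BigInteger-style) column.
        static String storageFor(int precision) {
            return precision <= MAX_LONG_DIGITS ? "long" : "byte[]";
        }

        public static void main(String[] args) {
            System.out.println(storageFor(10)); // fits in a long
            System.out.println(storageFor(38)); // needs the byte-array form
        }
    }
    ```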

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/nongli/spark spark-13043

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10961.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #10961
    
----
commit 24ca13c7f8b2ac5fbc4a9600539bb02d22b56a91
Author: Nong Li <no...@databricks.com>
Date:   2016-01-27T06:22:48Z

    [SPARK-13043][SQL] Implement remaining catalyst types in ColumnarBatch.
    
    This includes: float, boolean, short, decimal and calendar interval.
    
    Decimal is mapped to long or byte array depending on the size and calendar
    interval is mapped to a struct of int and long.
    
    The only remaining type is map. The schema mapping is straightforward but
    we might want to revisit how we deal with this in the rest of the execution
    engine.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13043][SQL] Implement remaining catalys...

Posted by nongli <gi...@git.apache.org>.
Github user nongli commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10961#discussion_r51299383
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVectorUtils.java ---
    @@ -59,19 +62,44 @@ public static Object toPrimitiveJavaArray(ColumnVector.Array array) {
     
       private static void appendValue(ColumnVector dst, DataType t, Object o) {
         if (o == null) {
    -      dst.appendNull();
    +      if (t instanceof CalendarIntervalType) {
    +        dst.appendStruct(true);
    +      } else {
    +        dst.appendNull();
    +      }
         } else {
    -      if (t == DataTypes.ByteType) {
    -        dst.appendByte(((Byte)o).byteValue());
    +      if (t == DataTypes.BooleanType) {
    --- End diff --
    
    None of the functions in this file are perf critical. That's largely why I didn't put them in the ColumnVector class. This needs to be codegened whenever we need to do row -> ColumnarBatch conversion.




[GitHub] spark pull request: [SPARK-13043][SQL] Implement remaining catalys...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10961#discussion_r51232603
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java ---
    @@ -126,12 +147,25 @@ public ArrayData copy() {
                 list[i] = data.getLong(offset + i);
               }
             }
    +      } else if (dt instanceof DecimalType) {
    +        DecimalType decType = (DecimalType)dt;
    +        for (int i = 0; i < length; i++) {
    +          if (!data.getIsNull(offset + i)) {
    +            list[i] = getDecimal(i, decType.precision(), decType.scale());
    --- End diff --
    
    Can the compiler pull `decType.precision()` out of the loop?




[GitHub] spark pull request: [SPARK-13043][SQL] Implement remaining catalys...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10961#discussion_r51233270
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVectorUtils.java ---
    @@ -59,19 +62,44 @@ public static Object toPrimitiveJavaArray(ColumnVector.Array array) {
     
       private static void appendValue(ColumnVector dst, DataType t, Object o) {
         if (o == null) {
    -      dst.appendNull();
    +      if (t instanceof CalendarIntervalType) {
    +        dst.appendStruct(true);
    +      } else {
    +        dst.appendNull();
    +      }
         } else {
    -      if (t == DataTypes.ByteType) {
    -        dst.appendByte(((Byte)o).byteValue());
    +      if (t == DataTypes.BooleanType) {
    --- End diff --
    
    Is this method used in a performance critical path or not?




[GitHub] spark pull request: [SPARK-13043][SQL] Implement remaining catalys...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10961#discussion_r51232913
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java ---
    @@ -354,6 +435,33 @@ public void reset() {
       /**
        * Sets the value at rowId to `value`.
        */
    +  public abstract void putFloat(int rowId, float value);
    +
    +  /**
    +   * Sets values from [rowId, rowId + count) to value.
    +   */
    +  public abstract void putFloats(int rowId, int count, float value);
    +
    +  /**
    +   * Sets values from [rowId, rowId + count) to [src + srcIndex, src + srcIndex + count)
    +   * src should contain `count` doubles written as ieee format.
    +   */
    +  public abstract void putFloats(int rowId, int count, float[] src, int srcIndex);
    +
    +  /**
    +   * Sets values from [rowId, rowId + count) to [src[srcIndex], src[srcIndex + count])
    +   * The data in src must be ieee formatted floats.
    --- End diff --
    
    What about the endianness?




[GitHub] spark pull request: [SPARK-13043][SQL] Implement remaining catalys...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10961#issuecomment-175917592
  
    **[Test build #50241 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50241/consoleFull)** for PR 10961 at commit [`24ca13c`](https://github.com/apache/spark/commit/24ca13c7f8b2ac5fbc4a9600539bb02d22b56a91).




[GitHub] spark pull request: [SPARK-13043][SQL] Implement remaining catalys...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10961#discussion_r51088303
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OffHeapColumnVector.java ---
    @@ -217,6 +269,41 @@ public final long getLong(int rowId) {
       }
     
       //
    +  // APIs dealing with floats
    +  //
    +
    +  @Override
    +  public final void putFloat(int rowId, float value) {
    +    Platform.putFloat(null, data + rowId * 4, value);
    +  }
    +
    +  @Override
    +  public final void putFloats(int rowId, int count, float value) {
    +    long offset = data + 4 * rowId;
    +    for (int i = 0; i < count; ++i, offset += 4) {
    +      Platform.putFloat(null, offset, value);
    +    }
    +  }
    +
    +  @Override
    +  public final void putFloats(int rowId, int count, float[] src, int srcIndex) {
    +    Platform.copyMemory(src, Platform.DOUBLE_ARRAY_OFFSET + srcIndex * 4,
    +        null, data + 4 * rowId, count * 4);
    +  }
    +
    +  @Override
    +  public final void putFloats(int rowId, int count, byte[] src, int srcIndex) {
    +    Platform.copyMemory(src, Platform.DOUBLE_ARRAY_OFFSET + srcIndex,
    --- End diff --
    
    if this is a problem we should fix other places too




[GitHub] spark pull request: [SPARK-13043][SQL] Implement remaining catalys...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/10961#issuecomment-176938191
  
    LGTM




[GitHub] spark pull request: [SPARK-13043][SQL] Implement remaining catalys...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10961#discussion_r51233211
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVectorUtils.java ---
    @@ -59,19 +62,44 @@ public static Object toPrimitiveJavaArray(ColumnVector.Array array) {
     
       private static void appendValue(ColumnVector dst, DataType t, Object o) {
         if (o == null) {
    -      dst.appendNull();
    +      if (t instanceof CalendarIntervalType) {
    +        dst.appendStruct(true);
    +      } else {
    +        dst.appendNull();
    +      }
         } else {
    -      if (t == DataTypes.ByteType) {
    -        dst.appendByte(((Byte)o).byteValue());
    +      if (t == DataTypes.BooleanType) {
    --- End diff --
    
    Would a switch be better than this if/else chain?




[GitHub] spark pull request: [SPARK-13043][SQL] Implement remaining catalys...

Posted by nongli <gi...@git.apache.org>.
Github user nongli commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10961#discussion_r51167501
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java ---
    @@ -170,7 +204,14 @@ public float getFloat(int ordinal) {
     
         @Override
         public Decimal getDecimal(int ordinal, int precision, int scale) {
    -      throw new NotImplementedException();
    +      if (precision <= Decimal.MAX_LONG_DIGITS()) {
    --- End diff --
    
    Not easily. This is implementing an existing interface. Codegen should fix this, but we should revisit it.




[GitHub] spark pull request: [SPARK-13043][SQL] Implement remaining catalys...

Posted by nongli <gi...@git.apache.org>.
Github user nongli commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10961#discussion_r51299048
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java ---
    @@ -126,12 +147,25 @@ public ArrayData copy() {
                 list[i] = data.getLong(offset + i);
               }
             }
    +      } else if (dt instanceof DecimalType) {
    +        DecimalType decType = (DecimalType)dt;
    +        for (int i = 0; i < length; i++) {
    +          if (!data.getIsNull(offset + i)) {
    +            list[i] = getDecimal(i, decType.precision(), decType.scale());
    --- End diff --
    
    No idea. I hope so but I'll pull it out here.
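
    The hoist being discussed can be sketched like this. `DecType` is a hypothetical stand-in for Spark's DecimalType and the loop body is a placeholder; the point is only reading the loop-invariant accessors once rather than relying on the JIT to do it.

    ```java
    // Sketch of hoisting the loop-invariant precision/scale accessors
    // out of the per-row loop, per the review suggestion. DecType is a
    // hypothetical stand-in for Spark's DecimalType.
    public class HoistSketch {
        static class DecType {
            final int p, s;
            DecType(int p, int s) { this.p = p; this.s = s; }
            int precision() { return p; }
            int scale() { return s; }
        }

        static long sumWithMeta(DecType dt, long[] data) {
            // Read the invariants once, instead of calling dt.precision()
            // and dt.scale() on every iteration of the copy loop.
            final int precision = dt.precision();
            final int scale = dt.scale();
            long total = 0;
            for (long v : data) {
                total += v + precision + scale; // placeholder per-row work
            }
            return total;
        }

        public static void main(String[] args) {
            System.out.println(sumWithMeta(new DecType(38, 18), new long[]{1, 2, 3}));
        }
    }
    ```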




[GitHub] spark pull request: [SPARK-13043][SQL] Implement remaining catalys...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10961#discussion_r51232873
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java ---
    @@ -282,6 +328,21 @@ public void reset() {
       /**
        * Sets the value at rowId to `value`.
        */
    +  public abstract void putBoolean(int rowId, boolean value);
    +
    +  /**
    +   * Sets values from [rowId, rowId + count) to value.
    +   */
    +  public abstract void putBooleans(int rowId, int count, boolean value);
    --- End diff --
    
    Do we need this `putBooleans(int rowId, int count, boolean[] src, int srcIndex)`?




[GitHub] spark pull request: [SPARK-13043][SQL] Implement remaining catalys...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10961#discussion_r51088280
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OffHeapColumnVector.java ---
    @@ -217,6 +269,41 @@ public final long getLong(int rowId) {
       }
     
       //
    +  // APIs dealing with floats
    +  //
    +
    +  @Override
    +  public final void putFloat(int rowId, float value) {
    +    Platform.putFloat(null, data + rowId * 4, value);
    +  }
    +
    +  @Override
    +  public final void putFloats(int rowId, int count, float value) {
    +    long offset = data + 4 * rowId;
    +    for (int i = 0; i < count; ++i, offset += 4) {
    +      Platform.putFloat(null, offset, value);
    +    }
    +  }
    +
    +  @Override
    +  public final void putFloats(int rowId, int count, float[] src, int srcIndex) {
    +    Platform.copyMemory(src, Platform.DOUBLE_ARRAY_OFFSET + srcIndex * 4,
    +        null, data + 4 * rowId, count * 4);
    +  }
    +
    +  @Override
    +  public final void putFloats(int rowId, int count, byte[] src, int srcIndex) {
    +    Platform.copyMemory(src, Platform.DOUBLE_ARRAY_OFFSET + srcIndex,
    --- End diff --
    
    byte offset here?




[GitHub] spark pull request: [SPARK-13043][SQL] Implement remaining catalys...

Posted by nongli <gi...@git.apache.org>.
Github user nongli commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10961#discussion_r51305930
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java ---
    @@ -126,12 +147,25 @@ public ArrayData copy() {
                 list[i] = data.getLong(offset + i);
               }
             }
    +      } else if (dt instanceof DecimalType) {
    +        DecimalType decType = (DecimalType)dt;
    +        for (int i = 0; i < length; i++) {
    +          if (!data.getIsNull(offset + i)) {
    +            list[i] = getDecimal(i, decType.precision(), decType.scale());
    +          }
    +        }
           } else if (dt instanceof StringType) {
             for (int i = 0; i < length; i++) {
               if (!data.getIsNull(offset + i)) {
                 list[i] = ColumnVectorUtils.toString(data.getByteArray(offset + i));
               }
             }
    +      } else if (dt instanceof CalendarIntervalType) {
    --- End diff --
    
    This is exposing a missing implementation. We never generate arrays of complex types in the testing currently, but implementing it is quite a bit more work. The issue is that to materialize the list of nested types inside the array, we need a deep copy of everything.
    
    e.g. array<struct> means we need to copy the row for each struct.
    
    This patch is big enough that I'd prefer to defer. Also, this is extremely expensive, so we might want to rethink how we do it; for example, having getArray return an iterator instead of the list.




[GitHub] spark pull request: [SPARK-13043][SQL] Implement remaining catalys...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10961#issuecomment-176348260
  
    **[Test build #50287 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50287/consoleFull)** for PR 10961 at commit [`ec397e3`](https://github.com/apache/spark/commit/ec397e388c446fdf33b166f123589740222949f2).




[GitHub] spark pull request: [SPARK-13043][SQL] Implement remaining catalys...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10961#issuecomment-176414955
  
    **[Test build #50287 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50287/consoleFull)** for PR 10961 at commit [`ec397e3`](https://github.com/apache/spark/commit/ec397e388c446fdf33b166f123589740222949f2).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-13043][SQL] Implement remaining catalys...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10961#issuecomment-176415357
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-13043][SQL] Implement remaining catalys...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10961#discussion_r51087836
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java ---
    @@ -170,7 +204,14 @@ public float getFloat(int ordinal) {
     
         @Override
         public Decimal getDecimal(int ordinal, int precision, int scale) {
    -      throw new NotImplementedException();
    +      if (precision <= Decimal.MAX_LONG_DIGITS()) {
    --- End diff --
    
    can we move the if branch outside?





[GitHub] spark pull request: [SPARK-13043][SQL] Implement remaining catalys...

Posted by nongli <gi...@git.apache.org>.
Github user nongli commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10961#discussion_r51299238
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java ---
    @@ -354,6 +435,33 @@ public void reset() {
       /**
        * Sets the value at rowId to `value`.
        */
    +  public abstract void putFloat(int rowId, float value);
    +
    +  /**
    +   * Sets values from [rowId, rowId + count) to value.
    +   */
    +  public abstract void putFloats(int rowId, int count, float value);
    +
    +  /**
    +   * Sets values from [rowId, rowId + count) to [src + srcIndex, src + srcIndex + count)
    +   * src should contain `count` doubles written as ieee format.
    +   */
    +  public abstract void putFloats(int rowId, int count, float[] src, int srcIndex);
    +
    +  /**
    +   * Sets values from [rowId, rowId + count) to [src[srcIndex], src[srcIndex + count])
    +   * The data in src must be ieee formatted floats.
    --- End diff --
    
    IEEE formatted.




[GitHub] spark pull request: [SPARK-13043][SQL] Implement remaining catalys...

Posted by nongli <gi...@git.apache.org>.
Github user nongli commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10961#discussion_r51299173
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java ---
    @@ -282,6 +328,21 @@ public void reset() {
       /**
        * Sets the value at rowId to `value`.
        */
    +  public abstract void putBoolean(int rowId, boolean value);
    +
    +  /**
    +   * Sets values from [rowId, rowId + count) to value.
    +   */
    +  public abstract void putBooleans(int rowId, int count, boolean value);
    --- End diff --
    
    The intended caller of the batched APIs is Parquet, which doesn't do booleans this way. We can add it when we need it.




[GitHub] spark pull request: [SPARK-13043][SQL] Implement remaining catalys...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/10961#issuecomment-178214178
  
    Going to merge this. Thanks.





[GitHub] spark pull request: [SPARK-13043][SQL] Implement remaining catalys...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10961#discussion_r51088273
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OffHeapColumnVector.java ---
    @@ -217,6 +269,41 @@ public final long getLong(int rowId) {
       }
     
       //
    +  // APIs dealing with floats
    +  //
    +
    +  @Override
    +  public final void putFloat(int rowId, float value) {
    +    Platform.putFloat(null, data + rowId * 4, value);
    +  }
    +
    +  @Override
    +  public final void putFloats(int rowId, int count, float value) {
    +    long offset = data + 4 * rowId;
    +    for (int i = 0; i < count; ++i, offset += 4) {
    +      Platform.putFloat(null, offset, value);
    +    }
    +  }
    +
    +  @Override
    +  public final void putFloats(int rowId, int count, float[] src, int srcIndex) {
    +    Platform.copyMemory(src, Platform.DOUBLE_ARRAY_OFFSET + srcIndex * 4,
    --- End diff --
    
    float offset rather than double offset?




[GitHub] spark pull request: [SPARK-13043][SQL] Implement remaining catalys...

Posted by nongli <gi...@git.apache.org>.
Github user nongli commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10961#discussion_r51168028
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OffHeapColumnVector.java ---
    @@ -217,6 +269,41 @@ public final long getLong(int rowId) {
       }
     
       //
    +  // APIs dealing with floats
    +  //
    +
    +  @Override
    +  public final void putFloat(int rowId, float value) {
    +    Platform.putFloat(null, data + rowId * 4, value);
    +  }
    +
    +  @Override
    +  public final void putFloats(int rowId, int count, float value) {
    +    long offset = data + 4 * rowId;
    +    for (int i = 0; i < count; ++i, offset += 4) {
    +      Platform.putFloat(null, offset, value);
    +    }
    +  }
    +
    +  @Override
    +  public final void putFloats(int rowId, int count, float[] src, int srcIndex) {
    +    Platform.copyMemory(src, Platform.DOUBLE_ARRAY_OFFSET + srcIndex * 4,
    +        null, data + 4 * rowId, count * 4);
    +  }
    +
    +  @Override
    +  public final void putFloats(int rowId, int count, byte[] src, int srcIndex) {
    +    Platform.copyMemory(src, Platform.DOUBLE_ARRAY_OFFSET + srcIndex,
    --- End diff --
    
    Done. Not sure if they are ever different.
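[Editorial note on the offset question: on stock HotSpot JVMs the base offsets of `float[]` and `double[]` happen to coincide, though nothing in the JVM spec guarantees it. A quick standalone check, using `sun.misc.Unsafe` directly rather than Spark's `Platform` wrapper:]

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

public class ArrayOffsets {
    public static void main(String[] args) throws Exception {
        // Grab the Unsafe singleton via reflection (it is not public API).
        Field f = Unsafe.class.getDeclaredField("theUnsafe");
        f.setAccessible(true);
        Unsafe unsafe = (Unsafe) f.get(null);

        // Base offset of the first element of each array type.
        long floatOffset = unsafe.arrayBaseOffset(float[].class);
        long doubleOffset = unsafe.arrayBaseOffset(double[].class);
        System.out.println("float[]  base offset: " + floatOffset);
        System.out.println("double[] base offset: " + doubleOffset);
    }
}
```

[Even where the two values agree in practice, using `FLOAT_ARRAY_OFFSET` for a `float[]` source documents intent and stays correct on any JVM where they differ.]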




[GitHub] spark pull request: [SPARK-13043][SQL] Implement remaining catalys...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10961#discussion_r51301088
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java ---
    @@ -354,6 +435,33 @@ public void reset() {
       /**
        * Sets the value at rowId to `value`.
        */
    +  public abstract void putFloat(int rowId, float value);
    +
    +  /**
    +   * Sets values from [rowId, rowId + count) to value.
    +   */
    +  public abstract void putFloats(int rowId, int count, float value);
    +
    +  /**
    +   * Sets values from [rowId, rowId + count) to [src + srcIndex, src + srcIndex + count)
    +   * src should contain `count` doubles written as ieee format.
    +   */
    +  public abstract void putFloats(int rowId, int count, float[] src, int srcIndex);
    +
    +  /**
    +   * Sets values from [rowId, rowId + count) to [src[srcIndex], src[srcIndex + count])
    +   * The data in src must be ieee formatted floats.
    --- End diff --
    
    IEEE 754 does not specify the endianness of floating-point numbers; Parquet uses little-endian byte order.

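[Editorial note to make the distinction concrete: IEEE 754 fixes the bit pattern of a float, e.g. `1.0f` is `0x3F800000`, but not the order in which those four bytes are laid out in memory; little-endian, which Parquet writes, stores the least-significant byte first:]

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class FloatBytes {
    public static void main(String[] args) {
        // IEEE 754 single-precision bit pattern of 1.0f
        int bits = Float.floatToIntBits(1.0f);
        System.out.printf("bits: 0x%08X%n", bits);   // 0x3F800000

        // The same value serialized little-endian (least-significant byte first)
        ByteBuffer buf = ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN);
        buf.putFloat(1.0f);
        byte[] b = buf.array();
        System.out.printf("little-endian bytes: %02X %02X %02X %02X%n",
            b[0] & 0xFF, b[1] & 0xFF, b[2] & 0xFF, b[3] & 0xFF); // 00 00 80 3F
    }
}
```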



[GitHub] spark pull request: [SPARK-13043][SQL] Implement remaining catalys...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/10961




[GitHub] spark pull request: [SPARK-13043][SQL] Implement remaining catalys...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10961#issuecomment-176415361
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50287/
    Test PASSed.




[GitHub] spark pull request: [SPARK-13043][SQL] Implement remaining catalys...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10961#discussion_r51232647
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java ---
    @@ -126,12 +147,25 @@ public ArrayData copy() {
                 list[i] = data.getLong(offset + i);
               }
             }
    +      } else if (dt instanceof DecimalType) {
    +        DecimalType decType = (DecimalType)dt;
    +        for (int i = 0; i < length; i++) {
    +          if (!data.getIsNull(offset + i)) {
    +            list[i] = getDecimal(i, decType.precision(), decType.scale());
    +          }
    +        }
           } else if (dt instanceof StringType) {
             for (int i = 0; i < length; i++) {
               if (!data.getIsNull(offset + i)) {
                 list[i] = ColumnVectorUtils.toString(data.getByteArray(offset + i));
               }
             }
    +      } else if (dt instanceof CalendarIntervalType) {
    --- End diff --
    
    Should we also support Binary?




[GitHub] spark pull request: [SPARK-13043][SQL] Implement remaining catalys...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/10961#issuecomment-176618365
  
    cc @davies for a more detailed review




[GitHub] spark pull request: [SPARK-13043][SQL] Implement remaining catalys...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10961#discussion_r51087484
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/DecimalType.scala ---
    @@ -148,6 +148,28 @@ object DecimalType extends AbstractDataType {
         }
       }
     
    +  /**
    +   * Returns if dt is a DecimalType that fits inside a long
    +   */
    +  def isLongDecimalType(dt: DataType): Boolean = {
    --- End diff --
    
    is64bitDecimal?
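[Editorial note: either name refers to the same bound; a decimal's unscaled value fits in a signed 64-bit long whenever its precision is at most 18 digits, since 10^18 - 1 < 2^63 - 1. A minimal sketch of that check, illustrative only and not Spark's actual implementation:]

```java
import java.math.BigInteger;

public class LongDecimal {
    // 10^18 - 1 = 999999999999999999 fits in a long;
    // 10^19 - 1 would not (Long.MAX_VALUE is 9223372036854775807, 19 digits).
    static final int MAX_LONG_DIGITS = 18;

    static boolean fitsInLong(int precision) {
        return precision <= MAX_LONG_DIGITS;
    }

    public static void main(String[] args) {
        BigInteger maxUnscaled = BigInteger.TEN.pow(18).subtract(BigInteger.ONE);
        System.out.println(maxUnscaled.bitLength() <= 63); // true: fits in a long
        System.out.println(fitsInLong(18)); // true
        System.out.println(fitsInLong(19)); // false
    }
}
```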

