You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by nongli <gi...@git.apache.org> on 2016/01/19 00:42:34 UTC

[GitHub] spark pull request: [WIP] [SPARK-12854][SQL] Implement complex typ...

GitHub user nongli opened a pull request:

    https://github.com/apache/spark/pull/10820

    [WIP] [SPARK-12854][SQL] Implement complex types support in ColumnarBatch

    WIP: this patch adds some random row generation testing. The test code needs to be cleaned up as it
    duplicates functionality from else where. The non-test code should be good to review.
    
    This patch adds support for complex types for ColumnarBatch. ColumnarBatch supports structs
    and arrays. There is a simple mapping between the richer catalyst types to these two. Strings
    are treated as an array of bytes.
    
    ColumnarBatch will contain a column for each node of the schema. Non-complex schemas consists
    of just leaf nodes. Structs represent an internal node with one child for each field. Arrays
    are internal nodes with one child. Structs just contain nullability. Arrays contain offsets
    and lengths into the child array. This structure is able to handle arbitrary nesting. It has
    the key property that we maintain columnar throughout and that primitive types are only stored
    in the leaf nodes and contiguous across rows. For example, if the schema is array<array<int>>,
    all of the int data is stored consecutively.
    
    As part of this, this patch adds append APIs in addition to the Put APIs (e.g. putLong(rowid, v)
    vs appendLong(v)). These APIs are necessary when the batch contains variable length elements.
    The vectors are not fixed length and will grow as necessary. This should make the usage a lot
    simpler for the writer.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/nongli/spark spark-12854

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10820.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #10820
    
----
commit 518d9bc64fd687204b157b1b36816620482716bf
Author: Nong Li <no...@databricks.com>
Date:   2016-01-05T04:39:31Z

    [WIP] [SPARK-12854][SQL] Implement complex types support in ColumnarBatch.
    
    WIP: this patch adds some random row generation. The test code needs to be cleaned up as it
    duplicates functionality from else where. The non-test code should be good to review.
    
    This patch adds support for complex types for ColumnarBatch. ColumnarBatch supports structs
    and arrays. There is a simple mapping between the richer catalyst types to these two. Strings
    are treated as an array of bytes.
    
    ColumnarBatch will contain a column for each node of the schema. Non-complex schemas consists
    of just leaf nodes. Structs represent an internal node with one child for each field. Arrays
    are internal nodes with one child. Structs just contain nullability. Arrays contain offsets
    and lengths into the child array. This structure is able to handle arbitrary nesting. It has
    the key property that we maintain columnar throughout and that primitive types are only stored
    in the leaf nodes and contiguous across rows. For example, if the schema is array<array<int>>,
    all of the int data is stored consecutively.
    
    As part of this, this patch adds append APIs in addition to the Put APIs (e.g. putLong(rowid, v)
    vs appendLong(v)). These APIs are necessary when the batch contains variable length elements.
    The vectors are not fixed length and will grow as necessary. This should make the usage a lot
    simpler for the writer.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-12854][SQL] Implement complex types sup...

Posted by davies <gi...@git.apache.org>.

Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10820#discussion_r50783615
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnarBatch.java ---
    @@ -82,25 +91,53 @@ public void close() {
        * Adapter class to interop with existing components that expect internal row. A lot of
        * performance is lost with this translation.
        */
    -  public final class Row extends InternalRow {
    +  public static final class Row extends InternalRow {
         private int rowId;
    +    private final ColumnarBatch parent;
    +    private final int fixedLenRowSize;
    +
    +    private Row(ColumnarBatch parent) {
    +      this.parent = parent;
    +      this.fixedLenRowSize = UnsafeRow.calculateFixedPortionByteSize(parent.numCols());
    +    }
     
         /**
          * Marks this row as being filtered out. This means a subsequent iteration over the rows
          * in this batch will not include this row.
          */
         public final void markFiltered() {
    -      ColumnarBatch.this.markFiltered(rowId);
    +      parent.markFiltered(rowId);
         }
     
         @Override
         public final int numFields() {
    -      return ColumnarBatch.this.numCols();
    +      return parent.numCols();
         }
     
         @Override
    +    /**
    +     * Revisit this. This is expensive.
    --- End diff --
    
    This may be too slow for Join (or any operator that requires to hold a row). Could we use a lazy generated UnsafeProjection here?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [WIP] [SPARK-12854][SQL] Implement complex typ...

Posted by nongli <gi...@git.apache.org>.

Github user nongli commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10820#discussion_r50159410
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java ---
    @@ -17,22 +17,37 @@
     package org.apache.spark.sql.execution.vectorized;
     
     import org.apache.spark.memory.MemoryMode;
    -import org.apache.spark.sql.types.DataType;
    +import org.apache.spark.sql.types.*;
     
     /**
      * This class represents a column of values and provides the main APIs to access the data
      * values. It supports all the types and contains get/put APIs as well as their batched versions.
      * The batched versions are preferable whenever possible.
      *
    - * Most of the APIs take the rowId as a parameter. This is the local 0-based row id for values
    + * To handle nested schemas, ColumnVector has two types: Arrays and Structs. In both cases these
    + * columns have child columns. All of the data is stored in the child columns and the parent column
    + * contains nullability, and in the case of Arrays, the lengths and offsets into the child column.
    --- End diff --
    
    I'll update the comment when I rev the PR but the answer is:
    
    I'm not sure what single "parent" means. The array is a column that stores nullability, lengths and offsets. The child column stores the values, including their nullability. Either one can be independently nullable or not.
    
    Lengths and offsets are encoded like 'ints'.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-12854][SQL] Implement complex types sup...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10820#issuecomment-174748800
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [WIP] [SPARK-12854][SQL] Implement complex typ...

Posted by kiszk <gi...@git.apache.org>.

Github user kiszk commented on the pull request:

    https://github.com/apache/spark/pull/10820#issuecomment-173140566
  
    Will ColumnarBatch become a column version of UnsafeRow? Or, it will become more general columnar store?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-12854][SQL] Implement complex types sup...

Posted by davies <gi...@git.apache.org>.

Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10820#discussion_r50783393
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnarBatch.java ---
    @@ -82,25 +91,53 @@ public void close() {
        * Adapter class to interop with existing components that expect internal row. A lot of
        * performance is lost with this translation.
        */
    -  public final class Row extends InternalRow {
    +  public static final class Row extends InternalRow {
         private int rowId;
    +    private final ColumnarBatch parent;
    +    private final int fixedLenRowSize;
    +
    +    private Row(ColumnarBatch parent) {
    +      this.parent = parent;
    +      this.fixedLenRowSize = UnsafeRow.calculateFixedPortionByteSize(parent.numCols());
    +    }
     
         /**
          * Marks this row as being filtered out. This means a subsequent iteration over the rows
          * in this batch will not include this row.
          */
         public final void markFiltered() {
    -      ColumnarBatch.this.markFiltered(rowId);
    +      parent.markFiltered(rowId);
         }
     
         @Override
         public final int numFields() {
    -      return ColumnarBatch.this.numCols();
    +      return parent.numCols();
         }
     
         @Override
    +    /**
    +     * Revisit this. This is expensive.
    +     */
         public final InternalRow copy() {
    -      throw new NotImplementedException();
    +      UnsafeRow row = new UnsafeRow(parent.numCols());
    --- End diff --
    
    Should we re-use the row?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-12854][SQL] Implement complex types sup...

Posted by nongli <gi...@git.apache.org>.

Github user nongli commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10820#discussion_r50787370
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnarBatch.java ---
    @@ -170,12 +204,13 @@ public final CalendarInterval getInterval(int ordinal) {
     
         @Override
         public final InternalRow getStruct(int ordinal, int numFields) {
    -      throw new NotImplementedException();
    +      return ColumnVectorUtils.toRow(parent.column(ordinal).getStruct(rowId));
         }
     
         @Override
         public final ArrayData getArray(int ordinal) {
    -      throw new NotImplementedException();
    +      ColumnVector.Array array = parent.column(ordinal).getArray(rowId);
    +      return ColumnVectorUtils.toGenericArray(array);
    --- End diff --
    
    I think for this and getStruct, it will be too slow. We are materializing the collection which we should avoid if possible. 
    
    I think we need to clean up InternalRow's interface and then have getStruct return that. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-12854][SQL] Implement complex types sup...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/10820


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #10820: [SPARK-12854][SQL] Implement complex types suppor...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10820#discussion_r151419068
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java ---
    @@ -166,12 +727,63 @@ public void reset() {
       protected boolean anyNullsSet;
     
       /**
    -   * Data type for this column.
    +   * Default size of each array length value. This grows as necessary.
        */
    -  protected final DataType type;
    +  protected static final int DEFAULT_ARRAY_LENGTH = 4;
    +
    +  /**
    +   * Current write cursor (row index) when appending data.
    +   */
    +  protected int elementsAppended;
     
    -  protected ColumnVector(int capacity, DataType type) {
    +  /**
    +   * If this is a nested type (array or struct), the column for the child data.
    +   */
    +  protected final ColumnVector[] childColumns;
    +
    +  /**
    +   * Reusable Array holder for getArray().
    +   */
    +  protected final Array resultArray;
    +
    +  /**
    +   * Reusable Struct holder for getStruct().
    +   */
    +  protected final Struct resultStruct;
    +
    +  /**
    +   * Sets up the common state and also handles creating the child columns if this is a nested
    +   * type.
    +   */
    +  protected ColumnVector(int capacity, DataType type, MemoryMode memMode) {
         this.capacity = capacity;
         this.type = type;
    +
    +    if (type instanceof ArrayType || type instanceof BinaryType || type instanceof StringType) {
    +      DataType childType;
    +      int childCapacity = capacity;
    +      if (type instanceof ArrayType) {
    +        childType = ((ArrayType)type).elementType();
    +      } else {
    +        childType = DataTypes.ByteType;
    +        childCapacity *= DEFAULT_ARRAY_LENGTH;
    --- End diff --
    
    Why only grow the capacity for non-array type?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-12854][SQL] Implement complex types sup...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10820#issuecomment-174744503
  
    **[Test build #50044 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50044/consoleFull)** for PR 10820 at commit [`c58eedf`](https://github.com/apache/spark/commit/c58eedf66e23d11cee6771806c377e9f1a5c667a).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-12854][SQL] Implement complex types sup...

Posted by davies <gi...@git.apache.org>.

Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10820#discussion_r50784640
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java ---
    @@ -42,13 +58,77 @@ public static ColumnVector allocate(int capacity, DataType type, MemoryMode mode
         }
       }
     
    +  /**
    +   * Holder object to return an array. This object is intended to be reused. Callers should
    +   * copy the data out if it needs to be stored.
    +   * TODO: consider adding all the get APIs that are 0 indexed. This would just be a convenience
    +   * wrapper. Measure this for common usage patterns and see if there is an overhead.
    +   *   int getInt(index) {
    +   *     return data.getInt(index + offset);
    +   *   }
    +   */
    +  public static final class Array {
    --- End diff --
    
    Can we make this extend ArrayData? similar to what we have done for Row.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-12854][SQL] Implement complex types sup...

Posted by davies <gi...@git.apache.org>.

Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10820#discussion_r50784556
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java ---
    @@ -99,6 +185,26 @@ public void reset() {
       /**
        * Sets the value at rowId to `value`.
        */
    +  public abstract void putByte(int rowId, byte value);
    --- End diff --
    
    We could separate these primitive types as another PR


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-12854][SQL] Implement complex types sup...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10820#issuecomment-174780170
  
    **[Test build #50046 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50046/consoleFull)** for PR 10820 at commit [`f579889`](https://github.com/apache/spark/commit/f579889413c3978355fb0fb0d0ac02719803cb30).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-12854][SQL] Implement complex types sup...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10820#issuecomment-174780637
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-12854][SQL] Implement complex types sup...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10820#issuecomment-174748426
  
    **[Test build #50029 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50029/consoleFull)** for PR 10820 at commit [`f579889`](https://github.com/apache/spark/commit/f579889413c3978355fb0fb0d0ac02719803cb30).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-12854][SQL] Implement complex types sup...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10820#issuecomment-175260359
  
    **[Test build #2460 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2460/consoleFull)** for PR 10820 at commit [`3d11c37`](https://github.com/apache/spark/commit/3d11c374df4f11e5a535eb69e5f990528ab3847b).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-12854][SQL] Implement complex types sup...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10820#issuecomment-174777552
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-12854][SQL] Implement complex types sup...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10820#issuecomment-174792670
  
    **[Test build #50051 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50051/consoleFull)** for PR 10820 at commit [`2918077`](https://github.com/apache/spark/commit/29180770a4e8ea38349eeee6c5e7671f0857b831).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [WIP] [SPARK-12854][SQL] Implement complex typ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10820#issuecomment-172709705
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49636/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [WIP] [SPARK-12854][SQL] Implement complex typ...

Posted by rxin <gi...@git.apache.org>.

Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/10820#issuecomment-172765612
  
    if the schema is array>,? array of int?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [WIP] [SPARK-12854][SQL] Implement complex typ...

Posted by nongli <gi...@git.apache.org>.

Github user nongli commented on the pull request:

    https://github.com/apache/spark/pull/10820#issuecomment-174698706
  
    @kiszk We're looking to make the execution engine represent data in a columnar way. I'm not sure what you mean by more general columnar store. I usually think of that more for storage.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [WIP] [SPARK-12854][SQL] Implement complex typ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10820#issuecomment-172709703
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-12854][SQL] Implement complex types sup...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10820#issuecomment-174915657
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-12854][SQL] Implement complex types sup...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10820#issuecomment-175198461
  
    **[Test build #50119 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50119/consoleFull)** for PR 10820 at commit [`3d11c37`](https://github.com/apache/spark/commit/3d11c374df4f11e5a535eb69e5f990528ab3847b).
     * This patch **fails RAT tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-12854][SQL] Implement complex types sup...

Posted by davies <gi...@git.apache.org>.

Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10820#discussion_r50784218
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnarBatch.java ---
    @@ -170,12 +204,13 @@ public final CalendarInterval getInterval(int ordinal) {
     
         @Override
         public final InternalRow getStruct(int ordinal, int numFields) {
    -      throw new NotImplementedException();
    +      return ColumnVectorUtils.toRow(parent.column(ordinal).getStruct(rowId));
         }
     
         @Override
         public final ArrayData getArray(int ordinal) {
    -      throw new NotImplementedException();
    +      ColumnVector.Array array = parent.column(ordinal).getArray(rowId);
    +      return ColumnVectorUtils.toGenericArray(array);
    --- End diff --
    
    Will this be too slow?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-12854][SQL] Implement complex types sup...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10820#issuecomment-174780643
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50046/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-12854][SQL] Implement complex types sup...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10820#issuecomment-174777554
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50044/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-12854][SQL] Implement complex types sup...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10820#issuecomment-174748807
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50029/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-12854][SQL] Implement complex types sup...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10820#issuecomment-174914829
  
    **[Test build #50080 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50080/consoleFull)** for PR 10820 at commit [`f378335`](https://github.com/apache/spark/commit/f378335858c1c10400936f046430f8e7f4c70c3c).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `  public static final class Array extends ArrayData `
      * `  public static final class Struct extends InternalRow `


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-12854][SQL] Implement complex types sup...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10820#issuecomment-175316886
  
    **[Test build #2464 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2464/consoleFull)** for PR 10820 at commit [`3d11c37`](https://github.com/apache/spark/commit/3d11c374df4f11e5a535eb69e5f990528ab3847b).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-12854][SQL] Implement complex types sup...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10820#issuecomment-175198473
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50119/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-12854][SQL] Implement complex types sup...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10820#issuecomment-174867931
  
    **[Test build #50080 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50080/consoleFull)** for PR 10820 at commit [`f378335`](https://github.com/apache/spark/commit/f378335858c1c10400936f046430f8e7f4c70c3c).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-12854][SQL] Implement complex types sup...

Posted by davies <gi...@git.apache.org>.

Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10820#discussion_r50782380
  
    --- Diff: unsafe/src/main/java/org/apache/spark/unsafe/Platform.java ---
    @@ -105,6 +105,17 @@ public static void freeMemory(long address) {
         _UNSAFE.freeMemory(address);
       }
     
    +  public static long reallocateMemory(long address, long oldSize, long newSize) {
    --- End diff --
    
    Should this also work for on-heap?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-12854][SQL] Implement complex types sup...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10820#issuecomment-174792786
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50051/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-12854][SQL] Implement complex types sup...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10820#issuecomment-174777402
  
    **[Test build #50044 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50044/consoleFull)** for PR 10820 at commit [`c58eedf`](https://github.com/apache/spark/commit/c58eedf66e23d11cee6771806c377e9f1a5c667a).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-12854][SQL] Implement complex types sup...

Posted by davies <gi...@git.apache.org>.

Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10820#discussion_r50784202
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnarBatch.java ---
    @@ -170,12 +204,13 @@ public final CalendarInterval getInterval(int ordinal) {
     
         @Override
         public final InternalRow getStruct(int ordinal, int numFields) {
    -      throw new NotImplementedException();
    +      return ColumnVectorUtils.toRow(parent.column(ordinal).getStruct(rowId));
    --- End diff --
    
    Will this be too slow?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [WIP] [SPARK-12854][SQL] Implement complex typ...

Posted by rxin <gi...@git.apache.org>.

Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10820#discussion_r50080842
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java ---
    @@ -17,22 +17,37 @@
     package org.apache.spark.sql.execution.vectorized;
     
     import org.apache.spark.memory.MemoryMode;
    -import org.apache.spark.sql.types.DataType;
    +import org.apache.spark.sql.types.*;
     
     /**
      * This class represents a column of values and provides the main APIs to access the data
      * values. It supports all the types and contains get/put APIs as well as their batched versions.
      * The batched versions are preferable whenever possible.
      *
    - * Most of the APIs take the rowId as a parameter. This is the local 0-based row id for values
    + * To handle nested schemas, ColumnVector has two types: Arrays and Structs. In both cases these
    + * columns have child columns. All of the data is stored in the child columns and the parent column
    + * contains nullability, and in the case of Arrays, the lengths and offsets into the child column.
    --- End diff --
    
    can you explain how lengths and offsets are stored? also is there a single "parent" column that encodes nullability, length, and offset?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-12854][SQL] Implement complex types sup...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10820#issuecomment-174792785
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [WIP] [SPARK-12854][SQL] Implement complex typ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10820#issuecomment-174707695
  
    **[Test build #50029 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50029/consoleFull)** for PR 10820 at commit [`f579889`](https://github.com/apache/spark/commit/f579889413c3978355fb0fb0d0ac02719803cb30).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [WIP] [SPARK-12854][SQL] Implement complex typ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10820#issuecomment-172709449
  
    **[Test build #49636 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49636/consoleFull)** for PR 10820 at commit [`518d9bc`](https://github.com/apache/spark/commit/518d9bc64fd687204b157b1b36816620482716bf).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `  public static final class Array `
      * `  public static final class Struct `
      * `public class ColumnVectorUtils `
      * `  public static final class Row extends InternalRow `


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-12854][SQL] Implement complex types sup...

Posted by nongli <gi...@git.apache.org>.

Github user nongli commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10820#discussion_r50786762
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java ---
    @@ -99,6 +185,26 @@ public void reset() {
       /**
        * Sets the value at rowId to `value`.
        */
    +  public abstract void putByte(int rowId, byte value);
    --- End diff --
    
    Yea, we're still missing those two types and probably decimal. I'll do those in a follow up.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-12854][SQL] Implement complex types sup...

Posted by rxin <gi...@git.apache.org>.

Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/10820#issuecomment-175332571
  
    Going to merge this in master. Thanks.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-12854][SQL] Implement complex types sup...

Posted by davies <gi...@git.apache.org>.

Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10820#discussion_r50783916
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java ---
    @@ -99,6 +185,26 @@ public void reset() {
       /**
        * Sets the value at rowId to `value`.
        */
    +  public abstract void putByte(int rowId, byte value);
    --- End diff --
    
    Should it also have putShort()/putFloat()?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-12854][SQL] Implement complex types sup...

Posted by nongli <gi...@git.apache.org>.

Github user nongli commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10820#discussion_r50787029
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnarBatch.java ---
    @@ -82,25 +91,53 @@ public void close() {
        * Adapter class to interop with existing components that expect internal row. A lot of
        * performance is lost with this translation.
        */
    -  public final class Row extends InternalRow {
    +  public static final class Row extends InternalRow {
         private int rowId;
    +    private final ColumnarBatch parent;
    +    private final int fixedLenRowSize;
    +
    +    private Row(ColumnarBatch parent) {
    +      this.parent = parent;
    +      this.fixedLenRowSize = UnsafeRow.calculateFixedPortionByteSize(parent.numCols());
    +    }
     
         /**
          * Marks this row as being filtered out. This means a subsequent iteration over the rows
          * in this batch will not include this row.
          */
         public final void markFiltered() {
    -      ColumnarBatch.this.markFiltered(rowId);
    +      parent.markFiltered(rowId);
         }
     
         @Override
         public final int numFields() {
    -      return ColumnarBatch.this.numCols();
    +      return parent.numCols();
         }
     
         @Override
    +    /**
    +     * Revisit this. This is expensive.
    --- End diff --
    
    Yea I think matei was looking at some code that would serve this purpose. I expect this too be too slow.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-12854][SQL] Implement complex types sup...

Posted by rxin <gi...@git.apache.org>.

Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10820#discussion_r50892329
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchBenchmark.scala ---
    @@ -314,8 +338,60 @@ object ColumnarBatchBenchmark {
         benchmark.run()
       }
     
    +  def stringAccess(iters: Long): Unit = {
    +    val chars = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    +    val random = new Random(0)
    +
    +    def randomString(min: Int, max: Int): String = {
    +      val len = random.nextInt(max - min) + min
    +      val sb = new StringBuilder(len)
    +      var i = 0
    +      while (i < len) {
    +        sb.append(chars.charAt(random.nextInt(chars.length())));
    +        i += 1
    +      }
    +      return sb.toString
    +    }
    +
    +    val minString = 3
    +    val maxString = 32
    +    val count = 4 * 1000
    +
    +    val data = Seq.fill(count)(randomString(minString, maxString)).map(_.getBytes).toArray
    +
    +    def column(memoryMode: MemoryMode) = { i: Int =>
    +      val column = ColumnVector.allocate(count, BinaryType, memoryMode)
    +      var sum = 0L
    +      for (n <- 0L until iters) {
    +        var i = 0
    +        while (i < count) {
    +          column.putByteArray(i, data(i))
    +          i += 1
    +        }
    +        i = 0
    +        while (i < count) {
    +          sum += column.getByteArray(i).length
    +          i += 1
    +        }
    +        column.reset()
    +      }
    +    }
    +
    +    /*
    +    String Read/Write:                       Avg Time(ms)    Avg Rate(M/s)  Relative Rate
    +    -------------------------------------------------------------------------------------
    +    On Heap                                         457.0            35.85         1.00 X
    +    Off Heap                                       1206.0            13.59         0.38 X
    --- End diff --
    
    utf8string supports reading offheap too so you can probably improve this substantially.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-12854][SQL] Implement complex types sup...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10820#issuecomment-175194301
  
    **[Test build #50119 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50119/consoleFull)** for PR 10820 at commit [`3d11c37`](https://github.com/apache/spark/commit/3d11c374df4f11e5a535eb69e5f990528ab3847b).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-12854][SQL] Implement complex types sup...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10820#issuecomment-175205210
  
    **[Test build #2460 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2460/consoleFull)** for PR 10820 at commit [`3d11c37`](https://github.com/apache/spark/commit/3d11c374df4f11e5a535eb69e5f990528ab3847b).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-12854][SQL] Implement complex types sup...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10820#issuecomment-174748141
  
    **[Test build #50046 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50046/consoleFull)** for PR 10820 at commit [`f579889`](https://github.com/apache/spark/commit/f579889413c3978355fb0fb0d0ac02719803cb30).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-12854][SQL] Implement complex types sup...

Posted by nongli <gi...@git.apache.org>.

Github user nongli commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10820#discussion_r50787004
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnarBatch.java ---
    @@ -82,25 +91,53 @@ public void close() {
        * Adapter class to interop with existing components that expect internal row. A lot of
        * performance is lost with this translation.
        */
    -  public final class Row extends InternalRow {
    +  public static final class Row extends InternalRow {
         private int rowId;
    +    private final ColumnarBatch parent;
    +    private final int fixedLenRowSize;
    +
    +    private Row(ColumnarBatch parent) {
    +      this.parent = parent;
    +      this.fixedLenRowSize = UnsafeRow.calculateFixedPortionByteSize(parent.numCols());
    +    }
     
         /**
          * Marks this row as being filtered out. This means a subsequent iteration over the rows
          * in this batch will not include this row.
          */
         public final void markFiltered() {
    -      ColumnarBatch.this.markFiltered(rowId);
    +      parent.markFiltered(rowId);
         }
     
         @Override
         public final int numFields() {
    -      return ColumnarBatch.this.numCols();
    +      return parent.numCols();
         }
     
         @Override
    +    /**
    +     * Revisit this. This is expensive.
    +     */
         public final InternalRow copy() {
    -      throw new NotImplementedException();
    +      UnsafeRow row = new UnsafeRow(parent.numCols());
    --- End diff --
    
    Is that valid for copy? Copy seems like it can't have those semanatics. I think we need to remove copy() from the row interface internally. Or at least change it to take to something like:
    
    void copy(UnsafeRowWriter writer). We should be able to directly write this to the destination.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-12854][SQL] Implement complex types sup...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10820#issuecomment-175270187
  
    **[Test build #2464 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2464/consoleFull)** for PR 10820 at commit [`3d11c37`](https://github.com/apache/spark/commit/3d11c374df4f11e5a535eb69e5f990528ab3847b).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-12854][SQL] Implement complex types sup...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10820#issuecomment-174915659
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50080/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-12854][SQL] Implement complex types sup...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10820#issuecomment-174754227
  
    **[Test build #50051 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50051/consoleFull)** for PR 10820 at commit [`2918077`](https://github.com/apache/spark/commit/29180770a4e8ea38349eeee6c5e7671f0857b831).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-12854][SQL] Implement complex types sup...

Posted by davies <gi...@git.apache.org>.

Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10820#discussion_r50783295
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java ---
    @@ -173,4 +232,86 @@ public final void putDoubles(int rowId, int count, byte[] src, int srcIndex) {
       public final double getDouble(int rowId) {
         return doubleData[rowId];
       }
    +
    +  //
    +  // APIs dealing with Arrays
    +  //
    +
    +  @Override
    +  public final int getArrayLength(int rowId) {
    +    return arrayLengths[rowId];
    +  }
    +  @Override
    +  public final int getArrayOffset(int rowId) {
    +    return arrayOffsets[rowId];
    +  }
    +
    +  @Override
    +  public final void putArray(int rowId, int offset, int length) {
    +    arrayOffsets[rowId] = offset;
    +    arrayLengths[rowId] = length;
    +  }
    +
    +  @Override
    +  public final void loadBytes(Array array) {
    +    array.byteArray = byteData;
    +    array.byteArrayOffset = array.offset;
    +  }
    +
    +  //
    +  // APIs dealing with Byte Arrays
    +  //
    +
    +  @Override
    +  public final int putByteArray(int rowId, byte[] value, int offset, int length) {
    +    int result = arrayData().appendBytes(length, value, offset);
    +    arrayOffsets[rowId] = result;
    +    arrayLengths[rowId] = length;
    +    return result;
    +  }
    +
    +  @Override
    +  public final void reserve(int requiredCapacity) {
    +    if (requiredCapacity > capacity) reserveInternal(requiredCapacity * 2);
    +  }
    +
    +  // Spilt this function out since it is the slow path.
    +  private final void reserveInternal(int newCapacity) {
    +    if (this.resultArray != null) {
    +      int[] newLengths = new int[newCapacity];
    +      int[] newOffsets = new int[newCapacity];
    +      if (this.arrayLengths != null) {
    +        System.arraycopy(this.arrayLengths, 0, newLengths, 0, elementsAppended);
    --- End diff --
    
    How does it compare to Unsafe.copyMemory()?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [WIP] [SPARK-12854][SQL] Implement complex typ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10820#issuecomment-172688324
  
    **[Test build #49636 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49636/consoleFull)** for PR 10820 at commit [`518d9bc`](https://github.com/apache/spark/commit/518d9bc64fd687204b157b1b36816620482716bf).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-12854][SQL] Implement complex types sup...

Posted by nongli <gi...@git.apache.org>.

Github user nongli commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10820#discussion_r50787139
  
    --- Diff: unsafe/src/main/java/org/apache/spark/unsafe/Platform.java ---
    @@ -105,6 +105,17 @@ public static void freeMemory(long address) {
         _UNSAFE.freeMemory(address);
       }
     
    +  public static long reallocateMemory(long address, long oldSize, long newSize) {
    --- End diff --
    
    I have this in the same group of functions as freeMemory, allocateMemory. What would the on heap version look like?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-12854][SQL] Implement complex types sup...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10820#issuecomment-175198471
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org