Posted to reviews@spark.apache.org by cloud-fan <gi...@git.apache.org> on 2015/09/14 17:33:21 UTC

[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

GitHub user cloud-fan opened a pull request:

    https://github.com/apache/spark/pull/8747

    [SPARK-10585][SQL] only copy data once when generate unsafe projection

    This PR is a complete rewrite of GenerateUnsafeProjection, with the goal of copying data only once. The old GenerateUnsafeProjection code is left in place to make the review easier.
    
    Instead of generating unsafe-conversion code for struct, array, and map types, we generate code that writes their content directly into the global row buffer.
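
    A rough, hand-written sketch (not the actual generated output) of the shape of the generated code: `BufferHolder`, `UnsafeRowWriter`, `getFieldOffset`, and `setOffsetAndSize` are the helpers added by this PR, while the two-field layout and the literal value below are made up for illustration.

    ```
    // holder and rowWriter are pre-initialized fields of the generated class,
    // as ctx.addMutableState sets up in GenerateUnsafeProjection.
    rowWriter.initialize(holder, 2);                  // a row with 2 fields

    // Field 0: a primitive long, stored in the fixed-length region.
    Platform.putLong(holder.buffer, rowWriter.getFieldOffset(0), 42L);

    // Field 1: a nested struct, appended to the variable-length region;
    // no intermediate UnsafeRow is created and then copied.
    final int tmpCursor = holder.cursor;              // struct's start offset
    // ... recursively write the struct's fields via another UnsafeRowWriter ...
    rowWriter.setOffsetAndSize(1, tmpCursor, holder.cursor - tmpCursor);
    ```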


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark copy-once

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/8747.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #8747
    
----
commit d78671fa6bd2dc431bbf93b8c445431585b5db9e
Author: Wenchen Fan <cl...@outlook.com>
Date:   2015-09-14T07:57:00Z

    only copy data once when generate unsafe projection

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8747#issuecomment-144935204
  
      [Test build #43174 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43174/console) for PR 8747 at commit [`427da03`](https://github.com/apache/spark/commit/427da03597df85af83df633d07457dcb1fdb283c).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `public class BufferHolder `
      * `public class UnsafeArrayWriter `
      * `public class UnsafeRowWriter `





[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the pull request:

    https://github.com/apache/spark/pull/8747#issuecomment-143313189
  
    I tried to generate as little code as possible and move the logic into real Java classes, to make it easier to debug and maintain.
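
    To illustrate (the calls below mirror the `UnsafeRowWriter` diff later in this thread; the variable names are invented), the generated code per string field shrinks to a single call into a compiled, debuggable class:

    ```
    // Generated code, per field:
    rowWriter.write(0, utf8Value);

    // Roughly what that one call does internally (see UnsafeRowWriter.write):
    // round up to a word, grow the buffer, copy the bytes, then record the
    // field's offset and size and advance the cursor.
    int numBytes = utf8Value.numBytes();
    holder.grow(ByteArrayMethods.roundNumberOfBytesToNearestWord(numBytes));
    utf8Value.writeToMemory(holder.buffer, holder.cursor);
    ```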




[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8747#issuecomment-145096870
  
      [Test build #43185 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43185/consoleFull) for PR 8747 at commit [`8d5725b`](https://github.com/apache/spark/commit/8d5725bd3db9be82b19d51b6899bced88482df67).




[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8747#issuecomment-145084693
  
    Merged build started.




[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8747#discussion_r40957723
  
    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeRowWriter.java ---
    @@ -0,0 +1,218 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.expressions.codegen;
    +
    +import org.apache.spark.sql.catalyst.expressions.UnsafeRow;
    +import org.apache.spark.sql.types.Decimal;
    +import org.apache.spark.unsafe.Platform;
    +import org.apache.spark.unsafe.array.ByteArrayMethods;
    +import org.apache.spark.unsafe.bitset.BitSetMethods;
    +import org.apache.spark.unsafe.types.CalendarInterval;
    +import org.apache.spark.unsafe.types.UTF8String;
    +
    +/**
    + * A helper class to write data into global row buffer using `UnsafeRow` format,
    + * used by {@link org.apache.spark.sql.catalyst.expressions.codegen.GenerateUnsafeProjection}.
    + */
    +public class UnsafeRowWriter {
    +
    +  private GlobalBufferHolder holder;
    +  // The offset of the global buffer where we start to write this row.
    +  private int startingOffset;
    +  private int nullBitsSize;
    +
    +  public void initialize(GlobalBufferHolder holder, int numFields) {
    +    this.holder = holder;
    +    this.startingOffset = holder.cursor;
    +    this.nullBitsSize = UnsafeRow.calculateBitSetWidthInBytes(numFields);
    +
    +    // grow the global buffer to make sure it has enough space to write fixed-length data.
    +    final int fixedSize = nullBitsSize + 8 * numFields;
    +    holder.grow(fixedSize);
    +    holder.cursor += fixedSize;
    +
    +    // zero-out the null bits region
    +    for (int i = 0; i < nullBitsSize; i += 8) {
    +      Platform.putLong(holder.buffer, startingOffset + i, 0L);
    +    }
    +  }
    +
    +  private void zeroOutPaddingBytes(int numBytes) {
    +    if ((numBytes & 0x07) > 0) {
    +      Platform.putLong(holder.buffer, holder.cursor + ((numBytes >> 3) << 3), 0L);
    +    }
    +  }
    +
    +  public void setNullAt(int ordinal) {
    +    BitSetMethods.set(holder.buffer, startingOffset, ordinal);
    +    Platform.putLong(holder.buffer, getFieldOffset(ordinal), 0L);
    +  }
    +
    +  public long getFieldOffset(int ordinal) {
    +    return startingOffset + nullBitsSize + 8 * ordinal;
    +  }
    +
    +  public void setOffsetAndSize(int ordinal, long size) {
    +    setOffsetAndSize(ordinal, holder.cursor, size);
    +  }
    +
    +  public void setOffsetAndSize(int ordinal, long currentCursor, long size) {
    +    final long relativeOffset = currentCursor - startingOffset;
    +    final long fieldOffset = getFieldOffset(ordinal);
    +    final long offsetAndSize = (relativeOffset << 32) | size;
    +
    +    Platform.putLong(holder.buffer, fieldOffset, offsetAndSize);
    +  }
    +
    +  // Pad this row out to a word (8-byte) boundary.
    +  // todo: remove this after we make unsafe array data word-aligned.
    +  public void alignWords(int numBytes) {
    --- End diff --
    
    alignToWords?
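
    Whatever the name, the method pads the row out to an 8-byte boundary. A worked example of its remainder arithmetic (shown in full in a later quote of this file), with a made-up size:

    ```
    // numBytes & 0x07 is the remainder modulo 8 (the word size).
    int numBytes = 21;
    int remainder = numBytes & 0x07;   // 21 % 8 == 5
    int paddingBytes = 8 - remainder;  // pad 3 bytes, reaching 24
    // alignWords then grows the buffer, zeroes the 3 padding bytes (while
    // preserving the 5 real bytes already written), and advances the cursor.
    ```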




[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/8747#issuecomment-140826488
  
    What's the performance improvement coming from this?





[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8747#issuecomment-145098225
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed):
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43185/




[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8747#issuecomment-144934425
  
     Merged build triggered.




[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the pull request:

    https://github.com/apache/spark/pull/8747#issuecomment-142764287
  
    I tested two cases: a deeply nested row, and a long row with non-null variable-length fields. The results:
    
    | case                    | old code | new code |
    |-------------------------|----------|----------|
    | case 1 (deeply nested)  | 2400ms   | 2100ms   |
    | case 2 (long row)       | 900ms    | 650ms    |
    
    The improvement in case 1 comes from avoiding the extra copy.
    The improvement in case 2 comes from clearing the null-bits region once at the beginning, so we don't have to reset it again when writing each non-null field.
    
    Actually it's not a big improvement, and it may be much smaller end-to-end, so I'm not sure this change is worth it.
    cc @davies @rxin 
    
    testing code:
    
    ```
      test("perf1") {
        val superNestedStruct = {
          def makeStruct(struct: StructType, i: Int): StructType = {
            if (i == 0) {
              struct
            } else {
              makeStruct(new StructType().add("i", struct, false), i - 1)
            }
          }
          makeStruct(new StructType().add("a", LongType), 100)
        }
    
        val converter = UnsafeProjection.create(superNestedStruct)
        val generator = RandomDataGenerator.forType(superNestedStruct, nullable = false).get
        val input = CatalystTypeConverters.convertToCatalyst(generator()).asInstanceOf[InternalRow]
    
        var i = 100000
        val start = System.currentTimeMillis()
        while (i > 0) {
          converter(input)
          i -= 1
        }
        val end = System.currentTimeMillis()
    
        println(end - start)
      }
    
      test("perf2") {
        val struct = StructType((1 to 100).map { i =>
          StructField(i.toString, CalendarIntervalType, false)
        })
    
        val converter = UnsafeProjection.create(struct)
        val generator = RandomDataGenerator.forType(struct, nullable = false).get
        val input = CatalystTypeConverters.convertToCatalyst(generator()).asInstanceOf[InternalRow]
    
        var i = 100000
        val start = System.currentTimeMillis()
        while (i > 0) {
          converter(input)
          i -= 1
        }
        val end = System.currentTimeMillis()
    
        println(end - start)
      }
    ```




[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8747#issuecomment-145097256
  
    Merged build started.




[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8747#issuecomment-145094269
  
     Merged build triggered.




[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8747#issuecomment-144935206
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8747#issuecomment-140162712
  
      [Test build #42426 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42426/console) for PR 8747 at commit [`858f198`](https://github.com/apache/spark/commit/858f198d9bfe6f7b5a2750463ae0f5383d2a7f1c).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8747#issuecomment-145158408
  
    Merged build started.




[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8747#issuecomment-144935207
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed):
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43174/




[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8747#issuecomment-143128542
  
     Merged build triggered.




[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8747#discussion_r40956378
  
    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeRowWriter.java ---
    @@ -0,0 +1,218 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.expressions.codegen;
    +
    +import org.apache.spark.sql.catalyst.expressions.UnsafeRow;
    +import org.apache.spark.sql.types.Decimal;
    +import org.apache.spark.unsafe.Platform;
    +import org.apache.spark.unsafe.array.ByteArrayMethods;
    +import org.apache.spark.unsafe.bitset.BitSetMethods;
    +import org.apache.spark.unsafe.types.CalendarInterval;
    +import org.apache.spark.unsafe.types.UTF8String;
    +
    +/**
    + * A helper class to write data into global row buffer using `UnsafeRow` format,
    + * used by {@link org.apache.spark.sql.catalyst.expressions.codegen.GenerateUnsafeProjection}.
    + */
    +public class UnsafeRowWriter {
    +
    +  private GlobalBufferHolder holder;
    +  // The offset of the global buffer where we start to write this row.
    +  private int startingOffset;
    +  private int nullBitsSize;
    +
    +  public void initialize(GlobalBufferHolder holder, int numFields) {
    +    this.holder = holder;
    +    this.startingOffset = holder.cursor;
    +    this.nullBitsSize = UnsafeRow.calculateBitSetWidthInBytes(numFields);
    +
    +    // grow the global buffer to make sure it has enough space to write fixed-length data.
    +    final int fixedSize = nullBitsSize + 8 * numFields;
    +    holder.grow(fixedSize);
    +    holder.cursor += fixedSize;
    +
    +    // zero-out the null bits region
    +    for (int i = 0; i < nullBitsSize; i += 8) {
    +      Platform.putLong(holder.buffer, startingOffset + i, 0L);
    +    }
    +  }
    +
    +  private void zeroOutPaddingBytes(int numBytes) {
    +    if ((numBytes & 0x07) > 0) {
    +      Platform.putLong(holder.buffer, holder.cursor + ((numBytes >> 3) << 3), 0L);
    +    }
    +  }
    +
    +  public void setNullAt(int ordinal) {
    +    BitSetMethods.set(holder.buffer, startingOffset, ordinal);
    +    Platform.putLong(holder.buffer, getFieldOffset(ordinal), 0L);
    +  }
    +
    +  public long getFieldOffset(int ordinal) {
    +    return startingOffset + nullBitsSize + 8 * ordinal;
    +  }
    +
    +  public void setOffsetAndSize(int ordinal, long size) {
    +    setOffsetAndSize(ordinal, holder.cursor, size);
    +  }
    +
    +  public void setOffsetAndSize(int ordinal, long currentCursor, long size) {
    +    final long relativeOffset = currentCursor - startingOffset;
    +    final long fieldOffset = getFieldOffset(ordinal);
    +    final long offsetAndSize = (relativeOffset << 32) | size;
    +
    +    Platform.putLong(holder.buffer, fieldOffset, offsetAndSize);
    +  }
    +
    +  // Pad this row out to a word (8-byte) boundary.
    +  // todo: remove this after we make unsafe array data word-aligned.
    +  public void alignWords(int numBytes) {
    +    final int remainder = numBytes & 0x07;
    +
    +    if (remainder > 0) {
    +      final int paddingBytes = 8 - remainder;
    +      holder.grow(paddingBytes);
    +
    +      final byte[] originalValues = new byte[remainder];
    +      Platform.copyMemory(
    +        holder.buffer,
    +        holder.cursor - remainder,
    +        originalValues,
    +        Platform.BYTE_ARRAY_OFFSET,
    +        remainder);
    +
    +      Platform.putLong(holder.buffer, holder.cursor - remainder, 0);
    +
    +      Platform.copyMemory(
    +        originalValues,
    +        Platform.BYTE_ARRAY_OFFSET,
    +        holder.buffer,
    +        holder.cursor - remainder,
    +        remainder);
    +
    +      holder.cursor += paddingBytes;
    +    }
    +  }
    +
    +  public void writeCompactDecimal(int ordinal, Decimal input, int precision, int scale) {
    +    // make sure Decimal object has the same scale as DecimalType
    +    if (input.changePrecision(precision, scale)) {
    +      Platform.putLong(holder.buffer, getFieldOffset(ordinal), input.toUnscaledLong());
    +    } else {
    +      setNullAt(ordinal);
    +    }
    +  }
    +
    +  public void write(int ordinal, Decimal input, int precision, int scale) {
    +    // grow the global buffer before writing data.
    +    holder.grow(16);
    +
    +    // zero-out the bytes
    +    Platform.putLong(holder.buffer, holder.cursor, 0L);
    +    Platform.putLong(holder.buffer, holder.cursor + 8, 0L);
    +
    +    // Make sure Decimal object has the same scale as DecimalType.
    +    // Note that we may pass in null Decimal object to set null for it.
    +    if (input == null || !input.changePrecision(precision, scale)) {
    +      BitSetMethods.set(holder.buffer, startingOffset, ordinal);
    +      // keep the offset for future update
    +      final long relativeOffset = holder.cursor - startingOffset;
    +      Platform.putLong(holder.buffer, getFieldOffset(ordinal), relativeOffset << 32);
    +    } else {
    +      final byte[] bytes = input.toJavaBigDecimal().unscaledValue().toByteArray();
    +      assert bytes.length <= 16;
    +
    +      // Write the bytes to the variable length portion.
    +      Platform.copyMemory(
    +        bytes, Platform.BYTE_ARRAY_OFFSET, holder.buffer, holder.cursor, bytes.length);
    +      setOffsetAndSize(ordinal, bytes.length);
    +    }
    +
    +    // move the cursor forward.
    +    holder.cursor += 16;
    +  }
    +
    +  public void write(int ordinal, UTF8String input) {
    +    final int numBytes = input.numBytes();
    +    final int roundedSize = ByteArrayMethods.roundNumberOfBytesToNearestWord(numBytes);
    +
    +    // grow the global buffer before writing data.
    +    holder.grow(roundedSize);
    +
    +    zeroOutPaddingBytes(numBytes);
    +
    +    // Write the bytes to the variable length portion.
    +    input.writeToMemory(holder.buffer, holder.cursor);
    +
    +    setOffsetAndSize(ordinal, numBytes);
    +
    +    // move the cursor forward.
    +    holder.cursor += roundedSize;
    +  }
    +
    +  public void write(int ordinal, byte[] input) {
    +    final int numBytes = input.length;
    +    final int roundedSize = ByteArrayMethods.roundNumberOfBytesToNearestWord(numBytes);
    +
    +    // grow the global buffer before writing data.
    +    holder.grow(roundedSize);
    +
    +    zeroOutPaddingBytes(numBytes);
    +
    +    // Write the bytes to the variable length portion.
    +    Platform.copyMemory(input, Platform.BYTE_ARRAY_OFFSET, holder.buffer, holder.cursor, numBytes);
    +
    +    setOffsetAndSize(ordinal, numBytes);
    +
    +    // move the cursor forward.
    +    holder.cursor += roundedSize;
    +  }
    +
    +  public void write(int ordinal, CalendarInterval input) {
    +    // grow the global buffer before writing data.
    +    holder.grow(16);
    +
    +    // Write the months and microseconds fields of Interval to the variable length portion.
    +    Platform.putLong(holder.buffer, holder.cursor, input.months);
    +    Platform.putLong(holder.buffer, holder.cursor + 8, input.microseconds);
    +
    +    // Set the fixed length portion, we don't need size here because it's fixed as 16.
    --- End diff --
    
    Even though it's fixed, we could still call `setOffsetAndSize(ordinal, 16)`; that would be clearer.
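
    That call is cheap: as the diff above shows, `setOffsetAndSize` just packs the field's relative offset and size into one 8-byte word. A worked example with made-up numbers:

    ```
    // High 32 bits hold the relative offset, low 32 bits hold the size.
    long relativeOffset = 64L;
    long size = 16L;                                     // interval's fixed size
    long offsetAndSize = (relativeOffset << 32) | size;  // 0x0000004000000010L

    // A reader recovers both halves with a shift and a cast:
    int offset = (int) (offsetAndSize >> 32);            // 64
    int length = (int) offsetAndSize;                    // 16
    ```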




[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8747#issuecomment-145097171
  
     Merged build triggered.




[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8747#issuecomment-143138326
  
    Merged build started.




[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8747#discussion_r40961647
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeProjection.scala ---
    @@ -393,10 +393,278 @@ object GenerateUnsafeProjection extends CodeGenerator[Seq[Expression], UnsafePro
         case _ => input
       }
     
    +  val rowWriterClass = classOf[UnsafeRowWriter].getName
    +  val arrayWriterClass = classOf[UnsafeArrayWriter].getName
    +
    +  // todo: if the nullability of field is correct, we can use it to save null check.
    +  private def writeStructToBuffer(
    +      ctx: CodeGenContext,
    +      input: String,
    +      fieldTypes: Seq[DataType],
    +      bufferHolder: String): String = {
    +    val fieldEvals = fieldTypes.zipWithIndex.map { case (dt, i) =>
    +      val fieldName = ctx.freshName("fieldName")
    +      val code = s"final ${ctx.javaType(dt)} $fieldName = ${ctx.getValue(input, dt, i.toString)};"
    +      val isNull = s"$input.isNullAt($i)"
    +      GeneratedExpressionCode(code, isNull, fieldName)
    +    }
    +
    +    s"""
    +      if ($input instanceof UnsafeRow) {
    +        $rowWriterClass.directWrite($bufferHolder, (UnsafeRow) $input);
    +      } else {
    +        ${writeExpressionsToBuffer(ctx, input, fieldEvals, fieldTypes, bufferHolder)}
    +      }
    +    """
    +  }
    +
    +  private def writeExpressionsToBuffer(
    +      ctx: CodeGenContext,
    +      row: String,
    +      inputs: Seq[GeneratedExpressionCode],
    +      inputTypes: Seq[DataType],
    +      bufferHolder: String): String = {
    +    val rowWriter = ctx.freshName("rowWriter")
    +    ctx.addMutableState(rowWriterClass, rowWriter, s"this.$rowWriter = new $rowWriterClass();")
    +
    +    val writeFields = inputs.zip(inputTypes).zipWithIndex.map {
    +      case ((input, dt), index) =>
    +        val tmpCursor = ctx.freshName("tmpCursor")
    +
    +        val setNull = dt match {
    +          case t: DecimalType if t.precision > Decimal.MAX_LONG_DIGITS =>
    +            // Can't call setNullAt() for DecimalType with precision larger than 18.
    +            s"$rowWriter.write($index, null, 0, 0);"
    +          case _ => s"$rowWriter.setNullAt($index);"
    +        }
    +
    +        val writeField = dt match {
    +          case t: StructType =>
    +            s"""
    +              // Remember the current cursor so that we can calculate how many bytes are
    +              // written later.
    +              final int $tmpCursor = $bufferHolder.cursor;
    +              ${writeStructToBuffer(ctx, input.primitive, t.map(_.dataType), bufferHolder)}
    +              $rowWriter.setOffsetAndSize($index, $tmpCursor, $bufferHolder.cursor - $tmpCursor);
    +            """
    +
    +          case a @ ArrayType(et, _) =>
    +            s"""
    +              // Remember the current cursor so that we can calculate how many bytes are
    +              // written later.
    +              final int $tmpCursor = $bufferHolder.cursor;
    +              ${writeArrayToBuffer(ctx, input.primitive, et, bufferHolder)}
    +              $rowWriter.setOffsetAndSize($index, $tmpCursor, $bufferHolder.cursor - $tmpCursor);
    +              $rowWriter.alignWords($bufferHolder.cursor - $tmpCursor);
    +            """
    +
    +          case m @ MapType(kt, vt, _) =>
    +            s"""
    +              // Remember the current cursor so that we can calculate how many bytes are
    +              // written later.
    +              final int $tmpCursor = $bufferHolder.cursor;
    +              ${writeMapToBuffer(ctx, input.primitive, kt, vt, bufferHolder)}
    +              $rowWriter.setOffsetAndSize($index, $tmpCursor, $bufferHolder.cursor - $tmpCursor);
    +              $rowWriter.alignWords($bufferHolder.cursor - $tmpCursor);
    +            """
    +
    +          case _ if ctx.isPrimitiveType(dt) =>
    +            val fieldOffset = ctx.freshName("fieldOffset")
    +            s"""
    +              final long $fieldOffset = $rowWriter.getFieldOffset($index);
    +              Platform.putLong($bufferHolder.buffer, $fieldOffset, 0L);
    +              ${writePrimitiveType(ctx, input.primitive, dt, s"$bufferHolder.buffer", fieldOffset)}
    +            """
    +
    +          case t: DecimalType if t.precision <= Decimal.MAX_LONG_DIGITS =>
    +            s"$rowWriter.writeCompactDecimal($index, ${input.primitive}, " +
    +              s"${t.precision}, ${t.scale});"
    +
    +          case t: DecimalType =>
    +            s"$rowWriter.write($index, ${input.primitive}, ${t.precision}, ${t.scale});"
    +
    +          case NullType => ""
    +
    +          case _ => s"$rowWriter.write($index, ${input.primitive});"
    +        }
    +
    +        s"""
    +          ${input.code}
    +          if (${input.isNull}) {
    +            $setNull
    +          } else {
    +            $writeField
    +          }
    +        """
    +    }
    +
    +    s"""
    +      $rowWriter.initialize($bufferHolder, ${inputs.length});
    +      ${ctx.splitExpressions(row, writeFields)}
    +    """
    +  }
    +
    +  // todo: if the nullability of array element is correct, we can use it to save null check.
    +  private def writeArrayToBuffer(
    +      ctx: CodeGenContext,
    +      input: String,
    +      elementType: DataType,
    +      bufferHolder: String,
    +      needHeader: Boolean = true): String = {
    +    val arrayWriter = ctx.freshName("arrayWriter")
    +    ctx.addMutableState(arrayWriterClass, arrayWriter,
    +      s"this.$arrayWriter = new $arrayWriterClass();")
    +    val numElements = ctx.freshName("numElements")
    +    val index = ctx.freshName("index")
    +    val element = ctx.freshName("element")
    +
    +    val jt = ctx.javaType(elementType)
    +
    +    val fixedElementSize = elementType match {
    +      case t: DecimalType if t.precision <= Decimal.MAX_LONG_DIGITS => 8
    +      case _ if ctx.isPrimitiveType(jt) => elementType.defaultSize
    +      case _ => 0
    +    }
    +
    +    val writeElement = elementType match {
    +      case t: StructType =>
    +        s"""
    +          $arrayWriter.setOffset($index);
    +          ${writeStructToBuffer(ctx, element, t.map(_.dataType), bufferHolder)}
    +        """
    +
    +      case a @ ArrayType(et, _) =>
    +        s"""
    +          $arrayWriter.setOffset($index);
    +          ${writeArrayToBuffer(ctx, element, et, bufferHolder)}
    +        """
    +
    +      case m @ MapType(kt, vt, _) =>
    +        s"""
    +          $arrayWriter.setOffset($index);
    +          ${writeMapToBuffer(ctx, element, kt, vt, bufferHolder)}
    +        """
    +
    +      case _ if ctx.isPrimitiveType(elementType) =>
    +        // Should we do word align?
    +        val dataSize = elementType.defaultSize
    +
    +        s"""
    +          $arrayWriter.setOffset($index);
    +          ${writePrimitiveType(ctx, element, elementType,
    +            s"$bufferHolder.buffer", s"$bufferHolder.cursor")}
    +          $bufferHolder.cursor += $dataSize;
    +        """
    +
    +      case t: DecimalType if t.precision <= Decimal.MAX_LONG_DIGITS =>
    +        s"$arrayWriter.writeCompactDecimal($index, $element, ${t.precision}, ${t.scale});"
    +
    +      case t: DecimalType =>
    +        s"$arrayWriter.write($index, $element, ${t.precision}, ${t.scale});"
    +
    +      case NullType => ""
    +
    +      case _ => s"$arrayWriter.write($index, $element);"
    +    }
    +
    +    s"""
    +      if ($input instanceof UnsafeArrayData) {
    +        $arrayWriterClass.directWrite($bufferHolder, (UnsafeArrayData) $input, $needHeader);
    +      } else {
    +        final int $numElements = $input.numElements();
    +        $arrayWriter.initialize($bufferHolder, $numElements, $needHeader, $fixedElementSize);
    +
    +        for (int $index = 0; $index < $numElements; $index++) {
    +          if ($input.isNullAt($index)) {
    +            $arrayWriter.setNullAt($index);
    +          } else {
    +            final $jt $element = ${ctx.getValue(input, elementType, index)};
    +            $writeElement
    +          }
    +        }
    +      }
    +    """
    +  }
    +
    +  // todo: if the nullability of value element is correct, we can use it to save null check.
    +  private def writeMapToBuffer(
    +      ctx: CodeGenContext,
    +      input: String,
    +      keyType: DataType,
    +      valueType: DataType,
    +      bufferHolder: String): String = {
    +    val keys = ctx.freshName("keys")
    +    val values = ctx.freshName("values")
    +    val tmpCursor = ctx.freshName("tmpCursor")
    +
    +    s"""
    +      final ArrayData $keys = $input.keyArray();
    +      final ArrayData $values = $input.valueArray();
    +
    +      // Write the numElements into first 4 bytes.
    +      Platform.putInt($bufferHolder.buffer, $bufferHolder.cursor, $keys.numElements());
    --- End diff --
    
    Should we call grow first and zero out the unused 4 bytes?
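
    A sketch of that suggestion, following the grow-then-write pattern the writers use elsewhere in this PR (whether the cursor should then advance by 4 or 8 bytes is a detail left open here):

    ```
    // Reserve a full word for the 4-byte element count, zero it so the
    // unused padding bytes are deterministic, then write the count.
    holder.grow(8);
    Platform.putLong(holder.buffer, holder.cursor, 0L);
    Platform.putInt(holder.buffer, holder.cursor, keys.numElements());
    holder.cursor += 8;
    ```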




[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8747#discussion_r40959365
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeProjection.scala ---
    @@ -393,10 +393,278 @@ object GenerateUnsafeProjection extends CodeGenerator[Seq[Expression], UnsafePro
         case _ => input
       }
     
    +  val rowWriterClass = classOf[UnsafeRowWriter].getName
    +  val arrayWriterClass = classOf[UnsafeArrayWriter].getName
    +
    +  // todo: if the nullability of field is correct, we can use it to save null check.
    +  private def writeStructToBuffer(
    +      ctx: CodeGenContext,
    +      input: String,
    +      fieldTypes: Seq[DataType],
    +      bufferHolder: String): String = {
    +    val fieldEvals = fieldTypes.zipWithIndex.map { case (dt, i) =>
    +      val fieldName = ctx.freshName("fieldName")
    +      val code = s"final ${ctx.javaType(dt)} $fieldName = ${ctx.getValue(input, dt, i.toString)};"
    +      val isNull = s"$input.isNullAt($i)"
    +      GeneratedExpressionCode(code, isNull, fieldName)
    +    }
    +
    +    s"""
    +      if ($input instanceof UnsafeRow) {
    +        $rowWriterClass.directWrite($bufferHolder, (UnsafeRow) $input);
    +      } else {
    +        ${writeExpressionsToBuffer(ctx, input, fieldEvals, fieldTypes, bufferHolder)}
    +      }
    +    """
    +  }
    +
    +  private def writeExpressionsToBuffer(
    +      ctx: CodeGenContext,
    +      row: String,
    +      inputs: Seq[GeneratedExpressionCode],
    +      inputTypes: Seq[DataType],
    +      bufferHolder: String): String = {
    +    val rowWriter = ctx.freshName("rowWriter")
    +    ctx.addMutableState(rowWriterClass, rowWriter, s"this.$rowWriter = new $rowWriterClass();")
    +
    +    val writeFields = inputs.zip(inputTypes).zipWithIndex.map {
    +      case ((input, dt), index) =>
    +        val tmpCursor = ctx.freshName("tmpCursor")
    +
    +        val setNull = dt match {
    +          case t: DecimalType if t.precision > Decimal.MAX_LONG_DIGITS =>
    +            // Can't call setNullAt() for DecimalType with precision larger than 18.
    +            s"$rowWriter.write($index, null, 0, 0);"
    --- End diff --
    
    Pass the correct precision and scale; we may use them in the future.




[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8747#issuecomment-145645874
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8747#discussion_r41168843
  
    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeArrayWriter.java ---
    @@ -0,0 +1,179 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.expressions.codegen;
    +
    +import org.apache.spark.sql.catalyst.expressions.UnsafeArrayData;
    +import org.apache.spark.sql.types.Decimal;
    +import org.apache.spark.unsafe.Platform;
    +import org.apache.spark.unsafe.types.CalendarInterval;
    +import org.apache.spark.unsafe.types.UTF8String;
    +
    +/**
    + * A helper class to write data into global row buffer using `UnsafeArrayData` format,
    + * used by {@link org.apache.spark.sql.catalyst.expressions.codegen.GenerateUnsafeProjection}.
    + */
    +public class UnsafeArrayWriter {
    +
    +  private BufferHolder holder;
    +  // The offset of the global buffer where we start to write this array.
    +  private int startingOffset;
    +
    +  public void initialize(
    +      BufferHolder holder,
    +      int numElements,
    +      boolean needHeader,
    +      int fixedElementSize) {
    +
    +    // We need 4 bytes per element to store its offset.
    +    final int fixedSize = 4 * numElements;
    +
    +    if (needHeader) {
    +      // If header is required, we need extra 4 bytes to store the header.
    +      holder.grow(fixedSize + 4);
    +      // Writes the number of elements into the first 4 bytes.
    +      Platform.putInt(holder.buffer, holder.cursor, numElements);
    +      holder.cursor += 4;
    +    } else {
    +      holder.grow(fixedSize);
    +    }
    +
    +    this.holder = holder;
    +    this.startingOffset = holder.cursor;
    +
    +    holder.cursor += fixedSize;
    +
    +    // Grows the global buffer ahead of time for fixed-size data.
    +    if (fixedElementSize > 0) {
    +      holder.grow(fixedElementSize * numElements);
    +    }
    +  }
    +
    +  private long getElementOffset(int ordinal) {
    +    return startingOffset + 4 * ordinal;
    +  }
    +
    +  public void setNullAt(int ordinal) {
    +    final int relativeOffset = holder.cursor - startingOffset;
    +    // Writes negative offset value to represent null element.
    +    Platform.putInt(holder.buffer, getElementOffset(ordinal), -relativeOffset);
    +  }
    +
    +  public void setOffset(int ordinal) {
    +    final int relativeOffset = holder.cursor - startingOffset;
    +    Platform.putInt(holder.buffer, getElementOffset(ordinal), relativeOffset);
    +  }
    +
    +
    --- End diff --
    
    remove empty lines
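
    One detail of the diff above worth noting: `setNullAt` marks a null element by storing the negated relative offset, so a reader can both detect null and still see where the data would have started. With made-up positions:

    ```
    // Element offsets are 4-byte ints relative to startingOffset.
    int relativeOffset = 40;                    // holder.cursor - startingOffset
    int stored = -relativeOffset;               // -40 marks the element as null
    boolean isNull = stored < 0;                // true
    int recovered = isNull ? -stored : stored;  // 40 either way
    ```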




[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8747#issuecomment-143132969
  
     Merged build triggered.




[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8747#issuecomment-143128715
  
      [Test build #43008 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43008/consoleFull) for PR 8747 at commit [`a50923d`](https://github.com/apache/spark/commit/a50923def1eab00f3a61a60b58fca1579ba4a0e7).




[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8747#issuecomment-144934434
  
    Merged build started.




[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8747#issuecomment-145100466
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed):
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43186/




[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8747#discussion_r40952306
  
    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/GlobalBufferHolder.java ---
    @@ -0,0 +1,54 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.expressions.codegen;
    +
    +import org.apache.spark.unsafe.Platform;
    +
    +/**
    + * A helper class to manage the global row buffer used in `GenerateUnsafeProjection`.
    + *
    + * Note that it is only used in `GenerateUnsafeProjection`, so it's safe to mark member variables
    + * public for ease of use.
    + */
    +public class GlobalBufferHolder {
    --- End diff --
    
    This is just `global` to the projection; we could call it `BufferHolder`.




[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8747#discussion_r40962145
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeProjection.scala ---
    @@ -393,10 +393,278 @@ object GenerateUnsafeProjection extends CodeGenerator[Seq[Expression], UnsafePro
         case _ => input
       }
     
    +  val rowWriterClass = classOf[UnsafeRowWriter].getName
    +  val arrayWriterClass = classOf[UnsafeArrayWriter].getName
    +
    +  // todo: if the nullability of field is correct, we can use it to save null check.
    +  private def writeStructToBuffer(
    +      ctx: CodeGenContext,
    +      input: String,
    +      fieldTypes: Seq[DataType],
    +      bufferHolder: String): String = {
    +    val fieldEvals = fieldTypes.zipWithIndex.map { case (dt, i) =>
    +      val fieldName = ctx.freshName("fieldName")
    +      val code = s"final ${ctx.javaType(dt)} $fieldName = ${ctx.getValue(input, dt, i.toString)};"
    +      val isNull = s"$input.isNullAt($i)"
    +      GeneratedExpressionCode(code, isNull, fieldName)
    +    }
    +
    +    s"""
    +      if ($input instanceof UnsafeRow) {
    +        $rowWriterClass.directWrite($bufferHolder, (UnsafeRow) $input);
    +      } else {
    +        ${writeExpressionsToBuffer(ctx, input, fieldEvals, fieldTypes, bufferHolder)}
    +      }
    +    """
    +  }
    +
    +  private def writeExpressionsToBuffer(
    +      ctx: CodeGenContext,
    +      row: String,
    +      inputs: Seq[GeneratedExpressionCode],
    +      inputTypes: Seq[DataType],
    +      bufferHolder: String): String = {
    +    val rowWriter = ctx.freshName("rowWriter")
    +    ctx.addMutableState(rowWriterClass, rowWriter, s"this.$rowWriter = new $rowWriterClass();")
    +
    +    val writeFields = inputs.zip(inputTypes).zipWithIndex.map {
    +      case ((input, dt), index) =>
    +        val tmpCursor = ctx.freshName("tmpCursor")
    +
    +        val setNull = dt match {
    +          case t: DecimalType if t.precision > Decimal.MAX_LONG_DIGITS =>
    +            // Can't call setNullAt() for DecimalType with precision larger than 18.
    +            s"$rowWriter.write($index, null, 0, 0);"
    +          case _ => s"$rowWriter.setNullAt($index);"
    +        }
    +
    +        val writeField = dt match {
    +          case t: StructType =>
    +            s"""
    +              // Remember the current cursor so that we can calculate how many bytes are
    +              // written later.
    +              final int $tmpCursor = $bufferHolder.cursor;
    +              ${writeStructToBuffer(ctx, input.primitive, t.map(_.dataType), bufferHolder)}
    +              $rowWriter.setOffsetAndSize($index, $tmpCursor, $bufferHolder.cursor - $tmpCursor);
    +            """
    +
    +          case a @ ArrayType(et, _) =>
    +            s"""
    +              // Remember the current cursor so that we can calculate how many bytes are
    +              // written later.
    +              final int $tmpCursor = $bufferHolder.cursor;
    +              ${writeArrayToBuffer(ctx, input.primitive, et, bufferHolder)}
    +              $rowWriter.setOffsetAndSize($index, $tmpCursor, $bufferHolder.cursor - $tmpCursor);
    +              $rowWriter.alignWords($bufferHolder.cursor - $tmpCursor);
    +            """
    +
    +          case m @ MapType(kt, vt, _) =>
    +            s"""
    +              // Remember the current cursor so that we can calculate how many bytes are
    +              // written later.
    +              final int $tmpCursor = $bufferHolder.cursor;
    +              ${writeMapToBuffer(ctx, input.primitive, kt, vt, bufferHolder)}
    +              $rowWriter.setOffsetAndSize($index, $tmpCursor, $bufferHolder.cursor - $tmpCursor);
    +              $rowWriter.alignWords($bufferHolder.cursor - $tmpCursor);
    +            """
    +
    +          case _ if ctx.isPrimitiveType(dt) =>
    +            val fieldOffset = ctx.freshName("fieldOffset")
    +            s"""
    +              final long $fieldOffset = $rowWriter.getFieldOffset($index);
    +              Platform.putLong($bufferHolder.buffer, $fieldOffset, 0L);
    +              ${writePrimitiveType(ctx, input.primitive, dt, s"$bufferHolder.buffer", fieldOffset)}
    +            """
    +
    +          case t: DecimalType if t.precision <= Decimal.MAX_LONG_DIGITS =>
    +            s"$rowWriter.writeCompactDecimal($index, ${input.primitive}, " +
    +              s"${t.precision}, ${t.scale});"
    +
    +          case t: DecimalType =>
    +            s"$rowWriter.write($index, ${input.primitive}, ${t.precision}, ${t.scale});"
    +
    +          case NullType => ""
    +
    +          case _ => s"$rowWriter.write($index, ${input.primitive});"
    +        }
    +
    +        s"""
    +          ${input.code}
    +          if (${input.isNull}) {
    +            $setNull
    +          } else {
    +            $writeField
    +          }
    +        """
    +    }
    +
    +    s"""
    +      $rowWriter.initialize($bufferHolder, ${inputs.length});
    +      ${ctx.splitExpressions(row, writeFields)}
    +    """
    +  }
    +
    +  // todo: if the nullability of array element is correct, we can use it to save null check.
    +  private def writeArrayToBuffer(
    +      ctx: CodeGenContext,
    +      input: String,
    +      elementType: DataType,
    +      bufferHolder: String,
    +      needHeader: Boolean = true): String = {
    +    val arrayWriter = ctx.freshName("arrayWriter")
    +    ctx.addMutableState(arrayWriterClass, arrayWriter,
    +      s"this.$arrayWriter = new $arrayWriterClass();")
    +    val numElements = ctx.freshName("numElements")
    +    val index = ctx.freshName("index")
    +    val element = ctx.freshName("element")
    +
    +    val jt = ctx.javaType(elementType)
    +
    +    val fixedElementSize = elementType match {
    +      case t: DecimalType if t.precision <= Decimal.MAX_LONG_DIGITS => 8
    +      case _ if ctx.isPrimitiveType(jt) => elementType.defaultSize
    +      case _ => 0
    +    }
    +
    +    val writeElement = elementType match {
    +      case t: StructType =>
    +        s"""
    +          $arrayWriter.setOffset($index);
    +          ${writeStructToBuffer(ctx, element, t.map(_.dataType), bufferHolder)}
    +        """
    +
    +      case a @ ArrayType(et, _) =>
    +        s"""
    +          $arrayWriter.setOffset($index);
    +          ${writeArrayToBuffer(ctx, element, et, bufferHolder)}
    +        """
    +
    +      case m @ MapType(kt, vt, _) =>
    +        s"""
    +          $arrayWriter.setOffset($index);
    +          ${writeMapToBuffer(ctx, element, kt, vt, bufferHolder)}
    +        """
    +
    +      case _ if ctx.isPrimitiveType(elementType) =>
    +        // Should we do word align?
    +        val dataSize = elementType.defaultSize
    +
    +        s"""
    +          $arrayWriter.setOffset($index);
    +          ${writePrimitiveType(ctx, element, elementType,
    +            s"$bufferHolder.buffer", s"$bufferHolder.cursor")}
    +          $bufferHolder.cursor += $dataSize;
    +        """
    +
    +      case t: DecimalType if t.precision <= Decimal.MAX_LONG_DIGITS =>
    +        s"$arrayWriter.writeCompactDecimal($index, $element, ${t.precision}, ${t.scale});"
    +
    +      case t: DecimalType =>
    +        s"$arrayWriter.write($index, $element, ${t.precision}, ${t.scale});"
    +
    +      case NullType => ""
    +
    +      case _ => s"$arrayWriter.write($index, $element);"
    +    }
    +
    +    s"""
    +      if ($input instanceof UnsafeArrayData) {
    +        $arrayWriterClass.directWrite($bufferHolder, (UnsafeArrayData) $input, $needHeader);
    +      } else {
    +        final int $numElements = $input.numElements();
    +        $arrayWriter.initialize($bufferHolder, $numElements, $needHeader, $fixedElementSize);
    +
    +        for (int $index = 0; $index < $numElements; $index++) {
    +          if ($input.isNullAt($index)) {
    +            $arrayWriter.setNullAt($index);
    +          } else {
    +            final $jt $element = ${ctx.getValue(input, elementType, index)};
    +            $writeElement
    +          }
    +        }
    +      }
    +    """
    +  }
    +
    +  // todo: if the nullability of value element is correct, we can use it to save null check.
    +  private def writeMapToBuffer(
    +      ctx: CodeGenContext,
    +      input: String,
    +      keyType: DataType,
    +      valueType: DataType,
    +      bufferHolder: String): String = {
    +    val keys = ctx.freshName("keys")
    +    val values = ctx.freshName("values")
    +    val tmpCursor = ctx.freshName("tmpCursor")
    +
    +    s"""
    +      final ArrayData $keys = $input.keyArray();
    +      final ArrayData $values = $input.valueArray();
    +
    +      // Write the numElements into first 4 bytes.
    +      Platform.putInt($bufferHolder.buffer, $bufferHolder.cursor, $keys.numElements());
    +
    +      $bufferHolder.cursor += 8;
    +      // Remember the current cursor so that we can write numBytes of key array later.
    +      final int $tmpCursor = $bufferHolder.cursor;
    +
    +      ${writeArrayToBuffer(ctx, keys, keyType, bufferHolder, needHeader = false)}
    +      // Write the numBytes of key array into second 4 bytes.
    +      Platform.putInt($bufferHolder.buffer, $tmpCursor - 4, $bufferHolder.cursor - $tmpCursor);
    --- End diff --
    
    This will be wrong on a big-endian machine; we should use putLong().
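
    (To spell out the endianness hazard: the generated map code stores two ints
    -- numElements at the start of an 8-byte slot and, once the key array has
    been written, its numBytes at tmpCursor - 4. If the reading side ever loads
    that slot as one long and splits it with shifts, which half each int lands
    in flips between little- and big-endian JVMs. Below is a sketch of the
    putLong-style fix, written back after the key array so both counts are
    known; the names and the high/low assignment are illustrative and must
    match however UnsafeMapData decodes the slot:)

        // Pack both counts into a single long and store it with one putLong,
        // so writer and reader agree on the layout regardless of native
        // byte order. `headerOffset` is where the 8-byte slot begins.
        final long header =
          (((long) numBytesOfKeyArray) << 32) | (numElements & 0xFFFFFFFFL);
        Platform.putLong(holder.buffer, headerOffset, header);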




[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8747#issuecomment-143128553
  
    Merged build started.




[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8747#issuecomment-143132980
  
    Merged build started.




[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8747#discussion_r40985000
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeProjection.scala ---
    @@ -393,10 +393,278 @@ object GenerateUnsafeProjection extends CodeGenerator[Seq[Expression], UnsafePro
         case _ => input
       }
     
     [snip -- identical to the hunk quoted in full earlier in this thread]
    +
       def createCode(ctx: CodeGenContext, expressions: Seq[Expression]): GeneratedExpressionCode = {
         val exprEvals = expressions.map(e => e.gen(ctx))
         val exprTypes = expressions.map(_.dataType)
    -    createCodeForStruct(ctx, "i", exprEvals, exprTypes)
    --- End diff --
    
    I'd rather not, to keep the diff of this PR clear. We can remove the old code in a follow-up PR.




[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8747#discussion_r40962839
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeProjection.scala ---
    @@ -393,10 +393,278 @@ object GenerateUnsafeProjection extends CodeGenerator[Seq[Expression], UnsafePro
         case _ => input
       }
     
     [snip -- identical to the hunk quoted in full earlier in this thread]
    +
    +  // todo: if the nullability of value element is correct, we can use it to save null check.
    +  private def writeMapToBuffer(
    +      ctx: CodeGenContext,
    +      input: String,
    +      keyType: DataType,
    +      valueType: DataType,
    +      bufferHolder: String): String = {
    +    val keys = ctx.freshName("keys")
    +    val values = ctx.freshName("values")
    +    val tmpCursor = ctx.freshName("tmpCursor")
    +
    +    s"""
    +      final ArrayData $keys = $input.keyArray();
    +      final ArrayData $values = $input.valueArray();
    +
    +      // Write the numElements into first 4 bytes.
    +      Platform.putInt($bufferHolder.buffer, $bufferHolder.cursor, $keys.numElements());
    +
    +      $bufferHolder.cursor += 8;
    +      // Remember the current cursor so that we can write numBytes of key array later.
    +      final int $tmpCursor = $bufferHolder.cursor;
    +
    +      ${writeArrayToBuffer(ctx, keys, keyType, bufferHolder, needHeader = false)}
    +      // Write the numBytes of key array into second 4 bytes.
    +      Platform.putInt($bufferHolder.buffer, $tmpCursor - 4, $bufferHolder.cursor - $tmpCursor);
    --- End diff --
    
    Could you document the format of UnsafeArrayData and UnsafeMapData somewhere?
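
    (Until that documentation lands, here is the layout as reconstructed from
    the writers quoted above -- a reader's notes, not an authoritative spec:

        UnsafeArrayData, as UnsafeArrayWriter produces it:
          [numElements : 4 bytes]         -- omitted when nested inside a map
          [offsets     : 4 bytes/element] -- relative to the start of this
                                             region; a negative value marks
                                             a null element
          [element data: variable]        -- appended in element order

        UnsafeMapData, as writeMapToBuffer produces it:
          [numElements : 4 bytes][numBytes of key array : 4 bytes]
          [key array, headerless][value array, headerless])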




[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8747#issuecomment-145094287
  
    Merged build started.




[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8747#discussion_r41169832
  
    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeArrayWriter.java ---
    @@ -0,0 +1,179 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.expressions.codegen;
    +
    +import org.apache.spark.sql.catalyst.expressions.UnsafeArrayData;
    +import org.apache.spark.sql.types.Decimal;
    +import org.apache.spark.unsafe.Platform;
    +import org.apache.spark.unsafe.types.CalendarInterval;
    +import org.apache.spark.unsafe.types.UTF8String;
    +
    +/**
    + * A helper class to write data into global row buffer using `UnsafeArrayData` format,
    + * used by {@link org.apache.spark.sql.catalyst.expressions.codegen.GenerateUnsafeProjection}.
    + */
    +public class UnsafeArrayWriter {
    +
    +  private BufferHolder holder;
    +  // The offset of the global buffer where we start to write this array.
    +  private int startingOffset;
    +
    +  public void initialize(
    +      BufferHolder holder,
    +      int numElements,
    +      boolean needHeader,
    +      int fixedElementSize) {
    +
     +    // We need 4 bytes per element to store the offset.
    +    final int fixedSize = 4 * numElements;
    +
    +    if (needHeader) {
    +      // If header is required, we need extra 4 bytes to store the header.
    +      holder.grow(fixedSize + 4);
     +      // Writes the number of elements into first 4 bytes.
    +      Platform.putInt(holder.buffer, holder.cursor, numElements);
    +      holder.cursor += 4;
    +    } else {
    +      holder.grow(fixedSize);
    +    }
    +
    +    this.holder = holder;
    +    this.startingOffset = holder.cursor;
    +
    +    holder.cursor += fixedSize;
    +
    +    // Grows the global buffer ahead for fixed size data.
    +    if (fixedElementSize > 0) {
    +      holder.grow(fixedElementSize * numElements);
    +    }
    +  }
    +
    +  private long getElementOffset(int ordinal) {
    +    return startingOffset + 4 * ordinal;
    +  }
    +
    +  public void setNullAt(int ordinal) {
    +    final int relativeOffset = holder.cursor - startingOffset;
    +    // Writes negative offset value to represent null element.
    +    Platform.putInt(holder.buffer, getElementOffset(ordinal), -relativeOffset);
    +  }
    +
    +  public void setOffset(int ordinal) {
    +    final int relativeOffset = holder.cursor - startingOffset;
    +    Platform.putInt(holder.buffer, getElementOffset(ordinal), relativeOffset);
    +  }
    +
    +
    +
    +  public void writeCompactDecimal(int ordinal, Decimal input, int precision, int scale) {
    +    // make sure Decimal object has the same scale as DecimalType
    +    if (input.changePrecision(precision, scale)) {
    +      Platform.putLong(holder.buffer, holder.cursor, input.toUnscaledLong());
    +      setOffset(ordinal);
    +      holder.cursor += 8;
    +    } else {
    +      setNullAt(ordinal);
    +    }
    +  }
    +
    +  public void write(int ordinal, Decimal input, int precision, int scale) {
    +    // make sure Decimal object has the same scale as DecimalType
    +    if (input.changePrecision(precision, scale)) {
    +      final byte[] bytes = input.toJavaBigDecimal().unscaledValue().toByteArray();
    +      assert bytes.length <= 16;
    +      holder.grow(bytes.length);
    +
    +      // Write the bytes to the variable length portion.
    +      Platform.copyMemory(
    +        bytes, Platform.BYTE_ARRAY_OFFSET, holder.buffer, holder.cursor, bytes.length);
    +      setOffset(ordinal);
    +      holder.cursor += bytes.length;
    +    } else {
    +      setNullAt(ordinal);
    +    }
    +  }
    +
    +  public void write(int ordinal, UTF8String input) {
    +    final int numBytes = input.numBytes();
    +
    +    // grow the global buffer before writing data.
    +    holder.grow(numBytes);
    +
    +    // Write the bytes to the variable length portion.
    +    input.writeToMemory(holder.buffer, holder.cursor);
    +
    +    setOffset(ordinal);
    +
    +    // move the cursor forward.
    +    holder.cursor += numBytes;
    +  }
    +
    +  public void write(int ordinal, byte[] input) {
    +    // grow the global buffer before writing data.
    +    holder.grow(input.length);
    +
    +    // Write the bytes to the variable length portion.
    +    Platform.copyMemory(
    +      input, Platform.BYTE_ARRAY_OFFSET, holder.buffer, holder.cursor, input.length);
    +
    +    setOffset(ordinal);
    +
    +    // move the cursor forward.
    +    holder.cursor += input.length;
    +  }
    +
    +  public void write(int ordinal, CalendarInterval input) {
    +    // grow the global buffer before writing data.
    +    holder.grow(16);
    +
    +    // Write the months and microseconds fields of Interval to the variable length portion.
    +    Platform.putLong(holder.buffer, holder.cursor, input.months);
    +    Platform.putLong(holder.buffer, holder.cursor + 8, input.microseconds);
    +
    +    setOffset(ordinal);
    +
    +    // move the cursor forward.
    +    holder.cursor += 16;
    +  }
    +
    +
    +
    +  // If this array is already an UnsafeArray, we don't need to go through all elements, we can
    +  // directly write it.
    +  public static void directWrite(
    +      BufferHolder holder,
    +      UnsafeArrayData input,
    +      boolean needHeader) {
    +    final int numBytes = input.getSizeInBytes();
    +
    +    if (needHeader) {
    +      // If header is required, we need extra 4 bytes to store the header.
    +      holder.grow(numBytes + 4);
     +      // Writes the number of elements into first 4 bytes.
    +      Platform.putInt(holder.buffer, holder.cursor, input.numElements());
    --- End diff --
    
    Same here: it's better to move this out of this method.
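
    (A sketch of that refactor, with the header written by the caller instead.
    It assumes UnsafeArrayData can copy its backing bytes to a (byte[], offset)
    target; the writeToMemory name below is illustrative, not necessarily the
    method the class actually exposes:)

        // Only copies the array bytes; a caller that needs the 4-byte
        // numElements header writes it before calling this.
        public static void directWrite(BufferHolder holder, UnsafeArrayData input) {
          final int numBytes = input.getSizeInBytes();
          holder.grow(numBytes);
          input.writeToMemory(holder.buffer, holder.cursor);  // hypothetical helper
          holder.cursor += numBytes;
        }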




[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8747#issuecomment-143155921
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8747#discussion_r40958537
  
    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeRowWriter.java ---
    @@ -0,0 +1,218 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.expressions.codegen;
    +
    +import org.apache.spark.sql.catalyst.expressions.UnsafeRow;
    +import org.apache.spark.sql.types.Decimal;
    +import org.apache.spark.unsafe.Platform;
    +import org.apache.spark.unsafe.array.ByteArrayMethods;
    +import org.apache.spark.unsafe.bitset.BitSetMethods;
    +import org.apache.spark.unsafe.types.CalendarInterval;
    +import org.apache.spark.unsafe.types.UTF8String;
    +
    +/**
    + * A helper class to write data into global row buffer using `UnsafeRow` format,
    + * used by {@link org.apache.spark.sql.catalyst.expressions.codegen.GenerateUnsafeProjection}.
    + */
    +public class UnsafeRowWriter {
    +
    +  private GlobalBufferHolder holder;
    +  // The offset of the global buffer where we start to write this row.
    +  private int startingOffset;
    +  private int nullBitsSize;
    +
    +  public void initialize(GlobalBufferHolder holder, int numFields) {
    +    this.holder = holder;
    +    this.startingOffset = holder.cursor;
    +    this.nullBitsSize = UnsafeRow.calculateBitSetWidthInBytes(numFields);
    +
    +    // grow the global buffer to make sure it has enough space to write fixed-length data.
    +    final int fixedSize = nullBitsSize + 8 * numFields;
    +    holder.grow(fixedSize);
    +    holder.cursor += fixedSize;
    +
    +    // zero-out the null bits region
    +    for (int i = 0; i < nullBitsSize; i += 8) {
    +      Platform.putLong(holder.buffer, startingOffset + i, 0L);
    +    }
    +  }
    +
    +  private void zeroOutPaddingBytes(int numBytes) {
    +    if ((numBytes & 0x07) > 0) {
    +      Platform.putLong(holder.buffer, holder.cursor + ((numBytes >> 3) << 3), 0L);
    +    }
    +  }
    +
    +  public void setNullAt(int ordinal) {
    +    BitSetMethods.set(holder.buffer, startingOffset, ordinal);
    +    Platform.putLong(holder.buffer, getFieldOffset(ordinal), 0L);
    +  }
    +
    +  public long getFieldOffset(int ordinal) {
    +    return startingOffset + nullBitsSize + 8 * ordinal;
    +  }
    +
    +  public void setOffsetAndSize(int ordinal, long size) {
    +    setOffsetAndSize(ordinal, holder.cursor, size);
    +  }
    +
    +  public void setOffsetAndSize(int ordinal, long currentCursor, long size) {
    +    final long relativeOffset = currentCursor - startingOffset;
    +    final long fieldOffset = getFieldOffset(ordinal);
    +    final long offsetAndSize = (relativeOffset << 32) | size;
    +
    +    Platform.putLong(holder.buffer, fieldOffset, offsetAndSize);
    +  }
    +
     +  // Do word alignment for this row, zero-padding out to the next word boundary.
    +  // todo: remove this after we make unsafe array data word align.
    +  public void alignWords(int numBytes) {
    +    final int remainder = numBytes & 0x07;
    +
    +    if (remainder > 0) {
    +      final int paddingBytes = 8 - remainder;
    +      holder.grow(paddingBytes);
    +
     +      final byte[] originalValues = new byte[remainder];
    +      Platform.copyMemory(
    +        holder.buffer,
    +        holder.cursor - remainder,
     +        originalValues,
    +        Platform.BYTE_ARRAY_OFFSET,
    +        remainder);
    +
    +      Platform.putLong(holder.buffer, holder.cursor - remainder, 0);
    +
    +      Platform.copyMemory(
     +        originalValues,
    +        Platform.BYTE_ARRAY_OFFSET,
    +        holder.buffer,
    +        holder.cursor - remainder,
    +        remainder);
    +
    +      holder.cursor += paddingBytes;
    +    }
    +  }
    +
    +
    +
    +  public void writeCompactDecimal(int ordinal, Decimal input, int precision, int scale) {
    +    // make sure Decimal object has the same scale as DecimalType
    +    if (input.changePrecision(precision, scale)) {
    +      Platform.putLong(holder.buffer, getFieldOffset(ordinal), input.toUnscaledLong());
    +    } else {
    +      setNullAt(ordinal);
    +    }
    +  }
    +
    +  public void write(int ordinal, Decimal input, int precision, int scale) {
    +    // grow the global buffer before writing data.
    +    holder.grow(16);
    +
    +    // zero-out the bytes
    +    Platform.putLong(holder.buffer, holder.cursor, 0L);
    +    Platform.putLong(holder.buffer, holder.cursor + 8, 0L);
    +
    +    // Make sure Decimal object has the same scale as DecimalType.
    +    // Note that we may pass in null Decimal object to set null for it.
    +    if (input == null || !input.changePrecision(precision, scale)) {
    +      BitSetMethods.set(holder.buffer, startingOffset, ordinal);
    +      // keep the offset for future update
    +      final long relativeOffset = holder.cursor - startingOffset;
    +      Platform.putLong(holder.buffer, getFieldOffset(ordinal), relativeOffset << 32);
    --- End diff --
    
    setOffsetAndSize(ordinal, 0)
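
    (This works because the two-argument overload quoted further up defaults
    currentCursor to holder.cursor, so setOffsetAndSize(ordinal, 0) stores
    exactly the long built by hand here: the relative offset in the high 32
    bits and a zero size in the low 32. The null branch then collapses to:)

        if (input == null || !input.changePrecision(precision, scale)) {
          BitSetMethods.set(holder.buffer, startingOffset, ordinal);
          // same bits as the manual putLong: relativeOffset << 32 | 0
          setOffsetAndSize(ordinal, 0);
        }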




[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8747#discussion_r40960644
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeProjection.scala ---
    @@ -393,10 +393,278 @@ object GenerateUnsafeProjection extends CodeGenerator[Seq[Expression], UnsafePro
         case _ => input
       }
     
     [snip -- identical to the hunk quoted in full earlier in this thread]
    +
    +  private def writePrimitiveType(
    +      ctx: CodeGenContext,
    +      input: String,
    +      dt: DataType,
    +      buffer: String,
    +      offset: String) = {
    +    assert(ctx.isPrimitiveType(dt))
    +
    +    val putMethod = s"put${ctx.primitiveTypeName(dt)}"
    +
    +    dt match {
    +      case FloatType | DoubleType =>
    +        val normalized = ctx.freshName("normalized")
    +        val boxedType = ctx.boxedType(dt)
    +        val handleNaN =
    +          s"""
    +            final ${ctx.javaType(dt)} $normalized;
    +            if ($boxedType.isNaN($input)) {
    +              $normalized = $boxedType.NaN;
    +            } else {
    +              $normalized = $input;
    +            }
    +          """
    +
    +        s"""
    +          $handleNaN
    +          Platform.$putMethod($buffer, $offset, $normalized);
    +        """
    +      case _ => s"Platform.$putMethod($buffer, $offset, $input);"
    +    }
    +  }
    +
       def createCode(ctx: CodeGenContext, expressions: Seq[Expression]): GeneratedExpressionCode = {
         val exprEvals = expressions.map(e => e.gen(ctx))
         val exprTypes = expressions.map(_.dataType)
    -    createCodeForStruct(ctx, "i", exprEvals, exprTypes)
    --- End diff --
    
    Should we delete all the old code?
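
    For readers tracing the writeMapToBuffer part of the diff, here is a hand-written
    Java approximation of the layout it emits (illustrative only; it assumes the
    patch's BufferHolder with public `buffer`/`cursor` fields, and that `cursor`
    already includes the byte-array base offset):

        // Map layout: [numElements][numBytes of key array][key array][value array],
        // where the key and value arrays are written without their own headers.
        void writeMapSketch(BufferHolder holder, int numElements) {
          holder.grow(8);  // grow() added here for safety in this standalone sketch
          // First 4 bytes: the number of elements (keys and values have the same count).
          Platform.putInt(holder.buffer, holder.cursor, numElements);
          // Skip 4 more bytes; the key array's size in bytes is filled in afterwards.
          holder.cursor += 8;
          final int tmpCursor = holder.cursor;
          // ... write the key array here, advancing holder.cursor ...
          Platform.putInt(holder.buffer, tmpCursor - 4, holder.cursor - tmpCursor);
          // ... write the value array here ...
        }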


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8747#issuecomment-143128730
  
      [Test build #43008 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43008/console) for   PR 8747 at commit [`a50923d`](https://github.com/apache/spark/commit/a50923def1eab00f3a61a60b58fca1579ba4a0e7).
     * This patch **fails RAT tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `public class GlobalBufferHolder `
      * `public class UnsafeArrayWriter `
      * `public class UnsafeRowWriter `



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8747#issuecomment-143135760
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43010/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the pull request:

    https://github.com/apache/spark/pull/8747#issuecomment-145091842
  
    retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8747#discussion_r41170167
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeProjection.scala ---
    @@ -393,10 +393,282 @@ object GenerateUnsafeProjection extends CodeGenerator[Seq[Expression], UnsafePro
         case _ => input
       }
     
    +  val rowWriterClass = classOf[UnsafeRowWriter].getName
    --- End diff --
    
    private


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8747#issuecomment-145100463
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8747#discussion_r41168782
  
    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeArrayWriter.java ---
    @@ -0,0 +1,179 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.expressions.codegen;
    +
    +import org.apache.spark.sql.catalyst.expressions.UnsafeArrayData;
    +import org.apache.spark.sql.types.Decimal;
    +import org.apache.spark.unsafe.Platform;
    +import org.apache.spark.unsafe.types.CalendarInterval;
    +import org.apache.spark.unsafe.types.UTF8String;
    +
    +/**
    + * A helper class to write data into global row buffer using `UnsafeArrayData` format,
    + * used by {@link org.apache.spark.sql.catalyst.expressions.codegen.GenerateUnsafeProjection}.
    + */
    +public class UnsafeArrayWriter {
    +
    +  private BufferHolder holder;
    +  // The offset of the global buffer where we start to write this array.
    +  private int startingOffset;
    +
    +  public void initialize(
    +      BufferHolder holder,
    +      int numElements,
    +      boolean needHeader,
    +      int fixedElementSize) {
    +
    +    // We need 4 bytes each element to store offset.
    +    final int fixedSize = 4 * numElements;
    +
    +    if (needHeader) {
    +      // If header is required, we need extra 4 bytes to store the header.
    +      holder.grow(fixedSize + 4);
    +      // Writes the number of elements into first 4 bytes;
    +      Platform.putInt(holder.buffer, holder.cursor, numElements);
    +      holder.cursor += 4;
    +    } else {
    +      holder.grow(fixedSize);
    +    }
    +
    +    this.holder = holder;
    +    this.startingOffset = holder.cursor;
    +
    +    holder.cursor += fixedSize;
    +
    +    // Grows the global buffer ahead for fixed size data.
    +    if (fixedElementSize > 0) {
    --- End diff --
    
    grow() can handle a `neededSize` of 0, so this check is not necessary.
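
    A minimal sketch of that simplification, assuming the guarded block only grows
    the buffer ahead for the fixed-size element region (the diff is cut off above,
    so the guarded body is inferred):

        // Since grow() is a no-op for a neededSize of 0, the guard can be dropped:
        holder.grow(fixedElementSize * numElements);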


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8747#issuecomment-145182270
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8747#discussion_r40957228
  
    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/GlobalBufferHolder.java ---
    @@ -0,0 +1,54 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.expressions.codegen;
    +
    +import org.apache.spark.unsafe.Platform;
    +
    +/**
    + * A helper class to manage the global row buffer used in `GenerateUnsafeProjection`.
    + *
    + * Note that it is only used in `GenerateUnsafeProjection`, so it's safe to mark member variables
    + * public for ease of use.
    + */
    +public class GlobalBufferHolder {
    +  public byte[] buffer = new byte[64];
    +  public int cursor = 0;
    +
    +  public void grow(int neededSize) {
    +    final int length = totalSize() + neededSize;
    +    if (buffer.length < length) {
    +      // This will not happen frequently, because the buffer is re-used.
    +      final byte[] tmp = new byte[length * 2];
    +      Platform.copyMemory(
    +        buffer,
    +        Platform.BYTE_ARRAY_OFFSET,
    +        tmp,
    +        Platform.BYTE_ARRAY_OFFSET,
    +        totalSize());
    +      buffer = tmp;
    +    }
    +  }
    +
    +  public void initialize() {
    --- End diff --
    
    This will be called multiple times, so should it be called `reset`?
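
    A sketch of the reuse pattern that motivates the rename (illustrative only;
    `reset` is the proposed name, the rest follows the patch):

        // Called once per input row: rewind the cursor, keep the already-grown buffer.
        holder.reset();
        // ... generated field-writing code appends to holder.buffer, growing as needed ...
        // Afterwards, holder.totalSize() is the number of bytes the row occupies.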


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8747#issuecomment-140162801
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/8747#issuecomment-144845504
  
    @cloud-fan I did one pass and left a few minor comments. Considering all the combinations of data types, this is pretty complicated; could you add more tests to improve the test coverage?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8747#issuecomment-145095287
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43184/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8747#issuecomment-145115497
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8747#discussion_r41168642
  
    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeArrayWriter.java ---
    @@ -0,0 +1,179 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.expressions.codegen;
    +
    +import org.apache.spark.sql.catalyst.expressions.UnsafeArrayData;
    +import org.apache.spark.sql.types.Decimal;
    +import org.apache.spark.unsafe.Platform;
    +import org.apache.spark.unsafe.types.CalendarInterval;
    +import org.apache.spark.unsafe.types.UTF8String;
    +
    +/**
    + * A helper class to write data into global row buffer using `UnsafeArrayData` format,
    + * used by {@link org.apache.spark.sql.catalyst.expressions.codegen.GenerateUnsafeProjection}.
    + */
    +public class UnsafeArrayWriter {
    +
    +  private BufferHolder holder;
    +  // The offset of the global buffer where we start to write this array.
    +  private int startingOffset;
    +
    +  public void initialize(
    +      BufferHolder holder,
    +      int numElements,
    +      boolean needHeader,
    +      int fixedElementSize) {
    +
    +    // We need 4 bytes each element to store offset.
    +    final int fixedSize = 4 * numElements;
    +
    +    if (needHeader) {
    +      // If header is required, we need extra 4 bytes to store the header.
    +      holder.grow(fixedSize + 4);
    +      // Writes the number of elements into first 4 bytes;
    +      Platform.putInt(holder.buffer, holder.cursor, numElements);
    --- End diff --
    
    Maybe we could move this header out of `UnsafeArrayWriter`; that would be clearer and reduce branching.
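
    A sketch of that refactoring, with the caller writing the 4-byte element count
    itself so initialize() needs no needHeader branch (names follow the patch;
    illustrative only):

        // At the call site, before initializing the array writer:
        holder.grow(4);
        Platform.putInt(holder.buffer, holder.cursor, numElements);
        holder.cursor += 4;
        // initialize() then only reserves the per-element offset region:
        arrayWriter.initialize(holder, numElements, fixedElementSize);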


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8747#issuecomment-145182196
  
      [Test build #43198 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43198/console) for   PR 8747 at commit [`8d5725b`](https://github.com/apache/spark/commit/8d5725bd3db9be82b19d51b6899bced88482df67).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `public class BufferHolder `
      * `public class UnsafeArrayWriter `
      * `public class UnsafeRowWriter `



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8747#issuecomment-145100448
  
      [Test build #43186 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43186/console) for   PR 8747 at commit [`8d5725b`](https://github.com/apache/spark/commit/8d5725bd3db9be82b19d51b6899bced88482df67).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `public class BufferHolder `
      * `public class UnsafeArrayWriter `
      * `public class UnsafeRowWriter `



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8747#discussion_r40957153
  
    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/GlobalBufferHolder.java ---
    @@ -0,0 +1,54 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.expressions.codegen;
    +
    +import org.apache.spark.unsafe.Platform;
    +
    +/**
    + * A helper class to manage the global row buffer used in `GenerateUnsafeProjection`.
    + *
    + * Note that it is only used in `GenerateUnsafeProjection`, so it's safe to mark member variables
    + * public for ease of use.
    + */
    +public class GlobalBufferHolder {
    +  public byte[] buffer = new byte[64];
    +  public int cursor = 0;
    --- End diff --
    
    Could we use `Platform.BYTE_ARRAY_OFFSET` as the initial value? It would be easier to understand.
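
    For illustration, with that change the cursor doubles as a ready-to-use Platform
    offset (sketch only):

        public byte[] buffer = new byte[64];
        public int cursor = Platform.BYTE_ARRAY_OFFSET;

        // Writes can then pass the cursor straight through:
        //   Platform.putInt(buffer, cursor, value);
        // and the bytes written so far are cursor - Platform.BYTE_ARRAY_OFFSET.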


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8747#issuecomment-140162805
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42426/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8747#issuecomment-143134964
  
      [Test build #43010 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43010/consoleFull) for   PR 8747 at commit [`ec24edf`](https://github.com/apache/spark/commit/ec24edf67faf354114b19263f2e0b11d50650ef0).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8747#issuecomment-145098204
  
      [Test build #43185 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43185/console) for   PR 8747 at commit [`8d5725b`](https://github.com/apache/spark/commit/8d5725bd3db9be82b19d51b6899bced88482df67).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `public class BufferHolder `
      * `public class UnsafeArrayWriter `
      * `public class UnsafeRowWriter `



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8747#issuecomment-143155835
  
      [Test build #43013 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43013/console) for   PR 8747 at commit [`44b1799`](https://github.com/apache/spark/commit/44b1799aa90cd7255ca4f860843fdc4f308626cf).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `public class GlobalBufferHolder `
      * `public class UnsafeArrayWriter `
      * `public class UnsafeRowWriter `



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8747#issuecomment-145159678
  
      [Test build #43198 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43198/consoleFull) for   PR 8747 at commit [`8d5725b`](https://github.com/apache/spark/commit/8d5725bd3db9be82b19d51b6899bced88482df67).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8747#issuecomment-145117815
  
      [Test build #43190 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43190/consoleFull) for   PR 8747 at commit [`8d5725b`](https://github.com/apache/spark/commit/8d5725bd3db9be82b19d51b6899bced88482df67).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8747#issuecomment-144935018
  
      [Test build #43174 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43174/consoleFull) for   PR 8747 at commit [`427da03`](https://github.com/apache/spark/commit/427da03597df85af83df633d07457dcb1fdb283c).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8747#issuecomment-143155922
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43013/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8747#issuecomment-145645748
  
      [Test build #43243 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43243/console) for   PR 8747 at commit [`d7f941d`](https://github.com/apache/spark/commit/d7f941d4edc6e3165790f2546fc3e7f378f04250).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `public class BufferHolder `
      * `public class UnsafeArrayWriter `
      * `public class UnsafeRowWriter `



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the pull request:

    https://github.com/apache/spark/pull/8747#issuecomment-145157718
  
    retest this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the pull request:

    https://github.com/apache/spark/pull/8747#issuecomment-145093283
  
    retest this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8747#issuecomment-145095284
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8747#issuecomment-145098218
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8747#issuecomment-143139114
  
      [Test build #43013 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43013/consoleFull) for   PR 8747 at commit [`44b1799`](https://github.com/apache/spark/commit/44b1799aa90cd7255ca4f860843fdc4f308626cf).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8747#issuecomment-143135757
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8747#issuecomment-140118188
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8747#issuecomment-143135749
  
      [Test build #43010 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43010/console) for   PR 8747 at commit [`ec24edf`](https://github.com/apache/spark/commit/ec24edf67faf354114b19263f2e0b11d50650ef0).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `public class GlobalBufferHolder `
      * `public class UnsafeArrayWriter `
      * `public class UnsafeRowWriter `



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8747#issuecomment-145609556
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8747#issuecomment-145182273
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43198/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8747#issuecomment-145085586
  
      [Test build #43183 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43183/consoleFull) for   PR 8747 at commit [`8d5725b`](https://github.com/apache/spark/commit/8d5725bd3db9be82b19d51b6899bced88482df67).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8747#issuecomment-145088729
  
      [Test build #43183 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43183/console) for   PR 8747 at commit [`8d5725b`](https://github.com/apache/spark/commit/8d5725bd3db9be82b19d51b6899bced88482df67).
     * This patch **fails MiMa tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `public class BufferHolder `
      * `public class UnsafeArrayWriter `
      * `public class UnsafeRowWriter `



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8747#discussion_r41170912
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeProjection.scala ---
    @@ -393,10 +393,282 @@ object GenerateUnsafeProjection extends CodeGenerator[Seq[Expression], UnsafePro
         case _ => input
       }
     
    +  val rowWriterClass = classOf[UnsafeRowWriter].getName
    +  val arrayWriterClass = classOf[UnsafeArrayWriter].getName
    +
    +  // todo: if the nullability of field is correct, we can use it to save null check.
    +  private def writeStructToBuffer(
    +      ctx: CodeGenContext,
    +      input: String,
    +      fieldTypes: Seq[DataType],
    +      bufferHolder: String): String = {
    +    val fieldEvals = fieldTypes.zipWithIndex.map { case (dt, i) =>
    +      val fieldName = ctx.freshName("fieldName")
    +      val code = s"final ${ctx.javaType(dt)} $fieldName = ${ctx.getValue(input, dt, i.toString)};"
    +      val isNull = s"$input.isNullAt($i)"
    +      GeneratedExpressionCode(code, isNull, fieldName)
    +    }
    +
    +    s"""
    +      if ($input instanceof UnsafeRow) {
    +        $rowWriterClass.directWrite($bufferHolder, (UnsafeRow) $input);
    +      } else {
    +        ${writeExpressionsToBuffer(ctx, input, fieldEvals, fieldTypes, bufferHolder)}
    +      }
    +    """
    +  }
    +
    +  private def writeExpressionsToBuffer(
    +      ctx: CodeGenContext,
    +      row: String,
    +      inputs: Seq[GeneratedExpressionCode],
    +      inputTypes: Seq[DataType],
    +      bufferHolder: String): String = {
    +    val rowWriter = ctx.freshName("rowWriter")
    +    ctx.addMutableState(rowWriterClass, rowWriter, s"this.$rowWriter = new $rowWriterClass();")
    +
    +    val writeFields = inputs.zip(inputTypes).zipWithIndex.map {
    +      case ((input, dt), index) =>
    +        val tmpCursor = ctx.freshName("tmpCursor")
    +
    +        val setNull = dt match {
    +          case t: DecimalType if t.precision > Decimal.MAX_LONG_DIGITS =>
    +            // Can't call setNullAt() for DecimalType with precision larger than 18.
    +            s"$rowWriter.write($index, null, ${t.precision}, ${t.scale});"
    +          case _ => s"$rowWriter.setNullAt($index);"
    +        }
    +
    +        val writeField = dt match {
    +          case t: StructType =>
    +            s"""
    +              // Remember the current cursor so that we can calculate how many bytes are
    +              // written later.
    +              final int $tmpCursor = $bufferHolder.cursor;
    +              ${writeStructToBuffer(ctx, input.primitive, t.map(_.dataType), bufferHolder)}
    +              $rowWriter.setOffsetAndSize($index, $tmpCursor, $bufferHolder.cursor - $tmpCursor);
    +            """
    +
    +          case a @ ArrayType(et, _) =>
    +            s"""
    +              // Remember the current cursor so that we can calculate how many bytes are
    +              // written later.
    +              final int $tmpCursor = $bufferHolder.cursor;
    +              ${writeArrayToBuffer(ctx, input.primitive, et, bufferHolder)}
    +              $rowWriter.setOffsetAndSize($index, $tmpCursor, $bufferHolder.cursor - $tmpCursor);
    +              $rowWriter.alignToWords($bufferHolder.cursor - $tmpCursor);
    +            """
    +
    +          case m @ MapType(kt, vt, _) =>
    +            s"""
    +              // Remember the current cursor so that we can calculate how many bytes are
    +              // written later.
    +              final int $tmpCursor = $bufferHolder.cursor;
    +              ${writeMapToBuffer(ctx, input.primitive, kt, vt, bufferHolder)}
    +              $rowWriter.setOffsetAndSize($index, $tmpCursor, $bufferHolder.cursor - $tmpCursor);
    +              $rowWriter.alignToWords($bufferHolder.cursor - $tmpCursor);
    +            """
    +
    +          case _ if ctx.isPrimitiveType(dt) =>
    +            val fieldOffset = ctx.freshName("fieldOffset")
    +            s"""
    +              final long $fieldOffset = $rowWriter.getFieldOffset($index);
    +              Platform.putLong($bufferHolder.buffer, $fieldOffset, 0L);
    +              ${writePrimitiveType(ctx, input.primitive, dt, s"$bufferHolder.buffer", fieldOffset)}
    +            """
    +
    +          case t: DecimalType if t.precision <= Decimal.MAX_LONG_DIGITS =>
    +            s"$rowWriter.writeCompactDecimal($index, ${input.primitive}, " +
    +              s"${t.precision}, ${t.scale});"
    +
    +          case t: DecimalType =>
    +            s"$rowWriter.write($index, ${input.primitive}, ${t.precision}, ${t.scale});"
    +
    +          case NullType => ""
    +
    +          case _ => s"$rowWriter.write($index, ${input.primitive});"
    +        }
    +
    +        s"""
    +          ${input.code}
    +          if (${input.isNull}) {
    +            $setNull
    +          } else {
    +            $writeField
    +          }
    +        """
    +    }
    +
    +    s"""
    +      $rowWriter.initialize($bufferHolder, ${inputs.length});
    +      ${ctx.splitExpressions(row, writeFields)}
    +    """
    +  }
    +
    +  // todo: if the nullability of array element is correct, we can use it to save null check.
    +  private def writeArrayToBuffer(
    +      ctx: CodeGenContext,
    +      input: String,
    +      elementType: DataType,
    +      bufferHolder: String,
    +      needHeader: Boolean = true): String = {
    +    val arrayWriter = ctx.freshName("arrayWriter")
    +    ctx.addMutableState(arrayWriterClass, arrayWriter,
    +      s"this.$arrayWriter = new $arrayWriterClass();")
    +    val numElements = ctx.freshName("numElements")
    +    val index = ctx.freshName("index")
    +    val element = ctx.freshName("element")
    +
    +    val jt = ctx.javaType(elementType)
    +
    +    val fixedElementSize = elementType match {
    +      case t: DecimalType if t.precision <= Decimal.MAX_LONG_DIGITS => 8
    +      case _ if ctx.isPrimitiveType(jt) => elementType.defaultSize
    +      case _ => 0
    +    }
    +
    +    val writeElement = elementType match {
    +      case t: StructType =>
    +        s"""
    +          $arrayWriter.setOffset($index);
    +          ${writeStructToBuffer(ctx, element, t.map(_.dataType), bufferHolder)}
    +        """
    +
    +      case a @ ArrayType(et, _) =>
    +        s"""
    +          $arrayWriter.setOffset($index);
    +          ${writeArrayToBuffer(ctx, element, et, bufferHolder)}
    +        """
    +
    +      case m @ MapType(kt, vt, _) =>
    +        s"""
    +          $arrayWriter.setOffset($index);
    +          ${writeMapToBuffer(ctx, element, kt, vt, bufferHolder)}
    +        """
    +
    +      case _ if ctx.isPrimitiveType(elementType) =>
    +        // Should we do word align?
    +        val dataSize = elementType.defaultSize
    +
    +        s"""
    +          $arrayWriter.setOffset($index);
    +          ${writePrimitiveType(ctx, element, elementType,
    +            s"$bufferHolder.buffer", s"$bufferHolder.cursor")}
    +          $bufferHolder.cursor += $dataSize;
    +        """
    +
    +      case t: DecimalType if t.precision <= Decimal.MAX_LONG_DIGITS =>
    +        s"$arrayWriter.writeCompactDecimal($index, $element, ${t.precision}, ${t.scale});"
    +
    +      case t: DecimalType =>
    +        s"$arrayWriter.write($index, $element, ${t.precision}, ${t.scale});"
    +
    +      case NullType => ""
    +
    +      case _ => s"$arrayWriter.write($index, $element);"
    +    }
    +
    +    s"""
    +      if ($input instanceof UnsafeArrayData) {
    +        $arrayWriterClass.directWrite($bufferHolder, (UnsafeArrayData) $input, $needHeader);
    +      } else {
    +        final int $numElements = $input.numElements();
    +        $arrayWriter.initialize($bufferHolder, $numElements, $needHeader, $fixedElementSize);
    +
    +        for (int $index = 0; $index < $numElements; $index++) {
    +          if ($input.isNullAt($index)) {
    +            $arrayWriter.setNullAt($index);
    +          } else {
    +            final $jt $element = ${ctx.getValue(input, elementType, index)};
    +            $writeElement
    +          }
    +        }
    +      }
    +    """
    +  }
    +
    +  // todo: if the nullability of value element is correct, we can use it to save null check.
    --- End diff --
    
    todo -> TODO




[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8747#discussion_r40958320
  
    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeRowWriter.java ---
    @@ -0,0 +1,218 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.expressions.codegen;
    +
    +import org.apache.spark.sql.catalyst.expressions.UnsafeRow;
    +import org.apache.spark.sql.types.Decimal;
    +import org.apache.spark.unsafe.Platform;
    +import org.apache.spark.unsafe.array.ByteArrayMethods;
    +import org.apache.spark.unsafe.bitset.BitSetMethods;
    +import org.apache.spark.unsafe.types.CalendarInterval;
    +import org.apache.spark.unsafe.types.UTF8String;
    +
    +/**
    + * A helper class to write data into the global row buffer using the `UnsafeRow` format,
    + * used by {@link org.apache.spark.sql.catalyst.expressions.codegen.GenerateUnsafeProjection}.
    + */
    +public class UnsafeRowWriter {
    +
    +  private GlobalBufferHolder holder;
    +  // The offset of the global buffer where we start to write this row.
    +  private int startingOffset;
    +  private int nullBitsSize;
    +
    +  public void initialize(GlobalBufferHolder holder, int numFields) {
    +    this.holder = holder;
    +    this.startingOffset = holder.cursor;
    +    this.nullBitsSize = UnsafeRow.calculateBitSetWidthInBytes(numFields);
    +
    +    // grow the global buffer to make sure it has enough space to write fixed-length data.
    +    final int fixedSize = nullBitsSize + 8 * numFields;
    +    holder.grow(fixedSize);
    +    holder.cursor += fixedSize;
    +
    +    // zero-out the null bits region
    +    for (int i = 0; i < nullBitsSize; i += 8) {
    +      Platform.putLong(holder.buffer, startingOffset + i, 0L);
    +    }
    +  }
    +
    +  private void zeroOutPaddingBytes(int numBytes) {
    +    if ((numBytes & 0x07) > 0) {
    +      Platform.putLong(holder.buffer, holder.cursor + ((numBytes >> 3) << 3), 0L);
    +    }
    +  }
    +
    +  public void setNullAt(int ordinal) {
    +    BitSetMethods.set(holder.buffer, startingOffset, ordinal);
    +    Platform.putLong(holder.buffer, getFieldOffset(ordinal), 0L);
    +  }
    +
    +  public long getFieldOffset(int ordinal) {
    +    return startingOffset + nullBitsSize + 8 * ordinal;
    +  }
    +
    +  public void setOffsetAndSize(int ordinal, long size) {
    +    setOffsetAndSize(ordinal, holder.cursor, size);
    +  }
    +
    +  public void setOffsetAndSize(int ordinal, long currentCursor, long size) {
    +    final long relativeOffset = currentCursor - startingOffset;
    +    final long fieldOffset = getFieldOffset(ordinal);
    +    final long offsetAndSize = (relativeOffset << 32) | size;
    +
    +    Platform.putLong(holder.buffer, fieldOffset, offsetAndSize);
    +  }
    +
    +  // Do word alignment for this row.
    +  // todo: remove this after we make unsafe array data word-aligned.
    +  public void alignWords(int numBytes) {
    +    final int remainder = numBytes & 0x07;
    +
    +    if (remainder > 0) {
    +      final int paddingBytes = 8 - remainder;
    +      holder.grow(paddingBytes);
    +
    +      final byte[] originalValues = new byte[remainder];
    +      Platform.copyMemory(
    +        holder.buffer,
    +        holder.cursor - remainder,
    +        originalValues,
    +        Platform.BYTE_ARRAY_OFFSET,
    +        remainder);
    +
    +      Platform.putLong(holder.buffer, holder.cursor - remainder, 0);
    --- End diff --
    
    It's better to call `putByte(buffer, cursor, 0)` in a loop, since `copyMemory()` already loops over the bytes internally.
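    
    For illustration, a minimal sketch of the suggested rewrite (assuming the
    `holder` field with its `buffer`, `cursor` and `grow()` members shown in
    the diff above; advancing the cursor at the end is an assumption, since
    the diff is truncated before that point):
    
        public void alignWords(int numBytes) {
          final int remainder = numBytes & 0x07;
          if (remainder > 0) {
            final int paddingBytes = 8 - remainder;
            holder.grow(paddingBytes);
            // Zero only the padding bytes after the cursor, one byte at a
            // time; copyMemory() loops internally anyway, so this avoids
            // both the temporary array and the save/restore of the trailing
            // valid bytes.
            for (int i = 0; i < paddingBytes; i++) {
              Platform.putByte(holder.buffer, holder.cursor + i, (byte) 0);
            }
            holder.cursor += paddingBytes;
          }
        }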




[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8747#discussion_r40962693
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeProjection.scala ---
    @@ -393,10 +393,278 @@ object GenerateUnsafeProjection extends CodeGenerator[Seq[Expression], UnsafePro
         case _ => input
       }
     
    +  val rowWriterClass = classOf[UnsafeRowWriter].getName
    +  val arrayWriterClass = classOf[UnsafeArrayWriter].getName
    +
    +  // todo: if the nullability of field is correct, we can use it to save null check.
    +  private def writeStructToBuffer(
    +      ctx: CodeGenContext,
    +      input: String,
    +      fieldTypes: Seq[DataType],
    +      bufferHolder: String): String = {
    +    val fieldEvals = fieldTypes.zipWithIndex.map { case (dt, i) =>
    +      val fieldName = ctx.freshName("fieldName")
    +      val code = s"final ${ctx.javaType(dt)} $fieldName = ${ctx.getValue(input, dt, i.toString)};"
    +      val isNull = s"$input.isNullAt($i)"
    +      GeneratedExpressionCode(code, isNull, fieldName)
    +    }
    +
    +    s"""
    +      if ($input instanceof UnsafeRow) {
    +        $rowWriterClass.directWrite($bufferHolder, (UnsafeRow) $input);
    +      } else {
    +        ${writeExpressionsToBuffer(ctx, input, fieldEvals, fieldTypes, bufferHolder)}
    +      }
    +    """
    +  }
    +
    +  private def writeExpressionsToBuffer(
    +      ctx: CodeGenContext,
    +      row: String,
    +      inputs: Seq[GeneratedExpressionCode],
    +      inputTypes: Seq[DataType],
    +      bufferHolder: String): String = {
    +    val rowWriter = ctx.freshName("rowWriter")
    +    ctx.addMutableState(rowWriterClass, rowWriter, s"this.$rowWriter = new $rowWriterClass();")
    +
    +    val writeFields = inputs.zip(inputTypes).zipWithIndex.map {
    +      case ((input, dt), index) =>
    +        val tmpCursor = ctx.freshName("tmpCursor")
    +
    +        val setNull = dt match {
    +          case t: DecimalType if t.precision > Decimal.MAX_LONG_DIGITS =>
    +            // Can't call setNullAt() for DecimalType with precision larger than 18.
    +            s"$rowWriter.write($index, null, 0, 0);"
    +          case _ => s"$rowWriter.setNullAt($index);"
    +        }
    +
    +        val writeField = dt match {
    +          case t: StructType =>
    +            s"""
    +              // Remember the current cursor so that we can calculate how many bytes are
    +              // written later.
    +              final int $tmpCursor = $bufferHolder.cursor;
    +              ${writeStructToBuffer(ctx, input.primitive, t.map(_.dataType), bufferHolder)}
    +              $rowWriter.setOffsetAndSize($index, $tmpCursor, $bufferHolder.cursor - $tmpCursor);
    +            """
    +
    +          case a @ ArrayType(et, _) =>
    +            s"""
    +              // Remember the current cursor so that we can calculate how many bytes are
    +              // written later.
    +              final int $tmpCursor = $bufferHolder.cursor;
    +              ${writeArrayToBuffer(ctx, input.primitive, et, bufferHolder)}
    +              $rowWriter.setOffsetAndSize($index, $tmpCursor, $bufferHolder.cursor - $tmpCursor);
    +              $rowWriter.alignWords($bufferHolder.cursor - $tmpCursor);
    +            """
    +
    +          case m @ MapType(kt, vt, _) =>
    +            s"""
    +              // Remember the current cursor so that we can calculate how many bytes are
    +              // written later.
    +              final int $tmpCursor = $bufferHolder.cursor;
    +              ${writeMapToBuffer(ctx, input.primitive, kt, vt, bufferHolder)}
    +              $rowWriter.setOffsetAndSize($index, $tmpCursor, $bufferHolder.cursor - $tmpCursor);
    +              $rowWriter.alignWords($bufferHolder.cursor - $tmpCursor);
    +            """
    +
    +          case _ if ctx.isPrimitiveType(dt) =>
    +            val fieldOffset = ctx.freshName("fieldOffset")
    +            s"""
    +              final long $fieldOffset = $rowWriter.getFieldOffset($index);
    +              Platform.putLong($bufferHolder.buffer, $fieldOffset, 0L);
    +              ${writePrimitiveType(ctx, input.primitive, dt, s"$bufferHolder.buffer", fieldOffset)}
    +            """
    +
    +          case t: DecimalType if t.precision <= Decimal.MAX_LONG_DIGITS =>
    +            s"$rowWriter.writeCompactDecimal($index, ${input.primitive}, " +
    +              s"${t.precision}, ${t.scale});"
    +
    +          case t: DecimalType =>
    +            s"$rowWriter.write($index, ${input.primitive}, ${t.precision}, ${t.scale});"
    +
    +          case NullType => ""
    +
    +          case _ => s"$rowWriter.write($index, ${input.primitive});"
    +        }
    +
    +        s"""
    +          ${input.code}
    +          if (${input.isNull}) {
    +            $setNull
    +          } else {
    +            $writeField
    +          }
    +        """
    +    }
    +
    +    s"""
    +      $rowWriter.initialize($bufferHolder, ${inputs.length});
    +      ${ctx.splitExpressions(row, writeFields)}
    +    """
    +  }
    +
    +  // todo: if the nullability of array element is correct, we can use it to save null check.
    +  private def writeArrayToBuffer(
    +      ctx: CodeGenContext,
    +      input: String,
    +      elementType: DataType,
    +      bufferHolder: String,
    +      needHeader: Boolean = true): String = {
    +    val arrayWriter = ctx.freshName("arrayWriter")
    +    ctx.addMutableState(arrayWriterClass, arrayWriter,
    +      s"this.$arrayWriter = new $arrayWriterClass();")
    +    val numElements = ctx.freshName("numElements")
    +    val index = ctx.freshName("index")
    +    val element = ctx.freshName("element")
    +
    +    val jt = ctx.javaType(elementType)
    +
    +    val fixedElementSize = elementType match {
    +      case t: DecimalType if t.precision <= Decimal.MAX_LONG_DIGITS => 8
    +      case _ if ctx.isPrimitiveType(jt) => elementType.defaultSize
    +      case _ => 0
    +    }
    +
    +    val writeElement = elementType match {
    +      case t: StructType =>
    +        s"""
    +          $arrayWriter.setOffset($index);
    +          ${writeStructToBuffer(ctx, element, t.map(_.dataType), bufferHolder)}
    +        """
    +
    +      case a @ ArrayType(et, _) =>
    +        s"""
    +          $arrayWriter.setOffset($index);
    +          ${writeArrayToBuffer(ctx, element, et, bufferHolder)}
    +        """
    +
    +      case m @ MapType(kt, vt, _) =>
    +        s"""
    +          $arrayWriter.setOffset($index);
    +          ${writeMapToBuffer(ctx, element, kt, vt, bufferHolder)}
    +        """
    +
    +      case _ if ctx.isPrimitiveType(elementType) =>
    +        // Should we do word alignment?
    +        val dataSize = elementType.defaultSize
    +
    +        s"""
    +          $arrayWriter.setOffset($index);
    +          ${writePrimitiveType(ctx, element, elementType,
    +            s"$bufferHolder.buffer", s"$bufferHolder.cursor")}
    +          $bufferHolder.cursor += $dataSize;
    +        """
    +
    +      case t: DecimalType if t.precision <= Decimal.MAX_LONG_DIGITS =>
    +        s"$arrayWriter.writeCompactDecimal($index, $element, ${t.precision}, ${t.scale});"
    +
    +      case t: DecimalType =>
    +        s"$arrayWriter.write($index, $element, ${t.precision}, ${t.scale});"
    +
    +      case NullType => ""
    +
    +      case _ => s"$arrayWriter.write($index, $element);"
    +    }
    +
    +    s"""
    +      if ($input instanceof UnsafeArrayData) {
    +        $arrayWriterClass.directWrite($bufferHolder, (UnsafeArrayData) $input, $needHeader);
    +      } else {
    +        final int $numElements = $input.numElements();
    +        $arrayWriter.initialize($bufferHolder, $numElements, $needHeader, $fixedElementSize);
    +
    +        for (int $index = 0; $index < $numElements; $index++) {
    +          if ($input.isNullAt($index)) {
    +            $arrayWriter.setNullAt($index);
    +          } else {
    +            final $jt $element = ${ctx.getValue(input, elementType, index)};
    +            $writeElement
    +          }
    +        }
    +      }
    +    """
    +  }
    +
    +  // todo: if the nullability of value element is correct, we can use it to save null check.
    +  private def writeMapToBuffer(
    +      ctx: CodeGenContext,
    +      input: String,
    +      keyType: DataType,
    +      valueType: DataType,
    +      bufferHolder: String): String = {
    +    val keys = ctx.freshName("keys")
    +    val values = ctx.freshName("values")
    +    val tmpCursor = ctx.freshName("tmpCursor")
    +
    +    s"""
    +      final ArrayData $keys = $input.keyArray();
    +      final ArrayData $values = $input.valueArray();
    +
    +      // Write numElements into the first 4 bytes.
    +      Platform.putInt($bufferHolder.buffer, $bufferHolder.cursor, $keys.numElements());
    +
    +      $bufferHolder.cursor += 8;
    +      // Remember the current cursor so that we can write the numBytes of the key array later.
    +      final int $tmpCursor = $bufferHolder.cursor;
    +
    +      ${writeArrayToBuffer(ctx, keys, keyType, bufferHolder, needHeader = false)}
    +      // Write the numBytes of the key array into the second 4 bytes.
    +      Platform.putInt($bufferHolder.buffer, $tmpCursor - 4, $bufferHolder.cursor - $tmpCursor);
    --- End diff --
    
    Never mind, I was wrong.
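    
    For readers following the diff above, the map header the two `putInt`
    calls produce looks like this (a sketch inferred from the quoted code;
    the helper names are hypothetical, and `offset` is the absolute
    Platform offset where the map bytes start):
    
        // Layout: [ numElements : 4 bytes ][ numBytes of key array : 4 bytes ]
        //         [ key array bytes ... ][ value array bytes ... ]
        static int numElements(byte[] buffer, long offset) {
          return Platform.getInt(buffer, offset);
        }
        static int keyArrayNumBytes(byte[] buffer, long offset) {
          return Platform.getInt(buffer, offset + 4);
        }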




[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/8747#issuecomment-145603625
  
    LGTM, left some minor comments about coding style.




[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8747#issuecomment-145645883
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43243/
    Test PASSed.




[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8747#issuecomment-145088789
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43183/
    Test FAILed.




[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8747#discussion_r40958023
  
    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeRowWriter.java ---
    @@ -0,0 +1,218 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.expressions.codegen;
    +
    +import org.apache.spark.sql.catalyst.expressions.UnsafeRow;
    +import org.apache.spark.sql.types.Decimal;
    +import org.apache.spark.unsafe.Platform;
    +import org.apache.spark.unsafe.array.ByteArrayMethods;
    +import org.apache.spark.unsafe.bitset.BitSetMethods;
    +import org.apache.spark.unsafe.types.CalendarInterval;
    +import org.apache.spark.unsafe.types.UTF8String;
    +
    +/**
    + * A helper class to write data into the global row buffer using the `UnsafeRow` format,
    + * used by {@link org.apache.spark.sql.catalyst.expressions.codegen.GenerateUnsafeProjection}.
    + */
    +public class UnsafeRowWriter {
    +
    +  private GlobalBufferHolder holder;
    +  // The offset of the global buffer where we start to write this row.
    +  private int startingOffset;
    +  private int nullBitsSize;
    +
    +  public void initialize(GlobalBufferHolder holder, int numFields) {
    +    this.holder = holder;
    +    this.startingOffset = holder.cursor;
    +    this.nullBitsSize = UnsafeRow.calculateBitSetWidthInBytes(numFields);
    +
    +    // grow the global buffer to make sure it has enough space to write fixed-length data.
    +    final int fixedSize = nullBitsSize + 8 * numFields;
    +    holder.grow(fixedSize);
    +    holder.cursor += fixedSize;
    +
    +    // zero-out the null bits region
    +    for (int i = 0; i < nullBitsSize; i += 8) {
    +      Platform.putLong(holder.buffer, startingOffset + i, 0L);
    +    }
    +  }
    +
    +  private void zeroOutPaddingBytes(int numBytes) {
    +    if ((numBytes & 0x07) > 0) {
    +      Platform.putLong(holder.buffer, holder.cursor + ((numBytes >> 3) << 3), 0L);
    +    }
    +  }
    +
    +  public void setNullAt(int ordinal) {
    +    BitSetMethods.set(holder.buffer, startingOffset, ordinal);
    +    Platform.putLong(holder.buffer, getFieldOffset(ordinal), 0L);
    +  }
    +
    +  public long getFieldOffset(int ordinal) {
    +    return startingOffset + nullBitsSize + 8 * ordinal;
    +  }
    +
    +  public void setOffsetAndSize(int ordinal, long size) {
    +    setOffsetAndSize(ordinal, holder.cursor, size);
    +  }
    +
    +  public void setOffsetAndSize(int ordinal, long currentCursor, long size) {
    +    final long relativeOffset = currentCursor - startingOffset;
    +    final long fieldOffset = getFieldOffset(ordinal);
    +    final long offsetAndSize = (relativeOffset << 32) | size;
    +
    +    Platform.putLong(holder.buffer, fieldOffset, offsetAndSize);
    +  }
    +
    +  // Do word alignment for this row.
    +  // todo: remove this after we make unsafe array data word-aligned.
    +  public void alignWords(int numBytes) {
    +    final int remainder = numBytes & 0x07;
    +
    +    if (remainder > 0) {
    +      final int paddingBytes = 8 - remainder;
    +      holder.grow(paddingBytes);
    +
    +      final byte[] originalValues = new byte[remainder];
    --- End diff --
    
    This could be re-used.
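    
    A hedged sketch of that reuse: since `remainder` is at most 7 here, an
    8-byte scratch array allocated once per writer can replace the per-call
    allocation (the field name is hypothetical):
    
        // Allocated once and reused by every alignWords() call.
        private final byte[] paddingScratch = new byte[8];
    
        // Inside alignWords(), replacing `new byte[remainder]`:
        Platform.copyMemory(
          holder.buffer, holder.cursor - remainder,
          paddingScratch, Platform.BYTE_ARRAY_OFFSET, remainder);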




[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8747#issuecomment-145084677
  
     Merged build triggered.




[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8747#issuecomment-140119151
  
      [Test build #42426 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42426/consoleFull) for   PR 8747 at commit [`858f198`](https://github.com/apache/spark/commit/858f198d9bfe6f7b5a2750463ae0f5383d2a7f1c).




[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8747#issuecomment-143128736
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43008/
    Test FAILed.




[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8747#issuecomment-140118156
  
     Merged build triggered.




[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8747#issuecomment-145098687
  
      [Test build #43186 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43186/consoleFull) for   PR 8747 at commit [`8d5725b`](https://github.com/apache/spark/commit/8d5725bd3db9be82b19d51b6899bced88482df67).




[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8747#issuecomment-145092217
  
    Merged build started.




[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8747#issuecomment-143128733
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the pull request:

    https://github.com/apache/spark/pull/8747#issuecomment-145096291
  
    retest this please.




[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8747#issuecomment-145610401
  
      [Test build #43243 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43243/consoleFull) for   PR 8747 at commit [`d7f941d`](https://github.com/apache/spark/commit/d7f941d4edc6e3165790f2546fc3e7f378f04250).




[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8747#issuecomment-145115477
  
     Merged build triggered.




[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8747#issuecomment-145158369
  
     Merged build triggered.




[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by tyro89 <gi...@git.apache.org>.
Github user tyro89 commented on the pull request:

    https://github.com/apache/spark/pull/8747#issuecomment-161338170
  
    I believe there might be an issue here with respect to the buffer growing strategy: https://issues.apache.org/jira/browse/SPARK-12089
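    
    For context, that ticket is about how the holder's byte array grows. A
    hedged, illustrative sketch of an overflow-aware growing strategy (not
    the actual fix; it assumes the `buffer` and `cursor` fields of the
    GlobalBufferHolder shown earlier in this thread):
    
        public void grow(int neededSize) {
          final long totalSize = cursor + (long) neededSize;
          if (buffer.length < totalSize) {
            // Double the required size, but cap it below the VM's maximum
            // array length instead of letting int arithmetic overflow into
            // a negative array size.
            final long capped = Math.min(totalSize * 2, Integer.MAX_VALUE - 8);
            if (capped < totalSize) {
              throw new IllegalStateException(
                "Cannot grow buffer to " + totalSize + " bytes");
            }
            final byte[] tmp = new byte[(int) capped];
            Platform.copyMemory(
              buffer, Platform.BYTE_ARRAY_OFFSET,
              tmp, Platform.BYTE_ARRAY_OFFSET, cursor);
            buffer = tmp;
          }
        }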




[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8747#issuecomment-145609517
  
     Merged build triggered.




[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8747#issuecomment-145092195
  
     Merged build triggered.




[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8747#issuecomment-145088785
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/8747




[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the pull request:

    https://github.com/apache/spark/pull/8747#issuecomment-145114704
  
    retest this please.




[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8747#issuecomment-143138322
  
     Merged build triggered.

