Posted to reviews@spark.apache.org by ueshin <gi...@git.apache.org> on 2017/07/19 10:52:46 UTC

[GitHub] spark pull request #18680: [SPARK-21472][SQL] Introduce ArrowColumnVector as...

GitHub user ueshin opened a pull request:

    https://github.com/apache/spark/pull/18680

    [SPARK-21472][SQL] Introduce ArrowColumnVector as a reader for Arrow vectors.

    ## What changes were proposed in this pull request?
    
    Introducing `ArrowColumnVector` as a reader for Arrow vectors.
    It extends `ColumnVector`, so we will be able to use it with `ColumnarBatch` and its functionalities.
    Currently it supports primitive types and `StringType`, `ArrayType` and `StructType`.
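    
    As an illustration only (a simplified sketch, not the code in this patch), the read-only contract can be shown without Arrow. `ReadOnlyIntColumn` is a hypothetical class. It reads values through an Arrow-style validity bitmap, where a set bit marks a non-null slot and bits are numbered least-significant first. It rejects writes with `UnsupportedOperationException`, as the `put*` methods in the patch do.
    
    ```java
    import java.util.Arrays;
    
    /**
     * Hypothetical, simplified read-only column: an Arrow-style validity
     * bitmap (bit set = value present, LSB-first) plus a values array.
     */
    final class ReadOnlyIntColumn {
      private final byte[] validity;  // one bit per row, least-significant bit first
      private final int[] values;     // slot contents are meaningless for null rows
    
      ReadOnlyIntColumn(byte[] validity, int[] values) {
        this.validity = validity;
        this.values = values;
      }
    
      boolean isNullAt(int rowId) {
        // A 1 bit marks a valid (non-null) slot; a 0 bit marks a null.
        return ((validity[rowId >> 3] >> (rowId & 7)) & 1) == 0;
      }
    
      int getInt(int rowId) {
        return values[rowId];
      }
    
      int[] getInts(int rowId, int count) {
        return Arrays.copyOfRange(values, rowId, rowId + count);
      }
    
      // Writes are rejected: the buffers belong to whoever produced them.
      void putInt(int rowId, int value) {
        throw new UnsupportedOperationException();
      }
    }
    ```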
    
    ## How was this patch tested?
    
    Added tests for `ArrowColumnVector` and existing tests.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ueshin/apache-spark issues/SPARK-21472

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18680.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18680
    
----
commit 73899b26d10ed3763569f4aa2e836643a5ce941a
Author: Takuya UESHIN <ue...@databricks.com>
Date:   2017-07-19T05:50:22Z

    Introduce ArrowColumnVector as a reader for Arrow vectors.

commit 689f86f761009d2220d9679102770a8763e55573
Author: Takuya UESHIN <ue...@databricks.com>
Date:   2017-07-19T06:22:24Z

    Import ArrowUtils and use it.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #18680: [SPARK-21472][SQL] Introduce ArrowColumnVector as...

Posted by kiszk <gi...@git.apache.org>.

Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18680#discussion_r128314448
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ReadOnlyColumnVector.java ---
    @@ -0,0 +1,250 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.vectorized;
    +
    +import org.apache.spark.memory.MemoryMode;
    +import org.apache.spark.sql.types.*;
    +
    +/**
    + * An abstract class for read-only column vector.
    + */
    +public abstract class ReadOnlyColumnVector extends ColumnVector {
    +
    +  protected ReadOnlyColumnVector(int capacity, MemoryMode memMode) {
    --- End diff --
    
    Is there any reason not to accept `dataType` as a constructor argument? It would be more flexible for future use.
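    
    A minimal sketch of the suggestion, using hypothetical names (`SimpleType`, `ReadOnlyColumnBase`, `IntArrayColumn`) in place of Spark's classes. The base constructor takes the data type, and each subclass supplies it from whatever backs the column. The later `ArrowColumnVector` constructor in this thread derives it with `ArrowUtils.fromArrowField(vector.getField())`.
    
    ```java
    /** Hypothetical type tag standing in for Spark's DataType. */
    enum SimpleType { INT, STRING }
    
    /** Read-only base class that receives its data type instead of hard-coding it. */
    abstract class ReadOnlyColumnBase {
      protected final int capacity;
      protected final SimpleType dataType;
    
      protected ReadOnlyColumnBase(int capacity, SimpleType dataType) {
        this.capacity = capacity;
        this.dataType = dataType;
      }
    
      final SimpleType dataType() {
        return dataType;
      }
    
      // Mutators are rejected once here rather than in every subclass.
      final void putNull(int rowId) {
        throw new UnsupportedOperationException("read-only column");
      }
    }
    
    /** A subclass passes the type that matches its backing storage (here a Java array). */
    final class IntArrayColumn extends ReadOnlyColumnBase {
      private final int[] data;
    
      IntArrayColumn(int[] data) {
        super(data.length, SimpleType.INT);
        this.data = data;
      }
    
      int getInt(int rowId) {
        return data[rowId];
      }
    }
    ```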


[GitHub] spark pull request #18680: [SPARK-21472][SQL] Introduce ArrowColumnVector as...

Posted by kiszk <gi...@git.apache.org>.

Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18680#discussion_r128216750
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ArrowColumnVector.java ---
    @@ -0,0 +1,510 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.vectorized;
    +
    +import org.apache.arrow.vector.*;
    +import org.apache.arrow.vector.complex.*;
    +import org.apache.arrow.vector.holders.NullableVarCharHolder;
    +
    +import org.apache.spark.memory.MemoryMode;
    +import org.apache.spark.sql.execution.arrow.ArrowUtils;
    +import org.apache.spark.sql.types.*;
    +import org.apache.spark.unsafe.types.UTF8String;
    +
    +/**
    + * A column backed by Apache Arrow.
    + */
    +public final class ArrowColumnVector extends ColumnVector {
    +
    +  private ValueVector vector;
    +  private ValueVector.Accessor nulls;
    +
    +  private NullableBitVector boolData;
    +  private NullableTinyIntVector byteData;
    +  private NullableSmallIntVector shortData;
    +  private NullableIntVector intData;
    +  private NullableBigIntVector longData;
    +
    +  private NullableFloat4Vector floatData;
    +  private NullableFloat8Vector doubleData;
    +  private NullableDecimalVector decimalData;
    +
    +  private NullableVarCharVector stringData;
    +
    +  private NullableVarBinaryVector binaryData;
    +
    +  private UInt4Vector listOffsetData;
    +
    +  public ArrowColumnVector(ValueVector vector) {
    +    super(vector.getValueCapacity(), DataTypes.NullType, MemoryMode.OFF_HEAP);
    +    initialize(vector);
    +  }
    +
    +  @Override
    +  public long nullsNativeAddress() {
    +    throw new RuntimeException("Cannot get native address for arrow column");
    +  }
    +
    +  @Override
    +  public long valuesNativeAddress() {
    +    throw new RuntimeException("Cannot get native address for arrow column");
    +  }
    +
    +  @Override
    +  public void close() {
    +    if (childColumns != null) {
    +      for (int i = 0; i < childColumns.length; i++) {
    +        childColumns[i].close();
    +      }
    +    }
    +    vector.close();
    +  }
    +
    +  //
    +  // APIs dealing with nulls
    +  //
    +
    +  @Override
    +  public void putNotNull(int rowId) {
    +    throw new UnsupportedOperationException();
    +  }
    +
    +  @Override
    +  public void putNull(int rowId) {
    +    throw new UnsupportedOperationException();
    +  }
    +
    +  @Override
    +  public void putNulls(int rowId, int count) {
    +    throw new UnsupportedOperationException();
    +  }
    +
    +  @Override
    +  public void putNotNulls(int rowId, int count) {
    +    throw new UnsupportedOperationException();
    +  }
    +
    +  @Override
    +  public boolean isNullAt(int rowId) {
    +    return nulls.isNull(rowId);
    +  }
    +
    +  //
    +  // APIs dealing with Booleans
    +  //
    +
    +  @Override
    +  public void putBoolean(int rowId, boolean value) {
    +    throw new UnsupportedOperationException();
    +  }
    +
    +  @Override
    +  public void putBooleans(int rowId, int count, boolean value) {
    +    throw new UnsupportedOperationException();
    +  }
    +
    +  @Override
    +  public boolean getBoolean(int rowId) {
    +    return boolData.getAccessor().get(rowId) == 1;
    --- End diff --
    
    Can we use `nulls` here? Ditto for other places.
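    
    One way to address this is sketched below with hypothetical, Arrow-free types (`BoolData`, `LongData`, `Accessor`). The null lookup lives in a single base accessor, each typed subclass overrides only its own getter, and the subclass is chosen once when the column is built. The later revision in this thread follows a similar shape with `ArrowVectorAccessor`; the sketch only illustrates the pattern.
    
    ```java
    /** Hypothetical stand-ins for typed vectors: validity flags plus values. */
    final class BoolData {
      final boolean[] valid;
      final boolean[] values;
      BoolData(boolean[] valid, boolean[] values) { this.valid = valid; this.values = values; }
    }
    
    final class LongData {
      final boolean[] valid;
      final long[] values;
      LongData(boolean[] valid, long[] values) { this.valid = valid; this.values = values; }
    }
    
    /** Null handling is implemented once; typed getters default to "unsupported". */
    abstract class Accessor {
      private final boolean[] valid;
    
      Accessor(boolean[] valid) { this.valid = valid; }
    
      final boolean isNullAt(int rowId) { return !valid[rowId]; }
    
      final int getNullCount() {
        int n = 0;
        for (boolean v : valid) if (!v) n++;
        return n;
      }
    
      boolean getBoolean(int rowId) { throw new UnsupportedOperationException(); }
      long getLong(int rowId) { throw new UnsupportedOperationException(); }
    
      /** Dispatch on the vector type once, instead of on every read. */
      static Accessor of(Object data) {
        if (data instanceof BoolData) {
          BoolData d = (BoolData) data;
          return new Accessor(d.valid) {
            @Override boolean getBoolean(int rowId) { return d.values[rowId]; }
          };
        } else if (data instanceof LongData) {
          LongData d = (LongData) data;
          return new Accessor(d.valid) {
            @Override long getLong(int rowId) { return d.values[rowId]; }
          };
        }
        throw new UnsupportedOperationException("unsupported vector: " + data);
      }
    }
    ```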


[GitHub] spark pull request #18680: [SPARK-21472][SQL] Introduce ArrowColumnVector as...

Posted by ueshin <gi...@git.apache.org>.

Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18680#discussion_r128477508
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ArrowColumnVector.java ---
    @@ -0,0 +1,545 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.vectorized;
    +
    +import org.apache.arrow.vector.*;
    +import org.apache.arrow.vector.complex.*;
    +import org.apache.arrow.vector.holders.NullableVarCharHolder;
    +
    +import org.apache.spark.memory.MemoryMode;
    +import org.apache.spark.sql.execution.arrow.ArrowUtils;
    +import org.apache.spark.sql.types.*;
    +import org.apache.spark.unsafe.types.UTF8String;
    +
    +/**
    + * A column backed by Apache Arrow.
    + */
    +public final class ArrowColumnVector extends ReadOnlyColumnVector {
    +
    +  private final ArrowVectorAccessor accessor;
    +
    +  @Override
    +  public long nullsNativeAddress() {
    +    throw new RuntimeException("Cannot get native address for arrow column");
    +  }
    +
    +  @Override
    +  public long valuesNativeAddress() {
    +    throw new RuntimeException("Cannot get native address for arrow column");
    +  }
    +
    +  @Override
    +  public void close() {
    +    if (childColumns != null) {
    +      for (int i = 0; i < childColumns.length; i++) {
    +        childColumns[i].close();
    +      }
    +    }
    +    accessor.close();
    +  }
    +
    +  //
    +  // APIs dealing with nulls
    +  //
    +
    +  @Override
    +  public boolean isNullAt(int rowId) {
    +    return accessor.isNullAt(rowId);
    +  }
    +
    +  //
    +  // APIs dealing with Booleans
    +  //
    +
    +  @Override
    +  public boolean getBoolean(int rowId) {
    +    return accessor.getBoolean(rowId);
    +  }
    +
    +  @Override
    +  public boolean[] getBooleans(int rowId, int count) {
    +    boolean[] array = new boolean[count];
    +    for (int i = 0; i < count; ++i) {
    +      array[i] = accessor.getBoolean(rowId + i);
    +    }
    +    return array;
    +  }
    +
    +  //
    +  // APIs dealing with Bytes
    +  //
    +
    +  @Override
    +  public byte getByte(int rowId) {
    +    return accessor.getByte(rowId);
    +  }
    +
    +  @Override
    +  public byte[] getBytes(int rowId, int count) {
    +    byte[] array = new byte[count];
    +    for (int i = 0; i < count; ++i) {
    +      array[i] = accessor.getByte(rowId + i);
    +    }
    +    return array;
    +  }
    +
    +  //
    +  // APIs dealing with Shorts
    +  //
    +
    +  @Override
    +  public short getShort(int rowId) {
    +    return accessor.getShort(rowId);
    +  }
    +
    +  @Override
    +  public short[] getShorts(int rowId, int count) {
    +    short[] array = new short[count];
    +    for (int i = 0; i < count; ++i) {
    +      array[i] = accessor.getShort(rowId + i);
    +    }
    +    return array;
    +  }
    +
    +  //
    +  // APIs dealing with Ints
    +  //
    +
    +  @Override
    +  public int getInt(int rowId) {
    +    return accessor.getInt(rowId);
    +  }
    +
    +  @Override
    +  public int[] getInts(int rowId, int count) {
    +    int[] array = new int[count];
    +    for (int i = 0; i < count; ++i) {
    +      array[i] = accessor.getInt(rowId + i);
    +    }
    +    return array;
    +  }
    +
    +  @Override
    +  public int getDictId(int rowId) {
    +    throw new UnsupportedOperationException();
    +  }
    +
    +  //
    +  // APIs dealing with Longs
    +  //
    +
    +  @Override
    +  public long getLong(int rowId) {
    +    return accessor.getLong(rowId);
    +  }
    +
    +  @Override
    +  public long[] getLongs(int rowId, int count) {
    +    long[] array = new long[count];
    +    for (int i = 0; i < count; ++i) {
    +      array[i] = accessor.getLong(rowId + i);
    +    }
    +    return array;
    +  }
    +
    +  //
    +  // APIs dealing with floats
    +  //
    +
    +  @Override
    +  public float getFloat(int rowId) {
    +    return accessor.getFloat(rowId);
    +  }
    +
    +  @Override
    +  public float[] getFloats(int rowId, int count) {
    +    float[] array = new float[count];
    +    for (int i = 0; i < count; ++i) {
    +      array[i] = accessor.getFloat(rowId + i);
    +    }
    +    return array;
    +  }
    +
    +  //
    +  // APIs dealing with doubles
    +  //
    +
    +  @Override
    +  public double getDouble(int rowId) {
    +    return accessor.getDouble(rowId);
    +  }
    +
    +  @Override
    +  public double[] getDoubles(int rowId, int count) {
    +    double[] array = new double[count];
    +    for (int i = 0; i < count; ++i) {
    +      array[i] = accessor.getDouble(rowId + i);
    +    }
    +    return array;
    +  }
    +
    +  //
    +  // APIs dealing with Arrays
    +  //
    +
    +  @Override
    +  public int getArrayLength(int rowId) {
    +    return accessor.getArrayLength(rowId);
    +  }
    +
    +  @Override
    +  public int getArrayOffset(int rowId) {
    +    return accessor.getArrayOffset(rowId);
    +  }
    +
    +  @Override
    +  public void loadBytes(Array array) {
    +    throw new UnsupportedOperationException();
    +  }
    +
    +  //
    +  // APIs dealing with Decimals
    +  //
    +
    +  @Override
    +  public Decimal getDecimal(int rowId, int precision, int scale) {
    +    return accessor.getDecimal(rowId, precision, scale);
    +  }
    +
    +  //
    +  // APIs dealing with UTF8Strings
    +  //
    +
    +  @Override
    +  public UTF8String getUTF8String(int rowId) {
    +    return accessor.getUTF8String(rowId);
    +  }
    +
    +  //
    +  // APIs dealing with Binaries
    +  //
    +
    +  @Override
    +  public byte[] getBinary(int rowId) {
    +    return accessor.getBinary(rowId);
    +  }
    +
    +  public ArrowColumnVector(ValueVector vector) {
    +    super(vector.getValueCapacity(), ArrowUtils.fromArrowField(vector.getField()),
    +      MemoryMode.OFF_HEAP);
    +
    +    if (vector instanceof NullableBitVector) {
    +      accessor = new BooleanAccessor((NullableBitVector) vector);
    +    } else if (vector instanceof NullableTinyIntVector) {
    +      accessor = new ByteAccessor((NullableTinyIntVector) vector);
    +    } else if (vector instanceof NullableSmallIntVector) {
    +      accessor = new ShortAccessor((NullableSmallIntVector) vector);
    +    } else if (vector instanceof NullableIntVector) {
    +      accessor = new IntAccessor((NullableIntVector) vector);
    +    } else if (vector instanceof NullableBigIntVector) {
    +      accessor = new LongAccessor((NullableBigIntVector) vector);
    +    } else if (vector instanceof NullableFloat4Vector) {
    +      accessor = new FloatAccessor((NullableFloat4Vector) vector);
    +    } else if (vector instanceof NullableFloat8Vector) {
    +      accessor = new DoubleAccessor((NullableFloat8Vector) vector);
    +    } else if (vector instanceof NullableDecimalVector) {
    +      accessor = new DecimalAccessor((NullableDecimalVector) vector);
    +    } else if (vector instanceof NullableVarCharVector) {
    +      accessor = new StringAccessor((NullableVarCharVector) vector);
    +    } else if (vector instanceof NullableVarBinaryVector) {
    +      accessor = new BinaryAccessor((NullableVarBinaryVector) vector);
    +    } else if (vector instanceof ListVector) {
    +      ListVector listVector = (ListVector) vector;
    +      accessor = new ArrayAccessor(listVector);
    +
    +      childColumns = new ColumnVector[1];
    +      childColumns[0] = new ArrowColumnVector(listVector.getDataVector());
    +      resultArray = new Array(childColumns[0]);
    +    } else if (vector instanceof MapVector) {
    +      MapVector mapVector = (MapVector) vector;
    +      accessor = new StructAccessor(mapVector);
    +
    +      childColumns = new ArrowColumnVector[mapVector.size()];
    +      for (int i = 0; i < childColumns.length; ++i) {
    +        childColumns[i] = new ArrowColumnVector(mapVector.getVectorById(i));
    +      }
    +      resultStruct = new ColumnarBatch.Row(childColumns);
    +    } else {
    +      throw new UnsupportedOperationException();
    +    }
    +    numNulls = accessor.getNullCount();
    +    anyNullsSet = numNulls > 0;
    +  }
    +
    +  private static abstract class ArrowVectorAccessor {
    +
    +    private final ValueVector vector;
    +    private final ValueVector.Accessor nulls;
    +
    +    ArrowVectorAccessor(ValueVector vector) {
    +      this.vector = vector;
    +      this.nulls = vector.getAccessor();
    +    }
    +
    +    final boolean isNullAt(int rowId) {
    +      return nulls.isNull(rowId);
    +    }
    +
    +    final int getNullCount() {
    +      return nulls.getNullCount();
    +    }
    +
    +    final void close() {
    +      vector.close();
    +    }
    +
    +    boolean getBoolean(int rowId) {
    +      throw new UnsupportedOperationException();
    +    }
    +
    +    byte getByte(int rowId) {
    +      throw new UnsupportedOperationException();
    +    }
    +
    +    short getShort(int rowId) {
    +      throw new UnsupportedOperationException();
    +    }
    +
    +    int getInt(int rowId) {
    +      throw new UnsupportedOperationException();
    +    }
    +
    +    long getLong(int rowId) {
    +      throw new UnsupportedOperationException();
    +    }
    +
    +    float getFloat(int rowId) {
    +      throw new UnsupportedOperationException();
    +    }
    +
    +    double getDouble(int rowId) {
    +      throw new UnsupportedOperationException();
    +    }
    +
    +    Decimal getDecimal(int rowId, int precision, int scale) {
    +      throw new UnsupportedOperationException();
    +    }
    +
    +    UTF8String getUTF8String(int rowId) {
    +      throw new UnsupportedOperationException();
    +    }
    +
    +    byte[] getBinary(int rowId) {
    +      throw new UnsupportedOperationException();
    +    }
    +
    +    int getArrayLength(int rowId) {
    +      throw new UnsupportedOperationException();
    +    }
    +
    +    int getArrayOffset(int rowId) {
    +      throw new UnsupportedOperationException();
    +    }
    +  }
    +
    +  private static class BooleanAccessor extends ArrowVectorAccessor {
    +
    +    private final NullableBitVector.Accessor accessor;
    +
    +    BooleanAccessor(NullableBitVector vector) {
    +      super(vector);
    +      this.accessor = vector.getAccessor();
    +    }
    +
    +    @Override
    +    final boolean getBoolean(int rowId) {
    +      return accessor.get(rowId) == 1;
    +    }
    +  }
    +
    +  private static class ByteAccessor extends ArrowVectorAccessor {
    +
    +    private final NullableTinyIntVector.Accessor accessor;
    +
    +    ByteAccessor(NullableTinyIntVector vector) {
    +      super(vector);
    +      this.accessor = vector.getAccessor();
    +    }
    +
    +    @Override
    +    final byte getByte(int rowId) {
    +      return accessor.get(rowId);
    +    }
    +  }
    +
    +  private static class ShortAccessor extends ArrowVectorAccessor {
    +
    +    private final NullableSmallIntVector.Accessor accessor;
    +
    +    ShortAccessor(NullableSmallIntVector vector) {
    +      super(vector);
    +      this.accessor = vector.getAccessor();
    +    }
    +
    +    @Override
    +    final short getShort(int rowId) {
    +      return accessor.get(rowId);
    +    }
    +  }
    +
    +  private static class IntAccessor extends ArrowVectorAccessor {
    +
    +    private final NullableIntVector.Accessor accessor;
    +
    +    IntAccessor(NullableIntVector vector) {
    +      super(vector);
    +      this.accessor = vector.getAccessor();
    +    }
    +
    +    @Override
    +    final int getInt(int rowId) {
    +      return accessor.get(rowId);
    +    }
    +  }
    +
    +  private static class LongAccessor extends ArrowVectorAccessor {
    +
    +    private final NullableBigIntVector.Accessor accessor;
    +
    +    LongAccessor(NullableBigIntVector vector) {
    +      super(vector);
    +      this.accessor = vector.getAccessor();
    +    }
    +
    +    @Override
    +    final long getLong(int rowId) {
    +      return accessor.get(rowId);
    +    }
    +  }
    +
    +  private static class FloatAccessor extends ArrowVectorAccessor {
    +
    +    private final NullableFloat4Vector.Accessor accessor;
    +
    +    FloatAccessor(NullableFloat4Vector vector) {
    +      super(vector);
    +      this.accessor = vector.getAccessor();
    +    }
    +
    +    @Override
    +    final float getFloat(int rowId) {
    +      return accessor.get(rowId);
    +    }
    +  }
    +
    +  private static class DoubleAccessor extends ArrowVectorAccessor {
    +
    +    private final NullableFloat8Vector.Accessor accessor;
    +
    +    DoubleAccessor(NullableFloat8Vector vector) {
    +      super(vector);
    +      this.accessor = vector.getAccessor();
    +    }
    +
    +    @Override
    +    final double getDouble(int rowId) {
    +      return accessor.get(rowId);
    +    }
    +  }
    +
    +  private static class DecimalAccessor extends ArrowVectorAccessor {
    +
    +    private final NullableDecimalVector.Accessor accessor;
    +
    +    DecimalAccessor(NullableDecimalVector vector) {
    +      super(vector);
    +      this.accessor = vector.getAccessor();
    +    }
    +
    +    @Override
    +    final Decimal getDecimal(int rowId, int precision, int scale) {
    +      if (isNullAt(rowId)) return null;
    +      return Decimal.apply(accessor.getObject(rowId), precision, scale);
    +    }
    +  }
    +
    +  private static class StringAccessor extends ArrowVectorAccessor {
    +
    +    private final NullableVarCharVector.Accessor accessor;
    +    private final NullableVarCharHolder stringResult = new NullableVarCharHolder();
    +
    +    StringAccessor(NullableVarCharVector vector) {
    +      super(vector);
    +      this.accessor = vector.getAccessor();
    +    }
    +
    +    @Override
    +    final UTF8String getUTF8String(int rowId) {
    +      accessor.get(rowId, stringResult);
    +      if (stringResult.isSet == 0) {
    +        return null;
    +      } else {
    +        return UTF8String.fromAddress(null,
    +          stringResult.buffer.memoryAddress() + stringResult.start,
    +          stringResult.end - stringResult.start);
    +      }
    +    }
    +  }
    +
    +  private static class BinaryAccessor extends ArrowVectorAccessor {
    +
    +    private final NullableVarBinaryVector.Accessor accessor;
    +
    +    BinaryAccessor(NullableVarBinaryVector vector) {
    +      super(vector);
    +      this.accessor = vector.getAccessor();
    +    }
    +
    +    @Override
    +    final byte[] getBinary(int rowId) {
    +      return accessor.getObject(rowId);
    +    }
    +  }
    +
    +  private static class ArrayAccessor extends ArrowVectorAccessor {
    +
    +    private final UInt4Vector.Accessor accessor;
    +
    +    ArrayAccessor(ListVector vector) {
    +      super(vector);
    +      this.accessor = vector.getOffsetVector().getAccessor();
    +    }
    +
    +    @Override
    +    final int getArrayLength(int rowId) {
    +      return accessor.get(rowId + 1) - accessor.get(rowId);
    --- End diff --
    
    Yes, the offset vector for `ListVector` should have (number of arrays + 1) values.
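    
    To make the offset arithmetic concrete, here is a small sketch with a hypothetical `ListColumnSketch` class over plain Java arrays (not Arrow's `UInt4Vector`). The lists `[1, 2]`, `[]`, `[3]` flatten to child values `1, 2, 3` with offsets `0, 2, 2, 3`: four offsets for three arrays, and each length is `offsets[i + 1] - offsets[i]`, matching the quoted `getArrayLength`.
    
    ```java
    import java.util.Arrays;
    
    /**
     * Reads a list column from an offset buffer. For n arrays the offsets
     * hold n + 1 entries; array i spans child positions [offsets[i], offsets[i + 1]).
     */
    final class ListColumnSketch {
      private final int[] offsets;  // length == number of arrays + 1
      private final int[] child;    // flattened element values
    
      ListColumnSketch(int[] offsets, int[] child) {
        if (offsets.length == 0 || offsets[offsets.length - 1] > child.length) {
          throw new IllegalArgumentException("offsets do not match child data");
        }
        this.offsets = offsets;
        this.child = child;
      }
    
      int numArrays() { return offsets.length - 1; }
    
      int getArrayOffset(int rowId) { return offsets[rowId]; }
    
      int getArrayLength(int rowId) { return offsets[rowId + 1] - offsets[rowId]; }
    
      int[] getArray(int rowId) {
        int start = getArrayOffset(rowId);
        return Arrays.copyOfRange(child, start, start + getArrayLength(rowId));
      }
    }
    ```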


[GitHub] spark issue #18680: [SPARK-21472][SQL] Introduce ArrowColumnVector as a read...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18680
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79752/


[GitHub] spark issue #18680: [SPARK-21472][SQL] Introduce ArrowColumnVector as a read...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18680
  
    **[Test build #79793 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79793/testReport)** for PR 18680 at commit [`2d1dad9`](https://github.com/apache/spark/commit/2d1dad9ac6bc2cfa4a4dcad32ef99464bc7f6541).


[GitHub] spark pull request #18680: [SPARK-21472][SQL] Introduce ArrowColumnVector as...

Posted by BryanCutler <gi...@git.apache.org>.

Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18680#discussion_r128316054
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowUtils.scala ---
    @@ -0,0 +1,109 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.arrow
    +
    +import scala.collection.JavaConverters._
    +
    +import org.apache.arrow.memory.RootAllocator
    +import org.apache.arrow.vector.types.FloatingPointPrecision
    +import org.apache.arrow.vector.types.pojo.{ArrowType, Field, FieldType, Schema}
    +
    +import org.apache.spark.sql.types._
    +
    +object ArrowUtils {
    --- End diff --
    
    Shouldn't this be `private[sql]`? Also in other places.


[GitHub] spark issue #18680: [SPARK-21472][SQL] Introduce ArrowColumnVector as a read...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18680
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #18680: [SPARK-21472][SQL] Introduce ArrowColumnVector as a read...

Posted by ueshin <gi...@git.apache.org>.

Github user ueshin commented on the issue:

    https://github.com/apache/spark/pull/18680
  
    @BryanCutler Thank you for reviewing!
    As for scope, yes, I'd like these APIs to be public. Do you have any concerns about it?


[GitHub] spark pull request #18680: [SPARK-21472][SQL] Introduce ArrowColumnVector as...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18680#discussion_r128449351
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ArrowColumnVector.java ---
    @@ -0,0 +1,545 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.vectorized;
    +
    +import org.apache.arrow.vector.*;
    +import org.apache.arrow.vector.complex.*;
    +import org.apache.arrow.vector.holders.NullableVarCharHolder;
    +
    +import org.apache.spark.memory.MemoryMode;
    +import org.apache.spark.sql.execution.arrow.ArrowUtils;
    +import org.apache.spark.sql.types.*;
    +import org.apache.spark.unsafe.types.UTF8String;
    +
    +/**
    + * A column backed by Apache Arrow.
    + */
    +public final class ArrowColumnVector extends ReadOnlyColumnVector {
    +
    +  private final ArrowVectorAccessor accessor;
    +
    +  @Override
    +  public long nullsNativeAddress() {
    +    throw new RuntimeException("Cannot get native address for arrow column");
    +  }
    +
    +  @Override
    +  public long valuesNativeAddress() {
    +    throw new RuntimeException("Cannot get native address for arrow column");
    +  }
    +
    +  @Override
    +  public void close() {
    +    if (childColumns != null) {
    +      for (int i = 0; i < childColumns.length; i++) {
    +        childColumns[i].close();
    +      }
    +    }
    +    accessor.close();
    +  }
    +
    +  //
    +  // APIs dealing with nulls
    +  //
    +
    +  @Override
    +  public boolean isNullAt(int rowId) {
    +    return accessor.isNullAt(rowId);
    +  }
    +
    +  //
    +  // APIs dealing with Booleans
    +  //
    +
    +  @Override
    +  public boolean getBoolean(int rowId) {
    +    return accessor.getBoolean(rowId);
    +  }
    +
    +  @Override
    +  public boolean[] getBooleans(int rowId, int count) {
    +    boolean[] array = new boolean[count];
    +    for (int i = 0; i < count; ++i) {
    +      array[i] = accessor.getBoolean(rowId + i);
    --- End diff --
    
    We don't need to address this now, but is there a better way to implement this with Arrow? cc @BryanCutler
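One possible direction, offered only as a hedged illustration: Arrow stores boolean values bit-packed (one bit per value, least-significant bit first within each byte), so a batch reader with direct access to the value buffer could load each byte once and decode up to eight booleans from it, instead of making one accessor call per row. The class `BitUnpackSketch` and its `byte[]` input are hypothetical; the sketch does not use Arrow's Java API and assumes the caller already holds the packed bytes.

```java
import java.util.Arrays;

// Hypothetical sketch, not Arrow's API: bulk-decodes booleans from a
// bit-packed buffer laid out like an Arrow boolean vector (one bit per
// value, least-significant bit first within each byte).
public final class BitUnpackSketch {

  public static boolean[] getBooleans(byte[] bits, int rowId, int count) {
    boolean[] out = new boolean[count];
    int i = 0;
    while (i < count) {
      int bit = rowId + i;
      int b = bits[bit >>> 3] & 0xFF;  // load each byte only once
      int shift = bit & 7;             // position of this row within the byte
      // Decode the rest of this byte, or fewer bits if count runs out first.
      int n = Math.min(8 - shift, count - i);
      for (int k = 0; k < n; ++k) {
        out[i + k] = ((b >>> (shift + k)) & 1) != 0;
      }
      i += n;
    }
    return out;
  }

  public static void main(String[] args) {
    // 0b00000101 sets bits 0 and 2; 0b00000001 sets bit 8.
    byte[] bits = {(byte) 0b00000101, (byte) 0b00000001};
    // Prints [true, false, true, false, false, false, false, false, true]
    System.out.println(Arrays.toString(getBooleans(bits, 0, 9)));
  }
}
```

The gain over the per-row loop would come from avoiding a virtual accessor call and a byte load per value; whether that matters in practice would need measuring.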


[GitHub] spark pull request #18680: [SPARK-21472][SQL] Introduce ArrowColumnVector as...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18680#discussion_r128451896
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ArrowColumnVector.java ---
    @@ -0,0 +1,545 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.vectorized;
    +
    +import org.apache.arrow.vector.*;
    +import org.apache.arrow.vector.complex.*;
    +import org.apache.arrow.vector.holders.NullableVarCharHolder;
    +
    +import org.apache.spark.memory.MemoryMode;
    +import org.apache.spark.sql.execution.arrow.ArrowUtils;
    +import org.apache.spark.sql.types.*;
    +import org.apache.spark.unsafe.types.UTF8String;
    +
    +/**
    + * A column backed by Apache Arrow.
    + */
    +public final class ArrowColumnVector extends ReadOnlyColumnVector {
    +
    +  private final ArrowVectorAccessor accessor;
    +
    +  @Override
    +  public long nullsNativeAddress() {
    +    throw new RuntimeException("Cannot get native address for arrow column");
    +  }
    +
    +  @Override
    +  public long valuesNativeAddress() {
    +    throw new RuntimeException("Cannot get native address for arrow column");
    +  }
    +
    +  @Override
    +  public void close() {
    +    if (childColumns != null) {
    +      for (int i = 0; i < childColumns.length; i++) {
    +        childColumns[i].close();
    +      }
    +    }
    +    accessor.close();
    +  }
    +
    +  //
    +  // APIs dealing with nulls
    +  //
    +
    +  @Override
    +  public boolean isNullAt(int rowId) {
    +    return accessor.isNullAt(rowId);
    +  }
    +
    +  //
    +  // APIs dealing with Booleans
    +  //
    +
    +  @Override
    +  public boolean getBoolean(int rowId) {
    +    return accessor.getBoolean(rowId);
    +  }
    +
    +  @Override
    +  public boolean[] getBooleans(int rowId, int count) {
    +    boolean[] array = new boolean[count];
    +    for (int i = 0; i < count; ++i) {
    +      array[i] = accessor.getBoolean(rowId + i);
    +    }
    +    return array;
    +  }
    +
    +  //
    +  // APIs dealing with Bytes
    +  //
    +
    +  @Override
    +  public byte getByte(int rowId) {
    +    return accessor.getByte(rowId);
    +  }
    +
    +  @Override
    +  public byte[] getBytes(int rowId, int count) {
    +    byte[] array = new byte[count];
    +    for (int i = 0; i < count; ++i) {
    +      array[i] = accessor.getByte(rowId + i);
    +    }
    +    return array;
    +  }
    +
    +  //
    +  // APIs dealing with Shorts
    +  //
    +
    +  @Override
    +  public short getShort(int rowId) {
    +    return accessor.getShort(rowId);
    +  }
    +
    +  @Override
    +  public short[] getShorts(int rowId, int count) {
    +    short[] array = new short[count];
    +    for (int i = 0; i < count; ++i) {
    +      array[i] = accessor.getShort(rowId + i);
    +    }
    +    return array;
    +  }
    +
    +  //
    +  // APIs dealing with Ints
    +  //
    +
    +  @Override
    +  public int getInt(int rowId) {
    +    return accessor.getInt(rowId);
    +  }
    +
    +  @Override
    +  public int[] getInts(int rowId, int count) {
    +    int[] array = new int[count];
    +    for (int i = 0; i < count; ++i) {
    +      array[i] = accessor.getInt(rowId + i);
    +    }
    +    return array;
    +  }
    +
    +  @Override
    +  public int getDictId(int rowId) {
    +    throw new UnsupportedOperationException();
    +  }
    +
    +  //
    +  // APIs dealing with Longs
    +  //
    +
    +  @Override
    +  public long getLong(int rowId) {
    +    return accessor.getLong(rowId);
    +  }
    +
    +  @Override
    +  public long[] getLongs(int rowId, int count) {
    +    long[] array = new long[count];
    +    for (int i = 0; i < count; ++i) {
    +      array[i] = accessor.getLong(rowId + i);
    +    }
    +    return array;
    +  }
    +
    +  //
    +  // APIs dealing with floats
    +  //
    +
    +  @Override
    +  public float getFloat(int rowId) {
    +    return accessor.getFloat(rowId);
    +  }
    +
    +  @Override
    +  public float[] getFloats(int rowId, int count) {
    +    float[] array = new float[count];
    +    for (int i = 0; i < count; ++i) {
    +      array[i] = accessor.getFloat(rowId + i);
    +    }
    +    return array;
    +  }
    +
    +  //
    +  // APIs dealing with doubles
    +  //
    +
    +  @Override
    +  public double getDouble(int rowId) {
    +    return accessor.getDouble(rowId);
    +  }
    +
    +  @Override
    +  public double[] getDoubles(int rowId, int count) {
    +    double[] array = new double[count];
    +    for (int i = 0; i < count; ++i) {
    +      array[i] = accessor.getDouble(rowId + i);
    +    }
    +    return array;
    +  }
    +
    +  //
    +  // APIs dealing with Arrays
    +  //
    +
    +  @Override
    +  public int getArrayLength(int rowId) {
    +    return accessor.getArrayLength(rowId);
    +  }
    +
    +  @Override
    +  public int getArrayOffset(int rowId) {
    +    return accessor.getArrayOffset(rowId);
    +  }
    +
    +  @Override
    +  public void loadBytes(Array array) {
    +    throw new UnsupportedOperationException();
    +  }
    +
    +  //
    +  // APIs dealing with Decimals
    +  //
    +
    +  @Override
    +  public Decimal getDecimal(int rowId, int precision, int scale) {
    +    return accessor.getDecimal(rowId, precision, scale);
    +  }
    +
    +  //
    +  // APIs dealing with UTF8Strings
    +  //
    +
    +  @Override
    +  public UTF8String getUTF8String(int rowId) {
    +    return accessor.getUTF8String(rowId);
    +  }
    +
    +  //
    +  // APIs dealing with Binaries
    +  //
    +
    +  @Override
    +  public byte[] getBinary(int rowId) {
    +    return accessor.getBinary(rowId);
    +  }
    +
    +  public ArrowColumnVector(ValueVector vector) {
    +    super(vector.getValueCapacity(), ArrowUtils.fromArrowField(vector.getField()),
    +      MemoryMode.OFF_HEAP);
    +
    +    if (vector instanceof NullableBitVector) {
    +      accessor = new BooleanAccessor((NullableBitVector) vector);
    +    } else if (vector instanceof NullableTinyIntVector) {
    +      accessor = new ByteAccessor((NullableTinyIntVector) vector);
    +    } else if (vector instanceof NullableSmallIntVector) {
    +      accessor = new ShortAccessor((NullableSmallIntVector) vector);
    +    } else if (vector instanceof NullableIntVector) {
    +      accessor = new IntAccessor((NullableIntVector) vector);
    +    } else if (vector instanceof NullableBigIntVector) {
    +      accessor = new LongAccessor((NullableBigIntVector) vector);
    +    } else if (vector instanceof NullableFloat4Vector) {
    +      accessor = new FloatAccessor((NullableFloat4Vector) vector);
    +    } else if (vector instanceof NullableFloat8Vector) {
    +      accessor = new DoubleAccessor((NullableFloat8Vector) vector);
    +    } else if (vector instanceof NullableDecimalVector) {
    +      accessor = new DecimalAccessor((NullableDecimalVector) vector);
    +    } else if (vector instanceof NullableVarCharVector) {
    +      accessor = new StringAccessor((NullableVarCharVector) vector);
    +    } else if (vector instanceof NullableVarBinaryVector) {
    +      accessor = new BinaryAccessor((NullableVarBinaryVector) vector);
    +    } else if (vector instanceof ListVector) {
    +      ListVector listVector = (ListVector) vector;
    +      accessor = new ArrayAccessor(listVector);
    +
    +      childColumns = new ColumnVector[1];
    +      childColumns[0] = new ArrowColumnVector(listVector.getDataVector());
    +      resultArray = new Array(childColumns[0]);
    +    } else if (vector instanceof MapVector) {
    +      MapVector mapVector = (MapVector) vector;
    +      accessor = new StructAccessor(mapVector);
    +
    +      childColumns = new ArrowColumnVector[mapVector.size()];
    +      for (int i = 0; i < childColumns.length; ++i) {
    +        childColumns[i] = new ArrowColumnVector(mapVector.getVectorById(i));
    +      }
    +      resultStruct = new ColumnarBatch.Row(childColumns);
    +    } else {
    +      throw new UnsupportedOperationException();
    +    }
    +    numNulls = accessor.getNullCount();
    +    anyNullsSet = numNulls > 0;
    +  }
    +
    +  private static abstract class ArrowVectorAccessor {
    +
    +    private final ValueVector vector;
    +    private final ValueVector.Accessor nulls;
    +
    +    ArrowVectorAccessor(ValueVector vector) {
    +      this.vector = vector;
    +      this.nulls = vector.getAccessor();
    +    }
    +
    +    final boolean isNullAt(int rowId) {
    +      return nulls.isNull(rowId);
    +    }
    +
    +    final int getNullCount() {
    +      return nulls.getNullCount();
    +    }
    +
    +    final void close() {
    +      vector.close();
    +    }
    +
    +    boolean getBoolean(int rowId) {
    +      throw new UnsupportedOperationException();
    +    }
    +
    +    byte getByte(int rowId) {
    +      throw new UnsupportedOperationException();
    +    }
    +
    +    short getShort(int rowId) {
    +      throw new UnsupportedOperationException();
    +    }
    +
    +    int getInt(int rowId) {
    +      throw new UnsupportedOperationException();
    +    }
    +
    +    long getLong(int rowId) {
    +      throw new UnsupportedOperationException();
    +    }
    +
    +    float getFloat(int rowId) {
    +      throw new UnsupportedOperationException();
    +    }
    +
    +    double getDouble(int rowId) {
    +      throw new UnsupportedOperationException();
    +    }
    +
    +    Decimal getDecimal(int rowId, int precision, int scale) {
    +      throw new UnsupportedOperationException();
    +    }
    +
    +    UTF8String getUTF8String(int rowId) {
    +      throw new UnsupportedOperationException();
    +    }
    +
    +    byte[] getBinary(int rowId) {
    +      throw new UnsupportedOperationException();
    +    }
    +
    +    int getArrayLength(int rowId) {
    +      throw new UnsupportedOperationException();
    +    }
    +
    +    int getArrayOffset(int rowId) {
    +      throw new UnsupportedOperationException();
    +    }
    +  }
    +
    +  private static class BooleanAccessor extends ArrowVectorAccessor {
    +
    +    private final NullableBitVector.Accessor accessor;
    +
    +    BooleanAccessor(NullableBitVector vector) {
    +      super(vector);
    +      this.accessor = vector.getAccessor();
    +    }
    +
    +    @Override
    +    final boolean getBoolean(int rowId) {
    +      return accessor.get(rowId) == 1;
    +    }
    +  }
    +
    +  private static class ByteAccessor extends ArrowVectorAccessor {
    +
    +    private final NullableTinyIntVector.Accessor accessor;
    +
    +    ByteAccessor(NullableTinyIntVector vector) {
    +      super(vector);
    +      this.accessor = vector.getAccessor();
    +    }
    +
    +    @Override
    +    final byte getByte(int rowId) {
    +      return accessor.get(rowId);
    +    }
    +  }
    +
    +  private static class ShortAccessor extends ArrowVectorAccessor {
    +
    +    private final NullableSmallIntVector.Accessor accessor;
    +
    +    ShortAccessor(NullableSmallIntVector vector) {
    +      super(vector);
    +      this.accessor = vector.getAccessor();
    +    }
    +
    +    @Override
    +    final short getShort(int rowId) {
    +      return accessor.get(rowId);
    +    }
    +  }
    +
    +  private static class IntAccessor extends ArrowVectorAccessor {
    +
    +    private final NullableIntVector.Accessor accessor;
    +
    +    IntAccessor(NullableIntVector vector) {
    +      super(vector);
    +      this.accessor = vector.getAccessor();
    +    }
    +
    +    @Override
    +    final int getInt(int rowId) {
    +      return accessor.get(rowId);
    +    }
    +  }
    +
    +  private static class LongAccessor extends ArrowVectorAccessor {
    +
    +    private final NullableBigIntVector.Accessor accessor;
    +
    +    LongAccessor(NullableBigIntVector vector) {
    +      super(vector);
    +      this.accessor = vector.getAccessor();
    +    }
    +
    +    @Override
    +    final long getLong(int rowId) {
    +      return accessor.get(rowId);
    +    }
    +  }
    +
    +  private static class FloatAccessor extends ArrowVectorAccessor {
    +
    +    private final NullableFloat4Vector.Accessor accessor;
    +
    +    FloatAccessor(NullableFloat4Vector vector) {
    +      super(vector);
    +      this.accessor = vector.getAccessor();
    +    }
    +
    +    @Override
    +    final float getFloat(int rowId) {
    +      return accessor.get(rowId);
    +    }
    +  }
    +
    +  private static class DoubleAccessor extends ArrowVectorAccessor {
    +
    +    private final NullableFloat8Vector.Accessor accessor;
    +
    +    DoubleAccessor(NullableFloat8Vector vector) {
    +      super(vector);
    +      this.accessor = vector.getAccessor();
    +    }
    +
    +    @Override
    +    final double getDouble(int rowId) {
    +      return accessor.get(rowId);
    +    }
    +  }
    +
    +  private static class DecimalAccessor extends ArrowVectorAccessor {
    +
    +    private final NullableDecimalVector.Accessor accessor;
    +
    +    DecimalAccessor(NullableDecimalVector vector) {
    +      super(vector);
    +      this.accessor = vector.getAccessor();
    +    }
    +
    +    @Override
    +    final Decimal getDecimal(int rowId, int precision, int scale) {
    +      if (isNullAt(rowId)) return null;
    +      return Decimal.apply(accessor.getObject(rowId), precision, scale);
    +    }
    +  }
    +
    +  private static class StringAccessor extends ArrowVectorAccessor {
    +
    +    private final NullableVarCharVector.Accessor accessor;
    +    private final NullableVarCharHolder stringResult = new NullableVarCharHolder();
    +
    +    StringAccessor(NullableVarCharVector vector) {
    +      super(vector);
    +      this.accessor = vector.getAccessor();
    +    }
    +
    +    @Override
    +    final UTF8String getUTF8String(int rowId) {
    +      accessor.get(rowId, stringResult);
    +      if (stringResult.isSet == 0) {
    +        return null;
    +      } else {
    +        return UTF8String.fromAddress(null,
    +          stringResult.buffer.memoryAddress() + stringResult.start,
    +          stringResult.end - stringResult.start);
    +      }
    +    }
    +  }
    +
    +  private static class BinaryAccessor extends ArrowVectorAccessor {
    +
    +    private final NullableVarBinaryVector.Accessor accessor;
    +
    +    BinaryAccessor(NullableVarBinaryVector vector) {
    +      super(vector);
    +      this.accessor = vector.getAccessor();
    +    }
    +
    +    @Override
    +    final byte[] getBinary(int rowId) {
    +      return accessor.getObject(rowId);
    +    }
    +  }
    +
    +  private static class ArrayAccessor extends ArrowVectorAccessor {
    +
    +    private final UInt4Vector.Accessor accessor;
    +
    +    ArrayAccessor(ListVector vector) {
    +      super(vector);
    +      this.accessor = vector.getOffsetVector().getAccessor();
    +    }
    +
    +    @Override
    +    final int getArrayLength(int rowId) {
    +      return accessor.get(rowId + 1) - accessor.get(rowId);
    --- End diff --
    
    If the given rowId is the last row, is it still valid to call `get(rowId + 1)`?
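For reference, the Arrow columnar format specifies that a list array with n rows has an offsets buffer of n + 1 entries, and row i spans `[offsets[i], offsets[i + 1])`. So `get(rowId + 1)` should stay in bounds for the last row, assuming the trailing offset has been written. The plain-Java sketch below models the offsets buffer as an `int[]` (an assumption; it does not use `UInt4Vector`, and `ListOffsetsSketch` is a hypothetical name) to show how the array offset and length fall out of it.

```java
// Hypothetical sketch, not Arrow's API: a list column with numRows rows
// keeps numRows + 1 offsets, so offsets[rowId + 1] exists even for the
// last row (rowId == numRows - 1).
public final class ListOffsetsSketch {

  public static int getArrayOffset(int[] offsets, int rowId) {
    return offsets[rowId];
  }

  public static int getArrayLength(int[] offsets, int rowId) {
    return offsets[rowId + 1] - offsets[rowId];
  }

  public static void main(String[] args) {
    // Three rows: [a, b], [], [c, d, e]  ->  offsets {0, 2, 2, 5}
    int[] offsets = {0, 2, 2, 5};
    int numRows = offsets.length - 1;
    for (int row = 0; row < numRows; ++row) {
      // Prints: row 0: offset=0 length=2 / row 1: offset=2 length=0 / row 2: offset=2 length=3
      System.out.println("row " + row + ": offset=" + getArrayOffset(offsets, row)
          + " length=" + getArrayLength(offsets, row));
    }
  }
}
```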


[GitHub] spark issue #18680: [SPARK-21472][SQL] Introduce ArrowColumnVector as a read...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18680
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79793/


[GitHub] spark issue #18680: [SPARK-21472][SQL] Introduce ArrowColumnVector as a read...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18680
  
    Merged build finished. Test PASSed.


[GitHub] spark pull request #18680: [SPARK-21472][SQL] Introduce ArrowColumnVector as...

Posted by BryanCutler <gi...@git.apache.org>.

Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18680#discussion_r128315305
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowUtils.scala ---
    @@ -0,0 +1,109 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.arrow
    +
    +import scala.collection.JavaConverters._
    +
    +import org.apache.arrow.memory.RootAllocator
    +import org.apache.arrow.vector.types.FloatingPointPrecision
    +import org.apache.arrow.vector.types.pojo.{ArrowType, Field, FieldType, Schema}
    +
    +import org.apache.spark.sql.types._
    +
    +object ArrowUtils {
    +
    +  val rootAllocator = new RootAllocator(Long.MaxValue)
    +
    +  // todo: support more types.
    +
    +  def toArrowType(dt: DataType): ArrowType = dt match {
    +    case BooleanType => ArrowType.Bool.INSTANCE
    +    case ByteType => new ArrowType.Int(8, true)
    +    case ShortType => new ArrowType.Int(8 * 2, true)
    +    case IntegerType => new ArrowType.Int(8 * 4, true)
    +    case LongType => new ArrowType.Int(8 * 8, true)
    +    case FloatType => new ArrowType.FloatingPoint(FloatingPointPrecision.SINGLE)
    +    case DoubleType => new ArrowType.FloatingPoint(FloatingPointPrecision.DOUBLE)
    +    case StringType => ArrowType.Utf8.INSTANCE
    +    case BinaryType => ArrowType.Binary.INSTANCE
    +    case DecimalType.Fixed(precision, scale) => new ArrowType.Decimal(precision, scale)
    +    case _ => throw new UnsupportedOperationException(s"Unsupported data type: ${dt.simpleString}")
    +  }
    +
    +  def fromArrowType(dt: ArrowType): DataType = dt match {
    +    case ArrowType.Bool.INSTANCE => BooleanType
    +    case int: ArrowType.Int if int.getIsSigned && int.getBitWidth == 8 => ByteType
    +    case int: ArrowType.Int if int.getIsSigned && int.getBitWidth == 8 * 2 => ShortType
    +    case int: ArrowType.Int if int.getIsSigned && int.getBitWidth == 8 * 4 => IntegerType
    +    case int: ArrowType.Int if int.getIsSigned && int.getBitWidth == 8 * 8 => LongType
    +    case float: ArrowType.FloatingPoint
    +      if float.getPrecision() == FloatingPointPrecision.SINGLE => FloatType
    +    case float: ArrowType.FloatingPoint
    +      if float.getPrecision() == FloatingPointPrecision.DOUBLE => DoubleType
    +    case ArrowType.Utf8.INSTANCE => StringType
    +    case ArrowType.Binary.INSTANCE => BinaryType
    +    case d: ArrowType.Decimal => DecimalType(d.getPrecision, d.getScale)
    +    case _ => throw new UnsupportedOperationException(s"Unsupported data type: $dt")
    +  }
    +
    +  def toArrowField(name: String, dt: DataType, nullable: Boolean): Field = {
    --- End diff --
    
    Is this only used for testing?


[GitHub] spark issue #18680: [SPARK-21472][SQL] Introduce ArrowColumnVector as a read...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18680
  
    **[Test build #79763 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79763/testReport)** for PR 18680 at commit [`ddfcf36`](https://github.com/apache/spark/commit/ddfcf3670c86c7d0498f2193df1525fc60662e40).
     * This patch **fails SparkR unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request #18680: [SPARK-21472][SQL] Introduce ArrowColumnVector as...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18680#discussion_r128448056
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ArrowColumnVector.java ---
    @@ -0,0 +1,545 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.vectorized;
    +
    +import org.apache.arrow.vector.*;
    +import org.apache.arrow.vector.complex.*;
    +import org.apache.arrow.vector.holders.NullableVarCharHolder;
    +
    +import org.apache.spark.memory.MemoryMode;
    +import org.apache.spark.sql.execution.arrow.ArrowUtils;
    +import org.apache.spark.sql.types.*;
    +import org.apache.spark.unsafe.types.UTF8String;
    +
    +/**
    + * A column backed by Apache Arrow.
    --- End diff --
    
    nit: `a column vector` instead of `a column`


[GitHub] spark pull request #18680: [SPARK-21472][SQL] Introduce ArrowColumnVector as...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18680#discussion_r128503537
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ArrowColumnVector.java ---
    @@ -0,0 +1,545 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.vectorized;
    +
    +import org.apache.arrow.vector.*;
    +import org.apache.arrow.vector.complex.*;
    +import org.apache.arrow.vector.holders.NullableVarCharHolder;
    +
    +import org.apache.spark.memory.MemoryMode;
    +import org.apache.spark.sql.execution.arrow.ArrowUtils;
    +import org.apache.spark.sql.types.*;
    +import org.apache.spark.unsafe.types.UTF8String;
    +
    +/**
    + * A column backed by Apache Arrow.
    + */
    +public final class ArrowColumnVector extends ReadOnlyColumnVector {
    +
    +  private final ArrowVectorAccessor accessor;
    +
    +  @Override
    +  public long nullsNativeAddress() {
    +    throw new RuntimeException("Cannot get native address for arrow column");
    +  }
    +
    +  @Override
    +  public long valuesNativeAddress() {
    +    throw new RuntimeException("Cannot get native address for arrow column");
    +  }
    +
    +  @Override
    +  public void close() {
    +    if (childColumns != null) {
    +      for (int i = 0; i < childColumns.length; i++) {
    +        childColumns[i].close();
    +      }
    +    }
    +    accessor.close();
    +  }
    +
    +  //
    +  // APIs dealing with nulls
    +  //
    +
    +  @Override
    +  public boolean isNullAt(int rowId) {
    +    return accessor.isNullAt(rowId);
    +  }
    +
    +  //
    +  // APIs dealing with Booleans
    +  //
    +
    +  @Override
    +  public boolean getBoolean(int rowId) {
    +    return accessor.getBoolean(rowId);
    +  }
    +
    +  @Override
    +  public boolean[] getBooleans(int rowId, int count) {
    +    boolean[] array = new boolean[count];
    +    for (int i = 0; i < count; ++i) {
    +      array[i] = accessor.getBoolean(rowId + i);
    --- End diff --
    
    I checked Arrow's API docs and didn't find a batch read API.
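Without a batch read API, each bulk getter in the diff above reduces to one single-element read per row. The following is a minimal, self-contained sketch of that fallback. `BulkReadSketch` and the `IntUnaryOperator` reader are hypothetical stand-ins for `ArrowColumnVector` and its per-row accessor, not Spark or Arrow APIs.

```java
import java.util.function.IntUnaryOperator;

// Hypothetical sketch: BulkReadSketch is not a Spark or Arrow class.
final class BulkReadSketch {
  private BulkReadSketch() {}

  // `reader` stands in for a per-row call such as accessor.getInt(rowId).
  // With no batch read on the underlying vector, a bulk getter can only
  // loop over single-element reads, as getInts() in the diff does.
  static int[] getInts(IntUnaryOperator reader, int rowId, int count) {
    int[] array = new int[count];
    for (int i = 0; i < count; ++i) {
      array[i] = reader.applyAsInt(rowId + i);
    }
    return array;
  }
}
```

For example, `BulkReadSketch.getInts(r -> r * 10, 2, 3)` reads rows 2, 3 and 4 and returns `{20, 30, 40}`.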



[GitHub] spark pull request #18680: [SPARK-21472][SQL] Introduce ArrowColumnVector as...

Posted by BryanCutler <gi...@git.apache.org>.

Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18680#discussion_r128317443
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ArrowColumnVector.java ---
    @@ -0,0 +1,545 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.vectorized;
    +
    +import org.apache.arrow.vector.*;
    +import org.apache.arrow.vector.complex.*;
    +import org.apache.arrow.vector.holders.NullableVarCharHolder;
    +
    +import org.apache.spark.memory.MemoryMode;
    +import org.apache.spark.sql.execution.arrow.ArrowUtils;
    +import org.apache.spark.sql.types.*;
    +import org.apache.spark.unsafe.types.UTF8String;
    +
    +/**
    + * A column backed by Apache Arrow.
    + */
    +public final class ArrowColumnVector extends ReadOnlyColumnVector {
    +
    +  private final ArrowVectorAccessor accessor;
    +
    +  @Override
    +  public long nullsNativeAddress() {
    +    throw new RuntimeException("Cannot get native address for arrow column");
    +  }
    +
    +  @Override
    +  public long valuesNativeAddress() {
    +    throw new RuntimeException("Cannot get native address for arrow column");
    +  }
    +
    +  @Override
    +  public void close() {
    +    if (childColumns != null) {
    +      for (int i = 0; i < childColumns.length; i++) {
    +        childColumns[i].close();
    +      }
    +    }
    +    accessor.close();
    +  }
    +
    +  //
    +  // APIs dealing with nulls
    +  //
    +
    +  @Override
    +  public boolean isNullAt(int rowId) {
    +    return accessor.isNullAt(rowId);
    +  }
    +
    +  //
    +  // APIs dealing with Booleans
    +  //
    +
    +  @Override
    +  public boolean getBoolean(int rowId) {
    +    return accessor.getBoolean(rowId);
    +  }
    +
    +  @Override
    +  public boolean[] getBooleans(int rowId, int count) {
    +    boolean[] array = new boolean[count];
    +    for (int i = 0; i < count; ++i) {
    +      array[i] = accessor.getBoolean(rowId + i);
    +    }
    +    return array;
    +  }
    +
    +  //
    +  // APIs dealing with Bytes
    +  //
    +
    +  @Override
    +  public byte getByte(int rowId) {
    +    return accessor.getByte(rowId);
    +  }
    +
    +  @Override
    +  public byte[] getBytes(int rowId, int count) {
    +    byte[] array = new byte[count];
    +    for (int i = 0; i < count; ++i) {
    +      array[i] = accessor.getByte(rowId + i);
    +    }
    +    return array;
    +  }
    +
    +  //
    +  // APIs dealing with Shorts
    +  //
    +
    +  @Override
    +  public short getShort(int rowId) {
    +    return accessor.getShort(rowId);
    +  }
    +
    +  @Override
    +  public short[] getShorts(int rowId, int count) {
    +    short[] array = new short[count];
    +    for (int i = 0; i < count; ++i) {
    +      array[i] = accessor.getShort(rowId + i);
    +    }
    +    return array;
    +  }
    +
    +  //
    +  // APIs dealing with Ints
    +  //
    +
    +  @Override
    +  public int getInt(int rowId) {
    +    return accessor.getInt(rowId);
    +  }
    +
    +  @Override
    +  public int[] getInts(int rowId, int count) {
    +    int[] array = new int[count];
    +    for (int i = 0; i < count; ++i) {
    +      array[i] = accessor.getInt(rowId + i);
    +    }
    +    return array;
    +  }
    +
    +  @Override
    +  public int getDictId(int rowId) {
    +    throw new UnsupportedOperationException();
    +  }
    +
    +  //
    +  // APIs dealing with Longs
    +  //
    +
    +  @Override
    +  public long getLong(int rowId) {
    +    return accessor.getLong(rowId);
    +  }
    +
    +  @Override
    +  public long[] getLongs(int rowId, int count) {
    +    long[] array = new long[count];
    +    for (int i = 0; i < count; ++i) {
    +      array[i] = accessor.getLong(rowId + i);
    +    }
    +    return array;
    +  }
    +
    +  //
    +  // APIs dealing with floats
    +  //
    +
    +  @Override
    +  public float getFloat(int rowId) {
    +    return accessor.getFloat(rowId);
    +  }
    +
    +  @Override
    +  public float[] getFloats(int rowId, int count) {
    +    float[] array = new float[count];
    +    for (int i = 0; i < count; ++i) {
    +      array[i] = accessor.getFloat(rowId + i);
    +    }
    +    return array;
    +  }
    +
    +  //
    +  // APIs dealing with doubles
    +  //
    +
    +  @Override
    +  public double getDouble(int rowId) {
    +    return accessor.getDouble(rowId);
    +  }
    +
    +  @Override
    +  public double[] getDoubles(int rowId, int count) {
    +    double[] array = new double[count];
    +    for (int i = 0; i < count; ++i) {
    +      array[i] = accessor.getDouble(rowId + i);
    +    }
    +    return array;
    +  }
    +
    +  //
    +  // APIs dealing with Arrays
    +  //
    +
    +  @Override
    +  public int getArrayLength(int rowId) {
    +    return accessor.getArrayLength(rowId);
    +  }
    +
    +  @Override
    +  public int getArrayOffset(int rowId) {
    +    return accessor.getArrayOffset(rowId);
    +  }
    +
    +  @Override
    +  public void loadBytes(Array array) {
    +    throw new UnsupportedOperationException();
    +  }
    +
    +  //
    +  // APIs dealing with Decimals
    +  //
    +
    +  @Override
    +  public Decimal getDecimal(int rowId, int precision, int scale) {
    +    return accessor.getDecimal(rowId, precision, scale);
    +  }
    +
    +  //
    +  // APIs dealing with UTF8Strings
    +  //
    +
    +  @Override
    +  public UTF8String getUTF8String(int rowId) {
    +    return accessor.getUTF8String(rowId);
    +  }
    +
    +  //
    +  // APIs dealing with Binaries
    +  //
    +
    +  @Override
    +  public byte[] getBinary(int rowId) {
    +    return accessor.getBinary(rowId);
    +  }
    +
    +  public ArrowColumnVector(ValueVector vector) {
    +    super(vector.getValueCapacity(), MemoryMode.OFF_HEAP);
    +
    +    type = ArrowUtils.fromArrowField(vector.getField());
    +    if (vector instanceof NullableBitVector) {
    +      accessor = new BooleanAccessor((NullableBitVector) vector);
    +    } else if (vector instanceof NullableTinyIntVector) {
    +      accessor = new ByteAccessor((NullableTinyIntVector) vector);
    +    } else if (vector instanceof NullableSmallIntVector) {
    +      accessor = new ShortAccessor((NullableSmallIntVector) vector);
    +    } else if (vector instanceof NullableIntVector) {
    +      accessor = new IntAccessor((NullableIntVector) vector);
    +    } else if (vector instanceof NullableBigIntVector) {
    +      accessor = new LongAccessor((NullableBigIntVector) vector);
    +    } else if (vector instanceof NullableFloat4Vector) {
    +      accessor = new FloatAccessor((NullableFloat4Vector) vector);
    +    } else if (vector instanceof NullableFloat8Vector) {
    +      accessor = new DoubleAccessor((NullableFloat8Vector) vector);
    +    } else if (vector instanceof NullableDecimalVector) {
    +      accessor = new DecimalAccessor((NullableDecimalVector) vector);
    +    } else if (vector instanceof NullableVarCharVector) {
    +      accessor = new StringAccessor((NullableVarCharVector) vector);
    +    } else if (vector instanceof NullableVarBinaryVector) {
    +      accessor = new BinaryAccessor((NullableVarBinaryVector) vector);
    +    } else if (vector instanceof ListVector) {
    +      ListVector listVector = (ListVector) vector;
    +      accessor = new ArrayAccessor(listVector);
    +
    +      childColumns = new ColumnVector[1];
    +      childColumns[0] = new ArrowColumnVector(listVector.getDataVector());
    +      resultArray = new Array(childColumns[0]);
    +    } else if (vector instanceof MapVector) {
    +      MapVector mapVector = (MapVector) vector;
    +      accessor = new StructAccessor(mapVector);
    +
    +      childColumns = new ArrowColumnVector[mapVector.size()];
    +      for (int i = 0; i < childColumns.length; ++i) {
    +        childColumns[i] = new ArrowColumnVector(mapVector.getVectorById(i));
    +      }
    +      resultStruct = new ColumnarBatch.Row(childColumns);
    +    } else {
    +      throw new UnsupportedOperationException();
    --- End diff --
    
    Can this whole "if else" block be put into a pattern match instead?
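Java 8 has no Scala-style `match` expression, so this Java file cannot use a literal pattern match. One hedged alternative to the `instanceof` chain is table-driven dispatch keyed by the vector's class. The sketch below uses hypothetical stand-in types (`BitVec`, `IntVec`, `Accessor`) in place of the real Arrow vectors and Spark's `ArrowVectorAccessor` subclasses.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: none of these types are Arrow or Spark classes.
final class AccessorDispatchSketch {
  private AccessorDispatchSketch() {}

  // Stand-ins for two concrete vector classes.
  static final class BitVec {
    final int[] bits;
    BitVec(int... bits) { this.bits = bits; }
  }

  static final class IntVec {
    final int[] values;
    IntVec(int... values) { this.values = values; }
  }

  // Stand-in for a per-type accessor such as ArrowVectorAccessor.
  interface Accessor {
    String describe(int rowId);
  }

  // One factory per supported vector class replaces one instanceof branch.
  private static final Map<Class<?>, Function<Object, Accessor>> FACTORIES = new HashMap<>();

  static {
    FACTORIES.put(BitVec.class, v -> {
      BitVec vec = (BitVec) v;
      return rowId -> "boolean:" + (vec.bits[rowId] == 1);
    });
    FACTORIES.put(IntVec.class, v -> {
      IntVec vec = (IntVec) v;
      return rowId -> "int:" + vec.values[rowId];
    });
  }

  // Exact-class lookup; unlike the instanceof chain, subclasses do not match.
  static Accessor create(Object vector) {
    Function<Object, Accessor> factory = FACTORIES.get(vector.getClass());
    if (factory == null) {
      throw new UnsupportedOperationException("Unsupported vector: " + vector.getClass());
    }
    return factory.apply(vector);
  }
}
```

The table keeps each cast next to its accessor, but it only matches exact classes. The `instanceof` chain also accepts subclasses. Whether the table reads better than the chain is a judgment call.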



[GitHub] spark pull request #18680: [SPARK-21472][SQL] Introduce ArrowColumnVector as...

Posted by BryanCutler <gi...@git.apache.org>.

Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18680#discussion_r128316524
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ArrowColumnVector.java ---
    @@ -0,0 +1,545 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.vectorized;
    +
    +import org.apache.arrow.vector.*;
    +import org.apache.arrow.vector.complex.*;
    +import org.apache.arrow.vector.holders.NullableVarCharHolder;
    +
    +import org.apache.spark.memory.MemoryMode;
    +import org.apache.spark.sql.execution.arrow.ArrowUtils;
    +import org.apache.spark.sql.types.*;
    +import org.apache.spark.unsafe.types.UTF8String;
    +
    +/**
    + * A column backed by Apache Arrow.
    + */
    +public final class ArrowColumnVector extends ReadOnlyColumnVector {
    --- End diff --
    
    Is this planned to be a public API right now?



[GitHub] spark issue #18680: [SPARK-21472][SQL] Introduce ArrowColumnVector as a read...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18680
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79763/
    Test FAILed.



[GitHub] spark issue #18680: [SPARK-21472][SQL] Introduce ArrowColumnVector as a read...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18680
  
    Merged build finished. Test PASSed.



[GitHub] spark issue #18680: [SPARK-21472][SQL] Introduce ArrowColumnVector as a read...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18680
  
    **[Test build #79752 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79752/testReport)** for PR 18680 at commit [`73899b2`](https://github.com/apache/spark/commit/73899b26d10ed3763569f4aa2e836643a5ce941a).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `public final class ArrowColumnVector extends ColumnVector `



[GitHub] spark pull request #18680: [SPARK-21472][SQL] Introduce ArrowColumnVector as...

Posted by kiszk <gi...@git.apache.org>.

Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18680#discussion_r128268438
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ArrowColumnVector.java ---
    @@ -0,0 +1,510 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.vectorized;
    +
    +import org.apache.arrow.vector.*;
    +import org.apache.arrow.vector.complex.*;
    +import org.apache.arrow.vector.holders.NullableVarCharHolder;
    +
    +import org.apache.spark.memory.MemoryMode;
    +import org.apache.spark.sql.execution.arrow.ArrowUtils;
    +import org.apache.spark.sql.types.*;
    +import org.apache.spark.unsafe.types.UTF8String;
    +
    +/**
    + * A column backed by Apache Arrow.
    + */
    +public final class ArrowColumnVector extends ColumnVector {
    +
    +  private ValueVector vector;
    +  private ValueVector.Accessor nulls;
    +
    +  private NullableBitVector boolData;
    +  private NullableTinyIntVector byteData;
    +  private NullableSmallIntVector shortData;
    +  private NullableIntVector intData;
    +  private NullableBigIntVector longData;
    +
    +  private NullableFloat4Vector floatData;
    +  private NullableFloat8Vector doubleData;
    +  private NullableDecimalVector decimalData;
    +
    +  private NullableVarCharVector stringData;
    +
    +  private NullableVarBinaryVector binaryData;
    +
    +  private UInt4Vector listOffsetData;
    +
    +  public ArrowColumnVector(ValueVector vector) {
    +    super(vector.getValueCapacity(), DataTypes.NullType, MemoryMode.OFF_HEAP);
    +    initialize(vector);
    +  }
    +
    +  @Override
    +  public long nullsNativeAddress() {
    +    throw new RuntimeException("Cannot get native address for arrow column");
    +  }
    +
    +  @Override
    +  public long valuesNativeAddress() {
    +    throw new RuntimeException("Cannot get native address for arrow column");
    +  }
    +
    +  @Override
    +  public void close() {
    +    if (childColumns != null) {
    +      for (int i = 0; i < childColumns.length; i++) {
    +        childColumns[i].close();
    +      }
    +    }
    +    vector.close();
    +  }
    +
    +  //
    +  // APIs dealing with nulls
    +  //
    +
    +  @Override
    +  public void putNotNull(int rowId) {
    +    throw new UnsupportedOperationException();
    +  }
    +
    +  @Override
    +  public void putNull(int rowId) {
    +    throw new UnsupportedOperationException();
    +  }
    +
    +  @Override
    +  public void putNulls(int rowId, int count) {
    +    throw new UnsupportedOperationException();
    +  }
    +
    +  @Override
    +  public void putNotNulls(int rowId, int count) {
    +    throw new UnsupportedOperationException();
    +  }
    +
    +  @Override
    +  public boolean isNullAt(int rowId) {
    +    return nulls.isNull(rowId);
    +  }
    +
    +  //
    +  // APIs dealing with Booleans
    +  //
    +
    +  @Override
    +  public void putBoolean(int rowId, boolean value) {
    +    throw new UnsupportedOperationException();
    +  }
    +
    +  @Override
    +  public void putBooleans(int rowId, int count, boolean value) {
    +    throw new UnsupportedOperationException();
    +  }
    +
    +  @Override
    +  public boolean getBoolean(int rowId) {
    +    return boolData.getAccessor().get(rowId) == 1;
    +  }
    +
    +  @Override
    +  public boolean[] getBooleans(int rowId, int count) {
    +    assert(dictionary == null);
    +    NullableBitVector.Accessor accessor = boolData.getAccessor();
    --- End diff --
    
    I see. Could we keep `NullableBitVector.Accessor` instead of `NullableBitVector`, while keeping the same reference in the two instance variables? I am more concerned about the cost of the runtime cast in the `getBoolean()` method than in the `getBooleans()` method.
    This is because I expect the `get()` method to be inlined by the JIT compiler, since each Accessor class is `final`.
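Here is a minimal sketch of the suggestion. It assumes a vector type whose `getAccessor()` returns a `final` accessor class. `FakeBitVector` and `BooleanColumnSketch` are hypothetical stand-ins for Arrow's `NullableBitVector` and `ArrowColumnVector`. The column looks up the accessor once and caches it in a `final` field, so `getBoolean()` does no per-row `getAccessor()` call. Whether the JIT actually inlines `get()` depends on the JVM and the call site. The sketch only removes the indirection at the source level.

```java
// Hypothetical stand-in for an Arrow vector with a final inner Accessor class.
final class FakeBitVector {
  private final byte[] bits;
  private final Accessor accessor = new Accessor();

  FakeBitVector(byte... bits) {
    this.bits = bits;
  }

  Accessor getAccessor() {
    return accessor;
  }

  // Declared final, so calls to get() are easy candidates for JIT inlining.
  final class Accessor {
    int get(int index) {
      return bits[index];
    }
  }
}

// Hypothetical column that keeps the accessor rather than the vector.
final class BooleanColumnSketch {
  // Looked up once at construction; per-row reads skip getAccessor().
  private final FakeBitVector.Accessor accessor;

  BooleanColumnSketch(FakeBitVector vector) {
    this.accessor = vector.getAccessor();
  }

  boolean getBoolean(int rowId) {
    return accessor.get(rowId) == 1;
  }

  boolean[] getBooleans(int rowId, int count) {
    boolean[] array = new boolean[count];
    for (int i = 0; i < count; ++i) {
      array[i] = getBoolean(rowId + i);
    }
    return array;
  }
}
```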



[GitHub] spark pull request #18680: [SPARK-21472][SQL] Introduce ArrowColumnVector as...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18680#discussion_r128449912
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ArrowColumnVector.java ---
    @@ -0,0 +1,545 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.vectorized;
    +
    +import org.apache.arrow.vector.*;
    +import org.apache.arrow.vector.complex.*;
    +import org.apache.arrow.vector.holders.NullableVarCharHolder;
    +
    +import org.apache.spark.memory.MemoryMode;
    +import org.apache.spark.sql.execution.arrow.ArrowUtils;
    +import org.apache.spark.sql.types.*;
    +import org.apache.spark.unsafe.types.UTF8String;
    +
    +/**
    + * A column backed by Apache Arrow.
    + */
    +public final class ArrowColumnVector extends ReadOnlyColumnVector {
    +
    +  private final ArrowVectorAccessor accessor;
    +
    +  @Override
    +  public long nullsNativeAddress() {
    +    throw new RuntimeException("Cannot get native address for arrow column");
    +  }
    +
    +  @Override
    +  public long valuesNativeAddress() {
    +    throw new RuntimeException("Cannot get native address for arrow column");
    +  }
    +
    +  @Override
    +  public void close() {
    +    if (childColumns != null) {
    +      for (int i = 0; i < childColumns.length; i++) {
    +        childColumns[i].close();
    +      }
    +    }
    +    accessor.close();
    +  }
    +
    +  //
    +  // APIs dealing with nulls
    +  //
    +
    +  @Override
    +  public boolean isNullAt(int rowId) {
    +    return accessor.isNullAt(rowId);
    +  }
    +
    +  //
    +  // APIs dealing with Booleans
    +  //
    +
    +  @Override
    +  public boolean getBoolean(int rowId) {
    +    return accessor.getBoolean(rowId);
    +  }
    +
    +  @Override
    +  public boolean[] getBooleans(int rowId, int count) {
    +    boolean[] array = new boolean[count];
    +    for (int i = 0; i < count; ++i) {
    +      array[i] = accessor.getBoolean(rowId + i);
    +    }
    +    return array;
    +  }
    +
    +  //
    +  // APIs dealing with Bytes
    +  //
    +
    +  @Override
    +  public byte getByte(int rowId) {
    +    return accessor.getByte(rowId);
    +  }
    +
    +  @Override
    +  public byte[] getBytes(int rowId, int count) {
    +    byte[] array = new byte[count];
    +    for (int i = 0; i < count; ++i) {
    +      array[i] = accessor.getByte(rowId + i);
    +    }
    +    return array;
    +  }
    +
    +  //
    +  // APIs dealing with Shorts
    +  //
    +
    +  @Override
    +  public short getShort(int rowId) {
    +    return accessor.getShort(rowId);
    +  }
    +
    +  @Override
    +  public short[] getShorts(int rowId, int count) {
    +    short[] array = new short[count];
    +    for (int i = 0; i < count; ++i) {
    +      array[i] = accessor.getShort(rowId + i);
    +    }
    +    return array;
    +  }
    +
    +  //
    +  // APIs dealing with Ints
    +  //
    +
    +  @Override
    +  public int getInt(int rowId) {
    +    return accessor.getInt(rowId);
    +  }
    +
    +  @Override
    +  public int[] getInts(int rowId, int count) {
    +    int[] array = new int[count];
    +    for (int i = 0; i < count; ++i) {
    +      array[i] = accessor.getInt(rowId + i);
    +    }
    +    return array;
    +  }
    +
    +  @Override
    +  public int getDictId(int rowId) {
    +    throw new UnsupportedOperationException();
    +  }
    +
    +  //
    +  // APIs dealing with Longs
    +  //
    +
    +  @Override
    +  public long getLong(int rowId) {
    +    return accessor.getLong(rowId);
    +  }
    +
    +  @Override
    +  public long[] getLongs(int rowId, int count) {
    +    long[] array = new long[count];
    +    for (int i = 0; i < count; ++i) {
    +      array[i] = accessor.getLong(rowId + i);
    +    }
    +    return array;
    +  }
    +
    +  //
    +  // APIs dealing with floats
    +  //
    +
    +  @Override
    +  public float getFloat(int rowId) {
    +    return accessor.getFloat(rowId);
    +  }
    +
    +  @Override
    +  public float[] getFloats(int rowId, int count) {
    +    float[] array = new float[count];
    +    for (int i = 0; i < count; ++i) {
    +      array[i] = accessor.getFloat(rowId + i);
    +    }
    +    return array;
    +  }
    +
    +  //
    +  // APIs dealing with doubles
    +  //
    +
    +  @Override
    +  public double getDouble(int rowId) {
    +    return accessor.getDouble(rowId);
    +  }
    +
    +  @Override
    +  public double[] getDoubles(int rowId, int count) {
    +    double[] array = new double[count];
    +    for (int i = 0; i < count; ++i) {
    +      array[i] = accessor.getDouble(rowId + i);
    +    }
    +    return array;
    +  }
    +
    +  //
    +  // APIs dealing with Arrays
    +  //
    +
    +  @Override
    +  public int getArrayLength(int rowId) {
    +    return accessor.getArrayLength(rowId);
    +  }
    +
    +  @Override
    +  public int getArrayOffset(int rowId) {
    +    return accessor.getArrayOffset(rowId);
    +  }
    +
    +  @Override
    +  public void loadBytes(Array array) {
    +    throw new UnsupportedOperationException();
    +  }
    +
    +  //
    +  // APIs dealing with Decimals
    +  //
    +
    +  @Override
    +  public Decimal getDecimal(int rowId, int precision, int scale) {
    +    return accessor.getDecimal(rowId, precision, scale);
    +  }
    +
    +  //
    +  // APIs dealing with UTF8Strings
    +  //
    +
    +  @Override
    +  public UTF8String getUTF8String(int rowId) {
    +    return accessor.getUTF8String(rowId);
    +  }
    +
    +  //
    +  // APIs dealing with Binaries
    +  //
    +
    +  @Override
    +  public byte[] getBinary(int rowId) {
    +    return accessor.getBinary(rowId);
    +  }
    +
    +  public ArrowColumnVector(ValueVector vector) {
    +    super(vector.getValueCapacity(), ArrowUtils.fromArrowField(vector.getField()),
    +      MemoryMode.OFF_HEAP);
    +
    +    if (vector instanceof NullableBitVector) {
    +      accessor = new BooleanAccessor((NullableBitVector) vector);
    +    } else if (vector instanceof NullableTinyIntVector) {
    +      accessor = new ByteAccessor((NullableTinyIntVector) vector);
    +    } else if (vector instanceof NullableSmallIntVector) {
    +      accessor = new ShortAccessor((NullableSmallIntVector) vector);
    +    } else if (vector instanceof NullableIntVector) {
    +      accessor = new IntAccessor((NullableIntVector) vector);
    +    } else if (vector instanceof NullableBigIntVector) {
    +      accessor = new LongAccessor((NullableBigIntVector) vector);
    +    } else if (vector instanceof NullableFloat4Vector) {
    +      accessor = new FloatAccessor((NullableFloat4Vector) vector);
    +    } else if (vector instanceof NullableFloat8Vector) {
    +      accessor = new DoubleAccessor((NullableFloat8Vector) vector);
    +    } else if (vector instanceof NullableDecimalVector) {
    +      accessor = new DecimalAccessor((NullableDecimalVector) vector);
    +    } else if (vector instanceof NullableVarCharVector) {
    +      accessor = new StringAccessor((NullableVarCharVector) vector);
    +    } else if (vector instanceof NullableVarBinaryVector) {
    +      accessor = new BinaryAccessor((NullableVarBinaryVector) vector);
    +    } else if (vector instanceof ListVector) {
    +      ListVector listVector = (ListVector) vector;
    +      accessor = new ArrayAccessor(listVector);
    +
    +      childColumns = new ColumnVector[1];
    +      childColumns[0] = new ArrowColumnVector(listVector.getDataVector());
    +      resultArray = new Array(childColumns[0]);
    +    } else if (vector instanceof MapVector) {
    --- End diff --
    
    An unrelated question: why is the vector for a struct type called `MapVector` in Arrow? cc @BryanCutler



[GitHub] spark issue #18680: [SPARK-21472][SQL] Introduce ArrowColumnVector as a read...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18680
  
    **[Test build #79787 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79787/testReport)** for PR 18680 at commit [`91b94ef`](https://github.com/apache/spark/commit/91b94ef6d08771fe8e5eb5d41f43153af9a75f06).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #18680: [SPARK-21472][SQL] Introduce ArrowColumnVector as a read...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18680
  
    Merged build finished. Test FAILed.



[GitHub] spark issue #18680: [SPARK-21472][SQL] Introduce ArrowColumnVector as a read...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/18680
  
    LGTM, pending jenkins



[GitHub] spark pull request #18680: [SPARK-21472][SQL] Introduce ArrowColumnVector as...

Posted by BryanCutler <gi...@git.apache.org>.

Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18680#discussion_r128320023
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ReadOnlyColumnVector.java ---
    @@ -0,0 +1,250 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.vectorized;
    +
    +import org.apache.spark.memory.MemoryMode;
    +import org.apache.spark.sql.types.*;
    +
    +/**
    + * An abstract class for read-only column vector.
    + */
    +public abstract class ReadOnlyColumnVector extends ColumnVector {
    --- End diff --
    
    Wouldn't it be better to refactor `ColumnVector` into classes that separate reading from writing, so you could extend just the read portion instead of making this class that throws exceptions on writes? e.g.
    
    ColumnVector -> ColumnVectorWritable -> ColumnVectorReadable
    ArrowColumnVector -> ColumnVectorReadable
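A minimal sketch of the split suggested above, in plain Java with no Spark or Arrow dependencies. The interface names `ColumnVectorReadable` and `ColumnVectorWritable` come from the comment; everything else (`IntArrayColumn`, `ReadOnlyIntColumn`, and the tiny method set) is hypothetical and much smaller than the real `ColumnVector` API. With this layout a read-only column simply does not have `put*` methods, so a write attempt fails at compile time instead of throwing `UnsupportedOperationException` at runtime.

```java
// Hypothetical read side: getters only.
interface ColumnVectorReadable {
  boolean isNullAt(int rowId);
  int getInt(int rowId);
}

// Hypothetical write side: extends the read side with setters,
// mirroring "ColumnVector -> ColumnVectorWritable -> ColumnVectorReadable".
interface ColumnVectorWritable extends ColumnVectorReadable {
  void putNull(int rowId);
  void putInt(int rowId, int value);
}

// A mutable on-heap column implements both reading and writing.
final class IntArrayColumn implements ColumnVectorWritable {
  private final int[] values;
  private final boolean[] nulls;

  IntArrayColumn(int capacity) {
    values = new int[capacity];
    nulls = new boolean[capacity];
  }

  @Override public boolean isNullAt(int rowId) { return nulls[rowId]; }
  @Override public int getInt(int rowId) { return values[rowId]; }
  @Override public void putNull(int rowId) { nulls[rowId] = true; }

  @Override public void putInt(int rowId, int value) {
    values[rowId] = value;
    nulls[rowId] = false;
  }
}

// A read-only column (e.g. one wrapping memory owned by someone else)
// implements only the read interface, so there are no writers to stub out.
final class ReadOnlyIntColumn implements ColumnVectorReadable {
  private final int[] values;
  private final boolean[] nulls;

  ReadOnlyIntColumn(int[] values, boolean[] nulls) {
    this.values = values;
    this.nulls = nulls;
  }

  @Override public boolean isNullAt(int rowId) { return nulls[rowId]; }
  @Override public int getInt(int rowId) { return values[rowId]; }
}
```

The cost of this design is that every caller which only reads would need to accept `ColumnVectorReadable` rather than `ColumnVector`, which is a wider refactor than adding one read-only subclass.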



[GitHub] spark pull request #18680: [SPARK-21472][SQL] Introduce ArrowColumnVector as...

Posted by ueshin <gi...@git.apache.org>.

Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18680#discussion_r128217817
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ArrowColumnVector.java ---
    @@ -0,0 +1,510 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.vectorized;
    +
    +import org.apache.arrow.vector.*;
    +import org.apache.arrow.vector.complex.*;
    +import org.apache.arrow.vector.holders.NullableVarCharHolder;
    +
    +import org.apache.spark.memory.MemoryMode;
    +import org.apache.spark.sql.execution.arrow.ArrowUtils;
    +import org.apache.spark.sql.types.*;
    +import org.apache.spark.unsafe.types.UTF8String;
    +
    +/**
    + * A column backed by Apache Arrow.
    + */
    +public final class ArrowColumnVector extends ColumnVector {
    +
    +  private ValueVector vector;
    +  private ValueVector.Accessor nulls;
    +
    +  private NullableBitVector boolData;
    +  private NullableTinyIntVector byteData;
    +  private NullableSmallIntVector shortData;
    +  private NullableIntVector intData;
    +  private NullableBigIntVector longData;
    +
    +  private NullableFloat4Vector floatData;
    +  private NullableFloat8Vector doubleData;
    +  private NullableDecimalVector decimalData;
    +
    +  private NullableVarCharVector stringData;
    +
    +  private NullableVarBinaryVector binaryData;
    +
    +  private UInt4Vector listOffsetData;
    +
    +  public ArrowColumnVector(ValueVector vector) {
    +    super(vector.getValueCapacity(), DataTypes.NullType, MemoryMode.OFF_HEAP);
    +    initialize(vector);
    +  }
    +
    +  @Override
    +  public long nullsNativeAddress() {
    +    throw new RuntimeException("Cannot get native address for arrow column");
    +  }
    +
    +  @Override
    +  public long valuesNativeAddress() {
    +    throw new RuntimeException("Cannot get native address for arrow column");
    +  }
    +
    +  @Override
    +  public void close() {
    +    if (childColumns != null) {
    +      for (int i = 0; i < childColumns.length; i++) {
    +        childColumns[i].close();
    +      }
    +    }
    +    vector.close();
    +  }
    +
    +  //
    +  // APIs dealing with nulls
    +  //
    +
    +  @Override
    +  public void putNotNull(int rowId) {
    +    throw new UnsupportedOperationException();
    +  }
    +
    +  @Override
    +  public void putNull(int rowId) {
    +    throw new UnsupportedOperationException();
    +  }
    +
    +  @Override
    +  public void putNulls(int rowId, int count) {
    +    throw new UnsupportedOperationException();
    +  }
    +
    +  @Override
    +  public void putNotNulls(int rowId, int count) {
    +    throw new UnsupportedOperationException();
    +  }
    +
    +  @Override
    +  public boolean isNullAt(int rowId) {
    +    return nulls.isNull(rowId);
    +  }
    +
    +  //
    +  // APIs dealing with Booleans
    +  //
    +
    +  @Override
    +  public void putBoolean(int rowId, boolean value) {
    +    throw new UnsupportedOperationException();
    +  }
    +
    +  @Override
    +  public void putBooleans(int rowId, int count, boolean value) {
    +    throw new UnsupportedOperationException();
    +  }
    +
    +  @Override
    +  public boolean getBoolean(int rowId) {
    +    return boolData.getAccessor().get(rowId) == 1;
    +  }
    +
    +  @Override
    +  public boolean[] getBooleans(int rowId, int count) {
    +    assert(dictionary == null);
    +    NullableBitVector.Accessor accessor = boolData.getAccessor();
    --- End diff --
    
    I'm afraid not, because the type of `nulls` is `ValueVector.Accessor`, which only has simple methods such as `isNull()`.
    The concrete accessor APIs differ for each type.
    Or should we cast `nulls` to the concrete type each time?
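A later revision in this thread replaces the per-type fields with a single `ArrowVectorAccessor accessor` field that answers `isNullAt`, `getBoolean`, and so on, which avoids both the generic-accessor limitation and a cast on every call. A dependency-free sketch of that pattern follows. The class names echo the diff, but the backing arrays stand in for Arrow vectors, and the choice to have unsupported getters throw in the base class is an assumption for illustration, not a copy of Spark's code.

```java
// Hypothetical stand-in for ArrowVectorAccessor: the null check is shared,
// and typed getters throw unless a subclass supports them.
abstract class ArrowVectorAccessor {
  private final boolean[] validity; // true = value present (as in Arrow's validity bitmap)

  ArrowVectorAccessor(boolean[] validity) { this.validity = validity; }

  final boolean isNullAt(int rowId) { return !validity[rowId]; }

  boolean getBoolean(int rowId) { throw new UnsupportedOperationException(); }
  int getInt(int rowId) { throw new UnsupportedOperationException(); }
}

final class BooleanAccessor extends ArrowVectorAccessor {
  private final byte[] bits; // simplified: one byte per row, 1 = true

  BooleanAccessor(byte[] bits, boolean[] validity) {
    super(validity);
    this.bits = bits;
  }

  @Override boolean getBoolean(int rowId) { return bits[rowId] == 1; }
}

final class IntAccessor extends ArrowVectorAccessor {
  private final int[] values;

  IntAccessor(int[] values, boolean[] validity) {
    super(validity);
    this.values = values;
  }

  @Override int getInt(int rowId) { return values[rowId]; }
}

// The column keeps exactly one accessor and delegates every read to it,
// so no per-type fields and no casts at the call sites.
final class SketchColumn {
  private final ArrowVectorAccessor accessor;

  SketchColumn(ArrowVectorAccessor accessor) { this.accessor = accessor; }

  boolean isNullAt(int rowId) { return accessor.isNullAt(rowId); }
  boolean getBoolean(int rowId) { return accessor.getBoolean(rowId); }
  int getInt(int rowId) { return accessor.getInt(rowId); }
}
```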



[GitHub] spark pull request #18680: [SPARK-21472][SQL] Introduce ArrowColumnVector as...

Posted by kiszk <gi...@git.apache.org>.

Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18680#discussion_r128216686
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ArrowColumnVector.java ---
    @@ -0,0 +1,510 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.vectorized;
    +
    +import org.apache.arrow.vector.*;
    +import org.apache.arrow.vector.complex.*;
    +import org.apache.arrow.vector.holders.NullableVarCharHolder;
    +
    +import org.apache.spark.memory.MemoryMode;
    +import org.apache.spark.sql.execution.arrow.ArrowUtils;
    +import org.apache.spark.sql.types.*;
    +import org.apache.spark.unsafe.types.UTF8String;
    +
    +/**
    + * A column backed by Apache Arrow.
    + */
    +public final class ArrowColumnVector extends ColumnVector {
    +
    +  private ValueVector vector;
    +  private ValueVector.Accessor nulls;
    +
    +  private NullableBitVector boolData;
    +  private NullableTinyIntVector byteData;
    +  private NullableSmallIntVector shortData;
    +  private NullableIntVector intData;
    +  private NullableBigIntVector longData;
    +
    +  private NullableFloat4Vector floatData;
    +  private NullableFloat8Vector doubleData;
    +  private NullableDecimalVector decimalData;
    +
    +  private NullableVarCharVector stringData;
    +
    +  private NullableVarBinaryVector binaryData;
    +
    +  private UInt4Vector listOffsetData;
    +
    +  public ArrowColumnVector(ValueVector vector) {
    +    super(vector.getValueCapacity(), DataTypes.NullType, MemoryMode.OFF_HEAP);
    +    initialize(vector);
    +  }
    +
    +  @Override
    +  public long nullsNativeAddress() {
    +    throw new RuntimeException("Cannot get native address for arrow column");
    +  }
    +
    +  @Override
    +  public long valuesNativeAddress() {
    +    throw new RuntimeException("Cannot get native address for arrow column");
    +  }
    +
    +  @Override
    +  public void close() {
    +    if (childColumns != null) {
    +      for (int i = 0; i < childColumns.length; i++) {
    +        childColumns[i].close();
    +      }
    +    }
    +    vector.close();
    +  }
    +
    +  //
    +  // APIs dealing with nulls
    +  //
    +
    +  @Override
    +  public void putNotNull(int rowId) {
    +    throw new UnsupportedOperationException();
    +  }
    +
    +  @Override
    +  public void putNull(int rowId) {
    +    throw new UnsupportedOperationException();
    +  }
    +
    +  @Override
    +  public void putNulls(int rowId, int count) {
    +    throw new UnsupportedOperationException();
    +  }
    +
    +  @Override
    +  public void putNotNulls(int rowId, int count) {
    +    throw new UnsupportedOperationException();
    +  }
    +
    +  @Override
    +  public boolean isNullAt(int rowId) {
    +    return nulls.isNull(rowId);
    +  }
    +
    +  //
    +  // APIs dealing with Booleans
    +  //
    +
    +  @Override
    +  public void putBoolean(int rowId, boolean value) {
    +    throw new UnsupportedOperationException();
    +  }
    +
    +  @Override
    +  public void putBooleans(int rowId, int count, boolean value) {
    +    throw new UnsupportedOperationException();
    +  }
    +
    +  @Override
    +  public boolean getBoolean(int rowId) {
    +    return boolData.getAccessor().get(rowId) == 1;
    +  }
    +
    +  @Override
    +  public boolean[] getBooleans(int rowId, int count) {
    +    assert(dictionary == null);
    +    NullableBitVector.Accessor accessor = boolData.getAccessor();
    --- End diff --
    
    Can we use `nulls`?



[GitHub] spark pull request #18680: [SPARK-21472][SQL] Introduce ArrowColumnVector as...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/18680



[GitHub] spark issue #18680: [SPARK-21472][SQL] Introduce ArrowColumnVector as a read...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/18680
  
    @BryanCutler All classes under the `execution` package are meant to be private; in the future we will move them to a new package when we are ready to make them public.



[GitHub] spark issue #18680: [SPARK-21472][SQL] Introduce ArrowColumnVector as a read...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18680
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79787/



[GitHub] spark pull request #18680: [SPARK-21472][SQL] Introduce ArrowColumnVector as...

Posted by ueshin <gi...@git.apache.org>.

Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18680#discussion_r128422124
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ReadOnlyColumnVector.java ---
    @@ -0,0 +1,250 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.vectorized;
    +
    +import org.apache.spark.memory.MemoryMode;
    +import org.apache.spark.sql.types.*;
    +
    +/**
    + * An abstract class for read-only column vector.
    + */
    +public abstract class ReadOnlyColumnVector extends ColumnVector {
    +
    +  protected ReadOnlyColumnVector(int capacity, MemoryMode memMode) {
    --- End diff --
    
    I see. I'll modify it to accept `dataType`, but I think we shouldn't pass it to `ColumnVector`, to avoid it wrongly allocating child columns.
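To illustrate the concern, here is a toy model that assumes the base-class constructor eagerly allocates writable child columns for nested types. The names `SimpleType`, `BaseColumn`, `OnHeapColumn`, `ReadOnlyColumn`, and `ExternalStructColumn` are all hypothetical. The read-only subclass accepts the real type but hands a leaf placeholder to `super` (much as the earlier revision of `ArrowColumnVector` passed `DataTypes.NullType`), then records the real type itself, so the base class never creates children that the Arrow-backed subclass would have to replace with its own.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical type model: a struct lists its field types; leaves have none.
final class SimpleType {
  final String name;
  final List<SimpleType> fields; // empty for leaf types

  SimpleType(String name, SimpleType... fields) {
    this.name = name;
    this.fields = Arrays.asList(fields);
  }

  static final SimpleType NULL = new SimpleType("null");
}

abstract class BaseColumn {
  protected SimpleType type;
  protected BaseColumn[] childColumns;

  protected BaseColumn(int capacity, SimpleType type) {
    this.type = type;
    // Assumed behavior: nested types get writable children allocated eagerly.
    if (!type.fields.isEmpty()) {
      childColumns = new BaseColumn[type.fields.size()];
      for (int i = 0; i < childColumns.length; i++) {
        childColumns[i] = new OnHeapColumn(capacity, type.fields.get(i));
      }
    }
  }
}

final class OnHeapColumn extends BaseColumn {
  OnHeapColumn(int capacity, SimpleType type) { super(capacity, type); }
}

abstract class ReadOnlyColumn extends BaseColumn {
  protected ReadOnlyColumn(int capacity, SimpleType type) {
    // Pass a leaf placeholder so the base class allocates nothing,
    // then record the real type; subclasses wire up their own children.
    super(capacity, SimpleType.NULL);
    this.type = type;
  }
}

// Hypothetical concrete read-only column, e.g. one wrapping external vectors.
final class ExternalStructColumn extends ReadOnlyColumn {
  ExternalStructColumn(int capacity, SimpleType type) { super(capacity, type); }
}
```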



[GitHub] spark pull request #18680: [SPARK-21472][SQL] Introduce ArrowColumnVector as...

Posted by ueshin <gi...@git.apache.org>.

Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18680#discussion_r128425617
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ArrowColumnVector.java ---
    @@ -0,0 +1,545 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.vectorized;
    +
    +import org.apache.arrow.vector.*;
    +import org.apache.arrow.vector.complex.*;
    +import org.apache.arrow.vector.holders.NullableVarCharHolder;
    +
    +import org.apache.spark.memory.MemoryMode;
    +import org.apache.spark.sql.execution.arrow.ArrowUtils;
    +import org.apache.spark.sql.types.*;
    +import org.apache.spark.unsafe.types.UTF8String;
    +
    +/**
    + * A column backed by Apache Arrow.
    + */
    +public final class ArrowColumnVector extends ReadOnlyColumnVector {
    +
    +  private final ArrowVectorAccessor accessor;
    +
    +  @Override
    +  public long nullsNativeAddress() {
    +    throw new RuntimeException("Cannot get native address for arrow column");
    +  }
    +
    +  @Override
    +  public long valuesNativeAddress() {
    +    throw new RuntimeException("Cannot get native address for arrow column");
    +  }
    +
    +  @Override
    +  public void close() {
    +    if (childColumns != null) {
    +      for (int i = 0; i < childColumns.length; i++) {
    +        childColumns[i].close();
    +      }
    +    }
    +    accessor.close();
    +  }
    +
    +  //
    +  // APIs dealing with nulls
    +  //
    +
    +  @Override
    +  public boolean isNullAt(int rowId) {
    +    return accessor.isNullAt(rowId);
    +  }
    +
    +  //
    +  // APIs dealing with Booleans
    +  //
    +
    +  @Override
    +  public boolean getBoolean(int rowId) {
    +    return accessor.getBoolean(rowId);
    +  }
    +
    +  @Override
    +  public boolean[] getBooleans(int rowId, int count) {
    +    boolean[] array = new boolean[count];
    +    for (int i = 0; i < count; ++i) {
    +      array[i] = accessor.getBoolean(rowId + i);
    +    }
    +    return array;
    +  }
    +
    +  //
    +  // APIs dealing with Bytes
    +  //
    +
    +  @Override
    +  public byte getByte(int rowId) {
    +    return accessor.getByte(rowId);
    +  }
    +
    +  @Override
    +  public byte[] getBytes(int rowId, int count) {
    +    byte[] array = new byte[count];
    +    for (int i = 0; i < count; ++i) {
    +      array[i] = accessor.getByte(rowId + i);
    +    }
    +    return array;
    +  }
    +
    +  //
    +  // APIs dealing with Shorts
    +  //
    +
    +  @Override
    +  public short getShort(int rowId) {
    +    return accessor.getShort(rowId);
    +  }
    +
    +  @Override
    +  public short[] getShorts(int rowId, int count) {
    +    short[] array = new short[count];
    +    for (int i = 0; i < count; ++i) {
    +      array[i] = accessor.getShort(rowId + i);
    +    }
    +    return array;
    +  }
    +
    +  //
    +  // APIs dealing with Ints
    +  //
    +
    +  @Override
    +  public int getInt(int rowId) {
    +    return accessor.getInt(rowId);
    +  }
    +
    +  @Override
    +  public int[] getInts(int rowId, int count) {
    +    int[] array = new int[count];
    +    for (int i = 0; i < count; ++i) {
    +      array[i] = accessor.getInt(rowId + i);
    +    }
    +    return array;
    +  }
    +
    +  @Override
    +  public int getDictId(int rowId) {
    +    throw new UnsupportedOperationException();
    +  }
    +
    +  //
    +  // APIs dealing with Longs
    +  //
    +
    +  @Override
    +  public long getLong(int rowId) {
    +    return accessor.getLong(rowId);
    +  }
    +
    +  @Override
    +  public long[] getLongs(int rowId, int count) {
    +    long[] array = new long[count];
    +    for (int i = 0; i < count; ++i) {
    +      array[i] = accessor.getLong(rowId + i);
    +    }
    +    return array;
    +  }
    +
    +  //
    +  // APIs dealing with floats
    +  //
    +
    +  @Override
    +  public float getFloat(int rowId) {
    +    return accessor.getFloat(rowId);
    +  }
    +
    +  @Override
    +  public float[] getFloats(int rowId, int count) {
    +    float[] array = new float[count];
    +    for (int i = 0; i < count; ++i) {
    +      array[i] = accessor.getFloat(rowId + i);
    +    }
    +    return array;
    +  }
    +
    +  //
    +  // APIs dealing with doubles
    +  //
    +
    +  @Override
    +  public double getDouble(int rowId) {
    +    return accessor.getDouble(rowId);
    +  }
    +
    +  @Override
    +  public double[] getDoubles(int rowId, int count) {
    +    double[] array = new double[count];
    +    for (int i = 0; i < count; ++i) {
    +      array[i] = accessor.getDouble(rowId + i);
    +    }
    +    return array;
    +  }
    +
    +  //
    +  // APIs dealing with Arrays
    +  //
    +
    +  @Override
    +  public int getArrayLength(int rowId) {
    +    return accessor.getArrayLength(rowId);
    +  }
    +
    +  @Override
    +  public int getArrayOffset(int rowId) {
    +    return accessor.getArrayOffset(rowId);
    +  }
    +
    +  @Override
    +  public void loadBytes(Array array) {
    +    throw new UnsupportedOperationException();
    +  }
    +
    +  //
    +  // APIs dealing with Decimals
    +  //
    +
    +  @Override
    +  public Decimal getDecimal(int rowId, int precision, int scale) {
    +    return accessor.getDecimal(rowId, precision, scale);
    +  }
    +
    +  //
    +  // APIs dealing with UTF8Strings
    +  //
    +
    +  @Override
    +  public UTF8String getUTF8String(int rowId) {
    +    return accessor.getUTF8String(rowId);
    +  }
    +
    +  //
    +  // APIs dealing with Binaries
    +  //
    +
    +  @Override
    +  public byte[] getBinary(int rowId) {
    +    return accessor.getBinary(rowId);
    +  }
    +
    +  public ArrowColumnVector(ValueVector vector) {
    +    super(vector.getValueCapacity(), MemoryMode.OFF_HEAP);
    +
    +    type = ArrowUtils.fromArrowField(vector.getField());
    +    if (vector instanceof NullableBitVector) {
    +      accessor = new BooleanAccessor((NullableBitVector) vector);
    +    } else if (vector instanceof NullableTinyIntVector) {
    +      accessor = new ByteAccessor((NullableTinyIntVector) vector);
    +    } else if (vector instanceof NullableSmallIntVector) {
    +      accessor = new ShortAccessor((NullableSmallIntVector) vector);
    +    } else if (vector instanceof NullableIntVector) {
    +      accessor = new IntAccessor((NullableIntVector) vector);
    +    } else if (vector instanceof NullableBigIntVector) {
    +      accessor = new LongAccessor((NullableBigIntVector) vector);
    +    } else if (vector instanceof NullableFloat4Vector) {
    +      accessor = new FloatAccessor((NullableFloat4Vector) vector);
    +    } else if (vector instanceof NullableFloat8Vector) {
    +      accessor = new DoubleAccessor((NullableFloat8Vector) vector);
    +    } else if (vector instanceof NullableDecimalVector) {
    +      accessor = new DecimalAccessor((NullableDecimalVector) vector);
    +    } else if (vector instanceof NullableVarCharVector) {
    +      accessor = new StringAccessor((NullableVarCharVector) vector);
    +    } else if (vector instanceof NullableVarBinaryVector) {
    +      accessor = new BinaryAccessor((NullableVarBinaryVector) vector);
    +    } else if (vector instanceof ListVector) {
    +      ListVector listVector = (ListVector) vector;
    +      accessor = new ArrayAccessor(listVector);
    +
    +      childColumns = new ColumnVector[1];
    +      childColumns[0] = new ArrowColumnVector(listVector.getDataVector());
    +      resultArray = new Array(childColumns[0]);
    +    } else if (vector instanceof MapVector) {
    +      MapVector mapVector = (MapVector) vector;
    +      accessor = new StructAccessor(mapVector);
    +
    +      childColumns = new ArrowColumnVector[mapVector.size()];
    +      for (int i = 0; i < childColumns.length; ++i) {
    +        childColumns[i] = new ArrowColumnVector(mapVector.getVectorById(i));
    +      }
    +      resultStruct = new ColumnarBatch.Row(childColumns);
    +    } else {
    +      throw new UnsupportedOperationException();
    --- End diff --
    
    Unfortunately, this class is written in Java, so we can't use pattern matching.
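Without pattern matching, the usual Java idiom is the `instanceof`-then-cast chain used in the constructor above. The following is a self-contained sketch of that same shape, using hypothetical stand-in classes (`Vec`, `BitVec`, `IntVec`, `Utf8Vec`, `RowAccessor`, `Accessors`) in place of Arrow's vectors, including the final `UnsupportedOperationException` for unrecognized types. Java 16 later added `if (v instanceof IntVec iv)`, which binds the cast variable directly, but that was not available at the time, so the sketch sticks to Java 8 syntax.

```java
// Hypothetical stand-ins for Arrow vector classes.
interface Vec {}

final class BitVec implements Vec {
  final byte[] bits;
  BitVec(byte[] bits) { this.bits = bits; }
}

final class IntVec implements Vec {
  final int[] values;
  IntVec(int[] values) { this.values = values; }
}

final class Utf8Vec implements Vec {} // not handled by the dispatch below

// Each accessor reads one row as an Object to keep the sketch short.
interface RowAccessor {
  Object get(int rowId);
}

final class Accessors {
  private Accessors() {}

  // Java 8-style dispatch: test the runtime class, then cast explicitly.
  static RowAccessor forVector(Vec vector) {
    if (vector instanceof BitVec) {
      final BitVec v = (BitVec) vector;
      return rowId -> v.bits[rowId] == 1;
    } else if (vector instanceof IntVec) {
      final IntVec v = (IntVec) vector;
      return rowId -> v.values[rowId];
    } else {
      throw new UnsupportedOperationException(
          "Unsupported vector type: " + vector.getClass().getSimpleName());
    }
  }
}
```

Because `instanceof` also matches subclasses, branch order matters whenever one vector class extends another: the most specific type has to be tested first.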



[GitHub] spark issue #18680: [SPARK-21472][SQL] Introduce ArrowColumnVector as a read...

Posted by ueshin <gi...@git.apache.org>.

Github user ueshin commented on the issue:

    https://github.com/apache/spark/pull/18680
  
    cc @BryanCutler @kiszk @cloud-fan 



[GitHub] spark issue #18680: [SPARK-21472][SQL] Introduce ArrowColumnVector as a read...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18680
  
    **[Test build #79752 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79752/testReport)** for PR 18680 at commit [`73899b2`](https://github.com/apache/spark/commit/73899b26d10ed3763569f4aa2e836643a5ce941a).



[GitHub] spark issue #18680: [SPARK-21472][SQL] Introduce ArrowColumnVector as a read...

Posted by ueshin <gi...@git.apache.org>.

Github user ueshin commented on the issue:

    https://github.com/apache/spark/pull/18680
  
    @viirya and the original reporter, thank you for reporting it!
    I submitted a follow-up PR, #18701.



[GitHub] spark pull request #18680: [SPARK-21472][SQL] Introduce ArrowColumnVector as...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18680#discussion_r128446137
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ReadOnlyColumnVector.java ---
    @@ -0,0 +1,250 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.vectorized;
    +
    +import org.apache.spark.memory.MemoryMode;
    +import org.apache.spark.sql.types.*;
    +
    +/**
    + * An abstract class for read-only column vector.
    + */
    +public abstract class ReadOnlyColumnVector extends ColumnVector {
    --- End diff --
    
    +1 on separating the read and write APIs; we should definitely do this before we publish the `ColumnVector` interfaces.
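Below is a minimal sketch of one way to express the read/write split. It assumes a much smaller API than Spark's real `ColumnVector`. A hypothetical `SimpleColumn` base declares both a getter and a setter, and a hypothetical read-only subclass rejects every mutation. None of these names come from Spark.

```java
// Hypothetical, heavily simplified column API (not Spark's ColumnVector).
abstract class SimpleColumn {
  abstract int getInt(int rowId);

  abstract void putInt(int rowId, int value);
}

// Read-only base: the mutator is final and always throws, so concrete
// subclasses only have to implement the readers.
abstract class ReadOnlySimpleColumn extends SimpleColumn {
  @Override
  final void putInt(int rowId, int value) {
    throw new UnsupportedOperationException("read-only column");
  }
}

// A read-only column over an existing array (a stand-in for an Arrow buffer).
final class ArrayBackedColumn extends ReadOnlySimpleColumn {
  private final int[] data;

  ArrayBackedColumn(int[] data) {
    this.data = data;
  }

  @Override
  int getInt(int rowId) {
    return data[rowId];
  }
}
```

A fuller split would use a reader type plus a writable subtype. Callers would then see at compile time that a column is read-only, rather than discovering it through an exception at run time.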



[GitHub] spark pull request #18680: [SPARK-21472][SQL] Introduce ArrowColumnVector as...

Posted by BryanCutler <gi...@git.apache.org>.

Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18680#discussion_r128587279
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ArrowColumnVector.java ---
    @@ -0,0 +1,545 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.vectorized;
    +
    +import org.apache.arrow.vector.*;
    +import org.apache.arrow.vector.complex.*;
    +import org.apache.arrow.vector.holders.NullableVarCharHolder;
    +
    +import org.apache.spark.memory.MemoryMode;
    +import org.apache.spark.sql.execution.arrow.ArrowUtils;
    +import org.apache.spark.sql.types.*;
    +import org.apache.spark.unsafe.types.UTF8String;
    +
    +/**
    + * A column backed by Apache Arrow.
    + */
    +public final class ArrowColumnVector extends ReadOnlyColumnVector {
    +
    +  private final ArrowVectorAccessor accessor;
    +
    +  @Override
    +  public long nullsNativeAddress() {
    +    throw new RuntimeException("Cannot get native address for arrow column");
    +  }
    +
    +  @Override
    +  public long valuesNativeAddress() {
    +    throw new RuntimeException("Cannot get native address for arrow column");
    +  }
    +
    +  @Override
    +  public void close() {
    +    if (childColumns != null) {
    +      for (int i = 0; i < childColumns.length; i++) {
    +        childColumns[i].close();
    +      }
    +    }
    +    accessor.close();
    +  }
    +
    +  //
    +  // APIs dealing with nulls
    +  //
    +
    +  @Override
    +  public boolean isNullAt(int rowId) {
    +    return accessor.isNullAt(rowId);
    +  }
    +
    +  //
    +  // APIs dealing with Booleans
    +  //
    +
    +  @Override
    +  public boolean getBoolean(int rowId) {
    +    return accessor.getBoolean(rowId);
    +  }
    +
    +  @Override
    +  public boolean[] getBooleans(int rowId, int count) {
    +    boolean[] array = new boolean[count];
    +    for (int i = 0; i < count; ++i) {
    +      array[i] = accessor.getBoolean(rowId + i);
    +    }
    +    return array;
    +  }
    +
    +  //
    +  // APIs dealing with Bytes
    +  //
    +
    +  @Override
    +  public byte getByte(int rowId) {
    +    return accessor.getByte(rowId);
    +  }
    +
    +  @Override
    +  public byte[] getBytes(int rowId, int count) {
    +    byte[] array = new byte[count];
    +    for (int i = 0; i < count; ++i) {
    +      array[i] = accessor.getByte(rowId + i);
    +    }
    +    return array;
    +  }
    +
    +  //
    +  // APIs dealing with Shorts
    +  //
    +
    +  @Override
    +  public short getShort(int rowId) {
    +    return accessor.getShort(rowId);
    +  }
    +
    +  @Override
    +  public short[] getShorts(int rowId, int count) {
    +    short[] array = new short[count];
    +    for (int i = 0; i < count; ++i) {
    +      array[i] = accessor.getShort(rowId + i);
    +    }
    +    return array;
    +  }
    +
    +  //
    +  // APIs dealing with Ints
    +  //
    +
    +  @Override
    +  public int getInt(int rowId) {
    +    return accessor.getInt(rowId);
    +  }
    +
    +  @Override
    +  public int[] getInts(int rowId, int count) {
    +    int[] array = new int[count];
    +    for (int i = 0; i < count; ++i) {
    +      array[i] = accessor.getInt(rowId + i);
    +    }
    +    return array;
    +  }
    +
    +  @Override
    +  public int getDictId(int rowId) {
    +    throw new UnsupportedOperationException();
    +  }
    +
    +  //
    +  // APIs dealing with Longs
    +  //
    +
    +  @Override
    +  public long getLong(int rowId) {
    +    return accessor.getLong(rowId);
    +  }
    +
    +  @Override
    +  public long[] getLongs(int rowId, int count) {
    +    long[] array = new long[count];
    +    for (int i = 0; i < count; ++i) {
    +      array[i] = accessor.getLong(rowId + i);
    +    }
    +    return array;
    +  }
    +
    +  //
    +  // APIs dealing with floats
    +  //
    +
    +  @Override
    +  public float getFloat(int rowId) {
    +    return accessor.getFloat(rowId);
    +  }
    +
    +  @Override
    +  public float[] getFloats(int rowId, int count) {
    +    float[] array = new float[count];
    +    for (int i = 0; i < count; ++i) {
    +      array[i] = accessor.getFloat(rowId + i);
    +    }
    +    return array;
    +  }
    +
    +  //
    +  // APIs dealing with doubles
    +  //
    +
    +  @Override
    +  public double getDouble(int rowId) {
    +    return accessor.getDouble(rowId);
    +  }
    +
    +  @Override
    +  public double[] getDoubles(int rowId, int count) {
    +    double[] array = new double[count];
    +    for (int i = 0; i < count; ++i) {
    +      array[i] = accessor.getDouble(rowId + i);
    +    }
    +    return array;
    +  }
    +
    +  //
    +  // APIs dealing with Arrays
    +  //
    +
    +  @Override
    +  public int getArrayLength(int rowId) {
    +    return accessor.getArrayLength(rowId);
    +  }
    +
    +  @Override
    +  public int getArrayOffset(int rowId) {
    +    return accessor.getArrayOffset(rowId);
    +  }
    +
    +  @Override
    +  public void loadBytes(Array array) {
    +    throw new UnsupportedOperationException();
    +  }
    +
    +  //
    +  // APIs dealing with Decimals
    +  //
    +
    +  @Override
    +  public Decimal getDecimal(int rowId, int precision, int scale) {
    +    return accessor.getDecimal(rowId, precision, scale);
    +  }
    +
    +  //
    +  // APIs dealing with UTF8Strings
    +  //
    +
    +  @Override
    +  public UTF8String getUTF8String(int rowId) {
    +    return accessor.getUTF8String(rowId);
    +  }
    +
    +  //
    +  // APIs dealing with Binaries
    +  //
    +
    +  @Override
    +  public byte[] getBinary(int rowId) {
    +    return accessor.getBinary(rowId);
    +  }
    +
    +  public ArrowColumnVector(ValueVector vector) {
    +    super(vector.getValueCapacity(), ArrowUtils.fromArrowField(vector.getField()),
    +      MemoryMode.OFF_HEAP);
    +
    +    if (vector instanceof NullableBitVector) {
    +      accessor = new BooleanAccessor((NullableBitVector) vector);
    +    } else if (vector instanceof NullableTinyIntVector) {
    +      accessor = new ByteAccessor((NullableTinyIntVector) vector);
    +    } else if (vector instanceof NullableSmallIntVector) {
    +      accessor = new ShortAccessor((NullableSmallIntVector) vector);
    +    } else if (vector instanceof NullableIntVector) {
    +      accessor = new IntAccessor((NullableIntVector) vector);
    +    } else if (vector instanceof NullableBigIntVector) {
    +      accessor = new LongAccessor((NullableBigIntVector) vector);
    +    } else if (vector instanceof NullableFloat4Vector) {
    +      accessor = new FloatAccessor((NullableFloat4Vector) vector);
    +    } else if (vector instanceof NullableFloat8Vector) {
    +      accessor = new DoubleAccessor((NullableFloat8Vector) vector);
    +    } else if (vector instanceof NullableDecimalVector) {
    +      accessor = new DecimalAccessor((NullableDecimalVector) vector);
    +    } else if (vector instanceof NullableVarCharVector) {
    +      accessor = new StringAccessor((NullableVarCharVector) vector);
    +    } else if (vector instanceof NullableVarBinaryVector) {
    +      accessor = new BinaryAccessor((NullableVarBinaryVector) vector);
    +    } else if (vector instanceof ListVector) {
    +      ListVector listVector = (ListVector) vector;
    +      accessor = new ArrayAccessor(listVector);
    +
    +      childColumns = new ColumnVector[1];
    +      childColumns[0] = new ArrowColumnVector(listVector.getDataVector());
    +      resultArray = new Array(childColumns[0]);
    +    } else if (vector instanceof MapVector) {
    --- End diff --
    
    I'm not sure about the design decision behind it, but it's meant to look up child vectors by name, so it uses a kind of hash map. I agree that another name would have been more intuitive.
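The naming confusion comes from the container keying its children by field name, while a struct reader also addresses children by position. Below is a minimal sketch of that idea using a hypothetical `NamedChildren` class, not Arrow's `MapVector`. Each child is a plain `int[]` standing in for a value vector.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical container of named child vectors (not Arrow's MapVector).
final class NamedChildren {
  private final Map<String, int[]> byName = new HashMap<>();
  private final List<int[]> byPosition = new ArrayList<>();

  void addChild(String name, int[] values) {
    if (byName.containsKey(name)) {
      throw new IllegalArgumentException("Duplicate child: " + name);
    }
    byName.put(name, values);
    byPosition.add(values);
  }

  // Lookup by field name: the reason a hash map is involved at all.
  int[] childNamed(String name) {
    return byName.get(name);
  }

  // Lookup by position, in insertion order, as a struct reader would use it.
  int[] childAt(int position) {
    return byPosition.get(position);
  }

  int size() {
    return byPosition.size();
  }
}
```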



[GitHub] spark issue #18680: [SPARK-21472][SQL] Introduce ArrowColumnVector as a read...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18680
  
    **[Test build #79787 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79787/testReport)** for PR 18680 at commit [`91b94ef`](https://github.com/apache/spark/commit/91b94ef6d08771fe8e5eb5d41f43153af9a75f06).



[GitHub] spark issue #18680: [SPARK-21472][SQL] Introduce ArrowColumnVector as a read...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/18680
  
    thanks, merging to master!



[GitHub] spark pull request #18680: [SPARK-21472][SQL] Introduce ArrowColumnVector as...

Posted by ueshin <gi...@git.apache.org>.

Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18680#discussion_r128425605
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowUtils.scala ---
    @@ -0,0 +1,109 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.arrow
    +
    +import scala.collection.JavaConverters._
    +
    +import org.apache.arrow.memory.RootAllocator
    +import org.apache.arrow.vector.types.FloatingPointPrecision
    +import org.apache.arrow.vector.types.pojo.{ArrowType, Field, FieldType, Schema}
    +
    +import org.apache.spark.sql.types._
    +
    +object ArrowUtils {
    +
    +  val rootAllocator = new RootAllocator(Long.MaxValue)
    +
    +  // todo: support more types.
    +
    +  def toArrowType(dt: DataType): ArrowType = dt match {
    +    case BooleanType => ArrowType.Bool.INSTANCE
    +    case ByteType => new ArrowType.Int(8, true)
    +    case ShortType => new ArrowType.Int(8 * 2, true)
    +    case IntegerType => new ArrowType.Int(8 * 4, true)
    +    case LongType => new ArrowType.Int(8 * 8, true)
    +    case FloatType => new ArrowType.FloatingPoint(FloatingPointPrecision.SINGLE)
    +    case DoubleType => new ArrowType.FloatingPoint(FloatingPointPrecision.DOUBLE)
    +    case StringType => ArrowType.Utf8.INSTANCE
    +    case BinaryType => ArrowType.Binary.INSTANCE
    +    case DecimalType.Fixed(precision, scale) => new ArrowType.Decimal(precision, scale)
    +    case _ => throw new UnsupportedOperationException(s"Unsupported data type: ${dt.simpleString}")
    +  }
    +
    +  def fromArrowType(dt: ArrowType): DataType = dt match {
    +    case ArrowType.Bool.INSTANCE => BooleanType
    +    case int: ArrowType.Int if int.getIsSigned && int.getBitWidth == 8 => ByteType
    +    case int: ArrowType.Int if int.getIsSigned && int.getBitWidth == 8 * 2 => ShortType
    +    case int: ArrowType.Int if int.getIsSigned && int.getBitWidth == 8 * 4 => IntegerType
    +    case int: ArrowType.Int if int.getIsSigned && int.getBitWidth == 8 * 8 => LongType
    +    case float: ArrowType.FloatingPoint
    +      if float.getPrecision() == FloatingPointPrecision.SINGLE => FloatType
    +    case float: ArrowType.FloatingPoint
    +      if float.getPrecision() == FloatingPointPrecision.DOUBLE => DoubleType
    +    case ArrowType.Utf8.INSTANCE => StringType
    +    case ArrowType.Binary.INSTANCE => BinaryType
    +    case d: ArrowType.Decimal => DecimalType(d.getPrecision, d.getScale)
    +    case _ => throw new UnsupportedOperationException(s"Unsupported data type: $dt")
    +  }
    +
    +  def toArrowField(name: String, dt: DataType, nullable: Boolean): Field = {
    --- End diff --
    
    No, this is also used to create an Arrow schema from a `StructType` in `ArrowUtils.toArrowSchema()`.
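The point is that the per-field conversion is reused by the schema conversion. Below is a minimal sketch of that reuse, restricted to the signed integer cases of `toArrowType` in the diff (bit widths 8, 8 * 2, 8 * 4 and 8 * 8). `SparkType`, `SparkField`, `ArrowIntField` and `SchemaConversion` are hypothetical Java stand-ins, not the real Spark or Arrow types.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the Spark integral types handled in the diff.
enum SparkType { BYTE, SHORT, INT, LONG }

final class SparkField {
  final String name;
  final SparkType type;
  final boolean nullable;

  SparkField(String name, SparkType type, boolean nullable) {
    this.name = name;
    this.type = type;
    this.nullable = nullable;
  }
}

// Hypothetical Arrow-side field: a signed integer of the given bit width.
final class ArrowIntField {
  final String name;
  final int bitWidth;
  final boolean nullable;

  ArrowIntField(String name, int bitWidth, boolean nullable) {
    this.name = name;
    this.bitWidth = bitWidth;
    this.nullable = nullable;
  }
}

final class SchemaConversion {
  // Same widths as the diff's toArrowType cases: 8, 8 * 2, 8 * 4, 8 * 8.
  static int bitWidth(SparkType t) {
    switch (t) {
      case BYTE: return 8;
      case SHORT: return 16;
      case INT: return 32;
      case LONG: return 64;
      default: throw new UnsupportedOperationException("Unsupported data type: " + t);
    }
  }

  // Per-field conversion, usable on its own for a single column.
  static ArrowIntField toArrowField(String name, SparkType t, boolean nullable) {
    return new ArrowIntField(name, bitWidth(t), nullable);
  }

  // Schema conversion reuses the per-field conversion for every struct field.
  static List<ArrowIntField> toArrowSchema(List<SparkField> struct) {
    List<ArrowIntField> fields = new ArrayList<>();
    for (SparkField f : struct) {
      fields.add(toArrowField(f.name, f.type, f.nullable));
    }
    return fields;
  }
}
```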



[GitHub] spark pull request #18680: [SPARK-21472][SQL] Introduce ArrowColumnVector as...

Posted by ueshin <gi...@git.apache.org>.

Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18680#discussion_r128425637
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ReadOnlyColumnVector.java ---
    @@ -0,0 +1,250 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.vectorized;
    +
    +import org.apache.spark.memory.MemoryMode;
    +import org.apache.spark.sql.types.*;
    +
    +/**
    + * An abstract class for read-only column vector.
    + */
    +public abstract class ReadOnlyColumnVector extends ColumnVector {
    --- End diff --
    
    I agree that it'd be better to refactor `ColumnVector`, but `ColumnVector` is closely tied to `ColumnarBatch` and other classes, so we should refactor it together with `ColumnarBatch` in future PRs.
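One reason the two classes are hard to change separately is that row-level reads go through the column API. The diff quoted earlier builds a `ColumnarBatch.Row` directly from `childColumns`. Below is a minimal sketch of that coupling, using hypothetical `IntColumn`, `ArrayIntColumn` and `Batch` types rather than Spark's classes. The row view stores only a row index and delegates every getter to a column, so changing the column interface also changes the batch and row code.

```java
// Hypothetical minimal column interface (not Spark's ColumnVector).
interface IntColumn {
  int getInt(int rowId);

  boolean isNullAt(int rowId);
}

// Hypothetical array-backed column with an explicit null mask.
final class ArrayIntColumn implements IntColumn {
  private final int[] values;
  private final boolean[] nulls;

  ArrayIntColumn(int[] values, boolean[] nulls) {
    this.values = values;
    this.nulls = nulls;
  }

  @Override
  public int getInt(int rowId) {
    return values[rowId];
  }

  @Override
  public boolean isNullAt(int rowId) {
    return nulls[rowId];
  }
}

// Hypothetical batch: a set of columns sharing one row count.
final class Batch {
  final IntColumn[] columns;
  final int numRows;

  Batch(IntColumn[] columns, int numRows) {
    this.columns = columns;
    this.numRows = numRows;
  }

  // The row view holds only an index; every read goes through the column API.
  final class RowView {
    private final int rowId;

    RowView(int rowId) {
      this.rowId = rowId;
    }

    int getInt(int ordinal) {
      return columns[ordinal].getInt(rowId);
    }

    boolean isNullAt(int ordinal) {
      return columns[ordinal].isNullAt(rowId);
    }
  }

  RowView rowView(int rowId) {
    if (rowId < 0 || rowId >= numRows) {
      throw new IndexOutOfBoundsException("row " + rowId);
    }
    return new RowView(rowId);
  }
}
```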



[GitHub] spark pull request #18680: [SPARK-21472][SQL] Introduce ArrowColumnVector as...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18680#discussion_r128449421
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ArrowColumnVector.java ---
    @@ -0,0 +1,545 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.vectorized;
    +
    +import org.apache.arrow.vector.*;
    +import org.apache.arrow.vector.complex.*;
    +import org.apache.arrow.vector.holders.NullableVarCharHolder;
    +
    +import org.apache.spark.memory.MemoryMode;
    +import org.apache.spark.sql.execution.arrow.ArrowUtils;
    +import org.apache.spark.sql.types.*;
    +import org.apache.spark.unsafe.types.UTF8String;
    +
    +/**
    + * A column backed by Apache Arrow.
    + */
    +public final class ArrowColumnVector extends ReadOnlyColumnVector {
    +
    +  private final ArrowVectorAccessor accessor;
    +
    +  @Override
    +  public long nullsNativeAddress() {
    +    throw new RuntimeException("Cannot get native address for arrow column");
    +  }
    +
    +  @Override
    +  public long valuesNativeAddress() {
    +    throw new RuntimeException("Cannot get native address for arrow column");
    +  }
    +
    +  @Override
    +  public void close() {
    +    if (childColumns != null) {
    +      for (int i = 0; i < childColumns.length; i++) {
    +        childColumns[i].close();
    +      }
    +    }
    +    accessor.close();
    +  }
    +
    +  //
    +  // APIs dealing with nulls
    +  //
    +
    +  @Override
    +  public boolean isNullAt(int rowId) {
    +    return accessor.isNullAt(rowId);
    +  }
    +
    +  //
    +  // APIs dealing with Booleans
    +  //
    +
    +  @Override
    +  public boolean getBoolean(int rowId) {
    +    return accessor.getBoolean(rowId);
    +  }
    +
    +  @Override
    +  public boolean[] getBooleans(int rowId, int count) {
    +    boolean[] array = new boolean[count];
    +    for (int i = 0; i < count; ++i) {
    +      array[i] = accessor.getBoolean(rowId + i);
    --- End diff --
    
    kind of a batch get API.
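`getBooleans(rowId, count)` and its siblings such as `getInts` and `getLongs` are batch getters. They return `count` consecutive values starting at `rowId`, and the Arrow-backed versions in the diff simply loop over the single-value getter. Below is a minimal sketch of that default approach, using a hypothetical `LongReader` interface. It also shows how a column with contiguous on-heap storage could override the default with a bulk `System.arraycopy`.

```java
// Hypothetical single-value reader with a batch getter as a default method.
interface LongReader {
  long getLong(int rowId);

  // Batch get: copy `count` consecutive values starting at rowId,
  // one single-value call per element (the approach used in the diff).
  default long[] getLongs(int rowId, int count) {
    long[] array = new long[count];
    for (int i = 0; i < count; ++i) {
      array[i] = getLong(rowId + i);
    }
    return array;
  }
}

// On-heap reader that overrides the batch getter with a bulk copy.
final class HeapLongReader implements LongReader {
  private final long[] data;

  HeapLongReader(long[] data) {
    this.data = data;
  }

  @Override
  public long getLong(int rowId) {
    return data[rowId];
  }

  @Override
  public long[] getLongs(int rowId, int count) {
    long[] array = new long[count];
    System.arraycopy(data, rowId, array, 0, count);
    return array;
  }
}
```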



[GitHub] spark issue #18680: [SPARK-21472][SQL] Introduce ArrowColumnVector as a read...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18680
  
    **[Test build #79763 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79763/testReport)** for PR 18680 at commit [`ddfcf36`](https://github.com/apache/spark/commit/ddfcf3670c86c7d0498f2193df1525fc60662e40).



[GitHub] spark issue #18680: [SPARK-21472][SQL] Introduce ArrowColumnVector as a read...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/18680
  
    cc @ueshin http://apache-spark-developers-list.1001551.n3.nabble.com/Fwd-spark-git-commit-SPARK-21472-SQL-Introduce-ArrowColumnVector-as-a-reader-for-Arrow-vectors-tc22003.html



[GitHub] spark issue #18680: [SPARK-21472][SQL] Introduce ArrowColumnVector as a read...

Posted by rxin <gi...@git.apache.org>.

Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/18680
  
    Have you guys checked the performance of this change? It changes the number of concrete implementations of column vector from 2 to 3 (and potentially from 1 to 2 at runtime). This might (or might not) have huge performance implications, because it might disable inlining or force virtual dispatches. (It depends on how we call column vector.)
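The concern is about call-site polymorphism in the JIT. Below is a hedged illustration using a hypothetical `Col` hierarchy with three concrete classes. In the `sum` loop, `col.getInt(i)` (and `col.size()`) are virtual call sites. In HotSpot, a call site that has only seen one or two receiver classes is typically inlined based on the type profile (monomorphic or bimorphic). Once a third class reaches the same site, it usually becomes megamorphic and falls back to a virtual dispatch without inlining. The code only shows the shape of such a call site and measures nothing. Answering the question would take a benchmark harness such as JMH.

```java
// Hypothetical column hierarchy with three concrete implementations.
abstract class Col {
  abstract int getInt(int rowId);

  abstract int size();
}

final class OnHeapCol extends Col {
  private final int[] data;

  OnHeapCol(int[] data) {
    this.data = data;
  }

  @Override
  int getInt(int rowId) {
    return data[rowId];
  }

  @Override
  int size() {
    return data.length;
  }
}

final class ConstantCol extends Col {
  private final int value;
  private final int n;

  ConstantCol(int value, int n) {
    this.value = value;
    this.n = n;
  }

  @Override
  int getInt(int rowId) {
    return value;
  }

  @Override
  int size() {
    return n;
  }
}

final class RangeCol extends Col {
  private final int start;
  private final int n;

  RangeCol(int start, int n) {
    this.start = start;
    this.n = n;
  }

  @Override
  int getInt(int rowId) {
    return start + rowId;
  }

  @Override
  int size() {
    return n;
  }
}

final class Scan {
  // One shared call site: how many receiver classes reach col.getInt(...)
  // here decides whether the JIT can inline it or must dispatch virtually.
  static long sum(Col col) {
    long total = 0;
    for (int i = 0; i < col.size(); i++) {
      total += col.getInt(i);
    }
    return total;
  }
}
```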
    




[GitHub] spark issue #18680: [SPARK-21472][SQL] Introduce ArrowColumnVector as a read...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18680
  
    **[Test build #79793 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79793/testReport)** for PR 18680 at commit [`2d1dad9`](https://github.com/apache/spark/commit/2d1dad9ac6bc2cfa4a4dcad32ef99464bc7f6541).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.

