Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/02/10 22:59:18 UTC
[GitHub] [spark] sunchao commented on a change in pull request #35483: [SPARK-38179][SQL] Improve `WritableColumnVector` to better support null struct

sunchao commented on a change in pull request #35483:
URL: https://github.com/apache/spark/pull/35483#discussion_r804201836



##########
File path: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OffHeapColumnVector.java
##########
@@ -557,6 +579,9 @@ protected void reserveInternal(int newCapacity) {
           Platform.reallocateMemory(lengthData, oldCapacity * 4L, newCapacity * 4L);
       this.offsetData =
           Platform.reallocateMemory(offsetData, oldCapacity * 4L, newCapacity * 4L);
+    } else if (isStruct()) {
+      this.structOffsetData =
+        Platform.reallocateMemory(structOffsetData, oldCapacity * 4L, newCapacity * 4L);

Review comment:
       OK

##########
File path: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/WritableColumnVector.java
##########
@@ -374,6 +375,14 @@ public void putBooleans(int rowId, int count, byte src, int srcIndex) {
    */
   public abstract void putArray(int rowId, int offset, int length);
 
+  /**
+   * Puts a new non-null struct at 'rowId' of this vector, which is backed by elements at
+   * 'offset' of child vectors.
+   *
+   * NOTE: this MUST be called after new elements are appended to child vectors of a struct vector.
+   */
+  public abstract void putStruct(int rowId, int offset);

Review comment:
       Hmm, I'd prefer `putStruct`, similar to `putArray` above.

##########
File path: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/WritableColumnVector.java
##########
@@ -457,6 +466,7 @@ public final int appendNull() {
   }
 
   public final int appendNotNull() {
+    assert (!(dataType() instanceof StructType)); // Use appendStruct()

Review comment:
       As mentioned in the PR description, I think it's fine since `WritableColumnVector` is a Spark internal API. Also see some [previous discussion](https://github.com/apache/spark/pull/34659#discussion_r769131591).

##########
File path: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/WritableColumnVector.java
##########
@@ -703,13 +720,13 @@ public WritableColumnVector arrayData() {
 
   public abstract int getArrayOffset(int rowId);
 
-  @Override
-  public WritableColumnVector getChild(int ordinal) { return childColumns[ordinal]; }
-
   /**
-   * Returns the elements appended.
+   * Returns the offset of a struct element at 'rowId' in the child vectors of this.
    */
-  public final int getElementsAppended() { return elementsAppended; }

Review comment:
       Oops, I removed this by accident. It doesn't seem to be used anywhere, though.

##########
File path: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/WritableColumnVector.java
##########
@@ -374,6 +375,14 @@ public void putBooleans(int rowId, int count, byte src, int srcIndex) {
    */
   public abstract void putArray(int rowId, int offset, int length);
 
+  /**
+   * Puts a new non-null struct at 'rowId' of this vector, which is backed by elements at
+   * 'offset' of child vectors.

Review comment:
       Yea, for a non-null struct the offset is the same: struct[rowId] is made up of children[i][rowId] for i in [0, struct.len).
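
       To illustrate the offset idea, here is a minimal toy sketch. It is not Spark's `WritableColumnVector`: the class `ToyStructVector` and its methods `appendChild`, `getStructOffset` and `getChildInt` are hypothetical names made up for this example, and it assumes a single int child column in which null structs take up no slot. Under that assumption the offset equals `rowId` only while no null struct has been written before it.

```java
import java.util.Arrays;

// Hypothetical toy model of a struct vector with per-row offsets.
// Assumption: null structs do not occupy a slot in the child column.
final class ToyStructVector {
  private final boolean[] isNull;
  private final int[] structOffsets; // rowId -> index into the child column
  private int[] child;               // a single int child column
  private int childCount = 0;

  ToyStructVector(int capacity) {
    isNull = new boolean[capacity];
    structOffsets = new int[capacity];
    Arrays.fill(structOffsets, -1);  // -1 marks "no struct written yet"
    child = new int[capacity];
  }

  // Appends a value to the child column and returns its index.
  int appendChild(int value) {
    if (childCount == child.length) {
      child = Arrays.copyOf(child, Math.max(1, child.length * 2));
    }
    child[childCount] = value;
    return childCount++;
  }

  // Must be called after the child element at 'offset' has been appended.
  void putStruct(int rowId, int offset) {
    if (offset < 0 || offset >= childCount) {
      throw new IllegalArgumentException("child element not appended yet: " + offset);
    }
    isNull[rowId] = false;
    structOffsets[rowId] = offset;
  }

  void putNull(int rowId) { isNull[rowId] = true; }

  boolean isNullAt(int rowId) { return isNull[rowId]; }

  int getStructOffset(int rowId) { return structOffsets[rowId]; }

  // Reads the child value through the struct's offset, not through rowId.
  int getChildInt(int rowId) { return child[structOffsets[rowId]]; }

  public static void main(String[] args) {
    ToyStructVector v = new ToyStructVector(3);
    v.putStruct(0, v.appendChild(10)); // row 0: non-null struct
    v.putNull(1);                      // row 1: null struct, no child slot used
    v.putStruct(2, v.appendChild(30)); // row 2: non-null struct
    System.out.println(v.getStructOffset(2)); // prints 1, not 2
    System.out.println(v.getChildInt(2));     // prints 30
  }
}
```

       In this sketch, row 2 ends up at child offset 1 because the null struct at row 1 appended nothing to the child column. If all rows are non-null, `getStructOffset(rowId)` is just `rowId`.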




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


