Posted to dev@drill.apache.org by ppadma <gi...@git.apache.org> on 2018/02/21 18:02:57 UTC

[GitHub] drill pull request #1125: DRILL-6126: Allocate memory for value vectors upfr...

GitHub user ppadma opened a pull request:

    https://github.com/apache/drill/pull/1125

     DRILL-6126: Allocate memory for value vectors upfront in flatten operator

    Made changes to allocate memory upfront for the flatten operator based on sizing calculations.
    We need to allocate a single column (which can be nested) for a particular record count
    and allocation hints; refactored the code a bit to support that.
    Instead of assuming a worst-case fragmentation factor of 2, changed the logic to round the calculated number of rows down to the nearest power of two. This lets us pack value vectors more densely and helps with memory utilization.
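    A minimal sketch of the round-down step, assuming the row count comes from a memory budget divided by an estimated row width (the variable names are illustrative, not the actual code):

    ```
    int rows = memoryBudget / estimatedRowWidth;      // rows that fit in the memory budget
    int allocRows = Integer.highestOneBit(rows);      // round down to the nearest power of two
    // Power-of-two row counts let the power-of-two-sized vector buffers fill up densely,
    // instead of reserving a blanket worst-case fragmentation factor of 2.
    ```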
    RepeatedMapVector and RepeatedListVector extend RepeatedFixedWidthVectorLike. This is wrong and was causing problems in the allocation logic (more specifically, allocatePrecomputedChildCount in AllocationHelper). Fixed that.
    This PR has two commits: one for all of the above, and a second one for
    DRILL-6162: Enhance record batch sizer to retain nesting information.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ppadma/drill DRILL-6126

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/1125.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1125
    
----
commit 58c6b9ad584e56c71d982feaaa43ad32b5011eef
Author: Padma Penumarthy <pp...@...>
Date:   2018-02-21T17:33:12Z

    DRILL-6162: Enhance record batch sizer to retain nesting information for map columns.

commit f7c09131179b75d10ffe195785c9aef3b9c7ed97
Author: Padma Penumarthy <pp...@...>
Date:   2018-02-21T17:35:47Z

    DRILL-6126: Allocate memory for value vectors upfront in flatten operator

----


---

[GitHub] drill issue #1125: DRILL-6126: Allocate memory for value vectors upfront in ...

Posted by ppadma <gi...@git.apache.org>.
Github user ppadma commented on the issue:

    https://github.com/apache/drill/pull/1125
  
    @paul-rogers Thank you, Paul. I will be incorporating some of your suggestions in future PRs. This change will allow us to build more on top of it.


---

[GitHub] drill pull request #1125: DRILL-6126: Allocate memory for value vectors upfr...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1125#discussion_r169858105
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchSizer.java ---
    @@ -108,6 +108,12 @@
         public final float estElementCountPerArray;
         public final boolean isVariableWidth;
     
    +    public Map<String, ColumnSize> childColumnSizes = CaseInsensitiveMap.newHashMap();
    --- End diff --
    
    Nit: `children` might be shorter and simpler. It is clear that the children are sizes from the member type.


---

[GitHub] drill pull request #1125: DRILL-6126: Allocate memory for value vectors upfr...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1125#discussion_r169859837
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/VectorInitializer.java ---
    @@ -97,7 +97,7 @@ public void allocateBatch(VectorAccessible va, int recordCount) {
         }
       }
     
    -  private void allocateVector(ValueVector vector, String prefix, int recordCount) {
    +  public void allocateVector(ValueVector vector, String prefix, int recordCount) {
    --- End diff --
    
    Note the error in the member variable in this class:
    
    ```
       private Map<String, AllocationHint> hints = new HashMap<>();
    ```
    
    Should be a case insensitive map.
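    Presumably along the lines of the `CaseInsensitiveMap` usage elsewhere in this PR:

    ```
    private Map<String, AllocationHint> hints = CaseInsensitiveMap.newHashMap();
    ```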


---

[GitHub] drill pull request #1125: DRILL-6126: Allocate memory for value vectors upfr...

Posted by ppadma <gi...@git.apache.org>.
Github user ppadma commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1125#discussion_r171999276
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchSizer.java ---
    @@ -245,16 +251,30 @@ private void buildVectorInitializer(VectorInitializer initializer) {
           else if (width > 0) {
             initializer.variableWidth(name, width);
           }
    +
    +      for (ColumnSize columnSize : childColumnSizes.values()) {
    +        columnSize.buildVectorInitializer(initializer);
    +      }
         }
    +
       }
     
       public static ColumnSize getColumn(ValueVector v, String prefix) {
         return new ColumnSize(v, prefix);
       }
     
    +  public ColumnSize getColumn(String name) {
    +    return allColumnSizes.get(name);
    +  }
    +
       public static final int MAX_VECTOR_SIZE = ValueVector.MAX_BUFFER_SIZE; // 16 MiB
     
    -  private Map<String, ColumnSize> columnSizes = CaseInsensitiveMap.newHashMap();
    +  // This keeps information for all columns i.e. all top columns and nested columns underneath
    +  private Map<String, ColumnSize> allColumnSizes = CaseInsensitiveMap.newHashMap();
    --- End diff --
    
    Yes, I got rid of allColumnSizes. We will have only top-level columns.


---

[GitHub] drill issue #1125: DRILL-6126: Allocate memory for value vectors upfront in ...

Posted by ppadma <gi...@git.apache.org>.
Github user ppadma commented on the issue:

    https://github.com/apache/drill/pull/1125
  
    @paul-rogers Paul, will you be able to review this?


---

[GitHub] drill pull request #1125: DRILL-6126: Allocate memory for value vectors upfr...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1125#discussion_r172105104
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchSizer.java ---
    @@ -76,110 +82,327 @@
          * greater than (but unlikely) same as the row count.
          */
     
    -    public final int valueCount;
    +    private final int valueCount;
     
         /**
    -     * Total number of elements for a repeated type, or 1 if this is
    -     * a non-repeated type. That is, a batch of 100 rows may have an
    -     * array with 10 elements per row. In this case, the element count
    -     * is 1000.
    +     * Total number of elements for a repeated type, or same as
    +     * valueCount if this is a non-repeated type. That is, a batch
    +     * of 100 rows may have an array with 10 elements per row.
    +     * In this case, the element count is 1000.
          */
     
    -    public final int elementCount;
    +    private int elementCount;
     
         /**
    -     * Size of the top level value vector. For map and repeated list,
    -     * this is just size of offset vector.
    +     * The estimated, average number of elements per parent value.
    +     * Always 1 for a non-repeated type. For a repeated type,
    +     * this is the average entries per array (per repeated element).
          */
    -    public int dataSize;
    +
    +    private float estElementCountPerArray;
     
         /**
    -     * Total size of the column includes the sum total of memory for all
    -     * value vectors representing the column.
    +     * Indicates if it is variable width column.
    +     * For map columns, this is true if any of the children is variable
    +     * width column.
          */
    -    public int netSize;
    +
    +    private boolean isVariableWidth;
     
         /**
    -     * The estimated, average number of elements per parent value.
    -     * Always 1 for a non-repeated type. For a repeated type,
    -     * this is the average entries per array (per repeated element).
    +     * Indicates if cardinality is repeated(top level only).
    +     */
    +
    +    private boolean isRepeated;
    +
    +    /**
    +     * Indicates if cardinality is optional i.e. nullable(top level only).
    +     */
    +    private boolean isOptional;
    +
    +    /**
    +     * Child columns if this is a map column.
    +     */
    +    private Map<String, ColumnSize> children = CaseInsensitiveMap.newHashMap();
    +
    +    /**
    +     * std pure data size per entry from Drill metadata, based on type.
    +     * Does not include metadata vector overhead we add for cardinality,
    +     * variable length etc.
    +     * For variable-width columns, we use 50 as std size for entry width.
    +     * For repeated column, we assume repetition of 10.
    +     */
    +    public int getStdDataSizePerEntry() {
    +      int stdDataSize;
    +
    +      try {
    +        stdDataSize = TypeHelper.getSize(metadata.getType());
    +
    +        // For variable width, typeHelper includes offset vector width. Adjust for that.
    +        if (isVariableWidth) {
    +          stdDataSize -= OFFSET_VECTOR_WIDTH;
    +        }
    +
    +        if (isRepeated) {
    +          stdDataSize = stdDataSize * STD_REPETITION_FACTOR;
    +        }
    +      } catch (Exception e) {
    +        // For unsupported types, just set stdSize to 0.
    +        // Map, Union, List etc.
    +        stdDataSize = 0;
    +      }
    +
    +      // Add sizes of children.
    +      for (ColumnSize columnSize : children.values()) {
    +        stdDataSize += columnSize.getStdDataSizePerEntry();
    +      }
    +
    +      if (isRepeatedList()) {
    +        stdDataSize = stdDataSize * STD_REPETITION_FACTOR;
    +      }
    +
    +      return stdDataSize;
    +    }
    +
    +    /**
    +     * std net size per entry taking into account additional metadata vectors
    +     * we add on top for variable length, cardinality etc.
    +     * For variable-width columns, we use 50 as std data size for entry width.
    +     * For repeated column, we assume repetition of 10.
    +     */
    +    public int getStdNetSizePerEntry() {
    +      int stdNetSize;
    +      try {
    +        stdNetSize = TypeHelper.getSize(metadata.getType());
    +      } catch (Exception e) {
    +        stdNetSize = 0;
    +      }
    +
    +      if (isOptional) {
    +        stdNetSize += BIT_VECTOR_WIDTH;
    +      }
    +
    +      if (isRepeated) {
    +        stdNetSize = (stdNetSize * STD_REPETITION_FACTOR) + OFFSET_VECTOR_WIDTH;
    +      }
    +
    +      for (ColumnSize columnSize : children.values()) {
    +        stdNetSize += columnSize.getStdNetSizePerEntry();
    +      }
    +
    +      if (isRepeatedList()) {
    +        stdNetSize = (stdNetSize * STD_REPETITION_FACTOR) + OFFSET_VECTOR_WIDTH;
    +      }
    +
    +      return stdNetSize;
    +    }
    +
    +    /**
    +     * This is the average actual per entry data size in bytes. Does not
    +     * include any overhead of metadata vectors.
    +     * For repeated columns, it is average for the repeated array, not
    +     * individual entry in the array.
    +     */
    +    public int getDataSizePerEntry() {
    +      return safeDivide(getTotalDataSize(), getValueCount());
    +    }
    +
    +    /**
    +     * This is the average per entry size of just pure data plus
    +     * overhead of additional vectors we add on top like bits vector,
    +     * offset vector etc. This
    +     * size is larger than the actual data size since this size includes per-
    +     * column overhead for additional vectors we add for
    +     * cardinality, variable length etc.
    +     */
    +    public int getNetSizePerEntry() {
    +      return safeDivide(getTotalNetSize(), getValueCount());
    +    }
    +
    +    /**
    +     * This is the total data size for the column, including children for map
    +     * columns. Does not include any overhead of metadata vectors.
    +     */
    +    public int getTotalDataSize() {
    +      int dataSize = this.totalDataSize;
    +      for (ColumnSize columnSize : children.values()) {
    +        dataSize += columnSize.getTotalDataSize();
    +      }
    +      return dataSize;
    +    }
    +
    +    /**
    +     * This is the total net size for the column, including children for map
    +     * columns. Includes overhead of metadata vectors.
          */
    +    public int getTotalNetSize() {
    +      return this.totalNetSize;
    +    }
    +
    +    public int getValueCount() {
    +      return valueCount;
    +    }
     
    -    public final float estElementCountPerArray;
    -    public final boolean isVariableWidth;
    +    public int getElementCount() {
    +      return elementCount;
    +    }
    +
    +    public float getEstElementCountPerArray() {
    +      return estElementCountPerArray;
    +    }
     
    -    public Map<String, ColumnSize> children = CaseInsensitiveMap.newHashMap();
    +    public boolean isVariableWidth() {
    +      return isVariableWidth;
    +    }
     
         public Map<String, ColumnSize> getChildren() {
           return children;
         }
     
    +    public boolean isComplex() {
    +      if (metadata.getType().getMinorType() == MinorType.MAP ||
    +        metadata.getType().getMinorType() == MinorType.UNION ||
    +        metadata.getType().getMinorType() == MinorType.LIST) {
    +        return true;
    +      }
    +      return false;
    +    }
    +
    +    public boolean isRepeatedList() {
    +      if (metadata.getType().getMinorType() == MinorType.LIST &&
    +        metadata.getDataMode() == DataMode.REPEATED) {
    +        return true;
    +      }
    +      return false;
    +    }
    +
    +    /**
    +     * This is the average per entry width, used for vector allocation.
    +     */
    +    public int getEntryWidth() {
    +      int width = 0;
    +      if (isVariableWidth) {
    +        width = getNetSizePerEntry() - OFFSET_VECTOR_WIDTH;
    +
    +        // Subtract out the bits (is-set) vector width
    +        if (metadata.getDataMode() == DataMode.OPTIONAL) {
    +          width -= BIT_VECTOR_WIDTH;
    +        }
    +      }
    +
    +      return (safeDivide(width, estElementCountPerArray));
    +    }
    +
         public ColumnSize(ValueVector v, String prefix) {
           this.prefix = prefix;
           valueCount = v.getAccessor().getValueCount();
           metadata = v.getField();
    -      isVariableWidth = v instanceof VariableWidthVector;
    -
    -      // The amount of memory consumed by the payload: the actual
    -      // data stored in the vectors.
    -
    -      if (v.getField().getDataMode() == DataMode.REPEATED) {
    -        elementCount = buildRepeated(v);
    -        estElementCountPerArray = valueCount == 0 ? 0 : elementCount * 1.0f / valueCount;
    -      } else {
    -        elementCount = 1;
    -        estElementCountPerArray = 1;
    +      isVariableWidth = (v instanceof VariableWidthVector || v instanceof RepeatedVariableWidthVectorLike);
    +      elementCount = valueCount;
    +      estElementCountPerArray = 1;
    +      totalNetSize = v.getPayloadByteCount(valueCount);
    +
    +      // Special case. For union and list vectors, it is very complex
    +      // to figure out raw data size. Make it same as net size.
    +      if (metadata.getType().getMinorType() == MinorType.UNION ||
    +        (metadata.getType().getMinorType() == MinorType.LIST && v.getField().getDataMode() != DataMode.REPEATED)) {
    +        totalDataSize = totalNetSize;
           }
    -      switch (metadata.getType().getMinorType()) {
    -      case LIST:
    -        buildList(v);
    -        break;
    -      case MAP:
    -      case UNION:
    -        // No standard size for Union type
    -        dataSize = v.getPayloadByteCount(valueCount);
    -        break;
    -      default:
    -        dataSize = v.getPayloadByteCount(valueCount);
    -        try {
    -          stdSize = TypeHelper.getSize(metadata.getType()) * elementCount;
    -        } catch (Exception e) {
    -          // For unsupported types, just set stdSize to 0.
    -          stdSize = 0;
    -        }
    +
    +      switch(v.getField().getDataMode()) {
    --- End diff --
    
    Once code gets this complex, it may be time to create subclasses for the three modes, plus subclasses for the categories of types (variable-width, map, etc.).
    
    The column size would be a composition of the type size class plus a cardinality size class.
    
    Maybe overkill for the current round of changes, but something to keep in mind if this code continues to grow in complexity.
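    A rough sketch of the kind of decomposition this suggests; the class names are hypothetical, not anything in this PR:

    ```
    // Hypothetical: one size strategy per cardinality (data mode), one per type category,
    // with the column size composed from the two.
    abstract class CardinalitySize {              // REQUIRED, OPTIONAL, REPEATED
      abstract int overheadPerEntry();            // bits vector, offset vector, etc.
    }

    abstract class TypeSize {                     // fixed-width, variable-width, map, ...
      abstract int dataSizePerEntry();
    }

    class ComposedColumnSize {
      private final TypeSize typeSize;
      private final CardinalitySize cardinality;

      ComposedColumnSize(TypeSize typeSize, CardinalitySize cardinality) {
        this.typeSize = typeSize;
        this.cardinality = cardinality;
      }

      int netSizePerEntry() {
        return typeSize.dataSizePerEntry() + cardinality.overheadPerEntry();
      }
    }
    ```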


---

[GitHub] drill pull request #1125: DRILL-6126: Allocate memory for value vectors upfr...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1125#discussion_r169860846
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/flatten/FlattenRecordBatch.java ---
    @@ -266,12 +270,18 @@ public void addComplexWriter(ComplexWriter writer) {
         complexWriters.add(writer);
       }
     
    -  private boolean doAlloc() {
    -    //Allocate vv in the allocationVectors.
    +  private boolean doAlloc(int recordCount) {
    +
    +    //Allocate v in the allocationVectors.
         for (ValueVector v : this.allocationVectors) {
    -      if (!v.allocateNewSafe()) {
    -        return false;
    -      }
    +      // build vector initializer for the column.
    +      // This will iteratively include all nested columns underneath.
    +      RecordBatchSizer.ColumnSize colSize = flattenMemoryManager.getColumnSize(v.getField().getName());
    +      VectorInitializer initializer = new VectorInitializer();
    +      colSize.buildVectorInitializer(initializer);
    +      // Allocate memory for the vector. If it is map, it will allocate memory
    +      // for all nested child columns as well.
    +      initializer.allocateVector(v, "", recordCount);
    --- End diff --
    
    While this code can be made to work, it does introduce a more complex path than was intended. The idea is that a `VectorInitializer` is a recursive structure. The top-level one holds hints for the row. Nested instances can hold data for maps.
    
    Because this class was meant to be temporary, it holds "hints": the information is used if available; defaults are used if the hints are not.
    
    So, a better approach would be to assemble a `VectorInitializer` for the output row in one step, then apply it to the entire row in another step.
    
    Further, if we do that, we don't have to create initializer objects for every vector allocation; we can reuse the same set as long as the output row sizes don't change.
    
    Finally, the code will be easier to reason about since we won't have two distinct paths.
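    Roughly, using `buildVectorInitializer()` on the sizer and `allocateBatch()` on the initializer (a sketch; `batchSizer` is an assumed reference to the RecordBatchSizer for the output batch):

    ```
    // Build the initializer once for the whole output row (maps included),
    // then apply it to the entire output container in a single step.
    VectorInitializer initializer = batchSizer.buildVectorInitializer();
    initializer.allocateBatch(container, recordCount);
    ```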


---

[GitHub] drill pull request #1125: DRILL-6126: Allocate memory for value vectors upfr...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1125#discussion_r172105268
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchSizer.java ---
    @@ -525,4 +763,11 @@ public VectorInitializer buildVectorInitializer() {
         }
         return initializer;
       }
    +
    +  public void allocateVectors(VectorContainer container, int recordCount) {
    +    for (VectorWrapper w : container) {
    +      ColumnSize colSize = columnSizes.get(w.getField().getName());
    +      colSize.allocateVector(w.getValueVector(), recordCount);
    +    }
    +  }
    --- End diff --
    
    Very nice clean-up and enhancements. Now I don't feel so bad about this stuff being quick & dirty...


---

[GitHub] drill pull request #1125: DRILL-6126: Allocate memory for value vectors upfr...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1125#discussion_r172104027
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchSizer.java ---
    @@ -36,13 +39,17 @@
     import org.apache.drill.exec.vector.VariableWidthVector;
     
     import com.google.common.collect.Sets;
    +import org.apache.drill.exec.vector.complex.RepeatedVariableWidthVectorLike;
     
     /**
      * Given a record batch or vector container, determines the actual memory
      * consumed by each column, the average row, and the entire record batch.
      */
     
     public class RecordBatchSizer {
    +  private static final int OFFSET_VECTOR_WIDTH = 4;
    --- End diff --
    
    Perhaps use `UInt4Vector.VALUE_WIDTH` to initialize this constant. Use `UInt1Vector.VALUE_WIDTH` for the bit vector width.
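    That is, presumably:

    ```
    private static final int OFFSET_VECTOR_WIDTH = UInt4Vector.VALUE_WIDTH; // 4-byte offset entries
    private static final int BIT_VECTOR_WIDTH = UInt1Vector.VALUE_WIDTH;    // 1-byte "bits" (is-set) entries
    ```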


---

[GitHub] drill pull request #1125: DRILL-6126: Allocate memory for value vectors upfr...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1125#discussion_r172105435
  
    --- Diff: exec/vector/src/main/java/org/apache/drill/exec/vector/complex/RepeatedMapVector.java ---
    @@ -608,4 +608,22 @@ public void collectLedgers(Set<BufferLedger> ledgers) {
       public void toNullable(ValueVector nullableVector) {
         throw new UnsupportedOperationException();
       }
    +
    +  @Override
    +  public int getPayloadByteCount(int valueCount) {
    +    if (valueCount == 0) {
    +      return 0;
    +    }
    +
    +    int count = 0;
    +
    +    int entryCount = offsets.getAccessor().get(valueCount);
    +    count += offsets.getPayloadByteCount(valueCount);
    --- End diff --
    
    ```
    int count = offsets.get...
    ```
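    Presumably the full suggestion is to fold the first addition into the declaration:

    ```
    int count = offsets.getPayloadByteCount(valueCount);
    int entryCount = offsets.getAccessor().get(valueCount);
    ```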


---

[GitHub] drill pull request #1125: DRILL-6126: Allocate memory for value vectors upfr...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1125#discussion_r172103839
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java ---
    @@ -492,8 +504,12 @@ private void allocateBatch(boolean newSchema) {
         } else {
           container.zeroVectors();
         }
    +
    +    // Allocate memory for the vectors.
    +    // This will iteratively allocate memory for all nested columns underneath.
         for (VectorWrapper w : container) {
    -      AllocationHelper.allocateNew(w.getValueVector(), Character.MAX_VALUE);
    +      RecordBatchSizer.ColumnSize colSize = mergeJoinMemoryManager.getColumnSize(w.getField().getName());
    +      colSize.allocateVector(w.getValueVector(), mergeJoinMemoryManager.getOutputRowCount());
    --- End diff --
    
    Nice! Much simpler. Just one nit: might as well get the row count once and store it in a local variable to avoid getting the same number repeatedly.
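    Something like the following, based on the quoted loop:

    ```
    // Fetch the output row count once instead of per column.
    int outputRowCount = mergeJoinMemoryManager.getOutputRowCount();
    for (VectorWrapper w : container) {
      RecordBatchSizer.ColumnSize colSize = mergeJoinMemoryManager.getColumnSize(w.getField().getName());
      colSize.allocateVector(w.getValueVector(), outputRowCount);
    }
    ```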


---

[GitHub] drill pull request #1125: DRILL-6126: Allocate memory for value vectors upfr...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1125#discussion_r169857960
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchSizer.java ---
    @@ -418,11 +438,13 @@ private void measureColumn(ValueVector v, String prefix) {
         netRowWidthCap50 += ! colSize.isVariableWidth ? colSize.estSize :
             8 /* offset vector */ + roundUpToPowerOf2(Math.min(colSize.estSize,50));
             // above change 8 to 4 after DRILL-5446 is fixed
    +
    +    return colSize;
       }
     
    -  private void expandMap(AbstractMapVector mapVector, String prefix) {
    +  private void expandMap(ColumnSize colSize, AbstractMapVector mapVector, String prefix) {
         for (ValueVector vector : mapVector) {
    -      measureColumn(vector, prefix);
    +      colSize.childColumnSizes.put(prefix + vector.getField().getName(), measureColumn(vector, prefix));
    --- End diff --
    
    This is subject to aliasing. Suppose I have two maps:
    
    ```
    aa(b)
    a(ab)
    ```
    When I add the child vectors, both will produce a combined name of `aab`.
    
    We can't use dots in names for the same reason:
    
    ```
    a.b(c)
    a(b.c)
    ```
    
    Both will produce `a.b.c`.
    
    In the new "result set loader" code, all places that handle trees of columns use actual trees of maps.
    
    A crude-but-effective solution is to use a non-legal name character. The only valid one is the back-tick, since we use that in SQL to quote names. If we do that, we now have
    
    ```
    aa`b
    a`ab
    a.b`c
    a`b.c
    ```
    
    And the names are now un-aliased.
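    In code, the key construction would presumably become something like this (illustrative; `parentName` is a placeholder):

    ```
    // The back-tick can never appear in a legal column name, so "aa`b" and "a`ab" stay distinct.
    String key = parentName + "`" + vector.getField().getName();
    ```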


---

[GitHub] drill pull request #1125: DRILL-6126: Allocate memory for value vectors upfr...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1125#discussion_r172104808
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchSizer.java ---
    @@ -76,110 +82,327 @@
          * greater than (but unlikely) same as the row count.
          */
     
    -    public final int valueCount;
    +    private final int valueCount;
     
         /**
    -     * Total number of elements for a repeated type, or 1 if this is
    -     * a non-repeated type. That is, a batch of 100 rows may have an
    -     * array with 10 elements per row. In this case, the element count
    -     * is 1000.
    +     * Total number of elements for a repeated type, or same as
    +     * valueCount if this is a non-repeated type. That is, a batch
    +     * of 100 rows may have an array with 10 elements per row.
    +     * In this case, the element count is 1000.
          */
     
    -    public final int elementCount;
    +    private int elementCount;
     
         /**
    -     * Size of the top level value vector. For map and repeated list,
    -     * this is just size of offset vector.
    +     * The estimated, average number of elements per parent value.
    +     * Always 1 for a non-repeated type. For a repeated type,
    +     * this is the average entries per array (per repeated element).
          */
    -    public int dataSize;
    +
    +    private float estElementCountPerArray;
     
         /**
    -     * Total size of the column includes the sum total of memory for all
    -     * value vectors representing the column.
    +     * Indicates if it is variable width column.
    +     * For map columns, this is true if any of the children is variable
    +     * width column.
          */
    -    public int netSize;
    +
    +    private boolean isVariableWidth;
     
         /**
    -     * The estimated, average number of elements per parent value.
    -     * Always 1 for a non-repeated type. For a repeated type,
    -     * this is the average entries per array (per repeated element).
    +     * Indicates if cardinality is repeated(top level only).
    +     */
    +
    +    private boolean isRepeated;
    +
    +    /**
    +     * Indicates if cardinality is optional i.e. nullable(top level only).
    +     */
    +    private boolean isOptional;
    +
    +    /**
    +     * Child columns if this is a map column.
    +     */
    +    private Map<String, ColumnSize> children = CaseInsensitiveMap.newHashMap();
    +
    +    /**
    +     * std pure data size per entry from Drill metadata, based on type.
    +     * Does not include metadata vector overhead we add for cardinality,
    +     * variable length etc.
    +     * For variable-width columns, we use 50 as std size for entry width.
    +     * For repeated column, we assume repetition of 10.
    +     */
    +    public int getStdDataSizePerEntry() {
    +      int stdDataSize;
    +
    +      try {
    +        stdDataSize = TypeHelper.getSize(metadata.getType());
    +
    +        // For variable width, typeHelper includes offset vector width. Adjust for that.
    +        if (isVariableWidth) {
    +          stdDataSize -= OFFSET_VECTOR_WIDTH;
    +        }
    +
    +        if (isRepeated) {
    +          stdDataSize = stdDataSize * STD_REPETITION_FACTOR;
    +        }
    +      } catch (Exception e) {
    +        // For unsupported types, just set stdSize to 0.
    +        // Map, Union, List etc.
    +        stdDataSize = 0;
    +      }
    +
    +      // Add sizes of children.
    +      for (ColumnSize columnSize : children.values()) {
    +        stdDataSize += columnSize.getStdDataSizePerEntry();
    +      }
    +
    +      if (isRepeatedList()) {
    +        stdDataSize = stdDataSize * STD_REPETITION_FACTOR;
    +      }
    +
    +      return stdDataSize;
    +    }
    +
    +    /**
    +     * std net size per entry taking into account additional metadata vectors
    +     * we add on top for variable length, cardinality etc.
    +     * For variable-width columns, we use 50 as std data size for entry width.
    +     * For repeated column, we assume repetition of 10.
    +     */
    +    public int getStdNetSizePerEntry() {
    +      int stdNetSize;
    +      try {
    +        stdNetSize = TypeHelper.getSize(metadata.getType());
    +      } catch (Exception e) {
    +        stdNetSize = 0;
    +      }
    +
    +      if (isOptional) {
    +        stdNetSize += BIT_VECTOR_WIDTH;
    +      }
    +
    +      if (isRepeated) {
    +        stdNetSize = (stdNetSize * STD_REPETITION_FACTOR) + OFFSET_VECTOR_WIDTH;
    +      }
    +
    +      for (ColumnSize columnSize : children.values()) {
    +        stdNetSize += columnSize.getStdNetSizePerEntry();
    +      }
    +
    +      if (isRepeatedList()) {
    +        stdNetSize = (stdNetSize * STD_REPETITION_FACTOR) + OFFSET_VECTOR_WIDTH;
    +      }
    +
    +      return stdNetSize;
    +    }
    +
    +    /**
    +     * This is the average actual per entry data size in bytes. Does not
    +     * include any overhead of metadata vectors.
    +     * For repeated columns, it is average for the repeated array, not
    +     * individual entry in the array.
    +     */
    +    public int getDataSizePerEntry() {
    +      return safeDivide(getTotalDataSize(), getValueCount());
    +    }
    +
    +    /**
    +     * This is the average per entry size of just pure data plus
    +     * overhead of additional vectors we add on top like bits vector,
    +     * offset vector etc. This
    +     * size is larger than the actual data size since this size includes per-
    +     * column overhead for additional vectors we add for
    +     * cardinality, variable length etc.
    +     */
    +    public int getNetSizePerEntry() {
    +      return safeDivide(getTotalNetSize(), getValueCount());
    +    }
    +
    +    /**
    +     * This is the total data size for the column, including children for map
    +     * columns. Does not include any overhead of metadata vectors.
    +     */
    +    public int getTotalDataSize() {
    +      int dataSize = this.totalDataSize;
    +      for (ColumnSize columnSize : children.values()) {
    +        dataSize += columnSize.getTotalDataSize();
    +      }
    +      return dataSize;
    +    }
    +
    +    /**
    +     * This is the total net size for the column, including children for map
    +     * columns. Includes overhead of metadata vectors.
          */
    +    public int getTotalNetSize() {
    +      return this.totalNetSize;
    +    }
    +
    +    public int getValueCount() {
    +      return valueCount;
    +    }
     
    -    public final float estElementCountPerArray;
    -    public final boolean isVariableWidth;
    +    public int getElementCount() {
    +      return elementCount;
    +    }
    +
    +    public float getEstElementCountPerArray() {
    +      return estElementCountPerArray;
    +    }
     
    -    public Map<String, ColumnSize> children = CaseInsensitiveMap.newHashMap();
    +    public boolean isVariableWidth() {
    +      return isVariableWidth;
    +    }
     
         public Map<String, ColumnSize> getChildren() {
           return children;
         }
     
    +    public boolean isComplex() {
    +      if (metadata.getType().getMinorType() == MinorType.MAP ||
    +        metadata.getType().getMinorType() == MinorType.UNION ||
    +        metadata.getType().getMinorType() == MinorType.LIST) {
    +        return true;
    +      }
    +      return false;
    --- End diff --
    
    Nit, but
    ```
    return metadata.getType().getMinorType() == MinorType.MAP ||
           metadata.getType().getMinorType() == MinorType.UNION ||
           metadata.getType().getMinorType() == MinorType.LIST;
    ```
    
    And below.
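    The same simplification presumably applies to `isRepeatedList()` below:

    ```
    return metadata.getType().getMinorType() == MinorType.LIST &&
           metadata.getDataMode() == DataMode.REPEATED;
    ```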


---

[GitHub] drill pull request #1125: DRILL-6126: Allocate memory for value vectors upfr...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1125#discussion_r172104395
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchSizer.java ---
    @@ -76,110 +82,327 @@
          * greater than (but unlikely) same as the row count.
          */
     
    -    public final int valueCount;
    +    private final int valueCount;
     
         /**
    -     * Total number of elements for a repeated type, or 1 if this is
    -     * a non-repeated type. That is, a batch of 100 rows may have an
    -     * array with 10 elements per row. In this case, the element count
    -     * is 1000.
    +     * Total number of elements for a repeated type, or same as
    +     * valueCount if this is a non-repeated type. That is, a batch
    +     * of 100 rows may have an array with 10 elements per row.
    +     * In this case, the element count is 1000.
          */
     
    -    public final int elementCount;
    +    private int elementCount;
     
         /**
    -     * Size of the top level value vector. For map and repeated list,
    -     * this is just size of offset vector.
    +     * The estimated, average number of elements per parent value.
    +     * Always 1 for a non-repeated type. For a repeated type,
    +     * this is the average entries per array (per repeated element).
          */
    -    public int dataSize;
    +
    +    private float estElementCountPerArray;
    --- End diff --
    
    Perhaps `estCardinality`? I've been using the term "cardinality" in new code to refer to array sizes.


---

[GitHub] drill pull request #1125: DRILL-6126: Allocate memory for value vectors upfr...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1125#discussion_r172105917
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchSizer.java ---
    @@ -199,12 +422,18 @@ public String toString() {
                .append(", per-array: ")
                .append(estElementCountPerArray);
           }
    -      buf .append(", std size: ")
    -          .append(stdSize)
    -          .append(", actual size: ")
    -          .append(estSize)
    -          .append(", data size: ")
    -          .append(dataSize)
    +      buf .append(", std size per entry: ")
    +          .append(getStdDataSizePerEntry())
    +          .append(", std net size per entry: ")
    +          .append(getStdNetSizePerEntry())
    +          .append(", data size per entry: ")
    +          .append(getDataSizePerEntry())
    +          .append(", net size per entry: ")
    +          .append(getNetSizePerEntry())
    +          .append(", totalDataSize: ")
    +          .append(getTotalDataSize())
    +          .append(", totalNetSize: ")
    +          .append(getTotalNetSize())
    --- End diff --
    
    Extra info is always good. This info is printed by the sort operator when debug logging is enabled. (It was invaluable when investigating issues found by QA tests.) Suggestion: find a way to compact the labels. Maybe:
    ```
    Per entry: std size: xx, std net: xxx; Totals: data size: xxx, net size: xxx
    ```
    
    Also, `totalDataSize` and `totalNetSize` should have spaces: they are labels, not variable names here.
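    For example, the quoted StringBuilder chain could be compacted roughly like this (a sketch using the getters from this diff):

    ```
    buf.append("Per entry: std size: ").append(getStdDataSizePerEntry())
       .append(", std net: ").append(getStdNetSizePerEntry())
       .append(", data: ").append(getDataSizePerEntry())
       .append(", net: ").append(getNetSizePerEntry())
       .append("; Totals: data size: ").append(getTotalDataSize())
       .append(", net size: ").append(getTotalNetSize());
    ```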


---

[GitHub] drill pull request #1125: DRILL-6126: Allocate memory for value vectors upfr...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1125#discussion_r172104598
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchSizer.java ---
    @@ -76,110 +82,327 @@
          * greater than (but unlikely) same as the row count.
          */
     
    -    public final int valueCount;
    +    private final int valueCount;
     
         /**
    -     * Total number of elements for a repeated type, or 1 if this is
    -     * a non-repeated type. That is, a batch of 100 rows may have an
    -     * array with 10 elements per row. In this case, the element count
    -     * is 1000.
    +     * Total number of elements for a repeated type, or same as
    +     * valueCount if this is a non-repeated type. That is, a batch
    +     * of 100 rows may have an array with 10 elements per row.
    +     * In this case, the element count is 1000.
          */
     
    -    public final int elementCount;
    +    private int elementCount;
     
         /**
    -     * Size of the top level value vector. For map and repeated list,
    -     * this is just size of offset vector.
    +     * The estimated, average number of elements per parent value.
    +     * Always 1 for a non-repeated type. For a repeated type,
    +     * this is the average entries per array (per repeated element).
          */
    -    public int dataSize;
    +
    +    private float estElementCountPerArray;
     
         /**
    -     * Total size of the column includes the sum total of memory for all
    -     * value vectors representing the column.
    +     * Indicates if it is variable width column.
    +     * For map columns, this is true if any of the children is variable
    +     * width column.
          */
    -    public int netSize;
    +
    +    private boolean isVariableWidth;
     
         /**
    -     * The estimated, average number of elements per parent value.
    -     * Always 1 for a non-repeated type. For a repeated type,
    -     * this is the average entries per array (per repeated element).
    +     * Indicates if cardinality is repeated(top level only).
    +     */
    +
    +    private boolean isRepeated;
    --- End diff --
    
    Might be fun to check out the new metadata classes added for the result set loader. They parse the `MajorType` to pull out this kind of information. You could embed an instance of the `ColumnMetadata` class here to provide this detailed information.
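    Even if embedding `ColumnMetadata` is deferred, the cardinality flags can be derived from the `MajorType` rather than cached, which is essentially what those metadata classes do (a sketch using only calls visible in this diff):

    ```
    boolean isRepeated = metadata.getDataMode() == DataMode.REPEATED;
    boolean isOptional = metadata.getDataMode() == DataMode.OPTIONAL;
    ```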


---

[GitHub] drill pull request #1125: DRILL-6126: Allocate memory for value vectors upfr...

Posted by ppadma <gi...@git.apache.org>.
Github user ppadma commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1125#discussion_r171999148
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchSizer.java ---
    @@ -418,11 +438,13 @@ private void measureColumn(ValueVector v, String prefix) {
         netRowWidthCap50 += ! colSize.isVariableWidth ? colSize.estSize :
             8 /* offset vector */ + roundUpToPowerOf2(Math.min(colSize.estSize,50));
             // above change 8 to 4 after DRILL-5446 is fixed
    +
    +    return colSize;
       }
     
    -  private void expandMap(AbstractMapVector mapVector, String prefix) {
    +  private void expandMap(ColumnSize colSize, AbstractMapVector mapVector, String prefix) {
         for (ValueVector vector : mapVector) {
    -      measureColumn(vector, prefix);
    +      colSize.childColumnSizes.put(prefix + vector.getField().getName(), measureColumn(vector, prefix));
    --- End diff --
    
    I made the change to maintain trees of columns for trees of maps. Anyone who wants to access the lower-level columns gets the top-level column and walks through the tree.
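    A hedged sketch of that access pattern, combining getColumnSize() from the flatten diff with getChildren() from this one ("myMap" and "nestedCol" are placeholder names):

    ```
    RecordBatchSizer.ColumnSize top = flattenMemoryManager.getColumnSize("myMap"); // top-level column only
    RecordBatchSizer.ColumnSize nested = top.getChildren().get("nestedCol");       // walk down one level per map
    ```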



---

[GitHub] drill issue #1125: DRILL-6126: Allocate memory for value vectors upfront in ...

Posted by ppadma <gi...@git.apache.org>.
Github user ppadma commented on the issue:

    https://github.com/apache/drill/pull/1125
  
    @paul-rogers Updated the PR with the latest changes. I have decided not to use the vector initializer for allocation, since it is subject to the aliasing issues you mentioned. Instead, I added an allocateVector method to ColumnSize in the batch sizer, which uses the internal sizing information it already has to allocate memory (including all of its children) for a particular record count.
    Refactored the batch sizer code and added unit tests verifying sizing and allocation for the different vector types.
    Please take a look when you get a chance and let me know what you think.



---

[GitHub] drill issue #1125: DRILL-6126: Allocate memory for value vectors upfront in ...

Posted by ppadma <gi...@git.apache.org>.
Github user ppadma commented on the issue:

    https://github.com/apache/drill/pull/1125
  
    @paul-rogers Paul, thanks a lot for your review comments and for bringing up some good issues. Just to let you know, I am working on refactoring the batch sizer code and writing a bunch of unit tests to cover sizing and vector allocation for all the different vector types. I found some bugs in the process and fixed them. I will post the new changes soon and will need your review once they are ready.


---

[GitHub] drill pull request #1125: DRILL-6126: Allocate memory for value vectors upfr...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1125#discussion_r169859674
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchSizer.java ---
    @@ -245,16 +251,30 @@ private void buildVectorInitializer(VectorInitializer initializer) {
           else if (width > 0) {
             initializer.variableWidth(name, width);
           }
    +
    +      for (ColumnSize columnSize : childColumnSizes.values()) {
    +        columnSize.buildVectorInitializer(initializer);
    +      }
         }
    +
       }
     
       public static ColumnSize getColumn(ValueVector v, String prefix) {
         return new ColumnSize(v, prefix);
       }
     
    +  public ColumnSize getColumn(String name) {
    +    return allColumnSizes.get(name);
    +  }
    +
       public static final int MAX_VECTOR_SIZE = ValueVector.MAX_BUFFER_SIZE; // 16 MiB
     
    -  private Map<String, ColumnSize> columnSizes = CaseInsensitiveMap.newHashMap();
    +  // This keeps information for all columns i.e. all top columns and nested columns underneath
    +  private Map<String, ColumnSize> allColumnSizes = CaseInsensitiveMap.newHashMap();
    --- End diff --
    
    I'm a bit confused. I saw above that a `ColumnSize` has a list of children. Why are the children repeated here, introducing the naming issues described below?
    
    Given the column alias issues, and how column index is used elsewhere, can this just be an index of top-level columns? Then, to size map vectors (the only one that has the nesting issue), use the code for recursion that already exists in the `VectorInitializer`?


---

[GitHub] drill pull request #1125: DRILL-6126: Allocate memory for value vectors upfr...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1125#discussion_r169860310
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchSizer.java ---
    @@ -245,16 +251,30 @@ private void buildVectorInitializer(VectorInitializer initializer) {
           else if (width > 0) {
             initializer.variableWidth(name, width);
           }
    +
    --- End diff --
    
    There may be a bug in the original code above this line.
    
    ```
           String name = prefix + metadata.getName();
    ...
               initializer.variableWidthArray(name, ...
    ```
    
    First problem: this method does not recurse into the child vectors of maps. Second, even if it did, the prefix would not be needed, since each initializer is supposed to hold a name relative to its parent tuple (row or map).
    
    The vector initializer code was originally created for sort and none of the tests for sort include maps, so I probably did not catch the fact that maps don't work correctly with the original code.
    
    Suggestion: add unit tests for this stuff. This code is getting very complex and tests would be very helpful to catch these subtle bugs.
    
    Disclaimer: this is all supposed to be replaced with the ResultSetLoader code. The vector allocation code there is much cleaner, does handle maps, and has been unit tested.


---

[GitHub] drill pull request #1125: DRILL-6126: Allocate memory for value vectors upfr...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/drill/pull/1125


---