Posted to issues@drill.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/07/13 22:21:00 UTC
[jira] [Commented] (DRILL-5601) Rollup of External Sort memory management fixes
[ https://issues.apache.org/jira/browse/DRILL-5601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16086490#comment-16086490 ]
ASF GitHub Bot commented on DRILL-5601:
---------------------------------------
Github user Ben-Zvi commented on a diff in the pull request:
https://github.com/apache/drill/pull/860#discussion_r123814368
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/spill/RecordBatchSizer.java ---
@@ -19,117 +19,162 @@
import java.util.ArrayList;
import java.util.List;
+import java.util.Set;
+import org.apache.drill.common.types.TypeProtos.DataMode;
import org.apache.drill.common.types.TypeProtos.MinorType;
import org.apache.drill.exec.expr.TypeHelper;
+import org.apache.drill.exec.memory.AllocationManager.BufferLedger;
import org.apache.drill.exec.memory.BaseAllocator;
import org.apache.drill.exec.record.BatchSchema;
import org.apache.drill.exec.record.MaterializedField;
import org.apache.drill.exec.record.RecordBatch;
+import org.apache.drill.exec.record.SmartAllocationHelper;
import org.apache.drill.exec.record.VectorAccessible;
import org.apache.drill.exec.record.VectorWrapper;
import org.apache.drill.exec.record.selection.SelectionVector2;
+import org.apache.drill.exec.vector.UInt4Vector;
import org.apache.drill.exec.vector.ValueVector;
import org.apache.drill.exec.vector.complex.AbstractMapVector;
+import org.apache.drill.exec.vector.complex.RepeatedValueVector;
import org.apache.drill.exec.vector.VariableWidthVector;
+import com.google.common.collect.Sets;
+
/**
* Given a record batch or vector container, determines the actual memory
* consumed by each column, the average row, and the entire record batch.
*/
public class RecordBatchSizer {
-// private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(RecordBatchSizer.class);
/**
* Column size information.
*/
public static class ColumnSize {
+ public final String prefix;
public final MaterializedField metadata;
/**
- * Assumed size from Drill metadata.
+ * Assumed size from Drill metadata. Note that this information is
+ * 100% bogus. Do not use it.
*/
+ @Deprecated
public int stdSize;
/**
- * Actual memory consumed by all the vectors associated with this column.
- */
-
- public int totalSize;
-
- /**
* Actual average column width as determined from actual memory use. This
* size is larger than the actual data size since this size includes per-
* column overhead such as any unused vector space, etc.
*/
- public int estSize;
- public int capacity;
- public int density;
- public int dataSize;
- public boolean variableWidth;
-
- public ColumnSize(ValueVector vv) {
- metadata = vv.getField();
- stdSize = TypeHelper.getSize(metadata.getType());
-
- // Can't get size estimates if this is an empty batch.
- int rowCount = vv.getAccessor().getValueCount();
- if (rowCount == 0) {
- estSize = stdSize;
- return;
- }
+ public final int estSize;
+ public final int valueCount;
+ public final int entryCount;
+ public final int dataSize;
+ public final int estElementCount;
+ public final boolean isVariableWidth;
- // Total size taken by all vectors (and underlying buffers)
- // associated with this vector.
+ public ColumnSize(ValueVector v, String prefix, int valueCount) {
--- End diff --
Programming 101: single-letter variable names such as "v" should be discouraged ...
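To make the naming point concrete, here is a minimal sketch of the constructor with a descriptive parameter name. It is not the code from the pull request: `VectorLike` and `ColumnSizeSketch` are hypothetical stand-ins so the example compiles without Drill on the classpath, and the choice of `vector` as the replacement name is only one reasonable option.

```java
// Hypothetical stand-in for Drill's ValueVector; it exposes only what this sketch needs.
interface VectorLike {
    int getBufferSize(); // assumed here to mean: bytes of data held by the vector
}

// Simplified, hypothetical version of ColumnSize, for illustration only.
final class ColumnSizeSketch {
    final String prefix;
    final int valueCount;
    final int dataSize;

    // "vector" instead of "v": the name says what the argument is at every use site.
    ColumnSizeSketch(VectorLike vector, String prefix, int valueCount) {
        this.prefix = prefix;
        this.valueCount = valueCount;
        this.dataSize = vector.getBufferSize();
    }
}
```

The rename costs nothing at runtime and mainly helps in a long constructor body, where a bare `v` is easy to confuse with other short-lived locals.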
> Rollup of External Sort memory management fixes
> -----------------------------------------------
>
> Key: DRILL-5601
> URL: https://issues.apache.org/jira/browse/DRILL-5601
> Project: Apache Drill
> Issue Type: Task
> Affects Versions: 1.11.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Fix For: 1.12.0
>
>
> Rollup of a set of specific JIRA entries that all relate to the very difficult problem of managing memory within Drill in order for the external sort to stay within a memory budget. In general, the fixes relate to better estimating memory used by the three ways that Drill allocates vector memory (see DRILL-5522) and to predicting the size of vectors that the sort will create, to avoid repeated realloc-copy cycles (see DRILL-5594).
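The two ideas in the description (estimating the memory a column actually uses, and predicting vector sizes up front to avoid realloc-copy cycles) can be sketched roughly as follows. This is an illustration under stated assumptions, not Drill's implementation: the class and method names are hypothetical, and the sketch assumes vector buffers are allocated in power-of-two sizes.

```java
// Hypothetical helper, for illustration only; not part of Drill.
final class SortMemoryEstimateSketch {

    // Average bytes per value, rounded up. Returns 0 for an empty batch,
    // since no estimate can be derived from zero rows.
    static int averageWidth(int dataSizeBytes, int valueCount) {
        if (valueCount <= 0) {
            return 0;
        }
        return (dataSizeBytes + valueCount - 1) / valueCount;
    }

    // Smallest power of two that is >= size.
    // Assumption: buffers are allocated in power-of-two sizes.
    static long roundUpToPowerOfTwo(long size) {
        if (size <= 1) {
            return 1;
        }
        return Long.highestOneBit(size - 1) << 1;
    }

    // Buffer size to request up front for rowCount values of the given
    // average width, so the vector does not have to grow and copy later.
    static long predictedBufferSize(int averageWidth, int rowCount) {
        return roundUpToPowerOfTwo((long) averageWidth * rowCount);
    }
}
```

For example, a column holding 10,001 bytes across 1,000 values averages 11 bytes per value after rounding up; sizing an output vector for 4,000 such values needs 44,000 bytes, which rounds up to 65,536 under the power-of-two assumption.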