Posted to issues@drill.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/10/01 00:07:04 UTC
[jira] [Commented] (DRILL-5758) Rollup of external sort fixes to issues found by QA
[ https://issues.apache.org/jira/browse/DRILL-5758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16187231#comment-16187231 ]
ASF GitHub Bot commented on DRILL-5758:
---------------------------------------
Github user paul-rogers commented on a diff in the pull request:
https://github.com/apache/drill/pull/932#discussion_r142017748
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/spill/RecordBatchSizer.java ---
@@ -74,53 +74,52 @@
public final int estSize;
/**
- * Number of times the value here (possibly repeated) appears in
- * the record batch.
+ * Number of occurrences of the value in the batch. This is trivial
+ * for top-level scalars: it is the record count. For a top-level
+ * repeated vector, this is the number of arrays, also the record
+ * count. For a value nested inside a repeated map, it is the
+ * total number of values across all maps, and may be less than,
+ * greater than (but unlikely) same as the row count.
*/
public final int valueCount;
/**
- * The number of elements in the value vector. Consider two cases.
- * A required or nullable vector has one element per row, so the
- * <tt>entryCount</tt> is the same as the <tt>valueCount</tt> (which,
- * in turn, is the same as the row count.) But, if this vector is an
- * array, then the <tt>valueCount</tt> is the number of columns, while
- * <tt>entryCount</tt> is the total number of elements in all the arrays
- * that make up the columns, so <tt>entryCount</tt> will be different than
- * the <tt>valueCount</tt> (normally larger, but possibly smaller if most
- * arrays are empty.
- * <p>
- * Finally, the column may be part of another list. In this case, the above
- * logic still applies, but the <tt>valueCount</tt> is the number of entries
- * in the outer array, not the row count.
+ * Total number of elements for a repeated type, or 1 if this is
+ * a non-repeated type. That is, a batch of 100 rows may have an
+ * array with 10 elements per row. In this case, the element count
+ * is 1000.
*/
- public int entryCount;
+ public final int elementCount;
--- End diff --
Good point. However, a single batch larger than 2 GB is far more than the sort can handle, so we would not even get this far with a batch that large.
Still, the point is valid, so a new commit changes the batch size variables from int to long.
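To illustrate the concern, here is a minimal sketch of how a size computed in int arithmetic can wrap once it passes 2 GB, and how widening to long avoids that. The class and method names are hypothetical and are not taken from RecordBatchSizer; only the 100-row, 10-elements-per-row figures come from the Javadoc in the diff above.

```java
// Hypothetical sketch; names are illustrative, not Drill's actual code.
public class BatchSizeSketch {

    // Total elements in a repeated column: rows times elements per row.
    public static int elementCount(int rowCount, int elementsPerRow) {
        return rowCount * elementsPerRow;
    }

    // int arithmetic: silently wraps when the product exceeds Integer.MAX_VALUE.
    public static int totalBytesInt(int elementCount, int bytesPerElement) {
        return elementCount * bytesPerElement;
    }

    // long arithmetic: widen one operand before multiplying so the
    // product of two ints cannot wrap.
    public static long totalBytesLong(int elementCount, int bytesPerElement) {
        return (long) elementCount * bytesPerElement;
    }

    public static void main(String[] args) {
        // The example from the Javadoc: 100 rows, 10 elements per row.
        System.out.println(elementCount(100, 10));

        // An assumed oversized case: 300 million 8-byte elements (about 2.4 GB).
        int hugeElements = 300_000_000;
        int width = 8;
        System.out.println(totalBytesInt(hugeElements, width));   // negative: wrapped
        System.out.println(totalBytesLong(hugeElements, width));  // 2400000000
    }
}
```

As noted above, the sort would reject a batch this large long before reaching this code, so the long variables are a safety margin rather than a fix for an observed failure.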
> Rollup of external sort fixes to issues found by QA
> ---------------------------------------------------
>
> Key: DRILL-5758
> URL: https://issues.apache.org/jira/browse/DRILL-5758
> Project: Apache Drill
> Issue Type: Task
> Affects Versions: 1.12.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Labels: ready-to-commit
> Fix For: 1.12.0
>
>
> Tracking JIRA to be used for the PR that combines fixes for various JIRA entries. Bugs fixed in this task are given by the linked issues.