Posted to issues@drill.apache.org by "Boaz Ben-Zvi (JIRA)" <ji...@apache.org> on 2017/08/17 23:33:00 UTC
[jira] [Commented] (DRILL-5728) Hash Aggregate: Useless bigint value vector in the values batch
[ https://issues.apache.org/jira/browse/DRILL-5728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131478#comment-16131478 ]
Boaz Ben-Zvi commented on DRILL-5728:
-------------------------------------
Similar code is generated when the underlying value column is nullable (see below). In this case the additional value vector may be needed, but it could perhaps be replaced by a bitset rather than a bigint vector to save memory.
{code}
public void updateAggrValuesInternal(int incomingRowIdx, int htRowIdx)
    throws SchemaChangeException
{
    {
        NullableBigIntHolder out11 = new NullableBigIntHolder();
        {
            out11 .isSet = vv8 .getAccessor().isSet((incomingRowIdx));
            if (out11 .isSet == 1) {
                out11 .value = vv8 .getAccessor().get((incomingRowIdx));
            }
        }
        NullableBigIntHolder in = out11;
        work0 .value = vv1 .getAccessor().get((htRowIdx));
        BigIntHolder value = work0;
        work4 .value = vv5 .getAccessor().get((htRowIdx));
        BigIntHolder nonNullCount = work4;
        SumFunctions$NullableBigIntSum_add: {
            sout:
            {
                if (in.isSet == 0) {
                    break sout;
                }
                nonNullCount.value = 1;
                value.value += in.value;
            }
        }
        work0 = value;
        vv1 .getMutator().set((htRowIdx), work0 .value);
        work4 = nonNullCount;
        vv5 .getMutator().set((htRowIdx), work4 .value);
    }
}
{code}
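The bitset idea can be illustrated with a small standalone sketch. This is not Drill code and not what the code generator emits: the class {{NullableSumSketch}}, its fields and its methods are hypothetical names, a plain {{long[]}} stands in for the "sum" vector (vv1), and a {{java.util.BitSet}} stands in for the bigint "nonNullCount" vector (vv5). The update logic mirrors the generated NullableBigIntSum_add body above.

```java
import java.util.BitSet;

/** Hypothetical sketch (not Drill code): nullable SUM with a one-bit "non-null seen" flag. */
final class NullableSumSketch {
    private final long[] sums;     // stands in for the "sum" value vector (vv1)
    private final BitSet nonNull;  // stands in for the bigint "nonNullCount" vector (vv5)

    NullableSumSketch(int batchSize) {
        sums = new long[batchSize];
        nonNull = new BitSet(batchSize);
    }

    /** Mirrors the generated add step: skip null inputs, otherwise flag and accumulate. */
    void update(int htRowIdx, int isSet, long value) {
        if (isSet == 0) {
            return;               // same effect as "break sout" in the generated code
        }
        nonNull.set(htRowIdx);    // replaces "nonNullCount.value = 1"
        sums[htRowIdx] += value;
    }

    /** Returns null when every input seen for this group was null. */
    Long result(int htRowIdx) {
        return nonNull.get(htRowIdx) ? Long.valueOf(sums[htRowIdx]) : null;
    }
}
```

Whether a bitset fits into the existing value-vector and codegen machinery is a separate question; the sketch only shows that a single bit per group carries the same information as the 8-byte flag.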
> Hash Aggregate: Useless bigint value vector in the values batch
> ---------------------------------------------------------------
>
> Key: DRILL-5728
> URL: https://issues.apache.org/jira/browse/DRILL-5728
> Project: Apache Drill
> Issue Type: Improvement
> Components: Execution - Codegen
> Affects Versions: 1.11.0
> Reporter: Boaz Ben-Zvi
> Priority: Minor
>
> When aggregating a non-nullable column (like *sum(l_partkey)* below), the code generation creates an extra value vector (in addition to the actual "sum" vector), which is used as a "nonNullCount".
> This is useless (as the underlying column is non-nullable), and wastes considerable memory (8 bytes * 64K = 512K for each value in a batch).
> Example query:
> {{select sum(l_partkey) as slpk from cp.`tpch/lineitem.parquet` group by l_orderkey;}}
> And as can be seen in the generated code below, the bigint value vector *vv5* is only used to hold a *1* flag to note "not null":
> {code}
> public void updateAggrValuesInternal(int incomingRowIdx, int htRowIdx)
>     throws SchemaChangeException
> {
>     {
>         IntHolder out11 = new IntHolder();
>         {
>             out11 .value = vv8 .getAccessor().get((incomingRowIdx));
>         }
>         IntHolder in = out11;
>         work0 .value = vv1 .getAccessor().get((htRowIdx));
>         BigIntHolder value = work0;
>         work4 .value = vv5 .getAccessor().get((htRowIdx));
>         BigIntHolder nonNullCount = work4;
>
>         SumFunctions$IntSum_add: {
>             nonNullCount.value = 1;
>             value.value += in.value;
>         }
>
>         work0 = value;
>         vv1 .getMutator().set((htRowIdx), work0 .value);
>         work4 = nonNullCount;
>         vv5 .getMutator().set((htRowIdx), work4 .value);
>     }
> }
> {code}
>
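The memory figure quoted in the issue (8 * 64K = 512K) can be checked with a few lines of arithmetic. The helper below is a hypothetical illustration, not part of Drill; it assumes 64K entries per batch as stated in the issue, and it ignores any buffer rounding or per-vector overhead. The one-bit-per-entry figure is included only for comparison with the bigint vector.

```java
/** Hypothetical helper (not Drill code): per-batch cost of the extra flag column. */
final class FlagVectorCost {
    static final int BATCH_SIZE = 64 * 1024;  // 64K entries, the figure quoted in the issue

    /** Bytes taken by one bigint (8-byte) value vector over a full batch. */
    static long bigIntVectorBytes() {
        return (long) BATCH_SIZE * Long.BYTES;
    }

    /** Bytes taken by one bit per entry over a full batch. */
    static long bitsetBytes() {
        return BATCH_SIZE / Byte.SIZE;
    }
}
```

Under these assumptions the bigint flag vector takes 524288 bytes (512K) per batch, while one bit per entry would take 8192 bytes (8K).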