
Posted to dev@drill.apache.org by weijie tong <to...@gmail.com> on 2018/06/01 05:40:42 UTC

Re: How to generate hash code for each build side one of the hash join columns

Hi Boaz:

  Your proposal is valuable, even though I have already implemented the
dynamic code-generation logic. If a ``` long hash64(int index, long seed) ```
method is added to ValueVector, it will also help others implement
filter logic in specific storage plugins using the pushed-down Bloom
filter. For HashJoin and HashAggregate, the methods ```double
hash32AsDouble(int index, int seed) ``` and ```int hash32(int index, int
seed)``` will also need to be added to ValueVector. If no one objects,
I will be pleased to take on this work.

   Btw, let me share my thoughts about scan-side filtering with the
BloomFilter. The scan-side filtering I intend to do works on the
materialized ValueVector, not during the construction of the
ValueVector from the original storage-format data. The reason is that
running the check at that stage would degrade the performance of
materializing the original deep storage-format data into ValueVectors.
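A minimal sketch of the scan-side idea, assuming a per-row `hash64(int index, long seed)` like the one proposed above: the check runs over an already-materialized column and records the indexes of rows that may match, instead of slowing down the decoding step. `IntColumn`, `filter()` and the SplitMix64-style mixer are hypothetical stand-ins (the mixer replaces murmur3_64 only for illustration), and the Bloom filter is abstracted as a `java.util.function.LongPredicate`.

```java
import java.util.Arrays;
import java.util.function.LongPredicate;

// Hypothetical sketch: IntColumn and hash64 are illustrative names,
// not Drill's actual ValueVector API.
final class ScanSideFilterSketch {

  /** A materialized INT column; stands in for an already-filled value vector. */
  static final class IntColumn {
    private final int[] values;

    IntColumn(int... values) { this.values = values; }

    int size() { return values.length; }

    /**
     * Per-row hash in the proposed hash64(index, seed) shape. The mixing steps
     * are SplitMix64's finalizer, used only as a stand-in for murmur3_64.
     */
    long hash64(int index, long seed) {
      long z = seed + 0x9E3779B97F4A7C15L * (values[index] + 1L);
      z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
      z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
      return z ^ (z >>> 31);
    }
  }

  /**
   * Tests every row of a materialized column against a Bloom-filter membership
   * check and returns the 16-bit indexes of rows that may match, in the spirit
   * of a SelectionVector2.
   */
  static char[] filter(IntColumn column, long seed, LongPredicate mightContain) {
    char[] selected = new char[column.size()];
    int count = 0;
    for (int row = 0; row < column.size(); row++) {
      if (mightContain.test(column.hash64(row, seed))) {
        selected[count++] = (char) row;
      }
    }
    return Arrays.copyOf(selected, count);
  }
}
```

Because the result holds 16-bit row indexes, it can play the role a SelectionVector2 plays for a downstream operator; false positives from a real Bloom filter would simply pass through to the join.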

On Fri, Jun 1, 2018 at 3:22 AM Boaz Ben-Zvi <bb...@mapr.com> wrote:

>  Hi Weijie,
>
>     Another option is to totally avoid the generated code.
> We were considering the idea of replacing the generated code used for
> computing hash values with “real java” code.
>
> This idea is analogous to the usage of the copyEntry() method in the
> ValueVector interface (that Paul added last year).
> See an example of using the copyEntry() (via the appendRow() in
> VectorContainer) in the new Hash-Join-Spill code.
> Basically no need to generate “type specific” code, as the virtual
> copyEntry() method does the “type specific” work.
>
> Similarly we could have a hash64() method in ValueVector, which would
> perform the “type specific” computation.
> (One difference from copyEntry() – the hash64() would also need to take
> the “seed” parameter, which is the hash value produced by the previous
> hash).
> And similar to appendRow(), there would be evalHash() iterating over the
> key columns.
> (And one difference from appendRow() – need to iterate only on the key
> columns; these are the first columns; their number can be found from the
> config: e.g., htConfig.getKeyExprsBuild().size() )
>
>    With such implementation, that evalHash() could be used anywhere (e.g.,
> to match the Bloom filters on the left side of the join).
>
>        Thanks,
>
>              Boaz
>
>
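The evalHash() idea can be sketched as follows, assuming a hypothetical `HashableVector` interface whose virtual `hash64(index, seed)` does the type-specific work, the way copyEntry() does for copying. Only the first `numKeys` columns are visited, and each column's hash becomes the seed for the next one. The class names and the `mix()` function (a murmur3-style finalizer standing in for murmur3_64) are illustrative, not Drill's API.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical sketch of "virtual hash64 instead of generated code".
// HashableVector, IntVector, VarCharVector and evalHash are illustrative names.
final class EvalHashSketch {

  interface HashableVector {
    /** Type-specific hash of the value at index, chained from a previous hash. */
    long hash64(int index, long seed);
  }

  /** Stand-in for murmur3_64: xor-in the value, then a murmur3-style finalizer. */
  static long mix(long seed, long value) {
    long z = seed ^ (value * 0x9E3779B97F4A7C15L);
    z = (z ^ (z >>> 33)) * 0xFF51AFD7ED558CCDL;
    z = (z ^ (z >>> 33)) * 0xC4CEB9FE1A85EC53L;
    return z ^ (z >>> 33);
  }

  static final class IntVector implements HashableVector {
    private final int[] data;
    IntVector(int... data) { this.data = data; }
    @Override public long hash64(int index, long seed) { return mix(seed, data[index]); }
  }

  static final class VarCharVector implements HashableVector {
    private final String[] data;
    VarCharVector(String... data) { this.data = data; }
    @Override public long hash64(int index, long seed) {
      long h = seed;
      for (byte b : data[index].getBytes(StandardCharsets.UTF_8)) {
        h = mix(h, b); // fold each byte into the running hash
      }
      return h;
    }
  }

  /**
   * Like VectorContainer.appendRow(), but visits only the first numKeys
   * columns (the join keys) and passes each column's hash on as the next seed.
   */
  static long evalHash(List<? extends HashableVector> columns, int numKeys, int row) {
    long hash = 0L; // assumed initial seed
    for (int i = 0; i < numKeys; i++) {
      hash = columns.get(i).hash64(row, hash);
    }
    return hash;
  }
}
```

The trade-off is one virtual call per key column per row in exchange for no code generation; how well the JIT handles those calls is an assumption that would need measuring.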
> On 5/30/18, 7:49 PM, "weijie tong" <to...@gmail.com> wrote:
>
>     Hi Aman:
>
>       Thanks for your tips. I have rebased onto the latest code from the master
>     branch. Yes, the spill-to-disk feature did change the original
>     implementation, and I have adjusted my implementation to the new
>     feature. But as you say, integration will be somewhat challenging, as I
>     noticed the spill-to-disk feature will continue to tune its
>     implementation performance.
>
>       The BloomFilter is implemented natively in Drill, not via an external
>     library. It implements the algorithm from the paper you
>     mentioned.
>
>
>     On Thu, May 31, 2018 at 1:56 AM Aman Sinha <am...@apache.org> wrote:
>
>     > Hi Weijie,
>     > I was hoping you could leverage the existing methods, so it's good that you
>     > found the ones that work for your use case.
>     > One thing I want to point out (maybe you're already aware): the Hash Join
>     > code has changed significantly in the master branch due to the
>     > spill-to-disk feature.
>     > So, this may pose some integration challenges for your run-time join
>     > pushdown feature.
>     > Also, one other question/clarification: for the bloom filter itself, are
>     > you implementing it natively in Drill or using an external library?
>     >
>     > -Aman
>     >
>     > On Tue, May 29, 2018 at 8:23 PM, weijie tong <tongweijie178@gmail.com> wrote:
>     >
>     > > I found ClassGenerator's nestEvalBlock(JBlock block) and unNestEvalBlock(),
>     > > which have the same effect as the change I made to the ClassGenerator. So I
>     > > have dropped my ClassGenerator change, and I hope this can help someone
>     > > else.
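A toy model of what nesting does, with no dependency on JCodeModel: the generator keeps a stack of blocks, and anything emitted through `getEvalBlock()` goes to the innermost one. `Generator`, `Block`, `IfBlock`, `addIf()` and `addStatement()` are hypothetical; only the names `nestEvalBlock`/`unNestEvalBlock` come from the thread, and this is not how Drill's ClassGenerator is actually implemented.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy model (not JCodeModel, not Drill's ClassGenerator): code emitted through
// getEvalBlock() lands inside an if-branch only while that branch is the
// innermost active block on the stack.
final class NestedBlockSketch {

  static final class Block {
    final List<Object> items = new ArrayList<>(); // String statements or IfBlock
  }

  static final class IfBlock {
    final String condition;
    final Block then = new Block();
    IfBlock(String condition) { this.condition = condition; }
  }

  static final class Generator {
    private final Block root = new Block();
    private final Deque<Block> stack = new ArrayDeque<>();

    Generator() { stack.push(root); }

    /** Innermost active block; every emitted statement goes here. */
    Block getEvalBlock() { return stack.peek(); }

    void addStatement(String stmt) { getEvalBlock().items.add(stmt); }

    /** Appends an if statement to the current block and returns its then-block. */
    Block addIf(String condition) {
      IfBlock ifBlock = new IfBlock(condition);
      getEvalBlock().items.add(ifBlock);
      return ifBlock.then;
    }

    void nestEvalBlock(Block block) { stack.push(block); }

    void unNestEvalBlock() { stack.pop(); }

    String render() {
      StringBuilder sb = new StringBuilder();
      render(root, "", sb);
      return sb.toString();
    }

    private static void render(Block block, String indent, StringBuilder sb) {
      for (Object item : block.items) {
        if (item instanceof IfBlock) {
          IfBlock ib = (IfBlock) item;
          sb.append(indent).append("if (").append(ib.condition).append(") {\n");
          render(ib.then, indent + "  ", sb);
          sb.append(indent).append("}\n");
        } else {
          sb.append(indent).append(item).append('\n');
        }
      }
    }
  }
}
```

Pushing the then-block before emitting a column's read and hash statements, and popping it afterwards, keeps each column's code inside its own `if`; emitting without the push reproduces the problem reported further down in this thread, where that code lands in the outer eval block.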
>     > >
>     > > On Tue, May 29, 2018 at 1:53 PM weijie tong <tongweijie178@gmail.com> wrote:
>     > >
>     > > > The code formatting was not nice, so here it is again:
>     > > >
>     > > > private void setupGetBuild64Hash(ClassGenerator<HashTable> cg, MappingSet incomingMapping,
>     > > >     VectorAccessible batch, LogicalExpression[] keyExprs, TypedFieldId[] buildKeyFieldIds)
>     > > >     throws SchemaChangeException {
>     > > >   cg.setMappingSet(incomingMapping);
>     > > >   if (keyExprs == null || keyExprs.length == 0) {
>     > > >     cg.getEvalBlock()._return(JExpr.lit(0));
>     > > >     return; // no key columns: avoid looping over null and emitting a second "return 0"
>     > > >   }
>     > > >   String seedValue = "seedValue";
>     > > >   String fieldId = "fieldId";
>     > > >   LogicalExpression seed = ValueExpressions.getParameterExpression(seedValue,
>     > > >       Types.required(TypeProtos.MinorType.INT));
>     > > >
>     > > >   LogicalExpression fieldIdParamExpr = ValueExpressions.getParameterExpression(fieldId,
>     > > >       Types.required(TypeProtos.MinorType.INT));
>     > > >   HoldingContainer fieldIdParamHolder = cg.addExpr(fieldIdParamExpr);
>     > > >   int i = 0;
>     > > >   for (LogicalExpression expr : keyExprs) {
>     > > >     TypedFieldId targetTypeFieldId = buildKeyFieldIds[i];
>     > > >     ValueExpressions.IntExpression targetBuildFieldIdExp = new ValueExpressions.IntExpression(
>     > > >         targetTypeFieldId.getFieldIds()[0], ExpressionPosition.UNKNOWN);
>     > > >
>     > > >     JFieldRef targetBuildSideFieldId = cg.addExpr(targetBuildFieldIdExp,
>     > > >         ClassGenerator.BlkCreateMode.TRUE_IF_BOUND).getValue();
>     > > >     JBlock ifBlock = cg.getEvalBlock()._if(
>     > > >         fieldIdParamHolder.getValue().eq(targetBuildSideFieldId))._then();
>     > > >     // Make getEvalBlock() return this inner if-block instead of the eval block,
>     > > >     // so the code generated for the read and hash expressions lands inside it.
>     > > >     cg.setCustomizedEvalInnerBlock(ifBlock);
>     > > >     LogicalExpression hashExpression =
>     > > >         HashPrelUtil.getHashExpression(expr, seed, incomingProbe != null);
>     > > >     LogicalExpression materializedExpr = ExpressionTreeMaterializer.materializeAndCheckErrors(
>     > > >         hashExpression, batch, context.getFunctionRegistry());
>     > > >     HoldingContainer hash = cg.addExpr(materializedExpr,
>     > > >         ClassGenerator.BlkCreateMode.TRUE_IF_BOUND);
>     > > >     ifBlock._return(hash.getValue());
>     > > >     // Reset the customized block to null, so getEvalBlock() returns the real eval block again.
>     > > >     cg.setCustomizedEvalInnerBlock(null);
>     > > >     i++;
>     > > >   }
>     > > >   cg.getEvalBlock()._return(JExpr.lit(0));
>     > > > }
>     > > >
>     > > > public long getBuild64HashCodeInner(int incomingRowIdx, int seedValue, int fieldId)
>     > > >     throws SchemaChangeException
>     > > > {
>     > > >     {
>     > > >         IntHolder fieldId12 = new IntHolder();
>     > > >         fieldId12 .value = fieldId;
>     > > >         if (fieldId12 .value == constant14 .value) {
>     > > >             IntHolder out18 = new IntHolder();
>     > > >             {
>     > > >                 out18 .value = vv15 .getAccessor().get((incomingRowIdx));
>     > > >             }
>     > > >             IntHolder seedValue19 = new IntHolder();
>     > > >             seedValue19 .value = seedValue;
>     > > >             //---- start of eval portion of hash32AsDouble function. ----//
>     > > >             IntHolder out20 = new IntHolder();
>     > > >             {
>     > > >                 final IntHolder out = new IntHolder();
>     > > >                 IntHolder in = out18;
>     > > >                 IntHolder seed = seedValue19;
>     > > >
>     > > >                 Hash32WithSeedAsDouble$IntHash_eval: {
>     > > >                     out.value = org.apache.drill.exec.expr.fn.impl.HashHelper.hash32((double) in.value, seed.value);
>     > > >                 }
>     > > >
>     > > >                 out20 = out;
>     > > >             }
>     > > >             //---- end of eval portion of hash32AsDouble function. ----//
>     > > >             return out20 .value;
>     > > >         }
>     > > >         return 0;
>     > > >     }
>     > > > }
>     > > >
>     > > > On Tue, May 29, 2018 at 1:47 PM weijie tong <tongweijie178@gmail.com> wrote:
>     > > >
>     > > >> Hi Paul:
>     > > >>
>     > > >>  Thanks for your enthusiasm. I have picked up the technique you once
>     > > >> mentioned to me in another mail thread. It's really helpful; thanks for
>     > > >> your valuable work.
>     > > >>
>     > > >>   I have now solved this tough problem by adding a customized JBlock
>     > > >> member field to the ClassGenerator. When you want the ClassGenerator's
>     > > >> getEvalBlock() to return an inner customized JBlock, you set this
>     > > >> member; when you want the method to return the eval block itself, you
>     > > >> reset this member to null.
>     > > >>
>     > > >>   Here is my changed setup method:
>     > > >>
>     > > >>
>     > > >> private void setupGetBuild64Hash(ClassGenerator<HashTable> cg, MappingSet incomingMapping,
>     > > >>     VectorAccessible batch, LogicalExpression[] keyExprs, TypedFieldId[] buildKeyFieldIds)
>     > > >>     throws SchemaChangeException {
>     > > >>   cg.setMappingSet(incomingMapping);
>     > > >>   if (keyExprs == null || keyExprs.length == 0) {
>     > > >>     cg.getEvalBlock()._return(JExpr.lit(0));
>     > > >>     return; // no key columns: avoid looping over null and emitting a second "return 0"
>     > > >>   }
>     > > >>   String seedValue = "seedValue";
>     > > >>   String fieldId = "fieldId";
>     > > >>   LogicalExpression seed = ValueExpressions.getParameterExpression(seedValue,
>     > > >>       Types.required(TypeProtos.MinorType.INT));
>     > > >>
>     > > >>   LogicalExpression fieldIdParamExpr = ValueExpressions.getParameterExpression(fieldId,
>     > > >>       Types.required(TypeProtos.MinorType.INT));
>     > > >>   HoldingContainer fieldIdParamHolder = cg.addExpr(fieldIdParamExpr);
>     > > >>   int i = 0;
>     > > >>   for (LogicalExpression expr : keyExprs) {
>     > > >>     TypedFieldId targetTypeFieldId = buildKeyFieldIds[i];
>     > > >>     ValueExpressions.IntExpression targetBuildFieldIdExp = new ValueExpressions.IntExpression(
>     > > >>         targetTypeFieldId.getFieldIds()[0], ExpressionPosition.UNKNOWN);
>     > > >>
>     > > >>     JFieldRef targetBuildSideFieldId = cg.addExpr(targetBuildFieldIdExp,
>     > > >>         ClassGenerator.BlkCreateMode.TRUE_IF_BOUND).getValue();
>     > > >>     JBlock ifBlock = cg.getEvalBlock()._if(
>     > > >>         fieldIdParamHolder.getValue().eq(targetBuildSideFieldId))._then();
>     > > >>     // Make getEvalBlock() return this inner if-block instead of the eval block,
>     > > >>     // so the code generated for the read and hash expressions lands inside it.
>     > > >>     cg.setCustomizedEvalInnerBlock(ifBlock);
>     > > >>     LogicalExpression hashExpression =
>     > > >>         HashPrelUtil.getHashExpression(expr, seed, incomingProbe != null);
>     > > >>     LogicalExpression materializedExpr = ExpressionTreeMaterializer.materializeAndCheckErrors(
>     > > >>         hashExpression, batch, context.getFunctionRegistry());
>     > > >>     HoldingContainer hash = cg.addExpr(materializedExpr,
>     > > >>         ClassGenerator.BlkCreateMode.TRUE_IF_BOUND);
>     > > >>     ifBlock._return(hash.getValue());
>     > > >>     // Reset the customized block to null, so getEvalBlock() returns the real eval block again.
>     > > >>     cg.setCustomizedEvalInnerBlock(null);
>     > > >>     i++;
>     > > >>   }
>     > > >>   cg.getEvalBlock()._return(JExpr.lit(0));
>     > > >> }
>     > > >>
>     > > >>
>     > > >>  The corresponding generated code:
>     > > >>
>     > > >>     public long getBuild64HashCodeInner(int incomingRowIdx, int seedValue, int fieldId)
>     > > >>         throws SchemaChangeException
>     > > >>     {
>     > > >>         {
>     > > >>             IntHolder fieldId12 = new IntHolder();
>     > > >>             fieldId12 .value = fieldId;
>     > > >>             if (fieldId12 .value == constant14 .value) {
>     > > >>                 IntHolder out18 = new IntHolder();
>     > > >>                 {
>     > > >>                     out18 .value = vv15 .getAccessor().get((incomingRowIdx));
>     > > >>                 }
>     > > >>                 IntHolder seedValue19 = new IntHolder();
>     > > >>                 seedValue19 .value = seedValue;
>     > > >>                 //---- start of eval portion of hash32AsDouble function. ----//
>     > > >>                 IntHolder out20 = new IntHolder();
>     > > >>                 {
>     > > >>                     final IntHolder out = new IntHolder();
>     > > >>                     IntHolder in = out18;
>     > > >>                     IntHolder seed = seedValue19;
>     > > >>
>     > > >>                     Hash32WithSeedAsDouble$IntHash_eval: {
>     > > >>                         out.value = org.apache.drill.exec.expr.fn.impl.HashHelper.hash32((double) in.value, seed.value);
>     > > >>                     }
>     > > >>
>     > > >>                     out20 = out;
>     > > >>                 }
>     > > >>                 //---- end of eval portion of hash32AsDouble function. ----//
>     > > >>                 return out20 .value;
>     > > >>             }
>     > > >>             return 0;
>     > > >>         }
>     > > >>     }
>     > > >>
>     > > >>
>     > > >>
>     > > >>   Some further explanation:
>     > > >>   1st: The if check won't hurt performance, because I invoke this
>     > > >> method column by column, so it's branch-prediction friendly.
>     > > >>   2nd: I will use murmur3_64, not murmur3_32, since the efficient
>     > > >> Bloom filter algorithm needs a 64-bit hash code to avoid collisions.
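One common way to drive a Bloom filter from a single 64-bit hash is Kirsch-Mitzenmacher double hashing: split the hash into two 32-bit halves h1 and h2 and probe positions h1 + i*h2 for i = 0..k-1. The sketch uses that technique with `java.util.BitSet`; it is a generic illustration, not necessarily the algorithm from the paper mentioned in this thread, and the class name and sizes are assumptions.

```java
import java.util.BitSet;

// Generic Bloom filter fed by one 64-bit hash per key, using Kirsch-Mitzenmacher
// double hashing. Illustrative only; not Drill's BloomFilter class.
final class Hash64BloomFilter {
  private final BitSet bits;
  private final int numBits;
  private final int numHashes;

  Hash64BloomFilter(int numBits, int numHashes) {
    this.bits = new BitSet(numBits);
    this.numBits = numBits;
    this.numHashes = numHashes;
  }

  /** i-th probe position derived from the two 32-bit halves of the hash. */
  private int position(long hash64, int i) {
    int h1 = (int) hash64;          // low 32 bits
    int h2 = (int) (hash64 >>> 32); // high 32 bits
    int combined = h1 + i * h2;     // int overflow is intentional
    return Math.floorMod(combined, numBits);
  }

  void insert(long hash64) {
    for (int i = 0; i < numHashes; i++) {
      bits.set(position(hash64, i));
    }
  }

  /** false means "definitely absent"; true means "possibly present". */
  boolean mightContain(long hash64) {
    for (int i = 0; i < numHashes; i++) {
      if (!bits.get(position(hash64, i))) {
        return false;
      }
    }
    return true;
  }
}
```

A 64-bit input gives each half a full 32 bits, so two distinct keys are far less likely to share an entire probe sequence than they would be with a single 32-bit hash.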
>     > > >>
>     > > >> On Tue, May 29, 2018 at 12:37 PM Paul Rogers <par0328@yahoo.com.invalid> wrote:
>     > > >>
>     > > >>> Hi Weijie,
>     > > >>>
>     > > >>> Seeing the discussion about the details of JCodeModel suggests you may
>     > > >>> be trying to debug your generated code at the level of the code generator.
>     > > >>>
>     > > >>> Some time ago we added the ability to step through the generated code.
>     > > >>> Look for the following line in the generator code:
>     > > >>>
>     > > >>>     // Uncomment out this line to debug the generated code.
>     > > >>> //    cg.saveCodeForDebugging(true);
>     > > >>>
>     > > >>> Uncomment the code line and Drill will save each generated file to a
>     > > >>> configured location (which, if I recall correctly, is /tmp/drill/codegen,
>     > > >>> though it may have changed after Tim's test directory changes.)
>     > > >>>
>     > > >>> Then, set a breakpoint in the template setup() method and you can step
>     > > >>> directly into the generated doSetup() method. Same for the eval() method.
>     > > >>>
>     > > >>> This way, you can not only see the generated code, you can step through
>     > > >>> it. I've found this to be a far easier way to understand the generated code
>     > > >>> than the older techniques folks have used (look at byte codes, use print
>     > > >>> statements, brute force reasoning, etc.)
>     > > >>>
>     > > >>> Tim, Boaz and others have used this technique more recently and can
>     > > >>> probably give you additional pointers.
>     > > >>>
>     > > >>> Thanks,
>     > > >>> - Paul
>     > > >>>
>     > > >>>
>     > > >>>
>     > > >>>     On Monday, May 28, 2018, 8:52:19 PM PDT, weijie tong <tongweijie178@gmail.com> wrote:
>     > > >>>
>     > > >>>  @aman thanks for your reply. "For the ifBlock, do you need an _else()
>     > > >>> block also?" I give a default return at the end of the method, so I
>     > > >>> don't need the _else() block. I have noticed that the IfExpression
>     > > >>> evaluation method in EvaluationVisitor also uses JConditional, but that
>     > > >>> doesn't match my requirement either. I think the key point here is that
>     > > >>> FunctionHolderExpression and ValueVectorReadExpression put their
>     > > >>> generated code into the eval method's JBlock, not into our specific
>     > > >>> ifBlock, which is an inner block of the eval method's JBlock.
>     > > >>>
>     > > >>> So it seems I should either change the ClassGenerator so that
>     > > >>> getEvalBlock returns the ifBlock (more precisely, the JConditional's
>     > > >>> then block), or implement special FunctionHolderExpression and
>     > > >>> ValueVectorReadExpression classes with corresponding visitor methods in
>     > > >>> EvaluationVisitor to generate the special code blocks. I hope someone
>     > > >>> familiar with this part of the code can point out whether there are
>     > > >>> easier or different ways to achieve this.
>     > > >>>
>     > > >>> To make the discussion more concrete, here is the code generated by the
>     > > >>> previous setupGetBuild64Hash method:
>     > > >>>
>     > > >>>     public long getBuild64HashCodeInner(int incomingRowIdx, int seedValue, int fieldId)
>     > > >>>         throws SchemaChangeException
>     > > >>>     {
>     > > >>>         {
>     > > >>>             IntHolder fieldId16 = new IntHolder();
>     > > >>>             fieldId16 .value = fieldId;
>     > > >>>             if (fieldId16 .value == constant18 .value) {
>     > > >>>                 return out24 .value;
>     > > >>>             }
>     > > >>>             IntHolder out22 = new IntHolder();
>     > > >>>             {
>     > > >>>                 out22 .value = vv19 .getAccessor().get((incomingRowIdx));
>     > > >>>             }
>     > > >>>             IntHolder seedValue23 = new IntHolder();
>     > > >>>             seedValue23 .value = seedValue;
>     > > >>>             //---- start of eval portion of hash32AsDouble function. ----//
>     > > >>>             IntHolder out24 = new IntHolder();
>     > > >>>             {
>     > > >>>                 final IntHolder out = new IntHolder();
>     > > >>>                 IntHolder in = out22;
>     > > >>>                 IntHolder seed = seedValue23;
>     > > >>>
>     > > >>>                 Hash32WithSeedAsDouble$IntHash_eval: {
>     > > >>>                     out.value = org.apache.drill.exec.expr.fn.impl.HashHelper.hash32((double) in.value, seed.value);
>     > > >>>                 }
>     > > >>>
>     > > >>>                 out24 = out;
>     > > >>>             }
>     > > >>>             //---- end of eval portion of hash32AsDouble function. ----//
>     > > >>>             if (fieldId16 .value == constant18 .value) {
>     > > >>>                 return out26 .value;
>     > > >>>             }
>     > > >>>             IntHolder seedValue25 = new IntHolder();
>     > > >>>             seedValue25 .value = seedValue;
>     > > >>>             //---- start of eval portion of hash32AsDouble function. ----//
>     > > >>>             IntHolder out26 = new IntHolder();
>     > > >>>             {
>     > > >>>                 final IntHolder out = new IntHolder();
>     > > >>>                 IntHolder in = out22;
>     > > >>>                 IntHolder seed = seedValue25;
>     > > >>>
>     > > >>>                 Hash32WithSeedAsDouble$IntHash_eval: {
>     > > >>>                     out.value = org.apache.drill.exec.expr.fn.impl.HashHelper.hash32((double) in.value, seed.value);
>     > > >>>                 }
>     > > >>>
>     > > >>>                 out26 = out;
>     > > >>>             }
>     > > >>>             //---- end of eval portion of hash32AsDouble function. ----//
>     > > >>>             return 0;
>     > > >>>         }
>     > > >>>     }
>     > > >>>
>     > > >>> On Tue, May 29, 2018 at 10:51 AM Aman Sinha <amansinha@apache.org> wrote:
>     > > >>>
>     > > >>> > Sorry, the previous email was incomplete.
>     > > >>> > For the ifBlock, do you need an _else() block also?
>     > > >>> >
>     > > >>> > I have sometimes found that 'JConditional' is a good way to break down
>     > > >>> > the logic further. Please see example usages of JConditional here [1].
>     > > >>> >
>     > > >>> > -Aman
>     > > >>> >
>     > > >>> > [1]
>     > > >>> > https://urldefense.proofpoint.com/v2/url?u=https-3A__www.programcreek.com_java-2Dapi-2Dexamples_-3Fapi-3Dcom&d=DwIFaQ&c=cskdkSMqhcnjZxdQVpwTXg&r=EqulKDxxEDCX6zbp1AZAa1-iAPQGgCioAqgDp7DE2BU&m=doaiFF3edu9-prktKvLSIoNdmzt_nV6nzCtF_ZGQRBk&s=O2Th00tVjOSHTLlOn_lFp8JiUlh_FueCbHs8giRVS3k&e=
>     > > >>> > . sun.codemodel.JBlock
>     > > >>> >
>     > > >>> > On Mon, May 28, 2018 at 7:46 PM, Aman Sinha <amansinha@apache.org> wrote:
>     > > >>> >
>     > > >>> > > Hi Weijie,
>     > > >>> > > It would be a little cumbersome to debug such issues over email, since
>     > > >>> > > one has to look at the generated code output and debug iteratively.
>     > > >>> > > A couple of thoughts that might help:
>     > > >>> > >
>     > > >>> > > For this particular if-then block, should you also
>     > > >>> > > JBlock ifBlock =
>     > > >>> > > cg.getEvalBlock()._if(fieldIdParamHolder.getValue().eq(targetBuildSideFieldId))._then();
>     > > >>> > >
>     > > >>> > >
>     > > >>> > >
>     > > >>> > > On Mon, May 28, 2018 at 4:17 AM, weijie tong <tongweijie178@gmail.com> wrote:
>     > > >>> > >
>     > > >>> > >> Hi All:
>     > > >>> > >>  While implementing the JPPD feature (
>     > > >>> > >> https://urldefense.proofpoint.com/v2/url?u=https-3A__issues.apache.org_jira_browse_DRILL-2D6385&d=DwIFaQ&c=cskdkSMqhcnjZxdQVpwTXg&r=EqulKDxxEDCX6zbp1AZAa1-iAPQGgCioAqgDp7DE2BU&m=doaiFF3edu9-prktKvLSIoNdmzt_nV6nzCtF_ZGQRBk&s=FIkIkgR6E_qJADP1J55y11SgJZD8NyPaNv_AeTabiaY&e=),
>     > > >>> > >> I was blocked by this problem: how to get the hash code of each
>     > > >>> > >> build-side hash join column through dynamically generated Java code.
>     > > >>> > >> I hope someone can give some advice.
>     > > >>> > >>
>     > > >>> > >>    I plan to add the methods below to HashTableTemplate:
>     > > >>> > >>
>     > > >>> > >> public long getBuild64HashCode(int incomingRowIdx, int seedValue, int fieldId)
>     > > >>> > >>     throws SchemaChangeException {
>     > > >>> > >>   return getBuild64HashCodeInner(incomingRowIdx, seedValue, fieldId);
>     > > >>> > >> }
>     > > >>> > >>
>     > > >>> > >> protected abstract long getBuild64HashCodeInner(
>     > > >>> > >>     @Named("incomingRowIdx") int incomingRowIdx,
>     > > >>> > >>     @Named("seedValue") int seedValue,
>     > > >>> > >>     @Named("fieldId") int fieldId)
>     > > >>> > >>     throws SchemaChangeException;
>     > > >>> > >>
>     > > >>> > >>
>     > > >>> > >>    The high-level code that invokes the getBuild64HashCode method is in
>     > > >>> > >> HashJoinBatch's executeBuildPhase():
>     > > >>> > >>
>     > > >>> > >> // create runtime filter
>     > > >>> > >> if (cycleNum == 0 && enableRuntimeFilter) {
>     > > >>> > >>   // create runtime filter and send out async
>     > > >>> > >>   int condFieldIndex = 0;
>     > > >>> > >>   for (BloomFilter bloomFilter : bloomFilters) {
>     > > >>> > >>     // VV
>     > > >>> > >>     for (int ind = 0; ind < currentRecordCount; ind++) {
>     > > >>> > >>       // the signature is (incomingRowIdx, seedValue, fieldId); a seed of 0 is assumed here
>     > > >>> > >>       long hashCode = partitions[0].getBuild64HashCode(ind, 0, condFieldIndex);
>     > > >>> > >>       bloomFilter.insert(hashCode);
>     > > >>> > >>     }
>     > > >>> > >>     condFieldIndex++;
>     > > >>> > >>   }
>     > > >>> > >>   // TODO send out async
>     > > >>> > >> }
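The build loop can be sketched in plain Java to show its column-major order: the outer loop picks a key column and the inner loop walks the rows, so the fieldId dispatch inside the generated method goes the same way for a whole column (the branch-prediction argument made elsewhere in this thread). `BuildSideHasher` and `buildFilters()` are hypothetical; the interface only mirrors the shape of getBuild64HashCode(incomingRowIdx, seedValue, fieldId), and the `HashSet<Long>` is a stand-in for a real Bloom filter.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of the build-phase loop: one filter per join-key column, filled
// column by column. All names here are illustrative, not Drill's API.
final class BuildPhaseSketch {

  @FunctionalInterface
  interface BuildSideHasher {
    long getBuild64HashCode(int incomingRowIdx, int seedValue, int fieldId);
  }

  static List<Set<Long>> buildFilters(BuildSideHasher hasher, int numKeyColumns,
                                      int recordCount, int seedValue) {
    List<Set<Long>> filters = new ArrayList<>();
    for (int fieldId = 0; fieldId < numKeyColumns; fieldId++) {
      Set<Long> filter = new HashSet<>(); // stand-in for a Bloom filter
      // The inner loop stays on one column, so the fieldId branch inside the
      // hasher is taken the same way for every row of this pass.
      for (int row = 0; row < recordCount; row++) {
        filter.add(hasher.getBuild64HashCode(row, seedValue, fieldId));
      }
      filters.add(filter);
    }
    return filters;
  }
}
```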
>     > > >>> > >>
>     > > >>> > >>
>     > > >>> > >>  As you know, the abstract method getBuild64HashCodeInner needs to
>     > > >>> > >> calculate the hash code of the build-side column selected by the fieldId
>     > > >>> > >> input parameter. To achieve this, I plan to generate a separate code
>     > > >>> > >> section for each column's ValueVector, using an if statement on the
>     > > >>> > >> column id to select the right section. The method that generates the
>     > > >>> > >> dynamic code is below:
>     > > >>> > >>
>     > > >>> > >> private void setupGetBuild64Hash(ClassGenerator<HashTable> cg, MappingSet incomingMapping,
>     > > >>> > >>     VectorAccessible batch, LogicalExpression[] keyExprs, TypedFieldId[] buildKeyFieldIds)
>     > > >>> > >>     throws SchemaChangeException {
>     > > >>> > >>   cg.setMappingSet(incomingMapping);
>     > > >>> > >>   if (keyExprs == null || keyExprs.length == 0) {
>     > > >>> > >>     cg.getEvalBlock()._return(JExpr.lit(0));
>     > > >>> > >>     return; // no key columns: avoid looping over null and emitting a second "return 0"
>     > > >>> > >>   }
>     > > >>> > >>   String seedValue = "seedValue";
>     > > >>> > >>   String fieldId = "fieldId";
>     > > >>> > >>   LogicalExpression seed = ValueExpressions.getParameterExpression(seedValue,
>     > > >>> > >>       Types.required(TypeProtos.MinorType.INT));
>     > > >>> > >>
>     > > >>> > >>   LogicalExpression fieldIdParamExpr = ValueExpressions.getParameterExpression(fieldId,
>     > > >>> > >>       Types.required(TypeProtos.MinorType.INT));
>     > > >>> > >>   HoldingContainer fieldIdParamHolder = cg.addExpr(fieldIdParamExpr);
>     > > >>> > >>   int i = 0;
>     > > >>> > >>   for (LogicalExpression expr : keyExprs) {
>     > > >>> > >>     TypedFieldId targetTypeFieldId = buildKeyFieldIds[i];
>     > > >>> > >>     ValueExpressions.IntExpression targetBuildFieldIdExp = new ValueExpressions.IntExpression(
>     > > >>> > >>         targetTypeFieldId.getFieldIds()[0], ExpressionPosition.UNKNOWN);
>     > > >>> > >>     JFieldRef targetBuildSideFieldId = cg.addExpr(targetBuildFieldIdExp,
>     > > >>> > >>         ClassGenerator.BlkCreateMode.TRUE_IF_BOUND).getValue();
>     > > >>> > >>     JBlock ifBlock = cg.getEvalBlock()._if(
>     > > >>> > >>         fieldIdParamHolder.getValue().eq(targetBuildSideFieldId))._then();
>     > > >>> > >>
>     > > >>> > >>     LogicalExpression hashExpression =
>     > > >>> > >>         HashPrelUtil.getHashExpression(expr, seed, incomingProbe != null);
>     > > >>> > >>     LogicalExpression materializedExpr = ExpressionTreeMaterializer.materializeAndCheckErrors(
>     > > >>> > >>         hashExpression, batch, context.getFunctionRegistry());
>     > > >>> > >>     HoldingContainer hash = cg.addExpr(materializedExpr,
>     > > >>> > >>         ClassGenerator.BlkCreateMode.FALSE);
>     > > >>> > >>
>     > > >>> > >>     ifBlock._return(hash.getValue());
>     > > >>> > >>     i++;
>     > > >>> > >>   }
>     > > >>> > >>   cg.getEvalBlock()._return(JExpr.lit(0));
>     > > >>> > >> }
>     > > >>> > >>
>     > > >>> > >> But unfortunately, the generated code is not what I expected. The code
>     > > >>> > >> that reads the ValueVector and calculates the hash code of the read value
>     > > >>> > >> does not stay in the if block. So how can I make the related code stay in
>     > > >>> > >> the if block?
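For reference, a hand-written sketch of the shape the generated getBuild64HashCodeInner is meant to have: each column's vector read and hash both sit inside that column's fieldId branch, so a call for one column never touches the others. The arrays and `hash64()` are hypothetical stand-ins for the value vectors (vv15, ...) and for the murmur3_64-based hash; this is not Drill-generated code.

```java
// Hand-written illustration of the intended generated-method shape.
// keyCol0/keyCol1 stand in for value vectors; hash64 stands in for murmur3_64.
final class IntendedGeneratedShape {
  private final int[] keyCol0;
  private final long[] keyCol1;

  IntendedGeneratedShape(int[] keyCol0, long[] keyCol1) {
    this.keyCol0 = keyCol0;
    this.keyCol1 = keyCol1;
  }

  /** Stand-in 64-bit hash: xor-in the value, then a murmur3-style finalizer. */
  static long hash64(long value, long seed) {
    long z = seed ^ (value * 0x9E3779B97F4A7C15L);
    z = (z ^ (z >>> 33)) * 0xFF51AFD7ED558CCDL;
    z = (z ^ (z >>> 33)) * 0xC4CEB9FE1A85EC53L;
    return z ^ (z >>> 33);
  }

  long getBuild64HashCodeInner(int incomingRowIdx, int seedValue, int fieldId) {
    if (fieldId == 0) {
      int in = keyCol0[incomingRowIdx]; // read happens only for the requested column
      return hash64(in, seedValue);
    }
    if (fieldId == 1) {
      long in = keyCol1[incomingRowIdx];
      return hash64(in, seedValue);
    }
    return 0; // default, matching the "return 0" emitted by the setup method
  }
}
```

With this shape, an instance whose second column is empty can still serve requests for fieldId 0, because nothing reads the second column on that path.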
>     > > >>> > >>
>     > > >>> > >
>     > > >>> > >
>     > > >>> >
>     > > >>
>     > > >>
>     > >
>     >
>
>
>

Re: How to generate hash code for each build side one of the hash join columns

Posted by weijie tong <to...@gmail.com>.

Thanks Sorabh, I got it. Subsequent calls to next() keep sharing the same
RecordBatch objects. Only the data content of the RecordBatch (e.g. the vector
container) changes, as long as no schema change has happened.


On Fri, Jun 1, 2018 at 10:44 PM Sorabh Hamirwasia <sh...@mapr.com> wrote:

> Your understanding of SelectionVector2 is correct. Internally it holds
> a DrillBuf which stores all the indexes of interest.
>
>
> The reason the SV2 is set up only once in ProjectTemplate::setup is
> that, for subsequent next() calls from Project to Filter, each new
> incoming batch has the same SV2 Java object reference. Only the
> underlying buffer of the SV2 object changes, to store the new indexes for
> the new incoming batch. Setup is called again only after an OK_NEW_SCHEMA
> outcome is seen.
>
>
> Thanks,
> Sorabh
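A toy model of the reuse pattern described above, with hypothetical class names (not Drill's classes): the downstream operator captures the SV2 object once in `setup()`, and the upstream filter overwrites the same object's contents on every `next()`. A `char[]` stands in for the DrillBuf.

```java
// Toy model of SelectionVector2 reuse. SelectionVector2, Filter and Project
// here are illustrative stand-ins, not Drill's real classes.
final class Sv2ReuseSketch {

  static final class SelectionVector2 {
    private final char[] indexes = new char[1 << 16]; // 16-bit row indexes
    private int count;

    int getCount() { return count; }
    char getIndex(int i) { return indexes[i]; }
    void reset() { count = 0; }
    void add(int row) { indexes[count++] = (char) row; }
  }

  /** Upstream "filter": the same SV2 object every batch, new contents each time. */
  static final class Filter {
    final SelectionVector2 sv2 = new SelectionVector2();

    void next(int[] batch, int threshold) {
      sv2.reset();
      for (int row = 0; row < batch.length; row++) {
        if (batch[row] > threshold) {
          sv2.add(row);
        }
      }
    }
  }

  /** Downstream "project": grabs the SV2 reference once, in setup(). */
  static final class Project {
    private SelectionVector2 sv2;

    void setup(Filter incoming) { this.sv2 = incoming.sv2; }

    /** Reads the current batch through the (reused) selection vector. */
    long sumSelected(int[] batch) {
      long sum = 0;
      for (int i = 0; i < sv2.getCount(); i++) {
        sum += batch[sv2.getIndex(i)];
      }
      return sum;
    }
  }
}
```

As long as no new schema arrives, the reference captured in `setup()` stays valid; only the indexes it holds change from batch to batch.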
>
>
> ________________________________
> From: weijie tong <to...@gmail.com>
> Sent: Friday, June 1, 2018 7:27 AM
> To: dev@drill.apache.org
> Subject: Re: How to generate hash code for each build side one of the hash join columns
>
> Could someone explain how SelectionVector2 works? The code implementation
> confused me. I think it acts as an indirect index into the filtered
> RecordBatch. FilterRecordBatch filters the RecordBatch and stores the
> indexes of the qualifying rows in the SelectionVector2. ProjectRecordBatch
> uses the incoming RecordBatch's SelectionVector2 indexes to access the
> filtered RecordBatch data. So Filter and Project operate on the same
> in-memory batch, using the SelectionVector2 to access the actual data. If I
> am right, one case confuses me: the SelectionVector2 of ProjectRecordBatch is
> initialized only once, in ProjectTemplate's setup method. I hope someone
> can explain why ProjectTemplate does not get a fresh SelectionVector2 from
> the incoming batch (e.g. one generated by FilterRecordBatch) on every next()
> call.
>
> On Fri, Jun 1, 2018 at 5:14 PM weijie tong <to...@gmail.com> wrote:
>
> > I found the answer: a RecordBatch's max size is 2^16, as defined by
> > RecordBatch's MAX_BATCH_SIZE.
> >
> > On Fri, Jun 1, 2018 at 3:36 PM weijie tong <to...@gmail.com> wrote:
> >
> >> Some questions about SelectionVector2 and SelectionVector4:
> >>
> >> I want to create a SelectionVector4 or SelectionVector2 to represent the
> >> filtered ScanBatch, to avoid a memory copy. But I found that ProjectBatch
> >> does not support SelectionVector4, and SelectionVector2's record count is
> >> only char-sized (2 bytes). So why is SelectionVector4 not supported by
> >> ProjectBatch? The same question applies to FilterBatch's SelectionVector2,
> >> which also only supports a 2-byte record count.
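The arithmetic behind the 2-byte index: with MAX_BATCH_SIZE = 2^16 = 65536 (the value given above), row indexes run from 0 to 65535, exactly the range of Java's unsigned 16-bit `char`. The SV4 part of this sketch assumes a composite 32-bit index with the batch number in the upper 16 bits and the row in the lower 16 bits; treat that layout and the helper names as illustrative.

```java
// Arithmetic sketch of 2-byte (SV2-style) and 4-byte (SV4-style) indexes.
// The composite layout and all names are assumptions for illustration.
final class SelectionIndexSketch {
  static final int MAX_BATCH_SIZE = 1 << 16; // 65536 rows per batch

  /** A row index always fits in a char because it is below 2^16. */
  static char toSv2Index(int row) {
    if (row < 0 || row >= MAX_BATCH_SIZE) {
      throw new IllegalArgumentException("row out of range: " + row);
    }
    return (char) row;
  }

  /** Assumed composite layout: batch number high 16 bits, row low 16 bits. */
  static int toSv4Index(int batchIndex, int row) {
    return (batchIndex << 16) | toSv2Index(row);
  }

  static int batchOf(int sv4Index) { return sv4Index >>> 16; }

  static int rowOf(int sv4Index) { return sv4Index & 0xFFFF; }
}
```

For example, batch 3, row 65535 packs to (3 << 16) | 65535 = 262143 and unpacks back to the same pair.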
> >>
> >> On Fri, Jun 1, 2018 at 1:40 PM weijie tong <to...@gmail.com>
> >> wrote:
> >>
> >>> Hi Boaz:
> >>>
> >>>   Your proposal is valuable, though I have already implemented the
> >>> dynamic code-generation logic. If a ``` long hash64(int index, long seed) ```
> >>> method is added to ValueVector, it will also help others implement a
> >>> storage plugin's own filter logic using the pushed-down Bloom filter. For
> >>> HashJoin and HashAggregate, ValueVector will also need the methods
> >>> ```double hash32AsDouble(int index, int seed) ``` and
> >>> ```int hash32(int index, int seed)```. If no one objects, I will be
> >>> pleased to take on this work.
> >>>
> >>>    By the way, here are my thoughts on the scan side's filter logic using
> >>> the BloomFilter. The scan-side filter I intend to implement filters the
> >>> materialized ValueVector; it does not run while the ValueVector is being
> >>> constructed from the original storage-format data. The reason is that
> >>> doing the check during that construction would hurt the performance of
> >>> materializing the deep storage-format data into ValueVectors.
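A minimal sketch of the scan-side idea, assuming the proposed `hash64(int index, long seed)` method existed: the check runs over an already-materialized column and yields the surviving row indexes, rather than being woven into decoding the storage format. `HashableVector`, `IntColumnSketch`, and `ScanSideFilterSketch` are hypothetical names, the mixer is a SplitMix64-style finalizer standing in for Drill's hash functions, and the pushed-down Bloom filter is abstracted as a `LongPredicate`.

```java
import java.util.Arrays;
import java.util.function.LongPredicate;

/** Hypothetical version of the proposed per-vector hashing method. */
interface HashableVector {
  int getValueCount();

  long hash64(int index, long seed);
}

/** Hypothetical INT column whose values are already materialized. */
class IntColumnSketch implements HashableVector {
  private final int[] values;

  IntColumnSketch(int... values) {
    this.values = values;
  }

  public int getValueCount() {
    return values.length;
  }

  /** Stand-in 64-bit mixer (SplitMix64 finalizer), not Drill's murmur3 helper. */
  public long hash64(int index, long seed) {
    long z = values[index] ^ seed;
    z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
    z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
    return z ^ (z >>> 31);
  }
}

final class ScanSideFilterSketch {
  /** Returns the rows whose hash the pushed-down filter might contain. */
  static int[] survivingRows(HashableVector column, long seed, LongPredicate mightContain) {
    int[] out = new int[column.getValueCount()];
    int n = 0;
    for (int row = 0; row < column.getValueCount(); row++) {
      if (mightContain.test(column.hash64(row, seed))) {
        out[n++] = row; // keep the row index; the data itself is not copied
      }
    }
    return Arrays.copyOf(out, n);
  }
}
```

Because only row indexes come out, they could feed a selection vector instead of a copied batch, which is the memory-copy concern raised elsewhere in this thread.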
> >>>
> >>> On Fri, Jun 1, 2018 at 3:22 AM Boaz Ben-Zvi <bb...@mapr.com> wrote:
> >>>
> >>>>  Hi Weijie,
> >>>>
> >>>>     Another option is to avoid the generated code entirely.
> >>>> We were considering replacing the generated code used for computing
> >>>> hash values with “real Java” code.
> >>>>
> >>>> This idea is analogous to the copyEntry() method in the ValueVector
> >>>> interface (which Paul added last year).
> >>>> See an example of using copyEntry() (via appendRow() in
> >>>> VectorContainer) in the new Hash-Join-Spill code.
> >>>> There is basically no need to generate “type-specific” code, as the
> >>>> virtual copyEntry() method does the “type-specific” work.
> >>>>
> >>>> Similarly, we could have a hash64() method in ValueVector, which would
> >>>> perform the “type-specific” computation.
> >>>> (One difference from copyEntry(): hash64() would also need to take the
> >>>> “seed” parameter, which is the hash value produced by the previous
> >>>> hash.)
> >>>> And similar to appendRow(), there would be an evalHash() iterating over
> >>>> the key columns.
> >>>> (And one difference from appendRow(): it needs to iterate only over the
> >>>> key columns; these are the first columns, and their number can be found
> >>>> from the config, e.g., htConfig.getKeyExprsBuild().size().)
> >>>>
> >>>>    With such an implementation, evalHash() could be used anywhere
> >>>> (e.g., to match the Bloom filters on the left side of the join).
> >>>>
> >>>>        Thanks,
> >>>>
> >>>>              Boaz
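A minimal sketch of Boaz's suggestion, assuming a per-vector `hash64(row, seed)` method: a plain-Java `evalHash()` loops over only the first `keyCount` columns and feeds each column's hash in as the seed for the next, so the virtual call does the type-specific work and nothing has to be generated. `SeededColumnHash`, `LongColumn`, and `EvalHashSketch` are hypothetical, and the mixer stands in for Drill's hash functions.

```java
import java.util.List;

/** Hypothetical per-column hook in the spirit of the proposed ValueVector.hash64(). */
interface SeededColumnHash {
  long hash64(int row, long seed);
}

/** Hypothetical long-valued column; the mixer is a stand-in, not Drill's hash function. */
class LongColumn implements SeededColumnHash {
  private final long[] values;

  LongColumn(long... values) {
    this.values = values;
  }

  public long hash64(int row, long seed) {
    long z = values[row] ^ (seed * 0x9e3779b97f4a7c15L);
    z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
    z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
    return z ^ (z >>> 31);
  }
}

final class EvalHashSketch {
  /**
   * Hashes one row across the key columns only. The keys are assumed to be the
   * first keyCount columns (cf. htConfig.getKeyExprsBuild().size() above); each
   * column's result becomes the seed for the next column.
   */
  static long evalHash(List<? extends SeededColumnHash> columns, int keyCount, int row, long seed) {
    long hash = seed;
    for (int i = 0; i < keyCount; i++) {
      hash = columns.get(i).hash64(row, hash);
    }
    return hash;
  }
}
```

Since the payload columns after the keys are never visited, two rows with equal keys hash identically regardless of their other values.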
> >>>>
> >>>>
> >>>> On 5/30/18, 7:49 PM, "weijie tong" <to...@gmail.com> wrote:
> >>>>
> >>>>     Hi Aman:
> >>>>
> >>>>       Thanks for your tips. I have rebased onto the latest code from the
> >>>>     master branch. Yes, the spill-to-disk feature did change the original
> >>>>     implementation, and I have adjusted my implementation to the new
> >>>>     feature. But as you say, integration will be somewhat challenging, as
> >>>>     I noticed the spill-to-disk feature will continue to tune its
> >>>>     implementation for performance.
> >>>>
> >>>>       The BloomFilter is implemented natively in Drill, not with an
> >>>>     external library. It implements the algorithm from the paper you
> >>>>     mentioned.
> >>>>
> >>>>
> >>>>     On Thu, May 31, 2018 at 1:56 AM Aman Sinha <am...@apache.org> wrote:
> >>>>
> >>>>     > Hi Weijie,
> >>>>     > I was hoping you could leverage the existing methods, so it's good
> >>>>     > that you found the ones that work for your use case.
> >>>>     > One thing I want to point out (maybe you're already aware): the
> >>>>     > Hash Join code has changed significantly in the master branch due
> >>>>     > to the spill-to-disk feature.
> >>>>     > So this may pose some integration challenges for your run-time
> >>>>     > join pushdown feature.
> >>>>     > Also, one other question/clarification: for the bloom filter
> >>>>     > itself, are you implementing it natively in Drill or using an
> >>>>     > external library?
> >>>>     >
> >>>>     > -Aman
> >>>>     >
> >>>>     > On Tue, May 29, 2018 at 8:23 PM, weijie tong <tongweijie178@gmail.com>
> >>>>     > wrote:
> >>>>     >
> >>>>     > > I found ClassGenerator's nestEvalBlock(JBlock block) and
> >>>>     > > unNestEvalBlock(), which have the same effect as my change to
> >>>>     > > ClassGenerator. So I have dropped my change to ClassGenerator,
> >>>>     > > and I hope this can help someone else.
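To illustrate what `nestEvalBlock()`/`unNestEvalBlock()` (and the customized-block field described later in the thread) accomplish, here is a toy model, not Drill's `ClassGenerator`: the generator keeps a stack of nested blocks, `getEvalBlock()` returns the innermost one, and every `addExpr()` lands there. In this toy, `nestEvalBlock` takes a header string and creates the child block itself, whereas the real method takes a `JBlock`; all names here are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy code generator whose emit target can be redirected into a nested block. */
class ToyGenerator {
  /** A block of generated statements; nested blocks are rendered inline. */
  static final class Block {
    final String header;                          // e.g. "if (fieldId == 0)", or "" for the body
    final List<Object> items = new ArrayList<>(); // String statements or nested Blocks

    Block(String header) {
      this.header = header;
    }

    String render(String indent) {
      StringBuilder sb = new StringBuilder(indent).append(header)
          .append(header.isEmpty() ? "{\n" : " {\n");
      for (Object item : items) {
        if (item instanceof Block) {
          sb.append(((Block) item).render(indent + "  "));
        } else {
          sb.append(indent).append("  ").append(item).append('\n');
        }
      }
      return sb.append(indent).append("}\n").toString();
    }
  }

  private final Block evalBody = new Block("");
  private final Deque<Block> nested = new ArrayDeque<>();

  /** Where expression code goes: the innermost nested block, else the method body. */
  Block getEvalBlock() {
    return nested.isEmpty() ? evalBody : nested.peek();
  }

  /** Opens a child block inside the current target and makes it the new target. */
  Block nestEvalBlock(String header) {
    Block child = new Block(header);
    getEvalBlock().items.add(child);
    nested.push(child);
    return child;
  }

  /** Restores the previous emit target. */
  void unNestEvalBlock() {
    nested.pop();
  }

  /** Stand-in for addExpr(): always emits into whatever getEvalBlock() returns. */
  void addExpr(String statement) {
    getEvalBlock().items.add(statement);
  }

  String render() {
    return evalBody.render("");
  }
}
```

Nesting before each column's expressions and un-nesting afterwards keeps that column's read-and-hash code inside its own `if`; calling `addExpr()` without nesting puts it in the method body, which is the problem described in this thread.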
> >>>>     > >
> >>>>     > > On Tue, May 29, 2018 at 1:53 PM weijie tong <tongweijie178@gmail.com>
> >>>>     > > wrote:
> >>>>     > >
> >>>>     > > > The code formatting was not good, so here it is again:
> >>>>     > > >
> >>>>     > > > private void setupGetBuild64Hash(ClassGenerator<HashTable> cg,
> >>>>     > > >     MappingSet incomingMapping, VectorAccessible batch,
> >>>>     > > >     LogicalExpression[] keyExprs, TypedFieldId[] buildKeyFieldIds)
> >>>>     > > >     throws SchemaChangeException {
> >>>>     > > >   cg.setMappingSet(incomingMapping);
> >>>>     > > >   if (keyExprs == null || keyExprs.length == 0) {
> >>>>     > > >     cg.getEvalBlock()._return(JExpr.lit(0));
> >>>>     > > >     return; // no key columns: avoid emitting a second "return 0"
> >>>>     > > >   }
> >>>>     > > >   String seedValue = "seedValue";
> >>>>     > > >   String fieldId = "fieldId";
> >>>>     > > >   LogicalExpression seed = ValueExpressions.getParameterExpression(
> >>>>     > > >       seedValue, Types.required(TypeProtos.MinorType.INT));
> >>>>     > > >
> >>>>     > > >   LogicalExpression fieldIdParamExpr = ValueExpressions.getParameterExpression(
> >>>>     > > >       fieldId, Types.required(TypeProtos.MinorType.INT));
> >>>>     > > >   HoldingContainer fieldIdParamHolder = cg.addExpr(fieldIdParamExpr);
> >>>>     > > >   int i = 0;
> >>>>     > > >   for (LogicalExpression expr : keyExprs) {
> >>>>     > > >     TypedFieldId targetTypeFieldId = buildKeyFieldIds[i];
> >>>>     > > >     ValueExpressions.IntExpression targetBuildFieldIdExp = new ValueExpressions.IntExpression(
> >>>>     > > >         targetTypeFieldId.getFieldIds()[0], ExpressionPosition.UNKNOWN);
> >>>>     > > >
> >>>>     > > >     JFieldRef targetBuildSideFieldId = cg.addExpr(targetBuildFieldIdExp,
> >>>>     > > >         ClassGenerator.BlkCreateMode.TRUE_IF_BOUND).getValue();
> >>>>     > > >     JBlock ifBlock = cg.getEvalBlock()
> >>>>     > > >         ._if(fieldIdParamHolder.getValue().eq(targetBuildSideFieldId))._then();
> >>>>     > > >     // Register the inner "then" block with the ClassGenerator so that
> >>>>     > > >     // getEvalBlock() returns it instead of the method's eval block.
> >>>>     > > >     cg.setCustomizedEvalInnerBlock(ifBlock);
> >>>>     > > >     LogicalExpression hashExpression =
> >>>>     > > >         HashPrelUtil.getHashExpression(expr, seed, incomingProbe != null);
> >>>>     > > >     LogicalExpression materializedExpr = ExpressionTreeMaterializer
> >>>>     > > >         .materializeAndCheckErrors(hashExpression, batch, context.getFunctionRegistry());
> >>>>     > > >     HoldingContainer hash = cg.addExpr(materializedExpr,
> >>>>     > > >         ClassGenerator.BlkCreateMode.TRUE_IF_BOUND);
> >>>>     > > >     ifBlock._return(hash.getValue());
> >>>>     > > >     // Reset the customized block to null so getEvalBlock() returns
> >>>>     > > >     // the real eval block again.
> >>>>     > > >     cg.setCustomizedEvalInnerBlock(null);
> >>>>     > > >     i++;
> >>>>     > > >   }
> >>>>     > > >   cg.getEvalBlock()._return(JExpr.lit(0));
> >>>>     > > > }
> >>>>     > > >
> >>>>     > > > public long getBuild64HashCodeInner(int incomingRowIdx, int seedValue, int fieldId)
> >>>>     > > >     throws SchemaChangeException
> >>>>     > > > {
> >>>>     > > >     {
> >>>>     > > >         IntHolder fieldId12 = new IntHolder();
> >>>>     > > >         fieldId12 .value = fieldId;
> >>>>     > > >         if (fieldId12 .value == constant14 .value) {
> >>>>     > > >             IntHolder out18 = new IntHolder();
> >>>>     > > >             {
> >>>>     > > >                 out18 .value = vv15 .getAccessor().get((incomingRowIdx));
> >>>>     > > >             }
> >>>>     > > >             IntHolder seedValue19 = new IntHolder();
> >>>>     > > >             seedValue19 .value = seedValue;
> >>>>     > > >             //---- start of eval portion of hash32AsDouble function. ----//
> >>>>     > > >             IntHolder out20 = new IntHolder();
> >>>>     > > >             {
> >>>>     > > >                 final IntHolder out = new IntHolder();
> >>>>     > > >                 IntHolder in = out18;
> >>>>     > > >                 IntHolder seed = seedValue19;
> >>>>     > > >
> >>>>     > > >                 Hash32WithSeedAsDouble$IntHash_eval: {
> >>>>     > > >                     out.value = org.apache.drill.exec.expr.fn.impl.HashHelper.hash32((double) in.value, seed.value);
> >>>>     > > >                 }
> >>>>     > > >
> >>>>     > > >                 out20 = out;
> >>>>     > > >             }
> >>>>     > > >             //---- end of eval portion of hash32AsDouble function. ----//
> >>>>     > > >             return out20 .value;
> >>>>     > > >         }
> >>>>     > > >         return 0;
> >>>>     > > >     }
> >>>>     > > > }
> >>>>     > > >
> >>>>     > > > On Tue, May 29, 2018 at 1:47 PM weijie tong <tongweijie178@gmail.com>
> >>>>     > > > wrote:
> >>>>     > > >
> >>>>     > > >> Hi Paul:
> >>>>     > > >>
> >>>>     > > >>  Thanks for your enthusiasm. I have already picked up this
> >>>>     > > >> technique, as you once mentioned it to me in another mail thread.
> >>>>     > > >> It's really helpful; thanks for your valuable work.
> >>>>     > > >>
> >>>>     > > >>   I have now solved this tough problem by adding a customized
> >>>>     > > >> JBlock member field to ClassGenerator. When you want the
> >>>>     > > >> ClassGenerator's getEvalBlock() to return an inner customized
> >>>>     > > >> JBlock, you set this member; when you want the method to return
> >>>>     > > >> the eval block itself, you reset this member to null.
> >>>>     > > >>
> >>>>     > > >>   Here is my modified setup method:
> >>>>     > > >>
> >>>>     > > >>
> >>>>     > > >> private void setupGetBuild64Hash(ClassGenerator<HashTable> cg,
> >>>>     > > >>     MappingSet incomingMapping, VectorAccessible batch,
> >>>>     > > >>     LogicalExpression[] keyExprs, TypedFieldId[] buildKeyFieldIds)
> >>>>     > > >>     throws SchemaChangeException {
> >>>>     > > >>   cg.setMappingSet(incomingMapping);
> >>>>     > > >>   if (keyExprs == null || keyExprs.length == 0) {
> >>>>     > > >>     cg.getEvalBlock()._return(JExpr.lit(0));
> >>>>     > > >>     return; // no key columns: avoid emitting a second "return 0"
> >>>>     > > >>   }
> >>>>     > > >>   String seedValue = "seedValue";
> >>>>     > > >>   String fieldId = "fieldId";
> >>>>     > > >>   LogicalExpression seed = ValueExpressions.getParameterExpression(
> >>>>     > > >>       seedValue, Types.required(TypeProtos.MinorType.INT));
> >>>>     > > >>
> >>>>     > > >>   LogicalExpression fieldIdParamExpr = ValueExpressions.getParameterExpression(
> >>>>     > > >>       fieldId, Types.required(TypeProtos.MinorType.INT));
> >>>>     > > >>   HoldingContainer fieldIdParamHolder = cg.addExpr(fieldIdParamExpr);
> >>>>     > > >>   int i = 0;
> >>>>     > > >>   for (LogicalExpression expr : keyExprs) {
> >>>>     > > >>     TypedFieldId targetTypeFieldId = buildKeyFieldIds[i];
> >>>>     > > >>     ValueExpressions.IntExpression targetBuildFieldIdExp = new ValueExpressions.IntExpression(
> >>>>     > > >>         targetTypeFieldId.getFieldIds()[0], ExpressionPosition.UNKNOWN);
> >>>>     > > >>
> >>>>     > > >>     JFieldRef targetBuildSideFieldId = cg.addExpr(targetBuildFieldIdExp,
> >>>>     > > >>         ClassGenerator.BlkCreateMode.TRUE_IF_BOUND).getValue();
> >>>>     > > >>     JBlock ifBlock = cg.getEvalBlock()
> >>>>     > > >>         ._if(fieldIdParamHolder.getValue().eq(targetBuildSideFieldId))._then();
> >>>>     > > >>     // Register the inner "then" block with the ClassGenerator so that
> >>>>     > > >>     // getEvalBlock() returns it instead of the method's eval block.
> >>>>     > > >>     cg.setCustomizedEvalInnerBlock(ifBlock);
> >>>>     > > >>     LogicalExpression hashExpression =
> >>>>     > > >>         HashPrelUtil.getHashExpression(expr, seed, incomingProbe != null);
> >>>>     > > >>     LogicalExpression materializedExpr = ExpressionTreeMaterializer
> >>>>     > > >>         .materializeAndCheckErrors(hashExpression, batch, context.getFunctionRegistry());
> >>>>     > > >>     HoldingContainer hash = cg.addExpr(materializedExpr,
> >>>>     > > >>         ClassGenerator.BlkCreateMode.TRUE_IF_BOUND);
> >>>>     > > >>     ifBlock._return(hash.getValue());
> >>>>     > > >>     // Reset the customized block to null so getEvalBlock() returns
> >>>>     > > >>     // the real eval block again.
> >>>>     > > >>     cg.setCustomizedEvalInnerBlock(null);
> >>>>     > > >>     i++;
> >>>>     > > >>   }
> >>>>     > > >>   cg.getEvalBlock()._return(JExpr.lit(0));
> >>>>     > > >> }
> >>>>     > > >>
> >>>>     > > >>
> >>>>     > > >>  The corresponding generated code:
> >>>>     > > >>
> >>>>     > > >>     public long getBuild64HashCodeInner(int incomingRowIdx, int seedValue, int fieldId)
> >>>>     > > >>         throws SchemaChangeException
> >>>>     > > >>     {
> >>>>     > > >>         {
> >>>>     > > >>             IntHolder fieldId12 = new IntHolder();
> >>>>     > > >>             fieldId12 .value = fieldId;
> >>>>     > > >>             if (fieldId12 .value == constant14 .value) {
> >>>>     > > >>                 IntHolder out18 = new IntHolder();
> >>>>     > > >>                 {
> >>>>     > > >>                     out18 .value = vv15 .getAccessor().get((incomingRowIdx));
> >>>>     > > >>                 }
> >>>>     > > >>                 IntHolder seedValue19 = new IntHolder();
> >>>>     > > >>                 seedValue19 .value = seedValue;
> >>>>     > > >>                 //---- start of eval portion of hash32AsDouble function. ----//
> >>>>     > > >>                 IntHolder out20 = new IntHolder();
> >>>>     > > >>                 {
> >>>>     > > >>                     final IntHolder out = new IntHolder();
> >>>>     > > >>                     IntHolder in = out18;
> >>>>     > > >>                     IntHolder seed = seedValue19;
> >>>>     > > >>
> >>>>     > > >>                     Hash32WithSeedAsDouble$IntHash_eval: {
> >>>>     > > >>                         out.value = org.apache.drill.exec.expr.fn.impl.HashHelper.hash32((double) in.value, seed.value);
> >>>>     > > >>                     }
> >>>>     > > >>
> >>>>     > > >>                     out20 = out;
> >>>>     > > >>                 }
> >>>>     > > >>                 //---- end of eval portion of hash32AsDouble function. ----//
> >>>>     > > >>                 return out20 .value;
> >>>>     > > >>             }
> >>>>     > > >>             return  0;
> >>>>     > > >>         }
> >>>>     > > >>     }
> >>>>     > > >>
> >>>>     > > >>
> >>>>     > > >>
> >>>>     > > >>   Some further explanation:
> >>>>     > > >>   1st: The if check won't hurt performance, because I invoke this
> >>>>     > > >> method column by column, so the branch is easy to predict.
> >>>>     > > >>   2nd: I will use murmur3_64, not murmur3_32, since the efficient
> >>>>     > > >> Bloom filter algorithm needs a 64-bit hash code to avoid
> >>>>     > > >> collisions.
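A minimal sketch of why one 64-bit hash per key is convenient for a Bloom filter: its two 32-bit halves can act as two independent hashes, and the k probe positions are derived by double hashing, g_i = h1 + i * h2 (the Kirsch-Mitzenmacher technique). This is an illustrative filter, not Drill's native `BloomFilter`, whose exact algorithm the thread does not spell out; the `insert(long)` name only mirrors the `bloomFilter.insert(hashCode)` call in the build-phase loop quoted in this thread.

```java
/** Illustrative Bloom filter driven by a single 64-bit hash per key. */
class BloomFilter64Sketch {
  private final long[] words;
  private final int numBits;
  private final int numHashes;

  BloomFilter64Sketch(int numBits, int numHashes) {
    if (numBits <= 0 || numHashes <= 0) {
      throw new IllegalArgumentException("numBits and numHashes must be positive");
    }
    this.numBits = numBits;
    this.numHashes = numHashes;
    this.words = new long[(numBits + 63) >>> 6];
  }

  /** Bit position of the i-th probe, built from the two 32-bit halves of hash64. */
  private int probe(long hash64, int i) {
    int h1 = (int) hash64;          // low 32 bits
    int h2 = (int) (hash64 >>> 32); // high 32 bits
    int combined = h1 + i * h2;     // double hashing; int overflow is harmless here
    if (combined < 0) {
      combined = ~combined;         // map to a non-negative value
    }
    return combined % numBits;
  }

  void insert(long hash64) {
    for (int i = 1; i <= numHashes; i++) {
      int bit = probe(hash64, i);
      words[bit >>> 6] |= 1L << bit; // Java masks a long shift count to its low 6 bits
    }
  }

  /** false means "definitely absent"; true means "possibly present". */
  boolean mightContain(long hash64) {
    for (int i = 1; i <= numHashes; i++) {
      int bit = probe(hash64, i);
      if ((words[bit >>> 6] & (1L << bit)) == 0) {
        return false;
      }
    }
    return true;
  }
}
```

With a 32-bit hash, h2 would have to come from a second hash computation. If h2 happens to be 0, all probes collapse to one bit, a known weakness that a production filter would guard against.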
> >>>>     > > >>
> >>>>     > > >> On Tue, May 29, 2018 at 12:37 PM Paul Rogers <par0328@yahoo.com.invalid>
> >>>>     > > >> wrote:
> >>>>     > > >>
> >>>>     > > >>> Hi Weijie,
> >>>>     > > >>>
> >>>>     > > >>> Seeing the discussion about the details of JCodeModel suggests you
> >>>>     > > >>> may be trying to debug your generated code at the level of the
> >>>>     > > >>> code generator.
> >>>>     > > >>>
> >>>>     > > >>> Some time ago we added the ability to step through the generated
> >>>>     > > >>> code. Look for the following line in the generator code:
> >>>>     > > >>>
> >>>>     > > >>>
> >>>>     > > >>>     // Uncomment out this line to debug the generated code.
> >>>>     > > >>>
> >>>>     > > >>> //    cg.saveCodeForDebugging(true);
> >>>>     > > >>>
> >>>>     > > >>>
> >>>>     > > >>> Uncomment the code line and Drill will save each generated file to
> >>>>     > > >>> a configured location (which, if I recall correctly, is
> >>>>     > > >>> /tmp/drill/codegen, though it may have changed after Tim's test
> >>>>     > > >>> directory changes).
> >>>>     > > >>>
> >>>>     > > >>> Then, set a breakpoint in the template setup() method and you can
> >>>>     > > >>> step directly into the generated doSetup() method. The same goes
> >>>>     > > >>> for the eval() method.
> >>>>     > > >>>
> >>>>     > > >>> This way, you can not only see the generated code, you can step
> >>>>     > > >>> through it. I've found this to be a far easier way to understand
> >>>>     > > >>> the generated code than the older techniques folks have used
> >>>>     > > >>> (looking at byte codes, using print statements, brute-force
> >>>>     > > >>> reasoning, etc.)
> >>>>     > > >>>
> >>>>     > > >>> Tim, Boaz and others have used this technique more recently and
> >>>>     > > >>> can probably give you additional pointers.
> >>>>     > > >>>
> >>>>     > > >>> Thanks,
> >>>>     > > >>> - Paul
> >>>>     > > >>>
> >>>>     > > >>>
> >>>>     > > >>>
> >>>>     > > >>>     On Monday, May 28, 2018, 8:52:19 PM PDT, weijie tong <tongweijie178@gmail.com> wrote:
> >>>>     > > >>>
> >>>>     > > >>>  @aman thanks for your reply. "For the ifBlock, do you need an
> >>>>     > > >>> _else() block also?" I provide a default return at the end of the
> >>>>     > > >>> method, so I don't need the _else() block. I have noticed that the
> >>>>     > > >>> IfExpression evaluation method in EvaluationVisitor also uses
> >>>>     > > >>> JConditional, but that doesn't match my requirement either. I
> >>>>     > > >>> think the key point here is that FunctionHolderExpression and
> >>>>     > > >>> ValueVectorReadExpression put their generated code into the eval
> >>>>     > > >>> method's JBlock, not into our specific ifBlock, which is an inner
> >>>>     > > >>> block of the eval method's JBlock.
> >>>>     > > >>>
> >>>>     > > >>> So it seems I should either change ClassGenerator so that
> >>>>     > > >>> getEvalBlock() returns the ifBlock (more accurately, the
> >>>>     > > >>> JConditional's then block), or implement a special
> >>>>     > > >>> FunctionHolderExpression and ValueVectorReadExpression, with
> >>>>     > > >>> corresponding visiting methods in EvaluationVisitor, to generate
> >>>>     > > >>> the special code blocks. I hope someone familiar with this part of
> >>>>     > > >>> the code can point out whether there are easier or different ways
> >>>>     > > >>> to achieve this.
> >>>>     > > >>>
> >>>>     > > >>> To make the discussion more concrete, here is the code generated
> >>>>     > > >>> by the previous setupGetBuild64Hash method:
> >>>>     > > >>>
> >>>>     > > >>>     public long getBuild64HashCodeInner(int incomingRowIdx, int seedValue, int fieldId)
> >>>>     > > >>>         throws SchemaChangeException
> >>>>     > > >>>     {
> >>>>     > > >>>         {
> >>>>     > > >>>             IntHolder fieldId16 = new IntHolder();
> >>>>     > > >>>             fieldId16 .value = fieldId;
> >>>>     > > >>>             if (fieldId16 .value == constant18 .value) {
> >>>>     > > >>>                 return out24 .value;
> >>>>     > > >>>             }
> >>>>     > > >>>             IntHolder out22 = new IntHolder();
> >>>>     > > >>>             {
> >>>>     > > >>>                 out22 .value = vv19 .getAccessor().get((incomingRowIdx));
> >>>>     > > >>>             }
> >>>>     > > >>>             IntHolder seedValue23 = new IntHolder();
> >>>>     > > >>>             seedValue23 .value = seedValue;
> >>>>     > > >>>             //---- start of eval portion of hash32AsDouble function. ----//
> >>>>     > > >>>             IntHolder out24 = new IntHolder();
> >>>>     > > >>>             {
> >>>>     > > >>>                 final IntHolder out = new IntHolder();
> >>>>     > > >>>                 IntHolder in = out22;
> >>>>     > > >>>                 IntHolder seed = seedValue23;
> >>>>     > > >>>
> >>>>     > > >>>                 Hash32WithSeedAsDouble$IntHash_eval: {
> >>>>     > > >>>                     out.value = org.apache.drill.exec.expr.fn.impl.HashHelper.hash32((double) in.value, seed.value);
> >>>>     > > >>>                 }
> >>>>     > > >>>
> >>>>     > > >>>                 out24 = out;
> >>>>     > > >>>             }
> >>>>     > > >>>             //---- end of eval portion of hash32AsDouble function. ----//
> >>>>     > > >>>             if (fieldId16 .value == constant18 .value) {
> >>>>     > > >>>                 return out26 .value;
> >>>>     > > >>>             }
> >>>>     > > >>>             IntHolder seedValue25 = new IntHolder();
> >>>>     > > >>>             seedValue25 .value = seedValue;
> >>>>     > > >>>             //---- start of eval portion of hash32AsDouble function. ----//
> >>>>     > > >>>             IntHolder out26 = new IntHolder();
> >>>>     > > >>>             {
> >>>>     > > >>>                 final IntHolder out = new IntHolder();
> >>>>     > > >>>                 IntHolder in = out22;
> >>>>     > > >>>                 IntHolder seed = seedValue25;
> >>>>     > > >>>
> >>>>     > > >>>                 Hash32WithSeedAsDouble$IntHash_eval: {
> >>>>     > > >>>                     out.value = org.apache.drill.exec.expr.fn.impl.HashHelper.hash32((double) in.value, seed.value);
> >>>>     > > >>>                 }
> >>>>     > > >>>
> >>>>     > > >>>                 out26 = out;
> >>>>     > > >>>             }
> >>>>     > > >>>             //---- end of eval portion of hash32AsDouble function. ----//
> >>>>     > > >>>             return  0;
> >>>>     > > >>>         }
> >>>>     > > >>>     }
> >>>>     > > >>>
> >>>>     > > >>> On Tue, May 29, 2018 at 10:51 AM Aman Sinha <amansinha@apache.org>
> >>>>     > > >>> wrote:
> >>>>     > > >>>
> >>>>     > > >>> > Sorry, the previous email was incomplete.
> >>>>     > > >>> > For the ifBlock, do you need an _else() block also?
> >>>>     > > >>> >
> >>>>     > > >>> > I have sometimes found that 'JConditional' is a good way to
> >>>>     > > >>> > break down the logic further. Please see example usages of
> >>>>     > > >>> > JConditional here [1].
> >>>>     > > >>> >
> >>>>     > > >>> > -Aman
> >>>>     > > >>> >
> >>>>     > > >>> > [1] https://www.programcreek.com/java-api-examples/?api=com.sun.codemodel.JBlock
> >>>>     > > >>> >
> >>>>     > > >>> > On Mon, May 28, 2018 at 7:46 PM, Aman Sinha <amansinha@apache.org>
> >>>>     > > >>> > wrote:
> >>>>     > > >>> >
> >>>>     > > >>> > > Hi Weijie,
> >>>>     > > >>> > > It would be a little cumbersome to debug such issues over
> >>>>     > > >>> > > email, since one has to look at the generated code output and
> >>>>     > > >>> > > debug iteratively.
> >>>>     > > >>> > > A couple of thoughts I have that might help:
> >>>>     > > >>> > >
> >>>>     > > >>> > > For this particular if-then block, should you also
> >>>>     > > >>> > > JBlock ifBlock =
> >>>>     > > >>> > > cg.getEvalBlock()._if(fieldIdParamHolder.getValue().eq(targetBuildSideFieldId))._then();
> >>>>     > > >>> > >
> >>>>     > > >>> > >
> >>>>     > > >>> > >
> >>>>     > > >>> > > On Mon, May 28, 2018 at 4:17 AM, weijie tong <tongweijie178@gmail.com>
> >>>>     > > >>> > > wrote:
> >>>>     > > >>> > >
> >>>>     > > >>> > >> Hi All:
> >>>>     > > >>> > >>  While implementing the JPPD feature
> >>>>     > > >>> > >> (https://issues.apache.org/jira/browse/DRILL-6385), I got
> >>>>     > > >>> > >> blocked by this problem: how to get the hash code of each
> >>>>     > > >>> > >> build-side hash join column through dynamically generated Java
> >>>>     > > >>> > >> code. I hope someone can give some advice.
> >>>>     > > >>> > >>
> >>>>     > > >>> > >>    I intend to add the methods below to HashTableTemplate:
> >>>>     > > >>> > >>
> >>>>     > > >>> > >> public long getBuild64HashCode(int incomingRowIdx, int seedValue,
> >>>>     > > >>> > >>     int fieldId) throws SchemaChangeException {
> >>>>     > > >>> > >>   return getBuild64HashCodeInner(incomingRowIdx, seedValue, fieldId);
> >>>>     > > >>> > >> }
> >>>>     > > >>> > >>
> >>>>     > > >>> > >> protected abstract long getBuild64HashCodeInner(
> >>>>     > > >>> > >>     @Named("incomingRowIdx") int incomingRowIdx,
> >>>>     > > >>> > >>     @Named("seedValue") int seedValue,
> >>>>     > > >>> > >>     @Named("fieldId") int fieldId)
> >>>>     > > >>> > >>     throws SchemaChangeException;
> >>>>     > > >>> > >>
> >>>>     > > >>> > >>
> >>>>     > > >>> > >>    The high-level code that invokes getBuild64HashCode is in
> >>>>     > > >>> > >> HashJoinBatch's executeBuildPhase():
> >>>>     > > >>> > >>
> >>>>     > > >>> > >> // create runtime filter
> >>>>     > > >>> > >> if (cycleNum == 0 && enableRuntimeFilter) {
> >>>>     > > >>> > >>   // create runtime filter and send out async
> >>>>     > > >>> > >>   int condFieldIndex = 0;
> >>>>     > > >>> > >>   for (BloomFilter bloomFilter : bloomFilters) {
> >>>>     > > >>> > >>     // VV
> >>>>     > > >>> > >>     for (int ind = 0; ind < currentRecordCount; ind++) {
> >>>>     > > >>> > >>       long hashCode = partitions[0].getBuild64HashCode(ind,
> >>>>     > > >>> > >>           condFieldIndex);
> >>>>     > > >>> > >>       bloomFilter.insert(hashCode);
> >>>>     > > >>> > >>     }
> >>>>     > > >>> > >>     condFieldIndex++;
> >>>>     > > >>> > >>   }
> >>>>     > > >>> > >>   // TODO send out async
> >>>>     > > >>> > >> }
> >>>>     > > >>> > >>
> >>>>     > > >>> > >>
> >>>>     > > >>> > >>  As you know, the abstract method getBuild64HashCodeInner needs
> >>>>     > > >>> > >> to calculate the hash code of the build-side column selected by
> >>>>     > > >>> > >> the fieldId input parameter. To achieve this, I plan to generate
> >>>>     > > >>> > >> a separate code section for each column's ValueVector, using an
> >>>>     > > >>> > >> if statement on the column's id to select the right section. The
> >>>>     > > >>> > >> corresponding method that generates the dynamic code is below:
> >>>>     > > >>> > >>
> >>>>     > > >>> > >> private void setupGetBuild64Hash(ClassGenerator<HashTable> cg,
> >>>>     > > >>> > >>     MappingSet incomingMapping, VectorAccessible batch,
> >>>>     > > >>> > >>     LogicalExpression[] keyExprs, TypedFieldId[] buildKeyFieldIds)
> >>>>     > > >>> > >>     throws SchemaChangeException {
> >>>>     > > >>> > >>   cg.setMappingSet(incomingMapping);
> >>>>     > > >>> > >>   if (keyExprs == null || keyExprs.length == 0) {
> >>>>     > > >>> > >>     cg.getEvalBlock()._return(JExpr.lit(0));
> >>>>     > > >>> > >>     return; // no key columns: avoid emitting a second "return 0"
> >>>>     > > >>> > >>   }
> >>>>     > > >>> > >>   String seedValue = "seedValue";
> >>>>     > > >>> > >>   String fieldId = "fieldId";
> >>>>     > > >>> > >>   LogicalExpression seed = ValueExpressions.getParameterExpression(
> >>>>     > > >>> > >>       seedValue, Types.required(TypeProtos.MinorType.INT));
> >>>>     > > >>> > >>
> >>>>     > > >>> > >>   LogicalExpression fieldIdParamExpr = ValueExpressions.getParameterExpression(
> >>>>     > > >>> > >>       fieldId, Types.required(TypeProtos.MinorType.INT));
> >>>>     > > >>> > >>   HoldingContainer fieldIdParamHolder = cg.addExpr(fieldIdParamExpr);
> >>>>     > > >>> > >>   int i = 0;
> >>>>     > > >>> > >>   for (LogicalExpression expr : keyExprs) {
> >>>>     > > >>> > >>     TypedFieldId targetTypeFieldId = buildKeyFieldIds[i];
> >>>>     > > >>> > >>     ValueExpressions.IntExpression targetBuildFieldIdExp = new ValueExpressions.IntExpression(
> >>>>     > > >>> > >>         targetTypeFieldId.getFieldIds()[0], ExpressionPosition.UNKNOWN);
> >>>>     > > >>> > >>     JFieldRef targetBuildSideFieldId = cg.addExpr(targetBuildFieldIdExp,
> >>>>     > > >>> > >>         ClassGenerator.BlkCreateMode.TRUE_IF_BOUND).getValue();
> >>>>     > > >>> > >>     JBlock ifBlock = cg.getEvalBlock()
> >>>>     > > >>> > >>         ._if(fieldIdParamHolder.getValue().eq(targetBuildSideFieldId))._then();
> >>>>     > > >>> > >>
> >>>>     > > >>> > >>     LogicalExpression hashExpression =
> >>>>     > > >>> > >>         HashPrelUtil.getHashExpression(expr, seed, incomingProbe != null);
> >>>>     > > >>> > >>     LogicalExpression materializedExpr = ExpressionTreeMaterializer
> >>>>     > > >>> > >>         .materializeAndCheckErrors(hashExpression, batch, context.getFunctionRegistry());
> >>>>     > > >>> > >>     HoldingContainer hash = cg.addExpr(materializedExpr,
> >>>>     > > >>> > >>         ClassGenerator.BlkCreateMode.FALSE);
> >>>>     > > >>> > >>
> >>>>     > > >>> > >>
> >>>>     > > >>> > >>     ifBlock._return(hash.getValue());
> >>>>     > > >>> > >>     i++;
> >>>>     > > >>> > >>   }
> >>>>     > > >>> > >>   cg.getEvalBlock()._return(JExpr.lit(0));
> >>>>     > > >>> > >>
> >>>>     > > >>> > >> }
> >>>>     > > >>> > >>
> >>>>     > > >>> > >> Unfortunately, the generated code is not what I expected: the code
> >>>>     > > >>> > >> that reads the ValueVector and computes the hash of the value does
> >>>>     > > >>> > >> not stay inside the if block. How can I make the related code stay
> >>>>     > > >>> > >> in the if block?
> >>>>     > > >>> > >>
> >>>>     > > >>> > >
> >>>>     > > >>> > >
> >>>>     > > >>> >
> >>>>     > > >>
> >>>>     > > >>
> >>>>     > >
> >>>>     >
> >>>>
> >>>>
> >>>>
>

Re: How to generate hash code for each build side one of the hash join columns

Posted by Sorabh Hamirwasia <sh...@mapr.com>.

Your understanding of SelectionVector2 is correct. Internally it holds a DrillBuf that stores all the indexes of interest.


The reason the SV2 is set up only once in ProjectTemplate::setup is that, for subsequent next() calls from Project to Filter, each new incoming batch carries the same SV2 Java object reference. Only the underlying buffer of that SV2 object changes, to store the new indexes for the new incoming batch. Setup is called again only after an OK_NEW_SCHEMA outcome is seen.


Thanks,
Sorabh
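A plain-Java sketch of what Sorabh describes, under stated assumptions: `Sv2`, `FilterStage` and `ProjectStage` are hypothetical stand-ins (not Drill's SelectionVector2, FilterRecordBatch or ProjectTemplate), and a `char[]` plays the role of the DrillBuf. The filter keeps one `Sv2` object and refills it for every batch, while the consumer captures that reference only once in `setup()` and still sees the new indexes on each batch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

/** Hypothetical model of a selection vector shared between a filter and a project stage. */
final class Sv2Demo {

  /** Stand-in for SelectionVector2: 2-byte (char) row indexes into the current batch. */
  static final class Sv2 {
    private char[] buffer = new char[0]; // plays the role of the underlying DrillBuf
    private int count;

    /** Replaces the buffer contents; the Sv2 object identity never changes. */
    void reset(char[] newIndexes, int newCount) {
      buffer = newIndexes;
      count = newCount;
    }

    int getCount() { return count; }

    int getIndex(int i) { return buffer[i]; }
  }

  /** Stand-in for the filter: one Sv2 instance, refilled for every incoming batch. */
  static final class FilterStage {
    final Sv2 sv2 = new Sv2();

    void filter(int[] batch, IntPredicate keep) {
      char[] idx = new char[batch.length];
      int n = 0;
      for (int row = 0; row < batch.length; row++) {
        if (keep.test(batch[row])) {
          idx[n++] = (char) row; // a batch holds at most 65,536 rows, so a char index fits
        }
      }
      sv2.reset(idx, n);
    }
  }

  /** Stand-in for the project: captures the Sv2 reference once, in setup(). */
  static final class ProjectStage {
    private Sv2 sv2;

    void setup(Sv2 incomingSv2) { this.sv2 = incomingSv2; }

    List<Integer> project(int[] batch) {
      List<Integer> out = new ArrayList<>();
      for (int i = 0; i < sv2.getCount(); i++) {
        out.add(batch[sv2.getIndex(i)]); // indirect access through the selection vector
      }
      return out;
    }
  }

  /** Runs each batch through filter and project, calling setup() only once. */
  static List<List<Integer>> run(int[][] batches) {
    FilterStage filter = new FilterStage();
    ProjectStage project = new ProjectStage();
    project.setup(filter.sv2); // done once, analogous to ProjectTemplate.setup()
    List<List<Integer>> results = new ArrayList<>();
    for (int[] batch : batches) {
      filter.filter(batch, v -> v % 2 == 0);
      results.add(project.project(batch));
    }
    return results;
  }

  public static void main(String[] args) {
    System.out.println(run(new int[][] {{1, 2, 3, 4}, {6, 7, 8}})); // [[2, 4], [6, 8]]
  }
}
```

If the filter instead allocated a new `Sv2` object per batch, the reference captured in `setup()` would go stale; in this model a one-time setup is safe only because the object identity stays fixed until a schema change.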


________________________________
From: weijie tong <to...@gmail.com>
Sent: Friday, June 1, 2018 7:27 AM
To: dev@drill.apache.org
Subject: Re: How to generate hash code for each build side one of the hash join columns

Could someone explain the theory behind SelectionVector2? I was confused by
the code. I think it acts as an indirect index into the filtered
RecordBatch: FilterRecordBatch filters the RecordBatch and stores the
indexes of the qualifying rows in the SelectionVector2, and
ProjectRecordBatch uses the incoming RecordBatch's SelectionVector2 indexes
to access the filtered data. Filter and Project therefore operate on the
same in-memory batch, using the SelectionVector2 to reach the actual data.
If I am right, then one case confuses me: the SelectionVector2 of
ProjectRecordBatch is initialized only once, in ProjectTemplate's setup
method. I hope someone can explain why ProjectTemplate does not get a fresh
SelectionVector2 from the incoming batch (e.g. the one generated by
FilterRecordBatch) on every next() call.

On Fri, Jun 1, 2018 at 5:14 PM weijie tong <to...@gmail.com> wrote:

> I found the answer: a RecordBatch holds at most 2^16 records, as defined
> by RecordBatch's MAX_BATCH_SIZE.
>
> On Fri, Jun 1, 2018 at 3:36 PM weijie tong <to...@gmail.com> wrote:
>
>> Some questions about SelectionVector2 and SelectionVector4:
>>
>> I want to create a SelectionVector4 or SelectionVector2 to represent the
>> filtered ScanBatch and avoid a memory copy. But I found that ProjectBatch
>> does not support SelectionVector4, and SelectionVector2's record count is
>> char-sized (2 bytes). Why is SelectionVector4 not supported by
>> ProjectBatch? The same question applies to FilterBatch's SelectionVector2,
>> which also supports only a 2-byte record count.
>>
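The 2-byte limit lines up with the batch size: a `char` index covers rows 0 to 65,535, i.e. 2^16 rows. A 4-byte entry can address rows across several batches by packing a batch number and a row number together. The sketch assumes a SelectionVector4-style layout with the batch index in the upper 16 bits and the record index in the lower 16 bits; `SelectionIndexes` and its helpers are hypothetical, so check the actual SelectionVector4 source before relying on this layout.

```java
/** Hypothetical helpers illustrating 2-byte and 4-byte selection-vector entries. */
final class SelectionIndexes {
  /** Largest row count addressable by a 2-byte (char) index: 2^16. */
  static final int MAX_ROWS_PER_BATCH = Character.MAX_VALUE + 1;

  /** Packs a batch number and a row number into one int (assumed SV4-style layout). */
  static int pack(int batchIndex, int recordIndex) {
    if (batchIndex < 0 || batchIndex > 0xFFFF || recordIndex < 0 || recordIndex > 0xFFFF) {
      throw new IllegalArgumentException("both parts must fit in 16 bits");
    }
    return (batchIndex << 16) | recordIndex;
  }

  /** Upper 16 bits; the unsigned shift keeps batch numbers >= 32768 correct. */
  static int batchOf(int packed) { return packed >>> 16; }

  /** Lower 16 bits. */
  static int recordOf(int packed) { return packed & 0xFFFF; }

  public static void main(String[] args) {
    char lastRow = (char) (MAX_ROWS_PER_BATCH - 1);
    System.out.println((int) lastRow); // 65535: the last row a char index can hold
    int packed = pack(3, 65535);
    System.out.println(batchOf(packed) + " " + recordOf(packed)); // 3 65535
  }
}
```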
>> On Fri, Jun 1, 2018 at 1:40 PM weijie tong <to...@gmail.com> wrote:
>>
>>> Hi Boaz:
>>>
>>>   Your proposal is valuable, although I have already implemented the
>>> dynamic code generation logic. If a ``` long hash64(int index, long seed) ```
>>> method is added to ValueVector, it will also help others implement
>>> storage-plugin-specific filter logic using the pushed-down Bloom filter.
>>> For HashJoin and HashAggregate, the methods ```double
>>> hash32AsDouble(int index, int seed) ``` and ```int hash32(int index, int
>>> seed)``` will also need to be added to ValueVector. If no one objects, I
>>> would be glad to take on this work.
>>>
>>>    By the way, here are my thoughts on the scan-side filter logic using
>>> the BloomFilter. The scan-side filtering I have in mind applies to the
>>> materialized ValueVector, not to the process of constructing the
>>> ValueVector from the original storage format data. The reason is that
>>> doing the check during that step would hurt the performance of
>>> materializing the deep storage format data into ValueVectors.
>>>
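The scan-side idea above (probe the Bloom filter against already materialized values rather than while decoding the storage format) amounts to one pass over a column that emits the surviving row indexes. In this sketch the column is a plain `long[]`, the hash is a stand-in 64-bit mixer (the MurmurHash3 `fmix64` finalizer, not Drill's HashHelper), and the filter is any `LongPredicate` that answers "might contain this hash"; `ScanSideFilter` and its methods are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.function.LongPredicate;

/** Hypothetical scan-side filter applied to an already materialized column. */
final class ScanSideFilter {

  /** MurmurHash3 fmix64 finalizer, used here as a stand-in 64-bit hash. */
  static long mix64(long k) {
    k ^= k >>> 33;
    k *= 0xff51afd7ed558ccdL;
    k ^= k >>> 33;
    k *= 0xc4ceb9fe1a85ec53L;
    k ^= k >>> 33;
    return k;
  }

  /** Returns the row indexes whose hash passes the membership test (char = 2-byte index). */
  static char[] select(long[] column, LongPredicate mightContainHash) {
    char[] selected = new char[column.length];
    int n = 0;
    for (int row = 0; row < column.length; row++) {
      if (mightContainHash.test(mix64(column[row]))) {
        selected[n++] = (char) row;
      }
    }
    return Arrays.copyOf(selected, n);
  }

  public static void main(String[] args) {
    // An exact set of build-side hashes stands in for a Bloom filter here.
    Set<Long> buildHashes = new HashSet<>();
    for (long key : new long[] {10L, 30L}) {
      buildHashes.add(mix64(key));
    }
    char[] rows = select(new long[] {10L, 20L, 30L, 40L}, h -> buildHashes.contains(h));
    for (char r : rows) {
      System.out.println((int) r); // prints 0, then 2
    }
  }
}
```

A real Bloom filter would also let some non-matching rows through (false positives), so the join itself still has to do the exact match.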
>>> On Fri, Jun 1, 2018 at 3:22 AM Boaz Ben-Zvi <bb...@mapr.com> wrote:
>>>
>>>>  Hi Weijie,
>>>>
>>>>     Another option is to avoid the generated code entirely.
>>>> We were considering the idea of replacing the generated code used for
>>>> computing hash values with “real java” code.
>>>>
>>>> This idea is analogous to the usage of the copyEntry() method in the
>>>> ValueVector interface (which Paul added last year).
>>>> See an example of using copyEntry() (via appendRow() in
>>>> VectorContainer) in the new Hash-Join-Spill code.
>>>> Basically, there is no need to generate “type specific” code, as the
>>>> virtual copyEntry() method does the “type specific” work.
>>>>
>>>> Similarly, we could have a hash64() method in ValueVector, which would
>>>> perform the “type specific” computation.
>>>> (One difference from copyEntry(): hash64() would also need to take
>>>> the “seed” parameter, which is the hash value produced by the previous
>>>> hash.)
>>>> And similar to appendRow(), there would be an evalHash() iterating over
>>>> the key columns.
>>>> (And one difference from appendRow(): it needs to iterate only over the key
>>>> columns; these are the first columns, and their number can be found from the
>>>> config, e.g., htConfig.getKeyExprsBuild().size().)
>>>>
>>>>    With such an implementation, that evalHash() could be used anywhere
>>>> (e.g., to match the Bloom filters on the left side of the join).
>>>>
>>>>        Thanks,
>>>>
>>>>              Boaz
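Boaz's proposal can be sketched without any code generation. `HashableVector`, `hash64(int index, long seed)` and `evalHash()` below are hypothetical names modeled on the mail (none of them is an existing Drill API): each vector type supplies its type-specific hashing through a virtual call, and `evalHash()` chains the seed through the first `keyCount` columns only. The mixing function is an arbitrary stand-in, not Drill's HashHelper.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of per-vector hashing with seed chaining across key columns. */
final class VectorHashing {

  /** Proposed shape: each vector hashes its own value at a row, given the previous hash. */
  interface HashableVector {
    long hash64(int index, long seed);
  }

  /** Stand-in mixer combining a value with a seed (fmix64-style finalizer). */
  static long mix(long value, long seed) {
    long k = value ^ (seed * 0x9E3779B97F4A7C15L);
    k ^= k >>> 33;
    k *= 0xff51afd7ed558ccdL;
    k ^= k >>> 33;
    k *= 0xc4ceb9fe1a85ec53L;
    k ^= k >>> 33;
    return k;
  }

  /** INT-like column: the type-specific work is just widening the int. */
  static final class IntColumn implements HashableVector {
    private final int[] values;
    IntColumn(int... values) { this.values = values; }
    @Override public long hash64(int index, long seed) { return mix(values[index], seed); }
  }

  /** VARCHAR-like column: folds the UTF-8 bytes, then the length, into the running hash. */
  static final class VarCharColumn implements HashableVector {
    private final String[] values;
    VarCharColumn(String... values) { this.values = values; }
    @Override public long hash64(int index, long seed) {
      long h = seed;
      for (byte b : values[index].getBytes(StandardCharsets.UTF_8)) {
        h = mix(b, h);
      }
      return mix(values[index].length(), h);
    }
  }

  /** Hashes one row over the first keyCount columns, feeding each hash in as the next seed. */
  static long evalHash(List<HashableVector> columns, int keyCount, int row) {
    long seed = 0;
    for (int c = 0; c < keyCount; c++) {
      seed = columns.get(c).hash64(row, seed);
    }
    return seed;
  }

  public static void main(String[] args) {
    List<HashableVector> batch = Arrays.asList(
        new IntColumn(1, 2), new VarCharColumn("a", "b"), new IntColumn(99, 99));
    // Only the first two columns are keys; the third is never touched.
    System.out.println(evalHash(batch, 2, 0));
    System.out.println(evalHash(batch, 2, 1));
  }
}
```

The trade-off, as the mail suggests, is a virtual call per value in exchange for not emitting type-specific code; whether that cost matters would need measuring.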
>>>>
>>>>
>>>> On 5/30/18, 7:49 PM, "weijie tong" <to...@gmail.com> wrote:
>>>>
>>>>     Hi Aman:
>>>>
>>>>       Thanks for your tips. I have rebased onto the latest code from the
>>>>     master branch. Yes, the spill-to-disk feature did change the original
>>>>     implementation, and I have adjusted my implementation to the new
>>>>     feature. But as you say, integration will be challenging, as I noticed
>>>>     the spill-to-disk feature will continue to be tuned for performance.
>>>>
>>>>       The BloomFilter is implemented natively in Drill, not with an
>>>>     external library. It implements the algorithm from the paper you
>>>>     mentioned.
>>>>
>>>>
>>>>     On Thu, May 31, 2018 at 1:56 AM Aman Sinha <am...@apache.org> wrote:
>>>>
>>>>     > Hi Weijie,
>>>>     > I was hoping you could leverage the existing methods, so it's good that
>>>>     > you found the ones that work for your use case.
>>>>     > One thing I want to point out (maybe you're already aware): the Hash
>>>>     > Join code has changed significantly in the master branch due to the
>>>>     > spill-to-disk feature.
>>>>     > So this may pose some integration challenges for your run-time join
>>>>     > pushdown feature.
>>>>     > Also, one other question/clarification: for the bloom filter itself,
>>>>     > are you implementing it natively in Drill or using an external library?
>>>>     >
>>>>     > -Aman
>>>>     >
>>>>     > On Tue, May 29, 2018 at 8:23 PM, weijie tong <tongweijie178@gmail.com> wrote:
>>>>     >
>>>>     > > I found ClassGenerator's nestEvalBlock(JBlock block) and
>>>>     > > unNestEvalBlock() methods, which have the same effect as the change I
>>>>     > > made to ClassGenerator. So I have dropped my ClassGenerator change,
>>>>     > > and I hope this helps someone else.
>>>>     > >
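The underlying question in this thread is where emitted statements land. The sketch below models it with a hypothetical `MiniGenerator` whose eval "block" is a stack: `getEvalBlock()` returns the top of the stack, so code emitted after `nestEvalBlock(...)` goes into the if-body until `unNestEvalBlock()` pops it. It only imitates the effect described for Drill's ClassGenerator.nestEvalBlock(JBlock) and unNestEvalBlock(); it does not use JCodeModel, and the emitted "code" is plain text. Passing `nest = false` reproduces the symptom from the original question: the hash computation lands after the if, while the return inside the if refers to a value not yet computed.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical model of nesting the eval block so generated code lands inside an if-body. */
final class MiniGenerator {

  /** A block of generated code: plain statements and nested if-nodes, in order. */
  static final class Block {
    private final List<Object> items = new ArrayList<>(); // String or IfNode

    void add(String stmt) { items.add(stmt); }

    /** Appends "if (cond) { ... }" and returns the (initially empty) then-block. */
    Block ifThen(String cond) {
      Block then = new Block();
      items.add(new IfNode(cond, then));
      return then;
    }

    void render(StringBuilder out, String indent) {
      for (Object item : items) {
        if (item instanceof IfNode) {
          IfNode node = (IfNode) item;
          out.append(indent).append("if (").append(node.cond).append(") {\n");
          node.then.render(out, indent + "  ");
          out.append(indent).append("}\n");
        } else {
          out.append(indent).append(item).append('\n');
        }
      }
    }
  }

  static final class IfNode {
    final String cond;
    final Block then;
    IfNode(String cond, Block then) { this.cond = cond; this.then = then; }
  }

  private final Block evalRoot = new Block();
  private final Deque<Block> evalStack = new ArrayDeque<>();

  MiniGenerator() { evalStack.push(evalRoot); }

  /** Expression helpers always write into whatever block is current. */
  Block getEvalBlock() { return evalStack.peek(); }

  void nestEvalBlock(Block block) { evalStack.push(block); }

  void unNestEvalBlock() { evalStack.pop(); }

  /** Stand-in for addExpr(): reading and hashing a column emits into the current block. */
  String addHashExpr(int column) {
    getEvalBlock().add("long h" + column + " = hash(vv" + column + ".get(row), seed);");
    return "h" + column;
  }

  /** One if-section per key column; with nest == true each hash stays inside its if. */
  static String generate(int keyColumns, boolean nest) {
    MiniGenerator cg = new MiniGenerator();
    for (int c = 0; c < keyColumns; c++) {
      Block ifBlock = cg.getEvalBlock().ifThen("fieldId == " + c);
      if (nest) {
        cg.nestEvalBlock(ifBlock); // code emitted from here on goes into the if-body
      }
      String hash = cg.addHashExpr(c);
      ifBlock.add("return " + hash + ";");
      if (nest) {
        cg.unNestEvalBlock(); // back to the outer eval block
      }
    }
    cg.getEvalBlock().add("return 0;");
    StringBuilder out = new StringBuilder();
    cg.evalRoot.render(out, "");
    return out.toString();
  }

  public static void main(String[] args) {
    System.out.print(generate(2, true));
    System.out.println("----");
    System.out.print(generate(1, false));
  }
}
```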
>>>>     > > On Tue, May 29, 2018 at 1:53 PM weijie tong <tongweijie178@gmail.com> wrote:
>>>>     > >
>>>>     > > > The code formatting was not nice, so here it is again:
>>>>     > > >
>>>>     > > > private void setupGetBuild64Hash(ClassGenerator<HashTable> cg,
>>>>     > > >     MappingSet incomingMapping, VectorAccessible batch,
>>>>     > > >     LogicalExpression[] keyExprs, TypedFieldId[] buildKeyFieldIds)
>>>>     > > >     throws SchemaChangeException {
>>>>     > > >   cg.setMappingSet(incomingMapping);
>>>>     > > >   if (keyExprs == null || keyExprs.length == 0) {
>>>>     > > >     cg.getEvalBlock()._return(JExpr.lit(0));
>>>>     > > >     // Stop here: otherwise a second, unreachable "return 0" is generated
>>>>     > > >     // below, and a null keyExprs would throw in the for loop.
>>>>     > > >     return;
>>>>     > > >   }
>>>>     > > >   String seedValue = "seedValue";
>>>>     > > >   String fieldId = "fieldId";
>>>>     > > >   LogicalExpression seed = ValueExpressions.getParameterExpression(seedValue,
>>>>     > > >       Types.required(TypeProtos.MinorType.INT));
>>>>     > > >
>>>>     > > >   LogicalExpression fieldIdParamExpr = ValueExpressions.getParameterExpression(fieldId,
>>>>     > > >       Types.required(TypeProtos.MinorType.INT));
>>>>     > > >   HoldingContainer fieldIdParamHolder = cg.addExpr(fieldIdParamExpr);
>>>>     > > >   int i = 0;
>>>>     > > >   for (LogicalExpression expr : keyExprs) {
>>>>     > > >     TypedFieldId targetTypeFieldId = buildKeyFieldIds[i];
>>>>     > > >     ValueExpressions.IntExpression targetBuildFieldIdExp =
>>>>     > > >         new ValueExpressions.IntExpression(targetTypeFieldId.getFieldIds()[0],
>>>>     > > >             ExpressionPosition.UNKNOWN);
>>>>     > > >
>>>>     > > >     JFieldRef targetBuildSideFieldId = cg.addExpr(targetBuildFieldIdExp,
>>>>     > > >         ClassGenerator.BlkCreateMode.TRUE_IF_BOUND).getValue();
>>>>     > > >     JBlock ifBlock = cg.getEvalBlock()
>>>>     > > >         ._if(fieldIdParamHolder.getValue().eq(targetBuildSideFieldId))._then();
>>>>     > > >     // Make getEvalBlock() return this inner block instead of the eval block
>>>>     > > >     // itself, so the expression code generated below lands inside the if.
>>>>     > > >     cg.setCustomizedEvalInnerBlock(ifBlock);
>>>>     > > >     LogicalExpression hashExpression =
>>>>     > > >         HashPrelUtil.getHashExpression(expr, seed, incomingProbe != null);
>>>>     > > >     LogicalExpression materializedExpr =
>>>>     > > >         ExpressionTreeMaterializer.materializeAndCheckErrors(hashExpression, batch,
>>>>     > > >             context.getFunctionRegistry());
>>>>     > > >     HoldingContainer hash = cg.addExpr(materializedExpr,
>>>>     > > >         ClassGenerator.BlkCreateMode.TRUE_IF_BOUND);
>>>>     > > >     ifBlock._return(hash.getValue());
>>>>     > > >     // Reset the customized block to null so getEvalBlock() returns the real
>>>>     > > >     // eval block again.
>>>>     > > >     cg.setCustomizedEvalInnerBlock(null);
>>>>     > > >     i++;
>>>>     > > >   }
>>>>     > > >   cg.getEvalBlock()._return(JExpr.lit(0));
>>>>     > > > }
>>>>     > > >
>>>>     > > >
>>>>     > > >
>>>>     > > >
>>>>     > > > public long getBuild64HashCodeInner(int incomingRowIdx, int seedValue, int fieldId)
>>>>     > > >     throws SchemaChangeException
>>>>     > > > {
>>>>     > > >     {
>>>>     > > >         IntHolder fieldId12 = new IntHolder();
>>>>     > > >         fieldId12 .value = fieldId;
>>>>     > > >         if (fieldId12 .value == constant14 .value) {
>>>>     > > >             IntHolder out18 = new IntHolder();
>>>>     > > >             {
>>>>     > > >                 out18 .value = vv15 .getAccessor().get((incomingRowIdx));
>>>>     > > >             }
>>>>     > > >             IntHolder seedValue19 = new IntHolder();
>>>>     > > >             seedValue19 .value = seedValue;
>>>>     > > >             //---- start of eval portion of hash32AsDouble function. ----//
>>>>     > > >             IntHolder out20 = new IntHolder();
>>>>     > > >             {
>>>>     > > >                 final IntHolder out = new IntHolder();
>>>>     > > >                 IntHolder in = out18;
>>>>     > > >                 IntHolder seed = seedValue19;
>>>>     > > >
>>>>     > > >                 Hash32WithSeedAsDouble$IntHash_eval: {
>>>>     > > >                     out.value = org.apache.drill.exec.expr.fn.impl.HashHelper.hash32((double) in.value, seed.value);
>>>>     > > >                 }
>>>>     > > >
>>>>     > > >                 out20 = out;
>>>>     > > >             }
>>>>     > > >             //---- end of eval portion of hash32AsDouble function. ----//
>>>>     > > >             return out20 .value;
>>>>     > > >         }
>>>>     > > >         return 0;
>>>>     > > >     }
>>>>     > > > }
>>>>     > > >
>>>>     > > > On Tue, May 29, 2018 at 1:47 PM weijie tong <tongweijie178@gmail.com> wrote:
>>>>     > > >
>>>>     > > >> Hi Paul:
>>>>     > > >>
>>>>     > > >>  Thanks for your enthusiasm. I have already adopted this technique,
>>>>     > > >> which you mentioned to me in another mail thread. It's really
>>>>     > > >> helpful; thanks for your valuable work.
>>>>     > > >>
>>>>     > > >>   I have now solved this tough problem by adding a customized JBlock
>>>>     > > >> member field to ClassGenerator. When you want ClassGenerator's
>>>>     > > >> getEvalBlock() to return an inner, customized JBlock, you set this
>>>>     > > >> member; when you want it to return the eval block itself, you reset
>>>>     > > >> the member to null.
>>>>     > > >>
>>>>     > > >>   Here is my changed setup method:
>>>>     > > >>
>>>>     > > >>
>>>>     > > >> private void setupGetBuild64Hash(ClassGenerator<HashTable> cg,
>>>>     > > >>     MappingSet incomingMapping, VectorAccessible batch,
>>>>     > > >>     LogicalExpression[] keyExprs, TypedFieldId[] buildKeyFieldIds)
>>>>     > > >>     throws SchemaChangeException {
>>>>     > > >>   cg.setMappingSet(incomingMapping);
>>>>     > > >>   if (keyExprs == null || keyExprs.length == 0) {
>>>>     > > >>     cg.getEvalBlock()._return(JExpr.lit(0));
>>>>     > > >>     // Stop here: otherwise a second, unreachable "return 0" is generated
>>>>     > > >>     // below, and a null keyExprs would throw in the for loop.
>>>>     > > >>     return;
>>>>     > > >>   }
>>>>     > > >>   String seedValue = "seedValue";
>>>>     > > >>   String fieldId = "fieldId";
>>>>     > > >>   LogicalExpression seed = ValueExpressions.getParameterExpression(seedValue,
>>>>     > > >>       Types.required(TypeProtos.MinorType.INT));
>>>>     > > >>
>>>>     > > >>   LogicalExpression fieldIdParamExpr = ValueExpressions.getParameterExpression(fieldId,
>>>>     > > >>       Types.required(TypeProtos.MinorType.INT));
>>>>     > > >>   HoldingContainer fieldIdParamHolder = cg.addExpr(fieldIdParamExpr);
>>>>     > > >>   int i = 0;
>>>>     > > >>   for (LogicalExpression expr : keyExprs) {
>>>>     > > >>     TypedFieldId targetTypeFieldId = buildKeyFieldIds[i];
>>>>     > > >>     ValueExpressions.IntExpression targetBuildFieldIdExp =
>>>>     > > >>         new ValueExpressions.IntExpression(targetTypeFieldId.getFieldIds()[0],
>>>>     > > >>             ExpressionPosition.UNKNOWN);
>>>>     > > >>
>>>>     > > >>     JFieldRef targetBuildSideFieldId = cg.addExpr(targetBuildFieldIdExp,
>>>>     > > >>         ClassGenerator.BlkCreateMode.TRUE_IF_BOUND).getValue();
>>>>     > > >>     JBlock ifBlock = cg.getEvalBlock()
>>>>     > > >>         ._if(fieldIdParamHolder.getValue().eq(targetBuildSideFieldId))._then();
>>>>     > > >>     // Make getEvalBlock() return this inner block instead of the eval block
>>>>     > > >>     // itself, so the expression code generated below lands inside the if.
>>>>     > > >>     cg.setCustomizedEvalInnerBlock(ifBlock);
>>>>     > > >>     LogicalExpression hashExpression =
>>>>     > > >>         HashPrelUtil.getHashExpression(expr, seed, incomingProbe != null);
>>>>     > > >>     LogicalExpression materializedExpr =
>>>>     > > >>         ExpressionTreeMaterializer.materializeAndCheckErrors(hashExpression, batch,
>>>>     > > >>             context.getFunctionRegistry());
>>>>     > > >>     HoldingContainer hash = cg.addExpr(materializedExpr,
>>>>     > > >>         ClassGenerator.BlkCreateMode.TRUE_IF_BOUND);
>>>>     > > >>     ifBlock._return(hash.getValue());
>>>>     > > >>     // Reset the customized block to null so getEvalBlock() returns the real
>>>>     > > >>     // eval block again.
>>>>     > > >>     cg.setCustomizedEvalInnerBlock(null);
>>>>     > > >>     i++;
>>>>     > > >>   }
>>>>     > > >>   cg.getEvalBlock()._return(JExpr.lit(0));
>>>>     > > >> }
>>>>     > > >>
>>>>     > > >>
>>>>     > > >>  The corresponding generated code:
>>>>     > > >>
>>>>     > > >>     public long getBuild64HashCodeInner(int incomingRowIdx, int seedValue, int fieldId)
>>>>     > > >>         throws SchemaChangeException
>>>>     > > >>     {
>>>>     > > >>         {
>>>>     > > >>             IntHolder fieldId12 = new IntHolder();
>>>>     > > >>             fieldId12 .value = fieldId;
>>>>     > > >>             if (fieldId12 .value == constant14 .value) {
>>>>     > > >>                 IntHolder out18 = new IntHolder();
>>>>     > > >>                 {
>>>>     > > >>                     out18 .value = vv15 .getAccessor().get((incomingRowIdx));
>>>>     > > >>                 }
>>>>     > > >>                 IntHolder seedValue19 = new IntHolder();
>>>>     > > >>                 seedValue19 .value = seedValue;
>>>>     > > >>                 //---- start of eval portion of hash32AsDouble function. ----//
>>>>     > > >>                 IntHolder out20 = new IntHolder();
>>>>     > > >>                 {
>>>>     > > >>                     final IntHolder out = new IntHolder();
>>>>     > > >>                     IntHolder in = out18;
>>>>     > > >>                     IntHolder seed = seedValue19;
>>>>     > > >>
>>>>     > > >>                     Hash32WithSeedAsDouble$IntHash_eval: {
>>>>     > > >>                         out.value = org.apache.drill.exec.expr.fn.impl.HashHelper.hash32((double) in.value, seed.value);
>>>>     > > >>                     }
>>>>     > > >>
>>>>     > > >>                     out20 = out;
>>>>     > > >>                 }
>>>>     > > >>                 //---- end of eval portion of hash32AsDouble function. ----//
>>>>     > > >>                 return out20 .value;
>>>>     > > >>             }
>>>>     > > >>             return  0;
>>>>     > > >>         }
>>>>     > > >>     }
>>>>     > > >>
>>>>     > > >>
>>>>     > > >>
>>>>     > > >>   Some further explanation:
>>>>     > > >>   1st: The if check won't hurt performance, because I invoke this
>>>>     > > >> method column by column, so it is friendly to branch prediction.
>>>>     > > >>   2nd: I will use murmur3_64, not murmur3_32, since the efficient
>>>>     > > >> Bloom filter algorithm needs a 64-bit hash code to avoid collisions.
>>>>     > > >>
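The 64-bit point can be illustrated with a small Bloom filter that derives all of its probe positions from one 64-bit hash, by splitting it into two 32-bit halves and combining them as h1 + i*h2 (the Kirsch-Mitzenmacher double-hashing trick). This is a generic sketch with hypothetical names; it is not the Drill BloomFilter discussed in this thread and not necessarily the algorithm from the paper referenced earlier, and the hash is the MurmurHash3 `fmix64` finalizer rather than full murmur3_64.

```java
import java.util.BitSet;

/** Hypothetical Bloom filter whose k probe positions all come from a single 64-bit hash. */
final class Bloom64 {
  private final BitSet bits;
  private final int numBits;
  private final int numProbes;

  Bloom64(int numBits, int numProbes) {
    if (numBits <= 0 || numProbes <= 0) {
      throw new IllegalArgumentException("numBits and numProbes must be positive");
    }
    this.bits = new BitSet(numBits);
    this.numBits = numBits;
    this.numProbes = numProbes;
  }

  /** i-th probe: low 32 bits plus i times high 32 bits, reduced to [0, numBits). */
  private int probe(long hash64, int i) {
    int h1 = (int) hash64;
    int h2 = (int) (hash64 >>> 32);
    int combined = h1 + i * h2;              // int overflow just wraps around
    return Math.floorMod(combined, numBits); // always non-negative
  }

  void insert(long hash64) {
    for (int i = 0; i < numProbes; i++) {
      bits.set(probe(hash64, i));
    }
  }

  /** May return a false positive, never a false negative. */
  boolean mightContain(long hash64) {
    for (int i = 0; i < numProbes; i++) {
      if (!bits.get(probe(hash64, i))) {
        return false;
      }
    }
    return true;
  }

  /** Stand-in 64-bit hash (MurmurHash3 fmix64 finalizer), not Drill's HashHelper. */
  static long hash64(long key) {
    key ^= key >>> 33;
    key *= 0xff51afd7ed558ccdL;
    key ^= key >>> 33;
    key *= 0xc4ceb9fe1a85ec53L;
    key ^= key >>> 33;
    return key;
  }

  public static void main(String[] args) {
    Bloom64 filter = new Bloom64(1 << 16, 5);
    for (long key = 0; key < 1000; key++) {
      filter.insert(hash64(key));
    }
    int falsePositives = 0;
    for (long key = 1000; key < 11000; key++) {
      if (filter.mightContain(hash64(key))) {
        falsePositives++;
      }
    }
    System.out.println("false positives out of 10000 probes: " + falsePositives);
  }
}
```

With a 32-bit hash, the same split would yield two 16-bit halves, which spread the probes over far fewer distinct positions; that is one concrete way a 64-bit hash helps.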
>>>>     > > >> On Tue, May 29, 2018 at 12:37 PM Paul Rogers <par0328@yahoo.com.invalid> wrote:
>>>>     > > >>
>>>>     > > >>> Hi Weijie,
>>>>     > > >>>
>>>>     > > >>> Seeing the discussion about the details of JCodeModel suggests you may
>>>>     > > >>> be trying to debug your generated code at the level of the code
>>>>     > > >>> generator.
>>>>     > > >>>
>>>>     > > >>> Some time ago we added the ability to step through the generated code.
>>>>     > > >>> Look for the following line in the generator code:
>>>>     > > >>>
>>>>     > > >>>     // Uncomment out this line to debug the generated code.
>>>>     > > >>> //    cg.saveCodeForDebugging(true);
>>>>     > > >>>
>>>>     > > >>> Uncomment the code line and Drill will save each generated file to a
>>>>     > > >>> configured location (which, if I recall correctly, is
>>>>     > > >>> /tmp/drill/codegen, though it may have changed after Tim's test
>>>>     > > >>> directory changes).
>>>>     > > >>>
>>>>     > > >>> Then, set a breakpoint in the template setup() method and you can step
>>>>     > > >>> directly into the generated doSetup() method. Same for the eval()
>>>>     > > >>> method.
>>>>     > > >>>
>>>>     > > >>> This way, you can not only see the generated code, you can step
>>>>     > > >>> through it. I've found this to be a far easier way to understand the
>>>>     > > >>> generated code than the older techniques folks have used (look at byte
>>>>     > > >>> codes, use print statements, brute force reasoning, etc.)
>>>>     > > >>>
>>>>     > > >>> Tim, Boaz and others have used this technique more recently and can
>>>>     > > >>> probably give you additional pointers.
>>>>     > > >>>
>>>>     > > >>> Thanks,
>>>>     > > >>> - Paul
>>>>     > > >>>
>>>>     > > >>>
>>>>     > > >>>
>>>>     > > >>>     On Monday, May 28, 2018, 8:52:19 PM PDT, weijie tong <tongweijie178@gmail.com> wrote:
>>>>     > > >>>
>>>>     > > >>>  @Aman, thanks for your reply. Regarding "For the ifBlock, do you need
>>>>     > > >>> an _else() block also?": I provide a default return at the end of the
>>>>     > > >>> method, so I don't need the _else() block. I have noticed that the
>>>>     > > >>> IfExpression evaluation method in EvaluationVisitor also uses
>>>>     > > >>> JConditional, but that doesn't match my requirement either. I think the
>>>>     > > >>> key point is that FunctionHolderExpression and ValueVectorReadExpression
>>>>     > > >>> put their generated code into the eval method's JBlock, not into our
>>>>     > > >>> specific if-block, which is an inner block of the eval method's JBlock.
>>>>     > > >>>
>>>>     > > >>> So it seems I should either change ClassGenerator so that getEvalBlock()
>>>>     > > >>> returns the if-block (more precisely, the JConditional's then-block), or
>>>>     > > >>> implement special FunctionHolderExpression and ValueVectorReadExpression
>>>>     > > >>> classes, plus corresponding visitor methods in EvaluationVisitor, to
>>>>     > > >>> generate the special code blocks. I hope someone familiar with this part
>>>>     > > >>> of the code can point out whether there are easier or different ways to
>>>>     > > >>> achieve this.
>>>>     > > >>>
>>>>     > > >>> To make the discussion more concrete, here is the code generated by the
>>>>     > > >>> previous setupGetBuild64Hash method:
>>>>     > > >>>
>>>>     > > >>>     public long getBuild64HashCodeInner(int incomingRowIdx, int seedValue, int fieldId)
>>>>     > > >>>         throws SchemaChangeException
>>>>     > > >>>     {
>>>>     > > >>>         {
>>>>     > > >>>             IntHolder fieldId16 = new IntHolder();
>>>>     > > >>>             fieldId16 .value = fieldId;
>>>>     > > >>>             if (fieldId16 .value == constant18 .value) {
>>>>     > > >>>                 return out24 .value;
>>>>     > > >>>             }
>>>>     > > >>>             IntHolder out22 = new IntHolder();
>>>>     > > >>>             {
>>>>     > > >>>                 out22 .value = vv19 .getAccessor().get((incomingRowIdx));
>>>>     > > >>>             }
>>>>     > > >>>             IntHolder seedValue23 = new IntHolder();
>>>>     > > >>>             seedValue23 .value = seedValue;
>>>>     > > >>>             //---- start of eval portion of hash32AsDouble function. ----//
>>>>     > > >>>             IntHolder out24 = new IntHolder();
>>>>     > > >>>             {
>>>>     > > >>>                 final IntHolder out = new IntHolder();
>>>>     > > >>>                 IntHolder in = out22;
>>>>     > > >>>                 IntHolder seed = seedValue23;
>>>>     > > >>>
>>>>     > > >>>                 Hash32WithSeedAsDouble$IntHash_eval: {
>>>>     > > >>>                     out.value = org.apache.drill.exec.expr.fn.impl.HashHelper.hash32((double) in.value, seed.value);
>>>>     > > >>>                 }
>>>>     > > >>>
>>>>     > > >>>                 out24 = out;
>>>>     > > >>>             }
>>>>     > > >>>             //---- end of eval portion of hash32AsDouble function. ----//
>>>>     > > >>>             if (fieldId16 .value == constant18 .value) {
>>>>     > > >>>                 return out26 .value;
>>>>     > > >>>             }
>>>>     > > >>>             IntHolder seedValue25 = new IntHolder();
>>>>     > > >>>             seedValue25 .value = seedValue;
>>>>     > > >>>             //---- start of eval portion of hash32AsDouble function. ----//
>>>>     > > >>>             IntHolder out26 = new IntHolder();
>>>>     > > >>>             {
>>>>     > > >>>                 final IntHolder out = new IntHolder();
>>>>     > > >>>                 IntHolder in = out22;
>>>>     > > >>>                 IntHolder seed = seedValue25;
>>>>     > > >>>
>>>>     > > >>>                 Hash32WithSeedAsDouble$IntHash_eval: {
>>>>     > > >>>                     out.value = org.apache.drill.exec.expr.fn.impl.HashHelper.hash32((double) in.value, seed.value);
>>>>     > > >>>                 }
>>>>     > > >>>
>>>>     > > >>>                 out26 = out;
>>>>     > > >>>             }
>>>>     > > >>>             //---- end of eval portion of hash32AsDouble function. ----//
>>>>     > > >>>             return  0;
>>>>     > > >>>         }
>>>>     > > >>>     }
>>>>     > > >>>
>>>>     > > >>>
>>>>     > > >>>
>>>>     > > >>>
>>>>     > > >>>
>>>>     > > >>> On Tue, May 29, 2018 at 10:51 AM Aman Sinha <amansinha@apache.org> wrote:
>>>>     > > >>>
>>>>     > > >>> > Sorry, the previous email was incomplete.
>>>>     > > >>> > For the ifBlock, do you need an _else() block also?
>>>>     > > >>> >
>>>>     > > >>> > I have sometimes found that 'JConditional' is a good way to break down
>>>>     > > >>> > the logic further. Please see example usages of JConditional here [1].
>>>>     > > >>> >
>>>>     > > >>> > -Aman
>>>>     > > >>> >
>>>>     > > >>> > [1] https://www.programcreek.com/java-api-examples/?api=com.sun.codemodel.JBlock
>>>>     > > >>> >
>>>>     > > >>> > On Mon, May 28, 2018 at 7:46 PM, Aman Sinha <amansinha@apache.org> wrote:
>>>>     > > >>> >
>>>>     > > >>> > > Hi Weijie,
>>>>     > > >>> > > It would be a little cumbersome to debug such issues over email,
>>>>     > > >>> > > since one has to look at the generated code output and debug
>>>>     > > >>> > > iteratively.
>>>>     > > >>> > > A couple of thoughts that might help:
>>>>     > > >>> > >
>>>>     > > >>> > > For this particular if-then block, should you also
>>>>     > > >>> > > JBlock ifBlock =
>>>>     > > >>> > > cg.getEvalBlock()._if(fieldIdParamHolder.getValue().eq(targetBuildSideFieldId))._then();
>>>>     > > >>> > >
>>>>     > > >>> > >
>>>>     > > >>> > >
>>>>     > > >>> > > On Mon, May 28, 2018 at 4:17 AM, weijie tong <tongweijie178@gmail.com> wrote:
>>>>     > > >>> > >
>>>>     > > >>> > >> Hi all:
>>>>     > > >>> > >>  While implementing the JPPD feature
>>>>     > > >>> > >> (https://issues.apache.org/jira/browse/DRILL-6385), I got blocked by
>>>>     > > >>> > >> this problem: how to get the hash code of each build-side hash join
>>>>     > > >>> > >> column through dynamically generated Java code. I hope someone can
>>>>     > > >>> > >> give some advice.
>>>>     > > >>> > >>
>>>>     > > >>> > >>    I plan to add the following methods to HashTableTemplate:
>>>>     > > >>> > >>
>>>>     > > >>> > >> public long getBuild64HashCode(int incomingRowIdx, int seedValue, int fieldId)
>>>>     > > >>> > >>     throws SchemaChangeException {
>>>>     > > >>> > >>   return getBuild64HashCodeInner(incomingRowIdx, seedValue, fieldId);
>>>>     > > >>> > >> }
>>>>     > > >>> > >>
>>>>     > > >>> > >> protected abstract long getBuild64HashCodeInner(
>>>>     > > >>> > >>     @Named("incomingRowIdx") int incomingRowIdx,
>>>>     > > >>> > >>     @Named("seedValue") int seedValue,
>>>>     > > >>> > >>     @Named("fieldId") int fieldId)
>>>>     > > >>> > >>     throws SchemaChangeException;
>>>>     > > >>> > >>
>>>>     > > >>> > >>
>>>>     > > >>> > >>    The high-level code that invokes getBuild64HashCode is in
>>>>     > > >>> > >> HashJoinBatch's executeBuildPhase():
>>>>     > > >>> > >>
>>>>     > > >>> > >> // create the runtime filter
>>>>     > > >>> > >> if (cycleNum == 0 && enableRuntimeFilter) {
>>>>     > > >>> > >>   // create the runtime filter and send it out asynchronously
>>>>     > > >>> > >>   int condFieldIndex = 0;
>>>>     > > >>> > >>   for (BloomFilter bloomFilter : bloomFilters) {
>>>>     > > >>> > >>     //VV
>>>>     > > >>> > >>     for (int ind = 0; ind < currentRecordCount; ind++) {
>>>>     > > >>> > >>       long hashCode = partitions[0].getBuild64HashCode(ind, condFieldIndex);
>>>>     > > >>> > >>       bloomFilter.insert(hashCode);
>>>>     > > >>> > >>     }
>>>>     > > >>> > >>     condFieldIndex++;
>>>>     > > >>> > >>   }
>>>>     > > >>> > >>   // TODO send out async
>>>>     > > >>> > >> }
>>>>     > > >>> > >>
>>>>     > > >>> > >>
>>>>     > > >>> > >>  The abstract method getBuild64HashCodeInner needs to compute the hash
>>>>     > > >>> > >> code of the build-side column selected by the fieldId input parameter.
>>>>     > > >>> > >> To achieve this, I plan to generate a separate code section for each
>>>>     > > >>> > >> column's ValueVector, using an if statement on the column id to select
>>>>     > > >>> > >> the right section. The method that generates this dynamic code is
>>>>     > > >>> > >> below:
>>>>     > > >>> > >>
>>>>     > > >>> > >> private void setupGetBuild64Hash(ClassGenerator<HashTable> cg,
>>>>     > > >>> > >>     MappingSet incomingMapping, VectorAccessible batch,
>>>>     > > >>> > >>     LogicalExpression[] keyExprs, TypedFieldId[] buildKeyFieldIds)
>>>>     > > >>> > >>     throws SchemaChangeException {
>>>>     > > >>> > >>   cg.setMappingSet(incomingMapping);
>>>>     > > >>> > >>   if (keyExprs == null || keyExprs.length == 0) {
>>>>     > > >>> > >>     cg.getEvalBlock()._return(JExpr.lit(0));
>>>>     > > >>> > >>     // Stop here: otherwise a second, unreachable "return 0" is generated
>>>>     > > >>> > >>     // below, and a null keyExprs would throw in the for loop.
>>>>     > > >>> > >>     return;
>>>>     > > >>> > >>   }
>>>>     > > >>> > >>   String seedValue = "seedValue";
>>>>     > > >>> > >>   String fieldId = "fieldId";
>>>>     > > >>> > >>   LogicalExpression seed = ValueExpressions.getParameterExpression(seedValue,
>>>>     > > >>> > >>       Types.required(TypeProtos.MinorType.INT));
>>>>     > > >>> > >>
>>>>     > > >>> > >>   LogicalExpression fieldIdParamExpr = ValueExpressions.getParameterExpression(fieldId,
>>>>     > > >>> > >>       Types.required(TypeProtos.MinorType.INT));
>>>>     > > >>> > >>   HoldingContainer fieldIdParamHolder = cg.addExpr(fieldIdParamExpr);
>>>>     > > >>> > >>   int i = 0;
>>>>     > > >>> > >>   for (LogicalExpression expr : keyExprs) {
>>>>     > > >>> > >>     TypedFieldId targetTypeFieldId = buildKeyFieldIds[i];
>>>>     > > >>> > >>     ValueExpressions.IntExpression targetBuildFieldIdExp =
>>>>     > > >>> > >>         new ValueExpressions.IntExpression(targetTypeFieldId.getFieldIds()[0],
>>>>     > > >>> > >>             ExpressionPosition.UNKNOWN);
>>>>     > > >>> > >>     JFieldRef targetBuildSideFieldId = cg.addExpr(targetBuildFieldIdExp,
>>>>     > > >>> > >>         ClassGenerator.BlkCreateMode.TRUE_IF_BOUND).getValue();
>>>>     > > >>> > >>     JBlock ifBlock = cg.getEvalBlock()
>>>>     > > >>> > >>         ._if(fieldIdParamHolder.getValue().eq(targetBuildSideFieldId))._then();
>>>>     > > >>> > >>
>>>>     > > >>> > >>     LogicalExpression hashExpression =
>>>>     > > >>> > >>         HashPrelUtil.getHashExpression(expr, seed, incomingProbe != null);
>>>>     > > >>> > >>     LogicalExpression materializedExpr =
>>>>     > > >>> > >>         ExpressionTreeMaterializer.materializeAndCheckErrors(hashExpression,
>>>>     > > >>> > >>             batch, context.getFunctionRegistry());
>>>>     > > >>> > >>     HoldingContainer hash = cg.addExpr(materializedExpr,
>>>>     > > >>> > >>         ClassGenerator.BlkCreateMode.FALSE);
>>>>     > > >>> > >>
>>>>     > > >>> > >>     ifBlock._return(hash.getValue());
>>>>     > > >>> > >>     i++;
>>>>     > > >>> > >>   }
>>>>     > > >>> > >>   cg.getEvalBlock()._return(JExpr.lit(0));
>>>>     > > >>> > >> }
>>>>     > > >>> > >>
>>>>     > > >>> > >> But unfortunately, the generated code is not what I expected. The
>>>>     > > >>> > >> code that reads the ValueVector and calculates the hash code of the
>>>>     > > >>> > >> value does not stay in the if block. How can I make that code stay
>>>>     > > >>> > >> inside the if block?
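For readers following the question, here is a hand-written Java sketch of the shape the generated getBuild64HashCodeInner is meant to have: every statement that reads a key column and hashes it sits inside that column's if branch. All names are assumptions for illustration: the `int[][]` arrays stand in for build-side ValueVectors, `fieldIds` stands in for `TypedFieldId.getFieldIds()[0]`, and `mix64` is a placeholder SplitMix64-style mixer, not Drill's murmur3-based HashHelper.

```java
// Hand-written sketch (not Drill code) of the intended generated method:
// each branch reads its own key column and hashes it before returning.
final class Build64HashSketch {
  private final int[][] keyColumns; // stand-ins for the build-side key ValueVectors
  private final int[] fieldIds;     // stand-ins for TypedFieldId.getFieldIds()[0]

  // Generated code is unrolled, one branch per key; this sketch assumes two keys.
  Build64HashSketch(int[][] keyColumns, int[] fieldIds) {
    if (keyColumns.length != 2 || fieldIds.length != 2) {
      throw new IllegalArgumentException("sketch is unrolled for exactly two key columns");
    }
    this.keyColumns = keyColumns;
    this.fieldIds = fieldIds;
  }

  long getBuild64HashCodeInner(int incomingRowIdx, int seedValue, int fieldId) {
    if (fieldId == fieldIds[0]) {
      int value = keyColumns[0][incomingRowIdx]; // read stays inside the branch
      return mix64(value, seedValue);            // hash stays inside the branch
    }
    if (fieldId == fieldIds[1]) {
      int value = keyColumns[1][incomingRowIdx];
      return mix64(value, seedValue);
    }
    return 0; // default when fieldId matches no key column
  }

  // Placeholder 64-bit mixer (SplitMix64 finalizer), not murmur3.
  static long mix64(long value, long seed) {
    long z = value + seed + 0x9E3779B97F4A7C15L;
    z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
    z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
    return z ^ (z >>> 31);
  }
}
```

Because a caller typically asks for one fieldId across a whole column, the same branch is taken on every row.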
>>>>     > > >>> > >>
>>>>     > > >>> > >
>>>>     > > >>> > >
>>>>     > > >>> >
>>>>     > > >>
>>>>     > > >>
>>>>     > >
>>>>     >
>>>>
>>>>
>>>>

Re: How to generate hash code for each build side one of the hash join columns

Posted by weijie tong <to...@gmail.com>.

Could someone explain how SelectionVector2 works? I was confused by the
code. I think it acts as an indirect index into the filtered RecordBatch.
In FilterRecordBatch, the RecordBatch is filtered and the indexes of the
rows that satisfy the condition are stored in the SelectionVector2. In
ProjectRecordBatch, the indexes in the incoming RecordBatch's
SelectionVector2 are used to access the filtered RecordBatch's data. So
Filter and Project operate on the same memory batch, using the
SelectionVector2 to reach the actual data. If I am right, then one case
confuses me: the SelectionVector2 of ProjectRecordBatch is initialized
only once, in ProjectTemplate's setup method. Could someone explain why
ProjectTemplate does not get a fresh SelectionVector2 from the incoming
batch (e.g. the one generated by FilterRecordBatch) on every next() call?
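If the understanding above is right, a SelectionVector2 is a list of 2-byte row indexes pointing into an unmodified batch. The toy model below uses plain arrays and is not Drill's SelectionVector2 class or its buffers: the "filter" step records passing row numbers as chars without copying values, and the "project" step reads the original column through those indexes.

```java
import java.util.Arrays;
import java.util.function.IntPredicate;

// Toy model of SelectionVector2-style indirection (not Drill's implementation).
final class Sv2Sketch {
  // "Filter": keep the row numbers that pass; the column itself is not copied.
  static char[] filter(int[] column, IntPredicate predicate) {
    if (column.length > Character.MAX_VALUE + 1) {
      throw new IllegalArgumentException("a char index covers at most 65536 rows");
    }
    char[] selected = new char[column.length];
    int count = 0;
    for (int row = 0; row < column.length; row++) {
      if (predicate.test(column[row])) {
        selected[count++] = (char) row;
      }
    }
    return Arrays.copyOf(selected, count);
  }

  // "Project": read values from the original column through the indexes.
  static int[] project(int[] column, char[] sv2) {
    int[] out = new int[sv2.length];
    for (int i = 0; i < sv2.length; i++) {
      out[i] = column[sv2[i]];
    }
    return out;
  }
}
```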

On Fri, Jun 1, 2018 at 5:14 PM weijie tong <to...@gmail.com> wrote:

> I found the answer: a RecordBatch's maximum size is 2^16 rows, as defined
> by RecordBatch's MAX_BATCH_SIZE.
>
> On Fri, Jun 1, 2018 at 3:36 PM weijie tong <to...@gmail.com>
> wrote:
>
>> Some questions about SelectionVector2 and SelectionVector4:
>>
>> I want to create a SelectionVector4 or SelectionVector2 to represent the
>> filtered ScanBatch and avoid a memory copy. But I found that ProjectBatch
>> does not support SelectionVector4, and SelectionVector2's record count is
>> char-sized (2 bytes). So why is SelectionVector4 not supported by
>> ProjectBatch? The same question applies to FilterBatch's SelectionVector2,
>> which also only supports a 2-byte record count.
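On the size question: a Java char holds 0 to 65535 (Character.MAX_VALUE), so a 2-byte index can address every row of a batch capped at 2^16 rows. A 4-byte index can also name the batch. As far as I know, Drill's SelectionVector4 packs the batch number into the upper 16 bits and the row into the lower 16 bits; the sketch below shows only that encoding on plain arrays, with hypothetical names, and is not Drill code.

```java
// Sketch of a 4-byte compound index: upper 16 bits = batch, lower 16 bits = row.
final class Sv4Sketch {
  static int encode(int batchIndex, int recordIndex) {
    if (batchIndex < 0 || batchIndex > 0xFFFF || recordIndex < 0 || recordIndex > 0xFFFF) {
      throw new IllegalArgumentException("both parts must fit in 16 bits");
    }
    return (batchIndex << 16) | recordIndex;
  }

  // Unsigned shift, so batch numbers >= 0x8000 (negative compound ints) decode correctly.
  static int batchIndex(int compound) { return compound >>> 16; }

  static int recordIndex(int compound) { return compound & 0xFFFF; }

  // Read one value through the compound index from a list of batches.
  static int read(int[][] batches, int compound) {
    return batches[batchIndex(compound)][recordIndex(compound)];
  }
}
```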
>>
>> On Fri, Jun 1, 2018 at 1:40 PM weijie tong <to...@gmail.com>
>> wrote:
>>
>>> Hi Boaz:
>>>
>>>   Your proposal is valuable, though I have already implemented the
>>> dynamic code generation logic. If a ``` long hash64(int index, long seed) ```
>>> method is added to ValueVector, it will also help others implement
>>> storage-plugin-specific filter logic using the pushed-down bloom
>>> filter. For HashJoin and HashAggregate, the methods ```double
>>> hash32AsDouble(int index, int seed) ``` and ```int hash32(int index, int
>>> seed)``` will also need to be added to ValueVector. If no one objects,
>>> I will be glad to take on this work.
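A minimal sketch of what per-vector hash methods could look like. The interface HashableVector and the two column classes are hypothetical, not Drill's ValueVector API, and the mixer is a placeholder rather than murmur3. One deliberate difference from the proposal: judging from the generated code later in this thread (`HashHelper.hash32((double) in.value, seed.value)` written into an IntHolder), hash32AsDouble converts the value to double and yields an int hash, so the sketch returns int. That conversion is what lets an INT 7 and a BIGINT 7 hash alike.

```java
// Hypothetical per-vector hashing interface; not an existing Drill API.
interface HashableVector {
  long hash64(int index, long seed);
  int hash32(int index, int seed);
  int hash32AsDouble(int index, int seed); // hash of the value converted to double
}

final class HashMix {
  private HashMix() {}

  // Placeholder 64-bit mixer (SplitMix64 finalizer), not murmur3.
  static long mix64(long value, long seed) {
    long z = value + seed + 0x9E3779B97F4A7C15L;
    z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
    z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
    return z ^ (z >>> 31);
  }

  // Fold a 64-bit hash down to 32 bits.
  static int fold(long h) {
    return (int) (h ^ (h >>> 32));
  }

  static int hashDouble(double d, int seed) {
    return fold(mix64(Double.doubleToLongBits(d), seed));
  }
}

final class IntColumn implements HashableVector {
  private final int[] values;
  IntColumn(int... values) { this.values = values; }

  @Override public long hash64(int index, long seed) { return HashMix.mix64(values[index], seed); }
  @Override public int hash32(int index, int seed) { return HashMix.fold(HashMix.mix64(values[index], seed)); }
  @Override public int hash32AsDouble(int index, int seed) { return HashMix.hashDouble(values[index], seed); }
}

final class BigIntColumn implements HashableVector {
  private final long[] values;
  BigIntColumn(long... values) { this.values = values; }

  @Override public long hash64(int index, long seed) { return HashMix.mix64(values[index], seed); }
  @Override public int hash32(int index, int seed) { return HashMix.fold(HashMix.mix64(values[index], seed)); }
  @Override public int hash32AsDouble(int index, int seed) { return HashMix.hashDouble(values[index], seed); }
}
```

Each vector class does its own type-specific work through a virtual call, which is the same idea as copyEntry() in Boaz's message below.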
>>>
>>>    Btw, here are my thoughts on the scan-side filter logic using the
>>> BloomFilter. The scan-side filtering I intend to do filters the
>>> materialized ValueVector, not the process of constructing the ValueVector
>>> from the original storage-format data. The reason is that doing the check
>>> during that construction would hurt the performance of materializing the
>>> original deep storage-format data into ValueVectors.
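A minimal sketch of that scan-side idea, assuming the batch is already materialized: hash each key, ask the pushed-down filter whether the hash might be present, and record surviving rows as SV2-style char indexes instead of copying data. The LongPredicate stands in for a Bloom filter's membership test, and the mixer is a placeholder; neither is Drill API.

```java
import java.util.Arrays;
import java.util.function.LongPredicate;

// Sketch: filter an already-materialized key column with a pushed-down filter.
final class ScanSideFilterSketch {
  static char[] selectRows(long[] materializedKeys, long seed, LongPredicate mightContainHash) {
    char[] selected = new char[materializedKeys.length];
    int count = 0;
    for (int row = 0; row < materializedKeys.length; row++) {
      long hash = mix64(materializedKeys[row], seed); // must match the build side's hash
      if (mightContainHash.test(hash)) {
        selected[count++] = (char) row;               // assumes < 65536 rows per batch
      }
    }
    return Arrays.copyOf(selected, count);
  }

  // Placeholder mixer; build and probe sides must use the same function and seed.
  static long mix64(long value, long seed) {
    long z = value + seed + 0x9E3779B97F4A7C15L;
    z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
    z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
    return z ^ (z >>> 31);
  }
}
```

A real Bloom filter can return false positives, so a surviving row may still find no join match; it never drops a row that would match.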
>>>
>>> On Fri, Jun 1, 2018 at 3:22 AM Boaz Ben-Zvi <bb...@mapr.com> wrote:
>>>
>>>>  Hi Weijie,
>>>>
>>>>     Another option is to totally avoid the generated code.
>>>> We were considering the idea of replacing the generated code used for
>>>> computing hash values with “real java” code.
>>>>
>>>> This idea is analogous to the usage of the copyEntry() method in the
>>>> ValueVector interface (that Paul added last year).
>>>> See an example of using the copyEntry() (via the appendRow() in
>>>> VectorContainer) in the new Hash-Join-Spill code.
>>>> Basically no need to generate “type specific” code, as the virtual
>>>> copyEntry() method does the “type specific” work.
>>>>
>>>> Similarly we could have a hash64() method in ValueVector, which would
>>>> perform the “type specific” computation.
>>>> (One difference from copyEntry() – the hash64() would also need to take
>>>> the “seed” parameter, which is the hash value produced by the previous
>>>> hash).
>>>> And similar to appendRow(), there would be evalHash() iterating over
>>>> the key columns.
>>>> (And one difference from appendRow() – need to iterate only on the key
>>>> columns; these are the first columns; their number can be found from the
>>>> config: e.g., htConfig.getKeyExprsBuild().size() )
>>>>
>>>>    With such implementation, that evalHash() could be used anywhere
>>>> (e.g., to match the Bloom filters on the left side of the join).
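A minimal sketch of the suggestion above, with hypothetical names: each key vector exposes hash64(index, seed), and evalHash walks only the first numKeys vectors (in Drill that count would come from something like htConfig.getKeyExprsBuild().size(), per the message), feeding each hash in as the next column's seed. LongKeyColumn and its mixer are placeholders, not Drill classes.

```java
// Hypothetical: a vector that can hash one of its entries with a seed.
interface KeyVector {
  long hash64(int index, long seed);
}

final class LongKeyColumn implements KeyVector {
  private final long[] values;
  LongKeyColumn(long... values) { this.values = values; }

  @Override public long hash64(int index, long seed) {
    // Placeholder mixer (SplitMix64 finalizer), not murmur3.
    long z = values[index] + seed + 0x9E3779B97F4A7C15L;
    z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
    z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
    return z ^ (z >>> 31);
  }
}

final class EvalHashSketch {
  // Key columns come first, so only the first numKeys vectors are hashed.
  static long evalHash(KeyVector[] vectors, int numKeys, int rowIndex, long initialSeed) {
    long hash = initialSeed;
    for (int k = 0; k < numKeys; k++) {
      hash = vectors[k].hash64(rowIndex, hash); // previous hash becomes the seed
    }
    return hash;
  }
}
```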
>>>>
>>>>        Thanks,
>>>>
>>>>              Boaz
>>>>
>>>>
>>>> On 5/30/18, 7:49 PM, "weijie tong" <to...@gmail.com> wrote:
>>>>
>>>>     Hi Aman:
>>>>
>>>>       Thanks for your tips. I have rebased onto the latest code from the
>>>>     master branch. Yes, the spill-to-disk feature did change the original
>>>>     implementation, and I have adjusted my implementation to the new
>>>>     feature. But as you say, integration will be somewhat challenging, as
>>>>     I noticed the spill-to-disk feature will continue to tune its
>>>>     implementation's performance.
>>>>
>>>>       The BloomFilter is implemented natively in Drill, not with an
>>>>     external library. It implements the algorithm from the paper you
>>>>     mentioned.
>>>>
>>>>
>>>>     On Thu, May 31, 2018 at 1:56 AM Aman Sinha <am...@apache.org> wrote:
>>>>
>>>>     > Hi Weijie,
>>>>     > I was hoping you could leverage the existing methods, so it's good that
>>>>     > you found the ones that work for your use case.
>>>>     > One thing I want to point out (maybe you're already aware): the Hash Join
>>>>     > code has changed significantly in the master branch due to the
>>>>     > spill-to-disk feature.
>>>>     > So, this may pose some integration challenges for your run-time join
>>>>     > pushdown feature.
>>>>     > Also, one other question/clarification: for the bloom filter itself, are
>>>>     > you implementing it natively in Drill or using an external library?
>>>>     >
>>>>     > -Aman
>>>>     >
>>>>     > On Tue, May 29, 2018 at 8:23 PM, weijie tong <tongweijie178@gmail.com> wrote:
>>>>     >
>>>>     > > I found ClassGenerator's nestEvalBlock(JBlock block) and unNestEvalBlock()
>>>>     > > methods, which have the same effect as the change I made to the
>>>>     > > ClassGenerator. So I have dropped my change to the ClassGenerator,
>>>>     > > and I hope this helps someone else.
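A toy model of why nesting fixes the problem: code emitters write into whatever getEvalBlock() returns, so pushing the if-block before emitting (and popping it afterwards) keeps the read and hash statements inside the branch. Without the nesting step, the same calls append the statements after the if-block, which is the symptom reported earlier in the thread. This is a self-contained illustration with made-up classes, not Drill's ClassGenerator or JCodeModel; in the thread's terms it corresponds to calling nestEvalBlock(ifBlock) before cg.addExpr(...) and unNestEvalBlock() after it.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy code builder (NOT Drill's ClassGenerator): statements go into whichever
// block getEvalBlock() returns, which is the innermost nested block if any.
final class ToyGenerator {
  static final class Block {
    final String header;                          // e.g. "if (fieldId == 3)"; "" for the body
    final List<Object> items = new ArrayList<>(); // String statements or nested Blocks, in order

    Block(String header) { this.header = header; }

    void add(String statement) { items.add(statement); }

    Block openIf(String condition) {
      Block child = new Block("if (" + condition + ")");
      items.add(child);
      return child;
    }
  }

  private final Block evalBlock = new Block("");
  private final Deque<Block> nested = new ArrayDeque<>();

  Block getEvalBlock() { return nested.isEmpty() ? evalBlock : nested.peek(); }
  void nestEvalBlock(Block block) { nested.push(block); }
  void unNestEvalBlock() { nested.pop(); }

  String render() {
    StringBuilder sb = new StringBuilder();
    render(evalBlock, "", sb);
    return sb.toString();
  }

  private static void render(Block block, String indent, StringBuilder sb) {
    for (Object item : block.items) {
      if (item instanceof Block) {
        Block child = (Block) item;
        sb.append(indent).append(child.header).append(" {\n");
        render(child, indent + "  ", sb);
        sb.append(indent).append("}\n");
      } else {
        sb.append(indent).append(item).append(";\n");
      }
    }
  }
}
```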
>>>>     > >
>>>>     > > On Tue, May 29, 2018 at 1:53 PM weijie tong <tongweijie178@gmail.com> wrote:
>>>>     > >
>>>>     > > > The code formatting was not nice, so here it is again:
>>>>     > > >
>>>>     > > > private void setupGetBuild64Hash(ClassGenerator<HashTable> cg,
>>>>     > > >     MappingSet incomingMapping, VectorAccessible batch,
>>>>     > > >     LogicalExpression[] keyExprs, TypedFieldId[] buildKeyFieldIds)
>>>>     > > >     throws SchemaChangeException {
>>>>     > > >   cg.setMappingSet(incomingMapping);
>>>>     > > >   if (keyExprs == null || keyExprs.length == 0) {
>>>>     > > >     cg.getEvalBlock()._return(JExpr.lit(0));
>>>>     > > >     return; // no key columns: nothing more to generate
>>>>     > > >   }
>>>>     > > >   String seedValue = "seedValue";
>>>>     > > >   String fieldId = "fieldId";
>>>>     > > >   LogicalExpression seed = ValueExpressions.getParameterExpression(seedValue,
>>>>     > > >       Types.required(TypeProtos.MinorType.INT));
>>>>     > > >
>>>>     > > >   LogicalExpression fieldIdParamExpr = ValueExpressions.getParameterExpression(fieldId,
>>>>     > > >       Types.required(TypeProtos.MinorType.INT));
>>>>     > > >   HoldingContainer fieldIdParamHolder = cg.addExpr(fieldIdParamExpr);
>>>>     > > >   int i = 0;
>>>>     > > >   for (LogicalExpression expr : keyExprs) {
>>>>     > > >     TypedFieldId targetTypeFieldId = buildKeyFieldIds[i];
>>>>     > > >     ValueExpressions.IntExpression targetBuildFieldIdExp = new ValueExpressions.IntExpression(
>>>>     > > >         targetTypeFieldId.getFieldIds()[0], ExpressionPosition.UNKNOWN);
>>>>     > > >
>>>>     > > >     JFieldRef targetBuildSideFieldId = cg.addExpr(targetBuildFieldIdExp,
>>>>     > > >         ClassGenerator.BlkCreateMode.TRUE_IF_BOUND).getValue();
>>>>     > > >     JBlock ifBlock = cg.getEvalBlock()
>>>>     > > >         ._if(fieldIdParamHolder.getValue().eq(targetBuildSideFieldId))._then();
>>>>     > > >     // Make getEvalBlock() return this inner if-block (a block nested in
>>>>     > > >     // the eval block) instead of the eval block itself.
>>>>     > > >     cg.setCustomizedEvalInnerBlock(ifBlock);
>>>>     > > >     LogicalExpression hashExpression =
>>>>     > > >         HashPrelUtil.getHashExpression(expr, seed, incomingProbe != null);
>>>>     > > >     LogicalExpression materializedExpr = ExpressionTreeMaterializer.materializeAndCheckErrors(
>>>>     > > >         hashExpression, batch, context.getFunctionRegistry());
>>>>     > > >     HoldingContainer hash = cg.addExpr(materializedExpr,
>>>>     > > >         ClassGenerator.BlkCreateMode.TRUE_IF_BOUND);
>>>>     > > >     ifBlock._return(hash.getValue());
>>>>     > > >     // Reset the customized block to null so that getEvalBlock()
>>>>     > > >     // returns the real eval block again.
>>>>     > > >     cg.setCustomizedEvalInnerBlock(null);
>>>>     > > >     i++;
>>>>     > > >   }
>>>>     > > >   cg.getEvalBlock()._return(JExpr.lit(0));
>>>>     > > > }
>>>>     > > >
>>>>     > > >
>>>>     > > >
>>>>     > > >
>>>>     > > > public long getBuild64HashCodeInner(int incomingRowIdx, int seedValue, int fieldId)
>>>>     > > >     throws SchemaChangeException
>>>>     > > > {
>>>>     > > >   {
>>>>     > > >     IntHolder fieldId12 = new IntHolder();
>>>>     > > >     fieldId12 .value = fieldId;
>>>>     > > >     if (fieldId12 .value == constant14 .value) {
>>>>     > > >       IntHolder out18 = new IntHolder();
>>>>     > > >       {
>>>>     > > >         out18 .value = vv15 .getAccessor().get((incomingRowIdx));
>>>>     > > >       }
>>>>     > > >       IntHolder seedValue19 = new IntHolder();
>>>>     > > >       seedValue19 .value = seedValue;
>>>>     > > >       //---- start of eval portion of hash32AsDouble function. ----//
>>>>     > > >       IntHolder out20 = new IntHolder();
>>>>     > > >       {
>>>>     > > >         final IntHolder out = new IntHolder();
>>>>     > > >         IntHolder in = out18;
>>>>     > > >         IntHolder seed = seedValue19;
>>>>     > > >
>>>>     > > >         Hash32WithSeedAsDouble$IntHash_eval: {
>>>>     > > >           out.value = org.apache.drill.exec.expr.fn.impl.HashHelper.hash32((double) in.value,
>>>>     > > >               seed.value);
>>>>     > > >         }
>>>>     > > >
>>>>     > > >         out20 = out;
>>>>     > > >       }
>>>>     > > >       //---- end of eval portion of hash32AsDouble function. ----//
>>>>     > > >       return out20 .value;
>>>>     > > >     }
>>>>     > > >     return 0;
>>>>     > > >   }
>>>>     > > > }
>>>>     > > >
>>>>     > > > On Tue, May 29, 2018 at 1:47 PM weijie tong <tongweijie178@gmail.com> wrote:
>>>>     > > >
>>>>     > > >> Hi Paul:
>>>>     > > >>
>>>>     > > >>  Thanks for your enthusiasm. I have learned this technique, which you
>>>>     > > >> mentioned to me in another mail thread. It's really helpful; thanks
>>>>     > > >> for your valuable work.
>>>>     > > >>
>>>>     > > >>   I have now solved this tough problem by adding a customized JBlock
>>>>     > > >> member field to the ClassGenerator. When you want the ClassGenerator's
>>>>     > > >> getEvalBlock() to return an inner, customized JBlock, you set this
>>>>     > > >> member; when you want it to return the eval block itself, you reset
>>>>     > > >> this member to null.
>>>>     > > >>
>>>>     > > >>   Here is my changed setup method:
>>>>     > > >>
>>>>     > > >>
>>>>     > > >> private void setupGetBuild64Hash(ClassGenerator<HashTable> cg,
>>>>     > > >>     MappingSet incomingMapping, VectorAccessible batch,
>>>>     > > >>     LogicalExpression[] keyExprs, TypedFieldId[] buildKeyFieldIds)
>>>>     > > >>     throws SchemaChangeException {
>>>>     > > >>   cg.setMappingSet(incomingMapping);
>>>>     > > >>   if (keyExprs == null || keyExprs.length == 0) {
>>>>     > > >>     cg.getEvalBlock()._return(JExpr.lit(0));
>>>>     > > >>     return; // no key columns: nothing more to generate
>>>>     > > >>   }
>>>>     > > >>   String seedValue = "seedValue";
>>>>     > > >>   String fieldId = "fieldId";
>>>>     > > >>   LogicalExpression seed = ValueExpressions.getParameterExpression(seedValue,
>>>>     > > >>       Types.required(TypeProtos.MinorType.INT));
>>>>     > > >>
>>>>     > > >>   LogicalExpression fieldIdParamExpr = ValueExpressions.getParameterExpression(fieldId,
>>>>     > > >>       Types.required(TypeProtos.MinorType.INT));
>>>>     > > >>   HoldingContainer fieldIdParamHolder = cg.addExpr(fieldIdParamExpr);
>>>>     > > >>   int i = 0;
>>>>     > > >>   for (LogicalExpression expr : keyExprs) {
>>>>     > > >>     TypedFieldId targetTypeFieldId = buildKeyFieldIds[i];
>>>>     > > >>     ValueExpressions.IntExpression targetBuildFieldIdExp = new ValueExpressions.IntExpression(
>>>>     > > >>         targetTypeFieldId.getFieldIds()[0], ExpressionPosition.UNKNOWN);
>>>>     > > >>
>>>>     > > >>     JFieldRef targetBuildSideFieldId = cg.addExpr(targetBuildFieldIdExp,
>>>>     > > >>         ClassGenerator.BlkCreateMode.TRUE_IF_BOUND).getValue();
>>>>     > > >>     JBlock ifBlock = cg.getEvalBlock()
>>>>     > > >>         ._if(fieldIdParamHolder.getValue().eq(targetBuildSideFieldId))._then();
>>>>     > > >>     // Make getEvalBlock() return this inner if-block (a block nested in
>>>>     > > >>     // the eval block) instead of the eval block itself.
>>>>     > > >>     cg.setCustomizedEvalInnerBlock(ifBlock);
>>>>     > > >>     LogicalExpression hashExpression =
>>>>     > > >>         HashPrelUtil.getHashExpression(expr, seed, incomingProbe != null);
>>>>     > > >>     LogicalExpression materializedExpr = ExpressionTreeMaterializer.materializeAndCheckErrors(
>>>>     > > >>         hashExpression, batch, context.getFunctionRegistry());
>>>>     > > >>     HoldingContainer hash = cg.addExpr(materializedExpr,
>>>>     > > >>         ClassGenerator.BlkCreateMode.TRUE_IF_BOUND);
>>>>     > > >>     ifBlock._return(hash.getValue());
>>>>     > > >>     // Reset the customized block to null so that getEvalBlock()
>>>>     > > >>     // returns the real eval block again.
>>>>     > > >>     cg.setCustomizedEvalInnerBlock(null);
>>>>     > > >>     i++;
>>>>     > > >>   }
>>>>     > > >>   cg.getEvalBlock()._return(JExpr.lit(0));
>>>>     > > >> }
>>>>     > > >>
>>>>     > > >>
>>>>     > > >>  The corresponding generated code:
>>>>     > > >>
>>>>     > > >>     public long getBuild64HashCodeInner(int incomingRowIdx, int seedValue, int fieldId)
>>>>     > > >>         throws SchemaChangeException
>>>>     > > >>     {
>>>>     > > >>         {
>>>>     > > >>             IntHolder fieldId12 = new IntHolder();
>>>>     > > >>             fieldId12 .value = fieldId;
>>>>     > > >>             if (fieldId12 .value == constant14 .value) {
>>>>     > > >>                 IntHolder out18 = new IntHolder();
>>>>     > > >>                 {
>>>>     > > >>                     out18 .value = vv15 .getAccessor().get((incomingRowIdx));
>>>>     > > >>                 }
>>>>     > > >>                 IntHolder seedValue19 = new IntHolder();
>>>>     > > >>                 seedValue19 .value = seedValue;
>>>>     > > >>                 //---- start of eval portion of hash32AsDouble function. ----//
>>>>     > > >>                 IntHolder out20 = new IntHolder();
>>>>     > > >>                 {
>>>>     > > >>                     final IntHolder out = new IntHolder();
>>>>     > > >>                     IntHolder in = out18;
>>>>     > > >>                     IntHolder seed = seedValue19;
>>>>     > > >>
>>>>     > > >>                     Hash32WithSeedAsDouble$IntHash_eval: {
>>>>     > > >>                         out.value = org.apache.drill.exec.expr.fn.impl.HashHelper.hash32((double) in.value, seed.value);
>>>>     > > >>                     }
>>>>     > > >>
>>>>     > > >>                     out20 = out;
>>>>     > > >>                 }
>>>>     > > >>                 //---- end of eval portion of hash32AsDouble function. ----//
>>>>     > > >>                 return out20 .value;
>>>>     > > >>             }
>>>>     > > >>             return  0;
>>>>     > > >>         }
>>>>     > > >>     }
>>>>     > > >>
>>>>     > > >>
>>>>     > > >>
>>>>     > > >>   Some other explanations:
>>>>     > > >>   1st: The if check won't hurt performance, because I invoke this
>>>>     > > >> method column by column, so it's branch-prediction friendly.
>>>>     > > >>   2nd: I will use murmur3_64, not murmur3_32, since the efficient
>>>>     > > >> bloom filter algorithm needs a 64-bit hash code to avoid collisions.
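To illustrate why a 64-bit hash helps, here is a minimal Bloom filter sketch that derives all k probe positions from one 64-bit hash: it splits the hash into two 32-bit halves h1 and h2 and probes h1 + i*h2 (double hashing, as described by Kirsch and Mitzenmacher). The thread does not say which paper Drill's BloomFilter follows, so treat this as one common technique, not Drill's implementation; the class name and sizing are made up.

```java
import java.util.BitSet;

// Sketch of a Bloom filter that derives k probe positions from ONE 64-bit hash
// via double hashing (position_i = h1 + i * h2). Not Drill's BloomFilter class.
final class Bloom64 {
  private final BitSet bits;
  private final int numBits;
  private final int numHashes;

  Bloom64(int numBits, int numHashes) {
    if (numBits <= 0 || numHashes <= 0) {
      throw new IllegalArgumentException("numBits and numHashes must be positive");
    }
    this.bits = new BitSet(numBits);
    this.numBits = numBits;
    this.numHashes = numHashes;
  }

  void insert(long hash64) {
    int h1 = (int) hash64;          // low 32 bits
    int h2 = (int) (hash64 >>> 32); // high 32 bits
    for (int i = 1; i <= numHashes; i++) {
      bits.set(position(h1, h2, i));
    }
  }

  boolean mightContain(long hash64) {
    int h1 = (int) hash64;
    int h2 = (int) (hash64 >>> 32);
    for (int i = 1; i <= numHashes; i++) {
      if (!bits.get(position(h1, h2, i))) {
        return false; // definitely never inserted
      }
    }
    return true; // possibly inserted (false positives are possible)
  }

  private int position(int h1, int h2, int i) {
    int combined = h1 + i * h2;              // wraps on overflow, which is fine here
    return Math.floorMod(combined, numBits); // map into [0, numBits)
  }
}
```

Build-side hashes go in through insert(); probe-side hashes are tested with mightContain(), which can report an absent key as present but never misses an inserted one.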
>>>>     > > >>
>>>>     > > >> On Tue, May 29, 2018 at 12:37 PM Paul Rogers <par0328@yahoo.com.invalid> wrote:
>>>>     > > >>
>>>>     > > >>> Hi Weijie,
>>>>     > > >>>
>>>>     > > >>> Seeing the discussion about the details of JCodeModel suggests you may
>>>>     > > >>> be trying to debug your generated code at the level of the code generator.
>>>>     > > >>>
>>>>     > > >>> Some time ago we added the ability to step through the generated code.
>>>>     > > >>> Look for the following line in the generator code:
>>>>     > > >>>
>>>>     > > >>>
>>>>     > > >>>     // Uncomment out this line to debug the generated code.
>>>>     > > >>>
>>>>     > > >>> //    cg.saveCodeForDebugging(true);
>>>>     > > >>>
>>>>     > > >>>
>>>>     > > >>> Uncomment the code line and Drill will save each generated file to a
>>>>     > > >>> configured location (which, if I recall correctly, is /tmp/drill/codegen,
>>>>     > > >>> though it may have changed after Tim's test directory changes.)
>>>>     > > >>>
>>>>     > > >>> Then, set a breakpoint in the template setup() method and you can step
>>>>     > > >>> directly into the generated doSetup() method. Same for the eval() method.
>>>>     > > >>>
>>>>     > > >>> This way, you can not only see the generated code, you can step through
>>>>     > > >>> it. I've found this to be a far easier way to understand the generated
>>>>     > > >>> code than the older techniques folks have used (look at byte codes, use
>>>>     > > >>> print statements, brute force reasoning, etc.)
>>>>     > > >>>
>>>>     > > >>> Tim, Boaz and others have used this technique more recently and can
>>>>     > > >>> probably give you additional pointers.
>>>>     > > >>>
>>>>     > > >>> Thanks,
>>>>     > > >>> - Paul
>>>>     > > >>>
>>>>     > > >>>
>>>>     > > >>>
>>>>     > > >>>     On Monday, May 28, 2018, 8:52:19 PM PDT, weijie tong <tongweijie178@gmail.com> wrote:
>>>>     > > >>>
>>>>     > > >>>  @aman thanks for your reply. "For the ifBlock, do you need an _else()
>>>>     > > >>> block also?" I provide a default return at the end of the method, so I
>>>>     > > >>> don't need the _else() block. I have looked at the IfExpression
>>>>     > > >>> evaluation method in EvaluationVisitor, which also uses JConditional,
>>>>     > > >>> but that doesn't match my requirement either. I think the key point
>>>>     > > >>> here is that FunctionHolderExpression and ValueVectorReadExpression put
>>>>     > > >>> their generated code into the eval method's JBlock, not into our
>>>>     > > >>> specific ifBlock, which is an inner block of the eval method's JBlock.
>>>>     > > >>>
>>>>     > > >>> So it seems I should either change the ClassGenerator to let
>>>>     > > >>> getEvalBlock() return the ifBlock (more accurately, the JConditional's
>>>>     > > >>> then-block), or implement special FunctionHolderExpression and
>>>>     > > >>> ValueVectorReadExpression variants with corresponding visit methods in
>>>>     > > >>> EvaluationVisitor to generate the special code blocks. I hope someone
>>>>     > > >>> familiar with this part of the code can point out whether there are
>>>>     > > >>> easier or different ways to achieve this.
>>>>     > > >>>
>>>>     > > >>> To make the discussion more concrete, here is the code generated by the
>>>>     > > >>> previous setupGetBuild64Hash method:
>>>>     > > >>>
>>>>     > > >>>     public long getBuild64HashCodeInner(int incomingRowIdx, int seedValue, int fieldId)
>>>>     > > >>>         throws SchemaChangeException
>>>>     > > >>>     {
>>>>     > > >>>         {
>>>>     > > >>>             IntHolder fieldId16 = new IntHolder();
>>>>     > > >>>             fieldId16 .value = fieldId;
>>>>     > > >>>             if (fieldId16 .value == constant18 .value) {
>>>>     > > >>>                 return out24 .value;
>>>>     > > >>>             }
>>>>     > > >>>             IntHolder out22 = new IntHolder();
>>>>     > > >>>             {
>>>>     > > >>>                 out22 .value = vv19 .getAccessor().get((incomingRowIdx));
>>>>     > > >>>             }
>>>>     > > >>>             IntHolder seedValue23 = new IntHolder();
>>>>     > > >>>             seedValue23 .value = seedValue;
>>>>     > > >>>             //---- start of eval portion of hash32AsDouble function. ----//
>>>>     > > >>>             IntHolder out24 = new IntHolder();
>>>>     > > >>>             {
>>>>     > > >>>                 final IntHolder out = new IntHolder();
>>>>     > > >>>                 IntHolder in = out22;
>>>>     > > >>>                 IntHolder seed = seedValue23;
>>>>     > > >>>
>>>>     > > >>>                 Hash32WithSeedAsDouble$IntHash_eval: {
>>>>     > > >>>                     out.value = org.apache.drill.exec.expr.fn.impl.HashHelper.hash32((double) in.value, seed.value);
>>>>     > > >>>                 }
>>>>     > > >>>
>>>>     > > >>>                 out24 = out;
>>>>     > > >>>             }
>>>>     > > >>>             //---- end of eval portion of hash32AsDouble function. ----//
>>>>     > > >>>             if (fieldId16 .value == constant18 .value) {
>>>>     > > >>>                 return out26 .value;
>>>>     > > >>>             }
>>>>     > > >>>             IntHolder seedValue25 = new IntHolder();
>>>>     > > >>>             seedValue25 .value = seedValue;
>>>>     > > >>>             //---- start of eval portion of hash32AsDouble function. ----//
>>>>     > > >>>             IntHolder out26 = new IntHolder();
>>>>     > > >>>             {
>>>>     > > >>>                 final IntHolder out = new IntHolder();
>>>>     > > >>>                 IntHolder in = out22;
>>>>     > > >>>                 IntHolder seed = seedValue25;
>>>>     > > >>>
>>>>     > > >>>                 Hash32WithSeedAsDouble$IntHash_eval: {
>>>>     > > >>>                     out.value = org.apache.drill.exec.expr.fn.impl.HashHelper.hash32((double) in.value, seed.value);
>>>>     > > >>>                 }
>>>>     > > >>>
>>>>     > > >>>                 out26 = out;
>>>>     > > >>>             }
>>>>     > > >>>             //---- end of eval portion of hash32AsDouble function. ----//
>>>>     > > >>>             return  0;
>>>>     > > >>>         }
>>>>     > > >>>     }
>>>>     > > >>>
>>>>     > > >>>
>>>>     > > >>>
>>>>     > > >>>
>>>>     > > >>>
>>>>     > > >>> On Tue, May 29, 2018 at 10:51 AM Aman Sinha <amansinha@apache.org> wrote:
>>>>     > > >>>
>>>>     > > >>> > Sorry, the previous email was incomplete.
>>>>     > > >>> > For the ifBlock, do you need an _else() block also?
>>>>     > > >>> >
>>>>     > > >>> > I have sometimes found that 'JConditional' is a good way to break down
>>>>     > > >>> > the logic further.  Please see example usages of JConditional here [1].
>>>>     > > >>> >
>>>>     > > >>> > -Aman
>>>>     > > >>> >
>>>>     > > >>> > [1]
>>>>     > > >>> > https://www.programcreek.com/java-api-examples/?api=com.sun.codemodel.JBlock
>>>>     > > >>> >
>>>>     > > >>> > On Mon, May 28, 2018 at 7:46 PM, Aman Sinha <amansinha@apache.org> wrote:
>>>>     > > >>> >
>>>>     > > >>> > > Hi Weijie,
>>>>     > > >>> > > It would be a little cumbersome to debug such issues over email, since
>>>>     > > >>> > > one has to look at the generated code output and debug iteratively.
>>>>     > > >>> > > A couple of thoughts I have that might help:
>>>>     > > >>> > >
>>>>     > > >>> > > For this particular if-then block, should you also
>>>>     > > >>> > > JBlock ifBlock =
>>>>     > > >>> > > cg.getEvalBlock()._if(fieldIdParamHolder.getValue().eq(targetBuildSideFieldId))._then();
>>>>     > > >>> > >
>>>>     > > >>> > >
>>>>     > > >>> > >
>>>>     > > >>> > > On Mon, May 28, 2018 at 4:17 AM, weijie tong <tongweijie178@gmail.com> wrote:
>>>>     > > >>> > >
>>>>     > > >>> > >> Hi All:
>>>>     > > >>> > >>  While implementing the JPPD feature
>>>>     > > >>> > >> (https://issues.apache.org/jira/browse/DRILL-6385), I was blocked by
>>>>     > > >>> > >> this problem: how to get the hash code of each build-side hash join
>>>>     > > >>> > >> column through the dynamically generated Java code. I hope someone
>>>>     > > >>> > >> can give some advice.
>>>>     > > >>> > >>
>>>>     > > >>> > >>    I plan to add the methods below to HashTableTemplate:
>>>>     > > >>> > >>
>>>>     > > >>> > >> public long getBuild64HashCode(int incomingRowIdx, int seedValue, int fieldId)
>>>>     > > >>> > >>     throws SchemaChangeException {
>>>>     > > >>> > >>    return getBuild64HashCodeInner(incomingRowIdx, seedValue, fieldId);
>>>>     > > >>> > >> }
>>>>     > > >>> > >>
>>>>     > > >>> > >> protected abstract long getBuild64HashCodeInner(
>>>>     > > >>> > >>     @Named("incomingRowIdx") int incomingRowIdx,
>>>>     > > >>> > >>     @Named("seedValue") int seedValue,
>>>>     > > >>> > >>     @Named("fieldId") int fieldId)
>>>>     > > >>> > >>     throws SchemaChangeException;
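A self-contained sketch of the template pattern described above: the template's public method delegates to an abstract method whose body code generation supplies. A hand-written subclass stands in for the generated class here; the Drill-specific @Named annotations and SchemaChangeException are omitted, the class names are hypothetical, and the hash is a placeholder, not murmur3.

```java
// Template: fixed entry point; the abstract body is what code generation fills in.
abstract class HashTableTemplateSketch {
  public long getBuild64HashCode(int incomingRowIdx, int seedValue, int fieldId) {
    return getBuild64HashCodeInner(incomingRowIdx, seedValue, fieldId);
  }

  protected abstract long getBuild64HashCodeInner(int incomingRowIdx, int seedValue, int fieldId);
}

// Hand-written stand-in for the generated subclass: one key column with field id 0.
final class GeneratedHashTableSketch extends HashTableTemplateSketch {
  private final long[] keyColumn;

  GeneratedHashTableSketch(long... keyColumn) { this.keyColumn = keyColumn; }

  @Override
  protected long getBuild64HashCodeInner(int incomingRowIdx, int seedValue, int fieldId) {
    if (fieldId == 0) {
      // Placeholder hash combining seed and value; not murmur3.
      return 31L * seedValue + keyColumn[incomingRowIdx];
    }
    return 0; // unknown field id
  }
}
```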
>>>>     > > >>> > >>
>>>>     > > >>> > >>
>>>>     > > >>> > >>    The high-level code that invokes getBuild64HashCode is in
>>>>     > > >>> > >> HashJoinBatch's executeBuildPhase():
>>>>     > > >>> > >>
>>>>     > > >>> > >> //create runtime filter
>>>>     > > >>> > >> if (cycleNum == 0 && enableRuntimeFilter) {
>>>>     > > >>> > >>  //create runtime filter and send out async
>>>>     > > >>> > >>  int condFieldIndex = 0;
>>>>     > > >>> > >>  for (BloomFilter bloomFilter : bloomFilters) {
>>>>     > > >>> > >>    //VV
>>>>     > > >>> > >>    for (int ind = 0; ind < currentRecordCount; ind++) {
>>>>     > > >>> > >>      long hashCode = partitions[0].getBuild64HashCode(ind, condFieldIndex);
>>>>     > > >>> > >>      bloomFilter.insert(hashCode);
>>>>     > > >>> > >>    }
>>>>     > > >>> > >>    condFieldIndex++;
>>>>     > > >>> > >>  }
>>>>     > > >>> > >>  //TODO send out async
>>>>     > > >>> > >> }
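A self-contained sketch of that build-phase loop, with a HashSet<Long> standing in for each Bloom filter so the structure is easy to test. Note that getBuild64HashCode as declared above takes three arguments (row, seed, field id), while the quoted call passes two; the sketch passes an explicit seed, whose value is an assumption. Like the quoted loop, it also assumes key field ids run 0, 1, 2, ... in the same order as the filters. BuildSideHasher is a hypothetical stand-in for partitions[0].

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class BuildPhaseSketch {
  // Hypothetical stand-in for partitions[0] in HashJoinBatch.
  interface BuildSideHasher {
    long getBuild64HashCode(int rowIdx, int seedValue, int fieldId);
  }

  // One filter per join-key column; every build-side row is inserted into each.
  static List<Set<Long>> buildFilters(BuildSideHasher hasher, int numKeyColumns,
                                      int recordCount, int seed) {
    List<Set<Long>> filters = new ArrayList<>();
    for (int fieldId = 0; fieldId < numKeyColumns; fieldId++) {
      Set<Long> filter = new HashSet<>(); // stand-in for a Bloom filter
      for (int row = 0; row < recordCount; row++) {
        filter.add(hasher.getBuild64HashCode(row, seed, fieldId));
      }
      filters.add(filter);
    }
    return filters;
  }
}
```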
>>>>     > > >>> > >>
>>>>     > > >>> > >>
>>>>     > > >>> > >>  As you know, the abstract method getBuild64HashCodeInner needs to
>>>>     > > >>> > >> calculate the hash code of the build-side column selected by the
>>>>     > > >>> > >> fieldId input parameter. To achieve this, I plan to generate a
>>>>     > > >>> > >> separate branch for each key column's ValueVector, using an if
>>>>     > > >>> > >> statement on the column id to select the branch. The method that
>>>>     > > >>> > >> generates this dynamic code is below:
>>>>     > > >>> > >>
>>>>     > > >>> > >> private void setupGetBuild64Hash(ClassGenerator<HashTable> cg,
>>>>     > > >>> > >>     MappingSet incomingMapping, VectorAccessible batch,
>>>>     > > >>> > >>     LogicalExpression[] keyExprs, TypedFieldId[] buildKeyFieldIds)
>>>>     > > >>> > >>     throws SchemaChangeException {
>>>>     > > >>> > >>  cg.setMappingSet(incomingMapping);
>>>>     > > >>> > >>  if (keyExprs == null || keyExprs.length == 0) {
>>>>     > > >>> > >>    cg.getEvalBlock()._return(JExpr.lit(0));
>>>>     > > >>> > >>    return; // no key columns: nothing more to generate
>>>>     > > >>> > >>  }
>>>>     > > >>> > >>  String seedValue = "seedValue";
>>>>     > > >>> > >>  String fieldId = "fieldId";
>>>>     > > >>> > >>  LogicalExpression seed = ValueExpressions.getParameterExpression(seedValue,
>>>>     > > >>> > >>      Types.required(TypeProtos.MinorType.INT));
>>>>     > > >>> > >>
>>>>     > > >>> > >>  LogicalExpression fieldIdParamExpr = ValueExpressions.getParameterExpression(fieldId,
>>>>     > > >>> > >>      Types.required(TypeProtos.MinorType.INT));
>>>>     > > >>> > >>  HoldingContainer fieldIdParamHolder = cg.addExpr(fieldIdParamExpr);
>>>>     > > >>> > >>  int i = 0;
>>>>     > > >>> > >>  for (LogicalExpression expr : keyExprs) {
>>>>     > > >>> > >>    TypedFieldId targetTypeFieldId = buildKeyFieldIds[i];
>>>>     > > >>> > >>    ValueExpressions.IntExpression targetBuildFieldIdExp =
>>>>     > > >>> > >>        new ValueExpressions.IntExpression(targetTypeFieldId.getFieldIds()[0],
>>>>     > > >>> > >>            ExpressionPosition.UNKNOWN);
>>>>     > > >>> > >>    JFieldRef targetBuildSideFieldId = cg.addExpr(targetBuildFieldIdExp,
>>>>     > > >>> > >>        ClassGenerator.BlkCreateMode.TRUE_IF_BOUND).getValue();
>>>>     > > >>> > >>    JBlock ifBlock = cg.getEvalBlock()
>>>>     > > >>> > >>        ._if(fieldIdParamHolder.getValue().eq(targetBuildSideFieldId))._then();
>>>>     > > >>> > >>
>>>>     > > >>> > >>    LogicalExpression hashExpression =
>>>>     > > >>> > >>        HashPrelUtil.getHashExpression(expr, seed, incomingProbe != null);
>>>>     > > >>> > >>    LogicalExpression materializedExpr = ExpressionTreeMaterializer
>>>>     > > >>> > >>        .materializeAndCheckErrors(hashExpression, batch, context.getFunctionRegistry());
>>>>     > > >>> > >>    HoldingContainer hash = cg.addExpr(materializedExpr,
>>>>     > > >>> > >>        ClassGenerator.BlkCreateMode.FALSE);
>>>>     > > >>> > >>
>>>>     > > >>> > >>
>>>>     > > >>> > >>    ifBlock._return(hash.getValue());
>>>>     > > >>> > >>    i++;
>>>>     > > >>> > >>  }
>>>>     > > >>> > >>  cg.getEvalBlock()._return(JExpr.lit(0));
>>>>     > > >>> > >>
>>>>     > > >>> > >> }
>>>>     > > >>> > >>
>>>>     > > >>> > >> Unfortunately, the generated code is not what I expected: the code that
>>>>     > > >>> > >> reads the ValueVector and computes the hash of the value read does not
>>>>     > > >>> > >> end up inside the if block. How can I make that code stay inside the if
>>>>     > > >>> > >> block?
>>>>     > > >>> > >>
>>>>     > > >>> > >
>>>>     > > >>> > >
>>>>     > > >>> >
>>>>     > > >>
>>>>     > > >>
>>>>     > >
>>>>     >
>>>>
>>>>
>>>>

Re: How to generate hash code for each build side one of the hash join columns

Posted by weijie tong <to...@gmail.com>.

I found the answer: a RecordBatch holds at most 2^16 records, as defined by
RecordBatch's MAX_BATCH_SIZE.
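Because a batch holds at most 2^16 rows, every row index fits in an unsigned 16-bit Java `char` (0 to 65535), which is why a 2-byte selection vector is enough within a single batch. The tiny sketch below only illustrates that arithmetic; `MAX_BATCH_SIZE` here is a local constant mirroring the value quoted above, not a reference to Drill's RecordBatch, and `Sv2IndexSketch` is a hypothetical name.

```java
// Illustration only: a 2-byte (char) slot can hold any row index of a
// batch that has at most 2^16 rows.
final class Sv2IndexSketch {
  // Local constant mirroring the 2^16 limit quoted above (assumption, not Drill's field).
  static final int MAX_BATCH_SIZE = 1 << 16; // 65536

  private Sv2IndexSketch() {}

  // Stores a row index in a char, rejecting indexes outside [0, MAX_BATCH_SIZE).
  static char toSv2Index(int rowIdx) {
    if (rowIdx < 0 || rowIdx >= MAX_BATCH_SIZE) {
      throw new IllegalArgumentException("row index out of range: " + rowIdx);
    }
    return (char) rowIdx;
  }

  // Reads it back as an int; char is unsigned, so there is no sign extension.
  static int fromSv2Index(char slot) {
    return slot;
  }

  public static void main(String[] args) {
    char last = toSv2Index(MAX_BATCH_SIZE - 1);
    System.out.println(fromSv2Index(last)); // prints 65535
  }
}
```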

On Fri, Jun 1, 2018 at 3:36 PM weijie tong <to...@gmail.com> wrote:

> Some questions about SelectionVector2 and SelectionVector4:
>
> I want to create a SelectionVector4 or SelectionVector2 to represent the
> filtered ScanBatch and avoid a memory copy. But I found that ProjectBatch
> does not support SelectionVector4, and SelectionVector2's record count is
> only char-sized (2 bytes). So why does ProjectBatch not support
> SelectionVector4? The same question applies to FilterBatch's
> SelectionVector2, which also only supports a 2-byte record count.
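For comparison, a 4-byte selection-vector entry can address rows across several batches. A minimal sketch of such a compound index follows, assuming the layout commonly used for Drill's SelectionVector4 (batch index in the upper 16 bits, row index in the lower 16 bits). `Sv4IndexSketch` and its methods are hypothetical; this is not Drill's SelectionVector4 code.

```java
// Hypothetical helper for a 4-byte "compound" selection index
// (assumed layout: batch index in the upper 16 bits, row index in the lower 16).
final class Sv4IndexSketch {
  private Sv4IndexSketch() {}

  // Packs a batch index and a row index (each 0..65535) into one int.
  static int pack(int batchIdx, int rowIdx) {
    if (batchIdx < 0 || batchIdx > 0xFFFF || rowIdx < 0 || rowIdx > 0xFFFF) {
      throw new IllegalArgumentException("index does not fit in 16 bits");
    }
    return (batchIdx << 16) | rowIdx;
  }

  // Unsigned shift, so batch indexes >= 0x8000 (which make the int negative) survive.
  static int batchIndex(int compound) {
    return compound >>> 16;
  }

  static int rowIndex(int compound) {
    return compound & 0xFFFF;
  }
}
```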
>
> On Fri, Jun 1, 2018 at 1:40 PM weijie tong <to...@gmail.com> wrote:
>
>> Hi Boaz:
>>
>>   Your proposal is valuable, though I have already implemented the dynamic
>> code-generation logic. If a ``` long hash64(int index, long seed) ```
>> method is added to ValueVector, it will also help others implement
>> storage-plugin-specific filter logic using the pushed-down bloom filter.
>> For HashJoin and HashAggregate, the methods ```double hash32AsDouble(int
>> index, int seed) ``` and ```int hash32(int index, int seed)``` will also
>> need to be added to ValueVector. If no one objects, I would be glad to
>> take on this work.
>>
>>    By the way, here are my thoughts on the scan-side filter logic using the
>> BloomFilter. The scan-side filter I have in mind filters the materialized
>> ValueVectors, rather than filtering while the ValueVectors are being
>> constructed from the original storage-format data. The reason is that
>> doing the check while materializing the deep storage-format data into
>> ValueVectors would hurt materialization performance.
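The scan-side idea above (probe the Bloom filter only after the ValueVector is materialized) can be sketched in plain Java as one pass over an already-materialized int column that keeps the row indexes whose hash might be in the build-side filter, returning them as `char` indexes in the spirit of a 2-byte selection vector. Everything here is an assumption for illustration: the column is a plain `int[]`, the filter is any `LongPredicate` (for example a Bloom filter's membership test), and `hash64` is a stand-in mixer built on MurmurHash3's fmix64 finalizer, not one of Drill's hash functions.

```java
import java.util.Arrays;
import java.util.function.LongPredicate;

// Hypothetical sketch of filtering a materialized column with a pushed-down filter.
final class ScanSideFilterSketch {
  private ScanSideFilterSketch() {}

  // MurmurHash3's 64-bit finalizer (fmix64), used here only as a stand-in mixer.
  static long fmix64(long k) {
    k ^= k >>> 33;
    k *= 0xff51afd7ed558ccdL;
    k ^= k >>> 33;
    k *= 0xc4ceb9fe1a85ec53L;
    k ^= k >>> 33;
    return k;
  }

  static long hash64(int value, long seed) {
    return fmix64(value ^ fmix64(seed));
  }

  /**
   * Returns the row indexes (as chars, so rowCount must be at most 65536) whose
   * hash passes mightContain. With a Bloom filter built from the same hash and
   * seed, a row that fails cannot match the build side on this key.
   */
  static char[] selectRows(int[] column, int rowCount, long seed, LongPredicate mightContain) {
    if (rowCount > (1 << 16)) {
      throw new IllegalArgumentException("too many rows for 2-byte indexes");
    }
    char[] selected = new char[rowCount];
    int n = 0;
    for (int row = 0; row < rowCount; row++) {
      if (mightContain.test(hash64(column[row], seed))) {
        selected[n++] = (char) row;
      }
    }
    return Arrays.copyOf(selected, n);
  }
}
```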
>>
>> On Fri, Jun 1, 2018 at 3:22 AM Boaz Ben-Zvi <bb...@mapr.com> wrote:
>>
>>>  Hi Weijie,
>>>
>>>     Another option is to totally avoid the generated code.
>>> We were considering the idea of replacing the generated code used for
>>> computing hash values with “real java” code.
>>>
>>> This idea is analogous to the usage of the copyEntry() method in the
>>> ValueVector interface (that Paul added last year).
>>> See an example of using the copyEntry() (via the appendRow() in
>>> VectorContainer) in the new Hash-Join-Spill code.
>>> Basically no need to generate “type specific” code, as the virtual
>>> copyEntry() method does the “type specific” work.
>>>
>>> Similarly we could have a hash64() method in ValueVector, which would
>>> perform the “type specific” computation.
>>> (One difference from copyEntry() – the hash64() would also need to take
>>> the “seed” parameter, which is the hash value produced by the previous
>>> hash).
>>> And similar to appendRow(), there would be evalHash() iterating over the
>>> key columns.
>>> (And one difference from appendRow() – need to iterate only on the key
>>> columns; these are the first columns; their number can be found from the
>>> config: e.g., htConfig.getKeyExprsBuild().size() )
>>>
>>>    With such an implementation, that evalHash() could be used anywhere
>>> (e.g., to match the Bloom filters on the left side of the join).
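A rough sketch of the proposal above: a type-specific `hash64(index, seed)` on each column, plus an `evalHash()` that walks the key columns and feeds each column's hash in as the next column's seed. `Hash64Column`, `IntKeyColumn`, `LongKeyColumn`, `Mix64` and `evalHash` are hypothetical names (no such methods are claimed to exist on Drill's ValueVector), and the mixing function is MurmurHash3's fmix64 finalizer used as a stand-in, not Drill's HashHelper.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical: a column that knows how to hash one of its own entries.
interface Hash64Column {
  long hash64(int index, long seed);
}

final class Mix64 {
  private Mix64() {}

  // MurmurHash3's fmix64 finalizer, used here as a stand-in 64-bit mixer.
  static long fmix64(long k) {
    k ^= k >>> 33;
    k *= 0xff51afd7ed558ccdL;
    k ^= k >>> 33;
    k *= 0xc4ceb9fe1a85ec53L;
    k ^= k >>> 33;
    return k;
  }

  static long combine(long value, long seed) {
    return fmix64(value ^ fmix64(seed));
  }
}

final class IntKeyColumn implements Hash64Column {
  private final int[] values;
  IntKeyColumn(int... values) { this.values = values; }
  @Override public long hash64(int index, long seed) {
    return Mix64.combine(values[index], seed);
  }
}

final class LongKeyColumn implements Hash64Column {
  private final long[] values;
  LongKeyColumn(long... values) { this.values = values; }
  @Override public long hash64(int index, long seed) {
    return Mix64.combine(values[index], seed);
  }
}

final class EvalHashSketch {
  private EvalHashSketch() {}

  // Hashes the key columns of one row; each column's hash seeds the next.
  // No per-type generated code: virtual dispatch selects the right hash64().
  static long evalHash(List<? extends Hash64Column> keyColumns, int row, long initialSeed) {
    long seed = initialSeed;
    for (Hash64Column column : keyColumns) {
      seed = column.hash64(row, seed);
    }
    return seed;
  }

  public static void main(String[] args) {
    List<Hash64Column> keys = Arrays.asList(new IntKeyColumn(7, 8), new LongKeyColumn(100L, 200L));
    System.out.println(Long.toHexString(evalHash(keys, 1, 0L)));
  }
}
```

As in the copyEntry()/appendRow() analogy, the loop itself is type-agnostic; only each column's hash64() knows its value type.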
>>>
>>>        Thanks,
>>>
>>>              Boaz
>>>
>>>
>>> On 5/30/18, 7:49 PM, "weijie tong" <to...@gmail.com> wrote:
>>>
>>>     Hi Aman:
>>>
>>>       Thanks for your tips. I have rebased onto the latest code from the
>>>     master branch. Yes, the spill-to-disk feature did change the original
>>>     implementation, and I have adjusted my implementation to the new
>>>     feature. But as you say, integration will take some effort, since I
>>>     noticed that the spill-to-disk feature's performance is still being
>>>     tuned.
>>>
>>>       The BloomFilter is implemented natively in Drill, not with an
>>>     external library. It implements the algorithm from the paper you
>>>     mentioned.
>>>
>>>
>>>     On Thu, May 31, 2018 at 1:56 AM Aman Sinha <am...@apache.org> wrote:
>>>
>>>     > Hi Weijie,
>>>     > I was hoping you could leverage the existing methods, so it's good
>>>     > that you found the ones that work for your use case.
>>>     > One thing I want to point out (maybe you're already aware): the Hash
>>>     > Join code has changed significantly in the master branch due to the
>>>     > spill-to-disk feature.
>>>     > So, this may pose some integration challenges for your run-time join
>>>     > pushdown feature.
>>>     > Also, one other question/clarification: for the bloom filter itself,
>>>     > are you implementing it natively in Drill or using an external library?
>>>     >
>>>     > -Aman
>>>     >
>>>     > On Tue, May 29, 2018 at 8:23 PM, weijie tong <tongweijie178@gmail.com> wrote:
>>>     >
>>>     > > I found ClassGenerator's nestEvalBlock(JBlock block) and
>>>     > > unNestEvalBlock(), which have the same effect as my change to the
>>>     > > ClassGenerator. So I have dropped my ClassGenerator change; I hope
>>>     > > this helps someone else.
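To see why the generated reads and hash calls ended up outside the if-block, and why nesting the eval block fixes it, here is a toy model in plain Java (no Drill or JCodeModel classes): expressions are always appended to whatever `getEvalBlock()` currently returns, so pushing the if-block's body before adding the hash expression makes that code land inside the if. `ToyBlock` and `ToyGenerator` are hypothetical; only the method names `nestEvalBlock`/`unNestEvalBlock` echo the ClassGenerator methods mentioned above, and the real methods may differ in detail.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy model of generated-code blocks; NOT Drill's ClassGenerator or JCodeModel.
final class ToyBlock {
  // Each item is either a String statement or an IfItem holding a nested body.
  private final List<Object> items = new ArrayList<>();

  private static final class IfItem {
    final String condition;
    final ToyBlock body = new ToyBlock();
    IfItem(String condition) { this.condition = condition; }
  }

  void add(String statement) {
    items.add(statement);
  }

  // Appends "if (condition) { ... }" and returns its (initially empty) body block.
  ToyBlock addIf(String condition) {
    IfItem item = new IfItem(condition);
    items.add(item);
    return item.body;
  }

  void render(StringBuilder out, String indent) {
    for (Object item : items) {
      if (item instanceof IfItem) {
        IfItem ifItem = (IfItem) item;
        out.append(indent).append("if (").append(ifItem.condition).append(") {\n");
        ifItem.body.render(out, indent + "  ");
        out.append(indent).append("}\n");
      } else {
        out.append(indent).append(item).append('\n');
      }
    }
  }
}

final class ToyGenerator {
  private final ToyBlock root = new ToyBlock();
  private final Deque<ToyBlock> evalBlocks = new ArrayDeque<>();

  ToyGenerator() {
    evalBlocks.push(root);
  }

  // The block that new code is appended to (top of the stack).
  ToyBlock getEvalBlock() {
    return evalBlocks.peek();
  }

  void nestEvalBlock(ToyBlock block) {
    evalBlocks.push(block);
  }

  void unNestEvalBlock() {
    if (evalBlocks.size() == 1) {
      throw new IllegalStateException("cannot un-nest the root eval block");
    }
    evalBlocks.pop();
  }

  // Stand-in for adding an expression: its code goes into the current eval block.
  void addExpr(String statement) {
    getEvalBlock().add(statement);
  }

  String render() {
    StringBuilder out = new StringBuilder();
    root.render(out, "");
    return out.toString();
  }
}
```

Usage: create the if-block with `cg.getEvalBlock().addIf("fieldId == 0")`, call `cg.nestEvalBlock(ifBlock)`, add the hash expression and its return, then `cg.unNestEvalBlock()` before adding the final `return 0;`. Skipping the nest/un-nest pair would append the hash code to the root block after the if statement, which matches the misplaced code shown elsewhere in this thread.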
>>>     > >
>>>     > > On Tue, May 29, 2018 at 1:53 PM weijie tong <tongweijie178@gmail.com> wrote:
>>>     > >
>>>     > > > The code formatting was poor, so here it is again:
>>>     > > >
>>>     > > > private void setupGetBuild64Hash(ClassGenerator<HashTable> cg,
>>>     > > >     MappingSet incomingMapping, VectorAccessible batch,
>>>     > > >     LogicalExpression[] keyExprs, TypedFieldId[] buildKeyFieldIds)
>>>     > > >     throws SchemaChangeException {
>>>     > > >   cg.setMappingSet(incomingMapping);
>>>     > > >   if (keyExprs == null || keyExprs.length == 0) {
>>>     > > >     cg.getEvalBlock()._return(JExpr.lit(0));
>>>     > > >     return; // no key columns: nothing more to generate
>>>     > > >   }
>>>     > > >   String seedValue = "seedValue";
>>>     > > >   String fieldId = "fieldId";
>>>     > > >   LogicalExpression seed = ValueExpressions.getParameterExpression(seedValue,
>>>     > > >       Types.required(TypeProtos.MinorType.INT));
>>>     > > >
>>>     > > >   LogicalExpression fieldIdParamExpr = ValueExpressions.getParameterExpression(fieldId,
>>>     > > >       Types.required(TypeProtos.MinorType.INT));
>>>     > > >   HoldingContainer fieldIdParamHolder = cg.addExpr(fieldIdParamExpr);
>>>     > > >   int i = 0;
>>>     > > >   for (LogicalExpression expr : keyExprs) {
>>>     > > >     TypedFieldId targetTypeFieldId = buildKeyFieldIds[i];
>>>     > > >     ValueExpressions.IntExpression targetBuildFieldIdExp =
>>>     > > >         new ValueExpressions.IntExpression(targetTypeFieldId.getFieldIds()[0],
>>>     > > >             ExpressionPosition.UNKNOWN);
>>>     > > >
>>>     > > >     JFieldRef targetBuildSideFieldId = cg.addExpr(targetBuildFieldIdExp,
>>>     > > >         ClassGenerator.BlkCreateMode.TRUE_IF_BOUND).getValue();
>>>     > > >     JBlock ifBlock = cg.getEvalBlock()
>>>     > > >         ._if(fieldIdParamHolder.getValue().eq(targetBuildSideFieldId))._then();
>>>     > > >     // Make getEvalBlock() return this inner if-block instead of the eval
>>>     > > >     // block itself, so the generated hash code lands inside the if.
>>>     > > >     cg.setCustomizedEvalInnerBlock(ifBlock);
>>>     > > >     LogicalExpression hashExpression =
>>>     > > >         HashPrelUtil.getHashExpression(expr, seed, incomingProbe != null);
>>>     > > >     LogicalExpression materializedExpr = ExpressionTreeMaterializer
>>>     > > >         .materializeAndCheckErrors(hashExpression, batch, context.getFunctionRegistry());
>>>     > > >     HoldingContainer hash = cg.addExpr(materializedExpr,
>>>     > > >         ClassGenerator.BlkCreateMode.TRUE_IF_BOUND);
>>>     > > >     ifBlock._return(hash.getValue());
>>>     > > >     // Reset to null so getEvalBlock() returns the real eval block again.
>>>     > > >     cg.setCustomizedEvalInnerBlock(null);
>>>     > > >     i++;
>>>     > > >   }
>>>     > > >   cg.getEvalBlock()._return(JExpr.lit(0));
>>>     > > > }
>>>     > > >
>>>     > > >
>>>     > > >
>>>     > > >
>>>     > > > public long getBuild64HashCodeInner(int incomingRowIdx, int seedValue, int fieldId)
>>>     > > >     throws SchemaChangeException
>>>     > > > {
>>>     > > >   {
>>>     > > >     IntHolder fieldId12 = new IntHolder();
>>>     > > >     fieldId12 .value = fieldId;
>>>     > > >     if (fieldId12 .value == constant14 .value) {
>>>     > > >       IntHolder out18 = new IntHolder();
>>>     > > >       {
>>>     > > >         out18 .value = vv15 .getAccessor().get((incomingRowIdx));
>>>     > > >       }
>>>     > > >       IntHolder seedValue19 = new IntHolder();
>>>     > > >       seedValue19 .value = seedValue;
>>>     > > >       //---- start of eval portion of hash32AsDouble function. ----//
>>>     > > >       IntHolder out20 = new IntHolder();
>>>     > > >       {
>>>     > > >         final IntHolder out = new IntHolder();
>>>     > > >         IntHolder in = out18;
>>>     > > >         IntHolder seed = seedValue19;
>>>     > > >
>>>     > > >         Hash32WithSeedAsDouble$IntHash_eval: {
>>>     > > >           out.value = org.apache.drill.exec.expr.fn.impl.HashHelper.hash32((double) in.value, seed.value);
>>>     > > >         }
>>>     > > >
>>>     > > >         out20 = out;
>>>     > > >       }
>>>     > > >       //---- end of eval portion of hash32AsDouble function. ----//
>>>     > > >       return out20 .value;
>>>     > > >     }
>>>     > > >     return 0;
>>>     > > >   }
>>>     > > > }
>>>     > > >
>>>     > > > On Tue, May 29, 2018 at 1:47 PM weijie tong <tongweijie178@gmail.com> wrote:
>>>     > > >
>>>     > > >> Hi Paul:
>>>     > > >>
>>>     > > >>  Thanks for your enthusiasm. I have picked up this technique, which you
>>>     > > >> mentioned to me in another mail thread. It's really helpful; thanks for
>>>     > > >> your valuable work.
>>>     > > >>
>>>     > > >>   I have now solved this tough problem by adding a customized JBlock
>>>     > > >> member field to the ClassGenerator. When you want getEvalBlock() to
>>>     > > >> return an inner, customized JBlock, you set this member; when you want
>>>     > > >> it to return the eval block itself, you reset the member to null.
>>>     > > >>
>>>     > > >>   Here is my modified setup method:
>>>     > > >>
>>>     > > >>
>>>     > > >> private void setupGetBuild64Hash(ClassGenerator<HashTable> cg,
>>>     > > >>     MappingSet incomingMapping, VectorAccessible batch,
>>>     > > >>     LogicalExpression[] keyExprs, TypedFieldId[] buildKeyFieldIds)
>>>     > > >>     throws SchemaChangeException {
>>>     > > >>   cg.setMappingSet(incomingMapping);
>>>     > > >>   if (keyExprs == null || keyExprs.length == 0) {
>>>     > > >>     cg.getEvalBlock()._return(JExpr.lit(0));
>>>     > > >>     return; // no key columns: nothing more to generate
>>>     > > >>   }
>>>     > > >>   String seedValue = "seedValue";
>>>     > > >>   String fieldId = "fieldId";
>>>     > > >>   LogicalExpression seed = ValueExpressions.getParameterExpression(seedValue,
>>>     > > >>       Types.required(TypeProtos.MinorType.INT));
>>>     > > >>
>>>     > > >>   LogicalExpression fieldIdParamExpr = ValueExpressions.getParameterExpression(fieldId,
>>>     > > >>       Types.required(TypeProtos.MinorType.INT));
>>>     > > >>   HoldingContainer fieldIdParamHolder = cg.addExpr(fieldIdParamExpr);
>>>     > > >>   int i = 0;
>>>     > > >>   for (LogicalExpression expr : keyExprs) {
>>>     > > >>     TypedFieldId targetTypeFieldId = buildKeyFieldIds[i];
>>>     > > >>     ValueExpressions.IntExpression targetBuildFieldIdExp =
>>>     > > >>         new ValueExpressions.IntExpression(targetTypeFieldId.getFieldIds()[0],
>>>     > > >>             ExpressionPosition.UNKNOWN);
>>>     > > >>
>>>     > > >>     JFieldRef targetBuildSideFieldId = cg.addExpr(targetBuildFieldIdExp,
>>>     > > >>         ClassGenerator.BlkCreateMode.TRUE_IF_BOUND).getValue();
>>>     > > >>     JBlock ifBlock = cg.getEvalBlock()
>>>     > > >>         ._if(fieldIdParamHolder.getValue().eq(targetBuildSideFieldId))._then();
>>>     > > >>     // Make getEvalBlock() return this inner if-block instead of the eval
>>>     > > >>     // block itself, so the generated hash code lands inside the if.
>>>     > > >>     cg.setCustomizedEvalInnerBlock(ifBlock);
>>>     > > >>     LogicalExpression hashExpression =
>>>     > > >>         HashPrelUtil.getHashExpression(expr, seed, incomingProbe != null);
>>>     > > >>     LogicalExpression materializedExpr = ExpressionTreeMaterializer
>>>     > > >>         .materializeAndCheckErrors(hashExpression, batch, context.getFunctionRegistry());
>>>     > > >>     HoldingContainer hash = cg.addExpr(materializedExpr,
>>>     > > >>         ClassGenerator.BlkCreateMode.TRUE_IF_BOUND);
>>>     > > >>     ifBlock._return(hash.getValue());
>>>     > > >>     // Reset to null so getEvalBlock() returns the real eval block again.
>>>     > > >>     cg.setCustomizedEvalInnerBlock(null);
>>>     > > >>     i++;
>>>     > > >>   }
>>>     > > >>   cg.getEvalBlock()._return(JExpr.lit(0));
>>>     > > >> }
>>>     > > >>
>>>     > > >>
>>>     > > >>  The corresponding generated code:
>>>     > > >>
>>>     > > >>     public long getBuild64HashCodeInner(int incomingRowIdx, int seedValue, int fieldId)
>>>     > > >>         throws SchemaChangeException
>>>     > > >>     {
>>>     > > >>         {
>>>     > > >>             IntHolder fieldId12 = new IntHolder();
>>>     > > >>             fieldId12 .value = fieldId;
>>>     > > >>             if (fieldId12 .value == constant14 .value) {
>>>     > > >>                 IntHolder out18 = new IntHolder();
>>>     > > >>                 {
>>>     > > >>                     out18 .value = vv15 .getAccessor().get((incomingRowIdx));
>>>     > > >>                 }
>>>     > > >>                 IntHolder seedValue19 = new IntHolder();
>>>     > > >>                 seedValue19 .value = seedValue;
>>>     > > >>                 //---- start of eval portion of hash32AsDouble function. ----//
>>>     > > >>                 IntHolder out20 = new IntHolder();
>>>     > > >>                 {
>>>     > > >>                     final IntHolder out = new IntHolder();
>>>     > > >>                     IntHolder in = out18;
>>>     > > >>                     IntHolder seed = seedValue19;
>>>     > > >>
>>>     > > >>                     Hash32WithSeedAsDouble$IntHash_eval: {
>>>     > > >>                         out.value = org.apache.drill.exec.expr.fn.impl.HashHelper.hash32((double) in.value, seed.value);
>>>     > > >>                     }
>>>     > > >>
>>>     > > >>                     out20 = out;
>>>     > > >>                 }
>>>     > > >>                 //---- end of eval portion of hash32AsDouble function. ----//
>>>     > > >>                 return out20 .value;
>>>     > > >>             }
>>>     > > >>             return  0;
>>>     > > >>         }
>>>     > > >>     }
>>>     > > >>
>>>     > > >>
>>>     > > >>
>>>     > > >>   Some further explanation:
>>>     > > >>   1st: The if check won't hurt performance: I invoke this method column
>>>     > > >> by column, so the branch is friendly to branch prediction.
>>>     > > >>   2nd: I will use murmur3_64, not murmur3_32, since the efficient bloom
>>>     > > >> filter algorithm needs a 64-bit hash code to avoid collisions.
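One common way a Bloom filter makes use of a single 64-bit hash is to split it into two 32-bit halves h1 and h2 and derive the k probe positions as h1 + i*h2 (the Kirsch and Mitzenmacher double-hashing technique), so one hash call serves all probes. The sketch below is a generic textbook filter on `java.util.BitSet`, assumed for illustration only; `SimpleBloomFilter` is a hypothetical class, it is not Drill's BloomFilter nor necessarily the algorithm from the paper referenced in this thread, and sizing (`numBits`, `numHashes`) is left to the caller.

```java
import java.util.BitSet;

// Generic Bloom filter sketch using double hashing on one 64-bit hash.
// Not Drill's BloomFilter and not necessarily the referenced paper's algorithm.
final class SimpleBloomFilter {
  private final BitSet bits;
  private final int numBits;
  private final int numHashes;

  SimpleBloomFilter(int numBits, int numHashes) {
    if (numBits <= 0 || numHashes <= 0) {
      throw new IllegalArgumentException("numBits and numHashes must be positive");
    }
    this.bits = new BitSet(numBits);
    this.numBits = numBits;
    this.numHashes = numHashes;
  }

  // Probe i is (h1 + i * h2) mod numBits, with h1/h2 the low/high 32 bits.
  private int position(long hash64, int i) {
    int h1 = (int) hash64;
    int h2 = (int) (hash64 >>> 32);
    return Math.floorMod(h1 + i * h2, numBits);
  }

  void insert(long hash64) {
    for (int i = 0; i < numHashes; i++) {
      bits.set(position(hash64, i));
    }
  }

  // false means "definitely not inserted"; true means "possibly inserted".
  boolean mightContain(long hash64) {
    for (int i = 0; i < numHashes; i++) {
      if (!bits.get(position(hash64, i))) {
        return false;
      }
    }
    return true;
  }
}
```

With only a 32-bit hash, the two halves would be 16 bits each, which shrinks the range of distinct probe patterns; that is one reading of why the thread prefers murmur3_64.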
>>>     > > >>
>>>     > > >> On Tue, May 29, 2018 at 12:37 PM Paul Rogers <par0328@yahoo.com.invalid> wrote:
>>>     > > >>
>>>     > > >>> Hi Weijie,
>>>     > > >>>
>>>     > > >>> Seeing the discussion about the details of JCodeModel suggests you may
>>>     > > >>> be trying to debug your generated code at the level of the code
>>>     > > >>> generator.
>>>     > > >>>
>>>     > > >>> Some time ago we added the ability to step through the generated code.
>>>     > > >>> Look for the following line in the generator code:
>>>     > > >>>
>>>     > > >>>
>>>     > > >>>     // Uncomment out this line to debug the generated code.
>>>     > > >>>
>>>     > > >>> //    cg.saveCodeForDebugging(true);
>>>     > > >>>
>>>     > > >>>
>>>     > > >>> Uncomment the code line and Drill will save each generated file to a
>>>     > > >>> configured location (which, if I recall correctly, is
>>>     > > >>> /tmp/drill/codegen, though it may have changed after Tim's test
>>>     > > >>> directory changes.)
>>>     > > >>>
>>>     > > >>> Then, set a breakpoint in the template setup() method and you can step
>>>     > > >>> directly into the generated doSetup() method. Same for the eval()
>>>     > > >>> method.
>>>     > > >>>
>>>     > > >>> This way, you can not only see the generated code, you can step
>>>     > > >>> through it. I've found this to be a far easier way to understand the
>>>     > > >>> generated code than the older techniques folks have used (look at byte
>>>     > > >>> codes, use print statements, brute force reasoning, etc.)
>>>     > > >>>
>>>     > > >>> Tim, Boaz and others have used this technique more recently and can
>>>     > > >>> probably give you additional pointers.
>>>     > > >>>
>>>     > > >>> Thanks,
>>>     > > >>> - Paul
>>>     > > >>>
>>>     > > >>>
>>>     > > >>>
>>>     > > >>>     On Monday, May 28, 2018, 8:52:19 PM PDT, weijie tong <tongweijie178@gmail.com> wrote:
>>>     > > >>>
>>>     > > >>>  @aman thanks for your reply. "For the ifBlock, do you need an _else()
>>>     > > >>> block also?" I return a default value at the end of the method, so I
>>>     > > >>> don't need an _else() block. I have looked at how EvaluationVisitor
>>>     > > >>> evaluates an IfExpression, which also uses JConditional, but that
>>>     > > >>> doesn't meet my requirement either. I think the key point is that
>>>     > > >>> FunctionHolderExpression and ValueVectorReadExpression put their
>>>     > > >>> generated code into the eval method's JBlock, not into our specific
>>>     > > >>> ifBlock, which is an inner block of the eval method's JBlock.
>>>     > > >>>
>>>     > > >>> So it seems I should either change ClassGenerator so that
>>>     > > >>> getEvalBlock() returns the ifBlock (more precisely, the JConditional's
>>>     > > >>> then-block), or implement special FunctionHolderExpression and
>>>     > > >>> ValueVectorReadExpression classes with corresponding visitor methods
>>>     > > >>> in EvaluationVisitor to generate the special code blocks. I hope
>>>     > > >>> someone familiar with this part of the code can point out whether
>>>     > > >>> there is an easier or different way to achieve this.
>>>     > > >>>
>>>     > > >>> To make the discussion more concrete, here is the code generated by
>>>     > > >>> the previous setupGetBuild64Hash method:
>>>     > > >>>
>>>     > > >>>     public long getBuild64HashCodeInner(int incomingRowIdx, int seedValue, int fieldId)
>>>     > > >>>         throws SchemaChangeException
>>>     > > >>>     {
>>>     > > >>>         {
>>>     > > >>>             IntHolder fieldId16 = new IntHolder();
>>>     > > >>>             fieldId16 .value = fieldId;
>>>     > > >>>             if (fieldId16 .value == constant18 .value) {
>>>     > > >>>                 return out24 .value;
>>>     > > >>>             }
>>>     > > >>>             IntHolder out22 = new IntHolder();
>>>     > > >>>             {
>>>     > > >>>                 out22 .value = vv19 .getAccessor().get((incomingRowIdx));
>>>     > > >>>             }
>>>     > > >>>             IntHolder seedValue23 = new IntHolder();
>>>     > > >>>             seedValue23 .value = seedValue;
>>>     > > >>>             //---- start of eval portion of hash32AsDouble function. ----//
>>>     > > >>>             IntHolder out24 = new IntHolder();
>>>     > > >>>             {
>>>     > > >>>                 final IntHolder out = new IntHolder();
>>>     > > >>>                 IntHolder in = out22;
>>>     > > >>>                 IntHolder seed = seedValue23;
>>>     > > >>>
>>>     > > >>>                 Hash32WithSeedAsDouble$IntHash_eval: {
>>>     > > >>>                     out.value = org.apache.drill.exec.expr.fn.impl.HashHelper.hash32((double) in.value, seed.value);
>>>     > > >>>                 }
>>>     > > >>>
>>>     > > >>>                 out24 = out;
>>>     > > >>>             }
>>>     > > >>>             //---- end of eval portion of hash32AsDouble function. ----//
>>>     > > >>>             if (fieldId16 .value == constant18 .value) {
>>>     > > >>>                 return out26 .value;
>>>     > > >>>             }
>>>     > > >>>             IntHolder seedValue25 = new IntHolder();
>>>     > > >>>             seedValue25 .value = seedValue;
>>>     > > >>>             //---- start of eval portion of hash32AsDouble function. ----//
>>>     > > >>>             IntHolder out26 = new IntHolder();
>>>     > > >>>             {
>>>     > > >>>                 final IntHolder out = new IntHolder();
>>>     > > >>>                 IntHolder in = out22;
>>>     > > >>>                 IntHolder seed = seedValue25;
>>>     > > >>>
>>>     > > >>>                 Hash32WithSeedAsDouble$IntHash_eval: {
>>>     > > >>>                     out.value = org.apache.drill.exec.expr.fn.impl.HashHelper.hash32((double) in.value, seed.value);
>>>     > > >>>                 }
>>>     > > >>>
>>>     > > >>>                 out26 = out;
>>>     > > >>>             }
>>>     > > >>>             //---- end of eval portion of hash32AsDouble function. ----//
>>>     > > >>>             return  0;
>>>     > > >>>         }
>>>     > > >>>     }
>>>     > > >>>
>>>     > > >>>
>>>     > > >>>
>>>     > > >>>
>>>     > > >>>
>>>     > > >>> On Tue, May 29, 2018 at 10:51 AM Aman Sinha <amansinha@apache.org> wrote:
>>>     > > >>>
>>>     > > >>> > Sorry, the previous email was incomplete.
>>>     > > >>> > For the ifBlock, do you need an _else() block also?
>>>     > > >>> >
>>>     > > >>> > I have sometimes found that 'JConditional' is a good way to break
>>>     > > >>> > down the logic further. Please see example usages of JConditional
>>>     > > >>> > here [1].
>>>     > > >>> >
>>>     > > >>> > -Aman
>>>     > > >>> >
>>>     > > >>> > [1]
>>>     > > >>> >
>>>     > > >>> >
>>>     > > >>>
>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__www.programcreek.com_java-2Dapi-2Dexamples_-3Fapi-3Dcom&d=DwIFaQ&c=cskdkSMqhcnjZxdQVpwTXg&r=EqulKDxxEDCX6zbp1AZAa1-iAPQGgCioAqgDp7DE2BU&m=doaiFF3edu9-prktKvLSIoNdmzt_nV6nzCtF_ZGQRBk&s=O2Th00tVjOSHTLlOn_lFp8JiUlh_FueCbHs8giRVS3k&e=
>>> .
>>>     > > sun.codemodel.JBlock
>>>     > > >>> >
>>>     > > >>> > On Mon, May 28, 2018 at 7:46 PM, Aman Sinha <amansinha@apache.org> wrote:
>>>     > > >>> >
>>>     > > >>> > > Hi Weijie,
>>>     > > >>> > > It would be a little cumbersome to debug such issues over email,
>>>     > > >>> > > since one has to look at the generated code output and iteratively
>>>     > > >>> > > debug. A couple of thoughts that might help:
>>>     > > >>> > >
>>>     > > >>> > > For this particular if-then block, should you also
>>>     > > >>> > > JBlock ifBlock =
>>>     > > >>> > >     cg.getEvalBlock()._if(fieldIdParamHolder.getValue().eq(targetBuildSideFieldId))._then();
>>>     > > >>> > >
>>>     > > >>> > >
>>>     > > >>> > >
>>>     > > >>> > > On Mon, May 28, 2018 at 4:17 AM, weijie tong <tongweijie178@gmail.com> wrote:
>>>     > > >>> > >
>>>     > > >>> > >> Hi All:
>>>     > > >>> > >>  While implementing the JPPD feature (
>>>     > > >>> > >> https://urldefense.proofpoint.com/v2/url?u=https-3A__issues.apache.org_jira_browse_DRILL-2D6385&d=DwIFaQ&c=cskdkSMqhcnjZxdQVpwTXg&r=EqulKDxxEDCX6zbp1AZAa1-iAPQGgCioAqgDp7DE2BU&m=doaiFF3edu9-prktKvLSIoNdmzt_nV6nzCtF_ZGQRBk&s=FIkIkgR6E_qJADP1J55y11SgJZD8NyPaNv_AeTabiaY&e=),
>>>     > > >>> > >> I got blocked by this problem: how to get the hash code of each
>>>     > > >>> > >> build-side hash join column through dynamically generated Java code.
>>>     > > >>> > >> I hope someone can give some advice.
>>>     > > >>> > >>
>>>     > > >>> > >>    I plan to add the following methods to HashTableTemplate:
>>>     > > >>> > >>
>>>     > > >>> > >> public long getBuild64HashCode(int incomingRowIdx, int seedValue,
>>>     > > >>> > >>     int fieldId) throws SchemaChangeException {
>>>     > > >>> > >>    return getBuild64HashCodeInner(incomingRowIdx, seedValue, fieldId);
>>>     > > >>> > >> }
>>>     > > >>> > >>
>>>     > > >>> > >> protected abstract long getBuild64HashCodeInner(
>>>     > > >>> > >>     @Named("incomingRowIdx") int incomingRowIdx,
>>>     > > >>> > >>     @Named("seedValue") int seedValue,
>>>     > > >>> > >>     @Named("fieldId") int fieldId)
>>>     > > >>> > >>     throws SchemaChangeException;
>>>     > > >>> > >>
>>>     > > >>> > >>
>>>     > > >>> > >>    The high-level code that invokes getBuild64HashCode is in
>>>     > > >>> > >> HashJoinBatch's executeBuildPhase():
>>>     > > >>> > >>
>>>     > > >>> > >> //create runtime filter
>>>     > > >>> > >> if (cycleNum == 0 && enableRuntimeFilter) {
>>>     > > >>> > >>  //create runtime filter and send out async
>>>     > > >>> > >>  int condFieldIndex = 0;
>>>     > > >>> > >>  for (BloomFilter bloomFilter : bloomFilters) {
>>>     > > >>> > >>    //VV
>>>     > > >>> > >>    for (int ind = 0; ind < currentRecordCount; ind++) {
>>>     > > >>> > >>      // NOTE: getBuild64HashCode as declared above also takes a seedValue argument
>>>     > > >>> > >>      long hashCode = partitions[0].getBuild64HashCode(ind, condFieldIndex);
>>>     > > >>> > >>      bloomFilter.insert(hashCode);
>>>     > > >>> > >>    }
>>>     > > >>> > >>    condFieldIndex++;
>>>     > > >>> > >>  }
>>>     > > >>> > >>  //TODO send out async
>>>     > > >>> > >> }
>>>     > > >>> > >>
>>>     > > >>> > >>
>>>     > > >>> > >>  As you know, the abstract method getBuild64HashCodeInner needs to
>>>     > > >>> > >> calculate the hash code of the build-side column identified by the
>>>     > > >>> > >> fieldId input parameter. To achieve this, I plan to generate a separate
>>>     > > >>> > >> branch for each key column's ValueVector, using an if statement on the
>>>     > > >>> > >> column id to select the branch. The method that generates the dynamic
>>>     > > >>> > >> code is below:
>>>     > > >>> > >>
>>>     > > >>> > >> private void setupGetBuild64Hash(ClassGenerator<HashTable> cg,
>>>     > > >>> > >>     MappingSet incomingMapping, VectorAccessible batch,
>>>     > > >>> > >>     LogicalExpression[] keyExprs, TypedFieldId[] buildKeyFieldIds)
>>>     > > >>> > >>     throws SchemaChangeException {
>>>     > > >>> > >>   cg.setMappingSet(incomingMapping);
>>>     > > >>> > >>   if (keyExprs == null || keyExprs.length == 0) {
>>>     > > >>> > >>     cg.getEvalBlock()._return(JExpr.lit(0));
>>>     > > >>> > >>     return; // no key columns: emit only the default return
>>>     > > >>> > >>   }
>>>     > > >>> > >>   String seedValue = "seedValue";
>>>     > > >>> > >>   String fieldId = "fieldId";
>>>     > > >>> > >>   LogicalExpression seed = ValueExpressions.getParameterExpression(seedValue,
>>>     > > >>> > >>       Types.required(TypeProtos.MinorType.INT));
>>>     > > >>> > >>
>>>     > > >>> > >>   LogicalExpression fieldIdParamExpr = ValueExpressions.getParameterExpression(fieldId,
>>>     > > >>> > >>       Types.required(TypeProtos.MinorType.INT));
>>>     > > >>> > >>   HoldingContainer fieldIdParamHolder = cg.addExpr(fieldIdParamExpr);
>>>     > > >>> > >>   int i = 0;
>>>     > > >>> > >>   for (LogicalExpression expr : keyExprs) {
>>>     > > >>> > >>     TypedFieldId targetTypeFieldId = buildKeyFieldIds[i];
>>>     > > >>> > >>     ValueExpressions.IntExpression targetBuildFieldIdExp = new ValueExpressions.IntExpression(
>>>     > > >>> > >>         targetTypeFieldId.getFieldIds()[0], ExpressionPosition.UNKNOWN);
>>>     > > >>> > >>     JFieldRef targetBuildSideFieldId = cg.addExpr(targetBuildFieldIdExp,
>>>     > > >>> > >>         ClassGenerator.BlkCreateMode.TRUE_IF_BOUND).getValue();
>>>     > > >>> > >>     JBlock ifBlock = cg.getEvalBlock()
>>>     > > >>> > >>         ._if(fieldIdParamHolder.getValue().eq(targetBuildSideFieldId))._then();
>>>     > > >>> > >>
>>>     > > >>> > >>     LogicalExpression hashExpression =
>>>     > > >>> > >>         HashPrelUtil.getHashExpression(expr, seed, incomingProbe != null);
>>>     > > >>> > >>     LogicalExpression materializedExpr = ExpressionTreeMaterializer.materializeAndCheckErrors(
>>>     > > >>> > >>         hashExpression, batch, context.getFunctionRegistry());
>>>     > > >>> > >>     HoldingContainer hash = cg.addExpr(materializedExpr,
>>>     > > >>> > >>         ClassGenerator.BlkCreateMode.FALSE);
>>>     > > >>> > >>
>>>     > > >>> > >>     ifBlock._return(hash.getValue());
>>>     > > >>> > >>     i++;
>>>     > > >>> > >>   }
>>>     > > >>> > >>   cg.getEvalBlock()._return(JExpr.lit(0));
>>>     > > >>> > >> }
>>>     > > >>> > >>
>>>     > > >>> > >> Unfortunately, the generated code is not what I expected: the code
>>>     > > >>> > >> that reads the ValueVector and computes the hash code of the value
>>>     > > >>> > >> does not stay inside the if block. How can I make that code stay in
>>>     > > >>> > >> the if block?
>>>     > > >>> > >>
>>>     > > >>> > >
>>>     > > >>> > >
>>>     > > >>> >
>>>     > > >>
>>>     > > >>
>>>     > >
>>>     >
>>>
>>>
>>>
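For readers following the thread, the intent behind the generated getBuild64HashCodeInner and the build-phase loop that calls it can be sketched in plain Java. This is a minimal sketch under stated assumptions, not Drill code: int[] arrays stand in for the build-side key ValueVectors, `fmix64` (the MurmurHash3 64-bit finalizer) stands in for the murmur3_64 hash discussed in the thread, and `BuildSideHasher`, `buildHash64`, and `hashAllColumns` are hypothetical names. The read and the hash for a column both happen inside the branch selected by fieldId, and the caller walks one column at a time, so that branch goes the same way for every row of a column.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plain-Java stand-in for the generated hash-table code.
// int[] arrays play the role of the build-side key ValueVectors.
final class BuildSideHasher {
  private final int[][] keyColumns;

  BuildSideHasher(int[][] keyColumns) {
    this.keyColumns = keyColumns;
  }

  // MurmurHash3's 64-bit finalizer (fmix64), used here as a simple 64-bit mixer.
  static long fmix64(long k) {
    k ^= k >>> 33;
    k *= 0xff51afd7ed558ccdL;
    k ^= k >>> 33;
    k *= 0xc4ceb9fe1a85ec53L;
    k ^= k >>> 33;
    return k;
  }

  // Analogue of getBuild64HashCodeInner(rowIdx, seed, fieldId): the value read
  // and the hash both happen inside the branch for the selected column; an
  // unknown fieldId falls through to the default "return 0".
  long buildHash64(int rowIdx, long seed, int fieldId) {
    if (fieldId >= 0 && fieldId < keyColumns.length) {
      long value = keyColumns[fieldId][rowIdx];
      return fmix64(value ^ seed);
    }
    return 0L;
  }

  // Column-major build loop: the outer loop fixes fieldId, so the branch above
  // is taken the same way for every row of a column. One long[] per column
  // stands in for one bloom filter per join condition.
  List<long[]> hashAllColumns(int rowCount, long seed) {
    List<long[]> perColumn = new ArrayList<>();
    for (int fieldId = 0; fieldId < keyColumns.length; fieldId++) {
      long[] hashes = new long[rowCount];
      for (int row = 0; row < rowCount; row++) {
        hashes[row] = buildHash64(row, seed, fieldId);
      }
      perColumn.add(hashes);
    }
    return perColumn;
  }
}
```

In the real operator each `hashes[row]` would instead be passed to `bloomFilter.insert(...)`; the array is only there to keep the sketch self-contained.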

Re: How to generate hash code for each build side one of the hash join columns

Posted by weijie tong <to...@gmail.com>.

Some questions about SelectionVector2 and SelectionVector4:

I want to create a SelectionVector4 or SelectionVector2 to represent the
filtered ScanBatch and avoid a memory copy. But I found that ProjectBatch does
not support SelectionVector4, and SelectionVector2 stores record indexes as
2-byte char values. So why does ProjectBatch not support SelectionVector4?
The same question applies to FilterBatch, whose SelectionVector2 also
supports only 2-byte record indexes.
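The 2-byte limit follows from how the two selection vectors store indexes. As I understand Drill's design (worth checking against the SelectionVector2 and SelectionVector4 sources), an SV2 entry is a 16-bit record offset into a single batch, while an SV4 entry packs a batch index into the upper 16 bits and a record offset into the lower 16 bits. The sketch below only models that encoding with plain integers; `SelectionSketch` and its methods are hypothetical names, not Drill APIs.

```java
// Hypothetical model of the two selection-vector encodings (not Drill's classes).
final class SelectionSketch {
  private SelectionSketch() {}

  // SV2-style entry: a char is an unsigned 16-bit value, so one batch can be
  // addressed only up to record index 65535.
  static char sv2Entry(int recordIndex) {
    if (recordIndex < 0 || recordIndex > Character.MAX_VALUE) {
      throw new IllegalArgumentException("record index does not fit in 2 bytes: " + recordIndex);
    }
    return (char) recordIndex;
  }

  // SV4-style entry: upper 16 bits = batch index, lower 16 bits = record index.
  static int sv4Entry(int batchIndex, int recordIndex) {
    if (batchIndex < 0 || batchIndex > 0xFFFF || recordIndex < 0 || recordIndex > 0xFFFF) {
      throw new IllegalArgumentException("index does not fit in 16 bits");
    }
    return (batchIndex << 16) | recordIndex;
  }

  // Unsigned shift, so batch indexes >= 0x8000 (which set the sign bit) decode correctly.
  static int batchOf(int sv4Entry) {
    return sv4Entry >>> 16;
  }

  static int recordOf(int sv4Entry) {
    return sv4Entry & 0xFFFF;
  }
}
```

So an SV4 does not raise the per-batch row limit either; what it adds is the ability to point into several batches (a "hyper" batch) with one index.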

On Fri, Jun 1, 2018 at 1:40 PM weijie tong <to...@gmail.com> wrote:

> Hi Boaz:
>
>   Your proposal is valuable, although I have already implemented the
> dynamic code generation logic. If a ``` long hash64(int index, long seed) ```
> method is added to ValueVector, it will also help others implement
> storage-plugin-specific filter logic using the pushed-down bloom filter.
> For HashJoin and HashAggregate, the methods
> ```double hash32AsDouble(int index, int seed) ``` and
> ```int hash32(int index, int seed)``` will also be needed on ValueVector.
> If no one objects, I will be pleased to take on this work.
>
>    By the way, here are my thoughts on the scan-side filter logic using the
> BloomFilter. The scan-side filter I intend to implement filters the
> already materialized ValueVectors; it does not filter while the
> ValueVectors are being built from the original storage-format data. The
> reason is that doing the check during materialization would hurt the
> performance of converting the deep storage-format data into ValueVectors.
>
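The scan-side approach described above (filter after the ValueVector is materialized, not while decoding the storage format) can be sketched as one pass over an already materialized column that keeps the indexes of rows whose hash might be in the build-side filter. This is a minimal sketch under stated assumptions: a long[] stands in for the materialized vector, a `LongPredicate` stands in for the bloom filter's membership probe, the hash function is passed in, and survivors are returned as a char[] in the style of a 2-byte selection vector, so the batch is assumed to hold at most 65,536 rows. `ScanSideFilter` is a hypothetical name.

```java
import java.util.Arrays;
import java.util.function.LongPredicate;
import java.util.function.LongUnaryOperator;

// Hypothetical scan-side filter that runs after a column has been materialized.
final class ScanSideFilter {
  private ScanSideFilter() {}

  // Returns the indexes of rows that may match the build side. Row data is
  // never copied; the result could back a 2-byte selection vector.
  static char[] filter(long[] materializedColumn, LongUnaryOperator hash64, LongPredicate mightContain) {
    if (materializedColumn.length > Character.MAX_VALUE + 1) {
      throw new IllegalArgumentException("batch too large for 2-byte row indexes");
    }
    char[] selected = new char[materializedColumn.length];
    int count = 0;
    for (int row = 0; row < materializedColumn.length; row++) {
      long h = hash64.applyAsLong(materializedColumn[row]);
      if (mightContain.test(h)) {
        // Keep: a possible match (a bloom filter can report false positives,
        // so the join itself still has to do the exact comparison).
        selected[count++] = (char) row;
      }
    }
    return Arrays.copyOf(selected, count);
  }
}
```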
> On Fri, Jun 1, 2018 at 3:22 AM Boaz Ben-Zvi <bb...@mapr.com> wrote:
>
>>  Hi Weijie,
>>
>>     Another option is to totally avoid the generated code.
>> We were considering the idea of replacing the generated code used for
>> computing hash values with “real java” code.
>>
>> This idea is analogous to the usage of the copyEntry() method in the
>> ValueVector interface (that Paul added last year).
>> See an example of using the copyEntry() (via the appendRow() in
>> VectorContainer) in the new Hash-Join-Spill code.
>> Basically no need to generate “type specific” code, as the virtual
>> copyEntry() method does the “type specific” work.
>>
>> Similarly we could have a hash64() method in ValueVector, which would
>> perform the “type specific” computation.
>> (One difference from copyEntry() – the hash64() would also need to take
>> the “seed” parameter, which is the hash value produced by the previous
>> hash).
>> And similar to appendRow(), there would be evalHash() iterating over the
>> key columns.
>> (And one difference from appendRow() – need to iterate only on the key
>> columns; these are the first columns; their number can be found from the
>> config: e.g., htConfig.getKeyExprsBuild().size() )
>>
>>    With such implementation, that evalHash() could be used anywhere
>> (e.g., to match the Bloom filters on the left side of the join).
>>
>>        Thanks,
>>
>>              Boaz
>>
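Boaz's suggestion (a virtual, type-specific `hash64()` per vector plus an `evalHash()` over the key columns) can be sketched without any generated code. This is a hypothetical sketch, not the ValueVector interface: `HashableVector`, `IntColumn`, `StringColumn`, `HashMix`, and `KeyHasher` are illustrative names, and the MurmurHash3-finalizer mixer is a stand-in for Drill's hash functions. Each column type does its own hashing behind a virtual call, and `evalHash` feeds each key column's hash in as the seed for the next, iterating over only the first `keyCount` columns (the role `htConfig.getKeyExprsBuild().size()` plays in the note above).

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical per-type hashing interface, analogous to copyEntry(): each
// vector type does its own "type specific" work behind a virtual call.
interface HashableVector {
  long hash64(int index, long seed);
}

final class IntColumn implements HashableVector {
  private final int[] values;
  IntColumn(int... values) { this.values = values; }

  @Override
  public long hash64(int index, long seed) {
    return HashMix.mix(values[index], seed);
  }
}

final class StringColumn implements HashableVector {
  private final String[] values;
  StringColumn(String... values) { this.values = values; }

  @Override
  public long hash64(int index, long seed) {
    long h = seed;
    for (byte b : values[index].getBytes(StandardCharsets.UTF_8)) {
      h = HashMix.mix(b, h); // fold each byte into the running hash
    }
    return h;
  }
}

final class HashMix {
  private HashMix() {}

  // MurmurHash3 fmix64 applied to (value ^ seed); a stand-in, not Drill's HashHelper.
  static long mix(long value, long seed) {
    long k = value ^ seed;
    k ^= k >>> 33;
    k *= 0xff51afd7ed558ccdL;
    k ^= k >>> 33;
    k *= 0xc4ceb9fe1a85ec53L;
    k ^= k >>> 33;
    return k;
  }
}

final class KeyHasher {
  private KeyHasher() {}

  // Analogue of the proposed evalHash(): only the first keyCount columns are
  // key columns; each column's hash becomes the seed for the next one.
  static long evalHash(List<? extends HashableVector> columns, int keyCount, int row, long seed) {
    long h = seed;
    for (int c = 0; c < keyCount; c++) {
      h = columns.get(c).hash64(row, h);
    }
    return h;
  }
}
```

The same `evalHash` could then be called on the probe side to test rows against the bloom filters, which is the reuse Boaz mentions.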
>>
>> On 5/30/18, 7:49 PM, "weijie tong" <to...@gmail.com> wrote:
>>
>>     Hi Aman:
>>
>>       Thanks for your tips. I have rebased onto the latest code from the
>>     master branch. Yes, the spill-to-disk feature did change the original
>>     implementation, and I have adjusted my implementation to match it. But
>>     as you say, integration will be somewhat challenging, since I noticed
>>     that the spill-to-disk feature is still being tuned for performance.
>>
>>       The BloomFilter is implemented natively in Drill, not with an external
>>     library. It implements the algorithm from the paper you mentioned.
>>
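Since the runtime filter is fed 64-bit hash codes (the thread later explains that a 64-bit hash is needed to avoid collisions), here is a minimal sketch of one common way a bloom filter can consume a single 64-bit hash: split it into two 32-bit halves and derive the k probe positions as h1 + i * h2, the Kirsch-Mitzenmacher double-hashing technique. This is not necessarily the algorithm from the paper mentioned above, and `LongBloomFilter`, its sizing parameters, and its method names are hypothetical.

```java
// Hypothetical bloom filter keyed by a precomputed 64-bit hash code.
final class LongBloomFilter {
  private final long[] bits;
  private final int numBits;
  private final int numHashes;

  LongBloomFilter(int numBits, int numHashes) {
    if (numBits <= 0 || numHashes <= 0) {
      throw new IllegalArgumentException("sizes must be positive");
    }
    this.numBits = numBits;
    this.numHashes = numHashes;
    this.bits = new long[(numBits + 63) / 64];
  }

  // i-th probe position derived from the two 32-bit halves of the hash.
  private int position(long hash64, int i) {
    int h1 = (int) hash64;
    int h2 = (int) (hash64 >>> 32);
    int combined = h1 + i * h2;              // int overflow simply wraps
    return Math.floorMod(combined, numBits); // always in [0, numBits)
  }

  void insert(long hash64) {
    for (int i = 1; i <= numHashes; i++) {
      int p = position(hash64, i);
      bits[p >>> 6] |= 1L << (p & 63);
    }
  }

  // false means "definitely absent"; true means "possibly present".
  boolean mightContain(long hash64) {
    for (int i = 1; i <= numHashes; i++) {
      int p = position(hash64, i);
      if ((bits[p >>> 6] & (1L << (p & 63))) == 0) {
        return false;
      }
    }
    return true;
  }
}
```

With only a 32-bit hash, h2 would have to be derived from the same 32 bits, which is one reason a 64-bit hash suits this scheme.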
>>
>>     On Thu, May 31, 2018 at 1:56 AM Aman Sinha <am...@apache.org> wrote:
>>
>>     > Hi Weijie,
>>     > I was hoping you could leverage the existing methods, so it's good that
>>     > you found the ones that work for your use case.
>>     > One thing I want to point out (maybe you're already aware): the Hash
>>     > Join code has changed significantly in the master branch due to the
>>     > spill-to-disk feature, so this may pose some integration challenges for
>>     > your run-time join pushdown feature.
>>     > Also, one other question/clarification: for the bloom filter itself,
>>     > are you implementing it natively in Drill or using an external library?
>>     >
>>     > -Aman
>>     >
>>     > On Tue, May 29, 2018 at 8:23 PM, weijie tong <tongweijie178@gmail.com> wrote:
>>     >
>>     > > I found ClassGenerator's nestEvalBlock(JBlock block) and
>>     > > unNestEvalBlock() methods, which have the same effect as the change I
>>     > > made to ClassGenerator. So I have dropped my ClassGenerator change; I
>>     > > hope this helps someone else.
>>     > >
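The fix that emerged in this thread (a customized eval block, later replaced by ClassGenerator's nestEvalBlock()/unNestEvalBlock()) rests on one idea: the generator appends each expression's code to a "current" block, so the if block must become the current block while its expression is generated. The toy generator below only illustrates that idea with strings; `ToyCodeGen`, `Block`, and all of their methods are hypothetical and say nothing about how ClassGenerator is actually implemented.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy code generator: statements are strings appended to the current block.
final class ToyCodeGen {
  static final class Block {
    final String header; // e.g. "if (fieldId == 0)", or null for the method body
    final List<Object> items = new ArrayList<>(); // String statements or nested Blocks

    Block(String header) { this.header = header; }

    void render(StringBuilder out, String indent) {
      if (header != null) out.append(indent).append(header).append(" {\n");
      String inner = header != null ? indent + "  " : indent;
      for (Object item : items) {
        if (item instanceof Block) ((Block) item).render(out, inner);
        else out.append(inner).append(item).append('\n');
      }
      if (header != null) out.append(indent).append("}\n");
    }
  }

  private final Block evalBlock = new Block(null);
  private final Deque<Block> nested = new ArrayDeque<>();

  // Where generated code goes: the innermost nested block, else the eval block.
  Block currentBlock() {
    return nested.isEmpty() ? evalBlock : nested.peek();
  }

  Block openIf(String condition) {
    Block b = new Block("if (" + condition + ")");
    currentBlock().items.add(b);
    return b;
  }

  void nest(Block b) { nested.push(b); }   // analogous in spirit to nestEvalBlock(block)

  void unNest() { nested.pop(); }          // analogous in spirit to unNestEvalBlock()

  void addStatement(String s) { currentBlock().items.add(s); }

  // Stand-in for addExpr(): the read and hash statements land in currentBlock().
  void addHashExpr(int fieldId) {
    addStatement("h = hash(vv" + fieldId + ".get(row), seed);");
    addStatement("return h;");
  }

  String render() {
    StringBuilder out = new StringBuilder();
    evalBlock.render(out, "");
    return out.toString();
  }
}
```

Calling `openIf`, `nest`, `addHashExpr`, `unNest` per column puts each read and hash inside its own if block; skipping `nest` puts them after an empty if block, which mirrors the broken output quoted further down in this thread.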
>>     > > On Tue, May 29, 2018 at 1:53 PM weijie tong <tongweijie178@gmail.com> wrote:
>>     > >
>>     > > > The code formatting was broken, so here it is again:
>>     > > >
>>     > > > private void setupGetBuild64Hash(ClassGenerator<HashTable> cg,
>>     > > >     MappingSet incomingMapping, VectorAccessible batch,
>>     > > >     LogicalExpression[] keyExprs, TypedFieldId[] buildKeyFieldIds)
>>     > > >     throws SchemaChangeException {
>>     > > >   cg.setMappingSet(incomingMapping);
>>     > > >   if (keyExprs == null || keyExprs.length == 0) {
>>     > > >     cg.getEvalBlock()._return(JExpr.lit(0));
>>     > > >     return; // no key columns: emit only the default return
>>     > > >   }
>>     > > >   String seedValue = "seedValue";
>>     > > >   String fieldId = "fieldId";
>>     > > >   LogicalExpression seed = ValueExpressions.getParameterExpression(seedValue,
>>     > > >       Types.required(TypeProtos.MinorType.INT));
>>     > > >
>>     > > >   LogicalExpression fieldIdParamExpr = ValueExpressions.getParameterExpression(fieldId,
>>     > > >       Types.required(TypeProtos.MinorType.INT));
>>     > > >   HoldingContainer fieldIdParamHolder = cg.addExpr(fieldIdParamExpr);
>>     > > >   int i = 0;
>>     > > >   for (LogicalExpression expr : keyExprs) {
>>     > > >     TypedFieldId targetTypeFieldId = buildKeyFieldIds[i];
>>     > > >     ValueExpressions.IntExpression targetBuildFieldIdExp = new ValueExpressions.IntExpression(
>>     > > >         targetTypeFieldId.getFieldIds()[0], ExpressionPosition.UNKNOWN);
>>     > > >
>>     > > >     JFieldRef targetBuildSideFieldId = cg.addExpr(targetBuildFieldIdExp,
>>     > > >         ClassGenerator.BlkCreateMode.TRUE_IF_BOUND).getValue();
>>     > > >     JBlock ifBlock = cg.getEvalBlock()
>>     > > >         ._if(fieldIdParamHolder.getValue().eq(targetBuildSideFieldId))._then();
>>     > > >     // Make ClassGenerator return this inner block (instead of the
>>     > > >     // eval block itself) from getEvalBlock().
>>     > > >     cg.setCustomizedEvalInnerBlock(ifBlock);
>>     > > >     LogicalExpression hashExpression =
>>     > > >         HashPrelUtil.getHashExpression(expr, seed, incomingProbe != null);
>>     > > >     LogicalExpression materializedExpr = ExpressionTreeMaterializer.materializeAndCheckErrors(
>>     > > >         hashExpression, batch, context.getFunctionRegistry());
>>     > > >     HoldingContainer hash = cg.addExpr(materializedExpr,
>>     > > >         ClassGenerator.BlkCreateMode.TRUE_IF_BOUND);
>>     > > >     ifBlock._return(hash.getValue());
>>     > > >     // Reset the customized block to null so that getEvalBlock()
>>     > > >     // returns the real eval block again.
>>     > > >     cg.setCustomizedEvalInnerBlock(null);
>>     > > >     i++;
>>     > > >   }
>>     > > >   cg.getEvalBlock()._return(JExpr.lit(0));
>>     > > > }
>>     > > >
>>     > > >
>>     > > >
>>     > > >
>>     > > > public long getBuild64HashCodeInner(int incomingRowIdx, int seedValue, int fieldId)
>>     > > >     throws SchemaChangeException
>>     > > > {
>>     > > >   {
>>     > > >     IntHolder fieldId12 = new IntHolder();
>>     > > >     fieldId12 .value = fieldId;
>>     > > >     if (fieldId12 .value == constant14 .value) {
>>     > > >       IntHolder out18 = new IntHolder();
>>     > > >       {
>>     > > >         out18 .value = vv15 .getAccessor().get((incomingRowIdx));
>>     > > >       }
>>     > > >       IntHolder seedValue19 = new IntHolder();
>>     > > >       seedValue19 .value = seedValue;
>>     > > >       //---- start of eval portion of hash32AsDouble function. ----//
>>     > > >       IntHolder out20 = new IntHolder();
>>     > > >       {
>>     > > >         final IntHolder out = new IntHolder();
>>     > > >         IntHolder in = out18;
>>     > > >         IntHolder seed = seedValue19;
>>     > > >
>>     > > >         Hash32WithSeedAsDouble$IntHash_eval: {
>>     > > >           out.value = org.apache.drill.exec.expr.fn.impl.HashHelper.hash32((double) in.value, seed.value);
>>     > > >         }
>>     > > >
>>     > > >         out20 = out;
>>     > > >       }
>>     > > >       //---- end of eval portion of hash32AsDouble function. ----//
>>     > > >       return out20 .value;
>>     > > >     }
>>     > > >     return 0;
>>     > > >   }
>>     > > > }
>>     > > >
>>     > > >
>>     > > > On Tue, May 29, 2018 at 1:47 PM weijie tong <tongweijie178@gmail.com> wrote:
>>     > > >
>>     > > >> Hi Paul:
>>     > > >>
>>     > > >>   Thanks for your enthusiasm. I already picked up this technique when
>>     > > >> you mentioned it to me in another mail thread. It's really helpful;
>>     > > >> thanks for your valuable work.
>>     > > >>
>>     > > >>   I have now solved this tough problem by adding a customized JBlock
>>     > > >> member field to ClassGenerator. When you want ClassGenerator's
>>     > > >> getEvalBlock() to return an inner, customized JBlock, you set this
>>     > > >> member; when you want the method to return the eval block itself, you
>>     > > >> reset the member to null.
>>     > > >>
>>     > > >>   Here is my changed setup method:
>>     > > >>
>>     > > >>
>>     > > >> private void setupGetBuild64Hash(ClassGenerator<HashTable> cg,
>>     > > >>     MappingSet incomingMapping, VectorAccessible batch,
>>     > > >>     LogicalExpression[] keyExprs, TypedFieldId[] buildKeyFieldIds)
>>     > > >>     throws SchemaChangeException {
>>     > > >>   cg.setMappingSet(incomingMapping);
>>     > > >>   if (keyExprs == null || keyExprs.length == 0) {
>>     > > >>     cg.getEvalBlock()._return(JExpr.lit(0));
>>     > > >>     return; // no key columns: emit only the default return
>>     > > >>   }
>>     > > >>   String seedValue = "seedValue";
>>     > > >>   String fieldId = "fieldId";
>>     > > >>   LogicalExpression seed = ValueExpressions.getParameterExpression(seedValue,
>>     > > >>       Types.required(TypeProtos.MinorType.INT));
>>     > > >>
>>     > > >>   LogicalExpression fieldIdParamExpr = ValueExpressions.getParameterExpression(fieldId,
>>     > > >>       Types.required(TypeProtos.MinorType.INT));
>>     > > >>   HoldingContainer fieldIdParamHolder = cg.addExpr(fieldIdParamExpr);
>>     > > >>   int i = 0;
>>     > > >>   for (LogicalExpression expr : keyExprs) {
>>     > > >>     TypedFieldId targetTypeFieldId = buildKeyFieldIds[i];
>>     > > >>     ValueExpressions.IntExpression targetBuildFieldIdExp = new ValueExpressions.IntExpression(
>>     > > >>         targetTypeFieldId.getFieldIds()[0], ExpressionPosition.UNKNOWN);
>>     > > >>
>>     > > >>     JFieldRef targetBuildSideFieldId = cg.addExpr(targetBuildFieldIdExp,
>>     > > >>         ClassGenerator.BlkCreateMode.TRUE_IF_BOUND).getValue();
>>     > > >>     JBlock ifBlock = cg.getEvalBlock()
>>     > > >>         ._if(fieldIdParamHolder.getValue().eq(targetBuildSideFieldId))._then();
>>     > > >>     // Make ClassGenerator return this inner block (instead of the
>>     > > >>     // eval block itself) from getEvalBlock().
>>     > > >>     cg.setCustomizedEvalInnerBlock(ifBlock);
>>     > > >>     LogicalExpression hashExpression =
>>     > > >>         HashPrelUtil.getHashExpression(expr, seed, incomingProbe != null);
>>     > > >>     LogicalExpression materializedExpr = ExpressionTreeMaterializer.materializeAndCheckErrors(
>>     > > >>         hashExpression, batch, context.getFunctionRegistry());
>>     > > >>     HoldingContainer hash = cg.addExpr(materializedExpr,
>>     > > >>         ClassGenerator.BlkCreateMode.TRUE_IF_BOUND);
>>     > > >>     ifBlock._return(hash.getValue());
>>     > > >>     // Reset the customized block to null so that getEvalBlock()
>>     > > >>     // returns the real eval block again.
>>     > > >>     cg.setCustomizedEvalInnerBlock(null);
>>     > > >>     i++;
>>     > > >>   }
>>     > > >>   cg.getEvalBlock()._return(JExpr.lit(0));
>>     > > >> }
>>     > > >>
>>     > > >>
>>     > > >>  The corresponding generated code:
>>     > > >>
>>     > > >>     public long getBuild64HashCodeInner(int incomingRowIdx, int seedValue, int fieldId)
>>     > > >>         throws SchemaChangeException
>>     > > >>     {
>>     > > >>         {
>>     > > >>             IntHolder fieldId12 = new IntHolder();
>>     > > >>             fieldId12 .value = fieldId;
>>     > > >>             if (fieldId12 .value == constant14 .value) {
>>     > > >>                 IntHolder out18 = new IntHolder();
>>     > > >>                 {
>>     > > >>                     out18 .value = vv15 .getAccessor().get((incomingRowIdx));
>>     > > >>                 }
>>     > > >>                 IntHolder seedValue19 = new IntHolder();
>>     > > >>                 seedValue19 .value = seedValue;
>>     > > >>                 //---- start of eval portion of hash32AsDouble function. ----//
>>     > > >>                 IntHolder out20 = new IntHolder();
>>     > > >>                 {
>>     > > >>                     final IntHolder out = new IntHolder();
>>     > > >>                     IntHolder in = out18;
>>     > > >>                     IntHolder seed = seedValue19;
>>     > > >>
>>     > > >>                     Hash32WithSeedAsDouble$IntHash_eval: {
>>     > > >>                         out.value = org.apache.drill.exec.expr.fn.impl.HashHelper.hash32((double) in.value, seed.value);
>>     > > >>                     }
>>     > > >>
>>     > > >>                     out20 = out;
>>     > > >>                 }
>>     > > >>                 //---- end of eval portion of hash32AsDouble function. ----//
>>     > > >>                 return out20 .value;
>>     > > >>             }
>>     > > >>             return  0;
>>     > > >>         }
>>     > > >>     }
>>     > > >>
>>     > > >>
>>     > > >>
>>     > > >>   Some other explanations:
>>     > > >>   1st: The if check won't hurt performance. I invoke this method
>>     > > >> column by column, so the branch goes the same way for every row of a
>>     > > >> column, which is friendly to branch prediction.
>>     > > >>   2nd: I will use murmur3_64, not murmur3_32, since the efficient
>>     > > >> bloom filter algorithm needs a 64-bit hash code to avoid collisions.
>>     > > >>
>>     > > >>
>>     > > >> On Tue, May 29, 2018 at 12:37 PM Paul Rogers <par0328@yahoo.com.invalid> wrote:
>>     > > >>
>>     > > >>> Hi Weijie,
>>     > > >>>
>>     > > >>> Seeing the discussion about the details of JCodeModel suggests you may
>>     > > >>> be trying to debug your generated code at the level of the code
>>     > > >>> generator.
>>     > > >>>
>>     > > >>> Some time ago we added the ability to step through the generated code.
>>     > > >>> Look for the following line in the generator code:
>>     > > >>>
>>     > > >>>     // Uncomment out this line to debug the generated code.
>>     > > >>> //    cg.saveCodeForDebugging(true);
>>     > > >>>
>>     > > >>> Uncomment the code line and Drill will save each generated file to a
>>     > > >>> configured location (which, if I recall correctly, is
>>     > > >>> /tmp/drill/codegen, though it may have changed after Tim's test
>>     > > >>> directory changes.)
>>     > > >>>
>>     > > >>> Then, set a breakpoint in the template setup() method and you can step
>>     > > >>> directly into the generated doSetup() method. Same for the eval()
>>     > > >>> method.
>>     > > >>>
>>     > > >>> This way, you can not only see the generated code, you can step through
>>     > > >>> it. I've found this to be a far easier way to understand the generated
>>     > > >>> code than the older techniques folks have used (look at byte codes, use
>>     > > >>> print statements, brute force reasoning, etc.)
>>     > > >>>
>>     > > >>> Tim, Boaz and others have used this technique more recently and can
>>     > > >>> probably give you additional pointers.
>>     > > >>>
>>     > > >>> Thanks,
>>     > > >>> - Paul
>>     > > >>>
>>     > > >>>
>>     > > >>>
>>     > > >>>     On Monday, May 28, 2018, 8:52:19 PM PDT, weijie tong <tongweijie178@gmail.com> wrote:
>>     > > >>>
>>     > > >>>  @aman thanks for your reply. "For the ifBlock, do you need an _else()
>>     > > >>> block also?" I return a default value at the end of the method, so I
>>     > > >>> don't need the _else() block. I have looked at how EvaluationVisitor
>>     > > >>> evaluates IfExpression, which also uses JConditional, but that doesn't
>>     > > >>> match my requirement either. I think the key point here is that
>>     > > >>> FunctionHolderExpression and ValueVectorReadExpression put their
>>     > > >>> generated code into the eval method's JBlock, not into our specific if
>>     > > >>> block, which is an inner block of the eval method's JBlock.
>>     > > >>>
>>     > > >>> So it seems I should either change ClassGenerator so that
>>     > > >>> getEvalBlock() returns the if block (more precisely, the JConditional's
>>     > > >>> then block), or implement a special FunctionHolderExpression and
>>     > > >>> ValueVectorReadExpression with corresponding visit methods in
>>     > > >>> EvaluationVisitor to generate the special code blocks. I hope someone
>>     > > >>> familiar with this part of the code can point out whether there are
>>     > > >>> easier or different ways to achieve this.
>>     > > >>>
>>     > > >>> To make the discussion more concrete, here is the code generated by the
>>     > > >>> previous setupGetBuild64Hash method:
>>     > > >>>
>>     > > >>>     public long getBuild64HashCodeInner(int incomingRowIdx, int seedValue, int fieldId)
>>     > > >>>         throws SchemaChangeException
>>     > > >>>     {
>>     > > >>>         {
>>     > > >>>             IntHolder fieldId16 = new IntHolder();
>>     > > >>>             fieldId16 .value = fieldId;
>>     > > >>>             if (fieldId16 .value == constant18 .value) {
>>     > > >>>                 return out24 .value;
>>     > > >>>             }
>>     > > >>>             IntHolder out22 = new IntHolder();
>>     > > >>>             {
>>     > > >>>                 out22 .value = vv19 .getAccessor().get((incomingRowIdx));
>>     > > >>>             }
>>     > > >>>             IntHolder seedValue23 = new IntHolder();
>>     > > >>>             seedValue23 .value = seedValue;
>>     > > >>>             //---- start of eval portion of hash32AsDouble function. ----//
>>     > > >>>             IntHolder out24 = new IntHolder();
>>     > > >>>             {
>>     > > >>>                 final IntHolder out = new IntHolder();
>>     > > >>>                 IntHolder in = out22;
>>     > > >>>                 IntHolder seed = seedValue23;
>>     > > >>>
>>     > > >>>                 Hash32WithSeedAsDouble$IntHash_eval: {
>>     > > >>>                     out.value = org.apache.drill.exec.expr.fn.impl.HashHelper.hash32((double) in.value, seed.value);
>>     > > >>>                 }
>>     > > >>>
>>     > > >>>                 out24 = out;
>>     > > >>>             }
>>     > > >>>             //---- end of eval portion of hash32AsDouble function. ----//
>>     > > >>>             if (fieldId16 .value == constant18 .value) {
>>     > > >>>                 return out26 .value;
>>     > > >>>             }
>>     > > >>>             IntHolder seedValue25 = new IntHolder();
>>     > > >>>             seedValue25 .value = seedValue;
>>     > > >>>             //---- start of eval portion of hash32AsDouble function. ----//
>>     > > >>>             IntHolder out26 = new IntHolder();
>>     > > >>>             {
>>     > > >>>                 final IntHolder out = new IntHolder();
>>     > > >>>                 IntHolder in = out22;
>>     > > >>>                 IntHolder seed = seedValue25;
>>     > > >>>
>>     > > >>>                 Hash32WithSeedAsDouble$IntHash_eval: {
>>     > > >>>                     out.value = org.apache.drill.exec.expr.fn.impl.HashHelper.hash32((double) in.value, seed.value);
>>     > > >>>                 }
>>     > > >>>
>>     > > >>>                 out26 = out;
>>     > > >>>             }
>>     > > >>>             //---- end of eval portion of hash32AsDouble function. ----//
>>     > > >>>             return  0;
>>     > > >>>         }
>>     > > >>>     }
>>     > > >>>
>>     > > >>> On Tue, May 29, 2018 at 10:51 AM Aman Sinha <amansinha@apache.org> wrote:
>>     > > >>>
>>     > > >>> > Sorry, the previous email was incomplete.
>>     > > >>> > For the ifBlock, do you need an _else() block also?
>>     > > >>> >
>>     > > >>> > I have sometimes found that 'JConditional' is a good way to break down
>>     > > >>> > the logic further. Please see example usages of JConditional here [1].
>>     > > >>> >
>>     > > >>> > -Aman
>>     > > >>> >
>>     > > >>> > [1] https://www.programcreek.com/java-api-examples/?api=com.sun.codemodel.JBlock
>>     > > >>> >
>>     > > >>> > On Mon, May 28, 2018 at 7:46 PM, Aman Sinha <amansinha@apache.org> wrote:
>>     > > >>> >
>>     > > >>> > > Hi Weijie,
>>     > > >>> > > It would be a little cumbersome to debug such issues over email, since
>>     > > >>> > > one has to look at the generated code output and debug iteratively.
>>     > > >>> > > A couple of thoughts I have that might help:
>>     > > >>> > >
>>     > > >>> > > For this particular if-then block, should you also
>>     > > >>> > > JBlock ifBlock =
>>     > > >>> > >     cg.getEvalBlock()._if(fieldIdParamHolder.getValue().eq(targetBuildSideFieldId))._then();
>>     > > >>> > >
>>     > > >>> > >
>>     > > >>> > >
>>     > > >>> > > On Mon, May 28, 2018 at 4:17 AM, weijie tong <tongweijie178@gmail.com> wrote:
>>     > > >>> > >
>>     > > >>> > >> Hi All:
>>     > > >>> > >>  While implementing the JPPD feature
>>     > > >>> > >> (https://issues.apache.org/jira/browse/DRILL-6385), I got blocked by
>>     > > >>> > >> this problem: how to get the hash code of each build-side hash join
>>     > > >>> > >> column through the dynamically generated Java code. I hope someone
>>     > > >>> > >> can give some advice.
>>     > > >>> > >>
>>     > > >>> > >>    I plan to add the methods below to HashTableTemplate:
>>     > > >>> > >>
>>     > > >>> > >> public long getBuild64HashCode(int incomingRowIdx, int seedValue, int fieldId)
>>     > > >>> > >>     throws SchemaChangeException {
>>     > > >>> > >>   return getBuild64HashCodeInner(incomingRowIdx, seedValue, fieldId);
>>     > > >>> > >> }
>>     > > >>> > >>
>>     > > >>> > >> protected abstract long getBuild64HashCodeInner(@Named("incomingRowIdx") int incomingRowIdx,
>>     > > >>> > >>     @Named("seedValue") int seedValue, @Named("fieldId") int fieldId)
>>     > > >>> > >>     throws SchemaChangeException;
>>     > > >>> > >>
>>     > > >>> > >>
>>     > > >>> > >>    The high-level code that invokes the getBuild64HashCode method is in
>>     > > >>> > >> HashJoinBatch's executeBuildPhase():
>>     > > >>> > >>
>>     > > >>> > >> // create runtime filter
>>     > > >>> > >> if (cycleNum == 0 && enableRuntimeFilter) {
>>     > > >>> > >>   // create runtime filter and send out async
>>     > > >>> > >>   int condFieldIndex = 0;
>>     > > >>> > >>   for (BloomFilter bloomFilter : bloomFilters) {
>>     > > >>> > >>     // VV
>>     > > >>> > >>     for (int ind = 0; ind < currentRecordCount; ind++) {
>>     > > >>> > >>       long hashCode = partitions[0].getBuild64HashCode(ind, condFieldIndex);
>>     > > >>> > >>       bloomFilter.insert(hashCode);
>>     > > >>> > >>     }
>>     > > >>> > >>     condFieldIndex++;
>>     > > >>> > >>   }
>>     > > >>> > >>   // TODO send out async
>>     > > >>> > >> }
>>     > > >>> > >>
>>     > > >>> > >>
>>     > > >>> > >> As you know, the abstract method getBuild64HashCodeInner needs to
>>     > > >>> > >> calculate the hash code of each build-side column, selected by the
>>     > > >>> > >> fieldId input parameter. To achieve this, I plan to generate a
>>     > > >>> > >> separate code section for each column's ValueVector, using an if
>>     > > >>> > >> statement on the column id to pick the right section. The method
>>     > > >>> > >> that generates this dynamic code is below:
>>     > > >>> > >>
>>     > > >>> > >> private void setupGetBuild64Hash(ClassGenerator<HashTable> cg,
>>     > > >>> > >>     MappingSet incomingMapping, VectorAccessible batch,
>>     > > >>> > >>     LogicalExpression[] keyExprs, TypedFieldId[] buildKeyFieldIds)
>>     > > >>> > >>     throws SchemaChangeException {
>>     > > >>> > >>   cg.setMappingSet(incomingMapping);
>>     > > >>> > >>   if (keyExprs == null || keyExprs.length == 0) {
>>     > > >>> > >>     cg.getEvalBlock()._return(JExpr.lit(0));
>>     > > >>> > >>     return; // no key columns: emit only the default return
>>     > > >>> > >>   }
>>     > > >>> > >>   String seedValue = "seedValue";
>>     > > >>> > >>   String fieldId = "fieldId";
>>     > > >>> > >>   LogicalExpression seed = ValueExpressions.getParameterExpression(seedValue,
>>     > > >>> > >>       Types.required(TypeProtos.MinorType.INT));
>>     > > >>> > >>
>>     > > >>> > >>   LogicalExpression fieldIdParamExpr = ValueExpressions.getParameterExpression(fieldId,
>>     > > >>> > >>       Types.required(TypeProtos.MinorType.INT));
>>     > > >>> > >>   HoldingContainer fieldIdParamHolder = cg.addExpr(fieldIdParamExpr);
>>     > > >>> > >>   int i = 0;
>>     > > >>> > >>   for (LogicalExpression expr : keyExprs) {
>>     > > >>> > >>     TypedFieldId targetTypeFieldId = buildKeyFieldIds[i];
>>     > > >>> > >>     ValueExpressions.IntExpression targetBuildFieldIdExp = new ValueExpressions.IntExpression(
>>     > > >>> > >>         targetTypeFieldId.getFieldIds()[0], ExpressionPosition.UNKNOWN);
>>     > > >>> > >>     JFieldRef targetBuildSideFieldId = cg.addExpr(targetBuildFieldIdExp,
>>     > > >>> > >>         ClassGenerator.BlkCreateMode.TRUE_IF_BOUND).getValue();
>>     > > >>> > >>     JBlock ifBlock = cg.getEvalBlock()
>>     > > >>> > >>         ._if(fieldIdParamHolder.getValue().eq(targetBuildSideFieldId))._then();
>>     > > >>> > >>
>>     > > >>> > >>     LogicalExpression hashExpression =
>>     > > >>> > >>         HashPrelUtil.getHashExpression(expr, seed, incomingProbe != null);
>>     > > >>> > >>     LogicalExpression materializedExpr = ExpressionTreeMaterializer.materializeAndCheckErrors(
>>     > > >>> > >>         hashExpression, batch, context.getFunctionRegistry());
>>     > > >>> > >>     HoldingContainer hash = cg.addExpr(materializedExpr,
>>     > > >>> > >>         ClassGenerator.BlkCreateMode.FALSE);
>>     > > >>> > >>
>>     > > >>> > >>     ifBlock._return(hash.getValue());
>>     > > >>> > >>     i++;
>>     > > >>> > >>   }
>>     > > >>> > >>   cg.getEvalBlock()._return(JExpr.lit(0));
>>     > > >>> > >> }
>>     > > >>> > >>
>>     > > >>> > >> Unfortunately, the generated code is not what I expected: the code
>>     > > >>> > >> that reads the ValueVector and computes the hash code of the value
>>     > > >>> > >> does not stay inside the if block. How can I make that code stay in
>>     > > >>> > >> the if block?
>>     > > >>> > >>
>>     > > >>> > >
>>     > > >>> > >
>>     > > >>> >
>>     > > >>
>>     > > >>
>>     > >
>>     >
>>
>>
>>