Posted to notifications@asterixdb.apache.org by "Taewoo Kim (JIRA)" <ji...@apache.org> on 2016/11/25 21:35:58 UTC

[jira] [Commented] (ASTERIXDB-1736) Grace Hash Join and Hybrid Hash Join are not being used.

    [ https://issues.apache.org/jira/browse/ASTERIXDB-1736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15696727#comment-15696727 ] 

Taewoo Kim commented on ASTERIXDB-1736:
---------------------------------------

Discussed this issue with [~buyingyi].

His point: a small dataset can't cover all the code paths of the optimized hybrid hash join, because many paths are exercised only by large datasets. For example, a small dataset won't reach the role reversal optimization. Also, when we were not sure which part was wrong, we replaced the join operator and re-ran the tests; he has done that a few times to locate bugs. Having two implementations makes bugs easier to diagnose, as in the sort and group-by cases, where we have quick sort vs. merge sort and sort group-by vs. hash group-by.
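Both parts of his point can be illustrated with a minimal sketch, assuming plain in-memory Java collections (Java 16+ for records) rather than Hyracks frames, operators, or memory budgets. The class and method names (`JoinCrossCheck`, `partitionedHashJoin`, `nestedLoopJoin`) are hypothetical, and the role reversal rule used here (build the hash table on the smaller side of each partition pair) is a simplification, not the actual logic of the optimized hybrid hash join. It shows that a small input never reaches the role reversal branch, and that an independent second join gives a reference to compare against.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch, not Hyracks code: partitioned hash join cross-checked against a nested-loop join. */
public class JoinCrossCheck {
    /** A tuple modelled as a join key plus a payload (illustrative assumption). */
    record Row(int key, String payload) {}

    /** Sorted "left|right" output pairs plus how many partitions used role reversal. */
    record Result(List<String> rows, int roleReversals) {}

    /** Reference join: slow but easy to trust. */
    static List<String> nestedLoopJoin(List<Row> left, List<Row> right) {
        List<String> out = new ArrayList<>();
        for (Row l : left) {
            for (Row r : right) {
                if (l.key() == r.key()) {
                    out.add(l.payload() + "|" + r.payload());
                }
            }
        }
        Collections.sort(out);
        return out;
    }

    /**
     * If the left (build) input fits in memoryBudget rows, build on it directly. Otherwise
     * hash-partition both inputs and build on the smaller side of each partition pair;
     * building on the right (probe) side is the simplified "role reversal" case.
     */
    static Result partitionedHashJoin(List<Row> left, List<Row> right, int memoryBudget, int numPartitions) {
        List<String> out = new ArrayList<>();
        int reversals = 0;
        if (left.size() <= memoryBudget) {
            buildAndProbe(left, right, false, out);
        } else {
            List<List<Row>> leftParts = partition(left, numPartitions);
            List<List<Row>> rightParts = partition(right, numPartitions);
            for (int i = 0; i < numPartitions; i++) {
                if (rightParts.get(i).size() < leftParts.get(i).size()) {
                    reversals++;
                    buildAndProbe(rightParts.get(i), leftParts.get(i), true, out);
                } else {
                    buildAndProbe(leftParts.get(i), rightParts.get(i), false, out);
                }
            }
        }
        Collections.sort(out);
        return new Result(out, reversals);
    }

    /** Same hash on both sides, so matching keys always land in the same partition. */
    static List<List<Row>> partition(List<Row> rows, int numPartitions) {
        List<List<Row>> parts = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            parts.add(new ArrayList<>());
        }
        for (Row row : rows) {
            parts.get(Math.floorMod(Integer.hashCode(row.key()), numPartitions)).add(row);
        }
        return parts;
    }

    /** Builds a table on `build`, probes with `probe`; `swapped` restores left|right output order. */
    static void buildAndProbe(List<Row> build, List<Row> probe, boolean swapped, List<String> out) {
        Map<Integer, List<Row>> table = new HashMap<>();
        for (Row b : build) {
            table.computeIfAbsent(b.key(), k -> new ArrayList<>()).add(b);
        }
        for (Row p : probe) {
            for (Row b : table.getOrDefault(p.key(), List.of())) {
                out.add(swapped ? p.payload() + "|" + b.payload() : b.payload() + "|" + p.payload());
            }
        }
    }

    public static void main(String[] args) {
        List<Row> left = new ArrayList<>();
        List<Row> right = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            left.add(new Row(i % 10, "L" + i));
        }
        for (int i = 0; i < 20; i++) {
            right.add(new Row(i % 10, "R" + i));
        }
        // Small input: fits in memory, so the role reversal branch is never reached.
        Result small = partitionedHashJoin(left.subList(0, 5), right, 8, 4);
        // Larger input: forces partitioning, and every right partition is smaller here.
        Result large = partitionedHashJoin(left, right, 8, 4);
        System.out.println("small: reversals=" + small.roleReversals()
                + ", matches reference=" + small.rows().equals(nestedLoopJoin(left.subList(0, 5), right)));
        System.out.println("large: reversals=" + large.roleReversals()
                + ", matches reference=" + large.rows().equals(nestedLoopJoin(left, right)));
    }
}
```

Running `main` prints `reversals=0` for the small input and `reversals=4` for the larger one, and both results match the nested-loop reference. Swapping in the reference join when a result looks wrong is the kind of bug isolation described above.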

Based on his reasoning, I think we can at least keep one of the two hash joins (hybrid or Grace) and remove the other.

> Grace Hash Join and Hybrid Hash Join are not being used.
> --------------------------------------------------------
>
>                 Key: ASTERIXDB-1736
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-1736
>             Project: Apache AsterixDB
>          Issue Type: Improvement
>            Reporter: Taewoo Kim
>            Assignee: Taewoo Kim
>
> As the title says, Grace Hash Join and Hybrid Hash Join are not being used. I suggest that we remove these two join methods. Here are my findings for these two joins. 
> 1) Grace Hash Join
> GraceHashJoinOperatorDescriptor is referenced from only two places: org.apache.hyracks.examples.tpch.client.join and TPCHCustomerOrderHashJoinTest.
> One is a Hyracks example (tpch.client) and the other is a unit test. The query compiler never chooses this join.
> 2) Hybrid Hash Join
> During compilation, the optimizer decides whether to use Hybrid Hash Join or Optimized Hybrid Hash Join.
> If a hash function family is available for every key variable, it uses the optimized hybrid hash join.
> Otherwise, it uses the hybrid hash join. In practice, however, the hybrid hash join path is never chosen. Let's check the code.
> {code:title=HybridHashJoinPOperator.java|borderStyle=solid}	
>         IBinaryHashFunctionFamily[] hashFunFamilies = JobGenHelper.variablesToBinaryHashFunctionFamilies(keysLeftBranch,
>                 env, context);
>                 
>         ...
>         
>         boolean optimizedHashJoin = true;
>         for (IBinaryHashFunctionFamily family : hashFunFamilies) {
>             if (family == null) {
>                 optimizedHashJoin = false;
>                 break;
>             }
>         }
>         if (optimizedHashJoin) {
>             opDesc = generateOptimizedHashJoinRuntime(context, inputSchemas, keysLeft, keysRight, hashFunFamilies,
>                     comparatorFactories, predEvaluatorFactory, recDescriptor, spec);
>         } else {
>             opDesc = generateHashJoinRuntime(context, inputSchemas, keysLeft, keysRight, hashFunFactories,
>                     comparatorFactories, predEvaluatorFactory, recDescriptor, spec);
>         }
> {code}
>         
> As we can see, optimizedHashJoin is set to false only when the hash function family of some key variable is null.
> Then, how is the hash function family assigned to each key variable?
> {code:title=JobGenHelper.java|borderStyle=solid}
>     public static IBinaryHashFunctionFamily[] variablesToBinaryHashFunctionFamilies(
>             Collection<LogicalVariable> varLogical, IVariableTypeEnvironment env, JobGenContext context)
>                     throws AlgebricksException {
>         IBinaryHashFunctionFamily[] funFamilies = new IBinaryHashFunctionFamily[varLogical.size()];
>         int i = 0;
>         IBinaryHashFunctionFamilyProvider bhffProvider = context.getBinaryHashFunctionFamilyProvider();
>         for (LogicalVariable var : varLogical) {
>             Object type = env.getVarType(var);
>             funFamilies[i++] = bhffProvider.getBinaryHashFunctionFamily(type);
>         }
>         return funFamilies;
>     }
> {code}
> For each variable type, we ask the provider for a hash function family. In the current codebase, AqlBinaryHashFunctionFamilyProvider is the only class that implements IBinaryHashFunctionFamilyProvider,
> and it returns AMurmurHash3BinaryHashFunctionFamily for every type.
> So the hash function family can never be null.
> {code:title=AqlBinaryHashFunctionFamilyProvider.java|borderStyle=solid}
> public class AqlBinaryHashFunctionFamilyProvider implements IBinaryHashFunctionFamilyProvider, Serializable {
>     private static final long serialVersionUID = 1L;
>     public static final AqlBinaryHashFunctionFamilyProvider INSTANCE = new AqlBinaryHashFunctionFamilyProvider();
>     private AqlBinaryHashFunctionFamilyProvider() {
>     }
>     @Override
>     public IBinaryHashFunctionFamily getBinaryHashFunctionFamily(Object type) throws AlgebricksException {
>         // AMurmurHash3BinaryHashFunctionFamily converts numeric type to double type before doing hash()
>         return AMurmurHash3BinaryHashFunctionFamily.INSTANCE;
>     }
> }
> {code}
>  
>     
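The argument in the issue reduces to a dead branch, which a small self-contained model can make explicit. This is a sketch under simplifying assumptions, not Algebricks code: `HashFamily` and `HashFamilyProvider` are hypothetical stand-ins for IBinaryHashFunctionFamily and IBinaryHashFunctionFamilyProvider, `AQL_PROVIDER` mimics AqlBinaryHashFunctionFamilyProvider by returning one shared non-null family for every type, and the join names are plain strings instead of operator descriptors.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

/** Hypothetical model of the join selection quoted above; the types are stand-ins, not the real API. */
public class JoinSelectionModel {
    interface HashFamily {}

    interface HashFamilyProvider {
        HashFamily getFamily(Object type);
    }

    /** Stand-in for AMurmurHash3BinaryHashFunctionFamily.INSTANCE. */
    static final HashFamily MURMUR3 = new HashFamily() {};

    /** Mirrors AqlBinaryHashFunctionFamilyProvider: the same non-null family for any type. */
    static final HashFamilyProvider AQL_PROVIDER = type -> MURMUR3;

    /** Mirrors JobGenHelper.variablesToBinaryHashFunctionFamilies: one family per key type. */
    static HashFamily[] familiesFor(List<?> keyTypes, HashFamilyProvider provider) {
        HashFamily[] families = new HashFamily[keyTypes.size()];
        int i = 0;
        for (Object type : keyTypes) {
            families[i++] = provider.getFamily(type);
        }
        return families;
    }

    /** Mirrors the check in HybridHashJoinPOperator: optimized unless some family is null. */
    static String chooseJoin(HashFamily[] families) {
        boolean optimized = Arrays.stream(families).allMatch(Objects::nonNull);
        return optimized ? "OptimizedHybridHashJoin" : "HybridHashJoin";
    }

    public static void main(String[] args) {
        List<String> keyTypes = List.of("int64", "string", "double");
        // prints OptimizedHybridHashJoin, whatever the key types are
        System.out.println(chooseJoin(familiesFor(keyTypes, AQL_PROVIDER)));
    }
}
```

In this model, `chooseJoin` returns the non-optimized name only when a caller hand-builds an array containing null; with a provider that always returns a family, the generateHashJoinRuntime branch in the quoted code cannot be reached.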



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)