Posted to notifications@asterixdb.apache.org by "Taewoo Kim (JIRA)" <ji...@apache.org> on 2016/11/25 21:35:58 UTC
[jira] [Commented] (ASTERIXDB-1736) Grace Hash Join and Hybrid Hash
Join are not being used.
[ https://issues.apache.org/jira/browse/ASTERIXDB-1736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15696727#comment-15696727 ]
Taewoo Kim commented on ASTERIXDB-1736:
---------------------------------------
Discussed this issue with [~buyingyi].
His point: a small dataset cannot cover all code paths of the optimized hybrid hash join; in many cases you need large datasets to exercise the different paths. For example, a small dataset won’t reach the role-reversal optimization. When we were not sure which part was wrong, we replaced the join operator and tested again. He has done that a few times to locate bugs. Also, having two implementations lets us diagnose bugs easily, similar to the sort and group-by cases: we have quick sort vs. merge sort, and sort group-by vs. hash group-by.
Based on his reasoning, I think we can at least keep one of the two hash joins (hybrid or grace) and remove the other.
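
The diagnosis approach described above (swap in a second join implementation and compare results) can be sketched roughly as follows. This is only an illustration, not AsterixDB or Hyracks code: the class JoinCrossCheckSketch, both methods, and the row layout (an int key followed by an int payload) are hypothetical, and a plain nested-loop join stands in for the "other" implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: cross-check one join implementation against another.
final class JoinCrossCheckSketch {
    // Reference implementation: nested-loop equi-join on field 0 of each row.
    static List<String> nestedLoopJoin(int[][] left, int[][] right) {
        List<String> out = new ArrayList<>();
        for (int[] l : left) {
            for (int[] r : right) {
                if (l[0] == r[0]) {
                    out.add(l[0] + ":" + l[1] + ":" + r[1]);
                }
            }
        }
        Collections.sort(out); // sort so that output order does not matter
        return out;
    }

    // Implementation under test: in-memory hash join that builds on the right input.
    static List<String> hashJoin(int[][] left, int[][] right) {
        Map<Integer, List<int[]>> table = new HashMap<>();
        for (int[] r : right) {
            table.computeIfAbsent(r[0], k -> new ArrayList<>()).add(r);
        }
        List<String> out = new ArrayList<>();
        for (int[] l : left) {
            for (int[] r : table.getOrDefault(l[0], Collections.emptyList())) {
                out.add(l[0] + ":" + l[1] + ":" + r[1]);
            }
        }
        Collections.sort(out);
        return out;
    }

    public static void main(String[] args) {
        int[][] left = {{1, 10}, {2, 20}, {2, 21}, {3, 30}};
        int[][] right = {{2, 200}, {3, 300}, {3, 301}, {4, 400}};
        // If the two results differ, the bug is in one of the join implementations.
        System.out.println(hashJoin(left, right).equals(nestedLoopJoin(left, right)));
    }
}
```

If the two results disagree on some input, the problem is isolated to a join implementation rather than to the rest of the plan, which is the benefit of keeping a second implementation around.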
> Grace Hash Join and Hybrid Hash Join are not being used.
> --------------------------------------------------------
>
> Key: ASTERIXDB-1736
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-1736
> Project: Apache AsterixDB
> Issue Type: Improvement
> Reporter: Taewoo Kim
> Assignee: Taewoo Kim
>
> As the title says, Grace Hash Join and Hybrid Hash Join are not being used. I suggest that we remove these two join methods. Here are my findings for these two joins.
> 1) Grace Hash Join
> GraceHashJoinOperatorDescriptor is only called from two places: org.apache.hyracks.examples.tpch.client.join and TPCHCustomerOrderHashJoinTest.
> One is a Hyracks example (tpch.client) and the other is a unit test. This join is not used currently (not chosen during the compilation).
> 2) Hybrid Hash Join
> During the compilation, the optimizer decides whether it will use Hybrid Hash Join or Optimized Hybrid Hash Join.
> If the hash function family for each key variable is set, then we use the optimized hybrid hash join.
> If not, we use the hybrid hash join. In practice, however, the hybrid hash join path will never be chosen. Let's check the code.
> {code:title=HybridHashJoinPOperator.java|borderStyle=solid}
> IBinaryHashFunctionFamily[] hashFunFamilies = JobGenHelper.variablesToBinaryHashFunctionFamilies(keysLeftBranch,
>         env, context);
>
> ...
>
> boolean optimizedHashJoin = true;
> for (IBinaryHashFunctionFamily family : hashFunFamilies) {
>     if (family == null) {
>         optimizedHashJoin = false;
>         break;
>     }
> }
> if (optimizedHashJoin) {
>     opDesc = generateOptimizedHashJoinRuntime(context, inputSchemas, keysLeft, keysRight, hashFunFamilies,
>             comparatorFactories, predEvaluatorFactory, recDescriptor, spec);
> } else {
>     opDesc = generateHashJoinRuntime(context, inputSchemas, keysLeft, keysRight, hashFunFactories,
>             comparatorFactories, predEvaluatorFactory, recDescriptor, spec);
> }
> {code}
>
> As we can see, optimizedHashJoin is set to false only when the hash family is null.
> Then, how do we assign the hash function family for each key variable?
> {code:title=JobGenHelper.java|borderStyle=solid}
> public static IBinaryHashFunctionFamily[] variablesToBinaryHashFunctionFamilies(
>         Collection<LogicalVariable> varLogical, IVariableTypeEnvironment env, JobGenContext context)
>         throws AlgebricksException {
>     IBinaryHashFunctionFamily[] funFamilies = new IBinaryHashFunctionFamily[varLogical.size()];
>     int i = 0;
>     IBinaryHashFunctionFamilyProvider bhffProvider = context.getBinaryHashFunctionFamilyProvider();
>     for (LogicalVariable var : varLogical) {
>         Object type = env.getVarType(var);
>         funFamilies[i++] = bhffProvider.getBinaryHashFunctionFamily(type);
>     }
>     return funFamilies;
> }
> {code}
> For each variable type, we try to get the hash function family. In the current codebase, AqlBinaryHashFunctionFamilyProvider is the only class that implements IBinaryHashFunctionFamilyProvider.
> And for any type, it returns AMurmurHash3BinaryHashFunctionFamily.
> So, there is no way that the hash function family is null.
> {code:title=AqlBinaryHashFunctionFamilyProvider.java|borderStyle=solid}
> public class AqlBinaryHashFunctionFamilyProvider implements IBinaryHashFunctionFamilyProvider, Serializable {
>     private static final long serialVersionUID = 1L;
>     public static final AqlBinaryHashFunctionFamilyProvider INSTANCE = new AqlBinaryHashFunctionFamilyProvider();
>
>     private AqlBinaryHashFunctionFamilyProvider() {
>     }
>
>     @Override
>     public IBinaryHashFunctionFamily getBinaryHashFunctionFamily(Object type) throws AlgebricksException {
>         // AMurmurHash3BinaryHashFunctionFamily converts numeric type to double type before doing hash()
>         return AMurmurHash3BinaryHashFunctionFamily.INSTANCE;
>     }
> }
> {code}
>
>
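
The argument in the quoted description can be condensed into a small self-contained sketch. It assumes simplified stand-ins for the real interfaces: HashFunctionFamily, HashFunctionFamilyProvider, JoinSelectionSketch, and chooseJoin are hypothetical names, not the Hyracks/Algebricks API. It only illustrates that a provider which returns the same non-null instance for every type makes the fallback branch unreachable.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for IBinaryHashFunctionFamily and its provider.
interface HashFunctionFamily {}

interface HashFunctionFamilyProvider {
    HashFunctionFamily getFamily(Object type);
}

final class JoinSelectionSketch {
    // Like the quoted provider: one non-null singleton, returned for every type.
    static final HashFunctionFamily MURMUR = new HashFunctionFamily() {};
    static final HashFunctionFamilyProvider ALWAYS_MURMUR = type -> MURMUR;

    // Same rule as the quoted selection code: fall back to the plain hybrid
    // hash join only if some key type has no hash function family.
    static String chooseJoin(List<?> keyTypes, HashFunctionFamilyProvider provider) {
        for (Object type : keyTypes) {
            if (provider.getFamily(type) == null) {
                return "hybrid";
            }
        }
        return "optimized-hybrid";
    }

    public static void main(String[] args) {
        // Prints "optimized-hybrid" whatever the key types are.
        System.out.println(chooseJoin(Arrays.asList("int32", "string", "point"), ALWAYS_MURMUR));
    }
}
```

In this sketch the "hybrid" result can only be produced by a different provider that returns null for some type, and according to the description above no such provider exists in the current codebase.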