Posted to dev@asterixdb.apache.org by Taewoo Kim <wa...@gmail.com> on 2016/11/19 20:01:15 UTC

Fwd: [jira] [Created] (ASTERIXDB-1736) Grace Hash Join and Hybrid Hash Join are not being used.

Hi all,

Please share your thoughts on this issue. In short, Grace Hash Join and
Hybrid Hash Join are not being used; we only use Optimized Hybrid Hash
Join. Therefore, I think it would be better to remove them.
https://issues.apache.org/jira/browse/ASTERIXDB-1736
---------- Forwarded message ----------
From: Taewoo Kim (JIRA) <ji...@apache.org>
Date: Fri, Nov 18, 2016 at 5:06 PM
Subject: [jira] [Created] (ASTERIXDB-1736) Grace Hash Join and Hybrid Hash Join are not being used.
To: notifications@asterixdb.incubator.apache.org


Taewoo Kim created ASTERIXDB-1736:
-------------------------------------

             Summary: Grace Hash Join and Hybrid Hash Join are not being used.
                 Key: ASTERIXDB-1736
                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-1736
             Project: Apache AsterixDB
          Issue Type: Improvement
            Reporter: Taewoo Kim
            Assignee: Taewoo Kim


As the title says, Grace Hash Join and Hybrid Hash Join are not being used.
I suggest that we remove these two join methods. Here are my findings for
these two joins.

1) Grace Hash Join
GraceHashJoinOperatorDescriptor is referenced from only two places:
org.apache.hyracks.examples.tpch.client.join and
TPCHCustomerOrderHashJoinTest.
One is a Hyracks example (tpch.client) and the other is a unit test. The
query compiler never chooses this join, so it is currently unused.

2) Hybrid Hash Join
During compilation, the optimizer decides whether to use Hybrid Hash Join
or Optimized Hybrid Hash Join.
If a hash function family is available for every key variable, the
optimized hybrid hash join is used; otherwise, the hybrid hash join is
used. In practice, however, the hybrid hash join path is never taken.
Let's check the code.

{code:title=HybridHashJoinPOperator.java|borderStyle=solid}
        IBinaryHashFunctionFamily[] hashFunFamilies = JobGenHelper.variablesToBinaryHashFunctionFamilies(
                keysLeftBranch, env, context);

        // ...

        // Use the optimized join only if every key variable has a hash function family.
        boolean optimizedHashJoin = true;
        for (IBinaryHashFunctionFamily family : hashFunFamilies) {
            if (family == null) {
                optimizedHashJoin = false;
                break;
            }
        }

        if (optimizedHashJoin) {
            opDesc = generateOptimizedHashJoinRuntime(context, inputSchemas, keysLeft, keysRight, hashFunFamilies,
                    comparatorFactories, predEvaluatorFactory, recDescriptor, spec);
        } else {
            // Fallback: plain hybrid hash join, reached only if some family is null.
            opDesc = generateHashJoinRuntime(context, inputSchemas, keysLeft, keysRight, hashFunFactories,
                    comparatorFactories, predEvaluatorFactory, recDescriptor, spec);
        }
{code}
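To make the selection rule concrete, here is a minimal, self-contained model of it. This is a sketch under stated assumptions, not the Algebricks code: HashFamily, JoinKind, and JoinSelector are hypothetical names introduced only for illustration. It shows that the plain hybrid hash join is reached only when at least one entry of the family array is null.

```java
// Hypothetical stand-in for IBinaryHashFunctionFamily (illustration only).
interface HashFamily {
    int hash(byte[] bytes, int seed);
}

// Hypothetical labels for the two runtime paths chosen in HybridHashJoinPOperator.
enum JoinKind { OPTIMIZED_HYBRID_HASH_JOIN, HYBRID_HASH_JOIN }

final class JoinSelector {
    private JoinSelector() {
    }

    // Mirrors the loop in the excerpt: a single null family disables the optimized join.
    static JoinKind chooseJoin(HashFamily[] families) {
        for (HashFamily family : families) {
            if (family == null) {
                return JoinKind.HYBRID_HASH_JOIN; // fallback path
            }
        }
        // No null entries (including the empty-array case): optimized path.
        return JoinKind.OPTIMIZED_HYBRID_HASH_JOIN;
    }
}
```

For example, an array with no null entries yields OPTIMIZED_HYBRID_HASH_JOIN, while `new HashFamily[] { f, null }` yields HYBRID_HASH_JOIN. So whether the fallback is reachable depends entirely on whether the lookup can ever produce null.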

As we can see, optimizedHashJoin is set to false only when the hash
function family of at least one key variable is null.
So how is the hash function family assigned to each key variable?

{code:title=JobGenHelper.java|borderStyle=solid}
    public static IBinaryHashFunctionFamily[] variablesToBinaryHashFunctionFamilies(
            Collection<LogicalVariable> varLogical, IVariableTypeEnvironment env, JobGenContext context)
            throws AlgebricksException {
        IBinaryHashFunctionFamily[] funFamilies = new IBinaryHashFunctionFamily[varLogical.size()];
        int i = 0;
        IBinaryHashFunctionFamilyProvider bhffProvider = context.getBinaryHashFunctionFamilyProvider();
        // One hash function family per key variable, looked up by the variable's type.
        for (LogicalVariable var : varLogical) {
            Object type = env.getVarType(var);
            funFamilies[i++] = bhffProvider.getBinaryHashFunctionFamily(type);
        }
        return funFamilies;
    }
{code}
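The lookup above can be modelled the same way. In this sketch (an illustration, not the real API), HashFamily, HashFamilyProvider, and FamilyLookup are hypothetical names, and the type environment is simplified to a map from variable name to type object. It shows that a null slot can only come from the provider itself.

```java
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for IBinaryHashFunctionFamily (illustration only).
interface HashFamily {
    int hash(byte[] bytes, int seed);
}

// Hypothetical stand-in for IBinaryHashFunctionFamilyProvider: maps a type to a family.
interface HashFamilyProvider {
    HashFamily familyFor(Object type);
}

final class FamilyLookup {
    private FamilyLookup() {
    }

    // Same shape as variablesToBinaryHashFunctionFamilies: one family per key variable,
    // in key order. Whatever the provider returns (including null) is stored as-is.
    static HashFamily[] familiesFor(List<String> keyVars, Map<String, ?> typeEnv,
            HashFamilyProvider provider) {
        HashFamily[] families = new HashFamily[keyVars.size()];
        int i = 0;
        for (String var : keyVars) {
            Object type = typeEnv.get(var);
            families[i++] = provider.familyFor(type);
        }
        return families;
    }
}
```

With a provider that returns the same non-null family for every type, as AqlBinaryHashFunctionFamilyProvider does, every slot is filled; a null could appear only if some provider declined a particular type.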

For each key variable, we look up a hash function family based on the
variable's type. In the current codebase,
AqlBinaryHashFunctionFamilyProvider is the only class that implements
IBinaryHashFunctionFamilyProvider, and it returns
AMurmurHash3BinaryHashFunctionFamily for every type.
So the hash function family can never be null, and the hybrid hash join
branch is unreachable.

{code:title=AqlBinaryHashFunctionFamilyProvider.java|borderStyle=solid}
public class AqlBinaryHashFunctionFamilyProvider implements IBinaryHashFunctionFamilyProvider, Serializable {

    private static final long serialVersionUID = 1L;
    public static final AqlBinaryHashFunctionFamilyProvider INSTANCE = new AqlBinaryHashFunctionFamilyProvider();

    private AqlBinaryHashFunctionFamilyProvider() {

    }

    @Override
    public IBinaryHashFunctionFamily getBinaryHashFunctionFamily(Object type) throws AlgebricksException {
        // The type argument is ignored: every type gets the same non-null family.
        // AMurmurHash3BinaryHashFunctionFamily converts numeric types to double before hashing.
        return AMurmurHash3BinaryHashFunctionFamily.INSTANCE;
    }

}
{code}
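The comment about converting numeric types to double explains why a single family can serve every type: keys that compare equal across numeric types (for example, int 1 and double 1.0) must hash identically so that they land in the same partition. The sketch below illustrates only that idea under stated assumptions. NumericKeyHasher is a hypothetical name; it hashes a boxed Number rather than serialized bytes, and it borrows the MurmurHash3 64-bit finalizer purely as a bit mixer, so it is not the AMurmurHash3BinaryHashFunctionFamily implementation.

```java
// Hypothetical helper (illustration only): widen every numeric key to double before
// hashing, so values that compare equal across numeric types produce the same hash.
final class NumericKeyHasher {
    private NumericKeyHasher() {
    }

    static long hashNumeric(Number value, long seed) {
        double d = value.doubleValue();
        if (d == 0.0) {
            d = 0.0; // fold -0.0 into +0.0, since the two compare equal
        }
        // doubleToLongBits also maps every NaN to one canonical bit pattern.
        long k = Double.doubleToLongBits(d) ^ seed;
        // MurmurHash3 64-bit finalizer (fmix64), used here only to mix the bits.
        k ^= k >>> 33;
        k *= 0xff51afd7ed558ccdL;
        k ^= k >>> 33;
        k *= 0xc4ceb9fe1a85ec53L;
        k ^= k >>> 33;
        return k;
    }
}
```

One design consequence: large 64-bit integers that differ only beyond double precision collapse to the same double and therefore collide. That costs some partition balance but not correctness, because the join still compares the actual keys after hashing.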







--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Re: Fwd: [jira] [Created] (ASTERIXDB-1736) Grace Hash Join and Hybrid Hash Join are not being used.

Posted by Chen Li <ch...@gmail.com>.
+1

On Sat, Nov 19, 2016 at 12:32 PM, Mike Carey <dt...@gmail.com> wrote:

> +1 for removal of now-defunct operations!

Re: Fwd: [jira] [Created] (ASTERIXDB-1736) Grace Hash Join and Hybrid Hash Join are not being used.

Posted by Mike Carey <dt...@gmail.com>.
+1 for removal of now-defunct operations!

