Posted to dev@hbase.apache.org by "Chen Feng (Jira)" <ji...@apache.org> on 2019/09/24 10:53:00 UTC

[jira] [Created] (HBASE-23068) Improve performance of InListExpression.hashCode

Chen Feng created HBASE-23068:
---------------------------------

             Summary: Improve performance of InListExpression.hashCode
                 Key: HBASE-23068
                 URL: https://issues.apache.org/jira/browse/HBASE-23068
             Project: HBase
          Issue Type: Improvement
            Reporter: Chen Feng
            Assignee: Chen Feng


WhereOptimizer.pushKeyExpressionsToScan() contains the line "extractNodes.addAll(nodesToExtract)". When executing SQL such as "select * from ... where A in (a1, a2, ..., a_n) and B = X", where the IN list on A has N elements (N > 100,000), this call is very slow (> 90s in our environment).

This is because, in this case, extractNodes is a HashSet and nodesToExtract is a List holding N InListExpression references (all N refer to the same instance), and InListExpression.values has N elements as well.

HashSet.addAll(List<InListExpression>) calls InListExpression.hashCode() N times. Each call computes the hash code of every value, so the overall time complexity is O(N^2).
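
The quadratic cost can be reproduced outside the code base with a small sketch. UncachedInList below is a hypothetical stand-in for InListExpression, not the real class; it only assumes that hashCode() walks every value on each call, as described above, and it counts how many values are visited while the same instance is added to a HashSet N times.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class HashCostDemo {
    // Hypothetical stand-in for InListExpression: hashCode() walks every value.
    static final class UncachedInList {
        static long valuesVisited = 0; // counts per-value work across all calls
        private final int[] values;

        UncachedInList(int[] values) {
            this.values = values;
        }

        @Override
        public int hashCode() {
            int h = 1;
            for (int v : values) {
                valuesVisited++;
                h = 31 * h + v;
            }
            return h;
        }
    }

    // Adds the same n-element expression to a HashSet n times and returns
    // how many values hashCode() visited in total.
    static long countVisits(int n) {
        int[] values = new int[n];
        for (int i = 0; i < n; i++) {
            values[i] = i;
        }
        UncachedInList node = new UncachedInList(values);

        List<UncachedInList> nodesToExtract = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            nodesToExtract.add(node); // same instance, n references
        }

        UncachedInList.valuesVisited = 0;
        Set<UncachedInList> extractNodes = new HashSet<>();
        extractNodes.addAll(nodesToExtract); // one hashCode() call per element
        return UncachedInList.valuesVisited;
    }

    public static void main(String[] args) {
        System.out.println(countVisits(1000)); // prints 1000000
    }
}
```

With n = 1000 the set ends up with a single element, yet one million values are hashed along the way.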

A simple fix is to cache the hash code in InListExpression and return it directly once it has been calculated.
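
A minimal sketch of that caching idea follows. CachedInList, its fields and the computations counter are hypothetical names used for illustration only, not the actual patch; the sketch assumes the values do not change after the first hashCode() call and does not address thread safety.

```java
import java.util.Arrays;

public class CachedHashDemo {
    // Hypothetical stand-in for InListExpression with a cached hash code.
    static final class CachedInList {
        private final int[] values;
        private int cachedHash;       // valid only when hashComputed is true
        private boolean hashComputed;
        int computations = 0;         // illustration only: counts full passes over values

        CachedInList(int[] values) {
            // Defensive copy so the cached hash cannot be invalidated by the caller.
            this.values = values.clone();
        }

        @Override
        public int hashCode() {
            if (!hashComputed) {
                cachedHash = Arrays.hashCode(values); // O(N), done at most once
                hashComputed = true;
                computations++;
            }
            return cachedHash; // O(1) on every later call
        }
    }

    public static void main(String[] args) {
        CachedInList node = new CachedInList(new int[] {1, 2, 3});
        int first = node.hashCode();
        int second = node.hashCode();
        System.out.println(first == second);   // prints true
        System.out.println(node.computations); // prints 1
    }
}
```

With this change, N calls to hashCode() cost one pass over the values plus N constant-time lookups, so the addAll call becomes O(N) instead of O(N^2).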



--
This message was sent by Atlassian Jira
(v8.3.4#803005)