Posted to issues@phoenix.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2022/10/01 06:16:00 UTC

[jira] [Commented] (PHOENIX-6752) Duplicate expression nodes in extract nodes during WHERE compilation phase leads to poor performance.

    [ https://issues.apache.org/jira/browse/PHOENIX-6752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17611828#comment-17611828 ] 

ASF GitHub Bot commented on PHOENIX-6752:
-----------------------------------------

comnetwork commented on code in PR #1508:
URL: https://github.com/apache/phoenix/pull/1508#discussion_r984304639


##########
phoenix-core/src/main/java/org/apache/phoenix/compile/WhereOptimizer.java:
##########
@@ -926,7 +934,11 @@ private KeySlots andKeySlots(AndExpression andExpression, List<KeySlots> childSl
                             if (slot.getOrderPreserving() != null) {
                                 orderPreserving = slot.getOrderPreserving().combine(orderPreserving);
                             }
-                            if (slot.getKeyPart().getExtractNodes() != null) {
+                            // Extract once per iteration, when there are large number
+                            // of OR clauses (for e.g N > 100k).
+                            // The extractNodes.addAll method can get called N times.
+                            if (!visitedKeyParts.contains(slot.getKeyPart()) && slot.getKeyPart().getExtractNodes() != null) {
+                                visitedKeyParts.add(slot.getKeyPart());
                                 extractNodes.addAll(slot.getKeyPart().getExtractNodes());

Review Comment:
   Is this the problem the Jira is meant to solve? That is, `KeyExpressionVisitor.andKeySlots` adds the same `keyPart.extractNodes` to `extractNodes` many times, so `extractNodes` explodes?
   Changing `extractNodes` from a List to a Set could keep it from exploding, but when there is a large number of OR clauses, how does this PR reduce the CPU time `SlotsIterator` spends enumerating all `KeyRange` combinations?
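
To make the two options in this comment concrete, here is a minimal, self-contained sketch. It does not use Phoenix classes: `KeyPart`, `sampleSlots`, and the three `collect...` methods are hypothetical stand-ins, and the assumption is that many slots share one key part whose extract nodes are identical. It compares the pre-patch shape (add on every slot), the guard added in the diff (add once per visited key part), and the List-to-Set alternative mentioned above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class ExtractNodesSketch {
    /** Hypothetical stand-in for a key part that carries a set of extract nodes. */
    static final class KeyPart {
        private final Set<String> extractNodes;
        KeyPart(Set<String> extractNodes) { this.extractNodes = extractNodes; }
        Set<String> getExtractNodes() { return extractNodes; }
    }

    /** Pre-patch shape: every slot contributes its nodes, even when slots share a key part. */
    static List<String> collectAll(List<KeyPart> slots) {
        List<String> out = new ArrayList<>();
        for (KeyPart keyPart : slots) {
            if (keyPart.getExtractNodes() != null) {
                out.addAll(keyPart.getExtractNodes());
            }
        }
        return out;
    }

    /** Shape of the guard in the diff: each distinct key part contributes once. */
    static List<String> collectOncePerKeyPart(List<KeyPart> slots) {
        // KeyPart here does not override equals/hashCode, so this set compares by identity.
        Set<KeyPart> visitedKeyParts = new HashSet<>();
        List<String> out = new ArrayList<>();
        for (KeyPart keyPart : slots) {
            // Set.add returns false when the key part was already visited.
            if (keyPart.getExtractNodes() != null && visitedKeyParts.add(keyPart)) {
                out.addAll(keyPart.getExtractNodes());
            }
        }
        return out;
    }

    /** Alternative from the review: collect into a Set so duplicates collapse. */
    static Set<String> collectIntoSet(List<KeyPart> slots) {
        Set<String> out = new LinkedHashSet<>();
        for (KeyPart keyPart : slots) {
            if (keyPart.getExtractNodes() != null) {
                out.addAll(keyPart.getExtractNodes());
            }
        }
        return out;
    }

    /** Builds n slots that all share one key part with two extract nodes. */
    static List<KeyPart> sampleSlots(int n) {
        KeyPart shared = new KeyPart(new LinkedHashSet<>(Arrays.asList("a = 1", "b = 2")));
        return Collections.nCopies(n, shared);
    }

    public static void main(String[] args) {
        List<KeyPart> slots = sampleSlots(1000);
        System.out.println("add on every slot:  " + collectAll(slots).size());
        System.out.println("once per key part:  " + collectOncePerKeyPart(slots).size());
        System.out.println("collect into a Set: " + collectIntoSet(slots).size());
    }
}
```

With 1000 slots sharing one key part, the first variant collects 2000 entries while the other two collect 2. The Set variant still walks every slot's nodes, whereas the visited-key-part guard skips the `addAll` call entirely. Neither variant says anything about the cost of enumerating `KeyRange` combinations, which is the separate question raised above.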



##########
phoenix-core/src/it/java/org/apache/phoenix/end2end/InListIT.java:
##########
@@ -1856,6 +1936,40 @@ public void testBaseTableAndIndexTableHaveRightScan() throws Exception {
         }
     }
 
+    @Test
+    public void testWithLargeORs() throws Exception {

Review Comment:
   This test exercises `WhereOptimizer.pushKeyExpressionsToScan` and asserts that `extractedNodes` behaves as expected. I think we should simplify it and move it to `WhereOptimizerTest` rather than `InListIT`. If a test can be written as a unit test, it should not be an integration test.





> Duplicate expression nodes in extract nodes during WHERE compilation phase leads to poor performance.
> -----------------------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-6752
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-6752
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.15.0, 5.1.0, 4.16.1, 5.2.0
>            Reporter: Jacob Isaac
>            Assignee: Jacob Isaac
>            Priority: Critical
>             Fix For: 5.2.0
>
>         Attachments: test-case.txt
>
>
> SQL queries using the OR operator took a long time in the WHERE clause compilation phase when a large number of OR clauses (~50k) were used.
> The key observation was that during AND/OR processing, when there is a large number of OR expression nodes, the same set of extracted nodes was added repeatedly,
> bloating the set size and slowing down processing.
> [code|https://github.com/apache/phoenix/blob/0c2008ddf32566c525df26cb94d60be32acc10da/phoenix-core/src/main/java/org/apache/phoenix/compile/WhereOptimizer.java#L930]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)