Posted to issues@calcite.apache.org by "Zoltan Haindrich (JIRA)" <ji...@apache.org> on 2018/07/02 14:42:00 UTC

[jira] [Commented] (CALCITE-2384) Performance issue in getPulledUpPredicates

    [ https://issues.apache.org/jira/browse/CALCITE-2384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16530021#comment-16530021 ] 

Zoltan Haindrich commented on CALCITE-2384:
-------------------------------------------

It's a somewhat intricate thing, but constructing a query which can demonstrate it was a very good idea. I already had a strong suspicion about what's causing the problem, so:

Take the following query:

{code}
select a.comm
        from (select * from sales.emp ) a
        join (select * from sales.emp ) b on (a.comm=b.comm or a.comm=99)
        join (select * from sales.emp ) c on (a.comm=b.comm and a.comm=c.comm);
{code}

I will call the two join conditions {{COND_B}} and {{COND_C}}.

The key properties of these conditions are:

* COND_B is not a "simple" equality condition because of the "or"
* COND_C contains conditions which put some fields into the same equality group

I will use the number of calls to RexSimplify as the "weight".
The RelMdPredicates algorithm invokes simplify while trying to derive new expressions from COND_B by going through all possible substitutions of the equivalent operands.

For the initial case, the equiv group is: \{ a.comm,b.comm,c.comm \} .
Since a.comm and b.comm are both present in COND_B, it will do 3*3 = 9 ops.

We can make this "heavier" in a few ways:

* expand the equiv group by adding a new condition to COND_C:
{code}
  COND_B = (a.comm=b.comm or a.comm=99)
  COND_C = (a.comm=b.comm and a.comm=c.comm and a.comm=a.deptno);
{code}
  this raises the count from N*N=9 (N=3) to (N+1)*(N+1)=16
* add a new condition to COND_B, which increases the number of equiv group elements present there; with this we get N times more invocations:
{code}
  COND_B = (a.comm=b.comm or a.comm=a.deptno or a.comm=99)
  COND_C = (a.comm=b.comm and a.comm=c.comm and a.comm=a.deptno);
{code}
  this raises the count from N*N=16 (N=4) to N*N*N=64
* add more joins:
  adding an extra: {{join (select * from sales.emp ) d on (a.comm=b.comm and a.comm=c.comm and a.comm=a.deptno and a.comm=d.comm)}} raises the invocation count to 3384;
  but this is getting too intricate to follow by hand - it's interesting that it almost squared the invocation count...
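
The counts in the first two bullets can be reproduced with a toy model. This is only a sketch of the arithmetic, not Calcite code: it assumes that every distinct equiv group field present in COND_B may be replaced by any member of the group, and that each resulting combination costs one simplify call. The names {{candidate_mappings}} and {{weight}} are made up for illustration.

```python
from itertools import product

def candidate_mappings(equiv_group, present_fields):
    """Yield every mapping that sends each present field to some group member."""
    for choice in product(equiv_group, repeat=len(present_fields)):
        yield dict(zip(present_fields, choice))

def weight(equiv_group, present_fields):
    """Count the mappings; in this toy model each one costs one simplify call."""
    return sum(1 for _ in candidate_mappings(equiv_group, present_fields))

# Initial case: group of 3, two of its fields appear in COND_B.
base = weight(["a.comm", "b.comm", "c.comm"], ["a.comm", "b.comm"])

# COND_C also says a.comm=a.deptno: the group grows to 4.
bigger_group = weight(["a.comm", "b.comm", "c.comm", "a.deptno"],
                      ["a.comm", "b.comm"])

# COND_B also mentions a.deptno: three group fields are present.
more_fields = weight(["a.comm", "b.comm", "c.comm", "a.deptno"],
                     ["a.comm", "b.comm", "a.deptno"])

print(base, bigger_group, more_fields)  # 9 16 64
```

In this model the weight is N^k for a group of size N with k of its fields present in COND_B, so it grows exponentially in k. The 3384 figure for the extra join is a measured count and is not reproduced by this simple formula.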

I think that instead of going through all the possible variations, we should go through the expression only once, replacing every variable in the same equiv group with a representative element (smallest index?).
That should probably be enough to construct these predicates; however, I might not see all the caveats the original authors already took care of, so this might be too high to aim for right now.
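
A minimal sketch of that idea, again in Python rather than against the real Calcite classes: a small union-find keeps the smallest field index as the representative of each group, and a single pass rewrites the condition. The field indices (6, 15, 24), the nested-tuple expression encoding and the function names are all assumptions made up for this illustration.

```python
def build_representatives(equalities):
    """Map each field index to the smallest index in its equivalence group."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for left, right in equalities:
        root_l, root_r = find(left), find(right)
        if root_l != root_r:
            # Keep the smaller index as root so it becomes the representative.
            lo, hi = sorted((root_l, root_r))
            parent[hi] = lo

    return {field: find(field) for field in list(parent)}

def canonicalize(expr, rep):
    """Replace every field reference in a nested-tuple expression, in one pass."""
    kind = expr[0]
    if kind == "field":
        return ("field", rep.get(expr[1], expr[1]))
    if kind == "lit":
        return expr
    return (kind,) + tuple(canonicalize(arg, rep) for arg in expr[1:])

# Hypothetical field indices for a.comm, b.comm and c.comm.
A_COMM, B_COMM, C_COMM = 6, 15, 24

# Equalities taken from COND_C = (a.comm=b.comm and a.comm=c.comm).
rep = build_representatives([(A_COMM, B_COMM), (A_COMM, C_COMM)])

# COND_B = (a.comm=b.comm or a.comm=99)
cond_b = ("or",
          ("=", ("field", A_COMM), ("field", B_COMM)),
          ("=", ("field", A_COMM), ("lit", 99)))

canonical = canonicalize(cond_b, rep)
print(canonical)
```

Here a single rewritten expression is produced, so only one candidate would be handed to simplify instead of 9. Whether one canonical form is enough for everything the existing inference does is exactly the open question raised above.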

About the "quick fix" for this performance regression caused by CALCITE-2247: it can't be fixed the way I first thought, because a few steps later it ends up going down those code paths anyway...
a feature toggle would work - if that's okay...


> Performance issue in getPulledUpPredicates
> ------------------------------------------
>
>                 Key: CALCITE-2384
>                 URL: https://issues.apache.org/jira/browse/CALCITE-2384
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Julian Hyde
>            Assignee: Zoltan Haindrich
>            Priority: Major
>
> Performance issue in getPulledUpPredicates. It seems to have been introduced in the fix for CALCITE-2247, and causes Performance issue in getPulledUpPredicates to exceed its 20 second timeout. (See the [email thread|https://lists.apache.org/thread.html/afaa14a864c7027b9f1c66dddd3e5d6320799aeeec937c17d7b24531@%3Cdev.calcite.apache.org%3E]: [~risdenk] noticed this problem, and [~michaelmior] isolated the commit that caused the problem.)
> This issue has lots of history: that test was introduced to check CALCITE-1960 and CALCITE-2205. 
> [~kgyrtkirk], Can you please take a look at this?


