You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@calcite.apache.org by Jiunn Jye Ng <ji...@gmail.com> on 2015/03/11 10:02:44 UTC

JdbcSort not used in plan.

Hi,

When running a query with Order By clause using JdbcAdapter, I observed the
JdbcSort rule is not being used.

I have try a few variation of order by and the calcite always fall to
using EnumerableSort.

I tried to workaround by it override JdbcSort.computeSelfCost to return a
very minimal value (CPU = 1, ROW = 1, IO  = 0)

but the cheapest plan always fall to using EnumerableSort even though the
EnumerableSort.computeSelfCost return a higher value. (CPU:100.0 io:0.0
row:3684.1361487904733)

Can someone give me some pointer why this does not work and how the
cheapest plan identification work?

I have also tried override getRows and that does not work neither.


Thank you.

Rgds,
jay

Re: JdbcSort not used in plan.

Posted by Jiunn Jye Ng <ji...@gmail.com>.
Thanks julian.

On Thu, Mar 12, 2015 at 7:16 AM, Julian Hyde <ju...@hydromatic.net> wrote:

> Everything I know I've already said in this thread.
>
> On Wed, Mar 11, 2015 at 4:02 PM, Jiunn Jye Ng <ji...@gmail.com> wrote:
> > Hi
> > Jira case 616 raised for this
> > https://issues.apache.org/jira/browse/CALCITE-616
> >
> > Can you suggest if there are any workaround to force the JdbcSort into
> > action ?
> >
> > Thank you.
> >
> > Rgds,
> > jay
> >
> > On Thu, Mar 12, 2015 at 3:30 AM, Vladimir Sitnikov <
> > sitnikov.vladimir@gmail.com> wrote:
> >
> >> >Every extra variable (e.g. column count) you add is a new
> >> opportunity for something to go wrong.
> >>
> >> I see that, however you need something to make project that reduces
> >> columns cheaper, don't you?
> >> On the other hand, if you use (rows, cpu) costing tuple, what would
> >> you consider cheaper (2 rows, 3 cpu) or (3 rows, 2 cpu)?
> >> The easiest way to compare costs is to use "rows + X*cpu" formula,
> >> that basically requires some coefficient.
> >>
> >> > query failed to use JdbcSort
> >> Well, I just picked a first JdbcTest test that did instantiate JdbcSort.
> >> Of course we need a separate issue for sorting (i.e. simplest case for
> it).
> >>
> >> Vladimir
> >>
>

Re: JdbcSort not used in plan.

Posted by Julian Hyde <ju...@hydromatic.net>.
Everything I know I've already said in this thread.

On Wed, Mar 11, 2015 at 4:02 PM, Jiunn Jye Ng <ji...@gmail.com> wrote:
> Hi
> Jira case 616 raised for this
> https://issues.apache.org/jira/browse/CALCITE-616
>
> Can you suggest if there are any workaround to force the JdbcSort into
> action ?
>
> Thank you.
>
> Rgds,
> jay
>
> On Thu, Mar 12, 2015 at 3:30 AM, Vladimir Sitnikov <
> sitnikov.vladimir@gmail.com> wrote:
>
>> >Every extra variable (e.g. column count) you add is a new
>> opportunity for something to go wrong.
>>
>> I see that, however you need something to make project that reduces
>> columns cheaper, don't you?
>> On the other hand, if you use (rows, cpu) costing tuple, what would
>> you consider cheaper (2 rows, 3 cpu) or (3 rows, 2 cpu)?
>> The easiest way to compare costs is to use "rows + X*cpu" formula,
>> that basically requires some coefficient.
>>
>> > query failed to use JdbcSort
>> Well, I just picked a first JdbcTest test that did instantiate JdbcSort.
>> Of course we need a separate issue for sorting (i.e. simplest case for it).
>>
>> Vladimir
>>

Re: JdbcSort not used in plan.

Posted by Jiunn Jye Ng <ji...@gmail.com>.
Hi
Jira case 616 raised for this
https://issues.apache.org/jira/browse/CALCITE-616

Can you suggest if there are any workaround to force the JdbcSort into
action ?

Thank you.

Rgds,
jay

On Thu, Mar 12, 2015 at 3:30 AM, Vladimir Sitnikov <
sitnikov.vladimir@gmail.com> wrote:

> >Every extra variable (e.g. column count) you add is a new
> opportunity for something to go wrong.
>
> I see that, however you need something to make project that reduces
> columns cheaper, don't you?
> On the other hand, if you use (rows, cpu) costing tuple, what would
> you consider cheaper (2 rows, 3 cpu) or (3 rows, 2 cpu)?
> The easiest way to compare costs is to use "rows + X*cpu" formula,
> that basically requires some coefficient.
>
> > query failed to use JdbcSort
> Well, I just picked a first JdbcTest test that did instantiate JdbcSort.
> Of course we need a separate issue for sorting (i.e. simplest case for it).
>
> Vladimir
>

Re: JdbcSort not used in plan.

Posted by Vladimir Sitnikov <si...@gmail.com>.
>Every extra variable (e.g. column count) you add is a new
opportunity for something to go wrong.

I see that, however you need something to make project that reduces
columns cheaper, don't you?
On the other hand, if you use (rows, cpu) costing tuple, what would
you consider cheaper (2 rows, 3 cpu) or (3 rows, 2 cpu)?
The easiest way to compare costs is to use "rows + X*cpu" formula,
that basically requires some coefficient.

> query failed to use JdbcSort
Well, I just picked a first JdbcTest test that did instantiate JdbcSort.
Of course we need a separate issue for sorting (i.e. simplest case for it).

Vladimir

Re: JdbcSort not used in plan.

Posted by Julian Hyde <ju...@hydromatic.net>.
> As you know, I would prefer enumerable over.... stop.

Yeah, we agree to disagree. We should use pluggable metadata providers
so you can use one that says that enumerable is totally awesome and
Jacques can use one that says enumerable sucks. :)

My opinion of costs is the same as Richard Feynmann's opinion of
physical theories -- it's pointless to argue whether they are correct,
but we should instead judge them by whether they achieve the desired
result with as little fuss as possible.

So, feel free to revise the cost model. Write tests as you do it. I'll
only complain if the tests start producing dumb plans (that is, dumber
than they do now).

I would advise you not to make the cost model more complicated than it
needs to be. Every extra variable (e.g. column count) you add is a new
opportunity for something to go wrong.

Can you log your findings in a jira case? It's all good stuff.

And by the way, we have diverged from the original intend of this
thread. Jiunn Jye Ng said that his query failed to use JdbcSort even
when he had hacked the cost model. That seems to be a more tractible
and pressing issue.

Julian

On Wed, Mar 11, 2015 at 11:23 AM, Vladimir Sitnikov
<si...@gmail.com> wrote:
> More findings:
>
> JdbcJoin.getRows diverges from Join.getRows, so EnumerableJoin has
> different cost from JdbcJoin.
>
> org.apache.calcite.rel.core.Project#computeSelfCost uses dRows *
> exps.size() in the CPU part of the cost, however VolcanoPlanner just
> ignores CPU part of the cost.
>
> We might want start adding "per cell" cost cpu factor, so the cost of
> project would be nRows + nRows*nExps*per_cell_cost_factor
>
> Vladimir

Re: JdbcSort not used in plan.

Posted by Vladimir Sitnikov <si...@gmail.com>.
More findings:

JdbcJoin.getRows diverges from Join.getRows, so EnumerableJoin has
different cost from JdbcJoin.

org.apache.calcite.rel.core.Project#computeSelfCost uses dRows *
exps.size() in the CPU part of the cost, however VolcanoPlanner just
ignores CPU part of the cost.

We might want start adding "per cell" cost cpu factor, so the cost of
project would be nRows + nRows*nExps*per_cell_cost_factor

Vladimir

Re: JdbcSort not used in plan.

Posted by Vladimir Sitnikov <si...@gmail.com>.
> Informally, we would choose the one with the "more efficient convention".

As you know, I would prefer enumerable over.... stop.

> allows us to achieve this fairly simply.

1) I am not sure if JdbcProject==EnumerableProject*0.8 properly
correlates with actual response times.
For instance, JdbcProject(JdbcTableScan) costs _less_ than a simple
JdbcTableScan.
In other words, it is a bit strange why adding JdbcProject increases cost.
In exactly the same way, a filter executed at jdbc side reduces the
cost of jdbc table scan.

2) JdbcToEnumerableConverter, on contrary, uses
super.computeSelfCost(planner).multiplyBy(.1).
I believe the cost of JdbcToEnumerable should depend on the both
number of rows and columns.

3) It is a bit odd that the cost is based on the number of "rows". I
do understand there should be a single "double" that is used to
compare cost, however it is not that good to assume that cost of "jdbc
scan" is proportional to the number of rows.
The cost depends on _filters_, since when Calcite pushes a filter down
to jdbc, the database can use index to get the required row, so the
cost is much less than a cost of a full table scan.


Well, that was a bit long.
I suggest:
1) Revise costing functions. Include column count where required
2) Probably start a set of planner test cases that assert cost-based decisions.
3) In far far future, we might want to know "indexes available at jdbc
side", so we can better assess the available indexed paths.

Vladimir

Re: JdbcSort not used in plan.

Posted by Julian Hyde <ju...@hydromatic.net>.
> On Mar 11, 2015, at 10:01 AM, Vladimir Sitnikov <si...@gmail.com> wrote:
> 
> Those "cost_multiplier" tricks are sloppy.

Maybe, but I actually think "cost multiplier tricks" are better than the alternatives.

When faced with two relational expressions that have different calling conventions but are otherwise the same, which do you prefer? Informally, we would choose the one with the "more efficient convention". Associating a cost multiplier with each trait (e.g. Enumerable = 1, Jdbc = 0.8) allows us to achieve this fairly simply.

Julian


Re: JdbcSort not used in plan.

Posted by Vladimir Sitnikov <si...@gmail.com>.
I think there are more issues.

For instance JdbcTest.testSelfJoinDifferentColumns
select e1."full_name"
  from "foodmart"."employee" as e1
  join "foodmart"."employee" as e2 on e1."first_name" = e2."last_name"
order by e1."last_name" limit 3

gives the following plan (note how JdbcJoin, JdbcCalc, and JdbcSort
are not used):
EnumerableCalc(expr#0..1=[{inputs}], full_name=[$t0])
  EnumerableLimit(fetch=[3])
    EnumerableSort(sort0=[$1], dir0=[ASC])
      EnumerableCalc(expr#0..3=[{inputs}], full_name=[$t0], last_name=[$t2])
        EnumerableJoin(condition=[=($1, $3)], joinType=[inner])
          JdbcToEnumerableConverter
            JdbcProject(full_name=[$1], first_name=[$2], last_name=[$3])
              JdbcTableScan(table=[[foodmart, employee]])
          JdbcToEnumerableConverter
            JdbcProject(last_name=[$3])
              JdbcTableScan(table=[[foodmart, employee]])

Overriding JdbcSort.computeSelfCost and JdbcJoin.computeSelfCost with
"multiplyby JdbcConvention.COST_MULTIPLIER" does not help.

If I reduce COST_MULTIPLIER from 0.8 to 0.08, then "better" plan is used:
EnumerableCalc(expr#0..1=[{inputs}], full_name=[$t0])
  EnumerableLimit(fetch=[3])
    EnumerableSort(sort0=[$1], dir0=[ASC])
      JdbcToEnumerableConverter
        JdbcProject(full_name=[$1], last_name=[$3])
          JdbcJoin(condition=[=($2, $0)], joinType=[inner])
            JdbcProject(last_name=[$3])
              JdbcTableScan(table=[[foodmart, employee]])
            JdbcProject(full_name=[$1], first_name=[$2], last_name=[$3])
              JdbcTableScan(table=[[foodmart, employee]])

Those "cost_multiplier" tricks are sloppy.

Vladimir

Re: JdbcSort not used in plan.

Posted by Julian Hyde <ju...@gmail.com>.
Likely cause is that JdbcSort is being created without its RelCollation properly set up. It should have a create method similar to LogicalSort, and clients should call that rather than the constructor directly.

Julian

> On Mar 11, 2015, at 8:25 AM, Julian Hyde <ju...@gmail.com> wrote:
> 
> Can you please log a jira case and we'll progress the issue there. 
> 
> Julian
> 
>> On Mar 11, 2015, at 02:02, Jiunn Jye Ng <ji...@gmail.com> wrote:
>> 
>> Hi,
>> 
>> When running a query with Order By clause using JdbcAdapter, I observed the
>> JdbcSort rule is not being used.
>> 
>> I have try a few variation of order by and the calcite always fall to
>> using EnumerableSort.
>> 
>> I tried to workaround by it override JdbcSort.computeSelfCost to return a
>> very minimal value (CPU = 1, ROW = 1, IO  = 0)
>> 
>> but the cheapest plan always fall to using EnumerableSort even though the
>> EnumerableSort.computeSelfCost return a higher value. (CPU:100.0 io:0.0
>> row:3684.1361487904733)
>> 
>> Can someone give me some pointer why this does not work and how the
>> cheapest plan identification work?
>> 
>> I have also tried override getRows and that does not work neither.
>> 
>> 
>> Thank you.
>> 
>> Rgds,
>> jay


Re: JdbcSort not used in plan.

Posted by Julian Hyde <ju...@gmail.com>.
Can you please log a jira case and we'll progress the issue there. 

Julian

> On Mar 11, 2015, at 02:02, Jiunn Jye Ng <ji...@gmail.com> wrote:
> 
> Hi,
> 
> When running a query with Order By clause using JdbcAdapter, I observed the
> JdbcSort rule is not being used.
> 
> I have try a few variation of order by and the calcite always fall to
> using EnumerableSort.
> 
> I tried to workaround by it override JdbcSort.computeSelfCost to return a
> very minimal value (CPU = 1, ROW = 1, IO  = 0)
> 
> but the cheapest plan always fall to using EnumerableSort even though the
> EnumerableSort.computeSelfCost return a higher value. (CPU:100.0 io:0.0
> row:3684.1361487904733)
> 
> Can someone give me some pointer why this does not work and how the
> cheapest plan identification work?
> 
> I have also tried override getRows and that does not work neither.
> 
> 
> Thank you.
> 
> Rgds,
> jay