Posted to derby-dev@db.apache.org by Harshvardhan Gupta <ha...@gmail.com> on 2017/03/19 17:34:18 UTC

DERBY-6921 How good is the Derby Query Optimizer, really

Hi,

I am in the process of writing a proposal for this GSoC summer project. I read the VLDB paper by Dr. Viktor Leis, which introduces a new benchmark suite for evaluating database query optimizers: http://www.vldb.org/pvldb/vol9/p204-leis.pdf

The paper presents an end-to-end study of the components of a query optimizer and isolates the impact of each component on producing query plans, which ideally should be close to optimal. The scope of this project includes running the benchmark suite against Derby and developing a knowledge base for improving the Derby optimizer in the future. In the Derby context, I read about how Derby produces cardinality estimates and updates its statistics, Derby's cost model, and the enumeration space Derby uses.

In the paper, Dr. Viktor Leis shows how important cardinality estimates are for producing good query plans, relative to the cost model and the enumeration space. Even before isolating the impact of cardinalities on query plans (by injecting true cardinalities, which is to be done as part of this project), I suspect that cardinality estimation has a lot of room for improvement in Derby.

I am proposing the introduction of optional table sampling to improve cardinality estimation. With table samples, cardinality estimates can be obtained reliably, especially when filtering on a set of mutually correlated attributes. Derby currently ignores such correlation because it assumes uniformity and independence between attributes of the same table. I would like to ask specifically whether such optional sampling methods should be introduced in Derby at the cost of giving up the simplicity and low overhead of the one-dimensional histograms that the Derby optimizer currently uses. The scope of this project can then be adjusted accordingly.
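
To illustrate the concern, here is a minimal sketch in Java. It is not Derby code: the class SamplingSketch, the synthetic two-column table, and every method name are made up for illustration. It compares an estimate built on the independence assumption with an estimate taken from a uniform random sample, for a filter on two columns where one fully determines the other.

```java
import java.util.Random;

// Hypothetical illustration only; none of these names exist in Derby.
class SamplingSketch {

    // Synthetic table with two correlated columns:
    // column 0 is a "city" id (100 distinct values),
    // column 1 is a "state" id that is fully determined by the city.
    static int[][] buildTable(int rows) {
        int[][] table = new int[rows][2];
        for (int i = 0; i < rows; i++) {
            int city = i % 100;
            table[i][0] = city;
            table[i][1] = city / 10;
        }
        return table;
    }

    // Exact number of rows matching city = ? AND state = ?.
    static long trueCount(int[][] table, int city, int state) {
        long count = 0;
        for (int[] row : table) {
            if (row[0] == city && row[1] == state) count++;
        }
        return count;
    }

    // Independence assumption: multiply the per-column selectivities.
    static double independenceEstimate(int[][] table, int city, int state) {
        long cityMatches = 0, stateMatches = 0;
        for (int[] row : table) {
            if (row[0] == city) cityMatches++;
            if (row[1] == state) stateMatches++;
        }
        double n = table.length;
        return (cityMatches / n) * (stateMatches / n) * n;
    }

    // Sample-based estimate: evaluate the whole predicate on a uniform
    // random sample (with replacement) and scale the hit rate to the table size.
    static double sampleEstimate(int[][] table, int city, int state,
                                 int sampleSize, long seed) {
        Random rnd = new Random(seed);
        int hits = 0;
        for (int i = 0; i < sampleSize; i++) {
            int[] row = table[rnd.nextInt(table.length)];
            if (row[0] == city && row[1] == state) hits++;
        }
        return (double) hits / sampleSize * table.length;
    }

    public static void main(String[] args) {
        int[][] table = buildTable(100_000);
        System.out.println("true count:            " + trueCount(table, 42, 4));
        System.out.println("independence estimate: " + independenceEstimate(table, 42, 4));
        System.out.println("sample estimate:       " + sampleEstimate(table, 42, 4, 10_000, 1L));
    }
}
```

On this synthetic table the true count is 1000 rows, while the independence estimate comes out at about 100, a tenfold underestimate, because the state predicate adds no filtering beyond the city predicate. The sample estimate varies with the seed but should land near the true count, since the sample sees both columns of each row together.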

Regards,
Harshvardhan Gupta

Re: DERBY-6921 How good is the Derby Query Optimizer, really

Posted by Bryan Pendleton <bp...@gmail.com>.
>  Dr. Viktor Leis in the paper has shown the importance of Cardinality
> estimates in producing good query plans relative to cost models and
> enumeration space. Even before isolating the impact of cardinalities on
> query plan (by injecting true cardinalities, to be taken as part of this
> project itself), I speculate that cardinality estimation has a lot of scope
> for improvement in Derby.
>

I share your suspicion, though I'm eager to see the benchmark numbers
before we come to any definite conclusions.


>
> I am proposing the introduction of optional table sampling in order to
> improve the cardinality estimation, the cardinality estimates can then
> obtained reliably in presence of table samples specially when we are
> filtering on set of attributes that are mutually co-related which Derby
> currently ignores by taking in account assumption of uniformity and
> independence between attributes of the same table.
>

I think this would be a wonderful direction to explore!

It sounds like quite a lot of work, but I'm sure it can be broken down into
smaller pieces of infrastructure which can serve as milestones along the
way to improvement.

One of Derby's goals, over the years, has been to require as little
administration as possible. In keeping with that vision, it would be
valuable to me to understand how features such as the ones you describe can
be incorporated without requiring a lot of attention from a Database
Administrator to use properly.

thanks,

bryan