Posted to derby-dev@db.apache.org by "Harshvardhan Gupta (JIRA)" <ji...@apache.org> on 2017/05/22 14:25:04 UTC
[jira] [Assigned] (DERBY-6921) How good is the Derby Query Optimizer, really

     [ https://issues.apache.org/jira/browse/DERBY-6921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harshvardhan Gupta reassigned DERBY-6921:
-----------------------------------------

    Assignee: Harshvardhan Gupta

> How good is the Derby Query Optimizer, really
> ---------------------------------------------
>
>                 Key: DERBY-6921
>                 URL: https://issues.apache.org/jira/browse/DERBY-6921
>             Project: Derby
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Bryan Pendleton
>            Assignee: Harshvardhan Gupta
>            Priority: Minor
>              Labels: database, gsoc2017, java, optimizer
>   Original Estimate: 2,016h
>  Remaining Estimate: 2,016h
>
> At the 2015 VLDB conference, a team led by Dr. Viktor Leis at Munich
> Technical University introduced a new benchmark suite for evaluating
> database query optimizers: http://www.vldb.org/pvldb/vol9/p204-leis.pdf
> The benchmark test suite is publicly available:
> http://db.in.tum.de/people/sites/leis/qo/job.tgz
> The data set for running the benchmark is publicly available:
> ftp://ftp.fu-berlin.de/pub/misc/movies/database/
> As part of Google Summer of Code 2017, I am volunteering to mentor
> a Summer of Code intern who is interested in using these tools to
> improve the Derby query optimizer.
> My suggestion for the overall process is this:
> 1) Acquire the benchmark tools, and the data set
> 2) Run the benchmark.
> 2a) Some of the benchmark queries may reveal bugs in Derby.
>      For each such bug, we need to isolate the bug and fix it.
> 3) Once we are able to run the entire benchmark, we need to
>    analyze the results.
> 3a) Some of the benchmark queries may reveal opportunities
>    for Derby to improve the query plans that it chooses for
>    various classes of queries (this is explained in detail in the
>    VLDB paper and other information available at Dr. Leis's site).
>    For each such improvement, we need to isolate the issue,
>    report it as a separable improvement, and fix it (if we can).
> While the benchmark is an interesting exercise in and of itself,
> the overall goal of the project is to find-and-fix problems in the
> Derby query optimizer, specifically in the 3 areas which are
> the focus of the benchmark tool:
> 1) How good is the Derby cardinality estimator and when does
>    it lead to slow queries?
> 2) How good is the Derby cost model, and how well is it guiding
>    the overall query optimization process?
> 3) How large is the Derby enumerated plan space, and is it
>    appropriately-sized?
> While other Derby issues have been filed against these questions
> in the past, the intent of this specific project is to use the concrete
> tools provided by the VLDB paper to make this effort rigorous and
> successful at making concrete improvements to the Derby query
> optimizer.
> If you are interested in pursuing this project, please keep these
> considerations in mind:
> 1) This is NOT an introductory project. You must be quite familiar
>    with database systems and with SQL, and in particular with
>    cost-based query optimization. If terms such as "cardinality
>    estimation", "correlated query predicates", or "bushy trees"
>    aren't comfortable terms for you, this probably isn't the
>    project you're interested in.
> 2) If you are new to Derby, that is fine, but please take advantage
>    of the extensive body of introductory material on Derby to
>    become familiar with it: read the Derby Getting Started manual,
>    download the software and follow the tutorials, read the documentation,
>    download the source code and learn how to build and run the
>    test suites, etc.
> 3) All I have presented here is an **outline** of the project. You will
>    need to read the paper(s), study the benchmark queries, and
>    propose a detailed plan for how to use this benchmark as a tool
>    for improving the Derby query optimizer.
> If these sorts of tasks sound like exciting things to do, then please
> let us know!
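
For the first focus area in the description above (how good the cardinality estimator is), one commonly used way to quantify estimation quality is the q-error: the factor by which an estimated row count differs from the actual row count, in either direction. The sketch below is only an illustration of that metric. The class name `QError`, its methods, and the sample numbers are hypothetical; none of them come from Derby or from the benchmark tooling, and clamping counts below 1 up to 1 is an assumption made here to avoid division by zero.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration; not part of Derby or the benchmark. */
final class QError {
    private QError() {}

    /**
     * Returns max(est/actual, actual/est), which is always >= 1.
     * Row counts below 1 are clamped to 1 (an assumption of this sketch)
     * so that empty results do not cause a division by zero.
     */
    static double qError(double estimatedRows, double actualRows) {
        double est = Math.max(estimatedRows, 1.0);
        double act = Math.max(actualRows, 1.0);
        return Math.max(est / act, act / est);
    }

    /** Median of a non-empty array; the input array is left unmodified. */
    static double median(double[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("no values");
        }
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1
                ? sorted[mid]
                : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    public static void main(String[] args) {
        // Made-up (estimated, actual) row counts, one pair per query.
        double[][] pairs = { {100, 1000}, {50, 50}, {4000, 40} };
        double[] errors = new double[pairs.length];
        for (int i = 0; i < pairs.length; i++) {
            errors[i] = qError(pairs[i][0], pairs[i][1]);
        }
        System.out.println("q-errors: " + Arrays.toString(errors));
        System.out.println("median q-error: " + median(errors));
    }
}
```

In an actual run, the estimated counts would have to come from whatever plan information Derby exposes and the actual counts from executing the queries; how to collect either is left open here.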



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)