You are viewing a plain text version of this content. The canonical link for it is here.

Posted to derby-user@db.apache.org by Masood Mortazavi <ma...@gmail.com> on 2011/06/14 03:39:10 UTC

Derby query optimizations / (materialized) Views and Queries

I have a Derby table (T) with 10^7 records.
This table has 5 columns.
I have defined a Derby view (V) on one of those columns.
To produce (V) all of (T) needs to be scanned.
There are about 2000 records that the view (V) contains.

Now, I "cross" (T) with (V), with an IN clause involving (V).
I have three questions:
[1] Is the optimizer smart enough to to scan only O( (T)*(V) ) times (with
the V comparisons involving in-memory "materialization" of (V)), or will it
be O( (T)*(T) ) or something else?
[2] What is the support that Derby 10.8.1.2 may be offering for materialized
views?
[3] If the answer to [1] is the larger number, would materializing (V)
change things or improve them drastically, i.e. would Derby save the large
(T) sized scan to produce (V), or even better, would it do that and also
bring all of (V), which is rather small, into memory (in large enough chunks
or completely) for the purposes of the cross involving IN?

With best regards to all good Derby friends,
- m.

Re: Derby query optimizations / (materialized) Views and Queries

Posted by Masood Mortazavi <ma...@gmail.com>.

On Tue, Jun 14, 2011 at 6:09 AM, Rick Hillegas <ri...@oracle.com>wrote:

> Hi Masood,
>
> Some comments inline...
>
> On 6/13/11 7:30 PM, Masood Mortazavi wrote:
>
>> On Mon, Jun 13, 2011 at 7:18 PM, Masood Mortazavi <
>> masoodmortazavi@gmail.com <ma...@gmail.com>> wrote:
>>
>>
>>
>>    On Mon, Jun 13, 2011 at 6:39 PM, Masood Mortazavi
>>    <masoodmortazavi@gmail.com <ma...@gmail.com>> wrote:
>>
>>        I have a Derby table (T) with 10^7 records.
>>        This table has 5 columns.
>>        I have defined a Derby view (V) on one of those columns.
>>        To produce (V) all of (T) needs to be scanned.
>>        There are about 2000 records that the view (V) contains.
>>
>>        Now, I "cross" (T) with (V), with an IN clause involving (V).
>>        I have three questions:
>>        [1] Is the optimizer smart enough to to scan only O( (T)*(V) )
>>        times (with the V comparisons involving in-memory
>>        "materialization" of (V)), or will it be O( (T)*(T) ) or
>>        something else?
>>
>>  I think this depends on the view, the query, and the statistics on the
> underlying table. I'm sure that Derby's view materialization could be
> improved.
>
>
>>        [2] What is the support that Derby 10.8.1.2 may be offering
>>        for materialized views?
>>
>>  View materialization wasn't a focus of the 10.8.1.2 release. It is
> possible that the automatic calculation of statistics may improve the
> performance of materialized views in some cases.
>
>
>>        [3] If the answer to [1] is the larger number, would
>>        materializing (V) change things or improve them drastically,
>>        i.e. would Derby save the large (T) sized scan to produce (V),
>>        or even better, would it do that and also bring all of (V),
>>        which is rather small, into memory (in large enough chunks or
>>        completely) for the purposes of the cross involving IN?
>>
>>        With best regards to all good Derby friends,
>>        - m.
>>
>>
>>    I solve the materialization issue by a simple exporting of the
>>    view's select query and then importing it.
>>    It works well.
>>    I will rerun my IN cross query, against the now materialized table
>>    to see how much better or worse it does in this particular case.
>>    The general gist of my questions above still remain.
>>
>>    Regards,
>>    Masood
>>
>>
>>
>> OK.
>> As described above, I used Derby's wonderful system utility procedures to
>> produce a materialization of the view in question, above.
>> The cross query became faster than a blizzard.
>>
> You may get better performance by using an INSERT ... SELECT statement
> rather than exporting the view and re-importing it. That will eliminate the
> redundant writing of the view to an external file. You may also investigate
> using a temporary table to store the intermediate result (for details, see
> the Reference Guide section on DECLARE GLOBAL TEMPORARY TABLE).
>
> Hope this helps,
> -Rick



This is all very helpful.
Thank you very much.
Yes, INSERT . . . SELECT would have been very helpful and more concise.
We may have other uses for the extra data file, in a Hadoop environment.
Given the statistics of the source data, doing an "IN" against a
materialized view turned out to be a lot faster, as expected.

Thanks again and with very best regards,
m.

P.S.
By the way, not supporting materialized views is not a big deal.
I'm not suggesting it should be supported.

Re: Derby query optimizations / (materialized) Views and Queries

Posted by Rick Hillegas <ri...@oracle.com>.

Hi Masood,

Some comments inline...

On 6/13/11 7:30 PM, Masood Mortazavi wrote:
> On Mon, Jun 13, 2011 at 7:18 PM, Masood Mortazavi 
> <masoodmortazavi@gmail.com <ma...@gmail.com>> wrote:
>
>
>
>     On Mon, Jun 13, 2011 at 6:39 PM, Masood Mortazavi
>     <masoodmortazavi@gmail.com <ma...@gmail.com>> wrote:
>
>         I have a Derby table (T) with 10^7 records.
>         This table has 5 columns.
>         I have defined a Derby view (V) on one of those columns.
>         To produce (V) all of (T) needs to be scanned.
>         There are about 2000 records that the view (V) contains.
>
>         Now, I "cross" (T) with (V), with an IN clause involving (V).
>         I have three questions:
>         [1] Is the optimizer smart enough to to scan only O( (T)*(V) )
>         times (with the V comparisons involving in-memory
>         "materialization" of (V)), or will it be O( (T)*(T) ) or
>         something else?
>
I think this depends on the view, the query, and the statistics on the 
underlying table. I'm sure that Derby's view materialization could be 
improved.
>
>         [2] What is the support that Derby 10.8.1.2 may be offering
>         for materialized views?
>
View materialization wasn't a focus of the 10.8.1.2 release. It is 
possible that the automatic calculation of statistics may improve the 
performance of materialized views in some cases.
>
>         [3] If the answer to [1] is the larger number, would
>         materializing (V) change things or improve them drastically,
>         i.e. would Derby save the large (T) sized scan to produce (V),
>         or even better, would it do that and also bring all of (V),
>         which is rather small, into memory (in large enough chunks or
>         completely) for the purposes of the cross involving IN?
>
>         With best regards to all good Derby friends,
>         - m.
>
>
>     I solve the materialization issue by a simple exporting of the
>     view's select query and then importing it.
>     It works well.
>     I will rerun my IN cross query, against the now materialized table
>     to see how much better or worse it does in this particular case.
>     The general gist of my questions above still remain.
>
>     Regards,
>     Masood
>
>
>
> OK.
> As described above, I used Derby's wonderful system utility procedures 
> to produce a materialization of the view in question, above.
> The cross query became faster than a blizzard.
You may get better performance by using an INSERT ... SELECT statement 
rather than exporting the view and re-importing it. That will eliminate 
the redundant writing of the view to an external file. You may also 
investigate using a temporary table to store the intermediate result 
(for details, see the Reference Guide section on DECLARE GLOBAL 
TEMPORARY TABLE).

Hope this helps,
-Rick

> So, I think I have answered my own question from a practical point of 
> view. Other insights are welcome.
>
> Take care!
> Masood
>

Re: Derby query optimizations / (materialized) Views and Queries

Posted by Masood Mortazavi <ma...@gmail.com>.

On Mon, Jun 13, 2011 at 7:18 PM, Masood Mortazavi <masoodmortazavi@gmail.com
> wrote:

>
>
> On Mon, Jun 13, 2011 at 6:39 PM, Masood Mortazavi <
> masoodmortazavi@gmail.com> wrote:
>
>> I have a Derby table (T) with 10^7 records.
>> This table has 5 columns.
>> I have defined a Derby view (V) on one of those columns.
>> To produce (V) all of (T) needs to be scanned.
>> There are about 2000 records that the view (V) contains.
>>
>> Now, I "cross" (T) with (V), with an IN clause involving (V).
>> I have three questions:
>> [1] Is the optimizer smart enough to to scan only O( (T)*(V) ) times (with
>> the V comparisons involving in-memory "materialization" of (V)), or will it
>> be O( (T)*(T) ) or something else?
>> [2] What is the support that Derby 10.8.1.2 may be offering for
>> materialized views?
>> [3] If the answer to [1] is the larger number, would materializing (V)
>> change things or improve them drastically, i.e. would Derby save the large
>> (T) sized scan to produce (V), or even better, would it do that and also
>> bring all of (V), which is rather small, into memory (in large enough chunks
>> or completely) for the purposes of the cross involving IN?
>>
>> With best regards to all good Derby friends,
>> - m.
>>
>
> I solve the materialization issue by a simple exporting of the view's
> select query and then importing it.
> It works well.
> I will rerun my IN cross query, against the now materialized table to see
> how much better or worse it does in this particular case.
> The general gist of my questions above still remain.
>
> Regards,
> Masood
>


OK.
As described above, I used Derby's wonderful system utility procedures to
produce a materialization of the view in question, above.
The cross query became faster than a blizzard.
So, I think I have answered my own question from a practical point of view.
Other insights are welcome.

Take care!
Masood

Re: Derby query optimizations / (materialized) Views and Queries

Posted by Masood Mortazavi <ma...@gmail.com>.

On Mon, Jun 13, 2011 at 6:39 PM, Masood Mortazavi <masoodmortazavi@gmail.com
> wrote:

> I have a Derby table (T) with 10^7 records.
> This table has 5 columns.
> I have defined a Derby view (V) on one of those columns.
> To produce (V) all of (T) needs to be scanned.
> There are about 2000 records that the view (V) contains.
>
> Now, I "cross" (T) with (V), with an IN clause involving (V).
> I have three questions:
> [1] Is the optimizer smart enough to to scan only O( (T)*(V) ) times (with
> the V comparisons involving in-memory "materialization" of (V)), or will it
> be O( (T)*(T) ) or something else?
> [2] What is the support that Derby 10.8.1.2 may be offering for
> materialized views?
> [3] If the answer to [1] is the larger number, would materializing (V)
> change things or improve them drastically, i.e. would Derby save the large
> (T) sized scan to produce (V), or even better, would it do that and also
> bring all of (V), which is rather small, into memory (in large enough chunks
> or completely) for the purposes of the cross involving IN?
>
> With best regards to all good Derby friends,
> - m.
>

I solve the materialization issue by a simple exporting of the view's select
query and then importing it.
It works well.
I will rerun my IN cross query, against the now materialized table to see
how much better or worse it does in this particular case.
The general gist of my questions above still remain.

Regards,
Masood