You are viewing a plain text version of this content. The canonical link for it is here.
Posted to derby-dev@db.apache.org by Mike Matrigali <mi...@sbcglobal.net> on 2005/08/01 18:48:56 UTC

Re: jira question

I like the syntax that satheesh has proposed, and of course it could
be implemented separate from the indexing question.  I still think
it would be nice if the optimizer could tell that an expression in
a where clause matched the result of the expression which generated
the column, otherwise indexes on the generated column aren't as interesting.

Rick Hillegas wrote:
> Thanks, Satheesh. I have logged a new enhancement request on this issue 
> and cross-linked it with bug 455. Generated Columns would be easier to 
> implement since we wouldn't have to teach the optimizer to flag a new 
> class of sargs.
> 
> Cheers,
> -Rick
> 
> Satheesh Bandaram wrote:
> 
>> I think it is possible to avoid extra storage with generated columns 
>> too.  These generated columns need not necessarily be in physical 
>> storage. The language layer could evaluate the expression on the fly 
>> when requested. Again, if a query uses an expression that might match 
>> a generated column, it would be possible to use the index.
>>
>> It is possible to have a first implementation actually create the 
>> column in physical storage and then improve the implementation to 
>> avoid this physical column and instead just evaluate the expression 
>> when needed.
>>
>> Satheesh
>>
>> Manish Khettry wrote:
>>
>>> Adding hidden columns is an easy way to implement the feature but it
>>> does result in storage being wasted. Mike's suggestion (option 3) in
>>> Jira seems to be a better way of doing it. As I understand it, Mike is
>>> suggesting that the store is unaware of the fact that a function value
>>> is used as a key in the btree for theindex. The language layer
>>> maintains the information and uses it to maintain the index (i.e.
>>> passing the right key value in DML and index build after applying the
>>> function) as well as considering the index if the function is present
>>> in the query.
>>>
>>> Manish
>>>
>>> On 7/29/05, Satheesh Bandaram <sa...@sourcery.org> wrote:
>>>  
>>>
>>>> I don't know why Jira is not forwarding comments for this specific bug.
>>>> Here are some of my comments.
>>>>
>>>> SQL standard provides a way to have generated columns. These generated
>>>> columns can be created using,
>>>>
>>>> GENERATED ALWAYS AS <left paren><value expression><right paren>
>>>>
>>>> Once we have a generated column, it should be possible to create a
>>>> regular B-Tree index on this generated column. I think there are 
>>>> several
>>>> advantages of using this syntax: 1) It is standards based. 2) Since the
>>>> column is visible, it is much cleaner and easier to understand the
>>>> semantics 3) Fits nicely with existing syntax for identity columns.
>>>>
>>>> I think this is a good enhancement to add to Derby.
>>>>
>>>> Satheesh
>>>>
>>>> Mike Matrigali wrote:
>>>>
>>>>   
>>>>
>>>>> don' know what is going on with jira.  I have added comments on 
>>>>> 455, and
>>>>> have not seen those go to the list.
>>>>>
>>>>> Rick Hillegas wrote:
>>>>>
>>>>>     
>>>>>
>>>>>> I added comments to two bugs this week (171 and 455) and was hoping
>>>>>> that Jira would post these comments to the developer's list. In
>>>>>> particular, I was hoping that my comment on 455 (expression indexes)
>>>>>> would invite further discussion by the Store experts. Alas, Jira did
>>>>>> not post my comments. Would appreciate theories about how I am
>>>>>> mis-using Jira.
>>>>>>
>>>>>> Thanks,
>>>>>> -Rick
>>>>>>
>>>>>>
>>>>>>
>>>>>>       
>>>>>
>>>>>     
>>>>
>>>>   
>>>
>>>
>>>
>>>
>>>  
>>>
> 
> 
> 
> 


Re: jira question

Posted by Mike Matrigali <mi...@sbcglobal.net>.
yes, and then costing and choosing an index based on the expression if
it is the best way to execute the query.

Manish Khettry wrote:

> On 8/3/05, Mike Matrigali <mi...@sbcglobal.net> wrote:
> 
>>I don't think I am willing to say generated columns are not useful without
>>indexes, sorry if previous email led to that, actually I have not
>>actually thought much about it.
>>
>>I believe I was letting
>>my predjudice through, I have always thought that an interesting set
>>of problems could be solved if the optimizer work was done - so I
>>was concentrating on that.  And I see that optimizer work as a stepping
>>stone to even more possible interesting stuff in the future - things
>>like gist indexes.
>>
> 
> 
> So the optimizer work that you speak of is being able to check if two
> expressions are equivalent?
> 

Re: jira question

Posted by Manish Khettry <ma...@gmail.com>.
On 8/3/05, Mike Matrigali <mi...@sbcglobal.net> wrote:
> I don't think I am willing to say generated columns are not useful without
> indexes, sorry if previous email led to that, actually I have not
> actually thought much about it.
> 
> I believe I was letting
> my predjudice through, I have always thought that an interesting set
> of problems could be solved if the optimizer work was done - so I
> was concentrating on that.  And I see that optimizer work as a stepping
> stone to even more possible interesting stuff in the future - things
> like gist indexes.
> 

So the optimizer work that you speak of is being able to check if two
expressions are equivalent?

Re: jira question

Posted by Daniel John Debrunner <dj...@debrunners.com>.
Rick Hillegas wrote:

> I don't anticipate any significant time savings by scaling back the kind
> of expressions we support.  In any event, sarg-matching of
> indexableExpressions should not involve text comparisons. I imagine it
> would involve normalizing the parse trees and comparing the normalized
> subgraphs.

I'm just saying that getting some working functionality out for simple
expressions sooner rather than waiting to solve all the edge hard cases
is probably better. Getting a functional index working on something like
UPPER(lastname) will solve most people's problems.

starts chant

'incremental development - release early release often'
'incremental development - release early release often'

Dan.



Re: jira question

Posted by Mike Matrigali <mi...@sbcglobal.net>.
I don't think I am willing to say generated columns are not useful without
indexes, sorry if previous email led to that, actually I have not
actually thought much about it.

I believe I was letting
my predjudice through, I have always thought that an interesting set
of problems could be solved if the optimizer work was done - so I
was concentrating on that.  And I see that optimizer work as a stepping
stone to even more possible interesting stuff in the future - things 
like gist indexes.

Having said that, I liked the generated column syntax since it seemed to
be a standard syntax, which seemed better than coming up with our own.




Rick Hillegas wrote:
> I don't anticipate any significant time savings by scaling back the kind 
> of expressions we support.  In any event, sarg-matching of 
> indexableExpressions should not involve text comparisons. I imagine it 
> would involve normalizing the parse trees and comparing the normalized 
> subgraphs.
> 
> If Mike is right and generated columns aren't useful unless the 
> optimizer can turn indexableExpressions into sargs, then the choice 
> between generated columns and  expression indexes seems largely 
> religious to me. I am leaning toward expression indexes because they are 
> slightly simpler for the user: she only has to declare an index, not 
> declare a special column and then declare an index.
> 
> By the way, this is a very lively, fruitful, and clarifying email 
> thread. Thanks especially to Dan, Mike, Satheesh, and Manish.
> 
> Cheers,
> -Rick
> 
> Satheesh Bandaram wrote:
> 
>>
>> Manish Khettry wrote:
>>
>>> I think using arbitrarily complex expressions to create indexes may be
>>> overkill. Dan's idea of restricting expression indexes on function
>>> calls on a single base column (actually one _or more_ base columns
>>> shouldn't be that much harder) is probably better.
>>>  
>>>
>> While it may be OK to start with a single base column, I think one or 
>> more base column references would be good to support. Useful for 
>> functional indexes on *area*, for example. Or total compensation, 
>> which may involve base salary, bonus, commissions etc.
>>
>>> Isn't storing the expression text for a generated column also
>>> problematic? I can think of a few other things
>>> -- The expression will also have to be compiled and evaluated for DDL
>>> like alter table (drop column)  possibly others.
>>>  
>>>
>> Doesn't Derby already support evaluating an expression at DDL time for 
>> a default clause? Alter table add column evaluates this default 
>> expression at the DDL time.
>>
>>> -- We'll need a good way to check if two expressions are the same.
>>> Obviously a textual comparison may not work. Making sure that two
>>> expr's are the same is not trivial if we want to do it right-- say
>>> (x+y) > 10 is the same as 10 < (y+x).
>>>  
>>>
>> Right... Common expression detection needs to be added. Would also be 
>> useful to eliminate multiple evaluations of any common (sub-)expressions.
>>
>> Satheesh
>>
>>> Manish
>>>
>>> On 8/3/05, Rick Hillegas <Ri...@sun.com> wrote:
>>>  
>>>
>>>> Thanks, Jeff. I agree. Recompiling these expressions can occur when we
>>>> compile the triggering SELECT/UPDATE/INSERT statements. Compiling a
>>>> SELECT/UPDATE/INSERT is a relatively heavyweight operation and I don't
>>>> think anyone will notice a little extra time spent recompiling these
>>>> expressions.
>>>>
>>>> Cheers,
>>>> -Rick
>>>>
>>>> Jeffrey Lichtman wrote:
>>>>
>>>>   
>>>>
>>>>>> The expression used for the generated column must be stored in the
>>>>>> data dictionary. Does derby serialize expressions? I was looking
>>>>>> around in the classes in impl/sq/compile (ValueNode and such) and
>>>>>> these classes don't seem to implement Formatable.
>>>>>>       
>>>>>
>>>>> It would be better to store the expressions as text, and to parse and
>>>>> bind them when needed. I learned the hard way while working on the
>>>>> internals of other database systems that storing internal data
>>>>> structures in the data dictionary causes trouble when the structures
>>>>> change. Also, it's useful for users to be able to look at the text of
>>>>> their DDL statements.
>>>>>
>>>>>
>>>>>                       -        Jeff Lichtman
>>>>>                                swazoo@rcn.com
>>>>>                                Check out Swazoo Koolak's Web 
>>>>> Jukebox at
>>>>>                                http://swazoo.com/
>>>>>     
>>>>
>>>>
>>>>   
>>>
>>>
>>>
>>>
>>>  
>>>
> 
> 
> 
> 


Re: jira question

Posted by Rick Hillegas <Ri...@Sun.COM>.
I don't anticipate any significant time savings by scaling back the kind 
of expressions we support.  In any event, sarg-matching of 
indexableExpressions should not involve text comparisons. I imagine it 
would involve normalizing the parse trees and comparing the normalized 
subgraphs.

If Mike is right and generated columns aren't useful unless the 
optimizer can turn indexableExpressions into sargs, then the choice 
between generated columns and  expression indexes seems largely 
religious to me. I am leaning toward expression indexes because they are 
slightly simpler for the user: she only has to declare an index, not 
declare a special column and then declare an index.

By the way, this is a very lively, fruitful, and clarifying email 
thread. Thanks especially to Dan, Mike, Satheesh, and Manish.

Cheers,
-Rick

Satheesh Bandaram wrote:

>
> Manish Khettry wrote:
>
>>I think using arbitrarily complex expressions to create indexes may be
>>overkill. Dan's idea of restricting expression indexes on function
>>calls on a single base column (actually one _or more_ base columns
>>shouldn't be that much harder) is probably better.
>>  
>>
> While it may be OK to start with a single base column, I think one or 
> more base column references would be good to support. Useful for 
> functional indexes on *area*, for example. Or total compensation, 
> which may involve base salary, bonus, commissions etc.
>
>>Isn't storing the expression text for a generated column also
>>problematic? I can think of a few other things
>>-- The expression will also have to be compiled and evaluated for DDL
>>like alter table (drop column)  possibly others.
>>  
>>
> Doesn't Derby already support evaluating an expression at DDL time for 
> a default clause? Alter table add column evaluates this default 
> expression at the DDL time.
>
>>-- We'll need a good way to check if two expressions are the same.
>>Obviously a textual comparison may not work. Making sure that two
>>expr's are the same is not trivial if we want to do it right-- say
>>(x+y) > 10 is the same as 10 < (y+x).
>>  
>>
> Right... Common expression detection needs to be added. Would also be 
> useful to eliminate multiple evaluations of any common (sub-)expressions.
>
> Satheesh
>
>>Manish
>>
>>On 8/3/05, Rick Hillegas <Ri...@sun.com> wrote:
>>  
>>
>>>Thanks, Jeff. I agree. Recompiling these expressions can occur when we
>>>compile the triggering SELECT/UPDATE/INSERT statements. Compiling a
>>>SELECT/UPDATE/INSERT is a relatively heavyweight operation and I don't
>>>think anyone will notice a little extra time spent recompiling these
>>>expressions.
>>>
>>>Cheers,
>>>-Rick
>>>
>>>Jeffrey Lichtman wrote:
>>>
>>>    
>>>
>>>>>The expression used for the generated column must be stored in the
>>>>>data dictionary. Does derby serialize expressions? I was looking
>>>>>around in the classes in impl/sq/compile (ValueNode and such) and
>>>>>these classes don't seem to implement Formatable.
>>>>>        
>>>>>
>>>>It would be better to store the expressions as text, and to parse and
>>>>bind them when needed. I learned the hard way while working on the
>>>>internals of other database systems that storing internal data
>>>>structures in the data dictionary causes trouble when the structures
>>>>change. Also, it's useful for users to be able to look at the text of
>>>>their DDL statements.
>>>>
>>>>
>>>>                       -        Jeff Lichtman
>>>>                                swazoo@rcn.com
>>>>                                Check out Swazoo Koolak's Web Jukebox at
>>>>                                http://swazoo.com/
>>>>      
>>>>
>>>
>>>    
>>>
>>
>>
>>
>>  
>>



Re: jira question

Posted by Manish Khettry <ma...@gmail.com>.
I think using arbitrarily complex expressions to create indexes may be
overkill. Dan's idea of restricting expression indexes on function
calls on a single base column (actually one _or more_ base columns
shouldn't be that much harder) is probably better.

Isn't storing the expression text for a generated column also
problematic? I can think of a few other things
-- The expression will also have to be compiled and evaluated for DDL
like alter table (drop column)  possibly others.
-- We'll need a good way to check if two expressions are the same.
Obviously a textual comparison may not work. Making sure that two
expr's are the same is not trivial if we want to do it right-- say
(x+y) > 10 is the same as 10 < (y+x).

Manish

On 8/3/05, Rick Hillegas <Ri...@sun.com> wrote:
> Thanks, Jeff. I agree. Recompiling these expressions can occur when we
> compile the triggering SELECT/UPDATE/INSERT statements. Compiling a
> SELECT/UPDATE/INSERT is a relatively heavyweight operation and I don't
> think anyone will notice a little extra time spent recompiling these
> expressions.
> 
> Cheers,
> -Rick
> 
> Jeffrey Lichtman wrote:
> 
> >
> >> The expression used for the generated column must be stored in the
> >> data dictionary. Does derby serialize expressions? I was looking
> >> around in the classes in impl/sq/compile (ValueNode and such) and
> >> these classes don't seem to implement Formatable.
> >
> >
> > It would be better to store the expressions as text, and to parse and
> > bind them when needed. I learned the hard way while working on the
> > internals of other database systems that storing internal data
> > structures in the data dictionary causes trouble when the structures
> > change. Also, it's useful for users to be able to look at the text of
> > their DDL statements.
> >
> >
> >                        -        Jeff Lichtman
> >                                 swazoo@rcn.com
> >                                 Check out Swazoo Koolak's Web Jukebox at
> >                                 http://swazoo.com/
> 
> 
> 
>

Re: jira question

Posted by Rick Hillegas <Ri...@Sun.COM>.
Thanks, Jeff. I agree. Recompiling these expressions can occur when we 
compile the triggering SELECT/UPDATE/INSERT statements. Compiling a 
SELECT/UPDATE/INSERT is a relatively heavyweight operation and I don't 
think anyone will notice a little extra time spent recompiling these 
expressions.

Cheers,
-Rick

Jeffrey Lichtman wrote:

>
>> The expression used for the generated column must be stored in the
>> data dictionary. Does derby serialize expressions? I was looking
>> around in the classes in impl/sq/compile (ValueNode and such) and
>> these classes don't seem to implement Formatable.
>
>
> It would be better to store the expressions as text, and to parse and 
> bind them when needed. I learned the hard way while working on the 
> internals of other database systems that storing internal data 
> structures in the data dictionary causes trouble when the structures 
> change. Also, it's useful for users to be able to look at the text of 
> their DDL statements.
>
>
>                        -        Jeff Lichtman
>                                 swazoo@rcn.com
>                                 Check out Swazoo Koolak's Web Jukebox at
>                                 http://swazoo.com/




Re: jira question

Posted by Jeffrey Lichtman <sw...@rcn.com>.
>The expression used for the generated column must be stored in the
>data dictionary. Does derby serialize expressions? I was looking
>around in the classes in impl/sq/compile (ValueNode and such) and
>these classes don't seem to implement Formatable.

It would be better to store the expressions as text, and to parse and 
bind them when needed. I learned the hard way while working on the 
internals of other database systems that storing internal data 
structures in the data dictionary causes trouble when the structures 
change. Also, it's useful for users to be able to look at the text of 
their DDL statements.


                        -        Jeff Lichtman
                                 swazoo@rcn.com
                                 Check out Swazoo Koolak's Web Jukebox at
                                 http://swazoo.com/ 


Functional indexes WAS Re: jira question

Posted by Daniel John Debrunner <dj...@debrunners.com>.
Manish Khettry wrote:

> On 8/1/05, Mike Matrigali <mi...@sbcglobal.net> wrote:
> 
>>I still think
>>it would be nice if the optimizer could tell that an expression in
>>a where clause matched the result of the expression which generated
>>the column, otherwise indexes on the generated column aren't as interesting.
>>
> 
> 
> The expression used for the generated column must be stored in the
> data dictionary. Does derby serialize expressions? I was looking
> around in the classes in impl/sq/compile (ValueNode and such) and
> these classes don't seem to implement Formatable.

Compile nodes (and classes) are runtime only. They are never serialized
to disk and are not designed for that. Any runtime representation of a
compiled entity uses a specific class, such as RoutineAliasInfo.

An initial approach to this functional indexing may to to make the
scheme work for simple expressions, function calls on single columns,
eg. UPPER(lastname). Just getting that to work would solve most of the
typical uses and would be a good step towards a complete solution.
Handling arbitary expressions up front may be too much to bite off.

Dan.


Re: jira question

Posted by Manish Khettry <ma...@gmail.com>.
On 8/1/05, Mike Matrigali <mi...@sbcglobal.net> wrote:
> I still think
> it would be nice if the optimizer could tell that an expression in
> a where clause matched the result of the expression which generated
> the column, otherwise indexes on the generated column aren't as interesting.
>

The expression used for the generated column must be stored in the
data dictionary. Does derby serialize expressions? I was looking
around in the classes in impl/sq/compile (ValueNode and such) and
these classes don't seem to implement Formatable.

Manish