You are viewing a plain text version of this content. The canonical link for it is here.

Posted to derby-user@db.apache.org by Rick Hillegas <Ri...@Sun.COM> on 2009/01/07 18:27:22 UTC

Growing footprint, was: User Defined Functions in a Group By Clause

Here's the growth of the embedded engine, derby.jar, over the last few 
feature releases:

10.4.1.3 2440097 +4%
10.3.1.4 2338649 +4%
10.2.1.6 2253277 +5%
10.1.3.1 2141382

This seems like modest growth to me and well behind the corresponding 
growth in the size of storage devices over the same period. I would be 
interested in learning more about the resource-constrained environments 
where Derby is running today and where this kind of growth is not 
acceptable.

I do think that the idea of plugable features (like partitioned tables) 
is attractive. It would be tricky, though, since a vertical feature silo 
would have to cut across many of Derby's existing horizontal layers.

Regards,
-Rick

derby@segel.com wrote:
> Hey!
> Not to be a 'Debbie Downer' but I have some reservations.
>
> Knut points out that there is a viable work around. (Wrap the query)
>
> Currently Derby/JavaDB doesn't have any way of creating a scalable foot
> print.
>
> As more and more features/enhancements are added to Derby/JavaDB, you're
> going to see an increase in the size of the engine's footprint. While this
> may not be an issue for developers who want a full featured java based
> relational engine, it does have an impact on the embedded developer who
> won't use or see value to some of these enhancements.
>
> My suggestion is that before looking to add more features, those who are in
> charge of architecting derby/javadb consider a modification that would allow
> for the ability to have a scalable foot print. 
>
> An example... Suppose one wanted to allow Derby to use raw disk partitions
> and also allow the use of data partitioned tables. These would have extreme
> value for the developer who wants a java relational engine, while it would
> have no value, but a cost, to the developer who wants to embed derby/javadb.
>
> Does this make sense?
>
> -Mike
>
> PS. Yeah I know it's easy to sit in the cheap seats and toss out
> suggestions. But because it has commercial value/impact, I would defer this
> request to those 'boffins' at Sun who are paid to support JavaDB to think
> this through. I would have included IBM but they've seemed to have
> disappeared when IBM dropped Cloudscape support. ;-)
>
>
>   
>> -----Original Message-----
>> From: Richard.Hillegas@Sun.COM [mailto:Richard.Hillegas@Sun.COM]
>> Sent: Wednesday, January 07, 2009 9:22 AM
>> To: Derby Discussion
>> Subject: Re: User Defined Functions in a Group By Clause
>>
>> Hi Kim and Knut,
>>
>> Some comments inline...
>>
>> Knut Anders Hatlen wrote:
>>     
>>> Kim Moore <km...@google.com> writes:
>>>
>>>
>>>       
>>>> I am working on a query that uses a user defined function in a group
>>>> by clause.
>>>>
>>>> select myfunction (datecolumn)
>>>>          ,count(*)
>>>> from    table
>>>> group by myfunction (datecolumn)
>>>>
>>>> Executing the query gives the error "The SELECT list of a group query
>>>> contains at least one invalid expression."
>>>>
>>>> When I replace myfunction (user defined function) with a DATE function
>>>> which comes standard with Derby, the query works.
>>>>
>>>> All help is greatly appreciated.
>>>>
>>>>         
>>> I think this is because user-defined functions are not known to be
>>> deterministic. See this discussion for more details:
>>> http://www.nabble.com/Functions-in-GROUP-BY-expressions--%28related-to-
>>>       
>> DERBY-883%29-tf2517296.html
>>     
>>> Derby 10.5 will support the DETERMINISTIC keyword in CREATE FUNCTION
>>> statements (https://issues.apache.org/jira/browse/DERBY-3570), but I
>>> haven't tested if that will actually allow you to use user-defined
>>> functions in GROUP BY.
>>>
>>>       
>> Nothing was done to relax the limitation on user-defined functions in
>> GROUP BY expressions. Even if you declare the function to be
>> DETERMINISTIC, the query will fail.
>>     
>>> Putting the function call in a subquery and renaming the column holding
>>> the value returned by the function should work even without the
>>> DETERMINISTIC keyword, though:
>>>
>>> select x, count(*) from
>>>   (select myfunction(datecolumn) from mytable) t(x)
>>> group by x
>>>
>>>
>>>       
>> I agree that it would be reasonable to allow user-defined functions in
>> GROUP BY expressions. I have logged DERBY-4003 to track this issue. That
>> JIRA would be a good place to continue the discussion about whether we
>> should limit this extension to DETERMINISTIC functions. The limitation
>> does not seem to me to be rooted in the SQL standard, but there may be
>> some good implementation-related reasons for maintaining it.
>>
>> Regards,
>> -Rick
>>     
>
>
>

RE: Growing footprint, was: User Defined Functions in a Group By Clause

Posted by de...@segel.com.

Rick,

Ok, lets see if I got the math straight...
X, 1.04%X, (1.04*1.04)%X, (1.04*1.04*1.05)%X
So the last version is roughly 13.568% larger than X.

I don't know how much growth is acceptable, and you're right. Moore's law
and systems' capacities have increased at a larger rate. But that doesn't
mean that an increase in Derby's size wouldn't prohibit its use in an
embedded environment. Atom based PCs, or PDAs/Cells could have embedded
Derby apps running as part of a set of applications, so memory and storage
could still be at a premium.

I'm looking at this from a different perspective. What we've seen is an
improvement in existing features. Suppose you want to add net new features
like raw disk support, in-memory support, alternative indexing or
alternative embedded encryption support, clustering/shared nothing parallel
queries... All of this could be 'pluggable' enhancements. These features
have value to those who see Derby/JavaDB as a general purpose database. They
have no or little value to the embedded apps. (Thus an increase in Derby's
footprint for these features would probably be unacceptable for some.)

I do realize that creating a pluggable framework as a part of Derby would
mean some major refactoring.  

I agree that its not a small task, and this is why I suggested that Sun
employees take the time to consider this enhancement. Sun has the most to
gain from this enhancement and it would require a couple of man years to
implement. (A couple of FTEs working 3-6 months just on architecting the
design.) There is probably a better ROI from implementing this concept into
JavaDB than trying to add this into MySQL. Not to mention that if this were
to enter in to Derby, JavaDB has to maintain parity and Sun would be on the
hook for supporting it.


-Mike

PS. I think we can talk offline since this discussion goes beyond the
purpose of this email list.

 


> -----Original Message-----
> From: Richard.Hillegas@Sun.COM [mailto:Richard.Hillegas@Sun.COM]
> Sent: Wednesday, January 07, 2009 11:27 AM
> To: Derby Discussion
> Subject: Growing footprint, was: User Defined Functions in a Group By
> Clause
> 
> Here's the growth of the embedded engine, derby.jar, over the last few
> feature releases:
> 
> 10.4.1.3 2440097 +4%
> 10.3.1.4 2338649 +4%
> 10.2.1.6 2253277 +5%
> 10.1.3.1 2141382
> 
> This seems like modest growth to me and well behind the corresponding
> growth in the size of storage devices over the same period. I would be
> interested in learning more about the resource-constrained environments
> where Derby is running today and where this kind of growth is not
> acceptable.
> 
> I do think that the idea of plugable features (like partitioned tables)
> is attractive. It would be tricky, though, since a vertical feature silo
> would have to cut across many of Derby's existing horizontal layers.
> 
> Regards,
> -Rick
> 
> derby@segel.com wrote:
> > Hey!
> > Not to be a 'Debbie Downer' but I have some reservations.
> >
> > Knut points out that there is a viable work around. (Wrap the query)
> >
> > Currently Derby/JavaDB doesn't have any way of creating a scalable foot
> > print.
> >
> > As more and more features/enhancements are added to Derby/JavaDB, you're
> > going to see an increase in the size of the engine's footprint. While
> this
> > may not be an issue for developers who want a full featured java based
> > relational engine, it does have an impact on the embedded developer who
> > won't use or see value to some of these enhancements.
> >
> > My suggestion is that before looking to add more features, those who are
> in
> > charge of architecting derby/javadb consider a modification that would
> allow
> > for the ability to have a scalable foot print.
> >
> > An example... Suppose one wanted to allow Derby to use raw disk
> partitions
> > and also allow the use of data partitioned tables. These would have
> extreme
> > value for the developer who wants a java relational engine, while it
> would
> > have no value, but a cost, to the developer who wants to embed
> derby/javadb.
> >
> > Does this make sense?
> >
> > -Mike
> >
> > PS. Yeah I know it's easy to sit in the cheap seats and toss out
> > suggestions. But because it has commercial value/impact, I would defer
> this
> > request to those 'boffins' at Sun who are paid to support JavaDB to
> think
> > this through. I would have included IBM but they've seemed to have
> > disappeared when IBM dropped Cloudscape support. ;-)
> >
> >
> >
> >> -----Original Message-----
> >> From: Richard.Hillegas@Sun.COM [mailto:Richard.Hillegas@Sun.COM]
> >> Sent: Wednesday, January 07, 2009 9:22 AM
> >> To: Derby Discussion
> >> Subject: Re: User Defined Functions in a Group By Clause
> >>
> >> Hi Kim and Knut,
> >>
> >> Some comments inline...
> >>
> >> Knut Anders Hatlen wrote:
> >>
> >>> Kim Moore <km...@google.com> writes:
> >>>
> >>>
> >>>
> >>>> I am working on a query that uses a user defined function in a group
> >>>> by clause.
> >>>>
> >>>> select myfunction (datecolumn)
> >>>>          ,count(*)
> >>>> from    table
> >>>> group by myfunction (datecolumn)
> >>>>
> >>>> Executing the query gives the error "The SELECT list of a group query
> >>>> contains at least one invalid expression."
> >>>>
> >>>> When I replace myfunction (user defined function) with a DATE
> function
> >>>> which comes standard with Derby, the query works.
> >>>>
> >>>> All help is greatly appreciated.
> >>>>
> >>>>
> >>> I think this is because user-defined functions are not known to be
> >>> deterministic. See this discussion for more details:
> >>> http://www.nabble.com/Functions-in-GROUP-BY-expressions--%28related-
> to-
> >>>
> >> DERBY-883%29-tf2517296.html
> >>
> >>> Derby 10.5 will support the DETERMINISTIC keyword in CREATE FUNCTION
> >>> statements (https://issues.apache.org/jira/browse/DERBY-3570), but I
> >>> haven't tested if that will actually allow you to use user-defined
> >>> functions in GROUP BY.
> >>>
> >>>
> >> Nothing was done to relax the limitation on user-defined functions in
> >> GROUP BY expressions. Even if you declare the function to be
> >> DETERMINISTIC, the query will fail.
> >>
> >>> Putting the function call in a subquery and renaming the column
> holding
> >>> the value returned by the function should work even without the
> >>> DETERMINISTIC keyword, though:
> >>>
> >>> select x, count(*) from
> >>>   (select myfunction(datecolumn) from mytable) t(x)
> >>> group by x
> >>>
> >>>
> >>>
> >> I agree that it would be reasonable to allow user-defined functions in
> >> GROUP BY expressions. I have logged DERBY-4003 to track this issue.
> That
> >> JIRA would be a good place to continue the discussion about whether we
> >> should limit this extension to DETERMINISTIC functions. The limitation
> >> does not seem to me to be rooted in the SQL standard, but there may be
> >> some good implementation-related reasons for maintaining it.
> >>
> >> Regards,
> >> -Rick
> >>
> >
> >
> >