You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@cayenne.apache.org by Nikita Timofeev <nt...@objectstyle.com> on 2017/01/06 13:08:34 UTC

API for aggregate and non aggregate SQL functions

Hi all,

I would like to present new Cayenne feature in development to you: API
for SQL functions (both aggregate and non aggregate).

When it will be completed it will be possible to use functions in
select queries with pure API calls without writing any custom SQL or
EJBQL queries.
You'll be able to do something like this (though it may be not a final
version of API):

  long totalCount = ObjectSelect.query(Artist.class)
                                .column(Artist.ARTIST_COUNT)
                                .selectOne(context);

Or make even more complex queries with HAVING clause:

  List<Object[]> result = ObjectSelect.query(Artist.class)
                                        .columns(artistCount,
minSalary, namePart)

.where(Artist.DATE_OF_BIRTH.lt(new Date()))
                                        .or(...) // additional
condition in WHERE clause
                                        .having(namePart.like("P%"))
                                        .or(...) // additional
condition in HAVING clause
                                        .select(context);

  for(Object[] r : result) {
      long count = (long)r[0];
      int min = (int)r[1];
      String name = (String)r[2];
  }

All this queries will rely on custom properties that can be created
ad-hoc for specific query:

  Expression substrExp =
FunctionExpressionFactory.substringExp(Artist.ARTIST_NAME.path(), 10,
15);
  Property<String> namePart = Property.create("namePart", substrExp,
String.class);

or be defined directly in a model class:

  public static final Property<Long> ARTIST_COUNT =
Property.create("artistCount", FunctionExpressionFactory.countExp(),
Long.class);

You can find additional information and track progress in Apache JIRA:
https://issues.apache.org/jira/browse/CAY-2187

And of course any feedback will be really appreciated!

-- 
Best regards,
Nikita Timofeev

Re: API for aggregate and non aggregate SQL functions

Posted by Andrus Adamchik <an...@objectstyle.org>.
> On Jan 9, 2017, at 4:45 AM, Nikita Timofeev <nt...@objectstyle.com> wrote:
> 
>> 
>> Finally, 'having()' matches the SQL (HAVING is needed for queries on the result of functions). Have you thought about how that could be abstracted away so that the user can use where() and the correct SQL is still generated by Cayenne?
>> 
> 
> I can't prove but I think there are some cases when the final result
> depends on whether you use where() or having() on column.

Yes, "having" and "where" are 2 distinct filters. The former filters data before *aggregation*, the later - after. Aggregation functions operate on the result filtered with "where". No matter what API we ultimately provide, this distinction is unavoidable, and I'd rather we keep it explicit.

Andrus

Re: Column select API - fetch return types

Posted by Nikita Timofeev <nt...@objectstyle.com>.
I think your confusion is from the bug that I'm trying to solve right
now: single argument in columns() method leads to erroneous result:
return type is declared like Object[] but actual result is single
Object.
This shouldn't happen as columns() method should always return
Object[] regardless argument count. Only way to have single Object in
result is explicitly call column() or corresponding factory method
ColumnSelect.query(EntityClass.class, singleProperty).

And I'm implementing fetchDataRow() support for the ColumnSelect too.

So I hope it will reduce inconsistency in new API.

On Mon, Jan 16, 2017 at 12:41 PM, Hugi Thordarson <hu...@karlmenn.is> wrote:
> Hi all,
>
> I’ve been using the new column select API for the past week days and I’m loving it so far.
>
> One thing that I find a little bothersome (and at times confusing) when using these APIs though is how return types change based on what you’re fetching, especially how the type will change between List<Object> to List<Object[]> based on if I’m fetching multiple columns or a single column (I see this was discussed a little on the list the other day). As an API consumer I think it would be much nicer if fetches consistently returned List<Object[]>.
>
> If this isn’t possible, might a possible workaround be to make DataRow fetches always return DataRows? That would allow me to stick to a single API and have consistent results from fetches, regardless of whether I’m limiting myself to a set of columns. Example:
>
>                 List<DataRow> list = ObjectSelect.dataRowQuery( Entry.class )
>                                 .where( Entry.RECEIPT.dot( Receipt.USER ).dot( User.NAME ).eq( "Hugi Þórðarson" ) )
>                                 .columns( Entry.TEXT, Entry.PRICE )
>                                 .select( someObjectContext() );
>
>                 for( DataRow entry : list ) {
>                         System.out.println( entry.get( "text" ) );
>                 }
>
> Cheers,
> - hugi


-- 
Best regards,
Nikita Timofeev

Re: Column select API - fetch return types

Posted by Aristedes Maniatis <ar...@maniatis.org>.
On 16/1/17 8:41pm, Hugi Thordarson wrote:
> One thing that I find a little bothersome (and at times confusing) when using these APIs though is how return types change based on what you\u2019re fetching, especially how the type will change between List<Object> to List<Object[]> based on if I\u2019m fetching multiple columns or a single column (I see this was discussed a little on the list the other day). As an API consumer I think it would be much nicer if fetches consistently returned List<Object[]>.


The problem is that the result type depends on too many different things:

* the execute function: select(), selectOne()
* the initial constructor: dataRowQuery(), query()
* something in the middle: columns(), column()
* maybe some shortcut execute function (if we want to add such a thing) like: max(), sum()

Would a different initial constructor make things clearer (as Nikita's idea)?

   columnQuery(<T>Artist.class) -> ColumnSelect<T>

or is it simply enough to always return List<Object[]> and never List<Object>. That way, we never worry about mixing columns() and column().


When we want shortcuts for count() and sum(), then we could add another constructor like this:

   sum(Artist.class, <T>property) -> AggregateSelect<T>
   max(Artist.class, <T>property) -> AggregateSelect<T>
   min(Artist.class, <T>property) -> AggregateSelect<T>
   count(Artist.class, <T>property) -> ScalarSelect<Integer>

Is that too much? I don't think we need more shortcuts than those. And these classes would not implement orderBy(), pageSize() or prefetch()

Ari






-- 
-------------------------->
Aristedes Maniatis
GPG fingerprint CBFB 84B4 738D 4E87 5E5C  5EFA EF6A 7D2E 3E49 102A


Column select API - fetch return types

Posted by Hugi Thordarson <hu...@karlmenn.is>.
Hi all,

I’ve been using the new column select API for the past week days and I’m loving it so far.

One thing that I find a little bothersome (and at times confusing) when using these APIs though is how return types change based on what you’re fetching, especially how the type will change between List<Object> to List<Object[]> based on if I’m fetching multiple columns or a single column (I see this was discussed a little on the list the other day). As an API consumer I think it would be much nicer if fetches consistently returned List<Object[]>.

If this isn’t possible, might a possible workaround be to make DataRow fetches always return DataRows? That would allow me to stick to a single API and have consistent results from fetches, regardless of whether I’m limiting myself to a set of columns. Example:

		List<DataRow> list = ObjectSelect.dataRowQuery( Entry.class )
				.where( Entry.RECEIPT.dot( Receipt.USER ).dot( User.NAME ).eq( "Hugi Þórðarson" ) )
				.columns( Entry.TEXT, Entry.PRICE )
				.select( someObjectContext() );

		for( DataRow entry : list ) {
			System.out.println( entry.get( "text" ) );
		}

Cheers,
- hugi

Re: API for aggregate and non aggregate SQL functions

Posted by Nikita Timofeev <nt...@objectstyle.com>.
On Tue, Jan 10, 2017 at 5:45 AM, Aristedes Maniatis <ar...@maniatis.org> wrote:
> On 9/1/17 8:45pm, Nikita Timofeev wrote:
>> But your feedback is pushing me towards the solution that I didn't
>> want to use initially:
>> to introduce some abstract FluentSelect with descendants like
>> ObjectSelect and ColumnSelect.
>> Where ColumnSelect can be used directly or can be created within
>> column() or columns() method call in ObjectSelect (as you suggested in
>> the first place).
>>
>> Additional benefit of this new query that it can provide a nice
>> shortcut for the COUNT(*) query:
>>
>> long count = ColumnSelect.query(Artist.class, Property.COUNT)
>>         .where(Artist.ARTIST_NAME.like("artist%"))
>>         .selectOne(context);
>
>> And here is example for several properties:
>>
>> Object[] nameAndDate = ColumnSelect.query(Artist.class,
>> Artist.ARTIST_NAME, Artist.DATE_OF_BIRTH)
>>         .orderBy(Artist.ARTIST_NAME.asc())
>>         .selectFirst(context);
>
>
> I'm not convinced adding the columns to the constructor is any clearer than explicit columns() calls.
>
>
>> or
>>
>> Object[] nameAndDate = ColumnSelect.query(Artist.class)
>>         .columns(Artist.ARTIST_NAME)
>>         .columns(Artist.DATE_OF_BIRTH)
>>         .orderBy(Artist.ARTIST_NAME.asc())
>>         .selectFirst(context);
>>
>> Do you think that this is a more clear way of doing columns queries?
>> Or it is excessive for the purpose of avoiding limitation of multiple
>> column() method calls?
>
>
> What are the issues we are trying to solve here?
>
> 1. Mixing columns() and column() is broken.
> 2. Simpler syntax for count() and max() frequently used methods
>
>
> I can't easily see how to solve (1) except by always returning List<Object[]> even if Object[] only contains a single element (if there is just one column). Then we only have column() which can take a collection of properties or vargs list of properties.
>

I'm sorry I wasn't clear enough with this idea. It's really close to
have only columns() method with varargs or collection that will always
return Object[], but with option to have single column in constructor
to still allow shortcuts.
So as long as there is no public column() method in ColumnSelect you
can't be confused with
ColumnSelect.query(Artist.class).columns(..).column(..) but still can
go with ColumnSelect.query(Artist.class, Artist.ARTIST_NAME).
And this can be nicely used for count() and other aggregates too.

-- 
Best regards,
Nikita Timofeev

Re: API for aggregate and non aggregate SQL functions

Posted by Aristedes Maniatis <ar...@maniatis.org>.
On 9/1/17 8:45pm, Nikita Timofeev wrote:
> But your feedback is pushing me towards the solution that I didn't
> want to use initially:
> to introduce some abstract FluentSelect with descendants like
> ObjectSelect and ColumnSelect.
> Where ColumnSelect can be used directly or can be created within
> column() or columns() method call in ObjectSelect (as you suggested in
> the first place).
> 
> Additional benefit of this new query that it can provide a nice
> shortcut for the COUNT(*) query:
> 
> long count = ColumnSelect.query(Artist.class, Property.COUNT)
>         .where(Artist.ARTIST_NAME.like("artist%"))
>         .selectOne(context);

> And here is example for several properties:
> 
> Object[] nameAndDate = ColumnSelect.query(Artist.class,
> Artist.ARTIST_NAME, Artist.DATE_OF_BIRTH)
>         .orderBy(Artist.ARTIST_NAME.asc())
>         .selectFirst(context);


I'm not convinced adding the columns to the constructor is any clearer than explicit columns() calls.


> or
> 
> Object[] nameAndDate = ColumnSelect.query(Artist.class)
>         .columns(Artist.ARTIST_NAME)
>         .columns(Artist.DATE_OF_BIRTH)
>         .orderBy(Artist.ARTIST_NAME.asc())
>         .selectFirst(context);
> 
> Do you think that this is a more clear way of doing columns queries?
> Or it is excessive for the purpose of avoiding limitation of multiple
> column() method calls?


What are the issues we are trying to solve here?

1. Mixing columns() and column() is broken.
2. Simpler syntax for count() and max() frequently used methods


I can't easily see how to solve (1) except by always returning List<Object[]> even if Object[] only contains a single element (if there is just one column). Then we only have column() which can take a collection of properties or vargs list of properties.

For (2), your idea of Property.COUNT as a special constructor of this type of property then starts to be what you didn't want: Property class which knows about SQL expressions. But do we care about this purity over a clean and simple API for users?


Ari


-- 
-------------------------->
Aristedes Maniatis
GPG fingerprint CBFB 84B4 738D 4E87 5E5C  5EFA EF6A 7D2E 3E49 102A

Re: API for aggregate and non aggregate SQL functions

Posted by Aristedes Maniatis <ar...@maniatis.org>.
On 9/1/17 8:45pm, Nikita Timofeev wrote:
> ObjectSelect.query(Artist.class)
>      .columns(Artist.ARTIST_NAME, artistCount)
>      .having(Artist.ARTIST_NAME.like("a%"))
>      .or(artistCount.gt(2L)) // this goes to having
>      .select(context)

You are right. Since you can combine where() clauses with and/or and then do the same with having(), it will be too hard for Cayenne to just look at some expression and pull them apart into the right WHERE and HAVING sql components.

I forsee this will be a very common way to screw up your query. It seems that some databases treat HAVING just like WHERE if there are no aggregates [1] but I think that isn't the same for all database engines. And HAVING might be slower if you could have used WHERE instead.


Ari


[1] https://msdn.microsoft.com/en-us/library/ms180199.aspx

-- 
-------------------------->
Aristedes Maniatis
GPG fingerprint CBFB 84B4 738D 4E87 5E5C  5EFA EF6A 7D2E 3E49 102A

Re: Property expressions, Was: API for aggregate and non aggregate SQL functions

Posted by Nikita Timofeev <nt...@objectstyle.com>.
On Tue, Jan 10, 2017 at 5:27 AM, Aristedes Maniatis <ar...@maniatis.org> wrote:
> On 9/1/17 8:45pm, Nikita Timofeev wrote:
>>> Could that syntax above could be cleaner like:
>>>
>>>    Property<Integer> nameLength = Artist.ARTIST_NAME.function(Expression.SQL_LENGTH)
>>>
>>> or is that just going to result in a whole lot of clutter in the class?
>>>
>> Such methods will create too much coupling between properties and
>> functions, I'm afraid,
>> as Property class would know about all SQL functions and their arguments.
>
>
> I'm not convinced that all the SQL specific code would land in the Property class. That would be a wrapper to the factory methods elsewhere.
>
> I'm not sure the enum approach above is the cleanest, but some shorter simpler API which still gives us the correct Property<Integer> generics would be nice.

It's not a specific SQL code I don't like to see there, but a bunch of
methods that doesn't seem to be very popular:

public Property<String> substring(int length) {
    return Property.create(FunctionExpressionFactory.substringExp(getExpression(),
length, 0), String.class);
}

public Property<String> substring(int length, int offset) {
    return Property.create(FunctionExpressionFactory.substringExp(getExpression(),
length, offset), String.class);
}

public Property<E> sum() {
    return Property.create(FunctionExpressionFactory.sumExp(getExpression()),
getType());
}

public Property<Integer> length() {
    return Property.create(FunctionExpressionFactory.lengthExp(getExpression()),
Integer.class);
}

The single function() method is better though. And I can imagine this
way of creating properties:

Property<Long> paintingCount =
Artist.PAINTING_ARRAY.function(PropertyCreator.COUNT);

where PropertyCreator.COUNT is something like this:

public interface PropertyCreator<T> {
    PropertyCreator<Long> COUNT = new PropertyCreator<Long>() {
        public Property<Long> create(Property<?> property) {
            return
Property.create(FunctionExpressionFactory.countExp(property.getExpression()),
Long.class);
        }
    };

    Property<T> create(Property<?> property);
}

But there is a big problem with properties' types with
Property.function(...) approach (either with enum or with this
Creator).
While some functions like SUM() (along with MIN, MAX,AVG) depend on
original property type, others like COUNT or LENGTH have constant
return type.
So for now I don't see any solution good enough for all cases: we
would end up either with bunch of methods duplicating expression
factory or without proper type checking.

-- 
Best regards,
Nikita Timofeev

Property expressions, Was: API for aggregate and non aggregate SQL functions

Posted by Aristedes Maniatis <ar...@maniatis.org>.
On 9/1/17 8:45pm, Nikita Timofeev wrote:
>> Could that syntax above could be cleaner like:
>>
>>    Property<Integer> nameLength = Artist.ARTIST_NAME.function(Expression.SQL_LENGTH)
>>
>> or is that just going to result in a whole lot of clutter in the class?
>>
> Such methods will create too much coupling between properties and
> functions, I'm afraid,
> as Property class would know about all SQL functions and their arguments.


I'm not convinced that all the SQL specific code would land in the Property class. That would be a wrapper to the factory methods elsewhere.

I'm not sure the enum approach above is the cleanest, but some shorter simpler API which still gives us the correct Property<Integer> generics would be nice.


Ari




-- 
-------------------------->
Aristedes Maniatis
GPG fingerprint CBFB 84B4 738D 4E87 5E5C  5EFA EF6A 7D2E 3E49 102A

Re: API for aggregate and non aggregate SQL functions

Posted by Nikita Timofeev <nt...@objectstyle.com>.
On Sun, Jan 8, 2017 at 3:26 PM, Aristedes Maniatis <ar...@maniatis.org> wrote:
> On 8/1/17 1:03am, Nikita Timofeev wrote:
>
>> The later is closer to the design.
>> Both column() and columns() methods return same ObjectSelect object,
>> they only change result type (as fetchDataRows() method do):
>> column() changes result to ObjectSelect<T>, where T is type of
>> Property, so no cast of the result required
>> and columns() changes to ObjectSelect<Object[]>. The idea is that you
>> can still use all methods in ObjectSelect.
>
> Yes, fair enough and that makes sense. So could we do this:
>
> List<Object[]> namesAndBirthDates = ObjectSelect.query(Artist.class)
>               .column(Artist.ARTIST_NAME)
>               .column(Artist.ARTIST_BIRTH_DATE)
>               .select(context);
>
> how can we handle the result type?
>
> Will this work with selectOne() returning Object or Object[] in difference cases?
>
>>> I know factories are the Java way, but given how common count() and max() are, surely we'd want a shortcut. I'd like to be able to go directly to:
>>>
>>> int count = ObjectSelect.query(Artist.class)
>>>                 .where(Artist.ARTIST_NAME.eq("Ari"))
>>>                 .selectCount(context);
>>>
>> I want this shortcut too :) I'd like to avoid a feature bloat, so that
>> could be done later.
>
>
> I think its worth planning the API for this now, given that this is probably the most common function used.

With my current implementation column() method overrides result with
each successive call (in your example the result will be List<Date>
without name column)
in order to provide exact return type, whereas columns() method always
returns Object[] result.

But your feedback is pushing me towards the solution that I didn't
want to use initially:
to introduce some abstract FluentSelect with descendants like
ObjectSelect and ColumnSelect.
Where ColumnSelect can be used directly or can be created within
column() or columns() method call in ObjectSelect (as you suggested in
the first place).

Additional benefit of this new query that it can provide a nice
shortcut for the COUNT(*) query:

long count = ColumnSelect.query(Artist.class, Property.COUNT)
        .where(Artist.ARTIST_NAME.like("artist%"))
        .selectOne(context);

And here is example for several properties:

Object[] nameAndDate = ColumnSelect.query(Artist.class,
Artist.ARTIST_NAME, Artist.DATE_OF_BIRTH)
        .orderBy(Artist.ARTIST_NAME.asc())
        .selectFirst(context);

or

Object[] nameAndDate = ColumnSelect.query(Artist.class)
        .columns(Artist.ARTIST_NAME)
        .columns(Artist.DATE_OF_BIRTH)
        .orderBy(Artist.ARTIST_NAME.asc())
        .selectFirst(context);

Do you think that this is a more clear way of doing columns queries?
Or it is excessive for the purpose of avoiding limitation of multiple
column() method calls?

>
>> Property<Integer> nameLength =
>> Property.create(FunctionExpressionFactory.lengthExp(Artist.ARTIST_NAME.path()),
>> Integer.class);
>> List<Artist> artists = ObjectSelect.query(Artist.class)
>>                                .where(nameLength.gt(10))
>>                                .select(context);
>
> Could that syntax above could be cleaner like:
>
>    Property<Integer> nameLength = Artist.ARTIST_NAME.function(Expression.SQL_LENGTH)
>
> or is that just going to result in a whole lot of clutter in the class?
>

Such methods will create too much coupling between properties and
functions, I'm afraid,
as Property class would know about all SQL functions and their arguments.

>
> List<Object[]> result2 = ObjectSelect.query(Artist.class)
>                  .columns(Artist.ARTIST_NAME, Property.HOLLOW_OBJECT, paintingCountProperty)
>                  .having(paintingCountProperty.gt(10L))
>                  .select(context);
>
> This sort of query would be great for displaying a list of records (where you might want only a couple of columns to be loaded and displayed rather than whole objects). But then double clicking opens an edit view and you want to easily fault all its attributes without worrying about tracking its PK directly.
>

Yes, it would be really nice to have this feature. But as I mentioned
the exact implementation is out of scope for now.

>
> Finally, 'having()' matches the SQL (HAVING is needed for queries on the result of functions). Have you thought about how that could be abstracted away so that the user can use where() and the correct SQL is still generated by Cayenne?
>

I can't prove but I think there are some cases when the final result
depends on whether you use where() or having() on column.
Something like this (though in this exact case or() method can still
trace the difference):

ObjectSelect.query(Artist.class)
     .columns(Artist.ARTIST_NAME, artistCount)
     .where(Artist.ARTIST_NAME.like("a%"))
     .having(artistCount.gt(2L))
     .select(context)

vs

ObjectSelect.query(Artist.class)
     .columns(Artist.ARTIST_NAME, artistCount)
     .having(Artist.ARTIST_NAME.like("a%"))
     .or(artistCount.gt(2L)) // this goes to having
     .select(context)

But more important that it's really hard to find whether expression
can be used in having() or with where() clause as this requires to
trace every expression
backwards to its db attribute before assembling the final SQL.
So in this case I vote for the simplicity.

-- 
Best regards,
Nikita Timofeev

Re: API for aggregate and non aggregate SQL functions

Posted by Aristedes Maniatis <ar...@maniatis.org>.
On 8/1/17 1:03am, Nikita Timofeev wrote:

> The later is closer to the design.
> Both column() and columns() methods return same ObjectSelect object,
> they only change result type (as fetchDataRows() method do):
> column() changes result to ObjectSelect<T>, where T is type of
> Property, so no cast of the result required
> and columns() changes to ObjectSelect<Object[]>. The idea is that you
> can still use all methods in ObjectSelect.

Yes, fair enough and that makes sense. So could we do this:

List<Object[]> namesAndBirthDates = ObjectSelect.query(Artist.class)
              .column(Artist.ARTIST_NAME)
              .column(Artist.ARTIST_BIRTH_DATE)
              .select(context);

how can we handle the result type?

Will this work with selectOne() returning Object or Object[] in difference cases?


> Property<Integer> nameLength =
> Property.create(FunctionExpressionFactory.lengthExp(Artist.ARTIST_NAME.path()),
> Integer.class);
> List<Artist> artists = ObjectSelect.query(Artist.class)
>                                .where(nameLength.gt(10))
>                                .select(context);

Could that syntax above could be cleaner like:

   Property<Integer> nameLength = Artist.ARTIST_NAME.function(Expression.SQL_LENGTH)

or is that just going to result in a whole lot of clutter in the class?


>> I know factories are the Java way, but given how common count() and max() are, surely we'd want a shortcut. I'd like to be able to go directly to:
>>
>> int count = ObjectSelect.query(Artist.class)
>>                 .where(Artist.ARTIST_NAME.eq("Ari"))
>>                 .selectCount(context);
>>
> I want this shortcut too :) I'd like to avoid a feature bloat, so that
> could be done later.


I think its worth planning the API for this now, given that this is probably the most common function used.

I was trying to think of some syntax using Java 8 lambas, but couldn't quite see how it might work cleanly.




>> would a map be more useful to return than this tuple style approach? Java's collections are a bit clunky, but:
>>
>>
>> Map<PersistentObject, Object[]> result2 = ObjectSelect.query(Artist.class)
>>                 .columns(Artist.ARTIST_NAME, paintingCountProperty)
>>                 .having(paintingCountProperty.gt(10L))
>>                 .select(context);
>>
> I'm not sure that Map<> will be a good approach to the problem.
> We actually discussed with Andrus how PersistentObject can be returned
> with columns() method.
> And we decided that it will be better to define explicitly that you
> want PersistentObject in result:
> 
> Property<Artist> artistSelf = Property.create("hollowArtist", ?some
> expression?, Artist.class);
> List<Object[]> result2 = ObjectSelect.query(Artist.class)
>                 .columns(Artist.ARTIST_NAME, artistSelf, paintingCountProperty)
>                 .having(paintingCountProperty.gt(10L))
>                 .select(context);
> 
> What do you think about it?


Seems reasonable. Given this could be a very useful approach, it makes sense to make the syntax above a it more explicit or simple.


List<Object[]> result2 = ObjectSelect.query(Artist.class)
                 .columns(Artist.ARTIST_NAME, Property.HOLLOW_OBJECT, paintingCountProperty)
                 .having(paintingCountProperty.gt(10L))
                 .select(context);

This sort of query would be great for displaying a list of records (where you might want only a couple of columns to be loaded and displayed rather than whole objects). But then double clicking opens an edit view and you want to easily fault all its attributes without worrying about tracking its PK directly.




Finally, 'having()' matches the SQL (HAVING is needed for queries on the result of functions). Have you thought about how that could be abstracted away so that the user can use where() and the correct SQL is still generated by Cayenne?


Ari




-- 
-------------------------->
Aristedes Maniatis
GPG fingerprint CBFB 84B4 738D 4E87 5E5C  5EFA EF6A 7D2E 3E49 102A

Re: API for aggregate and non aggregate SQL functions

Posted by Andrus Adamchik <an...@objectstyle.org>.
> On Jan 7, 2017, at 5:03 PM, Nikita Timofeev <nt...@objectstyle.com> wrote:
> 
>> 
>> would a map be more useful to return than this tuple style approach? Java's collections are a bit clunky, but:
>> 
>> 
>> Map<PersistentObject, Object[]> result2 = ObjectSelect.query(Artist.class)
>>                .columns(Artist.ARTIST_NAME, paintingCountProperty)
>>                .having(paintingCountProperty.gt(10L))
>>                .select(context);
>> 
> 
> I'm not sure that Map<> will be a good approach to the problem.
> We actually discussed with Andrus how PersistentObject can be returned
> with columns() method.
> And we decided that it will be better to define explicitly that you
> want PersistentObject in result:
> 
> Property<Artist> artistSelf = Property.create("hollowArtist", ?some
> expression?, Artist.class);
> List<Object[]> result2 = ObjectSelect.query(Artist.class)
>                .columns(Artist.ARTIST_NAME, artistSelf, paintingCountProperty)
>                .having(paintingCountProperty.gt(10L))
>                .select(context);

Yeah, at this point we are constrained with the core return formats that are supported elsewhere in Cayenne (which are already pretty complex). We may (and will need to) rethink it in the later versions. But for 4.0 I suggest that we avoid any more redesign that will delay the release. As a reminder the current format is this:

List<? extends Persistent>
List<DataRow>
List<Object> // a list of scalars
List<Object[]> // a list of Object[] with each array column being Persistent|DataRow|scalar

Of course any of these can be easily converted to desired representations with simple functional one-liners by the users.

Andrus




Re: API for aggregate and non aggregate SQL functions

Posted by Nikita Timofeev <nt...@objectstyle.com>.
Hi Ari

Thank you for your so detailed feedback!

I'll try to explain a little bit our new design. (By the way you can
already try some parts of it: string and math functional expressions
and column() / columns() methods)

> The problem here is that selectOne no longer returns <Artist.class>. This seems like a backward step to me and a significant breakage of existing code. Do we need instead
>
> long totalCount = ObjectSelect.query(Artist.class)
>                         .selectColumn(context, Artist.ARTIST_COUNT);

No, selectOne() method is untouched. In fact there is no
incompatibility introduced by this feature. The only new thing
(explicit type in Property) is needed only if you are going to use
this new feature.

>
> or is it your idea that ObjectSelect.column(Artist.ARTIST_COUNT) returns some new ColumnSelect object type? Is that confusing or more helpful than selectColumn() approach?
>

The later is closer to the design.
Both column() and columns() methods return same ObjectSelect object,
they only change result type (as fetchDataRows() method do):
column() changes result to ObjectSelect<T>, where T is type of
Property, so no cast of the result required
and columns() changes to ObjectSelect<Object[]>. The idea is that you
can still use all methods in ObjectSelect.

Actually, column() method is only a shortcut of columns() method made
to return correct type for single Property case without type casting.
I.e. if you need only names of Artists you can do this :

  List<String> names = ObjectSelect.query(Artist.class)

.column(Artist.ARTIST_NAME).select(context);

And if you need more than one property then columns() method is your choice:

  List<Object[]> namesAndBirthDates = ObjectSelect.query(Artist.class)
                                         .columns(Artist.ARTIST_NAME,
Artist.ARTIST_BIRTH_DATE).select(context);

So selectColumn() may be a handy method but then it will be
inconsistent with columns().


> Now this is interesting. What generators do you anticipate having in FunctionExpressionFactory?
> I'm assuming just those things that are commonly available in many SQL implementations?

Yes, you are right.
There will be all standard aggregate functions:
* COUNT
* MIN
* MAX
* AVG
* SUM
and common SQL functions (based on JPA standard):
* CONCAT
* SUBSTRING
* TRIM
* LOWER
* UPPER
* LENGTH
* LOCATE
* SQRT
* MOD
* ABS
* CURRENT_DATE
* CURRENT_TIME
* CURRENT_TIMESTAMP


> Do you imagine that these properties will only implement SQL functions, or would it also be possible to implement Java functions which act on the data once returned from JDBC?

For now only SQL functions will be implemented. Using java functions
directly on the result would probably be a nice feature on its own,
but I don't see an easy way to use them as SQL functions could be
used,
e.g. in WHERE, ORDER BY, HAVING clauses:

Property<Integer> nameLength =
Property.create(FunctionExpressionFactory.lengthExp(Artist.ARTIST_NAME.path()),
Integer.class);
List<Artist> artists = ObjectSelect.query(Artist.class)
                               .where(nameLength.gt(10))
                               .select(context);


> I know factories are the Java way, but given how common count() and max() are, surely we'd want a shortcut. I'd like to be able to go directly to:
>
> int count = ObjectSelect.query(Artist.class)
>                 .where(Artist.ARTIST_NAME.eq("Ari"))
>                 .selectCount(context);
>

I want this shortcut too :) I'd like to avoid a feature bloat, so that
could be done later.


> would a map be more useful to return than this tuple style approach? Java's collections are a bit clunky, but:
>
>
> Map<PersistentObject, Object[]> result2 = ObjectSelect.query(Artist.class)
>                 .columns(Artist.ARTIST_NAME, paintingCountProperty)
>                 .having(paintingCountProperty.gt(10L))
>                 .select(context);
>

I'm not sure that Map<> will be a good approach to the problem.
We actually discussed with Andrus how PersistentObject can be returned
with columns() method.
And we decided that it will be better to define explicitly that you
want PersistentObject in result:

Property<Artist> artistSelf = Property.create("hollowArtist", ?some
expression?, Artist.class);
List<Object[]> result2 = ObjectSelect.query(Artist.class)
                .columns(Artist.ARTIST_NAME, artistSelf, paintingCountProperty)
                .having(paintingCountProperty.gt(10L))
                .select(context);

What do you think about it?

And final hidden thought we have (not in any plans, may be not even in 4.0).
It will be possible to do relatively simple conversion into some
arbitrary POJO with plain Object[].


On Sat, Jan 7, 2017 at 5:35 AM, Aristedes Maniatis <ar...@maniatis.org> wrote:
> On 7/1/17 12:08am, Nikita Timofeev wrote:
>> Hi all,
>>
>> I would like to present new Cayenne feature in development to you: API
>> for SQL functions (both aggregate and non aggregate).
>>
>> When it will be completed it will be possible to use functions in
>> select queries with pure API calls without writing any custom SQL or
>> EJBQL queries.
>
> Wow, that's brilliant. And so very useful. I like your implementation using Properties which is unexpected but very powerful.
>
> Some of my notes below might be painting the bikeshed, but it might help us understand your thinking...
>
>
>> You'll be able to do something like this (though it may be not a final
>> version of API):
>>
>>   long totalCount = ObjectSelect.query(Artist.class)
>>                                 .column(Artist.ARTIST_COUNT)
>>                                 .selectOne(context);
>
>
> The problem here is that selectOne no longer returns <Artist.class>. This seems like a backward step to me and a significant breakage of existing code. Do we need instead
>
> long totalCount = ObjectSelect.query(Artist.class)
>                         .selectColumn(context, Artist.ARTIST_COUNT);
>
>
> or is it your idea that ObjectSelect.column(Artist.ARTIST_COUNT) returns some new ColumnSelect object type? Is that confusing or more helpful than selectColumn() approach?
>
>
>
>> All this queries will rely on custom properties that can be created
>> ad-hoc for specific query:
>>
>>   Expression substrExp =
>> FunctionExpressionFactory.substringExp(Artist.ARTIST_NAME.path(), 10,
>> 15);
>>   Property<String> namePart = Property.create("namePart", substrExp,
>> String.class);
>
>
> Now this is interesting. What generators do you anticipate having in FunctionExpressionFactory?
>
> * count
> * max
> * min
> * substring
>
> I'm assuming just those things that are commonly available in many SQL implementations?
>
> Do you imagine that these properties will only implement SQL functions, or would it also be possible to implement Java functions which act on the data once returned from JDBC?
>
>
> I know factories are the Java way, but given how common count() and max() are, surely we'd want a shortcut. I'd like to be able to go directly to:
>
> int count = ObjectSelect.query(Artist.class)
>                 .where(Artist.ARTIST_NAME.eq("Ari"))
>                 .selectCount(context);
>
> Without casting and unwrapping the result for such a common use-case.
>
>
>
> On the other hand, here:
>
> Expression paintingCountExp = FunctionExpressionFactory.countExp(Artist.PAINTING_ARRAY.path());
> Property<Long> paintingCountProperty = Property.create("paintingCount", paintingCountExp, Long.class);
> List<Object[]> result2 = ObjectSelect.query(Artist.class)
>                 .columns(Artist.ARTIST_NAME, paintingCountProperty)
>                 .having(paintingCountProperty.gt(10L))
>                 .select(context);
>
>
> would a map be more useful to return than this tuple style approach? Java's collections are a bit clunky, but:
>
>
> Map<PersistentObject, Object[]> result2 = ObjectSelect.query(Artist.class)
>                 .columns(Artist.ARTIST_NAME, paintingCountProperty)
>                 .having(paintingCountProperty.gt(10L))
>                 .select(context);
>
> Where PersistentObject is a hollow Artist. You can get the id from it directly, or fault the object to get its attributes. But still get the name and count from the Object[] without hitting the database again.
>
> And if you really didn't want the map, you still can use result2.values() with (I guess) some performance hit:
>
> * forcing SQL to always to fetch the PK
> * building a map instead of array
> * unwrapping the map
>
>
>
> I think this work could be the most important and widely used changes to Cayenne in years. But it is important to make it really easy to use.
>
>
> I looked at Rails AR to see how they did it, and at Hibernate, but I don't think either have useful patterns that were really clean. For reference Hibernate is something like:
>
> Criteria crit = session.createCriteria(Artist.class);
> crit.add( Restrictions.eq("name", "Ari"));
> crit.setProjection(Projections.rowCount());
> Integer count = (Integer)crit.uniqueResult();
>
>
> Hibernate Projections/Criteria look a bit like the way Properties and Expressions are used above, but your columns() idea is much easier to read.
>
>
> Cheers
> Ari
>
>
>
> --
> -------------------------->
> Aristedes Maniatis
> GPG fingerprint CBFB 84B4 738D 4E87 5E5C  5EFA EF6A 7D2E 3E49 102A



-- 
Best regards,
Nikita Timofeev

Re: API for aggregate and non aggregate SQL functions

Posted by Aristedes Maniatis <ar...@maniatis.org>.
On 7/1/17 12:08am, Nikita Timofeev wrote:
> Hi all,
> 
> I would like to present new Cayenne feature in development to you: API
> for SQL functions (both aggregate and non aggregate).
> 
> When it will be completed it will be possible to use functions in
> select queries with pure API calls without writing any custom SQL or
> EJBQL queries.

Wow, that's brilliant. And so very useful. I like your implementation using Properties which is unexpected but very powerful.

Some of my notes below might be painting the bikeshed, but it might help us understand your thinking...


> You'll be able to do something like this (though it may be not a final
> version of API):
> 
>   long totalCount = ObjectSelect.query(Artist.class)
>                                 .column(Artist.ARTIST_COUNT)
>                                 .selectOne(context);


The problem here is that selectOne no longer returns <Artist.class>. This seems like a backward step to me and a significant breakage of existing code. Do we need instead

long totalCount = ObjectSelect.query(Artist.class)
                        .selectColumn(context, Artist.ARTIST_COUNT);


or is it your idea that ObjectSelect.column(Artist.ARTIST_COUNT) returns some new ColumnSelect object type? Is that confusing or more helpful than selectColumn() approach?



> All this queries will rely on custom properties that can be created
> ad-hoc for specific query:
> 
>   Expression substrExp =
> FunctionExpressionFactory.substringExp(Artist.ARTIST_NAME.path(), 10,
> 15);
>   Property<String> namePart = Property.create("namePart", substrExp,
> String.class);


Now this is interesting. What generators do you anticipate having in FunctionExpressionFactory?

* count
* max
* min
* substring

I'm assuming just those things that are commonly available in many SQL implementations?

Do you imagine that these properties will only implement SQL functions, or would it also be possible to implement Java functions which act on the data once returned from JDBC?


I know factories are the Java way, but given how common count() and max() are, surely we'd want a shortcut. I'd like to be able to go directly to:

int count = ObjectSelect.query(Artist.class)
                .where(Artist.ARTIST_NAME.eq("Ari"))
                .selectCount(context);

Without casting and unwrapping the result for such a common use-case.



On the other hand, here:

Expression paintingCountExp = FunctionExpressionFactory.countExp(Artist.PAINTING_ARRAY.path());
Property<Long> paintingCountProperty = Property.create("paintingCount", paintingCountExp, Long.class);
List<Object[]> result2 = ObjectSelect.query(Artist.class)
                .columns(Artist.ARTIST_NAME, paintingCountProperty)
                .having(paintingCountProperty.gt(10L))
                .select(context);


would a map be more useful to return than this tuple style approach? Java's collections are a bit clunky, but:


Map<PersistentObject, Object[]> result2 = ObjectSelect.query(Artist.class)
                .columns(Artist.ARTIST_NAME, paintingCountProperty)
                .having(paintingCountProperty.gt(10L))
                .select(context);

Where PersistentObject is a hollow Artist. You can get the id from it directly, or fault the object to get its attributes. But still get the name and count from the Object[] without hitting the database again.

And if you really didn't want the map, you still can use result2.values() with (I guess) some performance hit:

* forcing SQL to always to fetch the PK
* building a map instead of array
* unwrapping the map



I think this work could be the most important and widely used changes to Cayenne in years. But it is important to make it really easy to use.


I looked at Rails AR to see how they did it, and at Hibernate, but I don't think either have useful patterns that were really clean. For reference Hibernate is something like:

Criteria crit = session.createCriteria(Artist.class);
crit.add( Restrictions.eq("name", "Ari"));
crit.setProjection(Projections.rowCount());
Integer count = (Integer)crit.uniqueResult();


Hibernate Projections/Criteria look a bit like the way Properties and Expressions are used above, but your columns() idea is much easier to read.


Cheers
Ari



-- 
-------------------------->
Aristedes Maniatis
GPG fingerprint CBFB 84B4 738D 4E87 5E5C  5EFA EF6A 7D2E 3E49 102A

Re: API for aggregate and non aggregate SQL functions

Posted by Andrus Adamchik <an...@objectstyle.org>.
This is really nice and something that we wanted to have in Cayenne for years! It plugs a huge hole in the query capabilities and will be a major 4.0 feature. I am looking forward to upgrading a few of my projects and rewriting a bunch of queries using the new API. 

Andrus

> On Jan 6, 2017, at 4:08 PM, Nikita Timofeev <nt...@objectstyle.com> wrote:
> 
> Hi all,
> 
> I would like to present new Cayenne feature in development to you: API
> for SQL functions (both aggregate and non aggregate).
> 
> When it will be completed it will be possible to use functions in
> select queries with pure API calls without writing any custom SQL or
> EJBQL queries.
> You'll be able to do something like this (though it may be not a final
> version of API):
> 
>  long totalCount = ObjectSelect.query(Artist.class)
>                                .column(Artist.ARTIST_COUNT)
>                                .selectOne(context);
> 
> Or make even more complex queries with HAVING clause:
> 
>  List<Object[]> result = ObjectSelect.query(Artist.class)
>                                        .columns(artistCount,
> minSalary, namePart)
> 
> .where(Artist.DATE_OF_BIRTH.lt(new Date()))
>                                        .or(...) // additional
> condition in WHERE clause
>                                        .having(namePart.like("P%"))
>                                        .or(...) // additional
> condition in HAVING clause
>                                        .select(context);
> 
>  for(Object[] r : result) {
>      long count = (long)r[0];
>      int min = (int)r[1];
>      String name = (String)r[2];
>  }
> 
> All this queries will rely on custom properties that can be created
> ad-hoc for specific query:
> 
>  Expression substrExp =
> FunctionExpressionFactory.substringExp(Artist.ARTIST_NAME.path(), 10,
> 15);
>  Property<String> namePart = Property.create("namePart", substrExp,
> String.class);
> 
> or be defined directly in a model class:
> 
>  public static final Property<Long> ARTIST_COUNT =
> Property.create("artistCount", FunctionExpressionFactory.countExp(),
> Long.class);
> 
> You can find additional information and track progress in Apache JIRA:
> https://issues.apache.org/jira/browse/CAY-2187
> 
> And of course any feedback will be really appreciated!
> 
> -- 
> Best regards,
> Nikita Timofeev