Posted to dev@sling.apache.org by Carsten Ziegeler <cz...@apache.org> on 2015/05/18 08:17:26 UTC

[RT] New resource query API

The current resource query API has several problems:
- it uses the JCR spec to define a query
- it's not clear which queries are supported by providers
- queries are string based
- implementing queries in a resource provider is way too hard, as this
would require implementing the complete JCR query API.

I've created a draft for a new, object-based API at [1]. The main idea
is to use a builder pattern to create Query objects. These are immutable
and have a unique identifier. The QueryManager service can be used to
execute a query in the context of a resource resolver. The manager
delegates the query to the providers. As each Query object has this
identifier, implementations can use it to cache the parsing of the query.
In addition to the query object, you can pass in query instructions to
specify a limit or range for the query.
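
Illustrative sketch only (not part of the original message and not taken from the draft at [1]): the builder/immutable-query/identifier idea could look roughly like the following. All class and method names here (QuerySketch, QueryBuilder, propertyEquals, TranslationCache) are hypothetical, resources are reduced to string constraints, and deriving the identifier from a hash of the query content is an assumption made for brevity.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch; none of these names come from the draft API. */
final class QuerySketch {

    /** Immutable query: a list of constraints plus a stable identifier. */
    static final class Query {
        private final String id;
        private final List<String> constraints;

        private Query(List<String> constraints) {
            // Defensive copy so later builder changes cannot affect this query.
            this.constraints = Collections.unmodifiableList(new ArrayList<>(constraints));
            // Assumption: the id is derived from the content, so equal queries
            // share an id (a plain hash could collide; fine for a sketch).
            this.id = "q" + Integer.toHexString(this.constraints.hashCode());
        }

        String getId() { return id; }

        List<String> getConstraints() { return constraints; }
    }

    /** Mutable builder that produces immutable Query objects. */
    static final class QueryBuilder {
        private final List<String> constraints = new ArrayList<>();

        QueryBuilder propertyEquals(String name, String value) {
            constraints.add(name + "=" + value);
            return this;
        }

        Query build() { return new Query(constraints); }
    }

    /** Provider-side cache of translated queries, keyed by the query id. */
    static final class TranslationCache {
        private final Map<String, String> translated = new ConcurrentHashMap<>();
        int translations = 0; // how often the (expensive) translation really ran

        String translate(Query q) {
            return translated.computeIfAbsent(q.getId(), id -> {
                translations++;
                return String.join(" AND ", q.getConstraints());
            });
        }
    }
}
```

A provider would translate the constraints into its own query language once and then look the result up by id on subsequent executions.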

Obviously this is a reduced set compared to the full-fledged JCR search
API; however, it should be suitable for the majority of use cases.

[1]
https://svn.apache.org/repos/asf/sling/whiteboard/cziegeler/api-v3/src/main/java/org/apache/sling/api/resource/query/

Regards
Carsten
-- 
Carsten Ziegeler
Adobe Research Switzerland
cziegeler@apache.org

Re: [RT] New resource query API

Posted by Carsten Ziegeler <cz...@apache.org>.
On 01.06.15 at 14:00, Alexander Klimetschek wrote:
> On 01.06.2015, at 08:50, Carsten Ziegeler <cz...@apache.org> wrote:
>> So I think, we should support sorting across providers
> 
> This means sorting will be very slow, since you have to re-sort the partial results on the Sling level without using an index.
> 
No, that's not true - each provider sorts its own results, so all you have
to do is merge the sorted results, which is trivial and does not require any
re-sorting or processing of the full data set.

Carsten


-- 
Carsten Ziegeler
Adobe Research Switzerland
cziegeler@apache.org

Re: [RT] New resource query API

Posted by Alexander Klimetschek <ak...@adobe.com>.
On 09.06.2015, at 22:24, Carsten Ziegeler <cz...@apache.org> wrote:
>> If you mean whatever query language is supported by mongo or other nosql providers (and fits into a string), then that's the best you can do on the Sling layer.
>> 
> Thanks for confirming my point why an abstraction is necessary :)

But that's my point: introducing abstraction here comes at great cost. It's much easier to set up an external search index such as Solr, pump data from different backends (behind the different resource providers) and query this one directly for a search across everything (if this is what an application needs).

Cheers,
Alex


Re: [RT] New resource query API

Posted by Carsten Ziegeler <cz...@apache.org>.
On 09.06.15 at 21:01, Alexander Klimetschek wrote:
> On 04.06.2015, at 21:56, Carsten Ziegeler <cz...@apache.org> wrote:
>> Ok, could you then please provide an implementation for the mongo
>> resource provider or one of the other nosql providers?
> 
> If you mean "jcr xpath" (or any other jcr/oak supported query language), no.
> 
> If you mean whatever query language is supported by mongo or other nosql providers (and fits into a string), then that's the best you can do on the Sling layer.
> 
Thanks for confirming my point why an abstraction is necessary :)

Carsten
-- 
Carsten Ziegeler
Adobe Research Switzerland
cziegeler@apache.org

Re: [RT] New resource query API

Posted by Alexander Klimetschek <ak...@adobe.com>.
On 04.06.2015, at 21:56, Carsten Ziegeler <cz...@apache.org> wrote:
> Ok, could you then please provide an implementation for the mongo
> resource provider or one of the other nosql providers?

If you mean "jcr xpath" (or any other jcr/oak supported query language), no.

If you mean whatever query language is supported by mongo or other nosql providers (and fits into a string), then that's the best you can do on the Sling layer.

Cheers,
Alex

Re: [RT] New resource query API

Posted by Carsten Ziegeler <cz...@apache.org>.
On 04.06.15 at 15:52, Alexander Klimetschek wrote:
> On 03.06.2015, at 14:52, Carsten Ziegeler <cz...@apache.org> wrote:
>> Well, let's agree that we disagree here. For the majority of users,
>> there is only JCR anyway, which means there is no difference between
>> using a nice api and fiddling with strings by hand when it comes to
>> performance.
> 
> And that's exactly the argument for not having a new query API that is agnostic to the individual resource provider implementations and would support aggregation.
> 
> What's there with findResources() is fine - just do your jcr query there. Only passing offset/limit is missing.
> 
Ok, could you then please provide an implementation for the mongo
resource provider or one of the other nosql providers?

Thanks
Carsten
-- 
Carsten Ziegeler
Adobe Research Switzerland
cziegeler@apache.org

Re: [RT] New resource query API

Posted by Alexander Klimetschek <ak...@adobe.com>.
On 03.06.2015, at 14:52, Carsten Ziegeler <cz...@apache.org> wrote:
> Well, let's agree that we disagree here. For the majority of users,
> there is only JCR anyway, which means there is no difference between
> using a nice api and fiddling with strings by hand when it comes to
> performance.

And that's exactly the argument for not having a new query API that is agnostic to the individual resource provider implementations and would support aggregation.

What's there with findResources() is fine - just do your jcr query there. Only passing offset/limit is missing.
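
Illustrative sketch only (not part of the original message): as long as offset and limit cannot be passed into the query itself, a caller can only apply them to the returned iterator, as below. The helper name window is hypothetical. Note the obvious drawback: skipped entries are still produced by the backend, which is presumably why passing offset/limit down to the provider is the preferable option.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical helper: applies offset/limit on top of any result iterator. */
final class WindowSketch {

    static <T> Iterator<T> window(Iterator<T> source, long offset, long limit) {
        // Consume and discard the first 'offset' entries (client-side skip).
        for (long i = 0; i < offset && source.hasNext(); i++) {
            source.next();
        }
        return new Iterator<T>() {
            private long remaining = limit;

            @Override
            public boolean hasNext() {
                return remaining > 0 && source.hasNext();
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                remaining--;
                return source.next();
            }
        };
    }
}
```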

Cheers,
Alex

Re: [RT] New resource query API

Posted by Carsten Ziegeler <cz...@apache.org>.
On 03.06.15 at 11:40, Alexander Klimetschek wrote:
> On 02.06.2015, at 16:39, Carsten Ziegeler <cz...@apache.org> wrote:
>> The query contains the sort information (which properties and whether
>> ascending or descending), so you can get the values of the props and
>> compare them.
> 
> But then you need to
> 
> a) be able to understand and fully evaluate the query on the aggregate level

Nope, just the sorting

> b) cannot use an index for that and do a property read for every result entry

It's true, the property is read for every entry - but only(!) if there
is more than one resource provider providing results.

> 
> I am just saying, you are getting into a very complex and performance critical business. And even if you say "users won't use such edge-case queries, it's ok if we don't look for perfect performance", they will use it (experience tells me, people always find ways to run difficult & slow queries :D), and then you have just created a new performance critical area out of nowhere.

Well, let's agree that we disagree here. For the majority of users,
there is only JCR anyway, which means there is no difference between
using a nice api and fiddling with strings by hand when it comes to
performance.

Regards
Carsten
-- 
Carsten Ziegeler
Adobe Research Switzerland
cziegeler@apache.org

Re: [RT] New resource query API

Posted by Alexander Klimetschek <ak...@adobe.com>.
On 02.06.2015, at 16:39, Carsten Ziegeler <cz...@apache.org> wrote:
> The query contains the sort information (which properties and whether
> ascending or descending), so you can get the values of the props and
> compare them.

But then you need to

a) be able to understand and fully evaluate the query on the aggregate level
b) cannot use an index for that and do a property read for every result entry

I am just saying, you are getting into a very complex and performance critical business. And even if you say "users won't use such edge-case queries, it's ok if we don't look for perfect performance", they will use it (experience tells me, people always find ways to run difficult & slow queries :D), and then you have just created a new performance critical area out of nowhere.

Cheers,
Alex


Re: [RT] New resource query API

Posted by Carsten Ziegeler <cz...@apache.org>.
On 02.06.15 at 12:32, Alexander Klimetschek wrote:
> On 02.06.2015, at 05:17, Daniel Klco <da...@gmail.com> wrote:
>> @Alex,
>>
>> Sorting wouldn't necessarily be slow.  What you could do is have the API
>> return an iterator which wraps the sorted result iterator from the various
>> resource providers.
> 
> And how do you know that result X from provider A is > result Y from provider B?
> 
The query contains the sort information (which properties and whether
ascending or descending), so you can get the values of the props and
compare them.
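
Illustrative sketch only (not part of the original message): comparing results from different providers using nothing but the sort information could look like this. SortCriterion and comparatorFor are hypothetical names, and a result is modelled as a plain Map of string properties instead of a real Resource; treating missing properties as smallest is an arbitrary choice for the sketch.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: derive a comparator from the query's sort information. */
final class SortSketch {

    /** One sort criterion: a property name and a direction. */
    static final class SortCriterion {
        final String property;
        final boolean ascending;

        SortCriterion(String property, boolean ascending) {
            this.property = property;
            this.ascending = ascending;
        }
    }

    static Comparator<Map<String, String>> comparatorFor(List<SortCriterion> criteria) {
        // Start with "everything is equal" and refine with each criterion in order.
        Comparator<Map<String, String>> result = (a, b) -> 0;
        for (SortCriterion c : criteria) {
            Comparator<Map<String, String>> byProperty = Comparator.comparing(
                    (Map<String, String> m) -> m.get(c.property),
                    Comparator.nullsFirst(Comparator.<String>naturalOrder()));
            result = result.thenComparing(c.ascending ? byProperty : byProperty.reversed());
        }
        return result;
    }
}
```

This reads one property per criterion for every compared entry, which is the per-entry property read discussed elsewhere in the thread.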

Carsten
-- 
Carsten Ziegeler
Adobe Research Switzerland
cziegeler@apache.org

Re: [RT] New resource query API

Posted by Alexander Klimetschek <ak...@adobe.com>.
On 02.06.2015, at 05:17, Daniel Klco <da...@gmail.com> wrote:
> @Alex,
> 
> Sorting wouldn't necessarily be slow.  What you could do is have the API
> return an iterator which wraps the sorted result iterator from the various
> resource providers.

And how do you know that result X from provider A is > result Y from provider B?

Cheers,
Alex


Re: [RT] New resource query API

Posted by Daniel Klco <da...@gmail.com>.
@Alex,

Sorting wouldn't necessarily be slow.  What you could do is have the API
return an iterator which wraps the sorted result iterators from the various
resource providers.  This iterator would keep track of the next value of
the iterator from every resource provider result and, for every next call
of the wrapping iterator, return the value that comes first in the sort
order.  This should result in performance of approximately O(n) for each
next call, where n is the number of resource providers.
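
Illustrative sketch only (not part of the original message): a wrapping iterator along these lines, assuming every provider already returns its results sorted by the same order and never returns null entries. MergeSketch and merge are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical sketch: merge already-sorted provider results lazily. */
final class MergeSketch {

    static <T> Iterator<T> merge(List<Iterator<T>> sources, Comparator<? super T> order) {
        // heads.get(i) is the next unconsumed entry of sources.get(i),
        // or null once that source is exhausted (entries must be non-null).
        final List<T> heads = new ArrayList<>();
        for (Iterator<T> s : sources) {
            heads.add(s.hasNext() ? s.next() : null);
        }
        return new Iterator<T>() {
            /** Linear scan over the heads: O(n) for n providers. */
            private int indexOfSmallest() {
                int idx = -1;
                for (int i = 0; i < heads.size(); i++) {
                    T h = heads.get(i);
                    if (h != null && (idx < 0 || order.compare(h, heads.get(idx)) < 0)) {
                        idx = i;
                    }
                }
                return idx;
            }

            @Override
            public boolean hasNext() {
                return indexOfSmallest() >= 0;
            }

            @Override
            public T next() {
                int i = indexOfSmallest();
                if (i < 0) {
                    throw new NoSuchElementException();
                }
                T result = heads.get(i);
                Iterator<T> s = sources.get(i);
                // Refill the consumed head from the same provider.
                heads.set(i, s.hasNext() ? s.next() : null);
                return result;
            }
        };
    }
}
```

Only one entry per provider is held in memory at a time; replacing the linear scan with a priority queue would reduce the cost per call to O(log n).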

In the case where sorting isn't provided, I would think it would just
interleave the results from the various resource providers.

-Dan

On Mon, Jun 1, 2015 at 5:00 PM, Alexander Klimetschek <ak...@adobe.com>
wrote:

> On 01.06.2015, at 08:50, Carsten Ziegeler <cz...@apache.org> wrote:
> > So I think, we should support sorting across providers
>
> This means sorting will be very slow, since you have to re-sort the
> partial results on the Sling level without using an index.
>
> Cheers,
> Alex

Re: [RT] New resource query API

Posted by Alexander Klimetschek <ak...@adobe.com>.
On 01.06.2015, at 08:50, Carsten Ziegeler <cz...@apache.org> wrote:
> So I think, we should support sorting across providers

This means sorting will be very slow, since you have to re-sort the partial results on the Sling level without using an index.

Cheers,
Alex

Re: [RT] New resource query API

Posted by Carsten Ziegeler <cz...@apache.org>.
One thing we have to think of is how to use pagination and sorting.
A search could go across the whole resource tree hitting different
resource providers. For example one use case is getting all vanity paths.
Sorting such a search is possible: the resource resolver implementation
always picks one result from each provider, compares them, brings them
into order, etc. That's definitely doable.

However, pagination is a different beast. Of course, for paging the
result needs to be sorted. Today, a common practice for doing pagination
is key-based pagination: with the search result you get a key that is
used for the next page. We can easily do this if the search is just
hitting a single resource provider. However, if the search targets more
than one, pagination becomes a problem. For example, assume the search has
a page size of 20 and two providers might provide resources. Asking each
of them for 20 entries is not very efficient. Asking each provider for
ten, hoping that the results are evenly split, usually does not work either.
I guess one could come up with some clever algorithm that solves the problem
by potentially doing two (or more) searches against a provider. But this
would be complicated and would not perform well.
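
Illustrative sketch only (not part of the original message): key-based pagination against a single provider, with the provider's sorted index modelled as a NavigableSet of strings. PagingSketch, Page and page are hypothetical names, and using the last returned entry as the key assumes that sort values are unique.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NavigableSet;

/** Hypothetical sketch: key-based paging over one sorted result set. */
final class PagingSketch {

    /** One page of results plus the key to request the next page. */
    static final class Page {
        final List<String> entries;
        final String nextKey; // null when there are no further pages

        Page(List<String> entries, String nextKey) {
            this.entries = entries;
            this.nextKey = nextKey;
        }
    }

    static Page page(NavigableSet<String> sorted, String afterKey, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        // Seek directly behind the key instead of skipping a number of entries.
        NavigableSet<String> tail =
                afterKey == null ? sorted : sorted.tailSet(afterKey, false);
        List<String> entries = new ArrayList<>();
        Iterator<String> it = tail.iterator();
        while (entries.size() < pageSize && it.hasNext()) {
            entries.add(it.next());
        }
        // The last returned entry serves as the key for the following page.
        String nextKey = it.hasNext() ? entries.get(entries.size() - 1) : null;
        return new Page(entries, nextKey);
    }
}
```

With more than one provider the key would have to identify a position in every provider's result set, which is where the difficulty described above comes from.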

So I think, we should support sorting across providers, but pagination
only across a single provider and throw some exception if a search
potentially hits more than one provider. It's a limitation we can document.

WDYT?

Regards
Carsten
-- 
Carsten Ziegeler
Adobe Research Switzerland
cziegeler@apache.org

Re: [RT] New resource query API

Posted by Carsten Ziegeler <cz...@apache.org>.
On 30.05.15 at 10:31, Stefan Seifert wrote:

> btw. i assume we do not remove the old support for directly passing a query string to the resource resolver, but add the additional support for the abstraction? this would allow experienced developers who know they are only using JCR to still use direct JCR queries against the resource resolver.
> 
Exactly, right - thanks for pointing this out.

Carsten
-- 
Carsten Ziegeler
Adobe Research Switzerland
cziegeler@apache.org

RE: [RT] New resource query API

Posted by Stefan Seifert <ss...@pro-vision.de>.
>But do you have queries across resource providers? Do you know the
>implementation complexity and performance limitations you are asking for?

no, i never required searching across different providers in the past, it would even be ok for me to not support cross-provider searching in the beginning to keep things simple.


>"Choosing the most performant query" is absolutely non-trivial. And it
>requires your resource provider implementation to be able to ask its
>underlying database/repository to figure that out (before you translate to the
>particular query itself). Here you'll have tons of places where the
>abstraction will leak and/or you can't get the right performance.

the idea is that there is one resource provider impl per backend, so each provider only has to know the specifics of its backend. but ok, concerning JCR there are already two underlying implementations with JCR2 and oak. i agree that this is an absolutely nontrivial task and should not be part of a first implementation either.

stefan


Re: [RT] New resource query API

Posted by Carsten Ziegeler <cz...@apache.org>.
On 01.06.15 at 13:58, Alexander Klimetschek wrote:

> But do you have queries across resource providers? Do you know the implementation complexity and performance limitations you are asking for?
> 
> If you have different resource providers with their own search index, and you have to aggregate on the resource resolver/tree level, you basically start having JOINs without any indexes, and these will be slow.

These are not joins, you get two (or more) result sets and you simply
return all of them.

Carsten


-- 
Carsten Ziegeler
Adobe Research Switzerland
cziegeler@apache.org

Re: [RT] New resource query API

Posted by Alexander Klimetschek <ak...@adobe.com>.
On 30.05.2015, at 01:31, Stefan Seifert <ss...@pro-vision.de> wrote:
> 
>>> And this is a typical case where abstraction fails: performance. Which is
>> extremely important for queries.
>>> 
>> Well, this is a broad statement and neither true nor wrong.
> 
> i'm of the same opinion as carsten. i did a quick check of most of the queries in our projects from the last few years, and most of them can be expressed with an API like this, and the code maintainability would benefit from it. and for new developers it's easier to learn a fluent API than a query syntax.

But do you have queries across resource providers? Do you know the implementation complexity and performance limitations you are asking for?

If you have different resource providers with their own search index, and you have to aggregate on the resource resolver/tree level, you basically start having JOINs without any indexes, and these will be slow.

> and the abstraction may even help improve performance for inexperienced developers - there was a time in jackrabbit 2 when the same query performed quite differently depending on whether it was written in xpath or sql syntax - if such an abstraction is implemented in an intelligent way it could always use the most performant query variant, and the user of the query does not have to care about those implementation details. of course this makes the implementation of the abstraction much more complex.

"Choosing the most performant query" is absolutely non-trivial. And it requires your resource provider implementation to be able to ask its underlying database/repository to figure that out (before you translate to the particular query itself). Here you'll have tons of places where the abstraction will leak and/or you can't get the right performance.

Also, be careful about telling (inexperienced) developers that the API will magically run the most performant query - this just doesn't work that way in practice; especially if you have different backends involved, they will have to understand what's going on.

Cheers,
Alex

> btw. i assume we do not remove the old support for directly passing a query string to the resource resolver, but add the additional support for the abstraction? this would allow experienced developers who know they are only using JCR to still use direct JCR queries against the resource resolver.
> 
> stefan 


RE: [RT] New resource query API

Posted by Stefan Seifert <ss...@pro-vision.de>.
>> And this is a typical case where abstraction fails: performance. Which is
>extremely important for queries.
>>
>Well, this is a broad statement and neither true nor wrong.

i'm of the same opinion as carsten. i did a quick check of most of the queries in our projects from the last few years, and most of them can be expressed with an API like this, and the code maintainability would benefit from it. and for new developers it's easier to learn a fluent API than a query syntax.

and the abstraction may even help improve performance for inexperienced developers - there was a time in jackrabbit 2 when the same query performed quite differently depending on whether it was written in xpath or sql syntax - if such an abstraction is implemented in an intelligent way it could always use the most performant query variant, and the user of the query does not have to care about those implementation details. of course this makes the implementation of the abstraction much more complex.

btw. i assume we do not remove the old support for directly passing a query string to the resource resolver, but add the additional support for the abstraction? this would allow experienced developers who know they are only using JCR to still use direct JCR queries against the resource resolver.

stefan 

Re: [RT] New resource query API

Posted by Carsten Ziegeler <cz...@apache.org>.
On 30.05.15 at 03:17, Alexander Klimetschek wrote:
> On 28.05.2015, at 23:08, Carsten Ziegeler <cz...@apache.org> wrote:
>> I agree with this, however users of the resource api do not know which
>> provider is serving the resources. That's the whole point of an abstraction.
> 
> And this is a typical case where abstraction fails: performance. Which is extremely important for queries.
> 
Well, this is a broad statement and neither true nor wrong.

Look at the use cases in Sling where we use a query, e.g. the job
handling. Replacing the current approach of generating a very long
string for the JCR query with a more modern API does not change anything
with respect to the query, but it provides the abstraction. The
execution of the query is still as fast or slow as before.
The same is true for most of the other technical queries we have in Sling.
A good abstraction allows code to run on different providers. Today,
we have abstracted nearly everything - with the only exception being
queries. Having an 80% abstraction is more or less as good as not having an
abstraction - which means, it's bad.
I don't understand why someone would be opposed to completing the abstraction.
We have been trying to reach this goal for a very long time now.


Carsten

-- 
Carsten Ziegeler
Adobe Research Switzerland
cziegeler@apache.org

Re: [RT] New resource query API

Posted by Alexander Klimetschek <ak...@adobe.com>.
On 28.05.2015, at 23:08, Carsten Ziegeler <cz...@apache.org> wrote:
> I agree with this, however users of the resource api do not know which
> provider is serving the resources. That's the whole point of an abstraction.

And this is a typical case where abstraction fails: performance. Which is extremely important for queries.

Cheers,
Alex

Re: [RT] New resource query API

Posted by Carsten Ziegeler <cz...@apache.org>.
On 29.05.15 at 01:16, Alexander Klimetschek wrote:
> When you run a query across multiple backends, you have to aggregate the results. This is non-trivial, and in most cases you are better off using an external search index that covers everything. And from my experience, you usually don't have the use case of searching across different providers; e.g. if you have a) a file system provider for bundles and code and b) a database provider providing ecommerce order entries, you never search across both at the same time.
> 
I agree with this; however, users of the resource API do not know which
provider is serving the resources. That's the whole point of an abstraction.

Carsten

-- 
Carsten Ziegeler
Adobe Research Switzerland
cziegeler@apache.org

Re: [RT] New resource query API

Posted by Alexander Klimetschek <ak...@adobe.com>.
When you run a query across multiple backends, you have to aggregate the results. This is non-trivial, and in most cases you are better off using an external search index that covers everything. And from my experience, you usually don't have the use case of searching across different providers; e.g. if you have a) a file system provider for bundles and code and b) a database provider providing ecommerce order entries, you never search across both at the same time.

Cheers,
Alex


Re: [RT] New resource query API

Posted by Carsten Ziegeler <cz...@apache.org>.
Just to clarify as it seems people got the proposal wrong: this is about
a new API, not an implementation. It's an abstraction on the resource
level. Of course with a JCR provider underneath, the search is delegated
to that provider. Same with other providers.
It should be easy for every provider to implement the api.

Typical use cases are for example the job handling which searches for
jobs in the resource tree or the resource resolver implementation
looking for vanity paths etc.

Right now - although these parts use the resource api - it's not
possible to run them with a different provider than jcr.

Carsten

-- 
Carsten Ziegeler
Adobe Research Switzerland
cziegeler@apache.org

Re: [RT] New resource query API

Posted by Carsten Ziegeler <cz...@apache.org>.
I've updated the API with the feedback I received so far. In addition I
changed the paging from a skip number to key based paging. I guess these
interfaces are not perfect yet, but show the direction.

Regards
Carsten
-- 
Carsten Ziegeler
Adobe Research Switzerland
cziegeler@apache.org

Re: [RT] New resource query API

Posted by Alexander Klimetschek <ak...@adobe.com>.
On 28.05.2015, at 23:07, Carsten Ziegeler <cz...@apache.org> wrote:
>> No, it's both. It is a normal API [1], usage example at [2], but it also has a form that is easy to transport over HTTP using GET/POST parameters and comes with a (popular) servlet that provides the result as JSON. The general predicate format does not depend on ordering or on correctly nested brackets, so you can easily build "advanced" search forms.
>> 
>> [1] https://docs.adobe.com/docs/en/aem/6-0/develop/ref/javadoc/com/day/cq/search/package-summary.html
>> [2] https://docs.adobe.com/docs/en/aem/6-0/develop/ref/javadoc/com/day/cq/search/QueryBuilder.html
>> 
> Right but it's JCR based, any plans on basing this on resources?

It converts into a JCR xpath query, yes. But other than that it gives both JCR (Node) and Resource API (Resource) in the search result hits for convenience. Nothing that should be a problem. Moving it to Sling would mean a few changes anyway, while keeping it backwards compatible on the query statement side.

Cheers,
Alex

Re: [RT] New resource query API

Posted by Carsten Ziegeler <cz...@apache.org>.
On 29.05.15 at 01:13, Alexander Klimetschek wrote:
> On 28.05.2015, at 09:32, Carsten Ziegeler <cz...@apache.org> wrote:
>>
>> On 28.05.15 at 18:25, Alexander Klimetschek wrote:
>>> On 27.05.2015, at 01:35, Bertrand Delacretaz <bd...@apache.org> wrote:
>>>> I'm happy to collaborate on creating these examples (which can simply
>>>> be unit tests for a relevant ResourceProvider) but before that I'd
>>>> like to discuss what the alternatives are, before we invent YAQA (*).
>>>
>>> We have the querybuilder [1] in our CQ/AEM product, we could contribute this to Sling, I think (pending internal legal processes of course).
>>>
>> I guess that would be nice, but isn't the query builder an HTTP API? How
>> is that translated to resource queries?
> 
> No, it's both. It is a normal API [1], usage example at [2], but it also has a form that is easy to transport over HTTP using GET/POST parameters and comes with a (popular) servlet that provides the result as JSON. The general predicate format does not depend on ordering or on correctly nested brackets, so you can easily build "advanced" search forms.
> 
> [1] https://docs.adobe.com/docs/en/aem/6-0/develop/ref/javadoc/com/day/cq/search/package-summary.html
> [2] https://docs.adobe.com/docs/en/aem/6-0/develop/ref/javadoc/com/day/cq/search/QueryBuilder.html
> 
Right but it's JCR based, any plans on basing this on resources?

Carsten


-- 
Carsten Ziegeler
Adobe Research Switzerland
cziegeler@apache.org

Re: [RT] New resource query API

Posted by Alexander Klimetschek <ak...@adobe.com>.
On 28.05.2015, at 09:32, Carsten Ziegeler <cz...@apache.org> wrote:
> 
> On 28.05.15 at 18:25, Alexander Klimetschek wrote:
>> On 27.05.2015, at 01:35, Bertrand Delacretaz <bd...@apache.org> wrote:
>>> I'm happy to collaborate on creating these examples (which can simply
>>> be unit tests for a relevant ResourceProvider) but before that I'd
>>> like to discuss what the alternatives are, before we invent YAQA (*).
>> 
>> We have the querybuilder [1] in our CQ/AEM product, we could contribute this to Sling, I think (pending internal legal processes of course).
>> 
> I guess that would be nice, but isn't the query builder an HTTP API? How
> is that translated to resource queries?

No, it's both. It is a normal API [1], usage example at [2], but it also has a form that is easy to transport over HTTP using GET/POST parameters and comes with a (popular) servlet that provides the result as JSON. The general predicate format does not depend on ordering or on correctly nested brackets, so you can easily build "advanced" search forms.

[1] https://docs.adobe.com/docs/en/aem/6-0/develop/ref/javadoc/com/day/cq/search/package-summary.html
[2] https://docs.adobe.com/docs/en/aem/6-0/develop/ref/javadoc/com/day/cq/search/QueryBuilder.html

Cheers,
Alex

Re: [RT] New resource query API

Posted by Carsten Ziegeler <cz...@apache.org>.
On 28.05.15 at 18:25, Alexander Klimetschek wrote:
> On 27.05.2015, at 01:35, Bertrand Delacretaz <bd...@apache.org> wrote:
>> I'm happy to collaborate on creating these examples (which can simply
>> be unit tests for a relevant ResourceProvider) but before that I'd
>> like to discuss what the alternatives are, before we invent YAQA (*).
> 
> We have the querybuilder [1] in our CQ/AEM product, we could contribute this to Sling, I think (pending internal legal processes of course).
> 
I guess that would be nice, but isn't the query builder an http api? How
is that translated to resource queries?

Carsten


-- 
Carsten Ziegeler
Adobe Research Switzerland
cziegeler@apache.org

Re: [RT] New resource query API

Posted by Alexander Klimetschek <ak...@adobe.com>.
On 27.05.2015, at 01:35, Bertrand Delacretaz <bd...@apache.org> wrote:
> I'm happy to collaborate on creating these examples (which can simply
> be unit tests for a relevant ResourceProvider) but before that I'd
> like to discuss what the alternatives are, before we invent YAQA (*).

We have the querybuilder [1] in our CQ/AEM product, we could contribute this to Sling, I think (pending internal legal processes of course).

[1] https://docs.adobe.com/docs/en/aem/6-1/develop/search/querybuilder-api.html

Cheers,
Alex

Re: [RT] New resource query API

Posted by Carsten Ziegeler <cz...@apache.org>.
Am 02.06.15 um 01:39 schrieb Bertrand Delacretaz:
> On Wed, May 27, 2015 at 10:35 AM, Bertrand Delacretaz <bd...@apache.org> wrote:
> FWIW I've played a bit with the Oak query code to see if its parsers
> are reusable.
> 
> That's not the case out of the box but it looks like refactoring Oak's
> SQL2Parser and XPathToSQL2Converter to provide access to the abstract
> syntax tree that they generate wouldn't be too hard.
> 
> Going this way would allow us to reuse (a subset of) the JCR query
> languages, instead of inventing yet another one.
> 
Which is a) not possible today, b) a dangerous route as we have to
clearly define the subset and c) still ugly.

Also keep in mind that it's not just the provider that has to implement
it; for sorting, the resource resolver implementation also needs to be
aware of it.

I think we really should complete our abstraction.

Carsten
-- 
Carsten Ziegeler
Adobe Research Switzerland
cziegeler@apache.org
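
To illustrate the point above that the resource resolver has to be aware of sorting: a minimal sketch, assuming every provider returns its partial result already sorted by the same criterion, so the resolver only has to merge. The class and method names (SortedMergeSketch, merge) are hypothetical and not part of the draft API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch, not Sling code: merges per-provider results that are
// each already sorted by the same comparator.
final class SortedMergeSketch {

    /** One provider's remaining results plus the element currently at the front. */
    private static final class Cursor<T> {
        T head;
        final Iterator<T> rest;

        Cursor(Iterator<T> it) {
            this.rest = it;
            this.head = it.next(); // only created for non-empty iterators
        }
    }

    static <T> List<T> merge(List<List<T>> sortedLists, Comparator<? super T> order) {
        // The queue always exposes the cursor with the smallest head element.
        Comparator<Cursor<T>> byHead = (a, b) -> order.compare(a.head, b.head);
        PriorityQueue<Cursor<T>> queue = new PriorityQueue<>(byHead);
        for (List<T> list : sortedLists) {
            Iterator<T> it = list.iterator();
            if (it.hasNext()) {
                queue.add(new Cursor<>(it));
            }
        }
        List<T> result = new ArrayList<>();
        while (!queue.isEmpty()) {
            Cursor<T> c = queue.poll();
            result.add(c.head);
            if (c.rest.hasNext()) {
                c.head = c.rest.next();
                queue.add(c); // re-insert with its new head
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> perProvider = Arrays.asList(
                Arrays.asList("a", "d"),   // e.g. results from provider 1
                Arrays.asList("b", "c"));  // e.g. results from provider 2
        System.out.println(merge(perProvider, Comparator.<String>naturalOrder()));
    }
}
```

The merge only works if the resolver knows the sort criterion the providers used, which is why the sort definition has to be part of the abstraction rather than hidden inside a provider-specific query string.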

RE: [RT] New resource query API

Posted by Stefan Seifert <ss...@pro-vision.de>.
>FWIW I've played a bit with the Oak query code to see if its parsers
>are reusable.
>
>That's not the case out of the box but it looks like refactoring Oak's
>SQL2Parser and XPathToSQL2Converter to provide access to the abstract
>syntax tree that they generate wouldn't be too hard.
>
>Going this way would allow us to reuse (a subset of) the JCR query
>languages, instead of inventing yet another one.

with what goal? to not use a fluent API but stick with a query string and a new query syntax that is derived from oak query syntax, to help non-jcr providers parse the oak-style query and map it to their own query API?

in my view the new query api was not intended to be a new query language, but a fluent API that makes it easier to build queries, which are then mapped to whatever query language the underlying persistence supports.

if compared to relational databases and JPA: this is the same reason why there is a query language JPQL and a fluent Criteria API to do the same but without a query string. the first is more compact, the latter is easier to use for new users, and less error-prone. e.g. the fluent API takes care of parameter value escaping and other things, which is not the case if you build the query by hand (no "sql injection" attacks).

and putting limits on what the query supports (compared to oak/JCR queries) was done by carsten by design. the goal was to support ~90% of all typical queries, not the most complex ones supported by oak/JCR, to make it easier for resource resolver implementations with not-so-elaborate persistence engines. if we re-use the oak/JCR syntax we have to support 100% or end in a mess with only partial support in some providers.

stefan
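
A minimal sketch of the fluent-API argument above, under stated assumptions: the types Query and Builder and the method toSql2 are hypothetical and are not the draft API. It shows a caller building an immutable query object without touching a query string, and a JCR-backed provider rendering it to a SQL2-style string while escaping the value itself.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: none of these types are part of the Sling API.
final class FluentQuerySketch {

    /** A single "property = value" condition. */
    static final class Condition {
        final String property;
        final String value;

        Condition(String property, String value) {
            this.property = property;
            this.value = value;
        }
    }

    /** Immutable query description; each provider maps it to its own query language. */
    static final class Query {
        final String path;
        final List<Condition> conditions;

        Query(String path, List<Condition> conditions) {
            this.path = path;
            this.conditions = Collections.unmodifiableList(new ArrayList<>(conditions));
        }
    }

    static final class Builder {
        private String path = "/";
        private final List<Condition> conditions = new ArrayList<>();

        Builder at(String path) {
            this.path = path;
            return this;
        }

        Builder propertyEquals(String name, String value) {
            conditions.add(new Condition(name, value));
            return this;
        }

        Query build() {
            return new Query(path, conditions);
        }
    }

    /** What a JCR-backed provider might do: render a SQL2-style string, escaping values. */
    static String toSql2(Query q) {
        StringBuilder sb = new StringBuilder("SELECT * FROM [nt:base] AS r WHERE ISDESCENDANTNODE(r, '")
                .append(escape(q.path)).append("')");
        for (Condition c : q.conditions) {
            // Property names are assumed to be trusted here; only values are escaped.
            sb.append(" AND r.[").append(c.property).append("] = '").append(escape(c.value)).append("'");
        }
        return sb.toString();
    }

    /** Doubles single quotes so a value cannot terminate the string literal. */
    static String escape(String s) {
        return s.replace("'", "''");
    }

    public static void main(String[] args) {
        Query q = new Builder().at("/content").propertyEquals("title", "O'Reilly").build();
        System.out.println(toSql2(q));
    }
}
```

A non-JCR provider would ignore toSql2 entirely and walk the same Query object to produce whatever its own persistence layer understands.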

Re: [RT] New resource query API

Posted by Bertrand Delacretaz <bd...@apache.org>.
On Wed, May 27, 2015 at 10:35 AM, Bertrand Delacretaz
<bd...@apache.org> wrote:
> ...I'd
> like to discuss what the alternatives are, before we invent YAQA (*)...

FWIW I've played a bit with the Oak query code to see if its parsers
are reusable.

That's not the case out of the box but it looks like refactoring Oak's
SQL2Parser and XPathToSQL2Converter to provide access to the abstract
syntax tree that they generate wouldn't be too hard.

Going this way would allow us to reuse (a subset of) the JCR query
languages, instead of inventing yet another one.

-Bertrand

> (*) Yet Another Query API ;-)

Re: [RT] New resource query API

Posted by Carsten Ziegeler <cz...@apache.org>.
Am 27.05.15 um 10:35 schrieb Bertrand Delacretaz:
> Hi,
> 
> On Mon, May 18, 2015 at 8:17 AM, Carsten Ziegeler <cz...@apache.org> wrote:
>> The current resource query api has several problems:
>> - it's using the JCR spec to define a query..
> 
> Why is that a problem?
> Creating a good query API is hard work, so I'd be much more in favor
> of reusing an existing query API than inventing our own.
> 
> AFAIK Oak parses queries to the internal JCR query object model, so
> translating that to a subset that random ResourceProviders can
> implement should be possible.

The goal of this api is to do what I call technical queries, i.e. queries
we do in our code. It's not necessarily meant to be used for user-facing
queries. Instead of using a subset of an api that doesn't look too appealing
to me, I would rather stay within the resource api.

> 
>> ...Obviously this is a reduced set compared to the full fledged jcr search
>> api, however it should be suitable for the majority of use cases...
> 
> IMO it's impossible to validate such a query API in the abstract,
> without having examples of how queries look, based on a set of
> realistic use cases.

We have queries in Sling, and we already have people contributing within
this thread, so I guess this works out fine.
> 
> I'm happy to collaborate on creating these examples (which can simply
> be unit tests for a relevant ResourceProvider) but before that I'd
> like to discuss what the alternatives are, before we invent YAQA (*).
> 
Make a proposal and we can discuss it.

Thanks
Carsten
-- 
Carsten Ziegeler
Adobe Research Switzerland
cziegeler@apache.org

Re: [RT] New resource query API

Posted by Bertrand Delacretaz <bd...@apache.org>.
Hi,

On Mon, May 18, 2015 at 8:17 AM, Carsten Ziegeler <cz...@apache.org> wrote:
> The current resource query api has several problems:
> - it's using the JCR spec to define a query..

Why is that a problem?
Creating a good query API is hard work, so I'd be much more in favor
of reusing an existing query API than inventing our own.

AFAIK Oak parses queries to the internal JCR query object model, so
translating that to a subset that random ResourceProviders can
implement should be possible.

> - it's not clear which queries are supported by providers..

Agreed, that's a difficult one to solve, especially with a query that
spans multiple providers.

> ...Obviously this is a reduced set compared to the full fledged jcr search
> api, however it should be suitable for the majority of use cases...

IMO it's impossible to validate such a query API in the abstract,
without having examples of how queries look, based on a set of
realistic use cases.

I'm happy to collaborate on creating these examples (which can simply
be unit tests for a relevant ResourceProvider) but before that I'd
like to discuss what the alternatives are, before we invent YAQA (*).

-Bertrand

(*) Yet Another Query API ;-)

RE: [RT] New resource query API

Posted by Stefan Seifert <ss...@pro-vision.de>.
>> 6. property conditions with deep property paths should be supported as well
>> - if the underlying provider supports it. so .property() could optionally
>> accept a path to a deeper property. to clarify in javadocs.
>
>I'm not sure if we should go there, so your use case is searching for a
>resource which has a child resource that has a property foo=bar (or
>something like that)?

yes - this is supported by xpath queries currently, and - at least in jackrabbit 2 - it performed well when combined with a lucene index configuration for the search index that includes a certain level of child nodes.

i looked in some of our old projects as well and found this use case in some places.

stefan
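
A minimal sketch of what "a path to a deeper property" could mean for a simple provider, assuming resources are modelled as nested maps (child resources are map values). The class DeepPropertySketch and its resolve method are hypothetical and only illustrate the lookup, not any indexing.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: resources as nested maps, not the Sling Resource API.
final class DeepPropertySketch {

    /**
     * Resolves a relative property path such as "child/foo" by descending
     * through child maps. Returns null if any segment is missing.
     */
    static Object resolve(Map<String, Object> resource, String relPath) {
        Object current = resource;
        for (String segment : relPath.split("/")) {
            if (!(current instanceof Map)) {
                return null; // tried to descend into a plain value or a missing child
            }
            current = ((Map<?, ?>) current).get(segment);
        }
        return current;
    }

    public static void main(String[] args) {
        Map<String, Object> child = new HashMap<>();
        child.put("foo", "bar");
        Map<String, Object> resource = new HashMap<>();
        resource.put("child", child);

        // Condition "child/foo = bar" holds for this resource.
        System.out.println("bar".equals(resolve(resource, "child/foo")));
    }
}
```

A provider without such nesting could simply treat any property name containing a slash as "no match", which is one way to read "if the underlying provider supports it".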

RE: [RT] New resource query API

Posted by Stefan Seifert <ss...@pro-vision.de>.
>I changed my mind and went with property("*") - the main reason is to
>keep the query interface smaller as that needs to be interpreted by the
>providers.

ok

stefan 

Re: [RT] New resource query API

Posted by Carsten Ziegeler <cz...@apache.org>.
Am 26.05.15 um 22:10 schrieb Stefan Seifert:
> 
>>> 5. full text search on any property (jcr:contains) - is this possible with
>>> .property("*").approx("searchterm")? or perhaps something
>>> like .anyProperty().approx("searchterm") - or a special signature
>>> like .anyPropertyApprox("searchterm")?
>>
>> Haven't thought about this one yet, but I guess a special signature
>> sounds better.
> 
> do you want to add it? i've not found it in the updated API.
> 
I changed my mind and went with property("*") - the main reason is to
keep the query interface smaller as that needs to be interpreted by the
providers.


Carsten
-- 
Carsten Ziegeler
Adobe Research Switzerland
cziegeler@apache.org
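
A minimal sketch of how a provider might interpret property("*") together with approx(...), assuming that "approx" means a case-insensitive substring match (the thread does not define its exact semantics). WildcardPropertySketch and its approx method are hypothetical names; resources are modelled as plain property maps.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: one code path handles both a named property and "*".
final class WildcardPropertySketch {

    /** Returns true if the named property (or any property, for "*") contains the term. */
    static boolean approx(Map<String, Object> props, String propertyName, String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        if ("*".equals(propertyName)) {
            for (Object value : props.values()) {
                if (matches(value, needle)) {
                    return true;
                }
            }
            return false;
        }
        return matches(props.get(propertyName), needle);
    }

    // Assumption: "approx" is a case-insensitive substring match on the string form.
    private static boolean matches(Object value, String needle) {
        return value != null && value.toString().toLowerCase(Locale.ROOT).contains(needle);
    }

    public static void main(String[] args) {
        Map<String, Object> props = new LinkedHashMap<>();
        props.put("title", "Hello Sling");
        props.put("description", "A resource-based framework");

        System.out.println(approx(props, "*", "sling"));     // any property
        System.out.println(approx(props, "title", "frame")); // only the title
    }
}
```

Because the wildcard is just a special property name, the provider-facing interface needs no extra method, which matches the stated reason for preferring property("*") over a dedicated signature.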

RE: [RT] New resource query API

Posted by Stefan Seifert <ss...@pro-vision.de>.
>> 5. full text search on any property (jcr:contains) - is this possible with
>> .property("*").approx("searchterm")? or perhaps something
>> like .anyProperty().approx("searchterm") - or a special signature
>> like .anyPropertyApprox("searchterm")?
>
>Haven't thought about this one yet, but I guess a special signature
>sounds better.

do you want to add it? i've not found it in the updated API.

stefan

Re: [RT] New resource query API

Posted by Carsten Ziegeler <cz...@apache.org>.
Thanks for your feedback Stefan, more inline...

Am 21.05.15 um 13:00 schrieb Stefan Seifert:
> 
> 1. Query can reference further Queries for nesting with and/or expressions. in this case the sort* methods do not make sense. perhaps the two sort methods should be moved to QueryInstructions?

Yes, I had it this way in my first draft, but for some reason (which I
can't remember...) decided against it. Maybe moving is better.

> 
> 2. it would be useful not only to filter by property values, but by node/resource name as well (e.g. resource name = X)

ah good one.

> 
> 3. I suppose the isA could not only mean a resource type but a JCR primary type as well if the resource provider supports it? to clarify in javadocs.

Yep

> 
> 4. perhaps we should not use arrays in public interfaces as output values, they have always the problem with immutability (e.g. Query.getPaths etc.)

ok

> 
> 5. full text search on any property (jcr:contains) - is this possible with .property("*").approx("searchterm")? or perhaps something
> like .anyProperty().approx("searchterm") - or a special signature
> like .anyPropertyApprox("searchterm")?

Haven't thought about this one yet, but I guess a special signature
sounds better.

> 
> 6. property conditions with deep property paths should be supported as well - if the underlying provider supports it. so .property() could optionally 
> accept a path to a deeper property. to clarify in javadocs.

I'm not sure if we should go there, so your use case is searching for a
resource which has a child resource that has a property foo=bar (or
something like that)?

> 
> 7. all in all this query supports a lot of features, although not all as it is possible with XPath etc. what happens if a resource provider can only 
> support a subset of those features?

The main idea is to cover most search use cases. Looking, for example, at
the queries we have in the Sling code base, I guess we can cover all of
those. So the set should be implementable by all providers, and we could
make this mandatory. Having optional parts is always problematic, as
you never know whether something is supported, and even asking beforehand
whether something is supported doesn't help if you get back a "no".
Therefore I personally would like to keep this simple and therefore
implementable by everyone.

Carsten

-- 
Carsten Ziegeler
Adobe Research Switzerland
cziegeler@apache.org
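
A minimal sketch of points 1 and 4 as discussed above, assuming sorting moves from Query to QueryInstructions and that getPaths returns an unmodifiable list instead of an array. The shapes of Query and QueryInstructions below (withLimit, sortAscending, sortDescending) are hypothetical and may differ from the draft in the whiteboard.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: not the draft API, only an illustration of the two points.
final class QueryInstructionsSketch {

    /** Immutable query; callers cannot change the paths after construction. */
    static final class Query {
        private final List<String> paths;

        Query(String... paths) {
            // Copy first, then wrap: later changes to the caller's array have no effect.
            this.paths = Collections.unmodifiableList(new ArrayList<>(Arrays.asList(paths)));
        }

        List<String> getPaths() {
            return paths;
        }
    }

    /** Execution hints kept outside the (nestable) query: limit and sort order. */
    static final class QueryInstructions {
        final int limit;
        final String sortProperty; // null means "no sorting requested"
        final boolean ascending;

        private QueryInstructions(int limit, String sortProperty, boolean ascending) {
            this.limit = limit;
            this.sortProperty = sortProperty;
            this.ascending = ascending;
        }

        static QueryInstructions withLimit(int limit) {
            return new QueryInstructions(limit, null, true);
        }

        QueryInstructions sortAscending(String property) {
            return new QueryInstructions(limit, property, true);
        }

        QueryInstructions sortDescending(String property) {
            return new QueryInstructions(limit, property, false);
        }
    }

    public static void main(String[] args) {
        Query q = new Query("/content", "/apps");
        QueryInstructions i = QueryInstructions.withLimit(10).sortDescending("jcr:created");
        System.out.println(q.getPaths() + " limit=" + i.limit + " sort=" + i.sortProperty);
    }
}
```

With sorting held in the instructions, a nested sub-query never carries a sort of its own, so the question of what a sort on an inner query means does not arise.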

RE: [RT] New resource query API

Posted by Stefan Seifert <ss...@pro-vision.de>.
hello carsten.

some feedback:

1. Query can reference further Queries for nesting with and/or expressions. in this case the sort* methods do not make sense. perhaps the two sort methods should be moved to QueryInstructions?

2. it would be useful not only to filter by property values, but by node/resource name as well (e.g. resource name = X)

3. I suppose the isA could not only mean a resource type but a JCR primary type as well if the resource provider supports it? to clarify in javadocs.

4. perhaps we should not use arrays in public interfaces as output values, as they are always mutable and so undermine immutability (e.g. Query.getPaths etc.)

5. full text search on any property (jcr:contains) - is this possible with .property("*").approx("searchterm")? or perhaps something like .anyProperty().approx("searchterm") - or a special signature like .anyPropertyApprox("searchterm")?

6. property conditions with deep property paths should be supported as well - if the underlying provider supports it. so .property() could optionally accept a path to a deeper property. to clarify in javadocs.

7. all in all this query supports a lot of features, although not everything that is possible with XPath etc. what happens if a resource provider can only support a subset of those features?


stefan


>-----Original Message-----
>From: Carsten Ziegeler [mailto:cziegeler@apache.org]
>Sent: Monday, May 18, 2015 8:17 AM
>To: Sling Developers
>Subject: [RT] New resource query API
>
>The current resource query api has several problems:
>- it's using the JCR spec to define a query
>- it's not clear which queries are supported by providers
>- queries are string based
>- implementing queries in a resource provider is way too hard as this
>would require to implement the complete jcr query api.
>
>I've created a draft for a new, object based API at [1]. The main idea
>is to use a builder pattern to create Query objects. These are immutable
>and have a unique identifier. The QueryManager service can be used to
>execute a query in the context of a resource resolver. The manager
>delegates the query to the providers. As each Query object has this
>identifier, implementations can use this to cache the parsing of the query.
>In addition to the query object you can pass in query instructions to
>specify a limit or range for the query.
>
>Obviously this is a reduced set compared to the full fledged jcr search
>api, however it should be suitable for the majority of use cases.
>
>[1]
>https://svn.apache.org/repos/asf/sling/whiteboard/cziegeler/api-v3/src/main/java/org/apache/sling/api/resource/query/
>
>Regards
>Carsten
>--
>Carsten Ziegeler
>Adobe Research Switzerland
>cziegeler@apache.org