Posted to dev@sling.apache.org by Bertrand Delacretaz <bd...@apache.org> on 2015/06/15 10:44:29 UTC

New Query API - in a distinct bundle? (was [jira] [Commented] (SLING-4752) New resource query API)

Hi,

On Sat, Jun 13, 2015 at 5:50 PM, Carsten Ziegeler (JIRA)
<ji...@apache.org> wrote:
> ...Carsten Ziegeler commented on SLING-4752:
> I've moved the prototype api to trunk...

I don't feel we have strong agreement on doing that. This new query
API has been heavily discussed, but I don't see an emerging consensus
to add it to our core API.

Can we move this to a separate query-api bundle in order to avoid
polluting our "sacred" core API bundle with something on which we
don't strongly agree?

-Bertrand

Re: New Query API - in a distinct bundle? (was [jira] [Commented] (SLING-4752) New resource query API)

Posted by Alexander Klimetschek <ak...@adobe.com>.

On 24.06.2015, at 23:10, Justin Edelson <ju...@justinedelson.com> wrote:
> Agree, but I (and perhaps you disagree) would think this behavior would be
> totally understandable and we could make it transparent what was happening,
> i.e. have a 'show plan' output.

But note that the resource resolver level has no idea how the individual providers implement the search, or whether and which of their indexes they use, unless you come up with a generic search index API that is exposed by the providers (which I don't think is a good idea).

> How do you see this working with the existing Sling API (i.e. before this
> addition)? Would it look like:
> 
> resourceResolver.findResources("SOLR", <some Solr syntax query>)

I guess you are referring to "access 3rd party search index that indexes all resource providers". In that case, I'm not sure you need to integrate it into the resource resolver API; you'd talk to the 3rd party search API directly, and it should return something including resource paths that you can then look up using resolver.getResource().

For the other case, a query that one resource provider understands but not all of them do, as is the case with JCR today, yes, that is what I mean.
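
The first case above (querying an external index directly and resolving the returned paths) might look roughly like the following sketch. ExternalIndex and IndexBackedSearch are hypothetical names, not part of Sling or of any search product; the lookup function stands in for a call such as resolver.getResource(path), assumed to return null for paths that cannot be resolved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for a third-party search client (for example a
// thin wrapper around Solr); it returns resource paths, not resources.
interface ExternalIndex {
    List<String> searchPaths(String query);
}

final class IndexBackedSearch {
    // `getResource` stands in for a lookup such as resolver.getResource(path),
    // assumed to return null when the path cannot be resolved.
    static <R> List<R> find(ExternalIndex index,
                            Function<String, R> getResource,
                            String query) {
        List<R> resources = new ArrayList<>();
        for (String path : index.searchPaths(query)) {
            R resource = getResource.apply(path);
            if (resource != null) {   // skip stale or unreadable index entries
                resources.add(resource);
            }
        }
        return resources;
    }
}
```

With this shape the resource resolver API needs no query support at all; the index client and the resolver stay independent.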

Cheers,
Alex

Re: New Query API - in a distinct bundle? (was [jira] [Commented] (SLING-4752) New resource query API)

Posted by Justin Edelson <ju...@justinedelson.com>.

Hi,
On Wed, Jun 24, 2015 at 9:01 PM Alexander Klimetschek <ak...@adobe.com>
wrote:

> On 23.06.2015, at 15:22, Justin Edelson <ju...@justinedelson.com> wrote:
> >>
> https://docs.adobe.com/docs/en/cq/5-6-1/javadoc/com/day/cq/search/eval/RelativeDateRangePredicateEvaluator.html
> >>
> >> This is not the Query API. This is the SPI.
> >
> > Yes, I know this is the SPI of the QueryBuilder. My point is that because
> > the current Sling Query API is all strongly typed, there's no way to
> extend
> > it with custom predicates like this.
>
> Oh, you referred to "Sling query API". I thought with "query API" you meant
> the "AEM querybuilder API" :)
>

Yes... the wording is very complicated because both are "Query APIs" and
both have something called a "QueryBuilder". Not surprised I missed
qualifying one or two :)


>
> My comments were independent from the current sling query API proposal.
>
> > Perhaps this extensibility is not desired in Sling, but IMHO it certainly
> > is one advantage of the (AEM) QueryBuilder.
> >
> > And if we don't have it in Sling, it only makes the developer decision as
> > to what query abstraction to use that much more complicated.
>
> Right.
>
> >> Could be a new predicate:
> >>
> >> compare.left=jcr:title
> >> compare.right=jcr:description
> >>
> >
> > It could be in the AEM QueryBuilder, but this isn't something the Sling
> > Query API can support.
>
> Ok.
>
> >> Once you join/merge results across different resource providers, you
> will
> >> never be able to get acceptable performance. And the implementation is
> no
> >> longer resource provider specific, since you need someone on the
> resource
> >> resolver level to understand the query.
> >
> > I'm not sure why the performance would be suboptimal in this case unless
> > sorting was involved.
>
> A true join, like "where a.value = b.value", with a and b coming from
> different resource providers.
>

Ah, that kind of join. Yes, I agree.


>
> Also, the overhead of separate index lookups (instead of 1 index, you look
> at N = number of resource providers), especially for full text searches,
> should not be neglected.
>

Agree, but I (and perhaps you disagree) would think this behavior would be
totally understandable and we could make it transparent what was happening,
i.e. have a 'show plan' output.


>
> And sorting is not that uncommon :) Especially if you have different
> buckets (resource providers) - do you always want to return them in their
> rp registration order? What use case would be solved by that and ok with it?
>
> > This predicate list would map to three queries (in
> > the JCR + Mongo use case):
> >
> > //element(*, dam:Asset)[jcr:contains(., 'Management')]
> > //element(*, nt:base)[@sling:resourceType='some/resource/type' and
> > jcr:contains(., 'Management')]
> > { 'sling:resourceType' : { $eq : 'some/resource/type' }, $text :
> > { $search : 'Management' } }
> >
> > And you wouldn't actually need to execute all three queries at once
> (unless
> > you needed sizing information) - just return some kind of lazy executor
> > which went through each result set before executing one query.
> >
> > The performance for this would be as good as could be expected.
>
> Depending on 3 separately configured search indexes that work completely
> different… which sounds difficult to me, and a central, external search
> index should be much more manageable and efficient.
>
> > But let's be clear - query is always going to be a highly leaky
> > abstraction. Even querying against the JCR API directly is very leaky at
> > this point in Oak because you really need to know the indexes available
> in
> > the system in order to know that a query is going to perform well. Ditto
> > with MongoDB or any other queryable system.
>
> Sure.
>
> > I don't disagree that a centralized index would be a better functional
> > match, albeit with additional operational complexity. I don't think
> there's
> > anything in the model I proposed which would preclude the
> ResourceResolver
> > from handing the query off directly to Solr instead of passing it down to
> > the ResourceProviders.
>
> Which would not require a new sling query API.
>

How do you see this working with the existing Sling API (i.e. before this
addition)? Would it look like:

resourceResolver.findResources("SOLR", <some Solr syntax query>)

??

Regards,
Justin




>
> Cheers,
> Alex

Re: New Query API - in a distinct bundle? (was [jira] [Commented] (SLING-4752) New resource query API)

Posted by Alexander Klimetschek <ak...@adobe.com>.

On 23.06.2015, at 15:22, Justin Edelson <ju...@justinedelson.com> wrote:
>> https://docs.adobe.com/docs/en/cq/5-6-1/javadoc/com/day/cq/search/eval/RelativeDateRangePredicateEvaluator.html
>> 
>> This is not the Query API. This is the SPI.
> 
> Yes, I know this is the SPI of the QueryBuilder. My point is that because
> the current Sling Query API is all strongly typed, there's no way to extend
> it with custom predicates like this.

Oh, you referred to "Sling query API". I thought with "query API" you meant the "AEM querybuilder API" :)

My comments were independent from the current sling query API proposal.

> Perhaps this extensibility is not desired in Sling, but IMHO it certainly
> is one advantage of the (AEM) QueryBuilder.
> 
> And if we don't have it in Sling, it only makes the developer decision as
> to what query abstraction to use that much more complicated.

Right.

>> Could be a new predicate:
>> 
>> compare.left=jcr:title
>> compare.right=jcr:description
>> 
> 
> It could be in the AEM QueryBuilder, but this isn't something the Sling
> Query API can support.

Ok.

>> Once you join/merge results across different resource providers, you will
>> never be able to get acceptable performance. And the implementation is no
>> longer resource provider specific, since you need someone on the resource
>> resolver level to understand the query.
> 
> I'm not sure why the performance would be suboptimal in this case unless
> sorting was involved.

A true join, like "where a.value = b.value", with a and b coming from different resource providers.

Also, the overhead of separate index lookups (instead of 1 index, you look at N = number of resource providers), especially for full text searches, should not be neglected.

And sorting is not that uncommon :) Especially if you have different buckets (resource providers): do you always want to return them in their rp registration order? What use case would be solved by that, or would be OK with it?

> This predicate list would map to three queries (in
> the JCR + Mongo use case):
> 
> //element(*, dam:Asset)[jcr:contains(., 'Management')]
> //element(*, nt:base)[@sling:resourceType='some/resource/type' and
> jcr:contains(., 'Management')]
> { 'sling:resourceType' : { $eq : 'some/resource/type' }, $text :
> { $search : 'Management' } }
> 
> And you wouldn't actually need to execute all three queries at once (unless
> you needed sizing information) - just return some kind of lazy executor
> which went through each result set before executing one query.
> 
> The performance for this would be as good as could be expected.

That depends on 3 separately configured search indexes that work completely differently, which sounds difficult to me; a central, external search index should be much more manageable and efficient.

> But let's be clear - query is always going to be a highly leaky
> abstraction. Even querying against the JCR API directly is very leaky at
> this point in Oak because you really need to know the indexes available in
> the system in order to know that a query is going to perform well. Ditto
> with MongoDB or any other queryable system.

Sure.

> I don't disagree that a centralized index would be a better functional
> match, albeit with additional operational complexity. I don't think there's
> anything in the model I proposed which would preclude the ResourceResolver
> from handing the query off directly to Solr instead of passing it down to
> the ResourceProviders.

Which would not require a new sling query API.

Cheers,
Alex

Re: New Query API - in a distinct bundle? (was [jira] [Commented] (SLING-4752) New resource query API)

Posted by Justin Edelson <ju...@justinedelson.com>.

Hi,

On Tue, Jun 23, 2015 at 8:49 PM Alexander Klimetschek <ak...@adobe.com>
wrote:

> > On 22.06.2015, at 15:49, Justin Edelson <ju...@justinedelson.com>
> wrote:
> > IIUC, the core problem we are trying to solve is to provide a query
> syntax
> > independent of any particular ResourceResolver implementation. While, to be
> > honest, this is not a problem I have personally run into using Sling for
> > the past 6 years,
>
> That's my main concern as well. An edge case creating a ton of complexity
> with probably leaky abstractions (inevitable little tricks to pass through
> query language/resource provider specific stuff, the AEM query builder
> experienced this already with the @orderby statement and fn: functions).
>
> > One thing which concerns me about the current Query API is that it
> appears
> > to be completely non-extensible. How, for example, would one implement
> > something like
> >
> https://docs.adobe.com/docs/en/cq/5-6-1/javadoc/com/day/cq/search/eval/RelativeDateRangePredicateEvaluator.html
>
> This is not the Query API. This is the SPI.


Yes, I know this is the SPI of the QueryBuilder. My point is that because
the current Sling Query API is all strongly typed, there's no way to extend
it with custom predicates like this. In order to add this, the Query API
itself would need to be modified.

Perhaps this extensibility is not desired in Sling, but IMHO it certainly
is one advantage of the (AEM) QueryBuilder.

And if we don't have it in Sling, it only makes the developer decision as
to what query abstraction to use that much more complicated.


> But yes, you would need a way to have a different SPI per resource
> provider. Currently a PredicateEvaluator [1] has a single
> getXpathExpression().
>
> [1]
> https://docs.adobe.com/docs/en/aem/6-1/ref/javadoc/com/day/cq/search/eval/PredicateEvaluator.html
>
> > ? If I'm reading this correctly, the date math has to be done by the
> > caller. Which isn't that problematic at first, but the code would be
> > significantly more verbose than
> >
> > relativedaterange.property=jcr:lastModified
> > relativedaterange.lowerBound=-1d
>
> Some of that common parsing logic should be shared, of course, used by the
> different SPIs.


> > What is potentially problematic about not having this type of
> extensibility
> > is that it prevents specific implementations from providing the best
> > implementation possible.
>
> Yep, the AEM querybuilder so far was not designed for different underlying
> query languages / engines, this would be something to look into.
>
> Its design goal was to allow customers to plugin own predicate evaluators
> mainly for making client side queries short and descriptive, and have them
> "expanded" into the full, maybe more complex xpath query involving multiple
> predicates or some custom parsing as the date example.
>
> Taken plain, this would lead to a matrix of things, predicate evaluators X
> query languages (resource providers). Not sure if this is desirable.
>
> > Here's a better example: JCR is unable to compare two properties, i.e.
> give
> > me all nodes where property foo equals the value of property bar. But
> > MongoDB *can* do this (it isn't super-efficient, but it is possible). I
> can
> > almost see how you would do this with the new Query API, but it would be
> > ugly at best. Or, more broadly, how would the MongoDB $where operator be
> > supported?
>
> Could be a new predicate:
>
> compare.left=jcr:title
> compare.right=jcr:description
>

It could be in the AEM QueryBuilder, but this isn't something the Sling
Query API can support.


>
> > 1) A map of key/value pairs is turned into a PredicateGroup object.
> > 2) The PredicateGroup (which is a nested tree) at this point represents
> the
> > query statement.
> > 3) Each ResourceProvider analyzes the predicates and decides whether or
> not
> > it knows how to evaluate all of them. If it can't, it should return no
> > results (this is debatable, but I think it makes sense). The only
> exception
> > is where you had an or clause, i.e. this query:
> >
> > fulltext=Management
> > group.p.or=true
> > group.1_jcrType=dam:Asset
> > group.2_resourceType=some/resource/type
>
> Yep, these tend to be joins.
>
> Once you join/merge results across different resource providers, you will
> never be able to get acceptable performance. And the implementation is no
> longer resource provider specific, since you need someone on the resource
> resolver level to understand the query.
>

I'm not sure why the performance would be suboptimal in this case unless
sorting was involved. This predicate list would map to three queries (in
the JCR + Mongo use case):

//element(*, dam:Asset)[jcr:contains(., 'Management')]
//element(*, nt:base)[@sling:resourceType='some/resource/type' and
jcr:contains(., 'Management')]
{ 'sling:resourceType' : { $eq : 'some/resource/type' }, $text :
{ $search : 'Management' } }

And you wouldn't actually need to execute all three queries at once (unless
you needed sizing information) - just return some kind of lazy executor
which went through each result set before executing the next query.

The performance for this would be as good as could be expected.
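
A lazy executor of this kind could be sketched as follows. LazyQueryChain is a hypothetical name, not an existing Sling class; each Supplier stands in for one provider-specific query and is only invoked once the previous result set has been consumed.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Supplier;

// Hypothetical helper: chains several queries and runs each one only when
// the results of the previous ones have been consumed.
final class LazyQueryChain<T> implements Iterator<T> {
    private final Iterator<Supplier<Iterator<T>>> pending;
    private Iterator<T> current = Collections.emptyIterator();

    LazyQueryChain(List<Supplier<Iterator<T>>> queries) {
        this.pending = queries.iterator();
    }

    @Override
    public boolean hasNext() {
        // Execute the next query only once the current result set is exhausted.
        while (!current.hasNext() && pending.hasNext()) {
            current = pending.next().get();
        }
        return current.hasNext();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return current.next();
    }
}
```

A caller that stops after the first page of results never triggers the remaining queries; a total result count, on the other hand, would force all of them to run.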

But let's be clear - query is always going to be a highly leaky
abstraction. Even querying against the JCR API directly is very leaky at
this point in Oak because you really need to know the indexes available in
the system in order to know that a query is going to perform well. Ditto
with MongoDB or any other queryable system.


>
> Here a central search index (Solr, ElasticSearch etc.) is the right
> solution anyway. And that's what I am preaching, anyone who actually has
> the use case of searching across multiple resource providers with the same
> query language should do this.
>

I don't disagree that a centralized index would be a better functional
match, albeit with additional operational complexity. I don't think there's
anything in the model I proposed which would preclude the ResourceResolver
from handing the query off directly to Solr instead of passing it down to
the ResourceProviders.


>
> If the use case is "one resource provider" only, then IMO you can live
> with rp specific query languages, and the current findResources() is fine
> (as long as you can put the query statement in a single string).
>
> > 4) The ResourceProvider uses PredicateEvaluators to map each predicate to
> > its native query syntax. For this to work, each ResourceProvider would
> > expose its own PredicateEvaluator interface (in theory,
> > a ResourceProvider doesn't need to do this if the evaluation process
> isn't
> > intended to be pluggable).
>
> The PredicateEvaluator SPI could be rp specific and not part of the sling
> resource query API.
>

Yes, this is exactly what I'm thinking.
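
As a rough illustration of a provider-specific SPI, the same predicate type could be implemented once per query language. JcrPredicateEvaluator is the name floated in this thread; MongoPredicateEvaluator, the method names and ResourceTypeEvaluators are assumptions made for this sketch only, and none of these types exist in Sling or AEM in this form.

```java
// Hypothetical provider-specific SPIs; neither interface exists in Sling
// or AEM in this form.
interface JcrPredicateEvaluator {
    String toXPath(String value);
}

interface MongoPredicateEvaluator {
    String toMongoFilter(String value);
}

// One predicate type ("resourceType") implemented once per query language.
// Values are inserted without escaping to keep the sketch short; a real
// implementation would have to escape quotes.
final class ResourceTypeEvaluators {
    static final JcrPredicateEvaluator JCR =
        value -> "@sling:resourceType='" + value + "'";

    static final MongoPredicateEvaluator MONGO =
        value -> "{ 'sling:resourceType' : { $eq : '" + value + "' } }";
}
```

Each additional predicate type needs one implementation per provider, which is the "predicate evaluators X query languages" matrix mentioned earlier in the thread.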

Regards,
Justin


>
> Cheers,
> Alex

Re: New Query API - in a distinct bundle? (was [jira] [Commented] (SLING-4752) New resource query API)

Posted by Alexander Klimetschek <ak...@adobe.com>.

> On 22.06.2015, at 15:49, Justin Edelson <ju...@justinedelson.com> wrote:
> IIUC, the core problem we are trying to solve is to provide a query syntax
> independent of any particular ResourceResolver implementation. While, to be
> honest, this is not a problem I have personally run into using Sling for
> the past 6 years,

That's my main concern as well. An edge case creating a ton of complexity with probably leaky abstractions (inevitable little tricks to pass through query language/resource provider specific stuff, the AEM query builder experienced this already with the @orderby statement and fn: functions).

> One thing which concerns me about the current Query API is that it appears
> to be completely non-extensible. How, for example, would one implement
> something like
> https://docs.adobe.com/docs/en/cq/5-6-1/javadoc/com/day/cq/search/eval/RelativeDateRangePredicateEvaluator.html

This is not the Query API. This is the SPI. But yes, you would need a way to have a different SPI per resource provider. Currently a PredicateEvaluator [1] has a single getXpathExpression().

[1] https://docs.adobe.com/docs/en/aem/6-1/ref/javadoc/com/day/cq/search/eval/PredicateEvaluator.html

> ? If I'm reading this correctly, the date math has to be done by the
> caller. Which isn't that problematic at first, but the code would be
> significantly more verbose than
> 
> relativedaterange.property=jcr:lastModified
> relativedaterange.lowerBound=-1d

Some of that common parsing logic should be shared, of course, used by the different SPIs.

> What is potentially problematic about not having this type of extensibility
> is that it prevents specific implementations from providing the best
> implementation possible.

Yep, the AEM querybuilder so far was not designed for different underlying query languages / engines; this would be something to look into.

Its design goal was to allow customers to plug in their own predicate evaluators, mainly for making client side queries short and descriptive, and have them "expanded" into the full, maybe more complex xpath query involving multiple predicates or some custom parsing, as in the date example.

Taken plain, this would lead to a matrix of things, predicate evaluators X query languages (resource providers). Not sure if this is desirable.

> Here's a better example: JCR is unable to compare two properties, i.e. give
> me all nodes where property foo equals the value of property bar. But
> MongoDB *can* do this (it isn't super-efficient, but it is possible). I can
> almost see how you would do this with the new Query API, but it would be
> ugly at best. Or, more broadly, how would the MongoDB $where operator be
> supported?

Could be a new predicate:

compare.left=jcr:title
compare.right=jcr:description

> 1) A map of key/value pairs is turned into a PredicateGroup object.
> 2) The PredicateGroup (which is a nested tree) at this point represents the
> query statement.
> 3) Each ResourceProvider analyzes the predicates and decides whether or not
> it knows how to evaluate all of them. If it can't, it should return no
> results (this is debatable, but I think it makes sense). The only exception
> is where you had an or clause, i.e. this query:
> 
> fulltext=Management
> group.p.or=true
> group.1_jcrType=dam:Asset
> group.2_resourceType=some/resource/type

Yep, these tend to be joins.

Once you join/merge results across different resource providers, you will never be able to get acceptable performance. And the implementation is no longer resource provider specific, since you need someone on the resource resolver level to understand the query.

Here a central search index (Solr, ElasticSearch etc.) is the right solution anyway. And that's what I am preaching: anyone who actually has the use case of searching across multiple resource providers with the same query language should do this.

If the use case is "one resource provider" only, then IMO you can live with rp specific query languages, and the current findResources() is fine (as long as you can put the query statement in a single string).

> 4) The ResourceProvider uses PredicateEvaluators to map each predicate to
> its native query syntax. For this to work, each ResourceProvider would
> expose its own PredicateEvaluator interface (in theory,
> a ResourceProvider doesn't need to do this if the evaluation process isn't
> intended to be pluggable).

The PredicateEvaluator SPI could be rp specific and not part of the sling resource query API.

Cheers,
Alex

Re: New Query API - in a distinct bundle? (was [jira] [Commented] (SLING-4752) New resource query API)

Posted by Carsten Ziegeler <cz...@apache.org>.

Thanks Justin for the detailed response. I guess we all have different
experience and have different use cases in mind.

I think we all agree that the current way of searching in the resource
API is tied to JCR, and that we don't have an abstraction for the query.
My main point is simple: we need this abstraction. I want to specify a
query and I don't want to care about the implementation or the storage.
You're right that there will be situations where the search does not run
in the best possible way because the abstraction does not allow it, and
yes, there is no extension mechanism. For the latter, as soon as there
is an extension mechanism you lose the abstraction. For the former,
well, this might be true. On the other hand, when ORM became popular
there was a long debate about whether a hand-crafted SQL query is more
efficient than the ones generated by the ORM tools, and in the end it
became clear that the generated ones were good enough, if not better. So
I don't see why this shouldn't work in this case as well. And if you
really want to run a specific query against a specific resource
provider, do that; don't use the abstraction.

Or in other words, the proposed API will not cover 100%; it might cover
60% in a nice way. And that alone is reason for me to go this way. Of
course we can be pessimistic and say people will try to use it for the
remaining 40% and fail. We could also say this about other things, like
the adapter pattern we have, which allows you to break out of the
abstraction. My use cases work pretty well with that new API and can be
efficiently implemented.

For the idea of donating the query builder - are there any concrete
plans? If this happened, who would do the refactoring? Where would
the refactoring take place, at Adobe or in Sling? We all agree that
throwing this code into Sling by itself does not help.

So whoever wants to get his hands dirty, please come up with a concrete
proposal which we can discuss.

Thanks
Carsten

On 22.06.15 at 17:49, Justin Edelson wrote:
> Hi,
> 
> 
> Apologies for not tracking this discussion, but I wanted to weigh in before
> things got much further.
> 
> IIUC, the core problem we are trying to solve is to provide a query syntax
> independent of any particular ResourceResolver implementation. While, to be
> honest, this is not a problem I have personally run into using Sling for
> the past 6 years, I can certainly see why it is one.
> 
> But I do think we have a good answer available which was Alex's original
> proposal to have Adobe donate the QueryBuilder code to Sling. Now the
> QueryBuilder code as-is wouldn't solve this problem; it would require a
> refactoring, but I believe this refactoring is manageable. This would have
> the following benefits:
> 
> 1) Adopt a syntax many (but certainly not all) Sling developers are
> familiar with.
> 2) Provide a path to avoid YAQL. While yes, in the near term we will have
> "Sling QueryBuilder" and "AEM QueryBuilder", the AEM QueryBuilder could be
> deprecated (obviously up to AEM Product Management) and eventually removed.
> 3) An opportunity to fix some of the issues with QueryBuilder (granted,
> this isn't necessarily Sling's problem to solve).
> 
> One thing which concerns me about the current Query API is that it appears
> to be completely non-extensible. How, for example, would one implement
> something like
> https://docs.adobe.com/docs/en/cq/5-6-1/javadoc/com/day/cq/search/eval/RelativeDateRangePredicateEvaluator.html
> ? If I'm reading this correctly, the date math has to be done by the
> caller. Which isn't that problematic at first, but the code would be
> significantly more verbose than
> 
> relativedaterange.property=jcr:lastModified
> relativedaterange.lowerBound=-1d
> 
> What is potentially problematic about not having this type of extensibility
> is that it prevents specific implementations from providing the best
> implementation possible. For example, let's say that MongoDB has a really
> efficient way to query for documents modified in the last day. If I do the
> date math in Java code, I'm making it that much harder for the MongoDB
> ResourceProvider to optimize this query (sorry, this isn't a great
> example, but it's late and I'm getting tired). Plus, the query isn't really
> expressing what I want -- I want to find resources modified in the last
> day, not from some absolute date. So someone reading my code later has to
> figure out what the calls to Calendar.add(Calendar.DAY_OF_MONTH, -1) are
> there for.
> 
> Here's a better example: JCR is unable to compare two properties, i.e. give
> me all nodes where property foo equals the value of property bar. But
> MongoDB *can* do this (it isn't super-efficient, but it is possible). I can
> almost see how you would do this with the new Query API, but it would be
> ugly at best. Or, more broadly, how would the MongoDB $where operator be
> supported?
> 
> The advantage of the AEM QueryBuilder's model is that figuring all of this
> stuff out isn't the responsibility of the platform developer. We just need
> to provide a solid basis and then let downstream users add their own hooks.
> As soon as you say that these are the only 8 operations anyone is ever
> going to do on a property or the 4 operations anyone is ever going to do on
> a resource, you're into "640k should be enough memory for anyone" territory.
> 
> So how specifically would the Sling QueryBuilder be different than the AEM
> QueryBuilder?
> 
> I think of QueryBuilder queries being processed in these separate steps
> (FWIW, none of this is proprietary information, it is based on public
> documentation):
> 
> 1) A map of key/value pairs is turned into a PredicateGroup object. While
> technically this step is optional (you can build a PredicateGroup by hand),
> it is pretty common. This would be common functionality across all
> ResourceResolvers and the code from AEM could probably be brought over
> as-is.
> 2) The PredicateGroup (which is a nested tree) at this point represents the
> query statement. It is then passed to the ResourceResolver (this part is
> somewhat different than the AEM QueryBuilder).
> 3) Each ResourceProvider analyzes the predicates and decides whether or not
> it knows how to evaluate all of them. If it can't, it should return no
> results (this is debatable, but I think it makes sense). The only exception
> is where you had an or clause, i.e. this query:
> 
> fulltext=Management
> group.p.or=true
> group.1_jcrType=dam:Asset
> group.2_resourceType=some/resource/type
> 
> If a non-JCR provider didn't know how to evaluate the jcrType predicate
> type, it could still evaluate the query because it is OR'd with a
> resourceType predicate (which let's say it does know how to evaluate). But
> if it didn't know how to evaluate the fulltext predicate type, it shouldn't
> return any results.
> 
> 4) The ResourceProvider uses PredicateEvaluators to map each predicate to
> its native query syntax. For this to work, each ResourceProvider would
> expose its own PredicateEvaluator interface (in theory,
> a ResourceProvider doesn't need to do this if the evaluation process isn't
> intended to be pluggable). IOW, the current AEM PredicateEvaluator
> interface would be renamed JcrPredicateEvaluator.
> 5) At least in JCR (based on current functionality), some Predicates can't
> be evaluated in a native query (i.e. XPath) and will need to be handled as
> filters on the result set, but this is an implementation detail left to
> the ResourceProvider.
> 6) The ResourceProvider returns results to the ResourceResolver.
> 7) Sorting is handled (or not) as currently proposed.
> 
> To be clear, I don't have a concrete proposal for how to replicate (or not)
> AEM QueryBuilder's facet support. Alex might...
> 
> Regards,
> Justin
> 
> 
>>
>> Cheers,
>> Alex
>>
> 


-- 
Carsten Ziegeler
Adobe Research Switzerland
cziegeler@apache.org
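
Step 3 of the proposal quoted above (a provider decides whether it can evaluate a predicate tree, with OR groups as the exception) can be sketched as below. QueryNode, QueryPredicate and QueryGroup are hypothetical names for illustration, not the AEM QueryBuilder classes, and the rule encoded here is one reading of the quoted text: an AND group needs every child to be evaluable, an OR group needs at least one.

```java
import java.util.List;
import java.util.Set;

// Hypothetical model of a predicate tree; these are not the AEM
// QueryBuilder classes.
abstract class QueryNode {
    // True if a provider that knows `knownTypes` can evaluate this node.
    abstract boolean evaluableBy(Set<String> knownTypes);
}

final class QueryPredicate extends QueryNode {
    final String type;   // e.g. "fulltext", "jcrType", "resourceType"
    final String value;

    QueryPredicate(String type, String value) {
        this.type = type;
        this.value = value;
    }

    @Override
    boolean evaluableBy(Set<String> knownTypes) {
        return knownTypes.contains(type);
    }
}

final class QueryGroup extends QueryNode {
    final boolean or;               // true for OR, false for AND
    final List<QueryNode> children;

    QueryGroup(boolean or, List<QueryNode> children) {
        this.or = or;
        this.children = children;
    }

    @Override
    boolean evaluableBy(Set<String> knownTypes) {
        for (QueryNode child : children) {
            boolean ok = child.evaluableBy(knownTypes);
            if (or && ok) {
                return true;   // OR: one evaluable branch is enough
            }
            if (!or && !ok) {
                return false;  // AND: every child must be evaluable
            }
        }
        // AND with no failing child is evaluable; OR with no usable branch is not.
        return !or;
    }
}
```

For the example query, a provider that knows fulltext and resourceType but not jcrType would still be asked to run it, while a provider without fulltext support would return no results. A provider that accepts an OR group with an unknown branch would also have to drop that branch before translating the query.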

Re: New Query API - in a distinct bundle? (was [jira] [Commented] (SLING-4752) New resource query API)

Posted by Justin Edelson <ju...@justinedelson.com>.

Hi,

On Mon, Jun 22, 2015 at 10:57 PM Alexander Klimetschek <ak...@adobe.com>
wrote:

> On 15.06.2015, at 02:23, Carsten Ziegeler <cz...@apache.org> wrote:
> >
> > It really seems that people who are not convinced have never felt the
> > current pain - while people who are on the pro side exactly felt this
> > pain and ran into the problems which this is trying to solve. I'm
> > absolutely unsure on how to solve that situation.
>
> I was asking this before: what are the pains and specific use cases?
>
> (Apart from the paging of results)
>

Apologies for not tracking this discussion, but I wanted to weigh in before
things got much further.

IIUC, the core problem we are trying to solve is to provide a query syntax
independent of any particular ResourceResolver implementation. While, to be
honest, this is not a problem I have personally run into using Sling for
the past 6 years, I can certainly see why it is one.

But I do think we have a good answer available which was Alex's original
proposal to have Adobe donate the QueryBuilder code to Sling. Now the
QueryBuilder code as-is wouldn't solve this problem; it would require a
refactoring, but I believe this refactoring is manageable. This would have
the following benefits:

1) Adopt a syntax many (but certainly not all) Sling developers are
familiar with.
2) Provide a path to avoid YAQL. While yes, in the near term we will have
"Sling QueryBuilder" and "AEM QueryBuilder", the AEM QueryBuilder could be
deprecated (obviously up to AEM Product Management) and eventually removed.
3) An opportunity to fix some of the issues with QueryBuilder (granted,
this isn't necessarily Sling's problem to solve).

One thing which concerns me about the current Query API is that it appears
to be completely non-extensible. How, for example, would one implement
something like
https://docs.adobe.com/docs/en/cq/5-6-1/javadoc/com/day/cq/search/eval/RelativeDateRangePredicateEvaluator.html
? If I'm reading this correctly, the date math has to be done by the
caller. Which isn't that problematic at first, but the code would be
significantly more verbose than

relativedaterange.property=jcr:lastModified
relativedaterange.lowerBound=-1d

What is potentially problematic about not having this type of extensibility
is that it prevents specific implementations from providing the best
implementation possible. For example, let's say that MongoDB has a really
efficient way to query for documents modified in the last day. If I do the
date math in Java code, I'm making it that much harder for the MongoDB
ResourceProvider to optimize this query (sorry, this isn't a great
example, but it's late and I'm getting tired). Plus, the query isn't really
expressing what I want -- I want to find resources modified in the last
day, not from some absolute date. So someone reading my code later has to
figure out what the calls to Calendar.add(Calendar.DAY_OF_MONTH, -1) are
there for.
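
To make the verbosity point concrete, here is a minimal Java sketch that
puts the caller-side date math next to the declarative predicate map. The
class and method names (RelativeDateSketch, lowerBoundOneDayAgo,
relativePredicate) are made up for illustration and are not part of any
Sling or AEM API; only java.util.Calendar is real.

```java
import java.util.Calendar;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not part of the Sling or AEM API.
final class RelativeDateSketch {

    // Caller-side date math: the query only ever sees an absolute date,
    // so the "last day" intent is lost before a provider gets involved.
    static Calendar lowerBoundOneDayAgo(Calendar now) {
        Calendar lower = (Calendar) now.clone(); // leave the caller's value untouched
        lower.add(Calendar.DAY_OF_MONTH, -1);
        return lower;
    }

    // Declarative form: the intent stays visible in the query itself and a
    // provider is free to translate it in whatever way suits its store.
    static Map<String, String> relativePredicate() {
        Map<String, String> predicate = new LinkedHashMap<>();
        predicate.put("relativedaterange.property", "jcr:lastModified");
        predicate.put("relativedaterange.lowerBound", "-1d");
        return predicate;
    }

    public static void main(String[] args) {
        Calendar now = Calendar.getInstance();
        System.out.println("absolute lower bound: " + lowerBoundOneDayAgo(now).getTime());
        System.out.println("declarative predicate: " + relativePredicate());
    }
}
```

In the first form a reader has to work out why one day is subtracted; in
the second the predicate says so directly.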

Here's a better example: JCR is unable to compare two properties, i.e. give
me all nodes where property foo equals the value of property bar. But
MongoDB *can* do this (it isn't super-efficient, but it is possible). I can
almost see how you would do this with the new Query API, but it would be
ugly at best. Or, more broadly, how would the MongoDB $where operator be
supported?

The advantage of the AEM QueryBuilder's model is that figuring all of this
stuff out isn't the responsibility of the platform developer. We just need
to provide a solid basis and then let downstream users add their own hooks.
As soon as you say that these are the only 8 operations anyone is ever
going to do on a property or the 4 operations anyone is ever going to do on
a resource, you're into "640k should be enough memory for anyone" territory.

So how specifically would the Sling QueryBuilder be different than the AEM
QueryBuilder?

I think of QueryBuilder queries being processed in these separate steps
(FWIW, none of this is proprietary information, it is based on public
documentation):

1) A map of key/value pairs is turned into a PredicateGroup object. While
technically this step is optional (you can build a PredicateGroup by hand),
it is pretty common. This would be common functionality across all
ResourceResolvers and the code from AEM could probably be brought over
as-is.
2) The PredicateGroup (which is a nested tree) at this point represents the
query statement. It is then passed to the ResourceResolver (this part is
somewhat different than the AEM QueryBuilder).
3) Each ResourceProvider analyzes the predicates and decides whether or not
it knows how to evaluate all of them. If it can't, it should return no
results (this is debatable, but I think it makes sense). The only exception
is where you have an OR clause, i.e. this query:

fulltext=Management
group.p.or=true
group.1_jcrType=dam:Asset
group.2_resourceType=some/resource/type

If a non-JCR provider didn't know how to evaluate the jcrType predicate
type, it could still evaluate the query because it is OR'd with a
resourceType predicate (which let's say it does know how to evaluate). But
if it didn't know how to evaluate the fulltext predicate type, it shouldn't
return any results.

4) The ResourceProvider uses PredicateEvaluators to map each predicate to
its native query syntax. For this to work, each ResourceProvider would
expose its own PredicateEvaluator interface (in theory,
a ResourceProvider doesn't need to do this if the evaluation process isn't
intended to be pluggable). IOW, the current AEM PredicateEvaluator
interface would be renamed JcrPredicateEvaluator.
5) At least in JCR (based on current functionality), some Predicates can't
be evaluated in a native query (i.e. XPath) and will need to be handled as
filters on the result set, but this is an implementation detail left to
the ResourceProvider.
6) The ResourceProvider returns results to the ResourceResolver.
7) Sorting is handled (or not) as currently proposed.
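
The decision in step 3 can be sketched in Java roughly as follows. This is
only a sketch under assumptions: the Predicate and PredicateGroup classes
below are simplified stand-ins, not the real AEM classes, and canEvaluate
is a hypothetical helper. It encodes the rule described above: every
member of an AND group must be understood by the provider, while one
understood member is enough for an OR group.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-ins; the real AEM Predicate/PredicateGroup classes differ.
final class PredicateSupportSketch {

    // A leaf predicate such as fulltext=Management.
    static class Predicate {
        final String type;
        final String value;
        Predicate(String type, String value) {
            this.type = type;
            this.value = value;
        }
    }

    // A group is itself a predicate whose children are AND'ed or OR'ed.
    static final class PredicateGroup extends Predicate {
        final boolean or;
        final List<Predicate> children = new ArrayList<>();
        PredicateGroup(boolean or, Predicate... children) {
            super("group", null);
            this.or = or;
            this.children.addAll(Arrays.asList(children));
        }
    }

    // Step 3: can a provider that understands only supportedTypes answer p?
    static boolean canEvaluate(Predicate p, Set<String> supportedTypes) {
        if (p instanceof PredicateGroup) {
            PredicateGroup group = (PredicateGroup) p;
            for (Predicate child : group.children) {
                boolean ok = canEvaluate(child, supportedTypes);
                if (group.or && ok) return true;    // one usable OR branch is enough
                if (!group.or && !ok) return false; // every AND branch is required
            }
            // AND: all children passed (an empty group counts as evaluable).
            // OR: no child passed.
            return !group.or;
        }
        return supportedTypes.contains(p.type);
    }

    // The example query from step 3, built by hand instead of from a map.
    static Predicate exampleQuery() {
        return new PredicateGroup(false,
            new Predicate("fulltext", "Management"),
            new PredicateGroup(true,
                new Predicate("jcrType", "dam:Asset"),
                new Predicate("resourceType", "some/resource/type")));
    }

    public static void main(String[] args) {
        Set<String> nonJcr = new HashSet<>(Arrays.asList("fulltext", "resourceType"));
        Set<String> noFulltext = new HashSet<>(Arrays.asList("resourceType"));
        System.out.println("non-JCR provider: " + canEvaluate(exampleQuery(), nonJcr));
        System.out.println("provider without fulltext: " + canEvaluate(exampleQuery(), noFulltext));
    }
}
```

Note that a provider which takes part through a single OR branch returns
only the matches for that branch, which is the behaviour described for
the non-JCR provider above.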

To be clear, I don't have a concrete proposal for how to replicate (or not)
AEM QueryBuilder's facet support. Alex might...

Regards,
Justin

>
> Cheers,
> Alex
>

Re: New Query API - in a distinct bundle? (was [jira] [Commented] (SLING-4752) New resource query API)

Posted by Alexander Klimetschek <ak...@adobe.com>.

On 15.06.2015, at 02:23, Carsten Ziegeler <cz...@apache.org> wrote:
> 
> It really seems that people who are not convinced have never felt the
> current pain - while people who are on the pro side exactly felt this
> pain and ran into the problems which this is trying to solve. I'm
> absolutely unsure on how to solve that situation.

I was asking this before: what are the pains and specific use cases?

(Apart from the paging of results)

Cheers,
Alex

Re: New Query API - in a distinct bundle? (was [jira] [Commented] (SLING-4752) New resource query API)

Posted by Carsten Ziegeler <cz...@apache.org>.

Am 15.06.15 um 12:02 schrieb Bertrand Delacretaz:
> On Mon, Jun 15, 2015 at 11:23 AM, Carsten Ziegeler <cz...@apache.org> wrote:
>> ...The query api is the user api, right this can be moved into a different
>> bundle. But of course the interesting part is the implementation and
>> this is part of the new provider spi which references this api. A
>> provider does the query and therefore needs access to the query object etc....
> 
> Would it work with a more abstract version of the query API in our
> main API bundle?
> 
> Conceptually at that level you only need to know that there are Query
> objects that can provide resources, maybe something like
> 
> public interface Query {
>   PagingIterator<Resource> execute();
> }
> 
> Do we need more than this (or the translated equivalent based on your
> query API) in the API bundle?
> 
> The details of how a Query is built can then go to a separate, more
> concrete, bundle.
> 

I'm not sure if I can follow :) The major problem today is that there is
no way to specify a resource provider independent query which can work
for all resource providers. So we need three things:
a) client api to formulate such queries (that's the current query
package), and
b) an extension to the resource provider api to implement the query for
a provider
c) managing this stuff within the resource resolver implementation
(delegating the query created with a) to a provider implementing b) )
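
As a rough illustration of how a), b) and c) fit together, here is a
minimal Java sketch. All names in it (Query, QueryableProvider,
findResources) are hypothetical and do not reflect the prototype API in
trunk; the providers are stubs that ignore the query and return fixed
paths.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical names throughout; this is not the SLING-4752 prototype API.
final class QueryDelegationSketch {

    // a) Client API: a query that uses no provider-specific syntax.
    static final class Query {
        final String property;
        final String value;
        Query(String property, String value) {
            this.property = property;
            this.value = value;
        }
    }

    // b) Provider SPI extension: each provider translates the query itself.
    interface QueryableProvider {
        List<String> find(Query query); // returns matching resource paths
    }

    // c) Resolver side: hand the same query to every provider, merge results.
    static List<String> findResources(Query query, List<QueryableProvider> providers) {
        List<String> paths = new ArrayList<>();
        for (QueryableProvider provider : providers) {
            paths.addAll(provider.find(query));
        }
        return paths;
    }

    public static void main(String[] args) {
        // Stub providers: a real one would map the query to XPath, MongoDB, etc.
        QueryableProvider jcr = q -> Arrays.asList("/content/a");
        QueryableProvider mongo = q -> Arrays.asList("/mongo/b");
        Query query = new Query("title", "x");
        System.out.println(findResources(query, Arrays.asList(jcr, mongo)));
    }
}
```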

Carsten
-- 
Carsten Ziegeler
Adobe Research Switzerland
cziegeler@apache.org

Re: New Query API - in a distinct bundle? (was [jira] [Commented] (SLING-4752) New resource query API)

Posted by Bertrand Delacretaz <bd...@apache.org>.

On Mon, Jun 15, 2015 at 11:23 AM, Carsten Ziegeler <cz...@apache.org> wrote:
> ...The query api is the user api, right this can be moved into a different
> bundle. But of course the interesting part is the implementation and
> this is part of the new provider spi which references this api. A
> provider does the query and therefore needs access to the query object etc....

Would it work with a more abstract version of the query API in our
main API bundle?

Conceptually at that level you only need to know that there are Query
objects that can provide resources, maybe something like

public interface Query {
  PagingIterator<Resource> execute();
}

Do we need more than this (or the translated equivalent based on your
query API) in the API bundle?

The details of how a Query is built can then go to a separate, more
concrete, bundle.
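
A minimal Java sketch of that split, under stated assumptions: Resource
below is a stand-in for the Sling interface reduced to getPath(),
PagingIterator is not defined in this thread so it is modelled as a plain
Iterator without any paging behaviour, and fixedPaths is a hypothetical
factory standing in for whatever the more concrete bundle would offer.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch; only the shape of Query comes from the mail above.
final class AbstractQuerySketch {

    // Stand-in for org.apache.sling.api.resource.Resource (getPath() only).
    interface Resource {
        String getPath();
    }

    // Assumption: modelled as a plain Iterator, no paging behaviour.
    interface PagingIterator<T> extends Iterator<T> {
    }

    // All that the core API bundle would need to expose.
    interface Query {
        PagingIterator<Resource> execute();
    }

    // What a separate, more concrete bundle might add: a way to build a Query.
    // Here the "query" simply yields a fixed list of paths.
    static Query fixedPaths(String... paths) {
        final List<String> list = Arrays.asList(paths);
        return () -> {
            final Iterator<String> it = list.iterator();
            return new PagingIterator<Resource>() {
                @Override
                public boolean hasNext() {
                    return it.hasNext();
                }
                @Override
                public Resource next() {
                    final String path = it.next();
                    return () -> path;
                }
            };
        };
    }

    public static void main(String[] args) {
        Iterator<Resource> results = fixedPaths("/content/a", "/content/b").execute();
        while (results.hasNext()) {
            System.out.println(results.next().getPath());
        }
    }
}
```

Callers of execute() depend only on the Query interface, so the building
side can change or move without touching them.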

-Bertrand

Re: New Query API - in a distinct bundle? (was [jira] [Commented] (SLING-4752) New resource query API)

Posted by Carsten Ziegeler <cz...@apache.org>.

Am 15.06.15 um 11:15 schrieb Bertrand Delacretaz:
> On Mon, Jun 15, 2015 at 10:57 AM, Carsten Ziegeler <cz...@apache.org> wrote:
>> Am 15.06.15 um 10:44 schrieb Bertrand Delacretaz:
>> ...I have not seen any compelling reason to not do it and the
>> advantages outweigh the potential disadvantages...
> 
> I'm not saying we should "not do it", and as you rightly say there's
> no better concrete proposal than yours at the moment. But we are
> introducing a new API in our core without (most of us) being really
> convinced about it, and this is not good.
It really seems that people who are not convinced have never felt the
current pain - while people who are on the pro side exactly felt this
pain and ran into the problems which this is trying to solve. I'm
absolutely unsure on how to solve that situation.

> 
> Modularizing is our usual answer to such situations, to keep a bit of
> flexibility if we change our minds later on.
> 
> You say we cannot put the query API in a separate bundle because "we
> need provider support", can you elaborate? I don't see what exactly
> prevents the org.apache.sling.api.resource.query package from being
> provided by a different bundle - but maybe I missed something.
The query api is the user api, right this can be moved into a different
bundle. But of course the interesting part is the implementation and
this is part of the new provider spi which references this api. A
provider does the query and therefore needs access to the query object etc.
I just put the query client api into a separate package as the resource
package is overloaded already.

Carsten

-- 
Carsten Ziegeler
Adobe Research Switzerland
cziegeler@apache.org

Re: New Query API - in a distinct bundle? (was [jira] [Commented] (SLING-4752) New resource query API)

Posted by Bertrand Delacretaz <bd...@apache.org>.

On Mon, Jun 15, 2015 at 10:57 AM, Carsten Ziegeler <cz...@apache.org> wrote:
> Am 15.06.15 um 10:44 schrieb Bertrand Delacretaz:
> ...I have not seen any compelling reason to not do it and the
> advantages outweigh the potential disadvantages...

I'm not saying we should "not do it", and as you rightly say there's
no better concrete proposal than yours at the moment. But we are
introducing a new API in our core without (most of us) being really
convinced about it, and this is not good.

Modularizing is our usual answer to such situations, to keep a bit of
flexibility if we change our minds later on.

You say we cannot put the query API in a separate bundle because "we
need provider support", can you elaborate? I don't see what exactly
prevents the org.apache.sling.api.resource.query package from being
provided by a different bundle - but maybe I missed something.

-Bertrand

Re: New Query API - in a distinct bundle? (was [jira] [Commented] (SLING-4752) New resource query API)

Posted by Carsten Ziegeler <cz...@apache.org>.

Am 15.06.15 um 10:44 schrieb Bertrand Delacretaz:
> Hi,
> 
> On Sat, Jun 13, 2015 at 5:50 PM, Carsten Ziegeler (JIRA)
> <ji...@apache.org> wrote:
>> ...Carsten Ziegeler commented on SLING-4752:
>> I've moved the prototype api to trunk...
> 
> I don't feel we have strong agreement on doing that, this new query
> API has been heavily discussed but I don't see an emerging consensus
> to add it to our core API.
> 
> Can we move this to a separate query-api bundle in order to avoid
> polluting our "sacred" core API bundle with something on which we
> don't strongly agree?
> 
Well, that's not possible as we need provider support and providers are
our core part.

So far I have not seen any compelling reason to not do it and the
advantages outweigh the potential disadvantages.

Seriously, if we're not doing this, I'll simply pull off all the new
stuff and stop working on it.

Carsten
-- 
Carsten Ziegeler
Adobe Research Switzerland
cziegeler@apache.org