Posted to dev@sling.apache.org by Carsten Ziegeler <cz...@apache.org> on 2015/06/01 17:50:42 UTC

Re: [RT] New resource query API

One thing we have to think of is how to use pagination and sorting.
A search could go across the whole resource tree hitting different
resource providers. For example one use case is getting all vanity paths.
Sorting such a search is possible: the resource resolver implementation
always picks one result from each provider, compares them, and brings them
into order. That's definitely doable.

However pagination is a different beast. Of course, for paging the
result needs to be sorted. Today, a common practice for doing pagination
is key based pagination: with the search result you get a key that is
used for the next page. We can easily do this if the search is just
hitting a single resource provider. However if the search targets more
than one, pagination becomes a problem. For example, suppose the search has a
page size of 20 and two providers might provide resources. Asking each
of them for 20 entries is not very efficient. Asking each provider for
ten, hoping that the results are evenly split, usually does not work either.
I guess one could come up with some clever algorithm that solves the problem by
potentially doing two (or more) searches against a provider. But this
will be complicated and will not really perform well.
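
To make the cost concrete, here is a minimal sketch of key based pagination across several providers. Everything in it is an assumption for illustration: providers are modelled as sorted lists of strings, keys are unique across providers, and the names KeyPagingSketch, fetchAfter and page are hypothetical, not part of any Sling API. It takes the "ask each provider for a full page" route, which is correct but fetches up to (number of providers x page size) entries to return one page.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class KeyPagingSketch {

    // One page of results plus the key to pass in for the next page
    // (null once a short page shows that all providers are exhausted).
    static final class Page {
        final List<String> items;
        final String nextKey;

        Page(List<String> items, String nextKey) {
            this.items = items;
            this.nextKey = nextKey;
        }
    }

    // Hypothetical provider call: up to `limit` entries strictly greater
    // than `afterKey` (or from the start if afterKey is null).
    static List<String> fetchAfter(List<String> sortedProvider, String afterKey, int limit) {
        List<String> out = new ArrayList<>();
        for (String s : sortedProvider) {
            if (out.size() == limit) {
                break;
            }
            if (afterKey == null || s.compareTo(afterKey) > 0) {
                out.add(s);
            }
        }
        return out;
    }

    static Page page(List<List<String>> providers, String afterKey, int pageSize) {
        List<String> candidates = new ArrayList<>();
        // Each provider is asked for a full page, since any single provider
        // might own every entry of the next page.
        for (List<String> provider : providers) {
            candidates.addAll(fetchAfter(provider, afterKey, pageSize));
        }
        Collections.sort(candidates);
        int end = Math.min(pageSize, candidates.size());
        List<String> items = new ArrayList<>(candidates.subList(0, end));
        // A full page may be followed by more results; a short page cannot be.
        boolean full = !items.isEmpty() && items.size() == pageSize;
        String nextKey = full ? items.get(items.size() - 1) : null;
        return new Page(items, nextKey);
    }
}
```

With two providers and a page size of 20, each call fetches up to 40 entries and discards up to 20 of them, and the discarded ones are fetched again for the next page.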

So I think, we should support sorting across providers, but pagination
only across a single provider and throw some exception if a search
potentially hits more than one provider. It's a limitation we can document.

WDYT?

Regards
Carsten
-- 
Carsten Ziegeler
Adobe Research Switzerland
cziegeler@apache.org

Re: [RT] New resource query API

Posted by Carsten Ziegeler <cz...@apache.org>.
On 01.06.15 at 14:00, Alexander Klimetschek wrote:
> On 01.06.2015, at 08:50, Carsten Ziegeler <cz...@apache.org> wrote:
>> So I think, we should support sorting across providers
> 
> This means sorting will be very slow. Since you have to re-sort the partial results on the sling level without usage of an index.
> 
No, that's not true - each provider sorts, so all you have to do is
merge the sorted results, which is trivial and does not require any
resorting or processing of the full data set.

Carsten


-- 
Carsten Ziegeler
Adobe Research Switzerland
cziegeler@apache.org

Re: [RT] New resource query API

Posted by Alexander Klimetschek <ak...@adobe.com>.
On 09.06.2015, at 22:24, Carsten Ziegeler <cz...@apache.org> wrote:
>> If you mean whatever query language is supported by mongo or other nosql providers (and fits into a string), then that's the best you can do on the Sling layer.
>> 
> Thanks for confirming my point why an abstraction is necessary :)

But that's my point: introducing abstraction here comes at great cost. It's much easier to set up an external search index such as Solr, pump data from different backends (behind the different resource providers) and query this one directly for a search across everything (if this is what an application needs).

Cheers,
Alex


Re: [RT] New resource query API

Posted by Carsten Ziegeler <cz...@apache.org>.
On 09.06.15 at 21:01, Alexander Klimetschek wrote:
> On 04.06.2015, at 21:56, Carsten Ziegeler <cz...@apache.org> wrote:
>> Ok, could you then please provide an implementation for the mongo
>> resource provider or one of the other nosql providers?
> 
> If you mean "jcr xpath" (or any other jcr/oak supported query language), no.
> 
> If you mean whatever query language is supported by mongo or other nosql providers (and fits into a string), then that's the best you can do on the Sling layer.
> 
Thanks for confirming my point why an abstraction is necessary :)

Carsten
-- 
Carsten Ziegeler
Adobe Research Switzerland
cziegeler@apache.org

Re: [RT] New resource query API

Posted by Alexander Klimetschek <ak...@adobe.com>.
On 04.06.2015, at 21:56, Carsten Ziegeler <cz...@apache.org> wrote:
> Ok, could you then please provide an implementation for the mongo
> resource provider or one of the other nosql providers?

If you mean "jcr xpath" (or any other jcr/oak supported query language), no.

If you mean whatever query language is supported by mongo or other nosql providers (and fits into a string), then that's the best you can do on the Sling layer.

Cheers,
Alex

Re: [RT] New resource query API

Posted by Carsten Ziegeler <cz...@apache.org>.
On 04.06.15 at 15:52, Alexander Klimetschek wrote:
> On 03.06.2015, at 14:52, Carsten Ziegeler <cz...@apache.org> wrote:
>> Well, let's agree that we disagree here. For the majority of users,
>> there is only JCR anyway, which means there is no difference between
>> using a nice api and fiddling with strings by hand when it comes to
>> performance.
> 
> And that's exactly the argument for not having a new query API that is agnostic to the individual resource provider implementations and would support aggregation.
> 
> What's there with findResources() is fine - just do your jcr query there. Only passing offset/limit is missing.
> 
Ok, could you then please provide an implementation for the mongo
resource provider or one of the other nosql providers?

Thanks
Carsten
-- 
Carsten Ziegeler
Adobe Research Switzerland
cziegeler@apache.org

Re: [RT] New resource query API

Posted by Alexander Klimetschek <ak...@adobe.com>.
On 03.06.2015, at 14:52, Carsten Ziegeler <cz...@apache.org> wrote:
> Well, let's agree that we disagree here. For the majority of users,
> there is only JCR anyway, which means there is no difference between
> using a nice api and fiddling with strings by hand when it comes to
> performance.

And that's exactly the argument for not having a new query API that is agnostic to the individual resource provider implementations and would support aggregation.

What's there with findResources() is fine - just do your jcr query there. Only passing offset/limit is missing.
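
As a rough illustration of the missing piece: findResources() hands back an Iterator, so offset and limit can only be applied on the client side today. The wrapper below is a sketch of that workaround, not an existing Sling class; the name OffsetLimitIterator is hypothetical and it works on any Iterator.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Applies offset/limit on top of an existing result iterator.
final class OffsetLimitIterator<T> implements Iterator<T> {
    private final Iterator<T> delegate;
    private long remaining;

    OffsetLimitIterator(Iterator<T> delegate, long offset, long limit) {
        this.delegate = delegate;
        this.remaining = limit;
        // Skipped results are still produced by the underlying query;
        // only the caller is spared from seeing them.
        for (long i = 0; i < offset && delegate.hasNext(); i++) {
            delegate.next();
        }
    }

    @Override
    public boolean hasNext() {
        return remaining > 0 && delegate.hasNext();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        remaining--;
        return delegate.next();
    }
}
```

Because the skipping happens after the query has run, the provider cannot use the offset or limit to do less work, which is the reason to pass both values down instead.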

Cheers,
Alex

Re: [RT] New resource query API

Posted by Carsten Ziegeler <cz...@apache.org>.
On 03.06.15 at 11:40, Alexander Klimetschek wrote:
> On 02.06.2015, at 16:39, Carsten Ziegeler <cz...@apache.org> wrote:
>> The query contains the sort information (which properties and whether
>> ascending or descending), so you can get the values of the props and
>> compare them.
> 
> But then you need to
> 
> a) be able to understand and fully evaluate the query on the aggregate level

Nope, just the sorting

> b) cannot use an index for that and do a property read for every result entry

It's true, the property is read for every entry - but only(!) if there
is more than one resource provider providing results.

> 
> I am just saying, you are getting into a very complex and performance critical business. And even if you say "users won't use such edge-case queries, it's ok if we don't look for perfect performance", they will use it (experience tells me, people always find ways to run difficult & slow queries :D), and then you have just created a new performance critical area out of nowhere.

Well, let's agree that we disagree here. For the majority of users,
there is only JCR anyway, which means there is no difference between
using a nice api and fiddling with strings by hand when it comes to
performance.

Regards
Carsten
-- 
Carsten Ziegeler
Adobe Research Switzerland
cziegeler@apache.org

Re: [RT] New resource query API

Posted by Alexander Klimetschek <ak...@adobe.com>.
On 02.06.2015, at 16:39, Carsten Ziegeler <cz...@apache.org> wrote:
> The query contains the sort information (which properties and whether
> ascending or descending), so you can get the values of the props and
> compare them.

But then you

a) need to be able to understand and fully evaluate the query on the aggregate level
b) cannot use an index for that and have to do a property read for every result entry

I am just saying, you are getting into a very complex and performance critical business. And even if you say "users won't use such edge-case queries, it's ok if we don't look for perfect performance", they will use it (experience tells me, people always find ways to run difficult & slow queries :D), and then you have just created a new performance critical area out of nowhere.

Cheers,
Alex


Re: [RT] New resource query API

Posted by Carsten Ziegeler <cz...@apache.org>.
On 02.06.15 at 12:32, Alexander Klimetschek wrote:
> On 02.06.2015, at 05:17, Daniel Klco <da...@gmail.com> wrote:
>> @Alex,
>>
>> Sorting wouldn't necessarily be slow.  What you could do is have the API
>> return an iterator which wraps the sorted result iterator from the various
>> resource providers.
> 
> And how do you know that result X from provider A is > result Y from provider B?
> 
The query contains the sort information (which properties and whether
ascending or descending), so you can get the values of the props and
compare them.
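
A minimal sketch of that comparison, under stated assumptions: the sort information is modelled as a list of (property, ascending) pairs, each result is represented by a plain property map standing in for a resource's properties, and values of one property are mutually comparable. The names SortSpecSketch, SortKey and comparatorFor are hypothetical.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;

final class SortSpecSketch {

    // Hypothetical description of one sort criterion carried by the query.
    static final class SortKey {
        final String property;
        final boolean ascending;

        SortKey(String property, boolean ascending) {
            this.property = property;
            this.ascending = ascending;
        }
    }

    // Builds a comparator that applies the sort keys in the given order.
    static Comparator<Map<String, Object>> comparatorFor(List<SortKey> keys) {
        Comparator<Map<String, Object>> result = (a, b) -> 0;
        for (SortKey key : keys) {
            Comparator<Map<String, Object>> byProperty =
                (a, b) -> compareValues(a.get(key.property), b.get(key.property));
            result = result.thenComparing(key.ascending ? byProperty : byProperty.reversed());
        }
        return result;
    }

    // Assumption: missing values sort before present ones in ascending order.
    @SuppressWarnings("unchecked")
    private static int compareValues(Object x, Object y) {
        if (x == null || y == null) {
            return x == null ? (y == null ? 0 : -1) : 1;
        }
        return ((Comparable<Object>) x).compareTo(y);
    }
}
```

Only the sort keys are read here, not the rest of the query, and the property read per entry is needed only when results from more than one provider have to be compared.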

Carsten
-- 
Carsten Ziegeler
Adobe Research Switzerland
cziegeler@apache.org

Re: [RT] New resource query API

Posted by Alexander Klimetschek <ak...@adobe.com>.
On 02.06.2015, at 05:17, Daniel Klco <da...@gmail.com> wrote:
> @Alex,
> 
> Sorting wouldn't necessarily be slow.  What you could do is have the API
> return an iterator which wraps the sorted result iterator from the various
> resource providers.

And how do you know that result X from provider A is > result Y from provider B?

Cheers,
Alex


Re: [RT] New resource query API

Posted by Daniel Klco <da...@gmail.com>.
@Alex,

Sorting wouldn't necessarily be slow.  What you could do is have the API
return an iterator which wraps the sorted result iterator from the various
resource providers.  This iterator would keep track of the next value of
the iterator from every resource provider result and return the closest
value for every next call of the wrapping iterator.  This should result in
performance of approximately O(n), where n is the number of resource providers,
for each next call.
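
A minimal sketch of such a wrapping iterator, assuming every provider already returns its results sorted by the same comparator. The class name MergingIterator is hypothetical and nothing here is existing Sling code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Wraps several already-sorted iterators (one per resource provider) and
// returns their elements in overall sorted order without re-sorting.
final class MergingIterator<T> implements Iterator<T> {
    private final List<Iterator<T>> sources;
    private final List<T> heads = new ArrayList<>();          // buffered next element per source
    private final List<Boolean> hasHead = new ArrayList<>();  // whether that buffer is filled
    private final Comparator<? super T> order;

    MergingIterator(List<Iterator<T>> sources, Comparator<? super T> order) {
        this.sources = sources;
        this.order = order;
        for (Iterator<T> source : sources) {
            boolean has = source.hasNext();
            hasHead.add(has);
            heads.add(has ? source.next() : null);
        }
    }

    @Override
    public boolean hasNext() {
        return hasHead.contains(Boolean.TRUE);
    }

    @Override
    public T next() {
        int best = -1;
        // Linear scan over the buffered heads: O(n) for n providers.
        // Ties go to the provider that comes first in the list.
        for (int i = 0; i < heads.size(); i++) {
            if (hasHead.get(i)
                    && (best < 0 || order.compare(heads.get(i), heads.get(best)) < 0)) {
                best = i;
            }
        }
        if (best < 0) {
            throw new NoSuchElementException();
        }
        T result = heads.get(best);
        // Refill only the buffer that was just consumed.
        Iterator<T> source = sources.get(best);
        boolean has = source.hasNext();
        hasHead.set(best, has);
        heads.set(best, has ? source.next() : null);
        return result;
    }
}
```

Each provider is advanced lazily, one element at a time, so no provider's full result set is ever loaded. With many providers, a priority queue over the heads would bring the per-call cost down to O(log n).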

In the case where sorting isn't provided, I would think it would just
interleave the results from the various resource providers.

-Dan

On Mon, Jun 1, 2015 at 5:00 PM, Alexander Klimetschek <ak...@adobe.com>
wrote:

> On 01.06.2015, at 08:50, Carsten Ziegeler <cz...@apache.org> wrote:
> > So I think, we should support sorting across providers
>
> This means sorting will be very slow. Since you have to re-sort the
> partial results on the sling level without usage of an index.
>
> Cheers,
> Alex

Re: [RT] New resource query API

Posted by Alexander Klimetschek <ak...@adobe.com>.
On 01.06.2015, at 08:50, Carsten Ziegeler <cz...@apache.org> wrote:
> So I think, we should support sorting across providers

This means sorting will be very slow, since you have to re-sort the partial results on the Sling level without using an index.

Cheers,
Alex