Posted to dev@sling.apache.org by Timothée Maret <ti...@gmail.com> on 2016/07/20 22:23:28 UTC

Streaming version of JobManager#findJobs API

Hi,

In our platform, we use the o.a.s.e.j.JobManager API signature:

      Collection<Job> findJobs(QueryType type, String topic, long limit,
              Map<String, Object>... templates);

This signature and its implementation are fine except in cases where the
number of returned jobs is very large.
In those cases, the implementation can run the instance out of memory (OOM)
because the heap is consumed by the collection of jobs to be returned.

Instead of building the whole list in memory, we could stream the set of
jobs to be returned, allowing the API consumer to avoid the OOM.
This streaming behaviour would only be efficient if every call underneath
the implementation also streams the result all the way down.

Looking at the JobManager#findJobs implementation, it seems this is the
case.

So, I suggest we add a method to the JobManager API that returns an
Iterator or an Iterable.
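To make the proposal concrete, here is a minimal Java sketch contrasting the eager, Collection-returning shape with a lazy Iterable. It is illustrative only: SimpleJob, MappingIterator, JobFinder, findJobsEager and findJobsStreaming are hypothetical names, not part of the Sling API, and the Map "rows" merely stand in for whatever the underlying provider returns.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.function.Function;

// Hypothetical stand-in for a Sling job; NOT org.apache.sling.event.jobs.Job.
final class SimpleJob {
    final String topic;

    SimpleJob(String topic) {
        this.topic = topic;
    }
}

// Converts upstream elements one at a time, only when next() is called,
// so nothing beyond the current element is buffered here.
final class MappingIterator<S, T> implements Iterator<T> {
    private final Iterator<S> upstream;
    private final Function<S, T> mapper;

    MappingIterator(Iterator<S> upstream, Function<S, T> mapper) {
        this.upstream = upstream;
        this.mapper = mapper;
    }

    @Override
    public boolean hasNext() {
        return upstream.hasNext();
    }

    @Override
    public T next() {
        if (!upstream.hasNext()) {
            throw new NoSuchElementException();
        }
        return mapper.apply(upstream.next());
    }
}

// Hypothetical finder; "rows" stands in for the job data a provider returns.
final class JobFinder {
    // Eager shape, like the current Collection-returning findJobs:
    // every matching job is materialized before the method returns.
    static Collection<SimpleJob> findJobsEager(Iterator<Map<String, Object>> rows) {
        List<SimpleJob> result = new ArrayList<>();
        while (rows.hasNext()) {
            result.add(toJob(rows.next()));
        }
        return result;
    }

    // Streaming shape: each iterator() call wraps a new rows.iterator(),
    // and rows are pulled only as the caller iterates.
    static Iterable<SimpleJob> findJobsStreaming(Iterable<Map<String, Object>> rows) {
        return () -> new MappingIterator<>(rows.iterator(), JobFinder::toJob);
    }

    private static SimpleJob toJob(Map<String, Object> row) {
        return new SimpleJob(String.valueOf(row.get("topic")));
    }
}
```

A caller that stops iterating early never pulls the remaining rows. As noted above, the memory saving only holds if the layers underneath also stream rather than buffer.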

Adding such a method to JobManager would be a backward-compatible change,
as JobManager is a provider type.

wdyt?

Regards,

Timothee

Re: Streaming version of JobManager#findJobs API

Posted by Bertrand Delacretaz <bd...@apache.org>.
On Thu, Jul 21, 2016 at 7:03 AM, Carsten Ziegeler <cz...@apache.org> wrote:
> ...this means we shouldn't have added findJobs in the first place,
> therefore I'm more thinking of deprecating such functionality completely....

+1

-Bertrand

Re: Streaming version of JobManager#findJobs API

Posted by Timothée Maret <ti...@gmail.com>.
Hi Carsten,

Thanks! I have opened SLING-5884 to track deprecating (without an
alternative) the JobManager methods that allow managing the queue.

I could not find Ian's thread on that topic. If someone does, please
link it in the issue.

Regards,

Timothee

2016-07-21 11:48 GMT+02:00 Carsten Ziegeler <cz...@apache.org>:

> > However I don't see a direct alternative to this method except using the
> > more generic QueryLanguageProvider#findResources which would require the
> > consumer to know about the JobManager internals. Or would we deprecate
> > without an alternative (which I think makes sense from the Sling PoV)?
> >
> Yes, the idea would be to deprecate without an alternative. So there is
> no way to find out the contents of a queue.

Re: Streaming version of JobManager#findJobs API

Posted by Carsten Ziegeler <cz...@apache.org>.
Hi Tim,

> 
> 
> I think the limit can be implemented in the loop that gets the items out of
> the iterator (in the code consuming the iterator).
> Since the downstream providers also provide iterators (and it seems they
> implement those iterators properly, without loading the whole content into
> memory at any point), everything is in place to support a memory-efficient
> limit mechanism.

True.

> 
>> In order to have an API which allows this
>> I suggest a query API to be added to the resource provider. If we would
>> have this, we wouldn't need a new method - given that the limit is set
>> to a sensible value.
>>
> 
> Do you mean that the current consumers of JobManager#findJobs would
> use o.a.s.s.r.p.QueryLanguageProvider#findResources
> instead ?
> This would already work today, IIUC, but it would require the current
> consumers of JobManager#findJobs to know about the JobManager
> implementation (content structure).
> 

I actually meant an API which doesn't exist atm - but let's forget about
that idea :)

> 
> I understand the issue with giving the ability to search for jobs in
> unbounded queues.
> This streaming idea was an attempt to mitigate one of those issues (memory
> consumption).
> 
> If the plan is to soon deprecate the method completely, there's no need for
> a streaming version.

I think we should deprecate it, together with a set of other methods.

> 
> However I don't see a direct alternative to this method except using the
> more generic QueryLanguageProvider#findResources which would require the
> consumer to know about the JobManager internals. Or would we deprecate
> without an alternative (which I think makes sense from the Sling PoV)?
> 
Yes, the idea would be to deprecate without an alternative, so there would
be no way to find out the contents of a queue.

Regards
Carsten

-- 
Carsten Ziegeler
Adobe Research Switzerland
cziegeler@apache.org


Re: Streaming version of JobManager#findJobs API

Posted by Timothée Maret <ti...@gmail.com>.
Hi Carsten,

2016-07-21 7:03 GMT+02:00 Carsten Ziegeler <cz...@apache.org>:

> Hi,
>
> I'm not against adding such a method in general; however, there are some
> things to consider:
> Usually you should provide a limit; unfortunately, that limit can't be
> passed down to the providers.


I think the limit can be implemented in the loop that gets the items out of
the iterator (in the code consuming the iterator).
Since the downstream providers also provide iterators (and it seems they
implement those iterators properly, without loading the whole content into
memory at any point), everything is in place to support a memory-efficient
limit mechanism.
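The limit-in-the-consumer idea can be sketched as an iterator wrapper that stops asking the underlying (lazy) iterator for elements once the limit is reached, so no more than limit jobs are ever materialized. LimitingIterator is a hypothetical helper, not a Sling class, and treating a non-positive limit as "no limit" is an assumption of this sketch.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical helper (not part of Sling): caps how many elements are taken
// from a lazy delegate iterator. The delegate is never asked for more than
// `limit` elements, so at most that many are materialized.
final class LimitingIterator<T> implements Iterator<T> {
    private final Iterator<T> delegate;
    private long remaining;

    LimitingIterator(Iterator<T> delegate, long limit) {
        this.delegate = delegate;
        // Assumption for this sketch: limit <= 0 means "no limit".
        this.remaining = limit > 0 ? limit : Long.MAX_VALUE;
    }

    @Override
    public boolean hasNext() {
        // Stop as soon as the budget is used up, without pulling from the delegate.
        return remaining > 0 && delegate.hasNext();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        remaining--;
        return delegate.next();
    }
}
```

For example, new LimitingIterator<>(jobs.iterator(), 100) would hand out at most 100 jobs, however many the provider could produce. Because the limit is enforced during iteration, the provider itself never needs to receive it.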

> In order to have an API which allows this,
> I suggest a query API be added to the resource provider. If we had
> this, we wouldn't need a new method, given that the limit is set
> to a sensible value.
>

Do you mean that the current consumers of JobManager#findJobs would
use o.a.s.s.r.p.QueryLanguageProvider#findResources
instead ?
This would already work today, IIUC, but it would require the current
consumers of JobManager#findJobs to know about the JobManager
implementation (content structure).


>
> But more importantly, if you compare Sling jobs to other messaging/queuing
> systems, these other solutions have no findJobs method. It's not
> possible to get the contents of the queue. I think Ian wrote a very
> good analysis some time ago and created a prototype for a new messaging API.
> And this means we shouldn't have added findJobs in the first place,
> therefore I'm more thinking of deprecating such functionality completely.
>

I understand the issue with giving the ability to search for jobs in
unbounded queues.
This streaming idea was an attempt to mitigate one of those issues (memory
consumption).

If the plan is to soon deprecate the method completely, there's no need for
a streaming version.

However I don't see a direct alternative to this method except using the
more generic QueryLanguageProvider#findResources which would require the
consumer to know about the JobManager internals. Or would we deprecate
without an alternative (which I think makes sense from the Sling PoV)?

Regards,

Timothee


Re: Streaming version of JobManager#findJobs API

Posted by Carsten Ziegeler <cz...@apache.org>.
Hi,

I'm not against adding such a method in general; however, there are some
things to consider:
Usually you should provide a limit; unfortunately, that limit can't be
passed down to the providers. To support this, I suggest adding a query
API to the resource provider. If we had this, we wouldn't need a new
method, given that the limit is set to a sensible value.

But more importantly, if you compare Sling jobs to other messaging/queuing
systems, these other solutions have no findJobs method. It's not
possible to get the contents of the queue. I think Ian wrote a very
good analysis some time ago and created a prototype for a new messaging API.
This means we shouldn't have added findJobs in the first place, so I'm
leaning towards deprecating such functionality completely.

Regards
Carsten

-- 
Carsten Ziegeler
Adobe Research Switzerland
cziegeler@apache.org