Posted to oak-dev@jackrabbit.apache.org by Julian Reschke <ju...@gmx.de> on 2014/08/25 16:06:32 UTC

MissingLastRevSeeker

Hi there,

it appears that the MissingLastRevSeeker (oak-core), when run, will be 
very slow on large repos, unless they use a MongoDocumentStore (which 
has a special-cased query).

Question: when will this code execute? I've seen it occasionally during 
benchmarking, but it doesn't seem to happen always.

Proposal: if this code *is* used regularly, we'll need an API so that 
DocumentStore implementations other than Mongo can optimize the query.

Best regards, Julian

Re: MissingLastRevSeeker

Posted by Julian Reschke <ju...@gmx.de>.
On 2014-08-26 08:03, Amit Jain wrote:
> Hi Julian,
>
> The LastRevRecoveryAgent is executed at 2 places
> 1. On DocumentNodeStore startup where the MissingLastRevSeeker is used to
> get potential candidates for recovery.

Sure? I've been logging it, and I don't see it called on every startup...

>  ...

Best regards, Julian

Re: MissingLastRevSeeker

Posted by Julian Reschke <ju...@gmx.de>.
On 2014-08-26 14:49, Julian Reschke wrote:
> On 2014-08-26 11:32, Amit Jain wrote:
>> Hi,
>>
>> I was proposing the additional method for cases where we want to query
>> the
>> indexed properties other than _id like needed in
>> MongoBlobReferenceIterator
>> and MongoMissingLastRevSeeker.
>>
>> But,
>>>> Can't we use this method to at least narrow down the query to the
>>>> lower bound? I think for the purpose of the last rev seeker, this
>>>> should be sufficient.
>> Yes, this should speed up the query from what we have currently. So,
>> right
>> now we can make this change and see if further improvement is necessary.
>> Will create a jira to track this.
>> ...
>
> +1.
>
> I can take over, if you want...

-> <https://issues.apache.org/jira/browse/OAK-2054>


Re: MissingLastRevSeeker

Posted by Julian Reschke <ju...@gmx.de>.
On 2014-08-26 11:32, Amit Jain wrote:
> Hi,
>
> I was proposing the additional method for cases where we want to query the
> indexed properties other than _id like needed in MongoBlobReferenceIterator
> and MongoMissingLastRevSeeker.
>
> But,
>>> Can't we use this method to at least narrow down the query to the
>>> lower bound? I think for the purpose of the last rev seeker, this
>>> should be sufficient.
> Yes, this should speed up the query from what we have currently. So, right
> now we can make this change and see if further improvement is necessary.
> Will create a jira to track this.
> ...

+1.

I can take over, if you want...


Re: MissingLastRevSeeker

Posted by Amit Jain <am...@ieee.org>.
Hi,

I was proposing the additional method for cases where we want to query
indexed properties other than _id, as needed in MongoBlobReferenceIterator
and MongoMissingLastRevSeeker.

But,
>> Can't we use this method to at least narrow down the query to the
>> lower bound? I think for the purpose of the last rev seeker, this
>> should be sufficient.
Yes, this should speed up the query compared to what we have currently. So
for now we can make this change and see if further improvement is necessary.
I will create a JIRA issue to track this.



On Tue, Aug 26, 2014 at 2:37 PM, Marcel Reutegger <mr...@adobe.com>
wrote:

> Hi,
>
> I would only add it if really necessary. We already have a very
> similar method:
>
>     /**
>      * Get a list of documents where the key is greater than a start value and
>      * less than an end value. The returned documents are immutable.
>      *
>      * @param <T> the document type
>      * @param collection the collection
>      * @param fromKey the start value (excluding)
>      * @param toKey the end value (excluding)
>      * @param indexedProperty the name of the indexed property (optional)
>      * @param startValue the minimum value of the indexed property
>      * @param limit the maximum number of entries to return
>      * @return the list (possibly empty)
>      */
>     @Nonnull
>     <T extends Document> List<T> query(Collection<T> collection,
>                                    String fromKey,
>                                    String toKey,
>                                    String indexedProperty,
>                                    long startValue,
>                                    int limit);
>
> Can't we use this method to at least narrow down the query to the
> lower bound? I think for the purpose of the last rev seeker, this
> should be sufficient.
>
> Regards
>  Marcel
>
>
>
> On 26/08/14 10:18, "Amit Jain" <am...@ieee.org> wrote:
>
> >Hi,
> >
> >>> OK, so can we put what's needed into the DocumentStore API, or
> >alternatively have an extension interface, that both MongoDocumentStore
> >and
> >RDBDocumentStore could implement?
> >
> >It would make sense to add a generic method which queries on a particular
> >property(possibly limiting to only indexed ones), like below, to the
> >DocumentStore interface.
> >    <T extends Document> List<T> queryProperty(Collection<T> collection,
> >                                       String indexedProperty,
> >                                       String fromKey,
> >                                       String toKey,
> >                                       int limit);
> >Thoughts?
> >
> >Thanks
> >Amit
> >
> >On Tue, Aug 26, 2014 at 12:03 PM, Julian Reschke <
> >julian.reschke@greenbytes.de> wrote:
> >
> >> On 2014-08-26 08:03, Amit Jain wrote:
> >>
> >>> Hi Julian,
> >>>
> >>> The LastRevRecoveryAgent is executed at 2 places
> >>> 1. On DocumentNodeStore startup where the MissingLastRevSeeker is used
> >>>to
> >>> get potential candidates for recovery.
> >>>   2. At regular intervals defined by the property
> >>> 'lastRevRecoveryJobIntervalInSecs' in the DocumentNodeStoreService
> >>> (default
> >>> 60 seconds). Short description is that MissingLastRevSeeker will be
> >>>called
> >>> rarely in this case.
> >>> Long description - In this case a less expensive query is executed to
> >>>find
> >>> out all the stale clusterNodes for which recovery is to be performed.
> >>>If
> >>> there are clusterNodes that have unexpectedly shutdown and their
> >>> 'leaseEndTime' has not expired then MissingLastRevSeeker will check all
> >>> potential candidates.
> >>>
> >>>  Proposal: if this code *is* used regularly, we'll need an API so that
> >>>>>
> >>>> DocumentStore implementations other than Mongo can optimize the query.
> >>> +1. Since, It will be executed on every startup. RDBDocumentStore
> >>>already
> >>> maintains the index on _modified property so, optimized querying is
> >>> possible.
> >>>
> >>> Thanks
> >>> Amit
> >>>
> >>
> >> OK, so can we put what's needed into the DocumentStore API, or
> >> alternatively have an extension interface, that both MongoDocumentStore
> >>and
> >> RDBDocumentStore could implement?
> >>
> >> Best regards, Julian
> >>
>
>

Re: MissingLastRevSeeker

Posted by Marcel Reutegger <mr...@adobe.com>.
Hi,

I would only add it if really necessary. We already have a very
similar method:

    /**
     * Get a list of documents where the key is greater than a start value and
     * less than an end value. The returned documents are immutable.
     *
     * @param <T> the document type
     * @param collection the collection
     * @param fromKey the start value (excluding)
     * @param toKey the end value (excluding)
     * @param indexedProperty the name of the indexed property (optional)
     * @param startValue the minimum value of the indexed property
     * @param limit the maximum number of entries to return
     * @return the list (possibly empty)
     */
    @Nonnull
    <T extends Document> List<T> query(Collection<T> collection,
                                   String fromKey,
                                   String toKey,
                                   String indexedProperty,
                                   long startValue,
                                   int limit);

Can't we use this method to at least narrow down the query to the
lower bound? I think for the purpose of the last rev seeker, this
should be sufficient.

Regards
 Marcel
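
A minimal sketch of the lower-bound idea, in Java. It is not Oak code: the
class LowerBoundQuerySketch, the Doc type, the candidates helper and the key
bounds "0" and ";" are assumptions made up for illustration, and the
in-memory query only mimics the shape of the signature quoted above for a
single property, _modified.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** In-memory stand-in for a document store; illustration only, not Oak code. */
class LowerBoundQuerySketch {

    /** Hypothetical minimal document: an id plus a numeric "_modified" value. */
    static final class Doc {
        final String id;
        final long modified;
        Doc(String id, long modified) { this.id = id; this.modified = modified; }
    }

    // Assumed key bounds that enclose every id used in this sketch.
    static final String MIN_KEY = "0";
    static final String MAX_KEY = ";";

    // Documents sorted by key, similar to a primary index.
    private final TreeMap<String, Doc> docs = new TreeMap<>();

    void put(Doc doc) { docs.put(doc.id, doc); }

    /**
     * Same parameter order as the quoted method: keys strictly between
     * fromKey and toKey, property value >= startValue, at most limit results.
     * This sketch only supports the "_modified" property.
     */
    List<Doc> query(String fromKey, String toKey,
                    String indexedProperty, long startValue, int limit) {
        if (indexedProperty != null && !"_modified".equals(indexedProperty)) {
            throw new IllegalArgumentException("unsupported: " + indexedProperty);
        }
        List<Doc> result = new ArrayList<>();
        for (Doc d : docs.subMap(fromKey, false, toKey, false).values()) {
            if (result.size() >= limit) {
                break;
            }
            if (indexedProperty == null || d.modified >= startValue) {
                result.add(d);
            }
        }
        return result;
    }

    /** What a seeker could ask for: everything modified at or after startTime. */
    List<Doc> candidates(long startTime) {
        return query(MIN_KEY, MAX_KEY, "_modified", startTime, Integer.MAX_VALUE);
    }

    public static void main(String[] args) {
        LowerBoundQuerySketch store = new LowerBoundQuerySketch();
        store.put(new Doc("1:/a", 100));
        store.put(new Doc("1:/b", 200));
        store.put(new Doc("2:/a/c", 300));
        for (Doc d : store.candidates(200)) {
            System.out.println(d.id + " modified=" + d.modified);
        }
    }
}
```

In a real store the filter on the indexed property would be pushed down to
the backend rather than applied in a loop; the loop only keeps the sketch
self-contained.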



On 26/08/14 10:18, "Amit Jain" <am...@ieee.org> wrote:

>Hi,
>
>>> OK, so can we put what's needed into the DocumentStore API, or
>alternatively have an extension interface, that both MongoDocumentStore
>and
>RDBDocumentStore could implement?
>
>It would make sense to add a generic method which queries on a particular
>property(possibly limiting to only indexed ones), like below, to the
>DocumentStore interface.
>    <T extends Document> List<T> queryProperty(Collection<T> collection,
>                                       String indexedProperty,
>                                       String fromKey,
>                                       String toKey,
>                                       int limit);
>Thoughts?
>
>Thanks
>Amit
>
>On Tue, Aug 26, 2014 at 12:03 PM, Julian Reschke <
>julian.reschke@greenbytes.de> wrote:
>
>> On 2014-08-26 08:03, Amit Jain wrote:
>>
>>> Hi Julian,
>>>
>>> The LastRevRecoveryAgent is executed at 2 places
>>> 1. On DocumentNodeStore startup where the MissingLastRevSeeker is used
>>>to
>>> get potential candidates for recovery.
>>>   2. At regular intervals defined by the property
>>> 'lastRevRecoveryJobIntervalInSecs' in the DocumentNodeStoreService
>>> (default
>>> 60 seconds). Short description is that MissingLastRevSeeker will be
>>>called
>>> rarely in this case.
>>> Long description - In this case a less expensive query is executed to
>>>find
>>> out all the stale clusterNodes for which recovery is to be performed.
>>>If
>>> there are clusterNodes that have unexpectedly shutdown and their
>>> 'leaseEndTime' has not expired then MissingLastRevSeeker will check all
>>> potential candidates.
>>>
>>>  Proposal: if this code *is* used regularly, we'll need an API so that
>>>>>
>>>> DocumentStore implementations other than Mongo can optimize the query.
>>> +1. Since, It will be executed on every startup. RDBDocumentStore
>>>already
>>> maintains the index on _modified property so, optimized querying is
>>> possible.
>>>
>>> Thanks
>>> Amit
>>>
>>
>> OK, so can we put what's needed into the DocumentStore API, or
>> alternatively have an extension interface, that both MongoDocumentStore
>>and
>> RDBDocumentStore could implement?
>>
>> Best regards, Julian
>>


Re: MissingLastRevSeeker

Posted by Amit Jain <am...@ieee.org>.
Hi,

>> OK, so can we put what's needed into the DocumentStore API, or
>> alternatively have an extension interface, that both MongoDocumentStore and
>> RDBDocumentStore could implement?

It would make sense to add a generic method that queries on a particular
property (possibly limited to indexed ones), like the one below, to the
DocumentStore interface.
    <T extends Document> List<T> queryProperty(Collection<T> collection,
                                       String indexedProperty,
                                       String fromKey,
                                       String toKey,
                                       int limit);
Thoughts?

Thanks
Amit

On Tue, Aug 26, 2014 at 12:03 PM, Julian Reschke <
julian.reschke@greenbytes.de> wrote:

> On 2014-08-26 08:03, Amit Jain wrote:
>
>> Hi Julian,
>>
>> The LastRevRecoveryAgent is executed at 2 places
>> 1. On DocumentNodeStore startup where the MissingLastRevSeeker is used to
>> get potential candidates for recovery.
>>   2. At regular intervals defined by the property
>> 'lastRevRecoveryJobIntervalInSecs' in the DocumentNodeStoreService
>> (default
>> 60 seconds). Short description is that MissingLastRevSeeker will be called
>> rarely in this case.
>> Long description - In this case a less expensive query is executed to find
>> out all the stale clusterNodes for which recovery is to be performed. If
>> there are clusterNodes that have unexpectedly shutdown and their
>> 'leaseEndTime' has not expired then MissingLastRevSeeker will check all
>> potential candidates.
>>
>>  Proposal: if this code *is* used regularly, we'll need an API so that
>>>>
>>> DocumentStore implementations other than Mongo can optimize the query.
>> +1. Since, It will be executed on every startup. RDBDocumentStore already
>> maintains the index on _modified property so, optimized querying is
>> possible.
>>
>> Thanks
>> Amit
>>
>
> OK, so can we put what's needed into the DocumentStore API, or
> alternatively have an extension interface, that both MongoDocumentStore and
> RDBDocumentStore could implement?
>
> Best regards, Julian
>

Re: MissingLastRevSeeker

Posted by Julian Reschke <ju...@greenbytes.de>.
On 2014-08-26 08:03, Amit Jain wrote:
> Hi Julian,
>
> The LastRevRecoveryAgent is executed at 2 places
> 1. On DocumentNodeStore startup where the MissingLastRevSeeker is used to
> get potential candidates for recovery.
>   2. At regular intervals defined by the property
> 'lastRevRecoveryJobIntervalInSecs' in the DocumentNodeStoreService (default
> 60 seconds). Short description is that MissingLastRevSeeker will be called
> rarely in this case.
> Long description - In this case a less expensive query is executed to find
> out all the stale clusterNodes for which recovery is to be performed. If
> there are clusterNodes that have unexpectedly shutdown and their
> 'leaseEndTime' has not expired then MissingLastRevSeeker will check all
> potential candidates.
>
>>> Proposal: if this code *is* used regularly, we'll need an API so that
> DocumentStore implementations other than Mongo can optimize the query.
> +1. Since, It will be executed on every startup. RDBDocumentStore already
> maintains the index on _modified property so, optimized querying is
> possible.
>
> Thanks
> Amit

OK, so can we put what's needed into the DocumentStore API, or 
alternatively have an extension interface, that both MongoDocumentStore 
and RDBDocumentStore could implement?

Best regards, Julian
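
The extension-interface alternative could look roughly like the Java sketch
below. Everything in it is hypothetical: Store, ModifiedSinceQuery,
PlainStore, IndexedStore and candidates are illustrative names, not part of
Oak, and documents are reduced to an id mapped to a _modified value. The
caller probes for the optional interface and falls back to a full scan when
the store does not implement it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ExtensionInterfaceSketch {

    /** Hypothetical stand-in for the general store API: id -> "_modified". */
    interface Store {
        Map<String, Long> allDocuments();
    }

    /** Hypothetical optional extension for stores that index "_modified". */
    interface ModifiedSinceQuery {
        List<String> idsModifiedSince(long startTime);
    }

    /** A store without the extension; callers have to scan everything. */
    static class PlainStore implements Store {
        final Map<String, Long> docs = new TreeMap<>();
        public Map<String, Long> allDocuments() { return docs; }
    }

    /** A store that also keeps a sorted index on the modification time. */
    static class IndexedStore extends PlainStore implements ModifiedSinceQuery {
        final TreeMap<Long, List<String>> byModified = new TreeMap<>();

        // Note: the sketch does not handle re-inserting an existing id.
        void put(String id, long modified) {
            docs.put(id, modified);
            byModified.computeIfAbsent(modified, k -> new ArrayList<>()).add(id);
        }

        public List<String> idsModifiedSince(long startTime) {
            List<String> ids = new ArrayList<>();
            // tailMap(startTime, true) visits only entries >= startTime.
            for (List<String> bucket : byModified.tailMap(startTime, true).values()) {
                ids.addAll(bucket);
            }
            return ids;
        }
    }

    /** Uses the optimized query when available, otherwise scans all documents. */
    static List<String> candidates(Store store, long startTime) {
        List<String> ids;
        if (store instanceof ModifiedSinceQuery) {
            ids = ((ModifiedSinceQuery) store).idsModifiedSince(startTime);
        } else {
            ids = new ArrayList<>();
            for (Map.Entry<String, Long> e : store.allDocuments().entrySet()) {
                if (e.getValue() >= startTime) {
                    ids.add(e.getKey());
                }
            }
        }
        Collections.sort(ids); // same order on both paths
        return ids;
    }

    public static void main(String[] args) {
        IndexedStore store = new IndexedStore();
        store.put("1:/a", 100L);
        store.put("1:/b", 200L);
        store.put("2:/a/c", 300L);
        System.out.println(candidates(store, 200L));
    }
}
```

The trade-off compared to a new method on the main API is that existing
implementations keep compiling unchanged, at the cost of an instanceof check
in the caller.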

Re: MissingLastRevSeeker

Posted by Amit Jain <am...@ieee.org>.
Hi Julian,

The LastRevRecoveryAgent is executed in two places:
1. On DocumentNodeStore startup, where the MissingLastRevSeeker is used to
get potential candidates for recovery.
2. At regular intervals defined by the property
'lastRevRecoveryJobIntervalInSecs' in the DocumentNodeStoreService (default
60 seconds). In short, MissingLastRevSeeker will rarely be called in this
case.
In detail: in this case a less expensive query is executed to find all the
stale clusterNodes for which recovery is to be performed. If there are
clusterNodes that have shut down unexpectedly and their 'leaseEndTime' has
not expired, then MissingLastRevSeeker will check all potential candidates.

>> Proposal: if this code *is* used regularly, we'll need an API so that
>> DocumentStore implementations other than Mongo can optimize the query.
+1, since it will be executed on every startup. RDBDocumentStore already
maintains an index on the _modified property, so optimized querying is
possible.

Thanks
Amit
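
The periodic case described above (a cheap check first, the expensive scan
only when that check finds something) can be sketched in Java as follows.
This is an illustration under assumptions, not the Oak implementation:
PeriodicRecoverySketch, scanCandidates and runOnce are made-up names, and
the condition that marks a cluster node as needing recovery is passed in as
a predicate instead of being reproduced here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

class PeriodicRecoverySketch {

    /** Number of times the expensive candidate scan was triggered. */
    int expensiveScans = 0;

    /** Hypothetical stand-in for the costly MissingLastRevSeeker scan. */
    void scanCandidates(int clusterId) {
        expensiveScans++;
    }

    /**
     * One run of the periodic job: a cheap per-cluster-node check decides
     * whether the expensive scan is needed at all.
     *
     * @param clusterIds all known cluster node ids
     * @param needsRecovery cheap check (hypothetical; the real condition
     *        is not reproduced in this sketch)
     * @return the ids for which the expensive scan was run
     */
    List<Integer> runOnce(int[] clusterIds, IntPredicate needsRecovery) {
        List<Integer> recovered = new ArrayList<>();
        for (int id : clusterIds) {
            if (needsRecovery.test(id)) {
                scanCandidates(id);
                recovered.add(id);
            }
        }
        return recovered;
    }

    public static void main(String[] args) {
        PeriodicRecoverySketch job = new PeriodicRecoverySketch();
        int[] ids = {1, 2, 3};
        // Typical run: no node is stale, so nothing expensive happens.
        job.runOnce(ids, id -> false);
        // A run after node 2 went away unexpectedly.
        job.runOnce(ids, id -> id == 2);
        System.out.println("expensive scans: " + job.expensiveScans);
    }
}
```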


On Mon, Aug 25, 2014 at 7:36 PM, Julian Reschke <ju...@gmx.de>
wrote:

> Hi there,
>
> it appears that the MissingLastRevSeeker (oak-core), when run, will be
> very slow on large repos, unless they use a MongoDocumentStore (which has a
> special-cased query).
>
> Question: when will this code execute? I've seen it occasionally during
> benchmarking, but it doesn't seem to happen always.
>
> Proposal: if this code *is* used regularly, we'll need an API so that
> DocumentStore implementations other than Mongo can optimize the query.
>
> Best regards, Julian
>