Posted to oak-dev@jackrabbit.apache.org by Davide Giannella <da...@apache.org> on 2014/08/04 11:39:59 UTC

Re: svn commit: r1614891 - in /jackrabbit/oak/trunk/oak-core/src: main/java/org/apache/jackrabbit/oak/api/jmx/ main/java/org/apache/jackrabbit/oak/plugins/index/ test/java/org/apache/jackrabbit/oak/plugins/index/

On 31/07/2014 15:47, alexparvulescu@apache.org wrote:
> Author: alexparvulescu
> Date: Thu Jul 31 13:47:06 2014
> New Revision: 1614891
>
> URL: http://svn.apache.org/r1614891
> Log:
> OAK-2004 Add a way to pause the background async indexer
>
I didn't look deeply into the commit. Did we consider that, when the
index is resumed, it should spool any commits that came in while it was
paused? If not, would we then have to run a reindex, with the time that
takes on large repositories?

D.

Posted by Alex Parvulescu <al...@gmail.com>.

Hi Davide,

Interesting: this hasn't even been merged to 1.0 yet and it's already
being used for purposes other than the ones mentioned :)

The drawbacks come from the fact that the async indexes are stale (falling
behind), so they will report old data to any query that hits them. Other
than that, issues could pop up around the volume of data being indexed, but
that is specific to the index implementation.

hope this helps,
alex
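The staleness point can be made concrete with a toy model. This is a minimal sketch under stated assumptions, not Oak code: writes are committed to the repository immediately, while an asynchronous index only learns about them when its background cycle runs, and queries are answered from the index alone. The class and method names (`StaleIndexSketch`, `write`, `asyncRun`, `query`, `lag`) are hypothetical. Pausing the indexer would simply stretch the window in which `lag()` is non-zero.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical toy model, not Oak code: writes land in the repository at once,
// but the async index only learns about them when the background job runs.
public class StaleIndexSketch {
    private final List<String> repository = new ArrayList<>(); // committed paths, in commit order
    private final SortedSet<String> index = new TreeSet<>();    // paths the index has seen
    private int indexedUpTo = 0;                                 // number of commits already indexed

    public void write(String path) {
        repository.add(path);
    }

    // One background cycle: index every commit made since the previous cycle.
    public void asyncRun() {
        index.addAll(repository.subList(indexedUpTo, repository.size()));
        indexedUpTo = repository.size();
    }

    // An index-backed query answers only from what the index has seen so far.
    public SortedSet<String> query(String prefix) {
        SortedSet<String> hits = new TreeSet<>();
        for (String path : index) {
            if (path.startsWith(prefix)) {
                hits.add(path);
            }
        }
        return hits;
    }

    // How many commits the index is behind the repository.
    public int lag() {
        return repository.size() - indexedUpTo;
    }

    public static void main(String[] args) {
        StaleIndexSketch s = new StaleIndexSketch();
        s.write("/content/news/1");
        s.asyncRun();
        s.write("/content/news/2");                   // committed, not yet indexed
        System.out.println(s.query("/content/news")); // [/content/news/1]
        System.out.println(s.lag());                  // 1
        s.asyncRun();
        System.out.println(s.query("/content/news")); // [/content/news/1, /content/news/2]
    }
}
```

The query between the two runs is not wrong, only old: it reflects the repository as of the last indexed commit.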

Posted by Davide Giannella <da...@apache.org>.

On 04/08/2014 11:49, Alex Parvulescu wrote:
> Hi Davide,
>
> The way the async indexer works is it keeps a reference to the last indexed
> revision, and on the next run it will build a diff containing everything
> since. So when you resume it will include everything that changed already
> without needing a full reindex.
>
Awesome!

A side thought: I know we stated it's meant for debugging, but I can see
clients using it during big bulk imports, say 5M nodes.

Do we foresee any drawbacks to diffing against a state that is around 5M
nodes behind? Other than the actual indexing itself, I mean.

D.

Posted by Alex Parvulescu <al...@gmail.com>.

Hi Davide,

The way the async indexer works is it keeps a reference to the last indexed
revision, and on the next run it will build a diff containing everything
since. So when you resume it will include everything that changed already
without needing a full reindex.

best,
alex
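The checkpoint-and-diff behaviour described above can be sketched as a toy model. This is a minimal illustration under stated assumptions: commits form a simple list where each position is a revision number, and the "diff" is just the slice of revisions after the last indexed one. `ToyAsyncIndexer`, `commit`, `run`, `pause`, `resume` and `indexedPaths` are hypothetical names, not Oak's indexer classes or JMX operations, and a real diff compares repository states rather than replaying a list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy model, not Oak code: the indexer remembers the last revision
// it indexed (its checkpoint) and each run processes everything from there to head.
public class ToyAsyncIndexer {
    private final List<String> commits = new ArrayList<>(); // list position = revision number
    private final Set<String> index = new TreeSet<>();       // paths indexed so far
    private int lastIndexedRevision = -1;                    // -1: nothing indexed yet
    private boolean paused = false;

    public void commit(String path) {
        commits.add(path);
    }

    public void pause() {
        paused = true;
    }

    public void resume() {
        paused = false;
    }

    // One background cycle; returns how many revisions it indexed.
    public int run() {
        if (paused) {
            return 0; // skip the cycle; the checkpoint does not move
        }
        int head = commits.size() - 1;
        int processed = 0;
        // The "diff": every revision committed after the checkpoint, up to head.
        for (int rev = lastIndexedRevision + 1; rev <= head; rev++) {
            index.add(commits.get(rev));
            processed++;
        }
        lastIndexedRevision = head; // advance the checkpoint
        return processed;
    }

    public Set<String> indexedPaths() {
        return index;
    }

    public static void main(String[] args) {
        ToyAsyncIndexer indexer = new ToyAsyncIndexer();
        indexer.commit("/content/a");
        System.out.println(indexer.run());          // 1
        indexer.pause();
        indexer.commit("/content/b");
        indexer.commit("/content/c");
        System.out.println(indexer.run());          // 0 (paused)
        indexer.resume();
        System.out.println(indexer.run());          // 2 (only changes since the checkpoint)
        System.out.println(indexer.indexedPaths()); // [/content/a, /content/b, /content/c]
    }
}
```

The property that matters is that pausing only skips cycles and never moves the checkpoint, so the first run after resuming picks up everything committed in the meantime. In this toy, that catch-up run's cost grows with the number of changes since the checkpoint, which is where the volume concern for large imports comes in.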
