Posted to java-user@lucene.apache.org by Hans Lund <ha...@gmail.com> on 2014/02/20 10:37:44 UTC

NRT indexing and ControlledRealTimeReopenThread

Hi all

I'm a bit unsure about the intended function of
the ControlledRealTimeReopenThread in an NRT context - especially regarding
stale times.

As of now, if you are waiting for a generation to become refreshed, it looks
like the stale time is either the min stale time or the max stale time. Is
this the intended behavior?

What I'm really looking for is two stale times with slightly different
semantics: one stale time for refreshing when no specific generation is
needed, and another for blocking acquisition of the searcher. (The latter
can actually be avoided altogether, as I can't see why a blocking acquire
should sleep at all. It would be better to run
SearcherManager.maybeRefreshBlocking() in the thread that needs the
searcher at a current generation.)


Cheers

Re: NRT indexing and ControlledRealTimeReopenThread

Posted by Hans Lund <ha...@gmail.com>.

I've created https://issues.apache.org/jira/browse/LUCENE-5461 and
attached a small test that shows the error in a setup similar to what I
would like to run.

The 1% is an overestimation; it seems to be related to concurrent commits on
the index writer.

Hans Lund


On Thu, Feb 20, 2014 at 2:04 PM, Michael McCandless <
lucene@mikemccandless.com> wrote:

> On Thu, Feb 20, 2014 at 7:52 AM, Hans Lund <ha...@gmail.com> wrote:
> > Ok, that's also what I expected, but not what I observed ;-)
>
> Ahh, not good.
>
> > For the vast majority of index updates reopens are not an issue;
> > minutes will be fine. A very few updates are done 'interactively' and
> > must be in RT (or as close as possible).
>
> That's precisely the use case this class is designed for.  I tried to
> describe it here:
>
> http://blog.mikemccandless.com/2011/11/near-real-time-readers-with-lucenes.html
>
> (We've since renamed NRTManager -> ControlledRealTimeReopenThread).
>
> > I don't know if this is a rare use case, but we don't expect the rate of
> > requests for a specific generation to be more than a very few per
> > minute/hour, so we have no reason to "pile up" generation requests
> > behind a minStaleSec, as there will only be the one request in the end
> > anyway. Therefore I have tried to set the minStaleSec to 0. Unfortunately
> > this does not work as I expected, as waitForGeneration() now blocks up to
> > maxStaleSec (in about 1% of the cases).
>
> That's not right.
>
> > If this is not the expected behavior, shall I open an issue on it?
> > For now I've handled it by calling maybeRefreshBlocking() when needed from
> > outside the reopener thread, but I hate the reflection needed to read the
> > searchingGen ;-)
>
> Please open an issue!
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Re: NRT indexing and ControlledRealTimeReopenThread

Posted by Michael McCandless <lu...@mikemccandless.com>.

On Thu, Feb 20, 2014 at 7:52 AM, Hans Lund <ha...@gmail.com> wrote:
> Ok, that's also what I expected, but not what I observed ;-)

Ahh, not good.

> For the vast majority of index updates reopens are not an issue;
> minutes will be fine. A very few updates are done 'interactively' and
> must be in RT (or as close as possible).

That's precisely the use case this class is designed for.  I tried to
describe it here:
http://blog.mikemccandless.com/2011/11/near-real-time-readers-with-lucenes.html

(We've since renamed NRTManager -> ControlledRealTimeReopenThread).

> I don't know if this is a rare use case, but we don't expect the rate of
> requests for a specific generation to be more than a very few per
> minute/hour, so we have no reason to "pile up" generation requests
> behind a minStaleSec, as there will only be the one request in the end
> anyway. Therefore I have tried to set the minStaleSec to 0. Unfortunately
> this does not work as I expected, as waitForGeneration() now blocks up to
> maxStaleSec (in about 1% of the cases).

That's not right.

> If this is not the expected behavior, shall I open an issue on it?
> For now I've handled it by calling maybeRefreshBlocking() when needed from
> outside the reopener thread, but I hate the reflection needed to read the
> searchingGen ;-)

Please open an issue!

Mike McCandless

http://blog.mikemccandless.com


Re: NRT indexing and ControlledRealTimeReopenThread

Posted by Hans Lund <ha...@gmail.com>.

Ok, that's also what I expected, but not what I observed ;-)

For the vast majority of index updates reopens are not an issue;
minutes will be fine. A very few updates are done 'interactively' and
must be in RT (or as close as possible).

I don't know if this is a rare use case, but we don't expect the rate of
requests for a specific generation to be more than a very few per
minute/hour, so we have no reason to "pile up" generation requests
behind a minStaleSec, as there will only be the one request in the end
anyway. Therefore I have tried to set the minStaleSec to 0. Unfortunately
this does not work as I expected, as waitForGeneration() now blocks up to
maxStaleSec (in about 1% of the cases).

If this is not the expected behavior, shall I open an issue on it?
For now I've handled it by calling maybeRefreshBlocking() when needed from
outside the reopener thread, but I hate the reflection needed to read the
searchingGen ;-)
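
As a rough illustration of the workaround described above (refresh in the
calling thread only when the generation you need is not searchable yet),
here is a minimal sketch in plain Java. The class CallerSideRefresh and its
methods are hypothetical stand-ins, not Lucene classes: update() plays the
role of an index update that returns a generation, and refreshBlocking()
plays the role of maybeRefreshBlocking(). The toy exposes searchingGen
directly, which is exactly the piece that needed reflection above.

```java
import java.util.concurrent.atomic.AtomicLong;

// Toy stand-in for the writer/manager pair; these are NOT Lucene classes.
final class CallerSideRefresh {
    private final AtomicLong indexingGen = new AtomicLong(); // bumped by every update
    private volatile long searchingGen;                       // newest generation searchers can see
    private int refreshCount;                                 // how many refreshes actually ran

    /** Records an index update and returns the generation that contains it. */
    public long update() {
        return indexingGen.incrementAndGet();
    }

    /** Stand-in for a blocking refresh: everything indexed so far becomes searchable. */
    public synchronized void refreshBlocking() {
        long startGen = indexingGen.get();
        // (a real implementation would reopen the reader here)
        searchingGen = startGen;
        refreshCount++;
    }

    /** Refreshes in the calling thread only if targetGen is not searchable yet. */
    public long acquireAtLeast(long targetGen) {
        if (searchingGen < targetGen) {
            refreshBlocking();
        }
        return searchingGen;
    }

    public synchronized int refreshCount() {
        return refreshCount;
    }

    public static void main(String[] args) {
        CallerSideRefresh index = new CallerSideRefresh();
        long gen = index.update();                     // an "interactive" update
        System.out.println(index.acquireAtLeast(gen)); // prints 1 (refreshed in this thread)
        System.out.println(index.acquireAtLeast(gen)); // prints 1 (already current, no refresh)
        System.out.println(index.refreshCount());      // prints 1
    }
}
```

The point of the sketch is only that the caller never sleeps: it either
finds the generation already searchable or pays for one refresh itself.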

On Thu, Feb 20, 2014 at 12:12 PM, Michael McCandless <
lucene@mikemccandless.com> wrote:

> It is intended that there are two different stale times.
>
> When a specific generation is requested, we wait for the minStaleSec
> since the last reopen; this is to prevent too-frequent reopens when
> specific gens are requested.
>
> The maxStaleSec is how long we wait between reopens for the "normal"
> periodic reopens, when the incoming request does not need a specific
> generation.
>
> This approach is only effective if most searches can just use the
> current searcher, i.e. most searches do not need a specific
> generation.  If you truly need "real-time" values for nearly all
> searches then LiveFieldValues might be useful.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Thu, Feb 20, 2014 at 4:37 AM, Hans Lund <ha...@gmail.com> wrote:
> > Hi all
> >
> > I'm a bit unsure about the intended function of
> > the ControlledRealTimeReopenThread in an NRT context - especially
> > regarding stale times.
> >
> > As of now, if you are waiting for a generation to become refreshed, it
> > looks like the stale time is either the min stale time or the max stale
> > time. Is this the intended behavior?
> >
> > What I'm really looking for is two stale times with slightly different
> > semantics: one stale time for refreshing when no specific generation is
> > needed, and another for blocking acquisition of the searcher. (The latter
> > can actually be avoided altogether, as I can't see why a blocking acquire
> > should sleep at all. It would be better to run
> > SearcherManager.maybeRefreshBlocking() in the thread that needs the
> > searcher at a current generation.)
> >
> >
> > Cheers

Re: NRT indexing and ControlledRealTimeReopenThread

Posted by Michael McCandless <lu...@mikemccandless.com>.

It is intended that there are two different stale times.

When a specific generation is requested, we wait for the minStaleSec
since the last reopen; this is to prevent too-frequent reopens when
specific gens are requested.

The maxStaleSec is how long we wait between reopens for the "normal"
periodic reopens, when the incoming request does not need a specific
generation.
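
To make the two stale times concrete, here is a minimal sketch in plain Java
of the rule described above: the wait is measured from the start of the last
reopen, and it uses minStaleSec when some caller is waiting for a specific
generation and maxStaleSec otherwise. The class StaleTimeModel and the method
sleepNanos are hypothetical names for illustration; this models the
description in this message and is not the Lucene source.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical model of the stale-time rule; not a Lucene class.
final class StaleTimeModel {

    /**
     * Nanoseconds the reopen loop would still sleep before its next reopen.
     * callerWaiting is true when some thread is waiting for a specific generation.
     */
    public static long sleepNanos(long lastReopenStartNs, long nowNs, boolean callerWaiting,
                                  double minStaleSec, double maxStaleSec) {
        double staleSec = callerWaiting ? minStaleSec : maxStaleSec;
        long nextReopenStartNs = lastReopenStartNs + (long) (staleSec * 1_000_000_000L);
        // Never negative: if the deadline has passed, reopen right away.
        return Math.max(0L, nextReopenStartNs - nowNs);
    }

    public static void main(String[] args) {
        long last = 0L;                           // last reopen started at t = 0
        long now = TimeUnit.SECONDS.toNanos(1);   // one second later

        // Nobody is waiting: sleep out the rest of maxStaleSec (60 s - 1 s).
        System.out.println(sleepNanos(last, now, false, 0.0, 60.0)); // prints 59000000000

        // A caller is waiting and minStaleSec is 0: reopen immediately.
        System.out.println(sleepNanos(last, now, true, 0.0, 60.0));  // prints 0
    }
}
```

Under this rule a caller waiting on a generation with minStaleSec set to 0
should not wait for the periodic interval at all.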

This approach is only effective if most searches can just use the
current searcher, i.e. most searches do not need a specific
generation.  If you truly need "real-time" values for nearly all
searches then LiveFieldValues might be useful.

Mike McCandless

http://blog.mikemccandless.com

On Thu, Feb 20, 2014 at 4:37 AM, Hans Lund <ha...@gmail.com> wrote:
> Hi all
>
> I'm a bit unsure about the intended function of
> the ControlledRealTimeReopenThread in an NRT context - especially regarding
> stale times.
>
> As of now, if you are waiting for a generation to become refreshed, it looks
> like the stale time is either the min stale time or the max stale time. Is
> this the intended behavior?
>
> What I'm really looking for is two stale times with slightly different
> semantics: one stale time for refreshing when no specific generation is
> needed, and another for blocking acquisition of the searcher. (The latter
> can actually be avoided altogether, as I can't see why a blocking acquire
> should sleep at all. It would be better to run
> SearcherManager.maybeRefreshBlocking() in the thread that needs the
> searcher at a current generation.)
>
>
> Cheers
