Posted to dev@lenya.apache.org by Andreas Hartmann <an...@apache.org> on 2007/05/24 17:31:10 UTC

[1.4] "Indexer is busy" problem

Hi Lenya devs,

what should we do about this issue?

http://issues.apache.org/bugzilla/show_bug.cgi?id=42510

I wouldn't like to implement a queue for incremental indexing
events before 1.4 is out, because I think it's quite a lot of
work (especially in the testing department), and I can't predict
the consequences without giving it some thought. Maybe someone
has a solution at the ready (crossing my fingers ...).
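
For reference, a rough sketch of what such a queue could look like, assuming a single worker thread that drains indexing events one at a time so the indexer never receives two requests at once. `IndexingQueue` and its methods are hypothetical names for illustration, not part of Lenya's API, and the code is written for a current JDK.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical sketch: serializes indexing events through one worker thread. */
class IndexingQueue {
    // A single-threaded executor keeps an internal FIFO queue,
    // so events are processed strictly one after another.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final Consumer<String> indexer;

    IndexingQueue(Consumer<String> indexer) {
        this.indexer = indexer;
    }

    /** Enqueues a document for indexing and returns immediately. */
    void documentChanged(String documentId) {
        worker.execute(() -> indexer.accept(documentId));
    }

    /** Stops accepting events and waits for the queued ones to finish. */
    boolean shutdown(long timeoutMillis) {
        worker.shutdown();
        try {
            return worker.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

The sketch leaves out exactly the parts that make this a lot of work: persistence of queued events across restarts, error handling, and testing under load.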

Should we silently ignore the error and simply not trigger
the indexing, or should we continue to throw the exception?

I hope someone comes up with a better idea :)

TIA!

-- Andreas


-- 
Andreas Hartmann, CTO
BeCompany GmbH
http://www.becompany.ch


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lenya.apache.org
For additional commands, e-mail: dev-help@lenya.apache.org


Re: [1.4] "Indexer is busy" problem

Posted by Andreas Hartmann <an...@apache.org>.
Joern Nettingsmeier wrote:
> Andreas Hartmann wrote:
>> Bob Harner wrote:
>>> On 5/24/07, Andreas Hartmann <an...@apache.org> wrote:
>>>> Hi Lenya devs,
>>>>
>>>> what should we do about this issue?
>>>>
>>>> http://issues.apache.org/bugzilla/show_bug.cgi?id=42510
>>>>
>>>> I wouldn't like to implement a queue for incremental indexing
>>>> events before 1.4 is out, because I think it's quite a lot of
>>>> work (especially in the testing department), and I can't predict
>>>> the consequences without giving it some thought. Maybe someone
>>>> has a solution at the ready (crossing my fingers ...).
>>>>
>>>> Should we silently ignore the error and simply not trigger
>>>> the indexing, or should we continue to throw the exception?
>>>>
>>>> I hope someone comes up with a better idea :)
>>
>> [...]
>>
>>> A better interim solution might be to display a helpful warning
>>> message rather than an exception:
>>>
>>> "Warning:  this document can't be added to the search index yet
>>> because the indexer is currently busy with another document.  Please
>>> re-publish this document in a moment to ensure that it is indexed."
> 
> +1
> 
>> The problem with this approach is that we can't determine if the
>> indexer will be busy before we apply the change to the document.
>> Since the indexer is a shared resource, we'd have to lock it to
>> prevent concurrent tasks from starting an indexing process while
>> the publishing (or any other action which changes the document
>> content) is in progress.
> 
> why? bob's suggestion means it can fail, but the user will be given a
> workaround. sounds ok as an interim solution.

The advantage is that the user knows that un-indexed documents
exist (or are published), but she still has to deactivate and re-publish
them. Anyway, I don't think we will be able to achieve anything better
for the moment.

BTW, I don't see how to implement this since repo observation (which is
used for indexing) is asynchronous ...
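
To illustrate the difficulty, here is a minimal sketch assuming the observer is notified on a separate thread. By the time the indexer reports that it is busy, the publishing request has already returned, so all the observer can do is record the failure for later display. `AsyncIndexObserver` and its methods are hypothetical names, not Lenya's observation API.

```java
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;

/** Hypothetical sketch: an asynchronous observer that records failed indexing attempts. */
class AsyncIndexObserver {
    private final ExecutorService notifier = Executors.newSingleThreadExecutor();
    private final Queue<String> unindexed = new ConcurrentLinkedQueue<>();
    private final Predicate<String> tryIndex; // assumed to return false when the indexer is busy

    AsyncIndexObserver(Predicate<String> tryIndex) {
        this.tryIndex = tryIndex;
    }

    /** Called after a repository change; returns before any indexing happens. */
    void documentChanged(String documentId) {
        notifier.execute(() -> {
            if (!tryIndex.test(documentId)) {
                // The caller is long gone, so no warning can be attached to its response.
                unindexed.add(documentId);
            }
        });
    }

    /** Snapshot of the documents that could not be indexed so far. */
    List<String> unindexedDocuments() {
        return List.copyOf(unindexed);
    }

    /** Stops the notifier and waits for pending notifications to be handled. */
    boolean shutdown(long timeoutMillis) {
        notifier.shutdown();
        try {
            return notifier.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

A list like `unindexedDocuments()` could feed a status page, but it cannot produce a warning in the same request that triggered the change.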


>> I'd be interested in how other systems handle this. Maybe the indexing
>> has to be part of the transaction, so the transaction can be rolled
>> back if the indexing fails. But maybe we shouldn't invest too much
>> research in this issue but rather choose a powerful back-end which
>> supports indexing for the next major version.
> 
> i'd say let's ignore concurrency issues for 1.4 and document the
> shortcomings. we need to get this one out. without being negative, i
> think that most users that are eagerly waiting for a release have small
> to medium-size deployments and will only rarely encounter concurrency
> issues - we just don't have the track record atm to be considered for
> very large scale projects. let's not starve our core users too much by
> delaying 1.4 any further.
> instead, we should put up a roadmap where concurrency is an important
> topic for 1.5. incremental improvements - otherwise we'll die of
> second-system syndrome.

I agree. If anyone has an idea how to implement Bob's suggestion,
feel free to go ahead or make a proposal. I'll try to think about
it too when I find the time.

-- Andreas


-- 
Andreas Hartmann, CTO
BeCompany GmbH
http://www.becompany.ch




Re: [1.4] "Indexer is busy" problem

Posted by Joern Nettingsmeier <ne...@folkwang-hochschule.de>.
Andreas Hartmann wrote:
> Bob Harner wrote:
>> On 5/24/07, Andreas Hartmann <an...@apache.org> wrote:
>>> Hi Lenya devs,
>>>
>>> what should we do about this issue?
>>>
>>> http://issues.apache.org/bugzilla/show_bug.cgi?id=42510
>>>
>>> I wouldn't like to implement a queue for incremental indexing
>>> events before 1.4 is out, because I think it's quite a lot of
>>> work (especially in the testing department), and I can't predict
>>> the consequences without giving it some thought. Maybe someone
>>> has a solution at the ready (crossing my fingers ...).
>>>
>>> Should we silently ignore the error and simply not trigger
>>> the indexing, or should we continue to throw the exception?
>>>
>>> I hope someone comes up with a better idea :)
> 
> [...]
> 
>> A better interim solution might be to display a helpful warning
>> message rather than an exception:
>>
>> "Warning:  this document can't be added to the search index yet
>> because the indexer is currently busy with another document.  Please
>> re-publish this document in a moment to ensure that it is indexed."

+1

> The problem with this approach is that we can't determine if the
> indexer will be busy before we apply the change to the document.
> Since the indexer is a shared resource, we'd have to lock it to
> prevent concurrent tasks from starting an indexing process while
> the publishing (or any other action which changes the document
> content) is in progress.

why? bob's suggestion means it can fail, but the user will be given a 
workaround. sounds ok as an interim solution.

> I'd be interested in how other systems handle this. Maybe the indexing
> has to be part of the transaction, so the transaction can be rolled
> back if the indexing fails. But maybe we shouldn't invest too much
> research in this issue but rather choose a powerful back-end which
> supports indexing for the next major version.

i'd say let's ignore concurrency issues for 1.4 and document the 
shortcomings. we need to get this one out. without being negative, i 
think that most users that are eagerly waiting for a release have small 
to medium-size deployments and will only rarely encounter concurrency 
issues - we just don't have the track record atm to be considered for 
very large scale projects. let's not starve our core users too much by 
delaying 1.4 any further.
instead, we should put up a roadmap where concurrency is an important 
topic for 1.5. incremental improvements - otherwise we'll die of 
second-system syndrome.

just my thoughts,

jörn




-- 
jörn nettingsmeier

home://germany/45128 essen/lortzingstr. 11/
http://spunk.dnsalias.org
phone://+49/201/491621

Kurt is up in Heaven now.



Re: [1.4] "Indexer is busy" problem

Posted by Andreas Hartmann <an...@apache.org>.
Bob Harner wrote:
> On 5/24/07, Andreas Hartmann <an...@apache.org> wrote:
>> Hi Lenya devs,
>>
>> what should we do about this issue?
>>
>> http://issues.apache.org/bugzilla/show_bug.cgi?id=42510
>>
>> I wouldn't like to implement a queue for incremental indexing
>> events before 1.4 is out, because I think it's quite a lot of
>> work (especially in the testing department), and I can't predict
>> the consequences without giving it some thought. Maybe someone
>> has a solution at the ready (crossing my fingers ...).
>>
>> Should we silently ignore the error and simply not trigger
>> the indexing, or should we continue to throw the exception?
>>
>> I hope someone comes up with a better idea :)

[...]

> A better interim solution might be to display a helpful warning
> message rather than an exception:
> 
> "Warning:  this document can't be added to the search index yet
> because the indexer is currently busy with another document.  Please
> re-publish this document in a moment to ensure that it is indexed."

The problem with this approach is that we can't determine if the
indexer will be busy before we apply the change to the document.
Since the indexer is a shared resource, we'd have to lock it to
prevent concurrent tasks from starting an indexing process while
the publishing (or any other action which changes the document
content) is in progress.
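
A minimal sketch of that locking idea, assuming one lock guards the indexer and is held for the whole change-plus-index sequence. If the lock is taken, the caller is turned away before the document is touched. `LockedPublisher` is a hypothetical name, and the two `Runnable` parameters stand in for the real publishing and indexing steps.

```java
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch: holds the indexer lock while a document is changed and indexed. */
class LockedPublisher {
    private final ReentrantLock indexerLock = new ReentrantLock();

    /**
     * Applies the change and indexes it, or returns false without
     * touching the document if another thread is using the indexer.
     */
    boolean publish(Runnable applyChange, Runnable index) {
        if (!indexerLock.tryLock()) {
            return false; // indexer busy: nothing has been changed yet
        }
        try {
            applyChange.run();
            index.run();
            return true;
        } finally {
            indexerLock.unlock();
        }
    }
}
```

The price is that every content-changing action is serialized behind the indexer for as long as the lock is held.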

I'd be interested in how other systems handle this. Maybe the indexing
has to be part of the transaction, so the transaction can be rolled
back if the indexing fails. But maybe we shouldn't invest too much
research in this issue but rather choose a powerful back-end which
supports indexing for the next major version.
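
As a rough illustration of indexing inside the transaction, the sketch below assumes an in-memory store and an indexer that throws `IllegalStateException` when it is busy. The previous content is restored if indexing fails. `TransactionalStore` is a hypothetical name and does not reflect Lenya's repository or transaction classes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical sketch: a save that is rolled back when indexing fails. */
class TransactionalStore {
    private final Map<String, String> documents = new HashMap<>();
    private final Consumer<String> indexer; // assumed to throw IllegalStateException when busy

    TransactionalStore(Consumer<String> indexer) {
        this.indexer = indexer;
    }

    /** Saves and indexes the document; restores the old state if indexing fails. */
    synchronized boolean save(String id, String content) {
        boolean existed = documents.containsKey(id);
        String previous = documents.put(id, content);
        try {
            indexer.accept(id);
            return true;
        } catch (IllegalStateException busy) {
            // Roll back: put the old content back, or remove a newly created entry.
            if (existed) {
                documents.put(id, previous);
            } else {
                documents.remove(id);
            }
            return false;
        }
    }

    synchronized String get(String id) {
        return documents.get(id);
    }
}
```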

-- Andreas


-- 
Andreas Hartmann, CTO
BeCompany GmbH
http://www.becompany.ch




Re: [1.4] "Indexer is busy" problem

Posted by Bob Harner <bo...@gmail.com>.
On 5/24/07, Andreas Hartmann <an...@apache.org> wrote:
> Hi Lenya devs,
>
> what should we do about this issue?
>
> http://issues.apache.org/bugzilla/show_bug.cgi?id=42510
>
> I wouldn't like to implement a queue for incremental indexing
> events before 1.4 is out, because I think it's quite a lot of
> work (especially in the testing department), and I can't predict
> the consequences without giving it some thought. Maybe someone
> has a solution at the ready (crossing my fingers ...).
>
> Should we silently ignore the error and simply not trigger
> the indexing, or should we continue to throw the exception?
>
> I hope someone comes up with a better idea :)
>
> TIA!
>
> -- Andreas
>
>
> --
> Andreas Hartmann, CTO
> BeCompany GmbH
> http://www.becompany.ch

A better interim solution might be to display a helpful warning
message rather than an exception:

"Warning:  this document can't be added to the search index yet
because the indexer is currently busy with another document.  Please
re-publish this document in a moment to ensure that it is indexed."
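
A minimal sketch of the idea, assuming the busy state surfaces as a dedicated exception that the calling code can catch and turn into the message above. `IndexerBusyException` and `PublishFeedback` are hypothetical names used only for illustration; the actual exception thrown by Lenya may differ.

```java
import java.util.Optional;
import java.util.function.Consumer;

/** Hypothetical exception type signalling that the indexer is in use. */
class IndexerBusyException extends RuntimeException {
    IndexerBusyException(String message) {
        super(message);
    }
}

/** Hypothetical sketch: converts a busy indexer into a user-facing warning. */
class PublishFeedback {
    static final String BUSY_WARNING =
        "Warning: this document can't be added to the search index yet "
        + "because the indexer is currently busy with another document. "
        + "Please re-publish this document in a moment to ensure that it is indexed.";

    /** Returns a warning if the indexer was busy, or an empty Optional on success. */
    static Optional<String> indexOrWarn(Consumer<String> indexer, String documentId) {
        try {
            indexer.accept(documentId);
            return Optional.empty();
        } catch (IndexerBusyException e) {
            return Optional.of(BUSY_WARNING);
        }
    }
}
```

This only works if indexing is attempted while the user's request is still being handled.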
