Posted to users@jackrabbit.apache.org by Jeroen Reijn <j....@onehippo.com> on 2011/09/01 23:22:25 UTC

Re: Lucene consistency in clustered environment

On Wed, Aug 31, 2011 at 3:16 PM, Dennis van der Laan
<d....@rug.nl> wrote:
> Ian, others,
>
> As with many 'bugs' that have a workaround, this bug has been lying
> around for about a year now. We still have the problem that the
> cluster-nodes have different lucene indexes. At first, we thought this
> happened over time. Recently we made a copy of our production database
> and used it with 4 new cluster nodes (we cleared the journal table and
> the local revisions table, first). We started them all, completely
> clean, at which point all nodes started to build the lucene index.
> Without making any changes to the contents, we see different results for
> jackrabbit search queries on these 4 cluster nodes. So it seems the
> lucene indexes might differ more over time, but could differ right from
> the start.
>
> Does anybody have a clue how this could happen? Are we missing something?

I'm wondering what you mean by the statement: "different results for
jackrabbit search queries".
Could you perhaps show some of those queries? This could also be
related to your indexing configuration.

I assume you do not have an index when starting one of the cluster nodes?

BTW which version of Jackrabbit are you experiencing this with?

>
> TIA
> Dennis
>
> On 29-9-2010 12:37, Ian Boston wrote:
>> On 29 Sep 2010, at 11:33, Dennis van der Laan wrote:
>>
>>> From your reply I
>>> understand that this should not be the case with Lucene, is it?
>>
>> Every JournalRecord should have been replayed on every machine (at some time later if the JVM was down). That *should* ensure that all documents are indexed on all machines.
>> Sounds like this is not happening in your environment.
>>
>> Ian
>>
>
>
> --
> Dennis van der Laan, MSc
> Centre for Information Technology
> University of Groningen
>
>



-- 
Amsterdam - Oosteinde 11, 1017 WT Amsterdam
Boston - 1 Broadway, Cambridge, MA 02142

US +1 877 414 4776 (toll free)
Europe +31(0)20 522 4466
www.onehippo.com

Re: Lucene consistency in clustered environment

Posted by Dennis van der Laan <d....@rug.nl>.
On 1-9-2011 23:22, Jeroen Reijn wrote:
> On Wed, Aug 31, 2011 at 3:16 PM, Dennis van der Laan
> <d....@rug.nl> wrote:
>> Ian, others,
>>
>> As with many 'bugs' that have a workaround, this bug has been lying
>> around for about a year now. We still have the problem that the
>> cluster-nodes have different lucene indexes. At first, we thought this
>> happened over time. Recently we made a copy of our production database
>> and used it with 4 new cluster nodes (we cleared the journal table and
>> the local revisions table, first). We started them all, completely
>> clean, at which point all nodes started to build the lucene index.
>> Without making any changes to the contents, we see different results for
>> jackrabbit search queries on these 4 cluster nodes. So it seems the
>> lucene indexes might differ more over time, but could differ right from
>> the start.
>>
>> Does anybody have a clue how this could happen? Are we missing something?
> I'm wondering what you mean with the statement: "different results for
> jackrabbit search queries".
When doing a full-text search (an XPath query with a 'contains' clause),
a document containing the queried text may show up in the results on
some cluster nodes but not on others. When we update such a document so
that it gets indexed again on all cluster nodes (hopefully), it may show
up on all cluster nodes again. I do not have numbers on how many
documents are not indexed on all cluster nodes, but it has happened too
often to call it 'an incident'.
> Could you perhaps show some of those queries? This could also be
> related to your indexing configuration.
I don't quite understand what you mean by 'related to your indexing
configuration'. We roll out our cluster nodes from a single templating
server, so the configuration for all cluster nodes is exactly the same,
except for the cluster id.
An example of a query that might not return the same results on all
cluster nodes:

/jcr:root/cms/documents//element(*, nt:file)/jcr:content[
    (jcr:contains(cms:searchData/@cms:title, 'academy assistent')
     or jcr:contains(@jcr:data, 'academy assistent'))
    and (@cms:type = 'article')
]/(@jcr:lastModified|rep:excerpt()|@cms:type)
order by @cms:sortfield ascending
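
To put numbers on how far the cluster nodes diverge, one option is to
run the same query on every node, collect the paths of the returned
nodes, and diff the sets. The following is only a minimal sketch of the
comparison step in plain Java. The class name IndexDiff, the method
missingPerNode and the sample paths are hypothetical; gathering the
paths from each cluster node (for example through javax.jcr.query) is
assumed to happen separately and is not shown.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class IndexDiff {

    /**
     * For each cluster node, returns the result paths that at least one
     * node returned but this node did not.
     */
    public static Map<String, Set<String>> missingPerNode(
            Map<String, Set<String>> resultsByNode) {
        // Union of all paths returned by any cluster node.
        Set<String> union = new TreeSet<>();
        for (Set<String> paths : resultsByNode.values()) {
            union.addAll(paths);
        }

        // Per node: union minus the paths that node returned itself.
        Map<String, Set<String>> missing = new LinkedHashMap<>();
        for (Map.Entry<String, Set<String>> entry : resultsByNode.entrySet()) {
            Set<String> diff = new TreeSet<>(union);
            diff.removeAll(entry.getValue());
            missing.put(entry.getKey(), diff);
        }
        return missing;
    }

    public static void main(String[] args) {
        // Made-up sample data: cluster id -> paths returned by the query.
        Map<String, Set<String>> results = new LinkedHashMap<>();
        results.put("node1", new TreeSet<>(Arrays.asList("/cms/documents/a", "/cms/documents/b")));
        results.put("node2", new TreeSet<>(Arrays.asList("/cms/documents/a")));

        for (Map.Entry<String, Set<String>> entry : missingPerNode(results).entrySet()) {
            System.out.println(entry.getKey() + " is missing " + entry.getValue());
        }
    }
}
```

A node whose set comes back empty returned everything the other nodes
returned. A non-empty set lists candidate documents that appear to be
missing from that node's index, which can then be checked individually.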
>
> I assume you do not have an index when starting one of the cluster nodes?
Not when we start a fresh cluster node for the first time, no.
>
> BTW which version of Jackrabbit are you experiencing this with?
We are currently still using Jackrabbit 1.6.1

Thanks for taking a look at our problem!
Best regards,
Dennis
>
>> TIA
>> Dennis
>>
>> On 29-9-2010 12:37, Ian Boston wrote:
>>> On 29 Sep 2010, at 11:33, Dennis van der Laan wrote:
>>>
>>>> From your reply I
>>>> understand that this should not be the case with Lucene, is it?
>>> Every JournalRecord should have been replayed on every machine (at some time later if the JVM was down). That *should* ensure that all documents are indexed on all machines.
>>> Sounds like this is not happening in your environment.
>>>
>>> Ian
>>>
>>
>> --
>> Dennis van der Laan, MSc
>> Centre for Information Technology
>> University of Groningen
>>
>>
>
>


-- 
Dennis van der Laan, MSc
Centre for Information Technology
University of Groningen