Posted to java-user@lucene.apache.org by Baskakov Daniel <gd...@gmail.com> on 2016/06/23 11:47:31 UTC

Synchronous Lucene index update tests occasionally fail

I originally posted this question on stackoverflow.com but got no reply, so I
hope someone on the official list can help.

I'm testing that dynamic changes to the domain model are reflected in the
Lucene index. Special event listeners (synchronous, no multithreading here)
are executed when the domain model components change. The listeners update
the Lucene index:

Document doc = createDocumentForComponent(domainModelComponent);
indexWriter.updateDocument(docTerm, doc);
indexWriter.commit();

Then I run a search with a query that should match the recently added changes.
Most of the time the tests pass, but sometimes they fail (especially in
automated builds).

I've tried acquiring an IndexSearcher in different ways: creating a new
searcher on the same Directory, or obtaining one via SearcherManager.

Is there a way to make recent index changes visible to the index searcher
with 100% confidence?

Re: Synchronous Lucene index update tests occasionally fail

Posted by Michael McCandless <lu...@mikemccandless.com>.

OK, thanks for bringing closure!

Mike McCandless

http://blog.mikemccandless.com

On Mon, Aug 15, 2016 at 7:19 AM, Daniq <gd...@gmail.com> wrote:

> I've simplified the test and it works OK. The simplification may have gone
> too deep, but the initial problem is most likely in the code on our side.

Re: Synchronous Lucene index update tests occasionally fail

Posted by Daniq <gd...@gmail.com>.

I've simplified the test and it works OK. The simplification may have gone
too deep, but the initial problem is most likely in the code on our side.


--
View this message in context: http://lucene.472066.n3.nabble.com/Synchronous-Lucene-index-update-tests-occasionally-fail-tp4283970p4291776.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: Synchronous Lucene index update tests occasionally fail

Posted by Daniq <gd...@gmail.com>.

Ok, I'll try it a bit later.
Thank you.



Re: Synchronous Lucene index update tests occasionally fail

Posted by Michael McCandless <lu...@mikemccandless.com>.

No, casing issues would not be random; it was just what I noticed in your
test case.

Can you fix the casing issue in your test case and confirm it's still
failing and if so, simplify the test case further, e.g. remove the context
manager and create a Lucene Document directly, and then see if it's still
failing?

We just need to simplify it to the point where the bug is isolated.

Mike McCandless

http://blog.mikemccandless.com

On Tue, Jun 28, 2016 at 4:04 AM, Daniq <gd...@gmail.com> wrote:

> Can a casing issue appear randomly? The tests don't fail every time; they
> are flaky.
>
> Daniel.

Re: Synchronous Lucene index update tests occasionally fail

Posted by Daniq <gd...@gmail.com>.

Can a casing issue appear randomly? The tests don't fail every time; they are
flaky.

Daniel.




Re: Synchronous Lucene index update tests occasionally fail

Posted by Michael McCandless <lu...@mikemccandless.com>.

It looks like you may have a casing issue.

You indexed variableWithHelpString.

But you searched for variablewithhelpstring*

Mike McCandless

http://blog.mikemccandless.com
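
To make the suspected mismatch concrete, the sketch below reproduces the query-string rewrite from the test in plain Java and compares a case-sensitive prefix check against a mixed-case and a lowercased term. It does not use Lucene: CasingCheck, toPrefixQuery and prefixMatches are hypothetical names, and the lowercasing step only imitates the lowercased terms visible in the search log. Whether the indexed term actually keeps its original case depends on the analyzer used at index time, which the thread does not show.

```java
import java.util.Locale;

public class CasingCheck {

    // Same rewrite as in the test: append a literal '*' to every word.
    static String toPrefixQuery(String text) {
        return text.replaceAll("(\\w+)", "$1\\*");
    }

    // Case-sensitive prefix match of a wildcard query ("abc*") against one term.
    static boolean prefixMatches(String wildcardQuery, String term) {
        String prefix = wildcardQuery.endsWith("*")
                ? wildcardQuery.substring(0, wildcardQuery.length() - 1)
                : wildcardQuery;
        return term.startsWith(prefix);
    }

    public static void main(String[] args) {
        String query = toPrefixQuery("variableWithHelpString");
        String lowercased = query.toLowerCase(Locale.ROOT);

        System.out.println(query);
        System.out.println(lowercased);
        // Term kept in its original case: the lowercased query misses it.
        System.out.println(prefixMatches(lowercased, "variableWithHelpString"));
        // Term lowercased at index time: the lowercased query finds it.
        System.out.println(prefixMatches(lowercased, "variablewithhelpstring"));
    }
}
```

A mismatch of this kind is deterministic, which is consistent with the later remark in the thread that a casing issue would not explain intermittent failures.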

On Mon, Jun 27, 2016 at 5:07 AM, Baskakov Daniel <gd...@gmail.com> wrote:

> I've just noticed that not only the tests for dynamically adding/removing
> entities fail, but also a simple indexing test.

Re: Synchronous Lucene index update tests occasionally fail

Posted by Baskakov Daniel <gd...@gmail.com>.

I've just noticed that not only the tests for dynamically adding/removing
entities fail, but also a simple indexing test.

Here is the boiled-down structure of the test:

  @BeforeClass
  public static void beforeClass() throws Exception
  {
    // ContextManager is a domain model
    contextManager = createContextManager();

    searcher = new ServerSearcher(Collections.singletonList(indexingSettings), false);

    searcher.openRamDirectory();

    // Context is a domain model item; it has variables. One of the contexts
    // has a "variableWithHelpString" variable.
    // searcher.createContextDocument(aContext) creates a Lucene document
    // with a field: new TextField(field, value, Field.Store.YES)
    for (Context aContext : contextManager)
    {
      Document doc = searcher.createContextDocument(aContext);

      final Term idTerm = new Term(ID_FIELD, doc.get(ID_FIELD));
      // using updateDocument here because the domain model is dynamic
      searcher.getIndexWriter().updateDocument(idTerm, doc);
    }
  }

  @Test
  public void testVariableName() throws Exception
  {
    searcher.commitNow();

    String text = "variableWithHelpString";

    final MultiFieldQueryParser queryParser =
        new MultiFieldQueryParser(ServerSearcher.ALL_SEARCH_FIELDS, searcher.analyzer);
    queryParser.setDefaultOperator(QueryParser.Operator.AND);

    // append a wildcard to every word: "foo" becomes "foo*"
    text = text.replaceAll("(\\w+)", "$1\\*");

    final Query query = queryParser.parse(text);

    // the local IndexSearcher needs a name distinct from the ServerSearcher field
    final IndexSearcher indexSearcher = searcher.acquireIndexSearcher();

    TopScoreDocCollector docCollector = TopScoreDocCollector.create(5000);

    indexSearcher.search(query, docCollector);

    final ScoreDoc[] scoreDocs = docCollector.topDocs().scoreDocs;

    assertThat(scoreDocs.length, is(1));
  }

Detailed logging is also enabled during the test run. It shows that the
document for the context with 'variableWithHelpString' is properly created
and added to the IndexWriter:
12:34:23,499 DEBUG ag.context.search         Variable Definition document
added to index:
Document<stored,indexed,indexOptions=DOCS<id:rootcontext:variableWithHelpString>
stored,indexed,tokenized,omitNorms,indexOptions=DOCS,numericType=INT,numericPrecisionStep=8<docType:1>
stored,indexed,indexOptions=DOCS<contextPath:rootcontext>
stored,indexed,tokenized<name:variableWithHelpString>
stored,indexed,tokenized<description:variableWithHelpString>
stored,indexed,tokenized<value:Two Beer Or Not Two Beer?, , >
stored,indexed,tokenized<fields:Dummy Field (dummyField)>
stored,indexed,tokenized<fields:Variable Field (variableField) [Variable
Field Help]> stored,indexed,tokenized<fields:Dummy Field (dummyField1)>>
  -     -     -     -     -     -     -     -     -     -     [main]

Here is the later log output for the search operation that returns no documents:
12:34:23,796 DEBUG ag.context.search         Search
'name:variablewithhelpstring* type:variablewithhelpstring*
description:variablewithhelpstring* help:variablewithhelpstring*
fields:variablewithhelpstring* outputFields:variablewithhelpstring*
value:variablewithhelpstring*' took '0' seconds and returned 0 hits     -
  -     -     -     -     -     -     -     -     -     [main]

Daniel.

On Mon, Jun 27, 2016 at 11:12, Michael McCandless <lu...@mikemccandless.com> wrote:

> Can you boil this down to a small standalone test case showing the issue?

Re: Synchronous Lucene index update tests occasionally fail

Posted by Michael McCandless <lu...@mikemccandless.com>.

Can you boil this down to a small standalone test case showing the issue?

Mike McCandless

http://blog.mikemccandless.com

On Mon, Jun 27, 2016 at 4:03 AM, Baskakov Daniel <gd...@gmail.com> wrote:

> Thank you Mike.

Re: Synchronous Lucene index update tests occasionally fail

Posted by Baskakov Daniel <gd...@gmail.com>.

Thank you, Mike.

A commit is performed after each indexing operation in unit tests only:

  public void commitNow() throws IOException
  {
    if (getIndexWriter().hasUncommittedChanges())
    {
      getIndexWriter().commit();
    }
  }

In the production environment, a timer performs a commit periodically
if required.

I do reopen the near-real-time IndexReader before every test search (thanks to
your blog!):

  private IndexSearcher acquireIndexSearcher() throws IOException
  {
    if (searcherManager == null)
    {
      searcherManager = new SearcherManager(getIndexWriter(), true, null);
    }
    searcherManager.maybeRefreshBlocking();
    return searcherManager.acquire();
  }

But the problem is still there.

Daniel.

On Thu, Jun 23, 2016 at 17:19, Michael McCandless <lu...@mikemccandless.com> wrote:

> You must reopen your IndexReader to see recent changes to the index.
>
> But, IW.commit after each indexing op is very costly.
>
> It's much better to get near-real-time readers, e.g. from a
> SearcherManager that you pass your IW instance to, after each set of
> changes that you now need to search.
>
> As long as you call SearcherManager.maybeRefreshBlocking after changes to
> the IW, the resulting reopened reader will reflect your index changes.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
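
The reopen requirement quoted above can be pictured with a small plain-Java model. It is only an analogy under stated assumptions: ToyIndex, Snapshot, add and open are hypothetical names, not Lucene classes, and the model ignores commits, segments and deletes. It shows the one property that matters here: a reader is a point-in-time view, so a search only sees changes made before that view was (re)opened.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ToyIndex {

    // Point-in-time view: holds a copy of the terms that existed when it was opened.
    static final class Snapshot {
        private final List<String> terms;

        Snapshot(List<String> terms) {
            this.terms = Collections.unmodifiableList(new ArrayList<>(terms));
        }

        boolean contains(String term) {
            return terms.contains(term);
        }
    }

    private final List<String> terms = new ArrayList<>();

    // Plays the role of the writer: later changes are invisible to old snapshots.
    void add(String term) {
        terms.add(term);
    }

    // Plays the role of reopening a reader: captures the current state.
    Snapshot open() {
        return new Snapshot(terms);
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.add("alpha");

        Snapshot before = index.open();
        index.add("beta");              // change made after the snapshot was opened
        Snapshot after = index.open();  // "refresh": open a new view

        System.out.println(before.contains("beta")); // stale view
        System.out.println(after.contains("beta"));  // refreshed view
    }
}
```

In the thread's terms, open() stands in for calling SearcherManager.maybeRefreshBlocking() and then acquiring a searcher; a searcher acquired before the refresh keeps its older view.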

Re: Preprocess input text before tokenizing

Posted by Ahmet Arslan <io...@yahoo.com.INVALID>.

Hi Jaime,

Please see org.apache.lucene.analysis.custom.CustomAnalyzer.builder() to create custom analyzers using a builder-style API.

Ahmet
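
As a plain-Java illustration of the preprocessing discussed in this thread (turn some characters into separators and drop others before any tokenizer reads the text), the sketch below wraps a Reader. SeparatorMappingReader, its two character sets and the apply helper are hypothetical; this is not Lucene's MappingCharFilter, which additionally tracks offset corrections, but it follows the same idea of rewriting the character stream ahead of tokenization. It is deliberately simplified and does not adjust skip() or mark().

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class SeparatorMappingReader extends FilterReader {

    private final String toSpace; // characters replaced by a single space
    private final String toDrop;  // characters removed entirely

    public SeparatorMappingReader(Reader in, String toSpace, String toDrop) {
        super(in);
        this.toSpace = toSpace;
        this.toDrop = toDrop;
    }

    @Override
    public int read() throws IOException {
        int c;
        // Skip dropped characters until a kept one (or end of stream) is found.
        while ((c = in.read()) != -1) {
            if (toDrop.indexOf(c) >= 0) {
                continue;
            }
            return toSpace.indexOf(c) >= 0 ? ' ' : c;
        }
        return -1;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        int n = 0;
        while (n < len) {
            int c = read();
            if (c == -1) {
                break;
            }
            buf[off + n++] = (char) c;
        }
        return n == 0 ? -1 : n;
    }

    // Convenience helper for the demo: filter a whole String in one call.
    static String apply(String text, String toSpace, String toDrop) {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new SeparatorMappingReader(new StringReader(text), toSpace, toDrop)) {
            int c;
            while ((c = r.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // '_' and '-' become separators, the apostrophe is dropped.
        System.out.println(apply("foo_bar-baz'qux", "_-", "'"));
    }
}
```

With Lucene itself, the corresponding step would be a char filter placed in front of the tokenizer, which is what the replies in this thread point to.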

On Friday, June 24, 2016 10:54 AM, Jaime <j....@estructure.es> wrote:
Thank you very much, that seems to solve my issue.

However, I find this a little cumbersome. I need to filter the text 
before any tokenizing takes place, so I have to implement a filtered 
version of every analyzer I'm using (StandardAnalyzer and 
SpanishAnalyzer and a custom analyzer right now).

If I need to support another analyzer in the future (a very plausible
possibility), I will need to create another version of that analyzer.
Whenever any of those analyzers changes, I will need to apply the changes
manually.

Isn't there a better way to do this?


-- 
Jaime Pardos
ESTRUCTURE MEDIA SYSTEMS, S.L.
Avda. de Madrid nº 120 nave 10, 28500, Arganda del Rey, MADRID,
j.pardos@estructure.es
910088429


Re: Preprocess input text before tokenizing

Posted by Jaime <j....@estructure.es>.

Thank you very much, that seems to solve my issue.

However, I find this a little cumbersome. I need to filter the text 
before any tokenizing takes place, so I have to implement a filtered 
version of every analyzer I'm using (StandardAnalyzer and 
SpanishAnalyzer and a custom analyzer right now).

If I need to support another analyzer in the future (a very plausible
possibility), I will need to create a filtered version of that one too.
And whenever any of those analyzers changes, I will need to re-apply the
changes manually to my copy.

Isn't there a better way to do this?

On 23/06/2016 at 20:28, Ahmet Arslan wrote:
> Hi,
>
> CharFilters (zero or more per analyzer) are the way to manipulate text before it reaches the tokenizer.
> I think initReader is the method where you want to plug in your char filters.
> https://github.com/apache/lucene-solr/blob/master/lucene/analysis/morfologik/src/java/org/apache/lucene/analysis/uk/UkrainianMorfologikAnalyzer.java
>
> Ahmet
>
> On Thursday, June 23, 2016 6:47 PM, Jaime <j....@estructure.es> wrote:
> Hello,
>
> I want to change the input text before tokenizing. I think I just need
> to use some characters as word separators, and maybe remove some others
> completely.
>
> I was planning to use MappingCharFilterFactory to replace some chars
> with " " and others with "", but I feel like I'm not on the right track.
>
> First, I've implemented a custom analyzer that uses my custom tokenizer. My
> idea was to inherit from StandardTokenizer and call
> MappingCharFilterFactory.create(reader) from within setReader.
>
> However, setReader is final, so I can't override it.
>
> Is there a better way to do this?
> In any case, how should I use MappingCharFilter if I really need it?

-- 
Jaime Pardos
ESTRUCTURE MEDIA SYSTEMS, S.L.
Avda. de Madrid nº 120 nave 10, 28500, Arganda del Rey, MADRID,
j.pardos@estructure.es
910088429

Re: Preprocess input text before tokenizing

Posted by Ahmet Arslan <io...@yahoo.com.INVALID>.

Hi,

CharFilters (zero or more per analyzer) are the way to manipulate text before it reaches the tokenizer.
I think initReader is the method where you want to plug in your char filters.
https://github.com/apache/lucene-solr/blob/master/lucene/analysis/morfologik/src/java/org/apache/lucene/analysis/uk/UkrainianMorfologikAnalyzer.java
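
To make the char filter idea concrete without tying it to a particular Lucene version, here is a JDK-only sketch of what such a filter does to the character stream: some characters are turned into spaces (so they act as word separators) and others are dropped before any tokenizer sees the text. SeparatorMappingReader and preprocess are hypothetical names made up for this illustration, not Lucene API. In Lucene itself the same job is done by MappingCharFilter, returned from an overridden Analyzer.initReader; unlike this sketch, it also records offset corrections.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical JDK-only stand-in for a char filter: rewrites characters before tokenizing. */
final class SeparatorMappingReader extends Reader {
    private final Reader in;          // the original input
    private final String separators;  // characters to turn into a single space
    private final String removed;     // characters to drop entirely

    SeparatorMappingReader(Reader in, String separators, String removed) {
        this.in = in;
        this.separators = separators;
        this.removed = removed;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        int written = 0;
        while (written < len) {
            int c = in.read();
            if (c == -1) {
                break; // end of the underlying input
            }
            char ch = (char) c;
            if (removed.indexOf(ch) >= 0) {
                continue; // drop this character
            }
            // Separators become a space; everything else passes through unchanged.
            buf[off + written++] = separators.indexOf(ch) >= 0 ? ' ' : ch;
        }
        return written == 0 ? -1 : written;
    }

    @Override
    public void close() throws IOException {
        in.close();
    }

    /** Convenience helper: runs a whole string through the mapping. */
    static String preprocess(String text, String separators, String removed) {
        StringBuilder out = new StringBuilder();
        try (Reader r = new SeparatorMappingReader(new StringReader(text), separators, removed)) {
            char[] buf = new char[256];
            int n;
            while ((n = r.read(buf, 0, buf.length)) != -1) {
                out.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }
}
```

For example, preprocess("foo_bar-baz'qux", "_-", "'") returns "foo bar bazqux": the underscore and hyphen become separators and the apostrophe disappears, so a whitespace-aware tokenizer downstream would see three words.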

Ahmet

On Thursday, June 23, 2016 6:47 PM, Jaime <j....@estructure.es> wrote:
Hello,

I want to change the input text before tokenizing. I think I just need 
to use some characters as word separators, and maybe remove some others 
completely.

I was planning to use MappingCharFilterFactory to replace some chars
with " " and others with "", but I feel like I'm not on the right track.

First, I've implemented a custom analyzer that uses my custom tokenizer. My
idea was to inherit from StandardTokenizer and call
MappingCharFilterFactory.create(reader) from within setReader.

However, setReader is final, so I can't override it.

Is there a better way to do this?
In any case, how should I use MappingCharFilter if I really need it?


Preprocess input text before tokenizing

Posted by Jaime <j....@estructure.es>.

Hello,

I want to change the input text before tokenizing. I think I just need 
to use some characters as word separators, and maybe remove some others 
completely.

I was planning to use MappingCharFilterFactory to replace some chars
with " " and others with "", but I feel like I'm not on the right track.

First, I've implemented a custom analyzer that uses my custom tokenizer. My
idea was to inherit from StandardTokenizer and call
MappingCharFilterFactory.create(reader) from within setReader.

However, setReader is final, so I can't override it.

Is there a better way to do this?
In any case, how should I use MappingCharFilter if I really need it?

Re: Synchronous Lucene index update tests occasionally fail

Posted by Michael McCandless <lu...@mikemccandless.com>.

You must reopen your IndexReader to see recent changes to the index.

But, IW.commit after each indexing op is very costly.

It's much better to get near-real-time readers, e.g. from a SearcherManager
that you pass your IW instance to, after each set of changes that you now
need to search.

As long as you call SearcherManager.maybeRefreshBlocking after changes to
the IW, the resulting reopened reader will reflect your index changes.
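
A minimal sketch of that pattern, assuming a recent Lucene release (8.x or 9.x) on the classpath; older releases use different Directory classes and constructor overloads. The class name NrtRefreshSketch, the method updateThenCount, and the field names "id" and "value" are made up for the example. Note that there is no commit() call: the SearcherManager is opened on the IndexWriter, so the refreshed searcher sees uncommitted changes.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.SearcherManager;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

/** Hypothetical example class: update a document, refresh, then search. */
final class NrtRefreshSketch {

    /** Updates one document and returns how many documents match the new value. */
    static int updateThenCount(String id, String value) {
        try (Directory dir = new ByteBuffersDirectory();
             IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()));
             // Opening the manager on the writer gives near-real-time searchers.
             SearcherManager manager = new SearcherManager(writer, null)) {

            Document doc = new Document();
            doc.add(new StringField("id", id, Field.Store.YES));
            doc.add(new StringField("value", value, Field.Store.NO));
            writer.updateDocument(new Term("id", id), doc);

            // Blocks until the manager has reopened on the writer's latest state.
            manager.maybeRefreshBlocking();

            IndexSearcher searcher = manager.acquire();
            try {
                return searcher.count(new TermQuery(new Term("value", value)));
            } finally {
                manager.release(searcher); // always release what was acquired
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In a test, the important part is the ordering: apply the change through the writer, call maybeRefreshBlocking, and only then acquire the searcher used for the assertion.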

Mike McCandless

http://blog.mikemccandless.com

On Thu, Jun 23, 2016 at 7:47 AM, Baskakov Daniel <gd...@gmail.com> wrote:

> Originally I posted this question on stackoverflow.com but got no
> reply, so I hope someone on the official list can help.
>
> I'm testing that dynamic changes to the domain model are reflected in the
> Lucene index. Special event listeners (synchronous, no multithreading here)
> are executed when the domain model components change. The listeners update
> the Lucene index:
>
> Document doc = createDocumentForComponent(domainModelComponent);
> indexWriter.updateDocument(docTerm, doc);
> indexWriter.commit();
>
> Then I search with a query that matches the recently added changes.
> Most of the time the tests work perfectly, but sometimes they fail
> (especially in automated builds).
>
> I've tried to acquire an IndexSearcher in different ways: creating a new
> searcher on the same Directory, or obtaining it via SearcherManager.
>
> Is there a way to make recent index changes available to the index searcher
> with 100% confidence?
>