Posted to java-user@lucene.apache.org by Eduardo Costa Lopes <ed...@serpro.gov.br> on 2018/07/11 14:46:44 UTC

Recreating index lucene without stopping client applications

Hello, 

I have a JBoss application that queries a Lucene index to get some customer info. Sometimes the index is recreated while the application is running: basically, the old index is erased and a new one is created. On the application side we have a scheduler calling org.apache.lucene.search.SearcherManager.maybeRefresh() in order to get a new connection to the index. The issue is this: today we updated the index, and when searching for a certain name our command-line tool returns 4955 hits, but the web app returns 4058 hits (three more). The correct hit count is only shown if we restart JBoss. I'd like to know how we can recreate the Lucene index without needing to restart the applications.

Thanks in advance, 

Eduardo Lopes. 




-


"Esta mensagem do SERVIÇO FEDERAL DE PROCESSAMENTO DE DADOS (SERPRO), empresa pública federal regida pelo disposto na Lei Federal nº 5.615, é enviada exclusivamente a seu destinatário e pode conter informações confidenciais, protegidas por sigilo profissional. Sua utilização desautorizada é ilegal e sujeita o infrator às penas da lei. Se você a recebeu indevidamente, queira, por gentileza, reenviá-la ao emitente, esclarecendo o equívoco."

"This message from SERVIÇO FEDERAL DE PROCESSAMENTO DE DADOS (SERPRO) -- a government company established under Brazilian law (5.615/70) -- is directed exclusively to its addressee and may contain confidential data, protected under professional secrecy rules. Its unauthorized use is illegal and may subject the transgressor to the law's penalties. If you're not the addressee, please send it back, elucidating the failure."

Re: Recreating index lucene without stopping client applications

Posted by Michael McCandless <lu...@mikemccandless.com>.
If you use IndexWriter.deleteAll, rather than any of the delete-by-Query or
delete-by-Term methods, the delete should be quite efficient, as IndexWriter
just drops all segments.

That API is also transactional, so you could call IW.deleteAll, proceed to
reindex all your documents, and if somehow that crashes before finishing,
your index will still reflect the old index with nothing deleted or
updated.  Only once you successfully commit will the new index become
visible to maybeRefresh() calls on a non-NRT reader.

Mike McCandless

http://blog.mikemccandless.com

On Tue, Jul 17, 2018 at 5:15 PM, Michael Sokolov <ms...@gmail.com> wrote:

> If you create a completely new index, rather than applying updates to an
> existing index, you will not be able to see that by calling maybeRefresh(),
> I think, since that is looking for updates to an existing index.
> Conceivably you could open a writer on the existing index, delete all of
> its documents, and then write new ones and commit. After that, your refresh
> call would see the updates. But I wouldn't recommend this since it might be
> inefficient to do all those deletions. Instead I would suggest creating a
> new index directory, and having some process that watches for a new
> directory being created. Then when it sees that, it could open a new
> searcher using that directory, and replace your existing searcher. In other
> words, implement the refresh yourself, since you have taken over the
> process of writing new indexes outside of what Lucene manages. Another
> possibility would be to maintain a timestamp on your documents, write all
> your new documents, and then query-and-delete any documents with old
> timestamps.  But the key point here is that you can't just create a new
> index and expect your reader to know about it just because you stuck it in
> the same file system directory where the old one was.

Re: Recreating index lucene without stopping client applications

Posted by Michael Sokolov <ms...@gmail.com>.
If you create a completely new index, rather than applying updates to an
existing index, you will not be able to see that by calling maybeRefresh(),
I think, since that is looking for updates to an existing index.
Conceivably you could open a writer on the existing index, delete all of
its documents, and then write new ones and commit. After that, your refresh
call would see the updates. But I wouldn't recommend this since it might be
inefficient to do all those deletions. Instead I would suggest creating a
new index directory, and having some process that watches for a new
directory being created. Then when it sees that, it could open a new
searcher using that directory, and replace your existing searcher. In other
words, implement the refresh yourself, since you have taken over the
process of writing new indexes outside of what Lucene manages. Another
possibility would be to maintain a timestamp on your documents, write all
your new documents, and then query-and-delete any documents with old
timestamps.  But the key point here is that you can't just create a new
index and expect your reader to know about it just because you stuck it in
the same file system directory where the old one was.
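
The "implement the refresh yourself" idea above can be sketched roughly as follows. This is only a minimal illustration using the Java standard library, not Lucene's own mechanism: the class name IndexSwapper, the maybeSwap() method, and the convention that the newest index directory has the lexicographically greatest name (e.g. index-20180711) are all assumptions made for the example. It also assumes the indexing job only makes a directory visible under the parent once the index in it is complete, for instance by building under a temporary name elsewhere and renaming.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Function;
import java.util.stream.Stream;

/**
 * Hypothetical helper (not part of Lucene): keeps the "current" searcher-like
 * resource and swaps in a new one when a newer index directory shows up
 * under a parent directory.
 */
final class IndexSwapper<S extends Closeable> {
    private final Path parent;                 // directory containing one sub-directory per index build
    private final Function<Path, S> opener;    // how to open a searcher/reader on an index directory
    private final AtomicReference<S> current = new AtomicReference<>();
    private volatile Path currentDir;          // only written inside the synchronized maybeSwap()

    IndexSwapper(Path parent, Function<Path, S> opener) {
        this.parent = parent;
        this.opener = opener;
    }

    /** Returns the resource currently in use, or null if no index directory has been seen yet. */
    S get() {
        return current.get();
    }

    /**
     * Intended to be called periodically (e.g. from the existing scheduler).
     * Returns true if a newer directory was found and swapped in.
     */
    synchronized boolean maybeSwap() {
        Optional<Path> newest = newestIndexDir();
        if (!newest.isPresent() || newest.get().equals(currentDir)) {
            return false;                      // nothing new to open
        }
        S fresh = opener.apply(newest.get());  // open the new index first...
        S old = current.getAndSet(fresh);      // ...then publish it atomically
        currentDir = newest.get();
        if (old != null) {
            // Simplification: a real application must wait until in-flight
            // searches on the old resource have finished (e.g. via reference
            // counting) before closing it.
            try {
                old.close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return true;
    }

    /** Assumption: the newest build is the sub-directory with the greatest name. */
    private Optional<Path> newestIndexDir() {
        try (Stream<Path> children = Files.list(parent)) {
            return children
                    .filter(Files::isDirectory)
                    .max(Comparator.comparing((Path p) -> p.getFileName().toString()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In a Lucene application the opener would presumably open a reader on the new directory (a DirectoryReader is Closeable), and the scheduler that currently calls maybeRefresh() could call maybeSwap() instead. Old index directories can then be removed once nothing is reading from them.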


Re: Recreating index lucene without stopping client applications

Posted by Eduardo Costa Lopes <ed...@serpro.gov.br>.
Hi Marco,

Basically, the contents of the Lucene index directory are deleted and then the index is recreated (under the same directory). Months ago I researched how to "refresh" the Lucene access to get the newest data without restarting the web applications, and found that version 6.1.0 provides the class SearcherManager, whose maybeRefresh() method, according to the documentation, should be called periodically to reopen the index. Our "reopen scheduler" runs hourly, and even though it executes successfully, the data does not seem to be the newest.

Thanks.


================================== 
Eduardo Costa Lopes 
SERPRO - SUPDE/DEPAE/DE009 

e-mail: eduardo-costa.lopes@serpro.gov.br 
phone: (51) 2129 - 1180

----- Original message -----
From: "Marco Reis" <ma...@gmail.com>
To: "java-user" <ja...@lucene.apache.org>
Sent: Wednesday, July 11, 2018 12:06:18
Subject: Re: Recreating index lucene without stopping client applications

Hi Eduardo,

The index recreation process isn't clear to me, but I think you have two
different SearcherManagers, one for the app and a different one for the
command line. At some point, one of them could see the document deletions
while the one in JBoss doesn't. Maybe reopening the index directory could help.




---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Recreating index lucene without stopping client applications

Posted by Marco Reis <ma...@gmail.com>.
Hi Eduardo,

The index recreation process isn't clear to me, but I think you have two
different SearcherManagers, one for the app and a different one for the
command line. At some point, one of them could see the document deletions
while the one in JBoss doesn't. Maybe reopening the index directory could help.




-- 
Marco Reis
Software Engineer
http://marcoreis.net
https://github.com/masreis
+55 61 9 81194620