Posted to java-user@lucene.apache.org by Pablo Guerrero <si...@gmail.com> on 2013/03/22 14:00:41 UTC

Lucene reliability as primary store

Hi all,

I'm evaluating using Lucene for some data that would not be stored anywhere
else, and I'm concerned about reliability. Having a database storing the
data in addition to Lucene would be a problem, and I want to know if Lucene
is reliable enough.

Reading this article,
http://blog.mikemccandless.com/2012/03/transactional-lucene.html, I think
that all committed data would be safe (at least as safe as in, for example,
MySQL on the same machine) in the event of a JVM crash or system crash. Is
that true?

As an example, if I have an index with some data already committed, A, and
the JVM crashes during a commit of data B, could the index be corrupted, or
will it just ignore B? If it's corrupted, will CheckIndex be able to recover
at least all data in A? Will that also be true in the case of a power
shutdown, where the OS buffers are lost, but there is no disk corruption?

Thank you in advance,
Pablo

Re: Lucene reliability as primary store

Posted by Pablo Guerrero <si...@gmail.com>.
Thanks Simon,

I'll see if I can implement some kind of transaction log to avoid
committing every change.

Cheers,
Pablo



Re: Lucene reliability as primary store

Posted by Simon Willnauer <si...@gmail.com>.
On Fri, Mar 22, 2013 at 2:00 PM, Pablo Guerrero <si...@gmail.com> wrote:
> Hi all,
>
> I'm evaluating using Lucene for some data that would not be stored anywhere
> else, and I'm concerned about reliability. Having a database storing the
> data in addition to Lucene would be a problem, and I want to know if Lucene
> is reliable enough.
>
> Reading this article,
> http://blog.mikemccandless.com/2012/03/transactional-lucene.html I think
> that all committed data would be safe (at least as safe as in, for example,
> MySQL on the same machine) in the event of JVM crash or system crash. Is
> that true?

Yes, that is true. However, a commit in Lucene is still pretty expensive,
so apps like ElasticSearch or Solr use a journal / transaction log to
overcome this.
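
To make the journal / transaction log idea concrete, here is a minimal
sketch in plain Java. It is not how ElasticSearch or Solr implement their
logs, and nothing in it is a Lucene API: the class name SimpleTranslog, the
record layout (length, CRC32, payload) and the size limit are all
assumptions made for illustration. The idea is to fsync a small append for
every change, call IndexWriter.commit() only occasionally, and after a crash
replay whatever the log holds on top of the last commit.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;

/** Hypothetical append-only operation log; not part of Lucene. */
final class SimpleTranslog implements AutoCloseable {
    private static final int MAX_RECORD_BYTES = 1 << 20; // arbitrary sanity limit
    private final FileChannel channel;

    SimpleTranslog(Path path) throws IOException {
        channel = FileChannel.open(path, StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        channel.position(channel.size()); // continue after any existing records
    }

    /** Appends one record as [length][crc32][payload] and forces it to disk. */
    void append(String operation) throws IOException {
        byte[] payload = operation.getBytes(StandardCharsets.UTF_8);
        if (payload.length > MAX_RECORD_BYTES) {
            throw new IllegalArgumentException("record too large");
        }
        CRC32 crc = new CRC32();
        crc.update(payload);
        ByteBuffer buf = ByteBuffer.allocate(4 + 8 + payload.length);
        buf.putInt(payload.length).putLong(crc.getValue()).put(payload);
        buf.flip();
        while (buf.hasRemaining()) {
            channel.write(buf);
        }
        channel.force(true); // fsync of one small append
    }

    /** Empties the log; meant to be called only after a successful index commit. */
    void clear() throws IOException {
        channel.truncate(0);
        channel.force(true);
    }

    /** Returns every intact record and stops at the first torn or corrupt one. */
    static List<String> replay(Path path) throws IOException {
        List<String> operations = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(path)))) {
            while (true) {
                int length = in.readInt();
                long expectedCrc = in.readLong();
                if (length < 0 || length > MAX_RECORD_BYTES) {
                    break; // garbage length: treat as the end of the usable log
                }
                byte[] payload = new byte[length];
                in.readFully(payload);
                CRC32 crc = new CRC32();
                crc.update(payload);
                if (crc.getValue() != expectedCrc) {
                    break; // checksum mismatch: record was not fully written
                }
                operations.add(new String(payload, StandardCharsets.UTF_8));
            }
        } catch (EOFException endOfLog) {
            // Clean end of file, or a record cut short by a crash: keep what we have.
        }
        return operations;
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }

    public static void main(String[] args) throws IOException {
        Path log = Files.createTempFile("translog", ".log");
        try (SimpleTranslog translog = new SimpleTranslog(log)) {
            translog.append("add doc-1");
            translog.append("delete doc-7");
        }
        System.out.println(SimpleTranslog.replay(log));
    }
}
```

On startup, an application using something like this would open the index
at its last commit, re-apply the operations returned by replay(), commit,
and then call clear(). The replayed operations need to be idempotent (for
example, updateDocument keyed on an id) because some of them may already be
part of the last commit.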

>
> As an example, if I have an index with some data already committed, A, and
> the JVM crashes during a commit of data B, could the index be corrupted, or
> will it just ignore B? If it's corrupted, will CheckIndex be able to recover
> at least all data in A? Will that also be true in the case of a power
> shutdown, where the OS buffers are lost, but there is no disk corruption?

Unless there is a bug, the index will not be corrupted and B is
ignored / lost. CheckIndex will not be able to recover your lost docs;
it will only delete broken segments if you ask it to do so. Once you
commit and Lucene has returned successfully, you should also survive a
power outage. If your disk is broken, then your index will likely be
broken too.
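
The "B is ignored / lost, A stays intact" behaviour can be illustrated with
a small write-once commit scheme in plain Java. This is only a sketch
loosely modeled on the approach described in the linked article, not
Lucene's code: the class CommitPointSketch and the file names data_N,
pending_commit_N and commit_N are hypothetical. New data goes into new files
that are fsynced first, and the commit becomes visible only when a small
marker file is renamed into place, so a crash before the rename leaves the
previous commit untouched.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

/** Hypothetical illustration of a write-once commit scheme; not Lucene code. */
final class CommitPointSketch {

    /** Writes content to a brand-new file and fsyncs it before returning. */
    static void writeDurably(Path file, String content) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE)) {
            ByteBuffer buf = ByteBuffer.wrap(content.getBytes(StandardCharsets.UTF_8));
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            ch.force(true);
        }
    }

    /** Step 1 of a commit: write the new data file. Existing files are never modified. */
    static void writeData(Path dir, long generation, String data) throws IOException {
        writeDurably(dir.resolve("data_" + generation), data);
    }

    /** Step 2: publish the commit by atomically renaming a marker file into place. */
    static void publish(Path dir, long generation) throws IOException {
        Path pending = dir.resolve("pending_commit_" + generation);
        writeDurably(pending, "data_" + generation);
        // A production version would also fsync the directory; omitted in this sketch.
        Files.move(pending, dir.resolve("commit_" + generation),
                StandardCopyOption.ATOMIC_MOVE);
    }

    /** Recovery: returns the data named by the newest published marker, or null if none. */
    static String loadLatest(Path dir) throws IOException {
        long latest = -1;
        try (DirectoryStream<Path> markers = Files.newDirectoryStream(dir, "commit_*")) {
            for (Path marker : markers) {
                String name = marker.getFileName().toString();
                latest = Math.max(latest, Long.parseLong(name.substring("commit_".length())));
            }
        }
        if (latest < 0) {
            return null;
        }
        String dataFile = new String(
                Files.readAllBytes(dir.resolve("commit_" + latest)), StandardCharsets.UTF_8);
        return new String(Files.readAllBytes(dir.resolve(dataFile)), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("commit-sketch");
        writeData(dir, 1, "A");
        publish(dir, 1);            // commit of A completes
        writeData(dir, 2, "A+B");   // simulated crash here: B is never published
        System.out.println(loadLatest(dir));
    }
}
```

In this sketch the leftover data_2 file is simply unreferenced after the
simulated crash, so recovery still sees the commit of A.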

simon
>
> Thank you in advance,
> Pablo

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org