Posted to solr-user@lucene.apache.org by Luis Cappa Banda <lu...@gmail.com> on 2011/10/05 19:21:52 UTC

Backup with lukeall XMLExporter.

Hello.

I've been looking for information trying to find an easy way to do index
backups with Solr, and I've read that lukeall has an application called
XMLExporter that creates an XML dump from a Lucene index with its complete
information. I've got some questions about this alternative:

*1. *Does it also contain the information from fields configured as
stored=false?
*2. *Can I load this generated XML file with curl to reindex? If not, is
there any other solution?

Thank you very much.

Re: Backup with lukeall XMLExporter.

Posted by Erick Erickson <er...@gmail.com>.
You really only have a few options:
1> set up a Solr instance on some backup machine
     and manually (i.e. by issuing an HTTP
     request) cause a replication to occur when
     you want (see:
     http://wiki.apache.org/solr/SolrReplication#HTTP_API)
2> suspend indexing and just copy your data/index
     directory somewhere (actually, I'd copy the
     entire data directory and its subdirectories).
3> Keep the original input around somewhere so you
     can re-index from scratch. Note that this is
     probably better than storing all your fields because
     in the unlikely event your index does get
     corrupted, you have *all* the original data
     around, and if you wanted to, for instance,
     change your schema you could make the
     changes from the original data which would
     be more robust.
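
Options 1> and 2> can be sketched with a couple of shell commands. This is
a minimal sketch, not a drop-in script: the host, port, core path, backup
location, and data directory below are all assumptions you'd adjust to your
own setup, and the replication handler must be enabled in solrconfig.xml.

```shell
# Base URL of the Solr core whose index we want to back up
# (host, port and paths are assumptions -- adjust to your setup).
SOLR_URL="http://localhost:8983/solr"

# Option 1> ask the replication handler to snapshot the index.
# "location" names the directory the snapshot is written to.
curl "$SOLR_URL/replication?command=backup&location=/backups/solr"

# Option 2> with indexing suspended, copy the whole data directory
# (index plus subdirectories) somewhere safe.
cp -r /var/solr/data "/backups/solr-data-$(date +%Y%m%d)"
```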

Best
Erick

P.S. You are *too* using Lucene <G>. The search
engine that comes with Solr is exactly a corresponding
release of Lucene; you just don't use the Lucene
API directly, but Solr does.

On Wed, Oct 5, 2011 at 2:57 PM, Luis Cappa Banda <lu...@gmail.com> wrote:
> Hello, Andrzej.
>
> First of all, thanks for your help. The thing is that I'm not using Lucene
> directly: I'm using Solr to index (well, I know that it involves Lucene). I
> know about Solr replication, but the index is being modified in real time,
> with new documents added as new requests come in. In summary, we load a
> Solr index from the batch indexation, but then the index is updated with
> new documents. That's the reason we need a daily backup to prevent
> corruption. Any other solution? I thought about setting all fields to
> stored=true and developing an application with SolrJ that reindexes, but I
> don't like configuring all the fields as stored=true...
>
> Thanks.
>

Re: Backup with lukeall XMLExporter.

Posted by Luis Cappa Banda <lu...@gmail.com>.
Hello, Andrzej.

First of all, thanks for your help. The thing is that I'm not using Lucene
directly: I'm using Solr to index (well, I know that it involves Lucene). I
know about Solr replication, but the index is being modified in real time,
with new documents added as new requests come in. In summary, we load a
Solr index from the batch indexation, but then the index is updated with
new documents. That's the reason we need a daily backup to prevent
corruption. Any other solution? I thought about setting all fields to
stored=true and developing an application with SolrJ that reindexes, but I
don't like configuring all the fields as stored=true...

Thanks.

Re: Backup with lukeall XMLExporter.

Posted by Andrzej Bialecki <ab...@getopt.org>.
On 05/10/2011 19:21, Luis Cappa Banda wrote:
> Hello.
>
> I've been looking for information trying to find an easy way to do index
> backups with Solr, and I've read that lukeall has an application called
> XMLExporter that creates an XML dump from a Lucene index with its complete
> information. I've got some questions about this alternative:
>
> *1. *Does it also contain the information from fields configured as
> stored=false?
> *2. *Can I load this generated XML file with curl to reindex? If not, is
> there any other solution?
>
> Thank you very much.
>

It does not provide a complete copy of the index information, it only 
dumps general information about the index plus the stored fields of 
documents. Non-stored fields are not available. There is no counterpart 
tool to take this XML dump and turn it into an index.

I'm working on a tool like what you had in mind, and I will be 
presenting results of this work at the Eurocon in Barcelona. However, 
it's still very much incomplete, and it depends on cutting edge features 
(LUCENE-2621).

In any case, if you're using Lucene then you can safely take a backup of 
the index if it's open readonly. With Solr you can use the replication 
mechanism to pull in a copy of the index from a running Solr instance.
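
Pulling a copy via replication can be done with a single HTTP call to the
replication handler on a second Solr instance. A hedged sketch, where the
backup host, live host, and ports are assumptions for illustration:

```shell
# Run on a backup machine hosting its own Solr instance: ask its
# replication handler to fetch the index from the live instance.
# masterUrl points at the live core's replication endpoint (assumed).
curl "http://localhost:8983/solr/replication?command=fetchindex&masterUrl=http://live-host:8983/solr/replication"
```

The backup instance then holds a consistent copy of the index that can be
archived or served directly.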

-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com