Posted to solr-user@lucene.apache.org by "Norskog, Lance" <la...@divvio.com> on 2008/04/05 00:26:29 UTC

Merging Solr index

Hi-
 
http://wiki.apache.org/solr/MergingSolrIndexes recommends using the
Lucene contributed app IndexMergeTool to merge two Solr indexes. What
happens if both indexes have records with the same unique key? Will they
both go into the new index?
 
Is the implementation of unique IDs in the Solr Java code or in Lucene? If it
is in Solr, how would I hack up a Solr IndexMergeTool?
 
Cheers,
 
Lance Norskog
 

RE: Merging Solr index

Posted by Chris Hostetter <ho...@fucit.org>.
: I have learned Solr as a power user and written a couple of simple
: filters. I'm not a Lucene heavy. Where is this in Lucene?  Is it the
: default? I don't remember Lucene having the notion of a unique id
: (primary key).

I can't answer that question (because Yonik's answer surprised me too) but 
as for this one...

: In this merge code, with the latest Lucene 2.3, will the duplicates in
: solr/data1 override the records in solr/data0? Or the other way around?

neither.  duplicate overwriting is done when adding individual documents; 
when merging two indexes this logic doesn't come into play.

The easiest way I can think of to deal with this would be:
  1) merge the indexes (using the existing IndexMergeTool)
  2) iterate over a TermEnum for the uniqueKey field.
  3) if any term has a docFreq > 1, delete all but the lowest (or 
     highest) docid (depending on what order you merged the indexes in) 
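
As a rough illustration of steps 2 and 3, here is a plain-Java model of the
selection logic. It uses no Lucene classes: the class name, method name and
the list-of-keys input are all hypothetical stand-ins. In a real index the
grouping would presumably come from IndexReader.terms() and
IndexReader.termDocs(), and the deletes would go through
IndexReader.deleteDocument(int); treat that mapping as an assumption, not
tested code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of steps 2 and 3; no Lucene classes are used.
class MergedIndexDedup {

    /**
     * keysByDocId.get(i) is the uniqueKey value of the document with docid i.
     * Returns the docids to delete so that each key survives exactly once.
     * keepHighest = true keeps the highest docid per key, false the lowest.
     */
    static List<Integer> docIdsToDelete(List<String> keysByDocId, boolean keepHighest) {
        // Group docids by key: the same information a term enumeration
        // over the uniqueKey field plus its postings would provide.
        Map<String, List<Integer>> postings = new LinkedHashMap<>();
        for (int docId = 0; docId < keysByDocId.size(); docId++) {
            postings.computeIfAbsent(keysByDocId.get(docId), k -> new ArrayList<>())
                    .add(docId);
        }

        List<Integer> toDelete = new ArrayList<>();
        for (List<Integer> docIds : postings.values()) {
            if (docIds.size() <= 1) {
                continue; // docFreq == 1: key is already unique
            }
            // docids were appended in ascending order, so first = lowest, last = highest.
            int keep = keepHighest ? docIds.get(docIds.size() - 1) : docIds.get(0);
            for (int docId : docIds) {
                if (docId != keep) {
                    toDelete.add(docId);
                }
            }
        }
        return toDelete;
    }
}
```

Whether to keep the lowest or the highest docid depends on which index's
copy should win, which in turn depends on the order the indexes were merged
in.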

BTW: Would you mind updating that wiki page with some more details based 
on your experience once you get it working?


-Hoss

Re: Merging Solr index

Posted by Yonik Seeley <yo...@apache.org>.
On Sat, Apr 5, 2008 at 6:27 PM, Norskog, Lance <la...@divvio.com> wrote:
> Where is this in Lucene?  Is it the
>  default? I don't remember Lucene having the notion of a unique id
>  (primary key).

It hasn't been around too long.
IndexWriter.updateDocument(Term term, Document doc)

>  In this merge code, with the latest Lucene 2.3, will the duplicates in
>  solr/data1 override the records in solr/data0? Or the other way around?

Neither.  Duplicates will not be removed in either case.
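
To make the difference concrete, here is a toy in-memory model (not Lucene
code; the TinyIndex class and its methods are invented for illustration and
only borrow the names updateDocument and addIndexes). Updating by key
replaces the older document, while a merge simply appends everything from
the other index.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for an index: each document is a {key, body} pair.
class TinyIndex {
    final List<String[]> docs = new ArrayList<>();

    // Models update-by-key: drop any document with the same key, then add.
    void updateDocument(String key, String body) {
        docs.removeIf(d -> d[0].equals(key));
        docs.add(new String[] {key, body});
    }

    // Models a merge: documents are copied over with no key check at all.
    void addIndexes(TinyIndex other) {
        docs.addAll(other.docs);
    }

    // Number of documents currently carrying the given key.
    long count(String key) {
        return docs.stream().filter(d -> d[0].equals(key)).count();
    }
}
```

In this model, two indexes that each hold one document with key "1" end up
with two such documents after addIndexes, even though each index on its own
never held more than one.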

-Yonik

RE: Merging Solr index

Posted by "Norskog, Lance" <la...@divvio.com>.
Thanks!

I have learned Solr as a power user and written a couple of simple
filters. I'm not a Lucene heavy. Where is this in Lucene?  Is it the
default? I don't remember Lucene having the notion of a unique id
(primary key).

In this merge code, with the latest Lucene 2.3, will the duplicates in
solr/data1 override the records in solr/data0? Or the other way around?

How do I add the new Lucene implementation?

            try {
                  IndexWriter writer = new IndexWriter(new File("solr/data0/index"),
                              new StandardAnalyzer(), false);
                  Directory[] dirs = new Directory[]{
                              FSDirectory.getDirectory(new File("solr/data1/index"))};
                  System.out.println(writer);
                  writer.addIndexes(dirs);
                  writer.close();
            } catch (Exception e) {
                  e.printStackTrace();
            }

Thanks,

Lance Norskog


-----Original Message-----
From: yseeley@gmail.com [mailto:yseeley@gmail.com] On Behalf Of Yonik
Seeley
Sent: Saturday, April 05, 2008 2:37 PM
To: solr-user@lucene.apache.org
Cc: Norskog, Lance
Subject: Re: Merging Solr index

On Fri, Apr 4, 2008 at 6:26 PM, Norskog, Lance <la...@divvio.com> wrote:
>  http://wiki.apache.org/solr/MergingSolrIndexes recommends using the
>  Lucene contributed app IndexMergeTool to merge two Solr indexes. What
>  happens if both indexes have records with the same unique key? Will they
>  both go into the new index?

Yes.

>  Is the implementation of unique IDs in the Solr java or in Lucene?

Both.  It was originally just in Solr, but Lucene now has an
implementation.
Neither implementation will prevent this as both just remember documents
(in memory) that were added and then periodically delete older documents
with the same id.

-Yonik

Re: Merging Solr index

Posted by Yonik Seeley <yo...@apache.org>.
On Fri, Apr 4, 2008 at 6:26 PM, Norskog, Lance <la...@divvio.com> wrote:
>  http://wiki.apache.org/solr/MergingSolrIndexes recommends using the
>  Lucene contributed app IndexMergeTool to merge two Solr indexes. What
>  happens if both indexes have records with the same unique key? Will they
>  both go into the new index?

Yes.

>  Is the implementation of unique IDs in the Solr java or in Lucene?

Both.  It was originally just in Solr, but Lucene now has an implementation.
Neither implementation will prevent this as both just remember
documents (in memory) that were added and then periodically delete
older documents with the same id.

-Yonik