Posted to solr-user@lucene.apache.org by Andrew Clegg <an...@gmail.com> on 2010/08/21 14:26:59 UTC

Duplicate docs when merging indices?

Hi,

First off, sorry about the previous accidental post; I had a sausage-fingered
moment.

Anyway...

If I merge two indices with CoreAdmin, as detailed here...

http://wiki.apache.org/solr/MergingSolrIndexes

What happens to documents that exist in both indices, i.e. those that have
the same unique key?

What decides which copy takes precedence? Will documents get indexed
multiple times, or will the second one just get skipped?

Also, does the behaviour vary between CoreAdmin and IndexMergeTool? This
thread from a couple of years ago:

http://web.archiveorange.com/archive/v/AAfXfQIiBU7vyQBt6qdk

suggests that IndexMergeTool can result in dupes, unless I'm
misinterpreting.

Thanks!

Andrew.




Re: Duplicate docs when merging indices?

Posted by Gora Mohanty <go...@srijan.in>.
On Sat, 21 Aug 2010 05:26:59 -0700 (PDT)
Andrew Clegg <an...@gmail.com> wrote:
[...]
> If I merge two indices with CoreAdmin, as detailed here...
> 
> http://wiki.apache.org/solr/MergingSolrIndexes
> 
> What happens to duplicate documents between the two? i.e. those
> that have the same unique key.
> 
> What decides which copy takes precedence? Will documents get
> indexed multiple times, or will the second one just get skipped?
[...]

I have not used CoreAdmin, but with IndexMergeTool I know from personal
experience that duplicates are created. I imagine the same is true
for CoreAdmin, since Solr/Lucene allows duplicate IDs at the index
level.
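To make the point concrete, here is a toy Java model (plain collections, not the Lucene or Solr API) of the difference between appending a document and adding it with uniqueKey overwrite. The `id` field name and the `append`/`upsert` helpers are hypothetical; `upsert` only approximates what Solr's update handler does on an ordinary add, and a raw index merge does not go through that path.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy model (NOT the Lucene API): an index is an ordered list of documents,
// each document a map of field name -> value.
public class UniqueKeyDemo {
    static final String ID = "id"; // hypothetical uniqueKey field name

    // Raw append: no check for an existing document with the same id,
    // roughly like a plain Lucene addDocument.
    static void append(List<Map<String, String>> index, Map<String, String> doc) {
        index.add(doc);
    }

    // Overwrite-on-add: drop any document with the same id, then append,
    // roughly what Solr does on add when a uniqueKey is defined.
    static void upsert(List<Map<String, String>> index, Map<String, String> doc) {
        String key = doc.get(ID);
        index.removeIf(d -> key.equals(d.get(ID)));
        index.add(doc);
    }

    public static void main(String[] args) {
        List<Map<String, String>> raw = new ArrayList<>();
        append(raw, Map.of(ID, "42", "title", "old"));
        append(raw, Map.of(ID, "42", "title", "new"));

        List<Map<String, String>> solrLike = new ArrayList<>();
        upsert(solrLike, Map.of(ID, "42", "title", "old"));
        upsert(solrLike, Map.of(ID, "42", "title", "new"));

        System.out.println(raw.size() + " " + solrLike.size()); // prints "2 1"
    }
}
```

The uniqueness guarantee lives in the add path, not in the index itself, which is why copying documents in by another route can leave two copies behind.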

Regards,
Gora

Re: Duplicate docs when merging indices?

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
On Sat, Aug 21, 2010 at 5:56 PM, Andrew Clegg <an...@gmail.com> wrote:

> [...]
> What happens to duplicate documents between the two? i.e. those that have
> the same unique key.
> [...]
> Also, does the behaviour vary between CoreAdmin and IndexMergeTool? This
> thread from a couple of years ago:
>
> http://web.archiveorange.com/archive/v/AAfXfQIiBU7vyQBt6qdk
>
> suggests that IndexMergeTool can result in dupes, unless I'm
> misinterpreting.
>
Yes, it will result in duplicate docs. CoreAdmin and IndexMergeTool both use
the IndexWriter#addIndexes method, so the behavior will be the same.
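A toy Java sketch of the consequence described above, assuming a merge simply copies every document from the source indexes into the target with no unique-key lookup (the real `IndexWriter#addIndexes` works on whole segments, not individual documents). `Doc`, `merge` and `duplicateIds` are hypothetical names; the duplicate check is just one way you might spot clashing keys after a merge, not a Solr feature.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of addIndexes-style merging: source documents are copied in as-is.
public class MergeDupesDemo {
    record Doc(String id, String body) {}

    // Concatenate the source indexes onto the target; nothing is deleted or skipped.
    static List<Doc> merge(List<Doc> target, List<List<Doc>> sources) {
        List<Doc> merged = new ArrayList<>(target);
        for (List<Doc> src : sources) {
            merged.addAll(src);
        }
        return merged;
    }

    // Count copies of each id, keeping only ids that occur more than once.
    static Map<String, Integer> duplicateIds(List<Doc> index) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Doc d : index) {
            counts.merge(d.id(), 1, Integer::sum);
        }
        counts.values().removeIf(c -> c < 2);
        return counts;
    }

    public static void main(String[] args) {
        List<Doc> a = List.of(new Doc("1", "from A"), new Doc("2", "from A"));
        List<Doc> b = List.of(new Doc("2", "from B"), new Doc("3", "from B"));
        List<Doc> merged = merge(a, List.of(b));
        System.out.println(merged.size());        // prints 4
        System.out.println(duplicateIds(merged)); // prints {2=2}
    }
}
```

Neither copy of id 2 "takes precedence" in this model: both survive the merge, which matches the answer above.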

-- 
Regards,
Shalin Shekhar Mangar.