Posted to solr-user@lucene.apache.org by Simon <si...@gallerysystems.com> on 2014/04/07 17:50:54 UTC

Duplicate Unique Key

Hi all,

I know someone has posted a similar question before, but my case is a little
different: I don't have the schema setup issue mentioned in those posts, yet
I still get duplicate records.

My unique key in the schema is:

    <field name="id$" type="string" indexed="true" stored="true" multiValued="false" required="true"/>

    <uniqueKey>id$</uniqueKey>

Searching for id$:1 in the Solr admin UI, I got two documents:
{
    "id$": "1",
    "_version_": 1464225014071951400,
    "_root_": 1
},
{
    "id$": "1",
    "_version_": 1464236728284872700,
    "_root_": 1
}

I use the SolrJ API to add documents. My understanding is that the Solr
uniqueKey is like a database primary key, so I am wondering how I could end
up with two documents with the same uniqueKey in the index.
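To make that expectation concrete, here is a toy, plain-Java model of primary-key-style adds. It is not SolrJ, and the class and method names are hypothetical. It assumes, as described above, that adding a document whose key already exists replaces the earlier copy and gets a new version, so the index can never hold two live documents with the same key.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only (hypothetical names, not SolrJ): one live document per key,
// and re-adding a key replaces the earlier copy with a new version.
public class UniqueKeyModel {
    // key (e.g. the value of id$) -> version of the live document for that key
    private final Map<String, Long> liveDocs = new LinkedHashMap<>();
    private long nextVersion = 1;

    /** Adds the document, overwriting any earlier one with the same key. */
    public long add(String key) {
        long version = nextVersion++;
        liveDocs.put(key, version); // put() replaces an existing entry for this key
        return version;
    }

    public int numDocs() {
        return liveDocs.size();
    }

    public static void main(String[] args) {
        UniqueKeyModel index = new UniqueKeyModel();
        index.add("1");
        index.add("1"); // same id$ again: replaces, does not duplicate
        System.out.println(index.numDocs()); // prints 1
    }
}
```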

Thanks,
Simon




--
View this message in context: http://lucene.472066.n3.nabble.com/Duplicate-Unique-Key-tp4129651.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Duplicate Unique Key

Posted by Erick Erickson <er...@gmail.com>.
Right, this is expected behavior. The real problem isn't data loss; it's
how you know which doc should "win". Merging indexes is for a rather
narrowly defined use case; it was never intended to remove duplicates.
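As an illustration of why "which doc should win" is a judgement call, here is one possible policy, sketched in plain Java (Java 16+ for the record). It assumes that a higher _version_ means a more recent write, which is an assumption and not something stated in this thread. The class and method names are hypothetical, and the two version numbers are the ones from the original post.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// One hypothetical "who wins" policy: per key, keep the copy with the highest
// _version_. Index merging applies no such policy; it does no duplicate-id check.
public class HighestVersionWins {
    public record Doc(String id, long version) {}

    public static Map<String, Doc> dedupe(List<Doc> docs) {
        Map<String, Doc> winners = new HashMap<>();
        for (Doc d : docs) {
            // Map.merge stores d if the key is new; otherwise keeps the higher version
            winners.merge(d.id(), d,
                    (kept, candidate) -> candidate.version() > kept.version() ? candidate : kept);
        }
        return winners;
    }

    public static void main(String[] args) {
        // The two copies of id$ "1" from the original post
        List<Doc> found = List.of(
                new Doc("1", 1464225014071951400L),
                new Doc("1", 1464236728284872700L));
        System.out.println(dedupe(found).get("1").version()); // prints 1464236728284872700
    }
}
```

A different application might prefer the copy that matches the database row instead; the point is that the choice has to come from outside the index.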

Best,
Erick

On Tue, Apr 8, 2014 at 12:36 AM, Cihad Guzel <cg...@gmail.com> wrote:
> Hi.
>
> I encountered a similar situation when I tested Solr index merging:
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201403.mbox/%3CCAMrn6cOVWohxooRzZ8NmwYQUda2GW+gYD+edvC_b_kGT=f42pA@mail.gmail.com%3E
>
> I had duplicates, but the duplicates were gone when I posted the same data
> for indexing again. I think this is done to prevent data loss while
> merging indexes.

Re: Duplicate Unique Key

Posted by Cihad Guzel <cg...@gmail.com>.
Hi.

I encountered a similar situation when I tested Solr index merging:
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201403.mbox/%3CCAMrn6cOVWohxooRzZ8NmwYQUda2GW+gYD+edvC_b_kGT=f42pA@mail.gmail.com%3E

I had duplicates, but the duplicates were gone when I posted the same data
for indexing again. I think this is done to prevent data loss while
merging indexes.




2014-04-07 23:04 GMT+03:00 Erick Erickson <er...@gmail.com>:

> Oh my yes! I feel a great sense of relief every time an intermittent
> problem becomes reproducible... The problem is not solved, but at
> least I have a good feeling that once I don't see it any more it's
> _really_ gone!
>
> One possibility is index merging, see:
> https://wiki.apache.org/solr/MergingSolrIndexes. When you merge
> indexes, there is no duplicate id checking performed, so you can well
> have duplicates. That's a wild shot in the dark though.
>
> Best,
> Erick

Re: Duplicate Unique Key

Posted by Simon <si...@gallerysystems.com>.
Index merging is not the case here, as I am not doing that. Even though the
issue is gone for now, that is no relief for me, as I am not sure how to
explain this to others (peers, boss and users). I am thinking of
implementing a watchdog that checks whether the total number of Solr
documents exceeds the number of items in the database and, if so, raises a
flag so that I can act before getting complaints.
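A minimal sketch of such a watchdog in plain Java (Java 10+ for the URLEncoder overload). How the two counts are fetched (a SolrJ query, a JDBC count) is left out, and the core URL below is a hypothetical placeholder. Besides the count comparison, it builds a Solr facet query on the uniqueKey field with facet.mincount=2, which returns only the key values that occur more than once, i.e. the actual duplicates.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Sketch of the watchdog idea. The select URL and the source of the two counts
// are assumptions; only the comparison and the query string are shown.
public class DuplicateWatchdog {

    /** True when Solr holds more documents than the database has items. */
    public static boolean solrHasExtraDocs(long solrNumFound, long dbRowCount) {
        return solrNumFound > dbRowCount;
    }

    /** Facets on the key field, keeping only values seen at least twice. */
    public static String duplicateKeyQuery(String selectUrl, String keyField) {
        String field = URLEncoder.encode(keyField, StandardCharsets.UTF_8); // "$" -> %24
        return selectUrl
                + "?q=*:*&rows=0&wt=json&facet=true"
                + "&facet.field=" + field
                + "&facet.mincount=2&facet.limit=-1"; // -1 = no limit on facet values
    }

    public static void main(String[] args) {
        // Hypothetical core URL; adjust to the real deployment.
        System.out.println(duplicateKeyQuery("http://localhost:8983/solr/collection1/select", "id$"));
        System.out.println(solrHasExtraDocs(1001, 1000)); // prints true
    }
}
```

The count comparison alone can miss duplicates when some other items are also missing from Solr; the facet query does not have that blind spot. Faceting on a high-cardinality field such as the uniqueKey can be expensive on a large index, so it is probably better run as an occasional check than on every cycle.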





--
View this message in context: http://lucene.472066.n3.nabble.com/Duplicate-Unique-Key-tp4129651p4129894.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Duplicate Unique Key

Posted by Erick Erickson <er...@gmail.com>.
Oh my yes! I feel a great sense of relief every time an intermittent
problem becomes reproducible... The problem is not solved, but at
least I have a good feeling that once I don't see it any more it's
_really_ gone!

One possibility is index merging, see:
https://wiki.apache.org/solr/MergingSolrIndexes. When you merge
indexes, there is no duplicate id checking performed, so you can well
have duplicates. That's a wild shot in the dark though.
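A toy plain-Java model of that difference (Java 16+; hypothetical names, not Solr code). A normal add is modelled as "delete every doc with this key, then append", while a merge simply appends the other index's documents with no key check. Under that assumption a merge can leave two docs with the same key, and re-posting the document collapses them back to one, which is consistent with Cihad's and Simon's observation that re-posting the data made the duplicates go away.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (hypothetical names, not Solr code). add() is modelled as
// "delete every doc with this key, then append"; mergeFrom() appends the other
// index's docs with no duplicate-key check, as described above for merging.
public class MergeModel {
    record Doc(String key, long version) {}

    private final List<Doc> docs = new ArrayList<>();
    private long nextVersion = 1;

    public void add(String key) {
        docs.removeIf(d -> d.key().equals(key)); // removes ALL copies of this key
        docs.add(new Doc(key, nextVersion++));
    }

    public void mergeFrom(MergeModel other) {
        docs.addAll(other.docs); // blind append: duplicate keys survive
    }

    public long countKey(String key) {
        return docs.stream().filter(d -> d.key().equals(key)).count();
    }

    public static void main(String[] args) {
        MergeModel a = new MergeModel();
        MergeModel b = new MergeModel();
        a.add("1");
        b.add("1");
        a.mergeFrom(b);
        System.out.println(a.countKey("1")); // prints 2: duplicate after the merge
        a.add("1");                          // re-post the same document
        System.out.println(a.countKey("1")); // prints 1: duplicates collapsed
    }
}
```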

Best,
Erick

On Mon, Apr 7, 2014 at 12:26 PM, Simon <si...@gallerysystems.com> wrote:
> Erick,
>
> It's indeed quite odd. After I triggered re-indexing of all documents (via
> the normal process of the existing program), the duplication is gone. It
> cannot be reproduced easily, but it did occur occasionally, and that makes
> it a frustrating task to troubleshoot.
>
> Thanks,
> Simon

Re: Duplicate Unique Key

Posted by Simon <si...@gallerysystems.com>.
Erick,

It's indeed quite odd. After I triggered re-indexing of all documents (via
the normal process of the existing program), the duplication is gone. It
cannot be reproduced easily, but it did occur occasionally, and that makes
it a frustrating task to troubleshoot.

Thanks,
Simon



--
View this message in context: http://lucene.472066.n3.nabble.com/Duplicate-Unique-Key-tp4129651p4129701.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Duplicate Unique Key

Posted by Erick Erickson <er...@gmail.com>.
Hmmm, that's odd. I just tried it (admittedly with post.jar rather
than SolrJ) and it works just fine.

What server are you using (e.g. CloudSolrServer)? And can you create a
self-contained program that illustrates the problem?

Best,
Erick

On Mon, Apr 7, 2014 at 8:50 AM, Simon <si...@gallerysystems.com> wrote:
> Hi all,
>
> I know someone has posted a similar question before, but my case is a little
> different: I don't have the schema setup issue mentioned in those posts, yet
> I still get duplicate records.
>
> My unique key in the schema is:
>
>     <field name="id$" type="string" indexed="true" stored="true" multiValued="false" required="true"/>
>
>     <uniqueKey>id$</uniqueKey>
>
> Searching for id$:1 in the Solr admin UI, I got two documents:
> {
>     "id$": "1",
>     "_version_": 1464225014071951400,
>     "_root_": 1
> },
> {
>     "id$": "1",
>     "_version_": 1464236728284872700,
>     "_root_": 1
> }
>
> I use the SolrJ API to add documents. My understanding is that the Solr
> uniqueKey is like a database primary key, so I am wondering how I could end
> up with two documents with the same uniqueKey in the index.
>
> Thanks,
> Simon
>