Posted to solr-user@lucene.apache.org by Alexander Aristov <al...@gmail.com> on 2011/12/13 20:34:30 UTC

solr ignore duplicate documents

People,

I am asking for your help with solr.

When a document is sent to Solr and a document with the same ID already
exists in the index, the new doc replaces the old one.

But I don't want documents to be replaced automatically. I want Solr to
just ignore the duplicate and proceed to the next one. How can I configure
Solr to do so?

Of course I can query Solr to check whether it already has the document,
but that's bad for me since I do bulk updates, and it would complicate the
process and increase the number of requests.

So are there any ways to configure solr to ignore duplicates? Just ignore.
I don't need any specific responses or actions.

Best Regards
Alexander Aristov

Re: solr ignore duplicate documents

Posted by Erick Erickson <er...@gmail.com>.
You're probably talking about a custom update handler here. That
way you can do a document ID lookup, that is, just check whether
the incoming document ID is already in the index and throw
the document away if it is. This should be very
efficient, much more efficient than making a separate query
for each one.
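
The lookup-and-discard logic described above can be sketched outside Solr
as follows. This is only an illustration in Python, not Solr's update
handler API: the index is modeled as a plain set of IDs, and the function
name `filter_new_documents` and the `id` field name are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def filter_new_documents(
    existing_ids: Set[str], batch: Iterable[Dict[str, str]]
) -> List[Dict[str, str]]:
    """Return only the documents whose ID is not yet known.

    existing_ids stands in for the index's ID lookup (an assumption) and
    is updated in place, so a duplicate later in the same batch is
    dropped as well.
    """
    accepted = []
    for doc in batch:
        doc_id = doc["id"]
        if doc_id in existing_ids:
            # Duplicate: silently skip it and move on to the next document.
            continue
        existing_ids.add(doc_id)
        accepted.append(doc)
    return accepted


index_ids = {"doc-1", "doc-2"}
incoming = [
    {"id": "doc-2", "title": "already indexed"},
    {"id": "doc-3", "title": "new"},
    {"id": "doc-3", "title": "repeated within the batch"},
]
kept = filter_new_documents(index_ids, incoming)
print([d["id"] for d in kept])
```

In a real handler the set lookup would be replaced by a lookup against
the index itself, which is the part that has to be written against Solr.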

There's no way that I know of to do this out of the box in Solr though.

Best
Erick

On Tue, Dec 13, 2011 at 3:44 PM, Mikhail Khludnev
<mk...@griddynamics.com> wrote:
> Man,
>
> Does overwrite=false work for you?
>  http://wiki.apache.org/solr/UpdateXmlMessages#add.2BAC8-replace_documents
>
> Regards
>

Re: solr ignore duplicate documents

Posted by Mikhail Khludnev <mk...@griddynamics.com>.
Man,

Does overwrite=false work for you?
 http://wiki.apache.org/solr/UpdateXmlMessages#add.2BAC8-replace_documents
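
For reference, here is a minimal sketch of an XML update message with the
overwrite attribute set to false on the <add> element. It only builds the
message in Python; posting it to your update URL is left out. The helper
name `build_add_message` and the sample field names are made up for
illustration. Whether overwrite=false skips the incoming document or just
stores a second copy next to the old one is worth checking on a test
index first; as far as I understand it, it only turns off the unique-key
check.

```python
import xml.etree.ElementTree as ET


def build_add_message(docs, overwrite=False):
    """Build a Solr XML <add> message from a list of field dictionaries."""
    add = ET.Element("add", {"overwrite": "true" if overwrite else "false"})
    for fields in docs:
        doc = ET.SubElement(add, "doc")
        for name, value in fields.items():
            # One <field name="..."> element per field of the document.
            field = ET.SubElement(doc, "field", {"name": name})
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


message = build_add_message([{"id": "doc-1", "title": "First"}])
print(message)
```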

Regards




-- 
Sincerely yours
Mikhail Khludnev
Developer
Grid Dynamics
tel. 1-415-738-8644
Skype: mkhludnev
<http://www.griddynamics.com>
 <mk...@griddynamics.com>