Posted to solr-user@lucene.apache.org by Bruno Mannina <bm...@matheo-software.com> on 2017/04/14 09:52:42 UTC
Shards, delete duplicates ?
Dear Solr users,
I have two collections, C1 and C2.
In both C1 and C2, the unique key is ID.
In C1, ID holds normalized patent numbers, i.e. US + 12 digits + A1.
In C2, ID holds patent numbers as I receive them: US + 13 digits + A1 (a
leading 0 is added).
My collection C2 also has a field named ID12, which is not defined as a unique
field.
ID12 is a copy of the ID field of C1 (US + 12 digits + A1).
The values in ID12 are unique across the whole C2 collection.
The values of ID in C1 and of ID12 in C2 are the same.
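To make the relationship between the two ID formats concrete, here is a minimal Python sketch. The function name and the sample number are hypothetical, and it assumes the extra 0 sits directly after the country code and that the kind code is one letter plus one digit (as in A1).

```python
import re

# Assumed C2 layout: 2-letter country code, a padding 0, 12 digits, kind code (e.g. A1).
_C2_ID = re.compile(r"^([A-Z]{2})0(\d{12})([A-Z]\d)$")


def to_id12(c2_id: str) -> str:
    """Drop the leading 0 from a C2-style ID to get the C1-style (ID12) form."""
    match = _C2_ID.match(c2_id)
    if match is None:
        raise ValueError(f"unexpected C2 ID format: {c2_id!r}")
    country, digits, kind = match.groups()
    return f"{country}{digits}{kind}"


# Made-up example number, for illustration only.
print(to_id12("US0201700123456A1"))  # prints US201700123456A1
```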
I am trying to query both collections at once using the shards parameter in the URL.
It works fine, but I get duplicate documents. I know this is normal.
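For reference, a request of this kind can be sketched in Python as follows. The host, port and core paths are placeholders rather than my real setup, and the helper name is hypothetical; only the `shards` request parameter itself is standard Solr.

```python
from urllib.parse import urlencode


def build_sharded_query(base_url, shard_urls, query, rows=10):
    """Build a /select URL that fans the query out to every listed shard."""
    params = {
        "q": query,
        "rows": rows,
        # Solr expects a comma-separated list of shard addresses (no scheme).
        "shards": ",".join(shard_urls),
        "wt": "json",
    }
    return f"{base_url}/select?{urlencode(params)}"


# Placeholder addresses for the two collections.
url = build_sharded_query(
    "http://localhost:8983/solr/C1",
    ["localhost:8983/solr/C1", "localhost:8983/solr/C2"],
    "*:*",
)
print(url)
```

Because the unique keys differ between C1 and C2 (12 versus 13 digits), the same patent comes back once from each collection.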
Is there a method, a parameter, or anything else that lets me tell Solr to
compare ID in C1 with ID12 in C2 in order to remove the duplicates?
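Failing that, the only fallback I can see is filtering on the client side. A rough sketch, assuming the response documents are plain dictionaries and that ID12 is included in the returned fields for C2 documents (the function name is hypothetical):

```python
def drop_duplicates(docs):
    """Keep the first document seen for each normalized patent number."""
    seen = set()
    unique_docs = []
    for doc in docs:
        # C2 documents carry ID12; C1 documents only have ID, already normalized.
        key = doc.get("ID12") or doc["ID"]
        if key in seen:
            continue
        seen.add(key)
        unique_docs.append(doc)
    return unique_docs


# Made-up documents, for illustration only.
docs = [
    {"ID": "US201700123456A1"},                               # from C1
    {"ID": "US0201700123456A1", "ID12": "US201700123456A1"},  # from C2, same patent
    {"ID": "US0201700999999A1", "ID12": "US201700999999A1"},  # from C2 only
]
print(len(drop_duplicates(docs)))  # prints 2
```

This only cleans up the page of results already fetched; numFound and paging would still count the duplicates, which is why I would prefer a solution inside Solr.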
Many thanks for your help,
Bruno Mannina
http://www.matheo-software.com
http://www.patent-pulse.com
Tel. +33 0 430 650 788
Fax. +33 0 430 650 728
https://www.facebook.com/PatentPulse
https://twitter.com/matheosoftware
https://www.linkedin.com/company/matheo-software
https://www.youtube.com/user/MatheoSoftware