Posted to solr-user@lucene.apache.org by Lebin Sebastian <le...@codetheory.io> on 2017/05/31 16:06:32 UTC
SOLR | De-Duplication | Remove duplicate records based on their status
Hello,
I am indexing two different models with the same data but different statuses.
E.g.:
*Scenario -1*
{Model: "AAAA", name: "abc", status: "T"}
{Model: "BBBB", name: "abc", status: "A"}
Expected Output
{Model: "BBBB", name: "abc", status: "A"}
*Scenario -2*
{Model: "AAAA", name: "abc", status: "A"}
{Model: "BBBB", name: "abc", status: "T"}
Expected Output
{Model: "AAAA", name: "abc", status: "A"}
*Scenario -3*
{Model: "AAAA", name: "abc", status: "A"}
{Model: "BBBB", name: "abc", status: "A"}
Expected Output
{Model: "AAAA", name: "abc", status: "A"} either one.
*Scenario -4*
{Model: "AAAA", name: "abc", status: "T"}
{Model: "BBBB", name: "abc", status: "T"}
Expected Output
{Model: "AAAA", name: "abc", status: "T"} either one.
Scenarios 3 and 4 work as expected with my current configuration, which is
given below.
For scenarios 1 and 2, the output should depend on the status of the record:
the record with status "A" should be the one that is kept.
Please help me fix scenarios 1 and 2.
*Solr version : 5.3*
*Solrconfig.xml*
<requestHandler name="/update" class="solr.UpdateRequestHandler" >
  <lst name="defaults">
    <str name="update.chain">dedupe</str>
  </lst>
</requestHandler>
<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">signature</str>
    <bool name="overwriteDupes">true</bool>
    <str name="fields">id</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
Thanks,
Lebin F
Re: SOLR | De-Duplication | Remove duplicate records based on their status
Posted by simon <mt...@gmail.com>.
Your updateRequestProcessorChain config snippet specifies the "id" field
to generate a signature, but the sample data doesn't contain an "id" field
... check that out first.
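As a starting point, here is a rough sketch of the same chain with the
signature computed from "name" instead of "id". Using "name" is only an
assumption based on the sample data, where it is the one field the two
records share; the real schema may need a different field list.

```xml
<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">signature</str>
    <bool name="overwriteDupes">true</bool>
    <!-- Assumption: "name" is the field the duplicate records have in common -->
    <str name="fields">name</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```

Note that this only makes the two records share a signature. As far as I
know, with overwriteDupes set to true the document indexed last replaces the
earlier one, and the processor has no setting to prefer one status over
another, so scenarios 1 and 2 would probably still depend on indexing order.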
-Simon