Posted to solr-user@lucene.apache.org by Lebin Sebastian <le...@codetheory.io> on 2017/05/31 16:06:32 UTC

SOLR | De-Duplication | Remove duplicate records based on their status

Hello,

I am indexing two different models that have the same data but different statuses.

Eg:
*Scenario -1*
{Model: "AAAA", name: "abc", status: "T"}
{Model: "BBBB", name: "abc", status: "A"}

Expected Output
{Model: "BBBB", name: "abc", status: "A"}

*Scenario -2*
{Model: "AAAA", name: "abc", status: "A"}
{Model: "BBBB", name: "abc", status: "T"}

Expected Output
{Model: "AAAA", name: "abc", status: "A"}

*Scenario -3*
{Model: "AAAA", name: "abc", status: "A"}
{Model: "BBBB", name: "abc", status: "A"}

Expected Output
{Model: "AAAA", name: "abc", status: "A"} either one.


*Scenario -4*
{Model: "AAAA", name: "abc", status: "T"}
{Model: "BBBB", name: "abc", status: "T"}

Expected Output
{Model: "AAAA", name: "abc", status: "T"} either one.


Scenarios 3 and 4 work as expected with my current configuration, which is
given below.

For scenarios 1 and 2, the output should depend on the status of the record:
the record with status "A" should be kept.

Please help me fix scenarios 1 and 2.


*Solr version : 5.3*

*Solrconfig.xml*

<requestHandler name="/update" class="solr.UpdateRequestHandler" >
  <lst name="defaults">
    <str name="update.chain">dedupe</str>
  </lst>
</requestHandler>

<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">signature</str>
    <bool name="overwriteDupes">true</bool>
    <str name="fields">id</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>


Thanks,

Lebin F

Re: SOLR | De-Duplication | Remove duplicate records based on their status

Posted by simon <mt...@gmail.com>.
Your updateRequestProcessorChain config snippet uses the "id" field to
generate the signature, but the sample data doesn't contain an "id" field.
Check that first.
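
As a rough sketch of what I mean, the chain below computes the signature from
"name" instead of "id". This assumes "name" is the field that marks two
records as duplicates in your sample data; substitute whichever field (or
comma-separated list of fields) actually defines a duplicate for you.

```xml
<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">signature</str>
    <bool name="overwriteDupes">true</bool>
    <!-- Assumption: "name" identifies a duplicate; the original config used "id" -->
    <str name="fields">name</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```

Note that, as far as I know, overwriteDupes only replaces an existing document
that has the same signature with the incoming one. It does not look at
"status", so on its own this will probably not prefer "A" over "T" in
scenarios 1 and 2.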

-Simon
