Posted to solr-user@lucene.apache.org by "Martin Frank Hansen (MHQ)" <MH...@kmd.dk> on 2018/10/30 11:57:00 UTC

Merging data from different sources

Hi,

I am trying to merge files from different sources that have different content, apart from one shared key field. How can this be done in Solr?

An example could be:

Document 1
    <Doc>
        <Id>0000001</Id>    <!-- Unique id for Document 1 -->
        <Journalnumber>test-123</Journalnumber>
        …
    </Doc>

Document 2
    <Doc>
        <Id2>abcdefgh</Id2>    <!-- Unique id for Document 2 -->
        <Journalnumber>test-123</Journalnumber>
        …
    </Doc>

In the above case I would like to merge on Journalnumber, ending up with something like this:

    <Doc>
        <Id>0000001</Id>    <!-- Unique id for the merge -->
        <Journalnumber>test-123</Journalnumber>
        <Id2>abcdefgh</Id2>    <!-- Reference id for Document 2 -->
        …
    </Doc>
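
The intended merge can be sketched outside Solr as a plain key-based join. This is only an illustration of the desired result, not a Solr feature: the function name merge_on_key and the dict representation of the documents are assumptions made for the example, and the elided fields ("…") are left out.

```python
# Hypothetical records mirroring Document 1 and Document 2 from the post.
doc1 = {"Id": "0000001", "Journalnumber": "test-123"}
doc2 = {"Id2": "abcdefgh", "Journalnumber": "test-123"}


def merge_on_key(primary, secondary, key="Journalnumber"):
    """Fold fields of secondary records into the primary record sharing `key`."""
    merged = {}
    for record in primary:
        # Copy so the input records are not modified.
        merged[record[key]] = dict(record)
    for record in secondary:
        target = merged.get(record[key])
        if target is None:
            continue  # no primary record with this key; skipped in this sketch
        for field, value in record.items():
            # Keep the primary value when both sources define a field.
            target.setdefault(field, value)
    return list(merged.values())


merged_docs = merge_on_key([doc1], [doc2])
print(merged_docs)
```

Here the primary record keeps its own Id, and Id2 is carried over as a reference to Document 2, which matches the merged document shown above.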

How would I go about this? I was thinking about embedded documents, but since I am not indexing the different data sources at the same time, I don’t think that will work. The ideal result would be to have Document 2 embedded in Document 1.

I am currently using a schema that contains all fields from Document 1 and Document 2.

I really hope that Solr can handle this, and any help/feedback is much appreciated.

Best regards

Martin




Protection of your personal data is important to us. Here you can read KMD’s Privacy Policy<http://www.kmd.net/Privacy-Policy> outlining how we process your personal data.

Please note that this message may contain confidential information. If you have received this message by mistake, please inform the sender of the mistake by sending a reply, then delete the message from your system without making, distributing or retaining any copies of it. Although we believe that the message and any attachments are free from viruses and other errors that might affect the computer or it-system where it is received and read, the recipient opens the message at his or her own risk. We assume no responsibility for any loss or damage arising from the receipt or use of this message.

RE: Merging data from different sources

Posted by "Martin Frank Hansen (MHQ)" <MH...@kmd.dk>.
Hi Alex,

Thanks for your help. I will take a look at the update-request-processor.

I wonder if there is a way to link documents together, so that they always show up together should one of the documents match a search query?

-----Original Message-----
From: Alexandre Rafalovitch <ar...@gmail.com>
Sent: 30 October 2018 13:16
To: solr-user <so...@lucene.apache.org>
Subject: Re: Merging data from different sources

Maybe
https://lucene.apache.org/solr/guide/7_5/update-request-processors.html#atomicupdateprocessorfactory

Regards,
    Alex


Re: Merging data from different sources

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
Maybe
https://lucene.apache.org/solr/guide/7_5/update-request-processors.html#atomicupdateprocessorfactory

Regards,
    Alex
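
The linked page builds on Solr's atomic updates, which modify individual fields of an already indexed document instead of replacing it. A minimal sketch of what such an update document could look like for the example in this thread follows. It assumes that Id is the collection's uniqueKey and that the Id belonging to a given Journalnumber is already known; the helper name build_atomic_update is hypothetical. It shows the plain atomic-update document format rather than the processor's own configuration, which is not discussed in the thread.

```python
import json


def build_atomic_update(unique_key_field, unique_key_value, fields):
    """Return an atomic-update document that sets `fields` on an existing document."""
    update = {unique_key_field: unique_key_value}
    for name, value in fields.items():
        # "set" replaces (or adds) the field value on the indexed document.
        update[name] = {"set": value}
    return update


# Assumption: "Id" is the uniqueKey field of the collection.
update_doc = build_atomic_update("Id", "0000001", {"Id2": "abcdefgh"})

# The JSON body that would be sent to the collection's update handler.
payload = json.dumps([update_doc])
print(payload)
```

The payload would be POSTed as JSON to the collection's /update handler. Atomic updates generally require the affected fields to be stored or to have docValues, so the schema may need checking before relying on this approach.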
