Posted to solr-user@lucene.apache.org by "Martin Frank Hansen (MHQ)" <MH...@kmd.dk> on 2018/10/07 17:22:59 UTC

DIH for different levels of XML

Hi,

I am having difficulty importing data from different levels of an XML document.

The XML can be as simple as this:

<Export>
  <Case>
    <id>2165432</id>
    <item>
      <Journalnummer>5</Journalnummer>
      <Journalnummer>10</Journalnummer>
    </item>
  </Case>
</Export>

The data-config file looks like this:
<dataConfig>
  <dataSource name="myfilereader" type="FileDataSource" encoding="UTF-8"/>
  <document>
    <entity
      name="xml"
      pk="Id"
      stream="true"
      processor="XPathEntityProcessor"
      url="C:/Users/z6mhq/Desktop/data_import/test.xml"
      forEach="/Export/Case/ | /Export/Case/item/"
      transformer="DateFormatTransformer">

      <field column="Id" xpath="/Export/Case/id" />
      <field column="Journalnummer" xpath="/Export/Case/item/Journalnummer" />

    </entity>
  </document>
</dataConfig>

The result is the following:
{
  "responseHeader":{
    "status":0,
    "QTime":0,
    "params":{
      "q":"*:*",
      "_":"1538931455588"}},
  "response":{"numFound":1,"start":0,"docs":[
      {
        "Id":"2165432",
        "_version_":1613686828885344256}]
  }}

I expected something like this:

{
  "responseHeader":{
    "status":0,
    "QTime":0,
    "params":{
      "q":"*:*",
      "_":"1538931455588"}},
  "response":{"numFound":1,"start":0,"docs":[
      {
        "Id":"2165432",
        "Journalnummer":[5,10]}]
  }}


I have tried many things to import the data correctly, but to no avail. I really hope that someone can help me.

Thanks in advance; any help is much appreciated.

Martin Hansen


Protection of your personal data is important to us. Here you can read KMD’s Privacy Policy<http://www.kmd.net/Privacy-Policy> outlining how we process your personal data.

Please note that this message may contain confidential information. If you have received this message by mistake, please inform the sender of the mistake by sending a reply, then delete the message from your system without making, distributing or retaining any copies of it. Although we believe that the message and any attachments are free from viruses and other errors that might affect the computer or it-system where it is received and read, the recipient opens the message at his or her own risk. We assume no responsibility for any loss or damage arising from the receipt or use of this message.

Re: DIH for different levels of XML

Posted by "Martin Frank Hansen (MHQ)" <MH...@kmd.dk>.
Hi Alex,

Thanks for your answer.

I think I got it working. The problem was actually in schema.xml, where the field "Journalnummer" needed multiValued="true".
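
For readers hitting the same issue, the schema change described above might look roughly like the fragment below. This is only a sketch: the thread does not show the actual field definition, so the field type "pint" and the indexed/stored flags are assumptions (the expected output [5,10] suggests an integer type, but a string type would work the same way).

```xml
<!-- schema.xml: declare Journalnummer as multi-valued so that a single
     document can hold several values (e.g. 5 and 10). -->
<!-- ASSUMPTION: type="pint", indexed and stored are illustrative only;
     keep whatever your existing field definition uses. -->
<field name="Journalnummer" type="pint" indexed="true" stored="true" multiValued="true"/>
```

After a schema change like this, the core typically needs to be reloaded and the data re-imported before the new values show up in query results.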


Martin Frank Hansen



Lautrupparken 40-42, DK-2750 Ballerup
E-mail mhq@kmd.dk  Web www.kmd.dk
Mobile +4525571418

-----Original message-----
From: Alexandre Rafalovitch <ar...@gmail.com>
Sent: 7 October 2018 20:18
To: solr-user <so...@lucene.apache.org>
Subject: Re: DIH for different levels of XML

If your ID field comes from one XML level and your record details from another, they are processed as two separate records. Have a look at the atom example that ships with the DIH example set, specifically the commonField parameter, which may be useful for you:
https://lucene.apache.org/solr/guide/7_4/uploading-structured-data-store-data-with-the-data-import-handler.html

Regards,
   Alex.

Re: DIH for different levels of XML

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
If your ID field comes from one XML level and your record details from
another, they are processed as two separate records. Have a look at
the atom example that ships with the DIH example set, specifically
the commonField parameter, which may be useful for you:
https://lucene.apache.org/solr/guide/7_4/uploading-structured-data-store-data-with-the-data-import-handler.html
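
As a rough illustration of the commonField idea, the original config could be adapted along the lines below. This is an untested sketch modeled on the pattern used in the reference guide, not a confirmed fix: the forEach paths (written here without trailing slashes) and the placement of commonField="true" on the Id field are assumptions.

```xml
<dataConfig>
  <dataSource name="myfilereader" type="FileDataSource" encoding="UTF-8"/>
  <document>
    <entity
      name="xml"
      pk="Id"
      stream="true"
      processor="XPathEntityProcessor"
      url="C:/Users/z6mhq/Desktop/data_import/test.xml"
      forEach="/Export/Case | /Export/Case/item">

      <!-- ASSUMPTION: commonField="true" asks DIH to copy this value,
           once encountered, into the records that follow it -->
      <field column="Id" xpath="/Export/Case/id" commonField="true" />
      <field column="Journalnummer" xpath="/Export/Case/item/Journalnummer" />

    </entity>
  </document>
</dataConfig>
```

Whether this yields the desired single document also depends on the schema, for example on whether Journalnummer is declared multi-valued there.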

Regards,
   Alex.