Posted to solr-user@lucene.apache.org by Jörg Agatz <jo...@googlemail.com> on 2010/11/15 09:22:57 UTC

XML to solr

Hi users,

I have a question.

I have a lot of XML to index. At the moment I keep two XML files for each
document: the original, and one prepared for Solr (Search_xml).

For example, the original:

<add>
    <doc>
        <SECTION type="FILE_ITEMS">
            <field name="MD5SUM">6483030ed18d8b7a58a701c8bb638d20</field>
            <field name="DATEINAME">0012_201011051119382060000.pdf</field>
            <field name="FILE_TYPE">PDM</field>
        </SECTION>
        <SECTION type="ERP">
            <SECTION type="ERP_FILE_ITEMS">
                <field name="ID">xxxxxxxxxx</field>
            </SECTION>
            <SECTION type="ERP_FILE_CONTENT">
                <field name="VORGANGSART">EK-Anfrage</field>
            </SECTION>
        </SECTION>
    </doc>
</add>

Search_xml:

<add>
    <doc>
        <field name="FILE_ITEMS_MD5SUM">6483030ed18d8b7a58a701c8bb638d20</field>
        <field name="FILE_ITEMS_DATEINAME">0012_201011051119382060000.pdf</field>
        <field name="FILE_ITEMS_FILE_TYPE">PDM</field>
        <field name="ERP_ERP_FILE_ITEMS_ID">xxxxxxxxxx</field>
        <field name="ERP_ERP_FILE_CONTENT_VORGANSART">EK-Anfrage</field>
    </doc>
</add>

My question now is: (how) can I index the original XML directly, without
first converting it into a special search XML?

Re: XML to solr

Posted by Lance Norskog <go...@gmail.com>.
The XPathEntityProcessor supports only a very limited subset of XPath
expressions. It can also apply an XSL stylesheet to the input first, which
would then let you do any transformation, but I have not used that feature.
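
As an illustration of the XSL route mentioned above, here is a minimal XSLT 1.0 sketch that flattens the original file into the Search_xml shape from the question. It is untested against Solr and rests on assumptions: field names are built by joining the `type` of every enclosing SECTION with the field's own `name`, and the stylesheet would be referenced from the entity's `xsl` attribute (with `useSolrAddSchema="true"`), as the DataImportHandler wiki appears to describe.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: flattens nested SECTION elements into prefixed Solr fields. -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Keep the <add> wrapper and process each <doc>. -->
  <xsl:template match="/add">
    <add>
      <xsl:apply-templates select="doc"/>
    </add>
  </xsl:template>

  <!-- Emit one flat <doc> containing every <field>, at any nesting depth. -->
  <xsl:template match="doc">
    <doc>
      <xsl:apply-templates select=".//field"/>
    </doc>
  </xsl:template>

  <!-- Build the new name: SECTION types (outermost first), then @name. -->
  <xsl:template match="field">
    <field>
      <xsl:attribute name="name">
        <xsl:for-each select="ancestor::SECTION">
          <xsl:value-of select="@type"/>
          <xsl:text>_</xsl:text>
        </xsl:for-each>
        <xsl:value-of select="@name"/>
      </xsl:attribute>
      <xsl:value-of select="."/>
    </field>
  </xsl:template>
</xsl:stylesheet>
```

With the example document, the field named ID nested under ERP and ERP_FILE_ITEMS would come out as ERP_ERP_FILE_ITEMS_ID. Because the names are derived from the original file, the last field would be spelled VORGANGSART as in the source, not VORGANSART as in the hand-written Search_xml.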

Chantal Ackermann wrote:
> Hi Jörg,
>
> you could use the DataImportHandler's XPathEntityProcessor. There you
> can specify, for each Solr field, the XPath at which its value is stored
> in the original file (your first example snippet).
>
> The value of field "FILE_ITEMS_DATEINAME", for example, would have the
> XPath //field[@name='DATEINAME'].
> (http://zvon.org/xxl/XPathTutorial/General_ger/examples.html has a very
> simple and good reference for XPath patterns.)
>
> Have a look at the DataImportHandler wiki page on how to call the
> XPathEntityProcessor.
>
> Cheers,
> Chantal
>

Re: XML to solr

Posted by Chantal Ackermann <ch...@btelligent.de>.
Hi Jörg,

you could use the DataImportHandler's XPathEntityProcessor. There you
can specify, for each Solr field, the XPath at which its value is stored
in the original file (your first example snippet).

The value of field "FILE_ITEMS_DATEINAME", for example, would have the
XPath //field[@name='DATEINAME'].
(http://zvon.org/xxl/XPathTutorial/General_ger/examples.html has a very
simple and good reference for XPath patterns.)

Have a look at the DataImportHandler wiki page on how to call the
XPathEntityProcessor.
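
A DataImportHandler configuration along these lines might look as follows. This is only a sketch under assumptions: the file path `/path/to/original.xml` and the entity name are placeholders, and the XPaths are written out as absolute paths with the attribute test on the last step, because the processor accepts only a subset of XPath and whether the shorter `//field[...]` form works may depend on the Solr version.

```xml
<!-- data-config.xml (sketch): index the original XML without a Search_xml copy -->
<dataConfig>
  <!-- Reads the input from the local file system. -->
  <dataSource type="FileDataSource" encoding="UTF-8"/>
  <document>
    <!-- One Solr document is created per /add/doc element.
         The url value is a hypothetical placeholder path. -->
    <entity name="original"
            processor="XPathEntityProcessor"
            url="/path/to/original.xml"
            forEach="/add/doc">
      <!-- column = Solr field name, xpath = location in the original file -->
      <field column="FILE_ITEMS_MD5SUM"
             xpath="/add/doc/SECTION/field[@name='MD5SUM']"/>
      <field column="FILE_ITEMS_DATEINAME"
             xpath="/add/doc/SECTION/field[@name='DATEINAME']"/>
      <field column="FILE_ITEMS_FILE_TYPE"
             xpath="/add/doc/SECTION/field[@name='FILE_TYPE']"/>
      <field column="ERP_ERP_FILE_ITEMS_ID"
             xpath="/add/doc/SECTION/SECTION/field[@name='ID']"/>
      <!-- The remaining fields would follow the same pattern. -->
    </entity>
  </document>
</dataConfig>
```

The column names must match fields defined in schema.xml, and the handler itself still has to be registered in solrconfig.xml as described on the wiki page.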

Cheers,
Chantal
