Posted to solr-user@lucene.apache.org by sc...@asia.com on 2010/06/23 13:59:00 UTC

Import XML files different format?

Hi,

I'm new to Solr. It looks great.

I would like to add an XML document in the following format to Solr:

<?xml version="1.0" encoding="utf-8"?>
<race>
<go>
    <id><![CDATA[...]]></id>
    <title><![CDATA[...]]></title>
    <url><![CDATA[...]]></url>
    <content><![CDATA[...]]></content>
    <city><![CDATA[...]]></city>
    <postcode><![CDATA[...]]></postcode>
    <contract><![CDATA[...]]></contract>
    <category><![CDATA[...]]></category>
    <date><![CDATA[...]]></date>
    <time><![CDATA[...]]></time>
</go>

etc...
</race>



Is there a way to do this? If so, how?

Or do I need to convert it with a script to this:

<add>
<doc>
   <field name="authors">Patrick Eagar</field>
   <field name="subject">Sports</field>
etc...


Thanks for your help

Regards

Re: Import XML files different format?

Posted by sc...@asia.com.
Thanks Erik for your answer.

I'll try to use DIH via data-config.xml, as I might index other content with a different XML structure in the future.

Will I need a different data-config for each XML structure? And then manually switch between them?
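
For reference, a data-config.xml for the race/go format might look roughly like the sketch below. It assumes the file is read from local disk with FileDataSource, that /path/to/race.xml is a placeholder path, and that schema.xml already defines fields with the same names as the columns. The second entity (its name, path, and element names) is purely hypothetical and only illustrates that one data-config.xml can declare several entities, one per XML structure.

```xml
<dataConfig>
  <!-- Assumption: the XML files are on the local file system -->
  <dataSource type="FileDataSource" encoding="UTF-8" />
  <document>
    <!-- One Solr document per <go> element under <race> -->
    <entity name="go"
            processor="XPathEntityProcessor"
            url="/path/to/race.xml"
            forEach="/race/go"
            stream="true">
      <field column="id"       xpath="/race/go/id" />
      <field column="title"    xpath="/race/go/title" />
      <field column="url"      xpath="/race/go/url" />
      <field column="content"  xpath="/race/go/content" />
      <field column="city"     xpath="/race/go/city" />
      <field column="postcode" xpath="/race/go/postcode" />
      <field column="contract" xpath="/race/go/contract" />
      <field column="category" xpath="/race/go/category" />
      <field column="date"     xpath="/race/go/date" />
      <field column="time"     xpath="/race/go/time" />
    </entity>

    <!-- Hypothetical second structure, shown only to illustrate multiple entities -->
    <entity name="other"
            processor="XPathEntityProcessor"
            url="/path/to/other.xml"
            forEach="/items/item">
      <field column="id"    xpath="/items/item/id" />
      <field column="title" xpath="/items/item/name" />
    </entity>
  </document>
</dataConfig>
```

Assuming the handler is registered at /dataimport, a single entity can be selected per request with the entity parameter, e.g. /dataimport?command=full-import&entity=go, so swapping config files by hand should not be necessary. Note that full-import cleans the index by default, so clean=false is probably wanted when importing one entity at a time.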

-----Original Message-----
From: Erik Hatcher <er...@gmail.com>
To: solr-user@lucene.apache.org
Sent: Wed, Jun 23, 2010 2:19 pm
Subject: Re: Import XML files different format?


You can use DataImportHandler's XML/XPath capabilities to do this:

  <http://wiki.apache.org/solr/DataImportHandler#Usage_with_XML.2BAC8-HTTP_Datasource>

or you could, of course, convert your XML to Solr's XML format.

Another fine option, given what this data looks like, is CSV format.

I'd imagine you have the original data in a relational database though?

   Erik
 

Re: Import XML files different format?

Posted by Erik Hatcher <er...@gmail.com>.
You can use DataImportHandler's XML/XPath capabilities to do this:

   <http://wiki.apache.org/solr/DataImportHandler#Usage_with_XML.2BAC8-HTTP_Datasource>

or you could, of course, convert your XML to Solr's XML format.
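
For the conversion route, a minimal Python sketch is given below. The function name race_to_solr_add and the sample data are illustrative assumptions, not part of Solr. It maps every child element of each <go> to a <field> named after the tag, which assumes schema.xml defines fields with those names.

```python
import xml.etree.ElementTree as ET

# Illustrative sample in the poster's <race>/<go> format (values are made up).
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<race>
<go>
    <id><![CDATA[1]]></id>
    <title><![CDATA[First title]]></title>
</go>
<go>
    <id><![CDATA[2]]></id>
    <title><![CDATA[Second title]]></title>
</go>
</race>
"""


def race_to_solr_add(race_xml):
    """Convert <race><go>...</go></race> into Solr's <add><doc>...</doc></add>."""
    race = ET.fromstring(race_xml)
    add = ET.Element("add")
    for go in race.findall("go"):
        doc = ET.SubElement(add, "doc")
        # Each child element of <go> becomes <field name="tag">text</field>.
        for child in go:
            field = ET.SubElement(doc, "field", name=child.tag)
            field.text = (child.text or "").strip()
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    print(race_to_solr_add(SAMPLE))
```

The resulting document could then be sent to Solr's /update handler, followed by a commit.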

Another fine option, given what this data looks like, is CSV format.
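
A rough Python sketch of the CSV option follows. The function name race_to_csv and the sample values are assumptions for illustration; the column list is taken from the element names in the original post. Elements missing from a <go> are written as empty values.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Column order taken from the element names in the original post.
FIELDS = ["id", "title", "url", "content", "city",
          "postcode", "contract", "category", "date", "time"]

# Illustrative sample (values are made up).
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<race>
<go>
    <id><![CDATA[1]]></id>
    <title><![CDATA[Hello, world]]></title>
</go>
<go>
    <id><![CDATA[2]]></id>
    <title><![CDATA[Second title]]></title>
</go>
</race>
"""


def race_to_csv(race_xml):
    """Flatten each <go> element into one CSV row, with a header line."""
    race = ET.fromstring(race_xml)
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(FIELDS)
    for go in race.findall("go"):
        # findtext returns None for a missing element; write "" instead.
        writer.writerow([(go.findtext(name) or "").strip() for name in FIELDS])
    return buf.getvalue()


if __name__ == "__main__":
    print(race_to_csv(SAMPLE))
```

Solr's CSV update handler (/update/csv in the example configuration) reads field names from the header line by default, so the output should be loadable as-is if the schema has matching fields.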

I'd imagine you have the original data in a relational database though?

	Erik

