Posted to solr-user@lucene.apache.org by sabman <sa...@gmail.com> on 2011/06/23 21:31:12 UTC

Updating the data-config file

So I have some RSS feeds that I want to index using Solr. I am using the
DataImportHandler and I have added the instructions on how to parse the
feeds in the data-config file. 

Now if a user wants to add more RSS feeds to index, do I have to
programmatically instruct Solr to update the config file? Is there an HTTP
POST or GET I can send to update the data-config file?

--
View this message in context: http://lucene.472066.n3.nabble.com/Updating-the-data-config-file-tp3101241p3101241.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Updating the data-config file

Posted by sabman <sa...@gmail.com>.
Thanks. I will look into this and see how it goes.


Re: Updating the data-config file

Posted by Ahmet Arslan <io...@yahoo.com>.
> Ahh! That's interesting!
> 
> I understand what you mean. Since RSS and Atom feeds have the same
> structure, parsing them would be the same, but I can do that for each of
> the different URLs. These URLs can be obtained from a db, a file or
> through the request parameters, right?

Exactly. You can register multiple <dataSource> elements with different names, and then in each <entity> you can select the appropriate data source with the dataSource="..." attribute.

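For a db, data-config.xml would be something like:

<dataConfig>
  <dataSource type="HttpDataSource" name="http"/>
  <dataSource type="JdbcDataSource" name="db" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/mydb" batchSize="-1"/>
  <document>
    <!-- rootEntity="false" so that each feed item, not each url row, becomes a document -->
    <entity name="urls" dataSource="db" rootEntity="false" query="SELECT url FROM urls">
      <entity name="slashdot" dataSource="http"
              pk="link"
              url="${urls.url}"
              processor="XPathEntityProcessor"
              forEach="/RDF/channel | /RDF/item"
              transformer="DateFormatTransformer">
        <!-- same <field> mappings as in your current config -->
      </entity>
    </entity>
  </document>
</dataConfig>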

Re: Updating the data-config file

Posted by sabman <sa...@gmail.com>.
Ahh! That's interesting!

I understand what you mean. Since RSS and Atom feeds have the same
structure, parsing them would be the same, but I can do that for each of
the different URLs. These URLs can be obtained from a db, a file or
through the request parameters, right?


Re: Updating the data-config file

Posted by Ahmet Arslan <io...@yahoo.com>.
> So you mean I cannot update the
> data-config programmatically? 

Yes, you can update the file yourself and then reload it with dataimport?command=reload-config. However, there is no built-in mechanism in Solr for editing the file.

> I don't
> understand how the request parameters would be of use to me.

Maybe you can use a different URL in each import request:
dataimport?command=full-import&clean=false&url=myNewlyAddedURL
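
For that to work, the entity has to read its url from the request instead of hard-coding it. A minimal sketch, assuming the request parameter is called url as in the line above; the entity name "feed" is only a placeholder, and the remaining field mappings would stay as in your current config:

```xml
<dataConfig>
  <dataSource type="HttpDataSource" />
  <document>
    <!-- "feed" is a hypothetical entity name. The feed address is taken
         from the "url" request parameter of the import call. -->
    <entity name="feed"
            pk="link"
            url="${dataimporter.request.url}"
            processor="XPathEntityProcessor"
            forEach="/RDF/channel | /RDF/item"
            transformer="DateFormatTransformer">
      <field column="title" xpath="/RDF/item/title" />
      <field column="link"  xpath="/RDF/item/link" />
      <!-- remaining <field> mappings unchanged -->
    </entity>
  </document>
</dataConfig>
```

The feed address should be URL-encoded when it is passed, e.g.
dataimport?command=full-import&clean=false&url=http%3A%2F%2Frss.slashdot.org%2FSlashdot%2Fslashdot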

 
> This is how my data-config file looks:
> 
> <dataConfig>
>         <dataSource type="HttpDataSource" />
>         <document>
>                 <entity name="slashdot"
>                         pk="link"
>                         url="http://rss.slashdot.org/Slashdot/slashdot"
>                         processor="XPathEntityProcessor"
>                         forEach="/RDF/channel | /RDF/item"
>                         transformer="DateFormatTransformer">
> 
>                         <field column="title"       xpath="/RDF/item/title" />
>                         <field column="link"        xpath="/RDF/item/link" />
>                         <field column="description" xpath="/RDF/item/description" />
>                         <field column="date"        xpath="/RDF/item/date" dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss" />
>                 </entity>
>         </document>
> </dataConfig>
> 
> I am running a Flash based application as the front end UI to show the
> search results. Now I want the user to be able to add new RSS feed data
> sources.

How about fetching the URLs (such as "http://rss.slashdot.org/Slashdot/slashdot") from another data source, like a database table or a text file on the file system?
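
For the text file case, a rough sketch could look like the following. It assumes a file with one feed URL per line; the path /path/to/feeds.txt and the names "file", "http" and "urls" are made up for illustration. LineEntityProcessor returns each line in a field called rawLine:

```xml
<dataConfig>
  <dataSource type="FileDataSource" name="file" />
  <dataSource type="HttpDataSource" name="http" />
  <document>
    <!-- Outer entity: one row per line of the (hypothetical) feeds.txt.
         rootEntity="false" so that documents are created per feed item. -->
    <entity name="urls"
            dataSource="file"
            processor="LineEntityProcessor"
            url="/path/to/feeds.txt"
            rootEntity="false">
      <!-- Inner entity: fetch and parse the feed named on that line -->
      <entity name="slashdot"
              dataSource="http"
              pk="link"
              url="${urls.rawLine}"
              processor="XPathEntityProcessor"
              forEach="/RDF/channel | /RDF/item"
              transformer="DateFormatTransformer">
        <field column="title" xpath="/RDF/item/title" />
        <field column="link"  xpath="/RDF/item/link" />
        <!-- remaining <field> mappings unchanged -->
      </entity>
    </entity>
  </document>
</dataConfig>
```

With this layout, adding a feed only means appending a line to the file and running the import again; data-config.xml itself does not change.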

Re: Updating the data-config file

Posted by sabman <sa...@gmail.com>.
So you mean I cannot update the data-config programmatically? I don't
understand how the request parameters would be of use to me.

This is how my data-config file looks:


<dataConfig>
        <dataSource type="HttpDataSource" />
        <document>
                <entity name="slashdot"
                        pk="link"
                        url="http://rss.slashdot.org/Slashdot/slashdot"
                        processor="XPathEntityProcessor"
                        forEach="/RDF/channel | /RDF/item"
                        transformer="DateFormatTransformer">

                        <field column="title"       xpath="/RDF/item/title" />
                        <field column="link"        xpath="/RDF/item/link" />
                        <field column="description" xpath="/RDF/item/description" />
                        <field column="date"        xpath="/RDF/item/date" dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss" />
                </entity>
        </document>
</dataConfig>

I am running a Flash based application as the front end UI to show the
search results. Now I want the user to be able to add new RSS feed data
sources. 



Re: Updating the data-config file

Posted by Ahmet Arslan <io...@yahoo.com>.
> So I have some RSS feeds that I want
> to index using Solr. I am using the
> DataImportHandler and I have added the instructions on how
> to parse the
> feeds in the data-config file. 
> 
> Now if a user wants to add more RSS feeds to index, do I
> have to
> programmatically instruct Solr to update the config file? Is
> there an HTTP
> POST or GET I can send to update the data-config file?

AFAIK there is no built-in way to edit the data-config file over HTTP.

However, you can pass a request parameter when triggering the import, if that helps:
http://wiki.apache.org/solr/DataImportHandler#Accessing_request_parameters

You can also store your RSS URLs in a database and use multiple data sources. Then you only need to update the relevant table.