Posted to solr-user@lucene.apache.org by sabman <sa...@gmail.com> on 2011/06/23 21:31:12 UTC
Updating the data-config file
So I have some RSS feeds that I want to index using Solr. I am using the
DataImportHandler and I have added the instructions on how to parse the
feeds in the data-config file.
Now if a user wants to add more RSS feeds to index, do I have to
programmatically instruct Solr to update the config file? Is there an HTTP
POST or GET I can send to update the data-config file?
--
View this message in context: http://lucene.472066.n3.nabble.com/Updating-the-data-config-file-tp3101241p3101241.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Updating the data-config file
Posted by sabman <sa...@gmail.com>.
Thanks. I will look into this and see how it goes.
Re: Updating the data-config file
Posted by Ahmet Arslan <io...@yahoo.com>.
> Ahh! That's interesting!
>
> I understand what you mean. Since RSS and Atom feeds have the same structure,
> parsing them would be the same, but I can do the same for each different URL.
> These URLs can be obtained from a db, a file or through the request
> parameters, right?
Exactly. You can register multiple <dataSource> elements with different names. Then, in each <entity>, you can select the appropriate data source with the dataSource="..." attribute.
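For a db, data-config.xml would be something like this (rootEntity="false" on the outer entity makes each feed item, rather than each URL row, its own Solr document):
<dataConfig>
  <dataSource type="HttpDataSource" name="http"/>
  <dataSource type="JdbcDataSource" name="db" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/mydb" batchSize="-1"/>
  <document>
    <entity name="urls" dataSource="db" query="SELECT url FROM urls" rootEntity="false">
      <entity name="slashdot" dataSource="http"
              pk="link" url="${urls.url}"
              processor="XPathEntityProcessor"
              forEach="/RDF/channel | /RDF/item"
              transformer="DateFormatTransformer">
        <field column="title" xpath="/RDF/item/title"/>
        <field column="link" xpath="/RDF/item/link"/>
        <field column="description" xpath="/RDF/item/description"/>
        <field column="date" xpath="/RDF/item/date" dateTimeFormat="yyyy-MM-dd'T'HH:mm:ss"/>
      </entity>
    </entity>
  </document>
</dataConfig>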
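One caveat on the "same structure" point: the /RDF/... paths only match RSS 1.0 (RDF) feeds such as Slashdot's. RSS 2.0 puts items under /rss/channel/item and Atom uses /feed/entry, so feeds in another format need their own entity. Below is a minimal sketch of an Atom entity, assuming it sits next to the "slashdot" entity inside "urls" and reuses the "http" data source; the entity name "atom" is illustrative, and, like the RDF paths, the XPaths omit namespace prefixes:

```xml
<!-- Hypothetical sibling of the "slashdot" entity, for Atom feeds -->
<entity name="atom" dataSource="http"
        pk="link"
        url="${urls.url}"
        processor="XPathEntityProcessor"
        forEach="/feed/entry"
        transformer="DateFormatTransformer">
  <field column="title" xpath="/feed/entry/title"/>
  <!-- Atom links are in the href attribute, not element text -->
  <field column="link" xpath="/feed/entry/link/@href"/>
  <field column="description" xpath="/feed/entry/summary"/>
  <field column="date" xpath="/feed/entry/updated" dateTimeFormat="yyyy-MM-dd'T'HH:mm:ss"/>
</entity>
```

With both entities under "urls", every URL is fetched twice and only the entity whose forEach matches should emit rows. If that matters, keeping Atom URLs in a separate table with its own outer entity avoids the double fetch. Atom entries often carry several <link> elements, so it is worth checking which href ends up in "link".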
Re: Updating the data-config file
Posted by sabman <sa...@gmail.com>.
Ahh! That's interesting!
I understand what you mean. Since RSS and Atom feeds have the same structure,
parsing them would be the same, but I can do the same for each different URL.
These URLs can be obtained from a db, a file or through the request
parameters, right?
Re: Updating the data-config file
Posted by Ahmet Arslan <io...@yahoo.com>.
> So you mean I cannot update the data-config programmatically?
Yes, you can update it yourself and then reload it with dataimport?command=reload-config. However, Solr has no built-in mechanism for editing the file.
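reload-config re-reads whichever file the handler was registered with in solrconfig.xml, so that is the file your own code would rewrite before calling it. A typical registration looks like the sketch below; the handler path /dataimport and the file name data-config.xml are assumptions to adjust to your setup:

```xml
<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <!-- the file reload-config re-reads, relative to the core's conf/ directory -->
    <str name="config">data-config.xml</str>
  </lst>
</requestHandler>
```

After rewriting that file, a request such as http://localhost:8983/solr/dataimport?command=reload-config (host, port and path assumed from the default example setup) should pick up the change.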
> I don't understand how the request parameters would be of use to me.
Maybe you can use a different URL in each import request:
dataimport?command=full-import&clean=false&url=myNewlyAddedURL
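On the config side, DIH exposes request parameters as ${dataimporter.request.<name>}, so an entity can take its feed URL from the url parameter in the request above. A minimal sketch, reusing the field mappings from the data-config quoted below; the entity name "feed" is illustrative:

```xml
<dataConfig>
  <dataSource type="HttpDataSource"/>
  <document>
    <!-- the feed URL comes from the "url" request parameter -->
    <entity name="feed"
            pk="link"
            url="${dataimporter.request.url}"
            processor="XPathEntityProcessor"
            forEach="/RDF/channel | /RDF/item"
            transformer="DateFormatTransformer">
      <field column="title" xpath="/RDF/item/title"/>
      <field column="link" xpath="/RDF/item/link"/>
      <field column="description" xpath="/RDF/item/description"/>
      <!-- HH: feed timestamps use a 24-hour clock -->
      <field column="date" xpath="/RDF/item/date" dateTimeFormat="yyyy-MM-dd'T'HH:mm:ss"/>
    </entity>
  </document>
</dataConfig>
```

clean=false keeps the documents from earlier imports. If a feed URL itself contains characters such as & or ?, URL-encode it before putting it in the request.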
> This is how my data-config file looks:
>
> <dataConfig>
>   <dataSource type="HttpDataSource" />
>   <document>
>     <entity name="slashdot"
>             pk="link"
>             url="http://rss.slashdot.org/Slashdot/slashdot"
>             processor="XPathEntityProcessor"
>             forEach="/RDF/channel | /RDF/item"
>             transformer="DateFormatTransformer">
>       <field column="title" xpath="/RDF/item/title" />
>       <field column="link" xpath="/RDF/item/link" />
>       <field column="description" xpath="/RDF/item/description" />
>       <field column="date" xpath="/RDF/item/date"
>              dateTimeFormat="yyyy-MM-dd'T'HH:mm:ss" />
>     </entity>
>   </document>
> </dataConfig>
>
> I am running a Flash-based application as the front-end UI to show the
> search results. Now I want the user to be able to add new RSS feed data
> sources.
How about fetching the URLs (e.g. "http://rss.slashdot.org/Slashdot/slashdot") from another data source, such as a database table or a text file on the file system?
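For the text-file variant, DIH's LineEntityProcessor with a FileDataSource emits one row per line of the file, in a field called rawLine. A sketch, assuming one feed URL per line in a hypothetical /path/to/feeds.txt and reusing the field mappings from the config above:

```xml
<dataConfig>
  <dataSource type="FileDataSource" name="file"/>
  <dataSource type="HttpDataSource" name="http"/>
  <document>
    <!-- rootEntity="false": one Solr document per feed item, not per line -->
    <entity name="feeds" dataSource="file" processor="LineEntityProcessor"
            url="/path/to/feeds.txt" rootEntity="false">
      <entity name="feed" dataSource="http"
              pk="link"
              url="${feeds.rawLine}"
              processor="XPathEntityProcessor"
              forEach="/RDF/channel | /RDF/item"
              transformer="DateFormatTransformer">
        <field column="title" xpath="/RDF/item/title"/>
        <field column="link" xpath="/RDF/item/link"/>
        <field column="description" xpath="/RDF/item/description"/>
        <field column="date" xpath="/RDF/item/date" dateTimeFormat="yyyy-MM-dd'T'HH:mm:ss"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

Adding a feed then means appending a line to the file (or a row to the table, in the db variant) and triggering a new import, with no change to data-config.xml.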
Re: Updating the data-config file
Posted by sabman <sa...@gmail.com>.
So you mean I cannot update the data-config programmatically? I don't
understand how the request parameters would be of use to me.
This is how my data-config file looks:
<dataConfig>
  <dataSource type="HttpDataSource" />
  <document>
    <entity name="slashdot"
            pk="link"
            url="http://rss.slashdot.org/Slashdot/slashdot"
            processor="XPathEntityProcessor"
            forEach="/RDF/channel | /RDF/item"
            transformer="DateFormatTransformer">
      <field column="title" xpath="/RDF/item/title" />
      <field column="link" xpath="/RDF/item/link" />
      <field column="description" xpath="/RDF/item/description" />
      <field column="date" xpath="/RDF/item/date"
             dateTimeFormat="yyyy-MM-dd'T'HH:mm:ss" />
    </entity>
  </document>
</dataConfig>
I am running a Flash-based application as the front-end UI to show the
search results. Now I want the user to be able to add new RSS feed data
sources.
Re: Updating the data-config file
Posted by Ahmet Arslan <io...@yahoo.com>.
> So I have some RSS feeds that I want to index using Solr. I am using the
> DataImportHandler and I have added the instructions on how to parse the
> feeds in the data-config file.
>
> Now if a user wants to add more RSS feeds to index, do I have to
> programmatically instruct Solr to update the config file? Is there an HTTP
> POST or GET I can send to update the data-config file?
AFAIK there is no such command for editing the data-config file.
However, you can pass an argument when triggering an import, if that helps:
http://wiki.apache.org/solr/DataImportHandler#Accessing_request_parameters
You can also save your RSS URLs in a db and use multiple data sources. Then you only need to update the relevant table.