Posted to solr-user@lucene.apache.org by Tom H <to...@limepepper.co.uk> on 2009/04/24 02:53:31 UTC

newbie question about indexing RSS feeds with SOLR

Hi,

I've just downloaded Solr and got it working; it seems pretty cool.

I have a project which needs to maintain an index of articles that were
published on the web via RSS feeds.

Basically, I need to watch some RSS feeds and index their items so that
they can be searched.

Additionally, I need to run jobs based on particular keywords or events
during parsing.

Is this something that I can do with Solr? Are there any related
projects using Solr that are better suited to indexing specific XML
types like RSS?

I had a look at the project enormo, which appears to be a property
lettings and sales listing aggregator. But I can see that they must have
solved some of the problems I am thinking of, such as scheduled indexing
of remote resources and writing a parser to extract data fields from
other sites' templates.

Any advice would be welcome...

Many Thanks,

Tom




Re: newbie question about indexing RSS feeds with SOLR

Posted by Koji Sekiguchi <ko...@r.email.ne.jp>.
Just an FYI: I've never tried it, but there seems to be an RSS feed sample
in DIH (DataImportHandler):

http://wiki.apache.org/solr/DataImportHandler#head-e68aa93c9ca7b8d261cede2bf1d6110ab1725476

Koji
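
For readers who would rather drive the indexing from a client script than
from DIH, here is a minimal sketch of the workflow described in the
question: parse the feed items, check them for keywords, and build a Solr
XML <add> message. The field names (id, title, description), the sample
feed, and the helper names are all assumptions for illustration; the
fields would have to match your own schema, and the update URL passed to
post_to_solr depends on how your Solr instance is deployed.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical sample feed, used so the sketch runs without network access.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>Indexing feeds with Solr</title>
      <link>http://example.com/a</link>
      <description>Notes on search servers</description>
    </item>
    <item>
      <title>Weather report</title>
      <link>http://example.com/b</link>
      <description>Sunny</description>
    </item>
  </channel>
</rss>"""


def parse_items(feed_xml):
    """Turn each RSS <item> into a flat dict (field names are assumed)."""
    root = ET.fromstring(feed_xml)
    docs = []
    for item in root.iter("item"):
        docs.append({
            "id": item.findtext("link", default=""),
            "title": item.findtext("title", default=""),
            "description": item.findtext("description", default=""),
        })
    return docs


def matching_keywords(doc, keywords):
    """Return the keywords found in the title or description (case-insensitive)."""
    text = (doc["title"] + " " + doc["description"]).lower()
    return [k for k in keywords if k.lower() in text]


def to_solr_add_xml(docs):
    """Build a Solr XML update message: <add><doc><field name="..."/>...</doc></add>."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = value
    return ET.tostring(add, encoding="utf-8")


def post_to_solr(xml_bytes, update_url):
    """POST the message to the update handler; update_url is deployment-specific."""
    req = urllib.request.Request(
        update_url,
        data=xml_bytes,
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


docs = parse_items(SAMPLE_FEED)
flagged = [d for d in docs if matching_keywords(d, ["solr"])]
add_message = to_solr_add_xml(docs)
```

The keyword check runs before anything is sent, so a job could be
triggered for each document in flagged. Fetching the feeds on a schedule
(for example from cron) and sending a commit after the add are left out
of the sketch.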
