Posted to user@nutch.apache.org by pragya <pr...@gmail.com> on 2012/03/27 11:35:00 UTC

Re: Crawling blogs, feeds & comments

I also want to crawl RSS feeds in Nutch.
I have heard about a 'feed plugin'.
If anyone knows it, please let me know how it works.
Thank you.

Re: Crawling blogs, feeds & comments

Posted by Lewis John Mcgibbney <le...@gmail.com>.
There is actually some nice Javadoc documentation within the FeedParser [1]
and FeedIndexingFilter [2] if you look there. Also have a look at the test
suite for this plugin; it's pretty comprehensive. You can use it from the
command line for ease of use.
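
For what it's worth, Nutch plugins are generally switched on through the
plugin.includes property in conf/nutch-site.xml. Below is a minimal sketch,
assuming the plugin id is "feed" (matching the src/plugin/feed directory in
the links below). The other entries in the value are only placeholders for
whatever your existing configuration already lists, not a recommended set.

```xml
<!-- conf/nutch-site.xml (fragment, goes inside <configuration>) -->
<!-- Assumption: "feed" is the plugin id; the remaining entries are -->
<!-- placeholders and should be replaced by your current value.     -->
<property>
  <name>plugin.includes</name>
  <value>protocol-http|urlfilter-regex|parse-(html|tika)|feed|index-(basic|anchor)|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>
```

The value is a regular expression over plugin ids, so adding "|feed" to the
list you already have should be enough to get the plugin loaded. Whether feed
documents are then actually routed to it may also depend on the MIME type
mappings in your setup, so check that against your own version.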

hth

[1] http://svn.apache.org/repos/asf/nutch/trunk/src/plugin/feed/src/java/org/apache/nutch/parse/feed/FeedParser.java
[2] http://svn.apache.org/repos/asf/nutch/trunk/src/plugin/feed/src/java/org/apache/nutch/indexer/feed/FeedIndexingFilter.java

-- 
*Lewis*