Posted to user@nutch.apache.org by Berlin Brown <be...@gmail.com> on 2007/07/10 21:30:37 UTC

Database of article URLs for use with Nutch, not dmoz

I have put together a set of URLs to crawl, about 40k so far, most of
which have been updated in the last couple of months.  It may make for
interesting crawls.  I got kind of tired of using the "dmoz" dumps
because the links were so old.  There are a lot of them, of course.
Anyway, if you are interested in internet crawls, you may want to try
it out.

http://botspiritcompany.com/botlist/spring/pipes/rdf_nutch.html

-- 
Berlin Brown
http://www.newspiritcompany.com - newspirit technologies