
Posted to agent@nutch.apache.org by Jason Todd Slack-Moehrle <ma...@MailNewsRSS.com> on 2009/04/20 01:24:23 UTC

Nutch Crawling Questions

Hi All,

I have some beginner Nutch questions that I am hoping to get some
insight on.

I want to start at Dmoz.org, follow links related to entertainment
(concerts, art gallery events, etc.), and examine each link to decide
whether I should fetch data about it and from it.

My questions:

1. Can Nutch start at a given URL and examine every link it finds
(based on my criteria)? (Obviously I can write case, if/else, or while
logic to apply those criteria.)

2. If I find a link containing certain keywords I am interested in,
can I follow that link and extract information from the page it
points to?

3. How do I get the information about a link of interest, and the
relevant content from its page, into a MySQL database? (I know
ColdFusion, MySQL, and PHP.) I think what I am really asking is: how
do I get data from the crawler back into my database?

4. Since Nutch is written in Java (which is fine), I assume I will
need Tomcat running, etc. Are there other Java app servers available
for OS X as well?

5. Does anyone have deployment instructions for OS X?
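To make questions 1 and 2 a bit more concrete, here is a rough sketch of the kind of link criteria I have in mind. It is plain Java, not Nutch's API; the class name, method name, and keyword list are all hypothetical, and it only checks the URL and anchor text for keywords (case-insensitive):

```java
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of "my criteria": not a Nutch API.
// The keyword list and names below are illustrative assumptions.
public class LinkCriteria {
    private static final List<String> KEYWORDS =
            List.of("concert", "gallery", "exhibit", "festival", "theater");

    // Returns true if the URL or its anchor text contains any keyword,
    // ignoring case.
    public static boolean isOfInterest(String url, String anchorText) {
        String haystack = (url + " " + anchorText).toLowerCase(Locale.ROOT);
        for (String keyword : KEYWORDS) {
            if (haystack.contains(keyword)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Matches "concert" in the anchor text.
        System.out.println(isOfInterest("http://www.dmoz.org/Arts/Music/", "Concerts"));
        // No keyword in either the URL or the anchor text.
        System.out.println(isOfInterest("http://www.dmoz.org/Computers/", "Software"));
    }
}
```

What I don't know is where logic like this would plug into a Nutch crawl.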

Am I making any sense?

-Jason