Posted to user@nutch.apache.org by LoneEagle70 <av...@e-djuster.com> on 2007/10/17 22:22:57 UTC

Evaluating Nutch - Some questions

Hi,

We would like to know whether Nutch can be used for our project:

1) While browsing, some sites require the user to provide information such
as 'Country, Zip Code, Language'.
How should this information be handled?

2) Dynamic links through JavaScript or form submission:
We need site-specific rules to build the list of subsequent pages that
should be visited from a given page.

For example, many sites have an option list in which a value must be
selected before moving to the next page.
Each option in the list leads to a different page.

On such a site, the rule would be: subsequent pages are obtained by looping
through the values of option field "z"
and building url = urlprefix + <value of z> + urlsuffix for each value.

How should this be handled?

3) Once we have a page, how can we extract specific information from it?
If an element of interest is an image, how can we download the image
file?

4) We want to store the gathered information in our own PostgreSQL
database.
Do we need the Nutch database, or can it be disabled?

If it is needed to control the URL walkthrough, can it be set up not to
save page content?

Can we disable the indexing step?
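To make the site-specific rule in point 2 concrete, here is a minimal plain-Java sketch (Java 10 or later) of what such a rule would compute for one page. It is not Nutch code and uses no Nutch API; the class and method names (OptionUrlBuilder, extractOptionValues, buildUrls), the sample HTML and the example.com prefix/suffix are all hypothetical. How a rule like this would be plugged into a Nutch crawl is the open question above.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper illustrating the rule: url = urlprefix + <value of z> + urlsuffix.
// Not part of Nutch.
public class OptionUrlBuilder {

    // Naive extraction of the <option value="..."> entries inside <select name="fieldName">.
    // Assumes double-quoted attributes; a real HTML parser would be more robust.
    static List<String> extractOptionValues(String html, String fieldName) {
        List<String> values = new ArrayList<>();
        Pattern selectPattern = Pattern.compile(
                "<select[^>]*name=\"" + Pattern.quote(fieldName) + "\"[^>]*>(.*?)</select>",
                Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
        Matcher select = selectPattern.matcher(html);
        if (!select.find()) {
            return values; // the field does not appear on this page
        }
        Matcher option = Pattern.compile("<option[^>]*value=\"([^\"]*)\"", Pattern.CASE_INSENSITIVE)
                .matcher(select.group(1));
        while (option.find()) {
            values.add(option.group(1));
        }
        return values;
    }

    // Loops through the option values and builds one URL per value.
    // Each value is form-encoded (e.g. a space becomes '+') so the URL stays valid.
    static List<String> buildUrls(String urlPrefix, List<String> optionValues, String urlSuffix) {
        List<String> urls = new ArrayList<>();
        for (String value : optionValues) {
            urls.add(urlPrefix + URLEncoder.encode(value, StandardCharsets.UTF_8) + urlSuffix);
        }
        return urls;
    }

    public static void main(String[] args) {
        // Hypothetical page fragment containing option field "z".
        String html = "<form><select name=\"z\">"
                + "<option value=\"ca\">Canada</option>"
                + "<option value=\"us\">United States</option>"
                + "</select></form>";
        List<String> zValues = extractOptionValues(html, "z");
        for (String url : buildUrls("http://www.example.com/list?z=", zValues, "&page=1")) {
            System.out.println(url);
        }
    }
}
```

Running main prints one URL per option, for example http://www.example.com/list?z=ca&page=1. The regex-based extraction is only there to keep the sketch self-contained; it will miss single-quoted or unquoted attributes.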
-- 
View this message in context: http://www.nabble.com/Evaluating-Nutch---Some-questions-tf4643083.html#a13262171
Sent from the Nutch - User mailing list archive at Nabble.com.