Posted to user@nutch.apache.org by djames <dj...@supinfo.com> on 2007/03/08 14:10:11 UTC
external host link logging
Hello,
I've been working with Nutch for two months now, and I'm very happy to see how
powerful this project is!
I need to crawl only a given set of websites, so I set the parameter
db.ignore.external.links to true, and it works perfectly.
But now I need to create a log file listing all the parsed or fetched links
that lead to external hosts, so that a human can validate them and reinject
them into the crawl db.
I don't know how to begin.
Could someone please help me?
Thanks a lot
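For the validation-and-reinjection step, here is a minimal sketch under stated assumptions: suppose the external links have already been written to a review log in a hypothetical `sourceUrl<TAB>externalUrl` format (one pair per line), and a human has deleted the lines to reject. The class below (`ReviewLogToSeeds`, a hypothetical name, not part of Nutch) keeps the second column, removes duplicates and writes one URL per line, which is the plain-text seed format the Nutch injector reads from a URL directory (e.g. `bin/nutch inject crawl/crawldb urls`).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical helper, not part of Nutch. Reads a human-pruned review log in an
 * assumed "sourceUrl<TAB>externalUrl" format and writes the distinct external
 * URLs, one per line, as a seed list for injection.
 */
public final class ReviewLogToSeeds {

    private ReviewLogToSeeds() {}

    /** Returns the distinct target URLs (second column), sorted; malformed or blank lines are skipped. */
    public static Set<String> targets(List<String> reviewLines) {
        Set<String> urls = new TreeSet<>();
        for (String line : reviewLines) {
            String[] cols = line.split("\t");
            if (cols.length == 2 && !cols[1].trim().isEmpty()) {
                urls.add(cols[1].trim());
            }
        }
        return urls;
    }

    /** Reads reviewLog, writes seedFile (overwriting it) and returns the number of URLs written. */
    public static int convert(Path reviewLog, Path seedFile) throws IOException {
        Set<String> urls = targets(Files.readAllLines(reviewLog, StandardCharsets.UTF_8));
        Files.write(seedFile, urls, StandardCharsets.UTF_8);
        return urls.size();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: ReviewLogToSeeds <reviewLog> <seedFile>");
            System.exit(1);
        }
        int n = convert(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println(n + " URL(s) written to " + args[1]);
    }
}
```

Run it as `java ReviewLogToSeeds review.tsv urls/external.txt`, then point the injector at the `urls` directory. A `TreeSet` keeps the output both de-duplicated and sorted, which makes the seed file easy to compare between crawl rounds.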
--
View this message in context: http://www.nabble.com/external-host-link-logging-tf3369106.html#a9374136
Sent from the Nutch - User mailing list archive at Nabble.com.
Re: [SOLVED] external host link logging
Posted by djames <dj...@supinfo.com>.
I finally found the solution; if anyone is interested, contact me.
--
View this message in context: http://www.nabble.com/external-host-link-logging-tf3369106.html#a9471697
Re: [SOLVED] external host link logging
Posted by djames <dj...@supinfo.com>.
Hello,
Could someone please help me?
Thank you
--
View this message in context: http://www.nabble.com/external-host-link-logging-tf3369106.html#a9450252
Re: [SOLVED] external host link logging
Posted by djames <dj...@supinfo.com>.
Hello,
I've tried the solution you gave me, but it logs all the links that the
parser finds.
In the conf file there is a parameter named db.ignore.external.links; do you
know where this parameter is handled in the code? I think I just have to
add an if condition to log the outlinks to a file.
Thanks a lot
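A minimal sketch of that extra `if`, assuming the outlinks of a page are available as plain URL strings. It mirrors the kind of host check that `db.ignore.external.links` triggers (if memory serves, the flag is read in `org.apache.nutch.parse.ParseOutputFormat` in the 0.9 sources, but treat that location as an assumption to verify in your own checkout). `OutlinkFilterSketch` and its methods are hypothetical names; the point is that each external link is reported before the flag decides whether to drop it, so only cross-host links reach the log.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.BiConsumer;

/**
 * Hypothetical sketch, not Nutch code: filters a page's outlinks the way an
 * "ignore external links" switch would, but first reports every external link.
 */
public final class OutlinkFilterSketch {

    private OutlinkFilterSketch() {}

    /** Lower-cased host of url, or null when the URL is malformed or has no host. */
    static String hostOf(String url) {
        try {
            String host = URI.create(url).getHost();
            return host == null ? null : host.toLowerCase(Locale.ROOT);
        } catch (IllegalArgumentException e) {
            return null;
        }
    }

    /**
     * Returns the outlinks to keep. Links whose host differs from the page's host
     * (or cannot be determined) are passed to externalLog; they are dropped only
     * when ignoreExternalLinks is true.
     */
    public static List<String> filter(String fromUrl, List<String> outlinks,
                                      boolean ignoreExternalLinks,
                                      BiConsumer<String, String> externalLog) {
        String fromHost = hostOf(fromUrl);
        List<String> kept = new ArrayList<>();
        for (String toUrl : outlinks) {
            String toHost = hostOf(toUrl);
            if (toHost == null || !toHost.equals(fromHost)) {
                externalLog.accept(fromUrl, toUrl); // the extra "if": log before deciding
                if (ignoreExternalLinks) {
                    continue; // behave like db.ignore.external.links = true
                }
            }
            kept.add(toUrl);
        }
        return kept;
    }
}
```

The `BiConsumer` callback leaves the destination open, for example `(from, to) -> LOG.info(from + "\t" + to)` with a `java.util.logging.Logger`, or a writer to a dedicated file. On a multi-node cluster the parse tasks run on several machines, so a log written to a local file this way would be spread across those nodes.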
--
View this message in context: http://www.nabble.com/external-host-link-logging-tf3369106.html#a9432387
Re: [SOLVED] external host link logging
Posted by djames <dj...@supinfo.com>.
Hi,
For information, I run Nutch on a cluster of 5 PCs.
When I look in /nutch/logs, none of the files contain external host links;
they contain only the normal system output.
Sorry if I'm not looking in the right logs directory.
--
View this message in context: http://www.nabble.com/external-host-link-logging-tf3369106.html#a9390292
Re: [SOLVED] external host link logging
Posted by djames <dj...@supinfo.com>.
Thanks, I'm going to try that.
I need a log of all the external host links the fetcher finds, but not the
normal (same-host) links.
For example, if a page on the www.nabble.com web site contains a link to
www.forecast.com, I want to log it, but not a link to
www.nabble.com/forecast.
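The rule in this example boils down to comparing host names. A minimal sketch, assuming absolute URLs (the `http://` scheme is added here for illustration) and using only `java.net.URI`; `ExternalLinkCheck` is a hypothetical name:

```java
import java.net.URI;

/** Hypothetical helper: tells whether a link leaves the host of the page it was found on. */
public final class ExternalLinkCheck {

    private ExternalLinkCheck() {}

    /**
     * True when the two URLs have different host names (compared case-insensitively).
     * URI.create throws IllegalArgumentException on malformed URLs; relative links
     * such as "/forecast" have no host and are rejected the same way.
     */
    public static boolean isExternal(String pageUrl, String linkUrl) {
        String pageHost = URI.create(pageUrl).getHost();
        String linkHost = URI.create(linkUrl).getHost();
        if (pageHost == null || linkHost == null) {
            throw new IllegalArgumentException("absolute URLs with a host are required");
        }
        return !pageHost.equalsIgnoreCase(linkHost);
    }

    public static void main(String[] args) {
        String page = "http://www.nabble.com/";
        System.out.println(isExternal(page, "http://www.forecast.com/"));       // true: log it
        System.out.println(isExternal(page, "http://www.nabble.com/forecast")); // false: same host, skip
    }
}
```

Because this compares exact host names, `www.nabble.com` and `nabble.com` count as different hosts; if those should be treated as the same site, the comparison would have to be done on the registered domain instead.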
--
View this message in context: http://www.nabble.com/external-host-link-logging-tf3369106.html#a9390732