Posted to user@nutch.apache.org by djames <dj...@supinfo.com> on 2007/03/08 14:10:11 UTC

external host link logging

Hello,

I've been working with Nutch for 2 months now, and I'm very happy to see how
powerful this project is!

I need to crawl only a given set of websites, so I set the parameter
db.ignore.external.links to true, and it works perfectly.
But now I need to create a log file listing all parsed or fetched links
that lead to external hosts, so that a human can validate them and reinject
them into the crawl db.
I don't know how to begin.

Could someone please help me?

Thanks a lot
-- 
View this message in context: http://www.nabble.com/external-host-link-logging-tf3369106.html#a9374136
Sent from the Nutch - User mailing list archive at Nabble.com.


Re: [SOLVED] external host link logging

Posted by djames <dj...@supinfo.com>.
Finally I found the solution. If anyone is interested, contact me.


-- 
View this message in context: http://www.nabble.com/external-host-link-logging-tf3369106.html#a9471697


Re: [SOLVED] external host link logging

Posted by djames <dj...@supinfo.com>.
Hello,

Could someone please help me?

Thank you
-- 
View this message in context: http://www.nabble.com/external-host-link-logging-tf3369106.html#a9450252


Re: [SOLVED] external host link logging

Posted by djames <dj...@supinfo.com>.
Hello,

I've tried the solution you gave me, but it logs all the links that the
parser finds.
In the conf file there is a parameter named db.ignore.external.links. Do you
know where this parameter is handled in the code? I think I just have to
add an if condition there to log the external outlinks to a file.
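A minimal, standalone sketch of such an if condition, assuming a link counts as external whenever its hostname differs from the hostname of the page it was found on (the rule described with the www.nabble.com / www.forecast.com example in this thread). The class ExternalLinkLogger, its methods, and the external-links.log file name are hypothetical and not part of Nutch; it uses only the JDK, and hooking it into the place where your Nutch version applies db.ignore.external.links is left as an exercise.

```java
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical helper (not part of Nutch): keeps only the outlinks whose host
 * differs from the host of the page they were found on, and appends them to
 * a plain-text log file for later human review and reinjection.
 */
public class ExternalLinkLogger {

    /** Lower-cased host of a URL, or null if the URL cannot be parsed. */
    static String hostOf(String url) {
        try {
            return new URL(url).getHost().toLowerCase(Locale.ROOT);
        } catch (MalformedURLException e) {
            return null;
        }
    }

    /** The "if condition": true only when both hosts parse and they differ. */
    static boolean isExternal(String fromUrl, String toUrl) {
        String fromHost = hostOf(fromUrl);
        String toHost = hostOf(toUrl);
        return fromHost != null && toHost != null && !toHost.equals(fromHost);
    }

    /** Returns the external outlinks of one page, in their original order. */
    static List<String> externalLinks(String fromUrl, List<String> outlinks) {
        List<String> result = new ArrayList<>();
        for (String toUrl : outlinks) {
            if (isExternal(fromUrl, toUrl)) {
                result.add(toUrl);
            }
        }
        return result;
    }

    /** Appends one "fromUrl<TAB>toUrl" line per external link to the file. */
    static void log(Path logFile, String fromUrl, List<String> outlinks) throws IOException {
        List<String> lines = new ArrayList<>();
        for (String toUrl : externalLinks(fromUrl, outlinks)) {
            lines.add(fromUrl + "\t" + toUrl);
        }
        // CREATE + APPEND: make the file on first use, then keep adding to it.
        Files.write(logFile, lines, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    public static void main(String[] args) throws IOException {
        String page = "http://www.nabble.com/index.html";
        List<String> outlinks = Arrays.asList(
                "http://www.forecast.com/",
                "http://www.nabble.com/forecast");
        log(Paths.get("external-links.log"), page, outlinks);
        System.out.println(externalLinks(page, outlinks));
    }
}
```

Running main prints [http://www.forecast.com/] and appends one tab-separated line to external-links.log. Hostnames are compared exactly, so a link from www.nabble.com to nabble.com would also be logged as external; links that cannot be parsed as URLs are skipped rather than logged.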

Thanks a lot
-- 
View this message in context: http://www.nabble.com/external-host-link-logging-tf3369106.html#a9432387


Re: [SOLVED] external host link logging

Posted by djames <dj...@supinfo.com>.
Hi,

For information, I run Nutch on a cluster of 5 PCs.

When I look in /nutch/logs, none of the files contain external host links;
they contain only the normal system output.
Sorry if I'm not looking in the right logs directory.
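One way to get external links into a file of their own, assuming the installation configures logging through log4j 1.x in conf/log4j.properties, is to route a dedicated logger to a separate appender. The logger name externallinks and the file path are hypothetical, and nothing is written there unless the code that detects external links logs through a logger with that name (for example one obtained with log4j's Logger.getLogger("externallinks")).

```properties
# Send the hypothetical "externallinks" logger to its own appender.
log4j.logger.externallinks=INFO, EXTLINKS
# Do not also copy these lines into the main Nutch log.
log4j.additivity.externallinks=false
log4j.appender.EXTLINKS=org.apache.log4j.FileAppender
log4j.appender.EXTLINKS.File=/nutch/logs/external-links.log
log4j.appender.EXTLINKS.Append=true
log4j.appender.EXTLINKS.layout=org.apache.log4j.PatternLayout
# Message only: one link per line, easy to review and reinject.
log4j.appender.EXTLINKS.layout.ConversionPattern=%m%n
```

On a 5-node cluster, the parse and fetch tasks run on the individual nodes, so each node typically writes its own copy of this file locally; the copies have to be collected from every machine before review.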


-- 
View this message in context: http://www.nabble.com/external-host-link-logging-tf3369106.html#a9390292


Re: [SOLVED] external host link logging

Posted by djames <dj...@supinfo.com>.
Thanks, I'm going to try that.

I need a log of all the external host links the fetcher finds, but not the
normal (same-host) links.
For example, if a page on the www.nabble.com web site contains a link to
www.forecast.com, I want to log it, but I don't want to log a link to
www.nabble.com/forecast.
-- 
View this message in context: http://www.nabble.com/external-host-link-logging-tf3369106.html#a9390732