Posted to dev@nutch.apache.org by Stefan Groschupf <sg...@media-style.com> on 2005/10/27 15:07:20 UTC
best way to load page components
Hi there,
I vaguely remember a comment from Andrzej saying that one of the HTTP
protocol plugins is now able to download external files, such as
JavaScript, that belong to the page itself.
But I was not able to find that comment again. Was it actually made,
or am I mixing things up?
Anyway, I'm looking for a way to download external items that belong
to an HTML page, like images, JavaScript files or CSS files.
Since this requires parsing the page anyway, I was thinking I could
create a new fetchlist, fetch the content and then merge them
together again, using MapReduce.
On the other hand, it would be good to 'reuse' the thread that is
already connected to the host to download the items.
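To make the parsing step concrete: both options first need the list of
items a page embeds. Below is a minimal sketch of that step only. It
assumes a simple regex is good enough, whereas a real HTML parser would
be more robust. EmbeddedItems and extract are made-up names for
illustration, not Nutch APIs.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Nutch): collects the URLs of the
// items a page embeds, resolved against the URL of the page itself.
class EmbeddedItems {

    // Assumption: matching src="..." on <img>/<script> and href="..."
    // on <link> is sufficient for a sketch. Note that this also picks up
    // <link> tags that are not stylesheets.
    private static final Pattern ITEM = Pattern.compile(
        "<(?:(?:img|script)\\b[^>]*?\\bsrc|link\\b[^>]*?\\bhref)"
            + "\\s*=\\s*[\"']([^\"']+)[\"']",
        Pattern.CASE_INSENSITIVE);

    static List<String> extract(String pageUrl, String html) {
        URI base = URI.create(pageUrl);
        // LinkedHashSet drops duplicates but keeps document order.
        Set<String> urls = new LinkedHashSet<>();
        Matcher m = ITEM.matcher(html);
        while (m.find()) {
            try {
                // Relative references are resolved against the page URL.
                urls.add(base.resolve(m.group(1).trim()).toString());
            } catch (IllegalArgumentException e) {
                // Skip references that are not valid URIs.
            }
        }
        return new ArrayList<>(urls);
    }

    public static void main(String[] args) {
        String html = "<img src=\"logo.png\">"
            + "<script type=\"text/javascript\" src=\"/js/app.js\"></script>"
            + "<link rel=\"stylesheet\" href=\"css/style.css\">";
        for (String url : extract("http://example.com/dir/page.html", html)) {
            System.out.println(url);
        }
    }
}
```

The resulting list could either be written out as a new fetchlist or be
handed to the thread that fetched the page, which is exactly the choice
I am unsure about.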
Any comments and ideas?
Thanks!
Stefan