Posted to dev@nutch.apache.org by Stefan Groschupf <sg...@media-style.com> on 2005/10/27 15:07:20 UTC

best way to load page components

Hi there,
I vaguely remember a comment from Andrzej saying that one of the HTTP
protocol plugins is now able to download external files, such as
JavaScript, that belong to the page itself.
But I was not able to find this comment again. Was it actually made,
or am I mixing things up?
Anyway, I'm looking for a way to download external items that belong
to an HTML page, like images, JavaScript files or CSS files.
Since this requires parsing the page anyway, I was thinking I could
create a new fetchlist, fetch the content and then merge them
together again using map reduce.
But on the other hand, it would be good to 'reuse' the thread that is
already connected to the host to download these items.
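
To make the idea a bit more concrete, here is a rough sketch in plain
Python (not Nutch code; the names ComponentExtractor and
component_fetchlist are made up for illustration). It assumes the
page components are just img/script "src" attributes and stylesheet
"link" tags, and it groups the resulting URLs by host so that
whatever is already connected to a host could fetch them in one go.

```python
from collections import defaultdict
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class ComponentExtractor(HTMLParser):
    """Collects URLs of images, scripts and stylesheets referenced by a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        ref = None
        if tag in ("img", "script"):
            ref = attrs.get("src")
        elif tag == "link":
            # Only stylesheet links count as page components here.
            rel = (attrs.get("rel") or "").lower().split()
            if "stylesheet" in rel:
                ref = attrs.get("href")
        if ref:
            # Resolve relative references against the page URL.
            self.urls.append(urljoin(self.base_url, ref))


def component_fetchlist(base_url, html):
    """Return {host: [urls]}: a hypothetical per-host 'fetchlist' for one page."""
    parser = ComponentExtractor(base_url)
    parser.feed(html)
    parser.close()
    by_host = defaultdict(list)
    for url in parser.urls:
        by_host[urlparse(url).netloc].append(url)
    return dict(by_host)
```

Entries that share the page's own host are the ones where reusing the
existing connection would pay off; everything else would have to go
through a normal fetch anyway.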

Any comments and ideas?

Thanks!
Stefan