Posted to dev@nutch.apache.org by caezar <ca...@gmail.com> on 2009/06/18 15:27:45 UTC

Re: Plugins: when to perform web service requests, on fetch or on index?

Hi,

Thank you for the response. Parsed data is not used in the calls, only the
page URL. So will performance be better if I perform these requests at the
parsing stage?

Stefan Dlugolinsky wrote:
> 
> Hello,
> 
> I don't know how v 1.0 differs from v 0.9, but in v 0.9, I would do
> those service requests in the stage of indexation (extension point
> IndexingFilter), where you have several data prepared from previous
> stage (by parsers, etc.), so you can use this data in the requests.
> But it depends on what you exactly want, whether you want to use
> parsed data in the requests. If not, you can call webservice requests
> earlier from parsing stage (extension point Parse).
> 
> Here is something about core extension points:
> http://wiki.apache.org/nutch/AboutPlugins
> 
> Steve
> 
> 

-- 
View this message in context: http://www.nabble.com/Plugins%3A-when-to-perform-web-service-requests%2C-on-fetch-or-on-index--tp24089858p24092973.html
Sent from the Nutch - Dev mailing list archive at Nabble.com.


Re: Plugins: when to perform web service requests, on fetch or on index?

Posted by Stefan Dlugolinsky <s....@gmail.com>.
Hi,

Well, I would say the indexing stage is a better place than the parsing
stage, because during parsing there can be many parse filters that need to
be executed, and they consume system resources (several threads run in
parallel). In general, though, there may be no difference in performance
between the two stages, since there can also be several indexing filters,
which consume system resources as well. I would try both variants, measure
performance on a subset of documents, compare the results, and choose the
better one. To improve performance further, I would also cache the
web service responses locally; that can save time on repeated calls.
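
To illustrate the caching idea, here is a minimal sketch in plain Java. It
is not Nutch API: the class name CachedServiceClient, the lookup method and
the Function standing in for the real web service call are all hypothetical,
and the cache size is something you would have to tune. It keeps the most
recently used responses in memory, keyed by page URL:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical helper (not part of Nutch): caches web service responses per URL. */
class CachedServiceClient {
    // Stand-in for the real web service call; assumed to map a page URL to a response.
    private final Function<String, String> service;
    private final Map<String, String> cache;
    private int remoteCalls = 0;

    CachedServiceClient(Function<String, String> service, final int maxEntries) {
        this.service = service;
        // accessOrder=true: iteration order is least recently used first,
        // so removeEldestEntry evicts the least recently used response.
        this.cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /**
     * Returns the cached response for the URL, calling the service only on a miss.
     * Synchronized because filters may run in several threads; note that this
     * simple version also serializes the remote calls themselves.
     */
    synchronized String lookup(String url) {
        String cached = cache.get(url);
        if (cached != null) {
            return cached;
        }
        remoteCalls++;
        String response = service.apply(url);
        if (response != null) {
            cache.put(url, response);
        }
        return response;
    }

    /** Number of times the underlying service was actually called. */
    synchronized int remoteCalls() {
        return remoteCalls;
    }
}
```

A filter in either stage could then call lookup(url) instead of calling the
service directly. Comparing remoteCalls() with the number of documents
processed on your test subset would show how much the cache actually saves.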

Steve

2009/6/18 caezar <ca...@gmail.com>

>
> Hi,
>
> Thank you for the response. Parsed data is not used in calls. Only page
> URL.
> So performance will be better if perform this requests on parsing stage?
>
>