Posted to dev@nutch.apache.org by "Roannel Fernández Hernández (JIRA)" <ji...@apache.org> on 2019/03/10 18:20:00 UTC

[jira] [Commented] (NUTCH-2334) Extension point for schedulers

    [ https://issues.apache.org/jira/browse/NUTCH-2334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16789002#comment-16789002 ] 

Roannel Fernández Hernández commented on NUTCH-2334:
----------------------------------------------------

Returning to this issue after a long time (sorry for that), the strategy is as follows:

First of all, to ensure that this is not a difficult and abrupt change for users, we should mark the current properties as deprecated, and the current features and the new ones should coexist for a while.

Secondly, a new property is introduced: voter.strategy (this may not be the final name), which indicates the strategy to follow when deciding whether a URL should be fetched or not. In this case the schedulers act as voters following one of two main strategies: AND and OR. With the first strategy, all voters (schedulers) have to say yes (TRUE) for the URL to be fetched. With the second strategy, it is enough that one voter (scheduler) says yes. A scheduler can abstain from voting (by returning null), in which case its vote is not counted in the final decision.
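
To make the voting rules concrete, here is a minimal sketch of how the votes could be combined. It is not the code from the linked branch: the class name, the {{Strategy}} enum and the {{decide}} method are hypothetical names used only for illustration, and returning null when every voter abstains is an assumption, since the behaviour for that case is not decided yet.

```java
import java.util.Arrays;
import java.util.List;

public class VoterStrategySketch {

    /** Hypothetical enum mirroring the two values proposed for voter.strategy. */
    public enum Strategy { AND, OR }

    /**
     * Combines scheduler votes: TRUE means fetch, FALSE means do not fetch,
     * null means the scheduler abstains and its vote is ignored.
     * ASSUMPTION: returns null when every voter abstains; the caller would
     * have to pick a default for that case.
     */
    public static Boolean decide(Strategy strategy, List<Boolean> votes) {
        boolean anyVote = false;
        for (Boolean vote : votes) {
            if (vote == null) {
                continue; // abstention: not counted in the final decision
            }
            anyVote = true;
            if (strategy == Strategy.AND && !vote) {
                return Boolean.FALSE; // AND: a single "no" is enough to reject
            }
            if (strategy == Strategy.OR && vote) {
                return Boolean.TRUE; // OR: a single "yes" is enough to accept
            }
        }
        if (!anyVote) {
            return null; // nobody voted
        }
        // AND with no "no" votes accepts; OR with no "yes" votes rejects.
        return strategy == Strategy.AND ? Boolean.TRUE : Boolean.FALSE;
    }

    public static void main(String[] args) {
        List<Boolean> votes = Arrays.asList(Boolean.TRUE, null, Boolean.FALSE);
        System.out.println("AND: " + decide(Strategy.AND, votes));
        System.out.println("OR:  " + decide(Strategy.OR, votes));
    }
}
```

With the votes TRUE, abstain, FALSE used in {{main}}, the AND strategy rejects the URL and the OR strategy accepts it.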

In other cases where schedulers come into play, only the scheduler that is loaded first will be responsible for doing the task. A property is used to indicate the order in which these plugins must be loaded. The code is [here|https://github.com/r0ann3l/nutch/blob/NUTCH-2334/src/java/org/apache/nutch/crawl/FetchSchedulers.java]

I understand these changes could be a little complicated for end users to understand. That's why I have a second proposal: make the schedulers pluggable and nothing else. The only issue I see in this case is that several plugins for the same extension point could be loaded. So, we could either use the first loaded plugin (maybe showing a warning too) or throw a {{RuntimeException}}.
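
The two options for the simpler proposal could look roughly like the sketch below. The {{Scheduler}} interface here is a hypothetical stand-in, not Nutch's real scheduler interface, and the {{strict}} flag and the check for an empty list are assumptions added only to show both behaviours side by side.

```java
import java.util.Arrays;
import java.util.List;

public class SingleSchedulerSketch {

    /** Hypothetical stand-in for a scheduler plugin (not a Nutch interface). */
    public interface Scheduler {
        String name();
    }

    /**
     * Picks the scheduler to use when only one may be active.
     * If several are loaded: with strict == true a RuntimeException is thrown,
     * otherwise a warning is printed and the first loaded plugin wins.
     * ASSUMPTION: an empty list is treated as a configuration error.
     */
    public static Scheduler selectOne(List<Scheduler> loaded, boolean strict) {
        if (loaded.isEmpty()) {
            throw new IllegalStateException("No scheduler plugin loaded");
        }
        if (loaded.size() > 1) {
            String message = loaded.size() + " scheduler plugins loaded, only one is supported";
            if (strict) {
                throw new RuntimeException(message);
            }
            System.err.println("WARN: " + message + "; using " + loaded.get(0).name());
        }
        return loaded.get(0);
    }

    public static void main(String[] args) {
        Scheduler first = () -> "first-scheduler";
        Scheduler second = () -> "second-scheduler";
        Scheduler chosen = selectOne(Arrays.asList(first, second), false);
        System.out.println("Chosen: " + chosen.name());
    }
}
```

The lenient variant keeps existing crawls running when a configuration accidentally enables two plugins, while the strict variant makes the misconfiguration impossible to miss.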

I need some feedback from you guys for continuing this work. Thanks a lot.

> Extension point for schedulers
> ------------------------------
>
>                 Key: NUTCH-2334
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2334
>             Project: Nutch
>          Issue Type: New Feature
>          Components: generator
>    Affects Versions: 1.12
>            Reporter: Roannel Fernández Hernández
>            Assignee: Roannel Fernández Hernández
>            Priority: Minor
>             Fix For: 1.16
>
>
> With an extension point for schedulers, users should be able to create new schedulers that meet their own needs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)