Posted to dev@nutch.apache.org by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/12/11 09:04:00 UTC

[jira] [Commented] (NUTCH-2678) Allow for per-host configurable protocol plugin

    [ https://issues.apache.org/jira/browse/NUTCH-2678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16716596#comment-16716596 ] 

Sebastian Nagel commented on NUTCH-2678:
----------------------------------------

Hi [~markus17], good idea to make the selection of the actual protocol implementation configurable per host. A few suggestions to improve it:
 - Keeping the map of hosts to protocol plugins in the plugin.xml means Nutch has to be recompiled whenever the mapping changes (at least in distributed mode). Wouldn't it be easier for users if the mapping were defined, as usual, in {{conf/}}? It could be a text file with one mapping per line: {{<hostname> <tab> <plugin-name>}}. The PluginFactory gets the Configuration object passed in the constructor, so it could load the file from there.
 - The method findExtension(...) is called for every URL, and even twice if no host-specific protocol is found. It would be more efficient to cache the results in a map <hostname, cacheId> or <protocol, cacheId>, respectively.
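
To make the two suggestions concrete, here is a rough, standalone sketch rather than a patch against Nutch. The class {{HostProtocolMapping}}, its methods and the plugin names in the usage note are hypothetical. The {{fallbackLookup}} function only stands in for the expensive findExtension(...) call, and the cache stores plain plugin ids instead of the cacheId mentioned above.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical helper, not part of Nutch: per-host plugin mapping plus a lookup cache. */
class HostProtocolMapping {
  // host name (lower-cased) -> plugin id, as read from the mapping file
  private final Map<String, String> hostToPlugin = new HashMap<>();
  // protocol -> plugin id, filled lazily so the expensive lookup runs once per protocol
  private final Map<String, String> protocolCache = new ConcurrentHashMap<>();
  // assumption: stands in for the findExtension(...) call by protocol name
  private final Function<String, String> fallbackLookup;

  HostProtocolMapping(String mappingText, Function<String, String> fallbackLookup) {
    this.fallbackLookup = fallbackLookup;
    // Assumed file format: "<hostname> <tab> <plugin-name>", one mapping per line.
    for (String line : mappingText.split("\\R")) {
      line = line.trim();
      if (line.isEmpty() || line.startsWith("#")) {
        continue; // skip blank lines and comments
      }
      String[] parts = line.split("\t");
      if (parts.length != 2) {
        continue; // ignore malformed lines in this sketch
      }
      hostToPlugin.put(parts[0].trim().toLowerCase(Locale.ROOT), parts[1].trim());
    }
  }

  /** Resolves by host name first, then by protocol; the protocol result is cached. */
  String resolve(String host, String protocol) {
    String pluginId = hostToPlugin.get(host.toLowerCase(Locale.ROOT));
    if (pluginId != null) {
      return pluginId;
    }
    return protocolCache.computeIfAbsent(protocol, fallbackLookup);
  }
}
```

With a mapping file containing the single line {{nutch.apache.org <tab> protocol-okhttp}} (an example value only), {{resolve("nutch.apache.org", "https")}} would return the host-specific plugin without touching the fallback, while every other host falls through to the per-protocol lookup, which then runs only once per protocol.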

> Allow for per-host configurable protocol plugin
> -----------------------------------------------
>
>                 Key: NUTCH-2678
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2678
>             Project: Nutch
>          Issue Type: Improvement
>          Components: protocol
>    Affects Versions: 1.15
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>            Priority: Major
>             Fix For: 1.16
>
>         Attachments: NUTCH-2678.patch
>
>
> Introduces a new parameter for protocol plugins called host. It takes a comma-separated set of host names. Protocols are resolved by host name first, then by protocol as they are now.
> {code}
>    <extension id="org.apache.nutch.protocol.http"
>               name="HttpProtocol"
>               point="org.apache.nutch.protocol.Protocol">
>       <implementation id="org.apache.nutch.protocol.http.Http"
>                        class="org.apache.nutch.protocol.http.Http">
>          <parameter name="host" value="nutch.apache.org"/>
>          <parameter name="protocolName" value="http,https"/>
>       </implementation>
>    </extension>
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)