Posted to dev@nutch.apache.org by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2016/12/13 10:03:58 UTC

[jira] [Commented] (NUTCH-2338) URLNormalizerChecker to run as TCP Telnet service

    [ https://issues.apache.org/jira/browse/NUTCH-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15744738#comment-15744738 ] 

Sebastian Nagel commented on NUTCH-2338:
----------------------------------------

Hi Markus,
thanks! See the comments on NUTCH-2320, which apply to this patch as well. Two further points:

- If a normalizer is given via -normalizer, the telnet arguments (-listen, -keepClientCnxOpen) are ignored and input is read from stdin:
{noformat}
% nutch org.apache.nutch.net.URLNormalizerChecker -normalizer org.apache.nutch.net.urlnormalizer.basic.BasicURLNormalizer -listen 1234 -keepClientCnxOpen
Checking URLNormalizer org.apache.nutch.net.urlnormalizer.basic.BasicURLNormalizer
{noformat}
(other terminal)
{noformat}
% telnet localhost 1234
Trying 127.0.0.1...
telnet: Unable to connect to remote host: Connection refused
{noformat}
- If an invalid URL is passed via telnet, no exception is shown and nothing is returned. That's probably better than exiting with an error (the behavior when URLNormalizerChecker reads from stdin), but it may make external (network) problems difficult to diagnose, since to the client an invalid URL looks the same as a connection that returns nothing.
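
Regarding the second point, one option would be to always send back a single line per input, with an error marker for invalid URLs. The following is only a rough sketch of that idea, not code from the patch: the class UrlReplySketch, the methods normalize() and reply(), and the "ERROR: " prefix are all hypothetical, and java.net.URL parsing merely stands in for the configured normalizers.

```java
import java.net.MalformedURLException;
import java.net.URL;

/** Hypothetical helper (not part of Nutch): builds the reply line for one URL read from the socket. */
public class UrlReplySketch {

  /** Stand-in for a real normalizer: only parses the URL and returns its external form. */
  static String normalize(String url) throws MalformedURLException {
    return new URL(url.trim()).toExternalForm();
  }

  /** Always returns one line, so a client can tell "invalid URL" from "no answer at all". */
  static String reply(String line) {
    try {
      return normalize(line);
    } catch (MalformedURLException e) {
      // Assumed convention: report the problem to the client instead of staying silent.
      return "ERROR: " + e.getMessage();
    }
  }

  public static void main(String[] args) {
    System.out.println(reply("http://example.com/a"));
    System.out.println(reply("not a url"));
  }
}
```

With something like this, the service keeps running on bad input, and a client that receives nothing at all can assume the problem is on the network side.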


> URLNormalizerChecker to run as TCP Telnet service
> -------------------------------------------------
>
>                 Key: NUTCH-2338
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2338
>             Project: Nutch
>          Issue Type: New Feature
>    Affects Versions: 1.12
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>             Fix For: 1.13
>
>         Attachments: NUTCH-2338.patch
>
>
> Similar to NUTCH-2320, but for the normalizer checker.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)