Posted to dev@nutch.apache.org by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2014/01/23 14:07:37 UTC

[jira] [Created] (NUTCH-1711) Normalizer does not encode exclamation mark

Markus Jelsma created NUTCH-1711:
------------------------------------

             Summary: Normalizer does not encode exclamation mark
                 Key: NUTCH-1711
                 URL: https://issues.apache.org/jira/browse/NUTCH-1711
             Project: Nutch
          Issue Type: Bug
    Affects Versions: 1.7
            Reporter: Markus Jelsma
            Assignee: Markus Jelsma
             Fix For: 1.8


{code}
$ bin/nutch org.apache.nutch.net.URLNormalizerChecker
Checking combination of all URLNormalizers available
http://nutch.apache.org/bla!
http://nutch.apache.org/bla!
{code}

Until just now I never noticed that many URL encoders do not encode the exclamation mark. SolrCloud uses this character as the delimiter in composite document IDs, so a URL that ends with an exclamation mark produces an error when it is used as the document ID!
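
A minimal Java sketch of the behaviour, not Nutch's actual normalizer code. java.net.URI follows RFC 2396, which lists `!` as an unreserved "mark" character, so its multi-argument constructor leaves it literal in the path. The BangEscaper class and its escapeBang helper are hypothetical; they simply percent-encode every `!` as %21 before the URL would be used as a Solr ID.

{code:java}
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical example class, not part of Nutch.
public class BangEscaper {

    // Hypothetical helper: percent-encode every '!' (U+0021) as "%21".
    // Assumes the input is an already-normalized URL string.
    static String escapeBang(String url) {
        return url.replace("!", "%21");
    }

    public static void main(String[] args) throws URISyntaxException {
        // The multi-argument URI constructor quotes illegal characters,
        // but RFC 2396 lists '!' as an unreserved "mark", so it stays literal.
        URI uri = new URI("http", "nutch.apache.org", "/bla!", null);
        System.out.println(uri);                        // http://nutch.apache.org/bla!
        System.out.println(escapeBang(uri.toString())); // http://nutch.apache.org/bla%21
    }
}
{code}

Note that RFC 3986 lists `!` among the reserved sub-delims, and URIs that differ only in replacing a reserved character with its percent-encoded octet are not guaranteed to be equivalent. Rewriting `!` to %21 is therefore a trade-off, not a settled fix.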

Any thoughts on this?



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)