Posted to dev@manifoldcf.apache.org by "Karl Wright (JIRA)" <ji...@apache.org> on 2015/01/29 17:35:36 UTC

[jira] [Assigned] (CONNECTORS-1155) Web connector should not be sending the port number in request header field Host

     [ https://issues.apache.org/jira/browse/CONNECTORS-1155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karl Wright reassigned CONNECTORS-1155:
---------------------------------------

    Assignee: Karl Wright

> Web connector should not be sending the port number in request header field Host
> --------------------------------------------------------------------------------
>
>                 Key: CONNECTORS-1155
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1155
>             Project: ManifoldCF
>          Issue Type: Bug
>          Components: Web connector
>    Affects Versions: ManifoldCF 1.7.2
>            Reporter: Denis Beck
>            Assignee: Karl Wright
>
> The web connector sends the port number in the request header field Host (e.g. Host: www.apache.org:443). This causes redirect rules for the host name to fail. The port number should not be part of the Host header.
> On the other hand, RFC 2616 section 14.23 (http://tools.ietf.org/html/rfc2616#section-14.23) says “The Host request-header field specifies the Internet host and port number of the resource being requested [...]”.
> I encountered this issue while trying to crawl a customer’s website. The very first request to the seed URL triggered a redirect that pointed back to the original URL itself, and the job ended without fetching anything. The Simple History showed only status 301. Maybe the web connector does not follow the link in the redirect correctly?
> I could not trigger the redirect any other way: I tried a browser and cURL. ManifoldCF's web connector was the only client that sent the port number in the Host header, and this behavior prevented it from crawling the website.
> I was able to work around this issue by collaborating with the contractor who hosts the customer's website; he added an exception for these requests. In general, though, I think this should be fixed, as such collaboration is not always possible.
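
For illustration only, the behavior requested in the description above might be sketched as follows. This is not the web connector's actual code: the class HostHeaderSketch and its methods are hypothetical names, and the sketch assumes that only the default ports 80 (http) and 443 (https) should be omitted, which is how the browser and cURL requests mentioned in the report appear to behave.

```java
import java.net.URI;

class HostHeaderSketch {

    // Default port for a scheme, or -1 for schemes this sketch does not know.
    static int defaultPort(String scheme) {
        if ("http".equalsIgnoreCase(scheme)) {
            return 80;
        }
        if ("https".equalsIgnoreCase(scheme)) {
            return 443;
        }
        return -1;
    }

    // Builds a Host header value, leaving the port out when the URL has no
    // explicit port or when the port is the default one for the scheme.
    static String hostHeaderValue(URI uri) {
        String host = uri.getHost();
        int port = uri.getPort(); // -1 when the URL carries no explicit port
        if (port == -1 || port == defaultPort(uri.getScheme())) {
            return host;
        }
        return host + ":" + port;
    }

    public static void main(String[] args) {
        System.out.println(hostHeaderValue(URI.create("https://www.apache.org:443/")));
        System.out.println(hostHeaderValue(URI.create("https://www.apache.org:8443/")));
    }
}
```

With this approach a seed URL such as https://www.apache.org:443/ would yield "Host: www.apache.org", while a non-default port such as 8443 would still be sent, since the server needs it to identify the requested resource.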



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)