Posted to dev@nutch.apache.org by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2005/12/09 22:59:08 UTC

[jira] Commented: (NUTCH-135) http header meta data are case insensitive in the real world (e.g. Content-Type or content-type)

    [ http://issues.apache.org/jira/browse/NUTCH-135?page=comments#action_12359961 ] 

Andrzej Bialecki  commented on NUTCH-135:
-----------------------------------------

Since you are already working on this issue, could you also take a look at NUTCH-3 and see whether you can solve it too? The problem described there is that when several headers share the same name, only the last value is preserved. In some cases multiple headers with the same name make sense: see any of the existing Java models for handling HTTP or RFC822 mail messages, all of which support multiple values per key.
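
To make the request concrete, here is a minimal sketch of a header store that addresses both issues at once: lookups ignore case, and repeated headers keep all their values. The class name HeaderMap and its methods are hypothetical and not part of Nutch or of the attached patch; the sketch assumes Java 8 or later.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper (not Nutch code): case-insensitive, multi-valued headers. */
class HeaderMap {
    // A TreeMap ordered by CASE_INSENSITIVE_ORDER treats "Content-Type"
    // and "content-type" as the same key.
    private final Map<String, List<String>> headers =
            new TreeMap<>(String.CASE_INSENSITIVE_ORDER);

    /** Appends a value rather than overwriting earlier values for the same name. */
    void add(String name, String value) {
        headers.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    /** Returns the first value stored for the name, or null if the header is absent. */
    String getFirst(String name) {
        List<String> values = headers.get(name);
        return values == null ? null : values.get(0);
    }

    /** Returns all values in insertion order, or an empty list if the header is absent. */
    List<String> getAll(String name) {
        List<String> values = headers.get(name);
        return values == null
                ? Collections.<String>emptyList()
                : Collections.unmodifiableList(values);
    }
}
```

With something like this, a parser asking for "Content-Type" would still find a value the server sent as "content-type", and two Set-Cookie headers would both be retained.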

> http header meta data are case insensitive in the real world (e.g. Content-Type or content-type)
> ------------------------------------------------------------------------------------------------
>
>          Key: NUTCH-135
>          URL: http://issues.apache.org/jira/browse/NUTCH-135
>      Project: Nutch
>         Type: Bug
>   Components: fetcher
>     Versions: 0.7.1, 0.7
>     Reporter: Stefan Groschupf
>     Priority: Critical
>      Fix For: 0.8-dev, 0.7.2-dev
>  Attachments: contentProperties_patch.txt
>
> As described in issue NUTCH-133, some web servers return HTTP header names whose case does not match the standard form.
> This has many negative side effects. For example, querying the content type from the metadata returns null even when the web server did return a content type, because the key is not in the standard form (e.g. it is lower case). It also affects the PDF parser, which queries the content length, etc.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira