Posted to dev@hc.apache.org by "Thibaut (JIRA)" <ji...@apache.org> on 2012/10/29 19:04:12 UTC

[jira] [Created] (HTTPCLIENT-1257) Header location automatically converted to ASCII even though location can contain UTF-8 encoded urls

Thibaut created HTTPCLIENT-1257:
-----------------------------------

             Summary: Header location automatically converted to ASCII even though location can contain UTF-8 encoded urls
                 Key: HTTPCLIENT-1257
                 URL: https://issues.apache.org/jira/browse/HTTPCLIENT-1257
             Project: HttpComponents HttpClient
          Issue Type: Bug
          Components: HttpClient
    Affects Versions: 4.2.2
            Reporter: Thibaut


I'm trying to fetch:

http://handheld.vn/content.php?4052-Đánh-giá-máy-tính-bảng-Kindle-Fire-HD-7-inch

Which returns:

2012-10-29 18:54:29,355 DEBUG http.wire: << "HTTP/1.1 303 See Other[\r][\n]" [main]
2012-10-29 18:54:29,355 DEBUG http.wire: << "Date: Mon, 29 Oct 2012 17:55:57 GMT[\r][\n]" [main]
2012-10-29 18:54:29,355 DEBUG http.wire: << "Server: Apache[\r][\n]" [main]
2012-10-29 18:54:29,355 DEBUG http.wire: << "Expires: Thu, 19 Nov 1981 08:52:00 GMT[\r][\n]" [main]
2012-10-29 18:54:29,356 DEBUG http.wire: << "Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0[\r][\n]" [main]
2012-10-29 18:54:29,356 DEBUG http.wire: << "Pragma: no-cache[\r][\n]" [main]
2012-10-29 18:54:29,356 DEBUG http.wire: << "Set-Cookie: bb_lastactivity=0; expires=Tue, 29-Oct-2013 17:55:57 GMT; path=/[\r][\n]" [main]
2012-10-29 18:54:29,356 DEBUG http.wire: << "Location: http://handheld.vn/content/4052-????nh-gi??-m??y-t??nh-b???ng-Kindle-Fire-HD-7-inch[\r][\n]" [main]
2012-10-29 18:54:29,357 DEBUG http.wire: << "Content-Length: 0[\r][\n]" [main]
2012-10-29 18:54:29,357 DEBUG http.wire: << "Connection: close[\r][\n]" [main]
2012-10-29 18:54:29,357 DEBUG http.wire: << "Content-Type: text/html[\r][\n]" [main]
2012-10-29 18:54:29,357 DEBUG http.wire: << "[\r][\n]" [main]
2012-10-29 18:54:29,357 DEBUG conn.DefaultClientConnection: Receiving response: HTTP/1.1 303 See Other [main]
2012-10-29 18:54:29,357 DEBUG http.headers: << HTTP/1.1 303 See Other [main]
2012-10-29 18:54:29,358 DEBUG http.headers: << Date: Mon, 29 Oct 2012 17:55:57 GMT [main]
2012-10-29 18:54:29,358 DEBUG http.headers: << Server: Apache [main]
2012-10-29 18:54:29,358 DEBUG http.headers: << Expires: Thu, 19 Nov 1981 08:52:00 GMT [main]
2012-10-29 18:54:29,358 DEBUG http.headers: << Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0 [main]
2012-10-29 18:54:29,358 DEBUG http.headers: << Pragma: no-cache [main]
2012-10-29 18:54:29,358 DEBUG http.headers: << Set-Cookie: bb_lastactivity=0; expires=Tue, 29-Oct-2013 17:55:57 GMT; path=/ [main]
2012-10-29 18:54:29,358 DEBUG http.headers: << Location: http://handheld.vn/content/4052-Đánh-giá-máy-tính-bảng-Kindle-Fire-HD-7-inch [main]
2012-10-29 18:54:29,358 DEBUG http.headers: << Content-Length: 0 [main]
2012-10-29 18:54:29,358 DEBUG http.headers: << Connection: close [main]
2012-10-29 18:54:29,359 DEBUG http.headers: << Content-Type: text/html [main]


Unfortunately, I can't get the resolved URL through the following code:

Header locationHeader = response.getFirstHeader("location");
which will return http://handheld.vn/content/4052-Đánh-giá-máy-tính-bảng-Kindle-Fire-HD-7-inch

The header has already been extracted in the wrong content encoding. I will never be able to get the redirect URL!

I understand that this is not RFC-compliant behavior, but the above URL and redirect work fine in all browsers.

Is it possible to access the raw header (byte array) so that I can choose the encoding on my own? This would help a lot. A parameter to optionally specify the encoding when fetching a header value would also work.
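
A minimal sketch of the "choose the encoding on my own" idea, using only the JDK. It assumes the client decoded the header one byte per char (ISO-8859-1 style), so the raw bytes can still be recovered from the string. HeaderRedecode and redecodeAsUtf8 are hypothetical names, not part of HttpClient. If the bytes were instead replaced during decoding (for example by '?' or U+FFFD), the information is lost and this recovery does not work.

```java
import java.nio.charset.StandardCharsets;

/** Sketch: recover UTF-8 text from a value that was decoded one byte per char. */
final class HeaderRedecode {

    /**
     * Assumption: every raw byte was mapped to the char with the same code
     * (ISO-8859-1 style), so the mapping can be reversed without loss.
     */
    static String redecodeAsUtf8(String misdecoded) {
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Start of the path segment from the Location header, written with escapes.
        String original = "4052-\u0110\u00e1nh-gi\u00e1";
        // Simulate the damage: UTF-8 bytes interpreted as ISO-8859-1.
        String misdecoded = new String(
                original.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
        System.out.println(redecodeAsUtf8(misdecoded).equals(original));
    }
}
```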



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@hc.apache.org
For additional commands, e-mail: dev-help@hc.apache.org


[jira] [Resolved] (HTTPCLIENT-1257) Header location automatically converted to ASCII even though location can contain UTF-8 encoded urls

Posted by "Oleg Kalnichevski (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HTTPCLIENT-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Oleg Kalnichevski resolved HTTPCLIENT-1257.
-------------------------------------------

    Resolution: Invalid

One can force HttpClient to use a non-ASCII character set for protocol elements by using the 'http.protocol.element-charset' parameter.

http://hc.apache.org/httpcomponents-client-ga/tutorial/html/fundamentals.html#d5e338

At any rate this issue is not a bug.
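
To illustrate why the element charset matters, here is a small sketch that uses only the JDK, not HttpClient itself. decodeHeaderLine is a hypothetical stand-in for the step where a client turns raw header bytes into a string. The sample header is a shortened form of the Location value from the report.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Sketch: the same header bytes decoded with two different charsets. */
final class ElementCharsetDemo {

    /** Hypothetical stand-in for the client's header decoding step. */
    static String decodeHeaderLine(byte[] rawLine, Charset elementCharset) {
        return new String(rawLine, elementCharset);
    }

    public static void main(String[] args) {
        String sent = "Location: /content/4052-\u0110\u00e1nh-gi\u00e1";
        byte[] raw = sent.getBytes(StandardCharsets.UTF_8);

        // Decoding with the charset the server used round-trips the text.
        System.out.println(decodeHeaderLine(raw, StandardCharsets.UTF_8).equals(sent));

        // Decoding as US-ASCII cannot represent bytes above 0x7F; the JDK
        // substitutes U+FFFD for each of them, so the original text is lost.
        System.out.println(decodeHeaderLine(raw, StandardCharsets.US_ASCII).contains("\uFFFD"));
    }
}
```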

Oleg
                
> Header location automatically converted to ASCII even though location can contain UTF-8 encoded urls
> ----------------------------------------------------------------------------------------------------
>
>                 Key: HTTPCLIENT-1257
>                 URL: https://issues.apache.org/jira/browse/HTTPCLIENT-1257
>             Project: HttpComponents HttpClient
>          Issue Type: Bug
>          Components: HttpClient
>    Affects Versions: 4.2.2
>            Reporter: Thibaut
>   Original Estimate: 1h
>  Remaining Estimate: 1h


[jira] [Commented] (HTTPCLIENT-1257) Header location automatically converted to ASCII even though location can contain UTF-8 encoded urls

Posted by "Oleg Kalnichevski (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HTTPCLIENT-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13495384#comment-13495384 ] 

Oleg Kalnichevski commented on HTTPCLIENT-1257:
-----------------------------------------------

One can force HttpClient to ignore malformed and unmappable characters using the "http.malformed.input.action" and "http.unmappable.input.action" parameters.

http://hc.apache.org/httpcomponents-core-ga/httpcore/apidocs/org/apache/http/params/CoreProtocolPNames.html#HTTP_MALFORMED_INPUT_ACTION

Header is an interface, so you can have a custom implementation of it backed by a byte array instead of the CharArrayBuffer used internally by HttpClient.
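
The sketch below shows, with the JDK only, what "report", "replace" and "ignore" mean for a decoder that meets a byte that is not valid UTF-8. It assumes the two parameters above take java.nio.charset.CodingErrorAction values, which this thread does not state explicitly. MalformedInputDemo and its methods are hypothetical names, and the sample header is made up.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Sketch: decoding non-UTF-8 bytes as UTF-8 under different error actions. */
final class MalformedInputDemo {

    /** Decodes as UTF-8; returns null if the decoder reports an error. */
    static String decodeUtf8(byte[] raw, CodingErrorAction action) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(action)
                .onUnmappableCharacter(action);
        try {
            return decoder.decode(ByteBuffer.wrap(raw)).toString();
        } catch (CharacterCodingException e) {
            // With REPORT, a bad byte ends up here (e.g. MalformedInputException).
            return null;
        }
    }

    public static void main(String[] args) {
        // "f\u00fcr" encoded as ISO-8859-1: the byte 0xFC is never valid in UTF-8.
        byte[] raw = "X-Test: f\u00fcr".getBytes(StandardCharsets.ISO_8859_1);

        System.out.println(decodeUtf8(raw, CodingErrorAction.REPORT));  // decoder refuses the input
        System.out.println(decodeUtf8(raw, CodingErrorAction.REPLACE)); // bad byte becomes U+FFFD
        System.out.println(decodeUtf8(raw, CodingErrorAction.IGNORE));  // bad byte is dropped
    }
}
```

Note that both lenient actions change the header value, so a URL decoded this way may no longer match what the server sent.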

Oleg
                


[jira] [Commented] (HTTPCLIENT-1257) Header location automatically converted to ASCII even though location can contain UTF-8 encoded urls

Posted by "Thibaut (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HTTPCLIENT-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13486817#comment-13486817 ] 

Thibaut commented on HTTPCLIENT-1257:
-------------------------------------

It also fails when you request the encoded URL, which is the one that is transferred over the wire in both variants.

http://handheld.vn/content.php?4052-%C4%90%C3%A1nh-gi%C3%A1-m%C3%A1y-t%C3%ADnh-b%E1%BA%A3ng-Kindle-Fire-HD-7-inch
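
For reference, the percent-encoded form above can be reproduced with the JDK's java.net.URI, as in this minimal sketch. toAsciiUrl is a hypothetical helper, not something used in the report; it relies on URI.toASCIIString(), which escapes non-ASCII characters as UTF-8 octets.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Sketch: build the ASCII-only form of a URL whose query contains non-ASCII text. */
final class PercentEncodeDemo {

    /** Hypothetical helper; wraps the checked exception for brevity. */
    static String toAsciiUrl(String scheme, String host, String path, String query) {
        try {
            // The multi-argument constructor keeps non-ASCII characters as-is;
            // toASCIIString() then percent-encodes them as UTF-8.
            return new URI(scheme, host, path, query, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Query string from the report, written with Unicode escapes.
        String query = "4052-\u0110\u00e1nh-gi\u00e1-m\u00e1y-t\u00ednh-b\u1ea3ng-Kindle-Fire-HD-7-inch";
        System.out.println(toAsciiUrl("http", "handheld.vn", "/content.php", query));
    }
}
```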

2012-10-30 12:26:20,859 DEBUG http.wire: >> "GET /content.php?4052-%C4%90%C3%A1nh-gi%C3%A1-m%C3%A1y-t%C3%ADnh-b%E1%BA%A3ng-Kindle-Fire-HD-7-inch HTTP/1.1[\r][\n]" [main]
2012-10-30 12:26:20,860 DEBUG http.wire: >> "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7[\r][\n]" [main]
2012-10-30 12:26:20,860 DEBUG http.wire: >> "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8[\r][\n]" [main]
2012-10-30 12:26:20,860 DEBUG http.wire: >> "Accept-Language: en-gb,en;q=0.5[\r][\n]" [main]
....



                


[jira] [Commented] (HTTPCLIENT-1257) Header location automatically converted to ASCII even though location can contain UTF-8 encoded urls

Posted by "Thibaut (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HTTPCLIENT-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13495483#comment-13495483 ] 

Thibaut commented on HTTPCLIENT-1257:
-------------------------------------

Thanks a lot! 
                


[jira] [Commented] (HTTPCLIENT-1257) Header location automatically converted to ASCII even though location can contain UTF-8 encoded urls

Posted by "Jon Moore (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HTTPCLIENT-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13486784#comment-13486784 ] 

Jon Moore commented on HTTPCLIENT-1257:
---------------------------------------

Can you add the wire log section for the GET request HttpClient sends initially to receive the 303 response? I'm curious to see what the wire log has for the path segment of the initial request.

                


[jira] [Commented] (HTTPCLIENT-1257) Header location automatically converted to ASCII even though location can contain UTF-8 encoded urls

Posted by "Thibaut (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HTTPCLIENT-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13495352#comment-13495352 ] 

Thibaut commented on HTTPCLIENT-1257:
-------------------------------------

If I force HttpClient to use UTF-8 encoding, I get another exception on the following URL:
http://forum.tour-magazin.de/showthread.php?t=226439

The redirect works fine with US-ASCII parsing.


2012-11-12 16:46:01,216 DEBUG http.wire: << "HTTP/1.1 301 Moved Permanently[\r][\n]" [main]
2012-11-12 16:46:01,217 DEBUG http.wire: << "Date: Mon, 12 Nov 2012 15:45:51 GMT[\r][\n]" [main]
2012-11-12 16:46:01,217 DEBUG http.wire: << "Server: Apache/2.2.12 (Ubuntu)[\r][\n]" [main]
2012-11-12 16:46:01,217 DEBUG http.wire: << "X-Powered-By: PHP/5.2.10-2ubuntu6.10[\r][\n]" [main]
2012-11-12 16:46:01,217 DEBUG http.wire: << "Cache-Control: no-cache, must-revalidate[\r][\n]" [main]
2012-11-12 16:46:01,217 DEBUG http.wire: << "Expires: 0[\r][\n]" [main]
2012-11-12 16:46:01,218 DEBUG http.wire: << "Set-Cookie: bb_lastvisit=1352735151; expires=Tue, 12-Nov-2013 15:45:51 GMT; path=/[\r][\n]" [main]
2012-11-12 16:46:01,218 DEBUG http.wire: << "Set-Cookie: bb_lastactivity=0; expires=Tue, 12-Nov-2013 15:45:51 GMT; path=/[\r][\n]" [main]
2012-11-12 16:46:01,218 DEBUG http.wire: << "Cache-Control: private, post-check=0, pre-check=0, max-age=0[\r][\n]" [main]
2012-11-12 16:46:01,218 DEBUG http.wire: << "Pragma: no-cache[\r][\n]" [main]
2012-11-12 16:46:01,219 DEBUG conn.DefaultClientConnection: Connection 0.0.0.0:50130<->212.8.200.231:80 closed [main]
2012-11-12 16:46:01,219 DEBUG client.DefaultHttpClient: Closing the connection. [main]
2012-11-12 16:46:01,219 DEBUG conn.DefaultClientConnection: Connection 0.0.0.0:50130<->212.8.200.231:80 closed [main]
2012-11-12 16:46:01,219 INFO  client.DefaultHttpClient: I/O exception (java.nio.charset.MalformedInputException) caught when processing request: Input length = 1 [main]
2012-11-12 16:46:01,219 DEBUG client.DefaultHttpClient: Input length = 1 [main]
java.nio.charset.MalformedInputException: Input length = 1
	at java.nio.charset.CoderResult.throwException(Unknown Source)
	at org.apache.http.impl.io.AbstractSessionInputBuffer.handleDecodingResult(AbstractSessionInputBuffer.java:384)
	at org.apache.http.impl.io.AbstractSessionInputBuffer.appendDecoded(AbstractSessionInputBuffer.java:371)
	at org.apache.http.impl.io.AbstractSessionInputBuffer.lineFromReadBuffer(AbstractSessionInputBuffer.java:349)
	at org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:268)
	at org.apache.http.impl.conn.LoggingSessionInputBuffer.readLine(LoggingSessionInputBuffer.java:115)
	at org.apache.http.impl.io.AbstractMessageParser.parseHeaders(AbstractMessageParser.java:186)
	at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:261)
	at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:289)
	at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:252)
	at org.apache.http.impl.conn.AbstractClientConnAdapter.receiveResponseHeader(AbstractClientConnAdapter.java:219)
	at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:300)
	at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:127)
	at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:712)
	at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:517)
	at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
	at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
	at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
	at com.trendiction.modules.fetch.WebFetch.fetchUrlInternal2(WebFetch.java:821)
	at com.trendiction.modules.fetch.WebFetch.fetchUrlInternal(WebFetch.java:605)
	at com.trendiction.modules.fetch.WebFetch.fetchUrlWithCRTest(WebFetch.java:502)
	at com.trendiction.modules.fetch.WebFetch.fetchUrl(WebFetch.java:474)
	at com.trendiction.modules.fetch.Fetch.fetchWithoutRedirect(Fetch.java:647)
	at com.trendiction.modules.fetch.Fetch.fetch(Fetch.java:239)
	at com.trendiction.modules.fetch.Fetch.fetch(Fetch.java:172)
	at com.trendiction.modules.fetch.Fetch.fetch(Fetch.java:149)
	at com.trendiction.modules.fetch.Fetch.fetch(Fetch.java:143)
	at com.trendiction.modules.fetch.Fetch.fetch(Fetch.java:137)
	at com.trendiction.modules.html.TestHtmlFetching.testTourMagazinUrlFilter(TestHtmlFetching.java:345)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
	at java.lang.reflect.Method.invoke(Unknown Source)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
	at org.junit.rules.TestWatchman$1.evaluate(TestWatchman.java:48)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
	at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
	at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)



So what do you suggest? Should I use two different HttpClient instances and, when one fails, retry the same request with the other?

Again, I am in favor of storing the headers as byte arrays and letting the caller specify the encoding when accessing the header fields.
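
To illustrate the byte-array idea, here is a minimal sketch using only the JDK. It assumes the caller could somehow get at the raw header bytes, which HttpClient does not expose as far as this thread shows. The class and method names (HeaderDecoding, decodeStrict, decodeWithFallback, redecodeAsUtf8) are hypothetical and are not part of HttpClient. The sketch tries a strict UTF-8 decode first and falls back to ISO-8859-1 when the bytes are not valid UTF-8, which is the condition behind the MalformedInputException in the trace above.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not an HttpClient class.
final class HeaderDecoding {

    private HeaderDecoding() {
    }

    // Decodes the bytes strictly: malformed or unmappable input raises
    // CharacterCodingException instead of being silently replaced.
    static String decodeStrict(byte[] raw, Charset charset) throws CharacterCodingException {
        CharsetDecoder decoder = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(raw)).toString();
    }

    // Tries UTF-8 first; if the bytes are not valid UTF-8, falls back to
    // ISO-8859-1, which maps every byte value to a character and so never fails.
    static String decodeWithFallback(byte[] raw) {
        try {
            return decodeStrict(raw, StandardCharsets.UTF_8);
        } catch (CharacterCodingException e) {
            return new String(raw, StandardCharsets.ISO_8859_1);
        }
    }

    // Recovers a value that was decoded as ISO-8859-1 although the server
    // sent UTF-8. This works because ISO-8859-1 round-trips every byte.
    static String redecodeAsUtf8(String latin1Decoded) {
        byte[] raw = latin1Decoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }
}
```

With something like this, a single client instance would be enough: the header bytes are read once and the charset is chosen per header rather than per client. The last method only helps if the value was decoded as ISO-8859-1; a value already decoded as US-ASCII with replacement characters has lost the original bytes and cannot be recovered.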

