Posted to dev@shindig.apache.org by "Brian Eaton (JIRA)" <ji...@apache.org> on 2008/02/01 23:02:10 UTC

[jira] Created: (SHINDIG-46) gadgets.io.makeRequest malfunctions on non-ASCII web sites.

gadgets.io.makeRequest malfunctions on non-ASCII web sites.
-----------------------------------------------------------

                 Key: SHINDIG-46
                 URL: https://issues.apache.org/jira/browse/SHINDIG-46
             Project: Shindig
          Issue Type: Bug
          Components: Gadgets Server - Java
            Reporter: Brian Eaton
            Assignee: John Hjelmstad


See this thread for background: http://mail-archives.apache.org/mod_mbox/incubator-shindig-dev/200802.mbox/browser

Short term, we should change the HTTP proxy code to always use UTF-8 as the character set for converting remote content bytes to strings before returning them to clients.  We should do this ASAP to prevent anyone from becoming dependent on the current undefined behavior.

Long term, we might want to add some kind of character set detection, probably via the HTTP Content-Type header. IE-style charset content sniffing would probably not be a good idea.
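A minimal sketch of the header-based detection suggested above, using only JDK APIs: read the charset parameter from the Content-Type value and fall back to UTF-8 (the proposed short-term default) when it is missing or unknown. The class ContentTypeCharset and its methods are hypothetical, not part of Shindig, and the parsing is deliberately simple (it ignores quoting edge cases).

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;

// Hypothetical helper (not Shindig code): choose a charset from a Content-Type header.
class ContentTypeCharset {

  static Charset fromContentType(String contentType) {
    if (contentType != null) {
      // Parameters follow the media type after ';', e.g. "text/xml; charset=ISO-8859-1".
      for (String param : contentType.split(";")) {
        String p = param.trim();
        int eq = p.indexOf('=');
        if (eq > 0 && p.substring(0, eq).trim().equalsIgnoreCase("charset")) {
          String name = p.substring(eq + 1).trim().replace("\"", "");
          try {
            return Charset.forName(name);
          } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            break; // unrecognized label: use the default below
          }
        }
      }
    }
    return StandardCharsets.UTF_8; // assumed default when nothing usable is declared
  }

  // Decode a fetched body with whatever charset the header declares.
  static String decode(byte[] body, String contentType) {
    return new String(body, fromContentType(contentType));
  }
}
```

Falling back to a fixed default instead of inspecting the body keeps this in line with the preference against IE-style sniffing.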

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SHINDIG-46) gadgets.io.makeRequest malfunctions on non-ASCII web sites.

Posted by "Artemy Tregubenko (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SHINDIG-46?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567286#action_12567286 ] 

Artemy Tregubenko commented on SHINDIG-46:
------------------------------------------

> When you convert a byte array to a string as UTF8

I do not convert a byte array to a UTF-8 string. All strings in Java are UTF-16, AFAIK. What I am saying is: these bytes are UTF-8 encoded, so Java, please use your InputStreamReader powers to convert this raw UTF-8 data into your native UTF-16 string.

I don't know why we are arguing here: my patch doesn't use Utf8InputStream and still handles the BOM. Just give it a try.
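For reference, the decoding step described in this comment looks roughly like this with standard JDK classes: an InputStreamReader constructed with an explicit UTF-8 charset turns raw bytes into a Java String, which holds UTF-16 code units internally. The class and method names are illustrative only.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

class Utf8Read {
  // Decode an entire UTF-8 byte stream into a String (UTF-16 code units inside the JVM).
  static String readAll(InputStream in) throws IOException {
    StringBuilder sb = new StringBuilder();
    // Naming the charset explicitly avoids depending on the platform default encoding.
    try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
      char[] buf = new char[4096];
      int n;
      while ((n = reader.read(buf)) != -1) {
        sb.append(buf, 0, n);
      }
    }
    return sb.toString();
  }
}
```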



[jira] Commented: (SHINDIG-46) gadgets.io.makeRequest malfunctions on non-ASCII web sites.

Posted by "Artemy Tregubenko (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SHINDIG-46?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567194#action_12567194 ] 

Artemy Tregubenko commented on SHINDIG-46:
------------------------------------------

This patch addresses both sides: it tries to detect the input charset from the Content-Type header, and it also fixes the output, which is broken too.

You can check it with this UTF-8 encoded widget:
http://muz.eu/misc/gadgets/translit.xml
http://localhost:8080/gadgets/ifr?url=http://muz.eu/misc/gadgets/translit.xml
When you open this URL in your browser, you will see some Cyrillic characters (this proves Shindig won't have trouble with the input). When you load it through Shindig without my patch, these characters turn into question marks. After applying the patch, the Cyrillic text is back.

You can even test it directly: the line
resp.getOutputStream().print("some non-ascii here");
will output question marks.
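One way to see where question marks can come from, using only the JDK: when no charset is set, a servlet response typically defaults to ISO-8859-1, and encoding Cyrillic text with ISO-8859-1 replaces every unmappable character with '?'. What getOutputStream().print() does with non-Latin-1 text depends on the container (some throw instead), so treat this as an illustration of the symptom, not a trace of Shindig's code path. QuestionMarks is a hypothetical name.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class QuestionMarks {
  // Encode with the given charset, then decode the bytes back with the same charset.
  // String.getBytes(Charset) substitutes the charset's replacement byte for anything
  // it cannot represent; for ISO-8859-1 that replacement is '?'.
  static String roundTrip(String text, Charset cs) {
    return new String(text.getBytes(cs), cs);
  }

  public static void main(String[] args) {
    String cyrillic = "\u041f\u0440\u0438\u0432\u0435\u0442"; // "Privet" in Cyrillic
    System.out.println(roundTrip(cyrillic, StandardCharsets.ISO_8859_1)); // ??????
    System.out.println(roundTrip(cyrillic, StandardCharsets.UTF_8).equals(cyrillic)); // true
  }
}
```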



[jira] Commented: (SHINDIG-46) gadgets.io.makeRequest malfunctions on non-ASCII web sites.

Posted by "Artemy Tregubenko (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SHINDIG-46?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567210#action_12567210 ] 

Artemy Tregubenko commented on SHINDIG-46:
------------------------------------------

I had no idea about fetching binary data, sorry. Is it for Flash?

It looks like my patch addressed all three points, failing only at handling raw bytes.



[jira] Commented: (SHINDIG-46) gadgets.io.makeRequest malfunctions on non-ASCII web sites.

Posted by "Kevin Brown (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SHINDIG-46?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567228#action_12567228 ] 

Kevin Brown commented on SHINDIG-46:
------------------------------------

When you convert a byte array to a string as UTF-8, it retains the BOM as a bunch of unprintable characters (they're valid UTF-8, but they're NOT valid XML; see the XML specification for details). Every encoding of Unicode has a different BOM. You're correct about the BOM for UTF-16, but UTF-16 isn't what I'm talking about. Those 3 bytes need to be removed, if present, for XML parsing to work correctly. You must remove them *before* converting the byte array to a string.

More details here:

http://en.wikipedia.org/wiki/UTF8#Windows

This was a major bug in the early development of Shindig that caused many problems.
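A minimal sketch of the byte-level approach described in this comment: check for the three UTF-8 BOM bytes 0xEF 0xBB 0xBF and skip them before decoding. This is not Shindig's Utf8InputStream; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: drop a UTF-8 BOM from raw bytes before turning them into text.
class Utf8Bom {
  static boolean hasUtf8Bom(byte[] b) {
    // Mask with 0xFF because Java bytes are signed.
    return b.length >= 3
        && (b[0] & 0xFF) == 0xEF
        && (b[1] & 0xFF) == 0xBB
        && (b[2] & 0xFF) == 0xBF;
  }

  static String decodeWithoutBom(byte[] b) {
    int offset = hasUtf8Bom(b) ? 3 : 0; // skip the 3 BOM bytes if present
    return new String(b, offset, b.length - offset, StandardCharsets.UTF_8);
  }
}
```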



[jira] Commented: (SHINDIG-46) gadgets.io.makeRequest malfunctions on non-ASCII web sites.

Posted by "Artemy Tregubenko (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SHINDIG-46?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567219#action_12567219 ] 

Artemy Tregubenko commented on SHINDIG-46:
------------------------------------------

Well, it does handle the BOM:
+    if (out.length() > 0 && out.codePointAt(0) == 0xFEFF) {
+        out.delete(0, 1);
+    }
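A self-contained version of the same string-level check, as a sketch: because decoding the bytes EF BB BF as UTF-8 yields the single character U+FEFF, removing a leading U+FEFF after decoding also removes a UTF-8 BOM. The helper name stripLeadingBom is hypothetical.

```java
// Hypothetical helper: remove a leading byte order mark from already-decoded text.
class StringBom {
  static String stripLeadingBom(CharSequence text) {
    StringBuilder out = new StringBuilder(text);
    // Check the length first: charAt(0) on an empty buffer would throw.
    // U+FEFF is in the BMP, so charAt(0) matches codePointAt(0) for this test.
    if (out.length() > 0 && out.charAt(0) == '\uFEFF') {
      out.deleteCharAt(0);
    }
    return out.toString();
  }
}
```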



[jira] Closed: (SHINDIG-46) gadgets.io.makeRequest malfunctions on non-ASCII web sites.

Posted by "Kevin Brown (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SHINDIG-46?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kevin Brown closed SHINDIG-46.
------------------------------

    Resolution: Fixed

Committed in rev 620115



[jira] Commented: (SHINDIG-46) gadgets.io.makeRequest malfunctions on non-ASCII web sites.

Posted by "Artemy Tregubenko (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SHINDIG-46?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567226#action_12567226 ] 

Artemy Tregubenko commented on SHINDIG-46:
------------------------------------------

I was told that Java's internal encoding is UTF-16 (http://java.sun.com/j2se/1.4.2/docs/api/java/nio/charset/Charset.html), and that the BOM for UTF-16 is 0xFEFF (http://en.wikipedia.org/wiki/Byte_order_mark).

I call out.codePointAt(0) after the string has been converted to the internal encoding.

This works at least for one particular BOM-marked widget, and it doesn't break other widgets without a BOM.



[jira] Commented: (SHINDIG-46) gadgets.io.makeRequest malfunctions on non-ASCII web sites.

Posted by "Kevin Brown (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SHINDIG-46?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567202#action_12567202 ] 

Kevin Brown commented on SHINDIG-46:
------------------------------------

The use of byte[] instead of String in RemoteContentFetcher's interface is intentional -- this is used to fetch binary data as well as text (see ProxyHandler.fetch)

java.io.StringReader does not appear to strip the BOM from UTF-8 files, which causes the XML parsers to choke (this is what Utf8InputStream addresses). This is a common problem with gadgets authored on Windows. If someone knows a cleaner way to do this, I'm all for it.
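One common JDK-only way to do this at the stream level, sketched here as an assumption rather than a description of Utf8InputStream: wrap the input in a PushbackInputStream, read up to three bytes, and push them back unless they are the UTF-8 BOM. The class name BomSkippingStream is hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

// Hypothetical sketch: return a stream that never exposes a leading UTF-8 BOM.
class BomSkippingStream {
  static InputStream skipUtf8Bom(InputStream in) throws IOException {
    PushbackInputStream pb = new PushbackInputStream(in, 3);
    byte[] head = new byte[3];
    int n = 0;
    // read() may return fewer bytes than requested, so loop until 3 bytes or EOF.
    while (n < 3) {
      int r = pb.read(head, n, 3 - n);
      if (r == -1) {
        break;
      }
      n += r;
    }
    boolean bom = n == 3
        && (head[0] & 0xFF) == 0xEF && (head[1] & 0xFF) == 0xBB && (head[2] & 0xFF) == 0xBF;
    if (!bom && n > 0) {
      pb.unread(head, 0, n); // not a BOM: put the bytes back for the real reader
    }
    return pb;
  }
}
```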

Really, the conversion from the source encoding to UTF-8 needs to happen at the point where we convert from a byte stream to text. This means:

- In the XML processing routines we should pass a String instead of a byte[], and we must require that the strings be UTF-8 with no BOM.
- RemoteContent should have support for detecting its own character encoding from the http headers and returning the content body as a string in that character set as well as the raw bytes. If we can't convert from the claimed encoding to UTF-8, we fail the request. We'll use these strings to pass to the XML parsing routines.
- GadgetRenderingServlet, JsonRpcServlet, and ProxyServlet should explicitly set the UTF-8 output encoding.
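For the "fail the request" rule in the second point, a sketch using java.nio.charset: a CharsetDecoder set to CodingErrorAction.REPORT throws CharacterCodingException on bytes that are invalid in the claimed charset, whereas new String(bytes, charset) silently substitutes U+FFFD. The class name is hypothetical, and mapping the exception to an HTTP error is left to the caller.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

// Hypothetical helper: decode strictly, so bad input surfaces as an error.
class StrictDecode {
  static String decodeOrFail(byte[] body, Charset claimed) throws CharacterCodingException {
    // REPORT is already the default for a fresh decoder; it is set explicitly for clarity.
    CharsetDecoder decoder = claimed.newDecoder()
        .onMalformedInput(CodingErrorAction.REPORT)
        .onUnmappableCharacter(CodingErrorAction.REPORT);
    return decoder.decode(ByteBuffer.wrap(body)).toString();
  }
}
```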





[jira] Commented: (SHINDIG-46) gadgets.io.makeRequest malfunctions on non-ASCII web sites.

Posted by "Kevin Brown (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SHINDIG-46?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567183#action_12567183 ] 

Kevin Brown commented on SHINDIG-46:
------------------------------------

The issue here isn't what we're outputting, it's how we're handling http responses from remote servers.



[jira] Commented: (SHINDIG-46) gadgets.io.makeRequest malfunctions on non-ASCII web sites.

Posted by "Kevin Brown (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SHINDIG-46?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567217#action_12567217 ] 

Kevin Brown commented on SHINDIG-46:
------------------------------------

Anything that isn't text -- images, Flash, or really anything where an XHR won't suffice.

The patch is pretty close to what we need, but it does still need to handle the BOM issue as well. 



[jira] Issue Comment Edited: (SHINDIG-46) gadgets.io.makeRequest malfunctions on non-ASCII web sites.

Posted by "Artemy Tregubenko (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SHINDIG-46?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567219#action_12567219 ] 

arty.name edited comment on SHINDIG-46 at 2/8/08 1:55 PM:
------------------------------------------------------------------

Well, it does handle the BOM:
+    if (out.length() > 0 && out.codePointAt(0) == 0xFEFF) {
+        out.delete(0, 1);
+    }

The given example (http://muz.eu/misc/gadgets/translit.xml) has a BOM.



[jira] Updated: (SHINDIG-46) gadgets.io.makeRequest malfunctions on non-ASCII web sites.

Posted by "Artemy Tregubenko (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SHINDIG-46?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Artemy Tregubenko updated SHINDIG-46:
-------------------------------------

    Attachment: patch

I've got something working. 

Probably the most important part is replacing
resp.getOutputStream().print(markup.toString());
with
resp.getWriter().print(markup.toString());
and setting the UTF-8 charset in the servlet.

Most of the other changes replace byte arrays with strings.

I'm new to Java, so my code most probably needs review.
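In servlet code, the change described in this comment amounts to declaring the charset, for example with resp.setContentType("text/html; charset=UTF-8"), before calling resp.getWriter(), so that the writer encodes characters as UTF-8. The JDK-only sketch below mimics that with an OutputStreamWriter whose charset is named explicitly; Utf8Output is a hypothetical name, and the ByteArrayOutputStream stands in for the response body.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

// JDK-only stand-in for the servlet change: text goes through a Writer with an explicit charset.
class Utf8Output {
  static byte[] render(String markup) {
    ByteArrayOutputStream body = new ByteArrayOutputStream(); // plays the role of the response body
    try (PrintWriter out = new PrintWriter(new OutputStreamWriter(body, StandardCharsets.UTF_8))) {
      out.print(markup);
    } // closing the writer flushes the encoded bytes into 'body'
    return body.toByteArray();
  }
}
```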



[jira] Commented: (SHINDIG-46) gadgets.io.makeRequest malfunctions on non-ASCII web sites.

Posted by "Kevin Brown (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SHINDIG-46?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567223#action_12567223 ] 

Kevin Brown commented on SHINDIG-46:
------------------------------------

That is not the BOM for UTF-8.

The BOM for UTF-8 is 0xEF 0xBB 0xBF. It's allowed according to the RFC, but it's not required, and Java's XML parsers choke on it.
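A quick JDK check of the byte sequences in this exchange: the BOM is the single code point U+FEFF; UTF-8 encodes it as EF BB BF, while UTF-16 encodes it as FE FF (big-endian) or FF FE (little-endian). The JDK's UTF-8 decoder does not drop it, so decoding EF BB BF yields the character U+FEFF. BomBytes is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;

// Print the byte form of U+FEFF in several encodings.
class BomBytes {
  static String hex(byte[] bytes) {
    StringBuilder sb = new StringBuilder();
    for (byte b : bytes) {
      if (sb.length() > 0) {
        sb.append(' ');
      }
      sb.append(String.format("%02X", b & 0xFF)); // mask: bytes are signed in Java
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    String bom = "\uFEFF";
    System.out.println(hex(bom.getBytes(StandardCharsets.UTF_8)));    // EF BB BF
    System.out.println(hex(bom.getBytes(StandardCharsets.UTF_16BE))); // FE FF
    System.out.println(hex(bom.getBytes(StandardCharsets.UTF_16LE))); // FF FE
    String decoded = new String(new byte[] {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF}, StandardCharsets.UTF_8);
    System.out.println(decoded.equals(bom)); // true
  }
}
```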



[jira] Commented: (SHINDIG-46) gadgets.io.makeRequest malfunctions on non-ASCII web sites.

Posted by "Kevin Brown (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SHINDIG-46?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567293#action_12567293 ] 

Kevin Brown commented on SHINDIG-46:
------------------------------------

My apologies; I believe we're talking past each other. The problem isn't the way that you're handling the BOM in your patch now -- it's that you can't store the data as a string at all, and instead various consumers must handle the transformation.

The simplest solution is probably to detect the charset when the RemoteContent object is created (just inspecting the headers is OK), store it with the RemoteContent, and expose a getResponseAsString() method that converts on the fly (storing both is a waste of memory, since most content is only ever read once anyway).

I'll go ahead and patch this simpler change in now.
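A sketch of the design proposed in this comment, under stated assumptions: the class name RemoteContentSketch, its constructor, and getByteArray() are hypothetical; only getResponseAsString() is named in the comment. It keeps the raw bytes for binary consumers, stores the charset detected once at construction, and decodes on each call instead of caching a second copy. Stripping a leading U+FEFF in getResponseAsString() is an added assumption reflecting the BOM discussion in this thread.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not Shindig's actual RemoteContent.
final class RemoteContentSketch {
  private final byte[] body;
  private final Charset charset;

  RemoteContentSketch(byte[] body, Charset detectedCharset) {
    this.body = body;
    // Assumed fallback when the headers declared nothing usable.
    this.charset = detectedCharset != null ? detectedCharset : StandardCharsets.UTF_8;
  }

  byte[] getByteArray() {
    return body; // binary consumers (images, Flash) use the bytes untouched
  }

  String getResponseAsString() {
    // Decoded on every call; nothing is cached, since most content is read only once.
    String text = new String(body, charset);
    // Drop a leading BOM so XML consumers receive clean text.
    return !text.isEmpty() && text.charAt(0) == '\uFEFF' ? text.substring(1) : text;
  }
}
```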
