Posted to dev@shindig.apache.org by "Jacky Wang (JIRA)" <ji...@apache.org> on 2008/08/01 13:46:31 UTC

[jira] Created: (SHINDIG-487) Fix encoding bugs

Fix encoding bugs
-----------------

                 Key: SHINDIG-487
                 URL: https://issues.apache.org/jira/browse/SHINDIG-487
             Project: Shindig
          Issue Type: Bug
          Components: RESTful API (Java)
         Environment: Windows XP, China as region, Chinese as preferred language
            Reporter: Jacky Wang


For inbound requests, we can't know the character encoding in advance, so we need to decode the input using the character encoding declared on the HttpServletRequest.
If the HttpServletRequest doesn't declare one, we'll use "UTF-8" as the default.

For all outbound responses, since they're under Shindig's control, we'll set their CharacterEncoding to "UTF-8".
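
As an illustration of the rule described above (this is not taken from the attached patch), here is a minimal Java sketch. The class RequestCharsets and its methods are hypothetical names; the `declared` argument stands for whatever HttpServletRequest.getCharacterEncoding() returns, which is null when the request names no charset. Falling back to UTF-8 for an unknown charset name is an assumption of this sketch, not something the issue specifies.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;

/** Hypothetical helper for illustration; not part of Shindig. */
final class RequestCharsets {
  private RequestCharsets() {}

  /**
   * Picks the charset used to decode a request body.
   *
   * @param declared the value of HttpServletRequest.getCharacterEncoding(),
   *     or null if the request did not declare a charset
   */
  static Charset resolve(String declared) {
    if (declared == null || declared.trim().isEmpty()) {
      // Nothing declared: use the default named in the issue.
      return StandardCharsets.UTF_8;
    }
    try {
      return Charset.forName(declared.trim());
    } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
      // Assumption of this sketch: unknown names also fall back to UTF-8.
      return StandardCharsets.UTF_8;
    }
  }

  /** Decodes raw request bytes with the resolved charset. */
  static String decodeBody(byte[] body, String declared) {
    return new String(body, resolve(declared));
  }
}
```

On the outbound side, a servlet would typically call HttpServletResponse.setCharacterEncoding("UTF-8") before obtaining the writer, so that every response is encoded the same way regardless of the request's charset.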


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Re: [jira] Updated: (SHINDIG-487) Fix encoding bugs

Posted by "Jacky Wang (王超)" <ch...@google.com>.
Thank you Louis!  I've updated my patch.

On Sat, Aug 2, 2008 at 12:22 AM, Louis Ryan <lr...@google.com> wrote:

> You can detect the encoding using the icu4j utility. This is something
> already done in the gadgets server. See oHttpResponse.detectEncoding



-- 
Best Regards,

Jacky Wang
+86-10-6250-3316
+86-1381-0018-677
chaowang@google.com

Re: [jira] Updated: (SHINDIG-487) Fix encoding bugs

Posted by Louis Ryan <lr...@google.com>.
You can detect the encoding using the icu4j utility. This is something
already done in the gadgets server. See oHttpResponse.detectEncoding

On Fri, Aug 1, 2008 at 4:50 AM, Jacky Wang (JIRA) <ji...@apache.org> wrote:

>
>     [
> https://issues.apache.org/jira/browse/SHINDIG-487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
>
> Jacky Wang updated SHINDIG-487:
> -------------------------------
>
>    Attachment: fix-socialapi-post-unicode-bug.patch
>
> Parses the request according to its charset.
> Sets response's charset to "UTF-8".

Re: [jira] Commented: (SHINDIG-487) Fix encoding bugs

Posted by "Jacky Wang (王超)" <ch...@google.com>.
I don't think this will have much performance impact on the social data
server, since the detection process is applied only to requests that
don't indicate their character encoding.

On Sat, Sep 6, 2008 at 7:02 AM, Kevin Brown (JIRA) <ji...@apache.org> wrote:

>
>    [
> https://issues.apache.org/jira/browse/SHINDIG-487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628788#action_12628788]
>
> Kevin Brown commented on SHINDIG-487:
> -------------------------------------
>
> We actually do perform encoding detection if there's no other way to figure
> out the encoding, but that's because we have to support wonky legacy HTTP
> servers. There's no valid reason why the social data code should be forced
> to do this, however. Callers that don't use utf-8 can just tell us what
> encoding they are using.
>
> To reiterate, I'm strongly against doing the encoding detection here.



[jira] Commented: (SHINDIG-487) Fix encoding bugs

Posted by "Cassie Doll (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SHINDIG-487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12620370#action_12620370 ] 

Cassie Doll commented on SHINDIG-487:
-------------------------------------

This looks great but it would be even better to have a test for it. 
Do you have any tests?



[jira] Updated: (SHINDIG-487) Fix encoding bugs

Posted by "Jacky Wang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SHINDIG-487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacky Wang updated SHINDIG-487:
-------------------------------

    Attachment: fix-socialapi-post-unicode-bug.patch

Parses the request according to its charset.
Sets response's charset to "UTF-8".




[jira] Resolved: (SHINDIG-487) Fix encoding bugs

Posted by "Cassie Doll (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SHINDIG-487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cassie Doll resolved SHINDIG-487.
---------------------------------

    Resolution: Fixed
      Assignee: Cassie Doll

> Fix encoding bugs
> -----------------
>
>                 Key: SHINDIG-487
>                 URL: https://issues.apache.org/jira/browse/SHINDIG-487
>             Project: Shindig
>          Issue Type: Bug
>          Components: RESTful API (Java)
>         Environment: Windows XP, China as region, Chinese as preferred language
>            Reporter: Jacky Wang
>            Assignee: Cassie Doll
>         Attachments: fix-java-input-encoding-detection-bug.patch, fix-socialapi-post-unicode-bug.patch, fix-socialapi-post-unicode-bug.patch
>
>   Original Estimate: 0.25h
>  Remaining Estimate: 0.25h
>
> For inbound requests, we can't know the character encoding in advance, so we need to decode the input using the character encoding declared on the HttpServletRequest.
> If the HttpServletRequest doesn't declare one, we'll use "UTF-8" as the default.
> For all outbound responses, since they're under Shindig's control, we'll set their CharacterEncoding to "UTF-8".



[jira] Updated: (SHINDIG-487) Fix encoding bugs

Posted by "Jacky Wang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SHINDIG-487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacky Wang updated SHINDIG-487:
-------------------------------

    Attachment: fix-java-input-encoding-detection-bug.patch

Updated the patch: a test has been added, and the patch has been tested.



[jira] Commented: (SHINDIG-487) Fix encoding bugs

Posted by "Cassie Doll (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SHINDIG-487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628782#action_12628782 ] 

Cassie Doll commented on SHINDIG-487:
-------------------------------------

After trying to get this patch to work for a while, I realized that it does something I don't think we want. Trying to detect the encoding proves to be really slow, and the gadget rendering code decided not to include it for performance reasons. Instead, it only supports non-UTF-8 encodings if they are set explicitly.

Because this is what the gadget rendering code does, I think the social side should probably do the same. I am going to check in this patch without using the icu4j library.

If we want to discuss adding the library-based detection, we should probably start a mail thread on the list to discuss the pros and cons. It should be easy to add if everyone decides it is a good thing.



[jira] Commented: (SHINDIG-487) Fix encoding bugs

Posted by "Kevin Brown (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SHINDIG-487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628788#action_12628788 ] 

Kevin Brown commented on SHINDIG-487:
-------------------------------------

We actually do perform encoding detection if there's no other way to figure out the encoding, but that's because we have to support wonky legacy HTTP servers. There's no valid reason why the social data code should be forced to do this, however. Callers that don't use utf-8 can just tell us what encoding they are using.

To reiterate, I'm strongly against doing the encoding detection here.
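
For reference, a caller normally "tells us what encoding it is using" through the charset parameter of the Content-Type request header, for example "application/json; charset=GBK". Servlet containers already expose that value via HttpServletRequest.getCharacterEncoding(); the sketch below only illustrates what is being read. ContentTypeCharset is a hypothetical name, and the parsing is deliberately simplified (it does not handle semicolons inside quoted parameter values).

```java
import java.util.Locale;
import java.util.Optional;

/** Hypothetical helper for illustration; not part of Shindig. */
final class ContentTypeCharset {
  private ContentTypeCharset() {}

  /** Returns the charset parameter of a Content-Type value, if one is present. */
  static Optional<String> charsetOf(String contentType) {
    if (contentType == null) {
      return Optional.empty();
    }
    // parts[0] is the media type; the remaining parts are name=value parameters.
    String[] parts = contentType.split(";");
    for (int i = 1; i < parts.length; i++) {
      String param = parts[i].trim();
      int eq = param.indexOf('=');
      if (eq < 0) {
        continue;
      }
      String name = param.substring(0, eq).trim().toLowerCase(Locale.ROOT);
      if (!name.equals("charset")) {
        continue;
      }
      String value = param.substring(eq + 1).trim();
      // Strip optional surrounding double quotes.
      if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
        value = value.substring(1, value.length() - 1);
      }
      return value.isEmpty() ? Optional.empty() : Optional.of(value);
    }
    return Optional.empty();
  }
}
```

With this convention, a request that omits the parameter is simply treated as UTF-8, and no content sniffing is needed.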



[jira] Updated: (SHINDIG-487) Fix encoding bugs

Posted by "Jacky Wang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SHINDIG-487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacky Wang updated SHINDIG-487:
-------------------------------

    Attachment: fix-socialapi-post-unicode-bug.patch

Thank you Louis!

I updated the patch to behave as follows:
1. If the request's charset is set, all subsequent processing uses it to decode the input stream.
2. If the request's charset is not set, icu4j's charset detection routine is used to set it at the very beginning, so all subsequent processing uses the detected charset.
3. Every response's charset is "UTF-8".
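
The three steps above can be sketched roughly as follows. This is not the attached patch: DetectingDecoder is a hypothetical class, and the detector is passed in as a plain function so the example has no external dependency. In a real build that function might wrap icu4j's com.ibm.icu.text.CharsetDetector (setText, then detect), which is an assumption about how the patch wires it up.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

/** Hypothetical illustration of the declared-first, detect-second flow. */
final class DetectingDecoder {
  // Stand-in for a charset detection routine such as icu4j's.
  private final Function<byte[], Charset> detector;

  DetectingDecoder(Function<byte[], Charset> detector) {
    this.detector = detector;
  }

  /** Steps 1 and 2: use the declared charset, otherwise ask the detector. */
  Charset charsetFor(String declared, byte[] body) {
    if (declared != null && !declared.isEmpty()) {
      return Charset.forName(declared);
    }
    Charset detected = detector.apply(body);
    // Assumption of this sketch: UTF-8 if the detector has no answer.
    return detected != null ? detected : StandardCharsets.UTF_8;
  }

  /** Decodes the request body with the charset chosen above. */
  String decode(String declared, byte[] body) {
    return new String(body, charsetFor(declared, body));
  }

  /** Step 3: responses are always encoded as UTF-8. */
  static byte[] encodeResponse(String text) {
    return text.getBytes(StandardCharsets.UTF_8);
  }
}
```

Note that the detector runs only when no charset is declared, so requests that state their encoding never pay the detection cost.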
