Posted to dev@any23.apache.org by "Lewis John McGibbney (Created) (JIRA)" <ji...@apache.org> on 2011/10/12 21:15:11 UTC

[jira] [Created] (ANY23-12) character are wrongly encoded in rdfxml output

character are wrongly encoded in rdfxml output 
-----------------------------------------------

                 Key: ANY23-12
                 URL: https://issues.apache.org/jira/browse/ANY23-12
             Project: Apache Any23
          Issue Type: Bug
            Reporter: Lewis John McGibbney
         Attachments: Soldering_iron_test.rdf

What steps will reproduce the problem?
1. Open the file Soldering_iron_test.rdf in your browser and check that all characters are displayed correctly.
Look especially at the rdfs:label values in different languages.
2. Go to any23.org.
3. Copy the file content into the content form.
4. Set the output to rdfxml.

What is the expected output? What do you see instead?
In the rdfxml output, the rdfs:label values are wrongly encoded.

What version of the product are you using?


Please provide any additional information below.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (ANY23-12) character are wrongly encoded in rdfxml output

Posted by "Lewis John McGibbney (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ANY23-12?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lewis John McGibbney updated ANY23-12:
--------------------------------------

    Attachment: Soldering_iron_test.rdf
    

        

[jira] [Commented] (ANY23-12) character are wrongly encoded in rdfxml output

Posted by "Lewis John McGibbney (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ANY23-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13126062#comment-13126062 ] 

Lewis John McGibbney commented on ANY23-12:
-------------------------------------------

Brief exploration:

1. The attached file is indeed UTF-8 encoded and correctly marked as such in the header.

2. On the command line, parsing and re-serializing it with "any23 -f rdfxml" produces a correctly UTF-8 encoded file, with no encoding problems.

3. I uploaded a copy of the file here: http://richard.cyganiak.de/2011/test/Soldering_iron_test.rdf

4. Parsing and re-serializing this uploaded file with any23.org produces a correctly UTF-8 encoded response, with no encoding problems:
http://any23.org/any23/?format=rdfxml&uri=http%3A%2F%2Frichard.cyganiak.de%2F2011%2Ftest%2FSoldering_iron_test.rdf

5. Copy-pasting the file's contents into the textarea on any23.org produces a broken, double UTF-8 encoded response, as indicated by the reporter.

So the problem seems to be related to the processing of a submitted textarea.

Hypothesis, without having looked at the any23 servlet's code: the textarea's content is correctly submitted and sent over the wire as UTF-8, but the servlet mishandles the encoding before passing the content to the any23 parser.
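To make the "double UTF-8" symptom concrete, here is a minimal sketch using only the JDK. It is not taken from the any23 code base. The class name, the method names, and the sample label "Lötkolben" are assumptions chosen for illustration, since the contents of the attachment are not shown here. It decodes UTF-8 bytes as ISO-8859-1, which turns each multi-byte character into several Latin-1 characters. Serializing that string as UTF-8 again yields the doubly encoded output.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of any23.
class DoubleEncodingDemo {

    // Decode UTF-8 bytes with the wrong charset (ISO-8859-1).
    // Every byte becomes one character, so "ö" (2 bytes) becomes 2 characters.
    static String misdecode(String original) {
        byte[] wire = original.getBytes(StandardCharsets.UTF_8);
        return new String(wire, StandardCharsets.ISO_8859_1);
    }

    // Undo the damage: recover the original bytes and decode them as UTF-8.
    static String repair(String garbled) {
        byte[] wire = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(wire, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String label = "L\u00f6tkolben"; // assumed sample label, 9 characters
        String garbled = misdecode(label);

        System.out.println(label.length());   // 9
        System.out.println(garbled.length()); // 10
        // Re-serializing the garbled string as UTF-8 gives the doubly encoded form.
        System.out.println(label.getBytes(StandardCharsets.UTF_8).length);   // 10
        System.out.println(garbled.getBytes(StandardCharsets.UTF_8).length); // 12
        System.out.println(repair(garbled).equals(label)); // true
    }
}
```

If the hypothesis holds, the output seen by the reporter should show this pattern: each non-ASCII character in an rdfs:label replaced by two or more Latin-1 characters.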

This seems relevant:
http://wiki.apache.org/tomcat/FAQ/CharacterEncoding

It states that, by default, POST bodies are assumed to be ISO-8859-1. This default can be overridden by declaring a charset in the Content-Type header of the HTTP request, but most browsers don't do that when submitting form posts, so it doesn't appear to be an option. The solution proposed there is to put a filter in front of the servlet that fixes the encoding. Apparently, ready-made code for doing that could be lifted from Tomcat.
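The effect of the assumed charset on a form POST can be sketched with the JDK alone. The class and method names below are hypothetical and the sample value is an assumption. This is not the Tomcat filter itself. It only shows that the same percent-encoded body decodes correctly as UTF-8 and incorrectly as ISO-8859-1. In a servlet or filter, the corresponding step would be to call `request.setCharacterEncoding("UTF-8")` before any request parameter is read.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class; not part of any23 or Tomcat.
class FormCharsetDemo {

    // Percent-encode a form field value the way a browser would for the given charset.
    static String encodeField(String value, String charsetName) {
        try {
            return URLEncoder.encode(value, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("unknown charset: " + charsetName, e);
        }
    }

    // Decode a percent-encoded form field value, assuming the given charset.
    static String decodeField(String encodedValue, String charsetName) {
        try {
            return URLDecoder.decode(encodedValue, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        String label = "L\u00f6tkolben"; // assumed sample label
        String onTheWire = encodeField(label, "UTF-8");
        System.out.println(onTheWire); // L%C3%B6tkolben

        // Decoding with the charset the browser actually used restores the label.
        System.out.println(decodeField(onTheWire, "UTF-8").equals(label)); // true
        // Decoding with the container default garbles it.
        System.out.println(decodeField(onTheWire, "ISO-8859-1").equals(label)); // false
    }
}
```

Whether the any23 servlet reads the textarea through the parameter API or directly from the request body has not been checked here, so the right place for the fix remains to be confirmed.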
                

        

[jira] [Updated] (ANY23-12) character are wrongly encoded in rdfxml output

Posted by "Lewis John McGibbney (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ANY23-12?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lewis John McGibbney updated ANY23-12:
--------------------------------------

    Affects Version/s: 0.7.0
        Fix Version/s: 0.8.0
    