Posted to dev@any23.apache.org by "Lewis John McGibbney (Commented) (JIRA)" <ji...@apache.org> on 2011/10/12 21:15:11 UTC

[jira] [Commented] (ANY23-12) character are wrongly encoded in rdfxml output

    [ https://issues.apache.org/jira/browse/ANY23-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13126062#comment-13126062 ] 

Lewis John McGibbney commented on ANY23-12:
-------------------------------------------

Brief exploration:

1. The attached file is indeed UTF-8 encoded and correctly marked as such in the header.

2. On the command line, parsing and re-serializing it with "any23 -f rdfxml" produces a correctly UTF-8 encoded file, with no encoding problems.

3. I uploaded a copy of the file here: http://richard.cyganiak.de/2011/test/Soldering_iron_test.rdf

4. Parsing and re-serializing this uploaded file with any23.org produces a correctly utf-8 encoded response, no encoding problems:
http://any23.org/any23/?format=rdfxml&uri=http%3A%2F%2Frichard.cyganiak.de%2F2011%2Ftest%2FSoldering_iron_test.rdf

5. Copy-pasting the file's contents into the textarea on any23.org produces a broken response in which the UTF-8 text has been encoded twice, as described by the reporter.

So the problem seems to be related to the processing of a submitted textarea.

Hypothesis, without having looked at the any23 servlet's code: the textarea's content is correctly submitted and sent over the wire as utf-8, but the servlet messes up the encoding before sending it to the any23 parser.
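
To make the hypothesis concrete, here is a minimal sketch of how double encoding arises, using only the JDK. It is not taken from the any23 servlet, which I have not looked at; the class name, method names and the sample label are all made up for illustration. It assumes the bytes arrive as UTF-8, get decoded as ISO-8859-1, and are then written out as UTF-8 again.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; nothing here comes from the any23 code base.
class DoubleEncodingDemo {

    // Simulates the suspected bug: the browser sends UTF-8 bytes,
    // but the receiver decodes them as ISO-8859-1.
    static String misdecode(String original) {
        byte[] wire = original.getBytes(StandardCharsets.UTF_8);
        return new String(wire, StandardCharsets.ISO_8859_1);
    }

    // The garbled string is then serialized as UTF-8 a second time,
    // which yields the "double UTF-8" bytes seen in the response.
    static byte[] doubleEncode(String original) {
        return misdecode(original).getBytes(StandardCharsets.UTF_8);
    }

    // Reverses the wrong decoding step. This works because ISO-8859-1
    // maps every byte value to exactly one character and back.
    static String repair(String garbled) {
        byte[] wire = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(wire, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Made-up sample label with one non-ASCII character (o-umlaut).
        String label = "L\u00f6tkolben";
        String garbled = misdecode(label);

        System.out.println("UTF-8 bytes, correct:        "
                + label.getBytes(StandardCharsets.UTF_8).length);
        System.out.println("UTF-8 bytes, double encoded: "
                + doubleEncode(label).length);
        System.out.println("round trip restores label:   "
                + repair(garbled).equals(label));
    }
}
```

If this is what happens, each non-ASCII character in an rdfs:label would show up as two characters in the output, while pure ASCII text would be unaffected. That would match the fact that only the textarea path is broken.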

This seems relevant:
http://wiki.apache.org/tomcat/FAQ/CharacterEncoding

It states that by default, POST bodies are assumed to be ISO-8859-1. This can be overridden by setting a charset in the Content-Type header of the HTTP request, but most browsers don't do that when submitting form posts, so it doesn't appear to be an option. The solution proposed there is to put a filter in front of the servlet that sets the request encoding when the client has not declared one. Apparently, ready-made code for doing that could be lifted from Tomcat.
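
The decision such a filter has to make can be sketched without the servlet API. This is only an illustration under my own assumptions, not the Tomcat code: the class and method names are hypothetical, and the Content-Type parsing is deliberately simplistic. In a real filter the equivalent step would be calling ServletRequest.setCharacterEncoding(...) before any request parameter is read, and only when the request does not already declare an encoding.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.Locale;

// Hypothetical sketch of the "use the declared charset, else a fallback" rule.
class FormCharsetSketch {

    // Returns the charset named in a Content-Type header value,
    // or the fallback when the header is missing or names none.
    static String effectiveCharset(String contentType, String fallback) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    String value = p.substring("charset=".length()).trim();
                    // Strip optional surrounding double quotes.
                    if (value.length() >= 2
                            && value.startsWith("\"") && value.endsWith("\"")) {
                        value = value.substring(1, value.length() - 1);
                    }
                    if (!value.isEmpty()) {
                        return value;
                    }
                }
            }
        }
        return fallback;
    }

    // Decodes one URL-encoded form field value with the effective charset.
    static String decodeField(String encodedValue, String contentType,
                              String fallback) {
        try {
            return URLDecoder.decode(encodedValue,
                    effectiveCharset(contentType, fallback));
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset", e);
        }
    }

    public static void main(String[] args) {
        // Typical browser form post: no charset parameter in the header.
        String header = "application/x-www-form-urlencoded";
        // Made-up field value: an o-umlaut sent as the UTF-8 bytes C3 B6.
        String field = "L%C3%B6tkolben";

        String withLatin1Default = decodeField(field, header, "ISO-8859-1");
        String withUtf8Default = decodeField(field, header, "UTF-8");

        System.out.println("ISO-8859-1 default, characters: "
                + withLatin1Default.length());
        System.out.println("UTF-8 default, characters:      "
                + withUtf8Default.length());
    }
}
```

With the ISO-8859-1 fallback the same bytes decode to one extra character, which is the garbling described above. Whether UTF-8 is the right fallback for any23.org depends on the form page itself being served as UTF-8, which I am assuming here.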
                
> character are wrongly encoded in rdfxml output 
> -----------------------------------------------
>
>                 Key: ANY23-12
>                 URL: https://issues.apache.org/jira/browse/ANY23-12
>             Project: Apache Any23
>          Issue Type: Bug
>            Reporter: Lewis John McGibbney
>         Attachments: Soldering_iron_test.rdf
>
>
> What steps will reproduce the problem?
> 1. open the file Soldering_iron_test.rdf in your browser and see that all characters are displayed correctly;
> especially look at all the rdfs:label values in different languages
> 2. go to any23.org
> 3. copy the file content into content form 
> 4. set output to rdfxml 
> What is the expected output? What do you see instead?
> In the rdfxml output, the rdfs:label values are wrongly encoded
> What version of the product are you using?
> Please provide any additional information below.
