Posted to dev@any23.apache.org by "Lewis John McGibbney (Commented) (JIRA)" <ji...@apache.org> on 2011/10/12 21:15:11 UTC
[jira] [Commented] (ANY23-12) character are wrongly encoded in rdfxml output
[ https://issues.apache.org/jira/browse/ANY23-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13126062#comment-13126062 ]
Lewis John McGibbney commented on ANY23-12:
-------------------------------------------
Brief exploration:
1. The attached file is indeed utf-8 encoded and correctly marked as such in the header
2. On the command line, parsing and re-serializing it with "any23 -f rdfxml" produces a correctly utf-8 encoded file, no encoding problems
3. I uploaded a copy of the file here: http://richard.cyganiak.de/2011/test/Soldering_iron_test.rdf
4. Parsing and re-serializing this uploaded file with any23.org produces a correctly utf-8 encoded response, no encoding problems:
http://any23.org/any23/?format=rdfxml&uri=http%3A%2F%2Frichard.cyganiak.de%2F2011%2Ftest%2FSoldering_iron_test.rdf
5. Copy-pasting the file's contents into the textarea on any23.org produces a broken, double-encoded utf-8 response, as indicated by the reporter
So the problem seems to be related to the processing of a submitted textarea.
Hypothesis, without having looked at the any23 servlet's code: the textarea's content is correctly submitted and sent over the wire as utf-8, but the servlet messes up the encoding before sending it to the any23 parser.
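To make the hypothesis concrete, here is a minimal sketch of how such double encoding would arise. It is not taken from the any23 servlet; the class name, method name and the sample label "Lötkolben" are made up for illustration. It only shows what happens when utf-8 bytes are misread as ISO-8859-1 and then written out as utf-8 again.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any23.
class DoubleEncodingDemo {

    // Simulates the suspected bug: utf-8 bytes are misread as ISO-8859-1,
    // and the resulting string is then serialized as utf-8 a second time.
    static byte[] doubleEncode(String text) {
        byte[] wire = text.getBytes(StandardCharsets.UTF_8);             // bytes as sent by the browser
        String misread = new String(wire, StandardCharsets.ISO_8859_1);  // wrong decoding step
        return misread.getBytes(StandardCharsets.UTF_8);                 // re-serialized output
    }

    public static void main(String[] args) {
        String label = "L\u00f6tkolben"; // "Lötkolben", an assumed sample label
        byte[] correct = label.getBytes(StandardCharsets.UTF_8);
        byte[] broken = doubleEncode(label);

        // The umlaut takes 2 bytes when encoded once and 4 bytes when encoded twice.
        System.out.println("correct length: " + correct.length);
        System.out.println("broken length:  " + broken.length);
        System.out.println("broken text:    " + new String(broken, StandardCharsets.UTF_8));
    }
}
```

If the broken output on any23.org shows each non-ASCII character replaced by two characters in this way, that would be consistent with the hypothesis.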
This seems relevant:
http://wiki.apache.org/tomcat/FAQ/CharacterEncoding
It states that by default, POST bodies are assumed to be ISO-8859-1. This can be overridden by specifying a charset in the Content-Type header of the HTTP request, but most browsers don't do that when submitting form posts, so it doesn't appear to be an option. The solution proposed there is to put a filter in front of the servlet that sets the request encoding before the parameters are read. Apparently, ready-made code for doing that could be lifted from Tomcat.
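The effect of that default can be sketched with the standard library alone. This is not Tomcat's code and not a servlet filter; the class and method names are hypothetical, and the form value is an assumed example. It picks the charset from the Content-Type header if one is given, otherwise uses a fallback, and then decodes a percent-encoded form value with it.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class FormBodyDecoding {

    private static final Pattern CHARSET =
            Pattern.compile("charset=\"?([^\";\\s]+)", Pattern.CASE_INSENSITIVE);

    // Returns the charset named in a Content-Type header, or the fallback if none is given.
    static String charsetOf(String contentType, String fallback) {
        if (contentType == null) {
            return fallback;
        }
        Matcher m = CHARSET.matcher(contentType);
        return m.find() ? m.group(1) : fallback;
    }

    // Decodes one percent-encoded form value using the charset chosen above.
    static String decodeValue(String raw, String contentType, String fallback)
            throws UnsupportedEncodingException {
        return URLDecoder.decode(raw, charsetOf(contentType, fallback));
    }

    public static void main(String[] args) throws Exception {
        String raw = "L%C3%B6tkolben";                        // utf-8 bytes, percent-encoded
        String header = "application/x-www-form-urlencoded";  // no charset parameter

        // Container-style default: the value comes out garbled.
        System.out.println(decodeValue(raw, header, "ISO-8859-1"));
        // Fallback forced to utf-8, as an encoding filter would do: the value is intact.
        System.out.println(decodeValue(raw, header, "UTF-8"));
    }
}
```

In a real servlet filter, the equivalent of forcing the fallback would presumably be a call to request.setCharacterEncoding("UTF-8") before any parameter is read, applied only when the request does not already declare an encoding.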
> character are wrongly encoded in rdfxml output
> -----------------------------------------------
>
> Key: ANY23-12
> URL: https://issues.apache.org/jira/browse/ANY23-12
> Project: Apache Any23
> Issue Type: Bug
> Reporter: Lewis John McGibbney
> Attachments: Soldering_iron_test.rdf
>
>
> What steps will reproduce the problem?
> 1. open file Soldering_iron_test.rdf in your browser and see that all characters are displayed correctly;
> look especially at the rdfs:label values in different languages
> 2. go to any23.org
> 3. copy the file content into content form
> 4. set output to rdfxml
> What is the expected output? What do you see instead?
> In the rdfxml output, the rdfs:label values are wrongly encoded
> What version of the product are you using?
> Please provide any additional information below.