Posted to dev@nutch.apache.org by "stack@archive.org (JIRA)" <ji...@apache.org> on 2005/11/10 23:34:03 UTC

[jira] Commented: (NUTCH-110) OpenSearchServlet outputs illegal xml characters

    [ http://issues.apache.org/jira/browse/NUTCH-110?page=comments#action_12357300 ] 

stack@archive.org commented on NUTCH-110:
-----------------------------------------

Scrub NUTCH-110-version2.patch. This patch double-encodes certain entities: first in the new toValidXmlText method, and then again in the javax.xml.transform.Transformer used by OpenSearchServlet.

Use the original patch, fixIllegalXmlChars.patch, to address the problem described in this issue.
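
For illustration only, here is a minimal sketch of the general approach of dropping characters that fall outside the XML 1.0 Char production before the text reaches the Transformer. This is not the content of fixIllegalXmlChars.patch; the class and method names (XmlCharFilter, isLegalXmlChar, stripIllegalXmlChars) are hypothetical. The sketch deliberately does no entity escaping, on the assumption that the Transformer escapes '&' and '<' itself, which is what avoids the double encoding described above.

```java
// Hypothetical helper, not part of Nutch and not taken from either patch.
public class XmlCharFilter {

    /** True if the code point is allowed by the XML 1.0 Char production. */
    public static boolean isLegalXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /**
     * Returns a copy of text with every illegal XML character removed.
     * No entity escaping is done here; that is left to the serializer.
     */
    public static String stripIllegalXmlChars(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            // Walk by code point so surrogate pairs are kept together;
            // an unpaired surrogate is returned as-is and then dropped.
            int cp = text.codePointAt(i);
            if (isLegalXmlChar(cp)) {
                sb.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // \f is the form feed (decimal 12) from the issue description.
        System.out.println(stripIllegalXmlChars("page one\fpage two & more"));
    }
}
```

Running main prints the input with the form feed removed and the ampersand left untouched, so a downstream serializer would escape it exactly once.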

> OpenSearchServlet outputs illegal xml characters
> ------------------------------------------------
>
>          Key: NUTCH-110
>          URL: http://issues.apache.org/jira/browse/NUTCH-110
>      Project: Nutch
>         Type: Bug
>   Components: searcher
>     Versions: 0.7
>  Environment: linux, jdk 1.5
>     Reporter: stack@archive.org
>  Attachments: NUTCH-110-version2.patch, fixIllegalXmlChars.patch
>
> OpenSearchServlet does not check the text it outputs for illegal XML characters; depending on the search result, it is possible for OSS to output XML that is not well-formed. For example, if the text contains the FF (form feed) character, i.e. the ASCII character at decimal position 12, the produced XML will show the FF character as '&#12;'. The character reference '&#12;' is not legal in XML according to http://www.w3.org/TR/2000/REC-xml-20001006#NT-Char.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira