Posted to j-users@xalan.apache.org by Lisa Smith <li...@simplymeasured.com> on 2013/08/07 18:32:36 UTC

Transforming XML and preserving Unicode characters (emoji)

Hello,

My XSLT transformations had been working for months until I ran across
an XML file containing Unicode emoji characters. I need to preserve the
characters as they are, but the transformation is converting them to
HTML entities. I thought that setting the output encoding to UTF-8 would
solve the problem, but it has not.

If I set the output property OutputKeys.METHOD to "text", the emoji
remain, but all of my XML elements are stripped. When I set
OutputKeys.METHOD to "xml", the emoji are converted to HTML entities.

Any help appreciated. Code:

private byte[] transform(InputStream stream) throws Exception {
    // Select Xalan-J as the JAXP TransformerFactory implementation.
    System.setProperty("javax.xml.transform.TransformerFactory",
            "org.apache.xalan.processor.TransformerFactoryImpl");

    // No cast is needed: the variable is declared as the JAXP Transformer type.
    Transformer xmlTransformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(createXsltStylesheet()));
    xmlTransformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");

    // Read the input as UTF-8 through StAX and write the result as UTF-8.
    XMLStreamReader reader =
            XMLInputFactory.newInstance().createXMLStreamReader(stream, "UTF-8");
    Source staxSource = new StAXSource(reader, true);
    ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
    Writer writer = new OutputStreamWriter(outputStream, "UTF-8");
    xmlTransformer.transform(staxSource, new StreamResult(writer));
    writer.flush(); // make sure buffered characters reach the byte array

    return outputStream.toByteArray();
}


Thanks!

Lisa

Re: Transforming XML and preserving Unicode characters (emoji)

Posted by "USHAKOV, Sergey" <s-...@yandex.ru>.
Hi Lisa,

It may be a good idea to have a look at the 'xalan:entities' serializer
property:
http://xml.apache.org/xalan-j/usagepatterns.html#outputprops
http://permalink.gmane.org/gmane.text.docbook.apps/20285
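
For reference, here is a minimal sketch of setting that property from Java
code instead of on xsl:output. Several things in it are assumptions: the
class and method names are made up for illustration, the default
TransformerFactory and an identity transform stand in for your Xalan setup
and stylesheet, and the resource name you pass in is whatever entities file
you want to try. I have not verified that the property changes how emoji are
written, and that may depend on the Xalan version. The re-parse helper is
included because a numeric character reference and the literal character are
the same text to an XML parser, so it is a quick way to check that nothing
was actually lost.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper class, not part of Xalan or of the original code.
class EntitiesPropertySketch {

    // 'xalan:entities' written as a namespace-qualified JAXP property key.
    static final String XALAN_ENTITIES = "{http://xml.apache.org/xalan}entities";

    // Identity transform (no stylesheet) with the output properties set in code.
    // entitiesResource may be null, in which case the property is left unset.
    static byte[] serialize(String xml, String entitiesResource) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.METHOD, "xml");
        t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        if (entitiesResource != null) {
            // Assumption: the value names an entities resource on the classpath.
            t.setOutputProperty(XALAN_ENTITIES, entitiesResource);
        }
        // Writing to an OutputStream lets the serializer do the UTF-8 encoding.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toByteArray();
    }

    // Re-parse the serialized bytes and return the root element's text content.
    static String rootText(byte[] xmlBytes) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xmlBytes))
                .getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Optional first argument: the entities resource name to try.
        String entities = args.length > 0 ? args[0] : null;
        String emoji = new String(Character.toChars(0x1F600));
        byte[] out = serialize("<msg>hi " + emoji + "</msg>", entities);
        // Shows whether the emoji was written literally or as a reference.
        System.out.println(new String(out, StandardCharsets.UTF_8));
        // Checks that the parsed text is unchanged either way.
        System.out.println(rootText(out).equals("hi " + emoji));
    }
}
```

To try it against Xalan, put Xalan on the classpath, keep your
System.setProperty call for the TransformerFactory, and pass the entities
resource name as the first argument.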

HTH...

Regards,
Sergey

