Posted to users@cocoon.apache.org by Andreas Hartmann <an...@apache.org> on 2007/05/10 16:37:28 UTC

Un-Escaping XML in transformer

Hi Cocooners,

I have a SAX stream containing fragments of escaped XML, e.g.

  <p> this is a &lt;a href="..."&gt;link&lt;/a&gt; </p>

and want to convert the characters into SAX events:

  <p> this is a <a href="...">link</a> </p>

I collect and assemble the character events, but I don't know how
to parse the resulting string and generate SAX events without
too much effort.

I tried StringXMLizable and XMLByteStreamInterpreter, but ran
into problems because contentHandler.startElement() is called
or the prolog is not correct.

What's the best way to do this?

TIA for any pointers!

-- Andreas



-- 
Andreas Hartmann, CTO
BeCompany GmbH
http://www.becompany.ch


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@cocoon.apache.org
For additional commands, e-mail: users-help@cocoon.apache.org


Re: Un-Escaping XML in transformer

Posted by Andreas Hartmann <an...@apache.org>.
Grzegorz Kossakowski wrote:
> Andreas Hartmann wrote:
>> Hi Cocooners,
>>
>> I have a SAX stream containing fragments of escaped XML, e.g.
>>
>>   <p> this is a &lt;a href="..."&gt;link&lt;/a&gt; </p>
>>
>> and want to convert the characters into SAX events:
>>
>>   <p> this is a <a href="...">link</a> </p>
>>
>> I collect and assemble the character events, but I don't know how
>> to parse the resulting string and generate SAX events without
>> too much effort.
>>
>> I tried StringXMLizable and XMLByteStreamInterpreter, but ran
>> into problems because contentHandler.startElement() is called
>> or the prolog is not correct.
>>
>> What's the best way to do this?
>>
>> TIA for any pointers!
> 
> I fear that your only option is to serialize the XML, replace all escaped
> characters, and parse it again. Serializing and parsing are really easy,
> even inside a transformer.

Thanks!

Here's something that doesn't look nice, but basically seems to work:

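// Wrap the buffered text in a namespaced root element so that it
// parses as a well-formed document.
String string = "<unescape:wrap"
    + " xmlns:unescape=\"http://apache.org/lenya/unescape/1.0\">"
    + this.buffer.toString() + "</unescape:wrap>";

StringXMLizable xml = new StringXMLizable(string);
// Forward the parsed events, minus the document and wrapper events.
FragmentHandler fragmentHandler = new FragmentHandler(this.contentHandler);
xml.toSAX(fragmentHandler);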


The FragmentHandler filters the startDocument() and endDocument() events
and the start/end events for the <wrap> element.
I'll do some more testing.
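
For reference, the filtering idea can be sketched with nothing but the JDK's SAX classes. This is not the actual FragmentHandler, only an illustration of the approach: the names UnescapeSketch, FragmentFilter, parseFragment and describe are made up here, and the plain JAXP parser stands in for StringXMLizable. It assumes the buffered fragment is well-formed once it is wrapped.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

public class UnescapeSketch {

    // Namespace of the wrapper element, taken from the snippet above.
    static final String WRAP_NS = "http://apache.org/lenya/unescape/1.0";

    /** Hypothetical stand-in for FragmentHandler: forwards everything
     *  except the document events and the wrapper element. */
    static class FragmentFilter extends XMLFilterImpl {
        FragmentFilter(ContentHandler target) {
            setContentHandler(target);
        }

        private static boolean isWrapper(String uri, String local) {
            return WRAP_NS.equals(uri) && "wrap".equals(local);
        }

        @Override public void startDocument() { /* dropped */ }
        @Override public void endDocument() { /* dropped */ }

        @Override
        public void startElement(String uri, String local, String qName,
                Attributes atts) throws SAXException {
            if (!isWrapper(uri, local)) {
                super.startElement(uri, local, qName, atts);
            }
        }

        @Override
        public void endElement(String uri, String local, String qName)
                throws SAXException {
            if (!isWrapper(uri, local)) {
                super.endElement(uri, local, qName);
            }
        }

        // Also hide the wrapper's namespace declaration from the target.
        @Override
        public void startPrefixMapping(String prefix, String uri)
                throws SAXException {
            if (!WRAP_NS.equals(uri)) {
                super.startPrefixMapping(prefix, uri);
            }
        }

        @Override
        public void endPrefixMapping(String prefix) throws SAXException {
            if (!"unescape".equals(prefix)) {
                super.endPrefixMapping(prefix);
            }
        }
    }

    /** Wraps the fragment, parses it, and sends the filtered events to target. */
    static void parseFragment(String fragment, ContentHandler target)
            throws Exception {
        String doc = "<unescape:wrap xmlns:unescape=\"" + WRAP_NS + "\">"
                + fragment + "</unescape:wrap>";
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        reader.setContentHandler(new FragmentFilter(target));
        reader.parse(new InputSource(new StringReader(doc)));
    }

    /** Demo helper: renders the events the target receives as a string. */
    static String describe(String fragment) {
        StringBuilder log = new StringBuilder();
        try {
            parseFragment(fragment, new DefaultHandler() {
                @Override public void startDocument() { log.append("[doc]"); }
                @Override public void startElement(String uri, String local,
                        String qName, Attributes atts) {
                    log.append("[start ").append(local).append("]");
                }
                @Override public void endElement(String uri, String local,
                        String qName) {
                    log.append("[end ").append(local).append("]");
                }
                @Override public void characters(char[] ch, int start, int len) {
                    log.append(ch, start, len);
                }
            });
        } catch (Exception e) {
            throw new IllegalStateException("fragment could not be parsed", e);
        }
        return log.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe("this is a <a href=\"...\">link</a> "));
    }
}
```

Inside a transformer, the target passed to parseFragment would be the downstream contentHandler rather than the logging handler used in describe. A buffer that is not well-formed (an unclosed tag, a bare ampersand) makes the parser throw, so the real code probably needs a fallback that emits the original characters unchanged.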

-- Andreas


-- 
Andreas Hartmann, CTO
BeCompany GmbH
http://www.becompany.ch




Re: Un-Escaping XML in transformer

Posted by Grzegorz Kossakowski <gk...@apache.org>.
Andreas Hartmann wrote:
> Hi Cocooners,
> 
> I have a SAX stream containing fragments of escaped XML, e.g.
> 
>   <p> this is a &lt;a href="..."&gt;link&lt;/a&gt; </p>
> 
> and want to convert the characters into SAX events:
> 
>   <p> this is a <a href="...">link</a> </p>
> 
> I collect and assemble the character events, but I don't know how
> to parse the resulting string and generate SAX events without
> too much effort.
> 
> I tried StringXMLizable and XMLByteStreamInterpreter, but ran
> into problems because contentHandler.startElement() is called
> or the prolog is not correct.
> 
> What's the best way to do this?
> 
> TIA for any pointers!

I fear that your only option is to serialize the XML, replace all escaped characters, and parse it again. Serializing and parsing are really easy,
even inside a transformer.

-- 
Grzegorz Kossakowski
http://reflectingonthevicissitudes.wordpress.com/
