You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@synapse.apache.org by Paul Fremantle <pz...@gmail.com> on 2007/12/28 10:16:49 UTC

Re: Possible improvements in XSLTMediator

> > One more improvement.... I think we should make it possible to change
> > the default size that triggers a file using a config file (e.g.
> > synapse.properties).
>
> I agree. Please raise a JIRA :)

Done!
https://issues.apache.org/jira/browse/SYNAPSE-217

>
> > Anything else?
>
> I had a look at the code that handles the case where the output of the
> transformation is text rather than XML. I think there are multiple
> issues:
>
> 1) There are multiple places where character streams are converted to
> byte streams and vice versa:
>
> * Since the XSLT processor is configured with a StreamResult writing
> to an OutputStream (ByteArrayOutputStream or FileOutputStream), it
> will convert the output to a byte stream.
> * The output is then converted back to a character stream using
> ByteArrayOutputStream#toString or using TextFileDataSource.
> * In VFSTransportSender it is converted back to a byte stream using
> String#getBytes or OMNode#serializeAndConsume.
>
> The problem is that nowhere the code cares about the character
> encoding that is used in these conversions. I opened SYNAPSE-215 to
> describe the issue with ByteArrayOutputStream#toString. Probably in
> many cases the different issues tend to compensate each other so that
> the end result is correct. For example, ByteArrayOutputStream#toString
> and String#getBytes both use the platform's default encoding, so that
> the original byte stream is reconstructed. However this will fail if
> the byte stream contains sequences that are not valid in the default
> encoding (this may happen e.g. in UTF-8). Anyway, Synapse should be
> fixed to handle character encodings properly from end to end.

I agree we need to think this through more thoroughly.


> 2) There are specific issues with TextFileDataSource:
>
> * When a ByteArrayOutputStream is used, the result is parsed as plain
> text (since an OMText object is created directly from the result of
> ByteArrayOutputStream#toString). On the other hand, when
> TextFileDataSource is used, the result is parsed as XML (more
> precisely as an external parsed general entity). For example, the
> ampersand (&) is considered as the start character for an XML entity.
> I opened SYNAPSE-216 for this issue. Note however that when the data
> is consumed by VFSTransportSender, this problem is circumvented by the
> fact that the serialize method bypasses the XML parsing...

Well spotted. Thanks!

> * TextFileDataSource implements OMDataSource but doesn't respect the
> contract (the Javadoc of OMDataSource is not very explicit but this
> can be seen from various examples in the Axis 2 source code):
> - serialize(OutputStream, OMOutputFormat) doesn't output the <text>
> wrapper element (actually the code is commented out) and doesn't take
> into account the character encoding specified by the OMOutputFormat.
> - serialize(Writer, OMOutputFormat) only outputs an empty <text>
> element.
> While this is exactly what is expected by VFSTransportSender, this
> might lead to unexpected results in other situations.

I agree this needs to be sorted out. The OMDataSource model is all
about consistency and hiding the details of an underlying source, so
it needs to be right. I'm guessing this was done in a hurry and never
completed properly.


> The purpose of TextFileDataSource is actually to implement a text node
> (+ <text> wrapper element) that is backed by a temporary file rather
> than a String/char[] object, thereby avoiding to load the entire file
> into memory. Maybe we should consider another solution that avoids the
> problems described above. The idea would be to use a custom
> implementation of OMText (that again is backed by a temporary file). I
> think that if the custom implementation extends OMNodeImpl and
> implements OMText, an instance can be added to the Axiom tree without
> problem, given that the Axiom code never casts OMText to OMTextImpl.
> An alternative (but less clean) solution would be to extend OMTextImpl.

I agree these would work, but I'm not sure why this would be better
than fixing up the DataSource model. The DataSource approach was
designed to solve the problem of creating new forms of data-backed
OMNodes. If it isn't working well enough we should raise this issue
with the axiom dev team.

> 3) Before solving some of the issues described above, another question
> needs to be addressed. XSLTMediator actually uses the following
> strategy to handle text output: it first tries to parse the output as
> an XML document and when this fails it will consider the output as
> text. There are however two different cases where this happens:
> While it is probably possible to handle both cases correctly, this
> would introduce unnecessary complexity to the code. I think it is not
> necessary to support the second case. I would expect that when a
> stylesheet specifies XML as output method but fails to produce a well
> formed XML document, the mediation fails with an error. However, some
> of the example stylesheets (see java/repository/conf/sample/resources/
> transform/transform_load.xml) that come with the Synapse source code
> don't specify "text" as output method but produce text only.

I completely agree.... the output should be as stated in the XSLT, not
randomly guessed at by the mediator.
+1 to fixing it.

Best

Paul

---------------------------------------------------------------------
To unsubscribe, e-mail: synapse-dev-unsubscribe@ws.apache.org
For additional commands, e-mail: synapse-dev-help@ws.apache.org


Re: Possible improvements in XSLTMediator

Posted by Andreas Veithen <an...@skynet.be>.
Paul

I did an initial implementation (see attachment). When I use this in  
TextFileDataSource, the test case I submitted on SYNAPSE-216 passes.  
Obviously it requires some more testing and I will also add some more  
comments to the code. I will submit a complete patch later.

Andreas


Re: Possible improvements in XSLTMediator

Posted by Andreas Veithen <an...@skynet.be>.
Paul

I did an initial implementation (see attachment). When I use this in  
TextFileDataSource, the test case I submitted on SYNAPSE-216 passes.  
Obviously it requires some more testing and I will also add some more  
comments to the code. I will submit a complete patch later.

Andreas


Re: Possible improvements in XSLTMediator

Posted by Paul Fremantle <pz...@gmail.com>.
Andreas

I agree this is a pretty simple solution... I like it.

Let me take a proper look later today when I get some time.

Paul

On Dec 28, 2007 10:23 AM, Andreas Veithen <an...@skynet.be> wrote:
>
> On 28 Dec 2007, at 10:16, Paul Fremantle wrote:
>
> >> The purpose of TextFileDataSource is actually to implement a text
> >> node
> >> (+ <text> wrapper element) that is backed by a temporary file rather
> >> than a String/char[] object, thereby avoiding to load the entire file
> >> into memory. Maybe we should consider another solution that avoids
> >> the
> >> problems described above. The idea would be to use a custom
> >> implementation of OMText (that again is backed by a temporary
> >> file). I
> >> think that if the custom implementation extends OMNodeImpl and
> >> implements OMText, an instance can be added to the Axiom tree without
> >> problem, given that the Axiom code never casts OMText to OMTextImpl.
> >> An alternative (but less clean) solution would be to extend
> >> OMTextImpl.
> >
> > I agree these would work, but I'm not sure why this would be better
> > than fixing up the DataSource model. The DataSource approach was
> > designed to solve the problem of creating new forms of data-backed
> > OMNodes. If it isn't working well enough we should raise this issue
> > with the axiom dev team.
>
> The main argument in favor of the custom OMText solution was
> simplicity. There is however another solution that is a bit more
> complicated but that strictly adheres to the OMDataSource model. The
> idea is as follows. What TextFileDataSource does is to construct an
> InputStream that represents the concatenation <text> + text + </text>.
> It then uses this input stream to build an XMLStreamReader. The
> problem comes from the fact that "text" is plain text, not XML text. A
> quick fix could be to wrap the text in a CDATA section (i.e. <text><!
> [CDATA[ + text + ]]></text>), but this doesn't solve the character
> encoding problem. A better solution is then to implement a custom
> XMLStreamReader that synthesizes the following sequence of events:
>
> * START_DOCUMENT
> * START_ELEMENT
> * (CHARACTERS)*n
> * END_ELEMENT
> * END_DOCUMENT
>
> Since the output of XMLStreamReader for the CHARACTER events is pure
> character data, we don't need to care about XML entities and character
> encoding at that level.
>
> To summarize, the solution would be to replace the custom InputStream
> implementation by a custom XMLStreamReader implementation. This
> XMLStreamReader implementation would have more or less the same
> complexity as the existing InputStream implementation.
>
>
> Regards,
>
> Andreas
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: synapse-user-unsubscribe@ws.apache.org
> For additional commands, e-mail: synapse-user-help@ws.apache.org
>
>



-- 
Paul Fremantle
Co-Founder and VP of Technical Sales, WSO2
OASIS WS-RX TC Co-chair

blog: http://pzf.fremantle.org
paul@wso2.com

"Oxygenating the Web Service Platform", www.wso2.com

---------------------------------------------------------------------
To unsubscribe, e-mail: synapse-user-unsubscribe@ws.apache.org
For additional commands, e-mail: synapse-user-help@ws.apache.org


Re: Possible improvements in XSLTMediator

Posted by Paul Fremantle <pz...@gmail.com>.
Andreas

I agree this is a pretty simple solution... I like it.

Let me take a proper look later today when I get some time.

Paul

On Dec 28, 2007 10:23 AM, Andreas Veithen <an...@skynet.be> wrote:
>
> On 28 Dec 2007, at 10:16, Paul Fremantle wrote:
>
> >> The purpose of TextFileDataSource is actually to implement a text
> >> node
> >> (+ <text> wrapper element) that is backed by a temporary file rather
> >> than a String/char[] object, thereby avoiding to load the entire file
> >> into memory. Maybe we should consider another solution that avoids
> >> the
> >> problems described above. The idea would be to use a custom
> >> implementation of OMText (that again is backed by a temporary
> >> file). I
> >> think that if the custom implementation extends OMNodeImpl and
> >> implements OMText, an instance can be added to the Axiom tree without
> >> problem, given that the Axiom code never casts OMText to OMTextImpl.
> >> An alternative (but less clean) solution would be to extend
> >> OMTextImpl.
> >
> > I agree these would work, but I'm not sure why this would be better
> > than fixing up the DataSource model. The DataSource approach was
> > designed to solve the problem of creating new forms of data-backed
> > OMNodes. If it isn't working well enough we should raise this issue
> > with the axiom dev team.
>
> The main argument in favor of the custom OMText solution was
> simplicity. There is however another solution that is a bit more
> complicated but that strictly adheres to the OMDataSource model. The
> idea is as follows. What TextFileDataSource does is to construct an
> InputStream that represents the concatenation <text> + text + </text>.
> It then uses this input stream to build an XMLStreamReader. The
> problem comes from the fact that "text" is plain text, not XML text. A
> quick fix could be to wrap the text in a CDATA section (i.e. <text><!
> [CDATA[ + text + ]]></text>), but this doesn't solve the character
> encoding problem. A better solution is then to implement a custom
> XMLStreamReader that synthesizes the following sequence of events:
>
> * START_DOCUMENT
> * START_ELEMENT
> * (CHARACTERS)*n
> * END_ELEMENT
> * END_DOCUMENT
>
> Since the output of XMLStreamReader for the CHARACTER events is pure
> character data, we don't need to care about XML entities and character
> encoding at that level.
>
> To summarize, the solution would be to replace the custom InputStream
> implementation by a custom XMLStreamReader implementation. This
> XMLStreamReader implementation would have more or less the same
> complexity as the existing InputStream implementation.
>
>
> Regards,
>
> Andreas
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: synapse-user-unsubscribe@ws.apache.org
> For additional commands, e-mail: synapse-user-help@ws.apache.org
>
>



-- 
Paul Fremantle
Co-Founder and VP of Technical Sales, WSO2
OASIS WS-RX TC Co-chair

blog: http://pzf.fremantle.org
paul@wso2.com

"Oxygenating the Web Service Platform", www.wso2.com

---------------------------------------------------------------------
To unsubscribe, e-mail: synapse-dev-unsubscribe@ws.apache.org
For additional commands, e-mail: synapse-dev-help@ws.apache.org


Re: Possible improvements in XSLTMediator

Posted by Andreas Veithen <an...@skynet.be>.
On 28 Dec 2007, at 10:16, Paul Fremantle wrote:

>> The purpose of TextFileDataSource is actually to implement a text  
>> node
>> (+ <text> wrapper element) that is backed by a temporary file rather
>> than a String/char[] object, thereby avoiding to load the entire file
>> into memory. Maybe we should consider another solution that avoids  
>> the
>> problems described above. The idea would be to use a custom
>> implementation of OMText (that again is backed by a temporary  
>> file). I
>> think that if the custom implementation extends OMNodeImpl and
>> implements OMText, an instance can be added to the Axiom tree without
>> problem, given that the Axiom code never casts OMText to OMTextImpl.
>> An alternative (but less clean) solution would be to extend  
>> OMTextImpl.
>
> I agree these would work, but I'm not sure why this would be better
> than fixing up the DataSource model. The DataSource approach was
> designed to solve the problem of creating new forms of data-backed
> OMNodes. If it isn't working well enough we should raise this issue
> with the axiom dev team.

The main argument in favor of the custom OMText solution was  
simplicity. There is however another solution that is a bit more  
complicated but that strictly adheres to the OMDataSource model. The  
idea is as follows. What TextFileDataSource does is to construct an  
InputStream that represents the concatenation <text> + text + </text>.  
It then uses this input stream to build an XMLStreamReader. The  
problem comes from the fact that "text" is plain text, not XML text. A  
quick fix could be to wrap the text in a CDATA section (i.e. <text><! 
[CDATA[ + text + ]]></text>), but this doesn't solve the character  
encoding problem. A better solution is then to implement a custom  
XMLStreamReader that synthesizes the following sequence of events:

* START_DOCUMENT
* START_ELEMENT
* (CHARACTERS)*n
* END_ELEMENT
* END_DOCUMENT

Since the output of XMLStreamReader for the CHARACTER events is pure  
character data, we don't need to care about XML entities and character  
encoding at that level.

To summarize, the solution would be to replace the custom InputStream  
implementation by a custom XMLStreamReader implementation. This  
XMLStreamReader implementation would have more or less the same  
complexity as the existing InputStream implementation.

Regards,

Andreas


---------------------------------------------------------------------
To unsubscribe, e-mail: synapse-dev-unsubscribe@ws.apache.org
For additional commands, e-mail: synapse-dev-help@ws.apache.org


Re: Possible improvements in XSLTMediator

Posted by Andreas Veithen <an...@skynet.be>.
On 28 Dec 2007, at 10:16, Paul Fremantle wrote:

>> The purpose of TextFileDataSource is actually to implement a text  
>> node
>> (+ <text> wrapper element) that is backed by a temporary file rather
>> than a String/char[] object, thereby avoiding to load the entire file
>> into memory. Maybe we should consider another solution that avoids  
>> the
>> problems described above. The idea would be to use a custom
>> implementation of OMText (that again is backed by a temporary  
>> file). I
>> think that if the custom implementation extends OMNodeImpl and
>> implements OMText, an instance can be added to the Axiom tree without
>> problem, given that the Axiom code never casts OMText to OMTextImpl.
>> An alternative (but less clean) solution would be to extend  
>> OMTextImpl.
>
> I agree these would work, but I'm not sure why this would be better
> than fixing up the DataSource model. The DataSource approach was
> designed to solve the problem of creating new forms of data-backed
> OMNodes. If it isn't working well enough we should raise this issue
> with the axiom dev team.

The main argument in favor of the custom OMText solution was  
simplicity. There is however another solution that is a bit more  
complicated but that strictly adheres to the OMDataSource model. The  
idea is as follows. What TextFileDataSource does is to construct an  
InputStream that represents the concatenation <text> + text + </text>.  
It then uses this input stream to build an XMLStreamReader. The  
problem comes from the fact that "text" is plain text, not XML text. A  
quick fix could be to wrap the text in a CDATA section (i.e. <text><! 
[CDATA[ + text + ]]></text>), but this doesn't solve the character  
encoding problem. A better solution is then to implement a custom  
XMLStreamReader that synthesizes the following sequence of events:

* START_DOCUMENT
* START_ELEMENT
* (CHARACTERS)*n
* END_ELEMENT
* END_DOCUMENT

Since the output of XMLStreamReader for the CHARACTER events is pure  
character data, we don't need to care about XML entities and character  
encoding at that level.

To summarize, the solution would be to replace the custom InputStream  
implementation by a custom XMLStreamReader implementation. This  
XMLStreamReader implementation would have more or less the same  
complexity as the existing InputStream implementation.

Regards,

Andreas


---------------------------------------------------------------------
To unsubscribe, e-mail: synapse-user-unsubscribe@ws.apache.org
For additional commands, e-mail: synapse-user-help@ws.apache.org