Posted to dev@synapse.apache.org by "Andreas Veithen (JIRA)" <ji...@apache.org> on 2008/05/10 23:37:55 UTC

[jira] Created: (SYNAPSE-304) BaseUtils uses incorrect strategy to distinguish between XML, text and binary

BaseUtils uses incorrect strategy to distinguish between XML, text and binary
-----------------------------------------------------------------------------

                 Key: SYNAPSE-304
                 URL: https://issues.apache.org/jira/browse/SYNAPSE-304
             Project: Synapse
          Issue Type: Bug
          Components: Transports
            Reporter: Andreas Veithen
            Assignee: Andreas Veithen
             Fix For: 1.3


BaseUtils#setSOAPEnvelope (together with BaseUtils#handleLegacyMessage) is used by the VFS, Mail, JMS and AMQP transports and implements the following strategy to distinguish between XML, text and binary payloads: it first tries to parse the payload as XML. If that fails, it tries to load it as text using BaseUtils#getMessageTextPayload. If that fails again, it loads the message as binary data using BaseUtils#getMessageBinaryPayload.

This strategy has the following flaws:
* Corrupted or invalid XML messages are not detected as such but interpreted as text or binary data. This will almost certainly lead to errors at a later stage in the processing (typically in a mediation that doesn't expect text or binary payloads), but for the user it is difficult to identify the root cause of the problem.
* The VFSUtils and MailUtils implementations of the getMessageTextPayload method never actually fail (except if the file or MIME part can't be read). The reason is that they read the content as binary and then construct a String object using new String(byte[]). This constructor never throws an exception, even if there are byte sequences not valid in the platform's default charset. Therefore the VFS and mail transport listeners will never process messages as binary payloads. This problem can't be solved within the current strategy because there is in fact no (reliable) way to distinguish text from binary data by inspecting the content alone. Note that using the platform's default charset to decode the message is incorrect as well (see SYNAPSE-261).
* This approach doesn't allow using custom message builders to parse messages that are neither XML nor plain text or binary.
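
The second flaw can be reproduced outside Synapse. The following is a minimal sketch, not the VFSUtils or MailUtils code: the class and method names are made up for illustration, and UTF-8 is used instead of the platform's default charset so that the result does not depend on the machine. It contrasts the lenient String constructor with a CharsetDecoder configured to report malformed input.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Synapse.
final class LenientDecodingDemo {

    /** Decodes like new String(byte[], charset): malformed bytes are replaced, never reported. */
    public static String decodeLenient(byte[] data) {
        return new String(data, StandardCharsets.UTF_8);
    }

    /** Decodes strictly: malformed or unmappable input raises CharacterCodingException. */
    public static String decodeStrict(byte[] data) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(data)).toString();
    }

    public static void main(String[] args) {
        // 0xFF and 0xFE can never occur in well-formed UTF-8.
        byte[] binary = {(byte) 0xFF, (byte) 0xFE, 0x00, 0x41};

        // The lenient path silently produces a String containing replacement characters.
        String text = decodeLenient(binary);
        System.out.println("lenient decode contains U+FFFD: " + (text.indexOf('\uFFFD') >= 0));

        // The strict path reports the problem instead.
        try {
            decodeStrict(binary);
            System.out.println("strict decode succeeded");
        } catch (CharacterCodingException e) {
            System.out.println("strict decode failed: " + e);
        }
    }
}
```

Even the strict variant only tells whether the bytes are valid in one particular charset; it still cannot decide whether the sender meant them as text, which is why the content type has to come from somewhere else.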

I think that every transport should first determine the content type of the message and then decode the message according to that content type, rather than trying different ways to decode the message. The decoding should be delegated to the message builder corresponding to the content type. This approach has the following advantages:
* Corrupted or invalid messages trigger an appropriate error immediately.
* Since for text payloads the content type can include information about the charset (text/plain; charset=...), it provides a straightforward solution for issues like SYNAPSE-261.
* Custom message builders can be used.
* It naturally fits into Axis' architecture since it correctly uses the concepts of transport and message builder.
* It leads to more consistent behavior between different transports (in particular with the NIO HTTP transport).

The transport should determine the content type either from the service configuration (e.g. the transport.vfs.ContentType property for VFS) or from information available at the transport protocol level (Content-Type header for mail messages, FileContentInfo or file suffix for VFS, message type for JMS, etc.).
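
As a rough illustration of this lookup, here is a sketch under several assumptions: ContentTypeSketch and its methods are hypothetical names, the order "service configuration first, then protocol-level information, then file suffix" is only one possible policy, the suffix mapping is invented for the example, and the parameter parsing is deliberately simplified (it does not handle quoted values containing semicolons).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper; not an existing Synapse or Axis2 class.
final class ContentTypeSketch {

    /** Returns the media type without parameters, lower-cased, e.g. "text/plain". */
    public static String mediaType(String contentType) {
        int semi = contentType.indexOf(';');
        String type = semi < 0 ? contentType : contentType.substring(0, semi);
        return type.trim().toLowerCase(Locale.ROOT);
    }

    /** Returns the charset parameter of the content type, or the fallback if there is none. */
    public static Charset charset(String contentType, Charset fallback) {
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq > 0 && param.substring(0, eq).trim().equalsIgnoreCase("charset")) {
                String value = param.substring(eq + 1).trim();
                // Strip optional surrounding quotes: charset="utf-8"
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                // Throws an unchecked exception if the charset name is illegal or unsupported.
                return Charset.forName(value);
            }
        }
        return fallback;
    }

    /** Assumed resolution order: configured value, then protocol-level value, then file suffix. */
    public static String resolve(String configured, String protocolLevel, String fileName) {
        if (configured != null) {
            return configured;
        }
        if (protocolLevel != null) {
            return protocolLevel;
        }
        if (fileName != null) {
            String lower = fileName.toLowerCase(Locale.ROOT);
            if (lower.endsWith(".xml")) return "text/xml";   // illustrative mapping only
            if (lower.endsWith(".txt")) return "text/plain"; // illustrative mapping only
        }
        // Fail early instead of guessing from the payload.
        throw new IllegalArgumentException("Cannot determine content type");
    }

    public static void main(String[] args) {
        String contentType = resolve(null, "text/plain; charset=ISO-8859-1", "report.txt");
        System.out.println(mediaType(contentType));
        System.out.println(charset(contentType, StandardCharsets.UTF_8));
    }
}
```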

The algorithm to select the message builder based on the content type and to invoke it to create the SOAP infoset is already implemented in the TransportUtils.createSOAPMessage utility method in the Axis2 kernel (which is also used by the NIO HTTP and the UDP transport). Therefore the proposed changes are:
1. Create message builders for text and binary payloads (which are the counterparts of PlainTextFormatter and BinaryFormatter introduced by SYNAPSE-261).
2. Let ServerManager#start register these new message builders by default in the Axis configuration for content types "text/plain" and "application/octet-stream" respectively (similar to the way AxisConfigBuilder#populateConfig registers default message builders for text/xml, application/soap+xml, etc.). In addition they should be added to the default axis2.xml file shipped with Synapse.
3. Make sure that every affected transport implements an appropriate strategy to determine the content type and uses TransportUtils.createSOAPMessage instead of BaseUtils#setSOAPEnvelope to process the message payload.
4. Remove BaseUtils#setSOAPEnvelope and related code without replacement.
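
To make the intent of steps 1 to 3 concrete, here is a self-contained sketch of a registry that selects a builder by content type. It does not use the Axis2 API: BuilderRegistrySketch and PayloadBuilder are hypothetical stand-ins, the text builder assumes UTF-8 instead of honoring the charset parameter, and rejecting unknown content types is a choice made for this sketch that may differ from what TransportUtils.createSOAPMessage does.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical registry; a simplified stand-in for builder selection by content type.
final class BuilderRegistrySketch {

    /** Hypothetical stand-in for a message builder. */
    public interface PayloadBuilder {
        Object build(byte[] payload, String contentType);
    }

    private final Map<String, PayloadBuilder> builders = new HashMap<>();

    /** Registers a builder for a media type such as "text/plain". */
    public void register(String mediaType, PayloadBuilder builder) {
        builders.put(mediaType.toLowerCase(Locale.ROOT), builder);
    }

    /** Selects the builder from the content type and delegates the decoding to it. */
    public Object createMessage(byte[] payload, String contentType) {
        int semi = contentType.indexOf(';');
        String mediaType = (semi < 0 ? contentType : contentType.substring(0, semi))
                .trim().toLowerCase(Locale.ROOT);
        PayloadBuilder builder = builders.get(mediaType);
        if (builder == null) {
            // Fail immediately rather than trying other decodings.
            throw new IllegalArgumentException("No builder registered for " + mediaType);
        }
        return builder.build(payload, contentType);
    }

    /** Registers the two default builders described in step 2. */
    public static BuilderRegistrySketch withDefaults() {
        BuilderRegistrySketch registry = new BuilderRegistrySketch();
        // Simplification: always decodes as UTF-8, ignoring any charset parameter.
        registry.register("text/plain",
                (payload, contentType) -> new String(payload, StandardCharsets.UTF_8));
        // Binary payloads are passed through as a defensive copy.
        registry.register("application/octet-stream",
                (payload, contentType) -> payload.clone());
        return registry;
    }

    public static void main(String[] args) {
        BuilderRegistrySketch registry = withDefaults();
        byte[] payload = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(registry.createMessage(payload, "text/plain; charset=UTF-8"));
    }
}
```

A custom builder for another format would be added with one more register call, which is the extension point the current try-XML-then-text-then-binary strategy lacks.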

The proposed work plan is as follows:
* For release 1.2, implement 1 and 2 as well as 3 for the VFS transport. This makes it possible to completely resolve issue SYNAPSE-261.
* For release 1.3, implement 3 and 4 for all remaining transports (for JMS and AMQP after SYNAPSE-303 has been handled).



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@synapse.apache.org
For additional commands, e-mail: dev-help@synapse.apache.org


[jira] Commented: (SYNAPSE-304) BaseUtils uses incorrect strategy to distinguish between XML, text and binary

Posted by "Andreas Veithen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SYNAPSE-304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12595905#action_12595905 ] 

Andreas Veithen commented on SYNAPSE-304:
-----------------------------------------

The way the VFS transport determines the content type needs review. VFSTransportListener#startListeningForService treats the content type as a mandatory service parameter (see usage of getRequiredServiceParam), while VFSTransportListener#processFile defines some fallback mechanisms (such as looking at the file name suffix) for the case where it is not specified as a service parameter...

I will leave it like that for 1.2, but for 1.3 we should sort this out.


[jira] Resolved: (SYNAPSE-304) BaseUtils uses incorrect strategy to distinguish between XML, text and binary

Posted by "Andreas Veithen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SYNAPSE-304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Veithen resolved SYNAPSE-304.
-------------------------------------

    Resolution: Fixed

The mail, VFS and JMS transports now all use message builders, and the code in BaseUtils#setSOAPEnvelope (and related methods) has been moved to AMQPUtils (the AMQP transport being the last transport using it). Since SYNAPSE-303 has been deferred, the present issue can be considered resolved.


[jira] Commented: (SYNAPSE-304) BaseUtils uses incorrect strategy to distinguish between XML, text and binary

Posted by "Andreas Veithen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SYNAPSE-304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12595908#action_12595908 ] 

Andreas Veithen commented on SYNAPSE-304:
-----------------------------------------

Things in scope for the 1.2 release have been implemented. Items planned for 1.3 will be implemented later.
