Posted to java-dev@axis.apache.org by "Sharmin Choksey [comcast]" <sd...@comcast.net> on 2003/10/23 16:27:49 UTC

SOAP message parsing redundancy with doc-lit services.

This is a follow-up to the thread posted at http://www.mail-archive.com/axis-user@xml.apache.org/msg09663.html, subject "Accessing the XML in the SOAP message body", posted Feb-03.

In reference to that message, I would like to know whether any work has been done to address the parsing of the XML payload sent in the SOAP body by the AXIS runtime.  I see a design issue here with respect to performance, since parsing the XML payload in a doc-lit style service adds no value or functionality to AXIS-related tasks.  I see that the Message object can store the SOAP body element in different forms (input stream, string, bytes, etc.).  Why isn't the content of the SOAP body element left alone instead of being parsed?

Doc-lit services fail to scale beyond 5 concurrent users with 150K payloads, and response times are poor. In contrast, sending the XML payload within CDATA, where the parser does not treat the contents as markup, scales far better, with sub-second response times.
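
To illustrate the difference described above, here is a minimal sketch using the standard JAXP SAX API rather than anything Axis-specific. The class name, method name and sample documents are made up for illustration. It counts the element events a parser reports for the same payload sent inline and sent inside a CDATA section.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class CdataEventCount {

    /** Counts the startElement events a SAX parser reports for the given XML. */
    public static int countElements(String xml) throws Exception {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                count[0]++;
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical payloads: the same order document, inline and CDATA-wrapped.
        String inline  = "<Body><order><item/><item/></order></Body>";
        String wrapped = "<Body><![CDATA[<order><item/><item/></order>]]></Body>";

        System.out.println(countElements(inline));   // 4: Body, order, item, item
        System.out.println(countElements(wrapped));  // 1: only Body; the rest is character data
    }
}
```

The CDATA-wrapped payload reaches the handler as plain character data, so no element objects are built for it. The receiver still has to parse that text itself if it needs the structure.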

Is there an alternative, other than using attachments, where the payload will not be parsed? Please advise.

-Sharmin.

Re: SOAP message parsing redundancy with doc-lit services.

Posted by Steve Loughran <st...@iseran.com>.
Bill de hÓra wrote:


> 
> No doubt, but even so, it's *still* not an Axis issue. To avoid parsing 
> the body as an optimization, Axis would need to do a (weird) 
> preprocessing of the XML to strip out the body before handing the 
> remaining envelope off to the XML parser.
> 
> I really do think SwA is where you should start, even though it seems to 
> be deprecated in favour of embedded payloads - but that strikes me as 
> fashion more than sense.


SwA1.1 still sticks the binaries at the end, it just pretends to embed 
them.

The advantage is that it lets things like message signing apply to the 
attachments as well as the body.


Re: SOAP message parsing redundancy with doc-lit services.

Posted by Bill de hÓra <de...@eircom.net>.
Sharmin Choksey [comcast] wrote:

> I understand that it is part of the SOAP XML structure, but the parsing done
> by Axis is inefficient because it generates intermediate DOM objects, which
> are a major cause of performance degradation. They are also redundant, since
> there are other mechanisms for getting data out of a payload, and checking
> well-formedness should be left to the business implementation.

No doubt, but even so, it's *still* not an Axis issue. To avoid 
parsing the body as an optimization, Axis would need to do a (weird) 
preprocessing of the XML to strip out the body before handing 
the remaining envelope off to the XML parser.

I really do think SwA is where you should start, even though it 
seems to be deprecated in favour of embedded payloads - but that 
strikes me as fashion more than sense.


> CDATA is not even to be considered an option, since the payload will not
> support character sets outside ISO8859 during the parse routines of a UTF-8
> charset.

Yep, good catch.


Re: SOAP message parsing redundancy with doc-lit services.

Posted by "Sharmin Choksey [comcast]" <sd...@comcast.net>.
> Because the body element and its contents are inside the envelope's
> root element. That is, this is a SOAP/XML issue, not an Axis issue.
> And if you're sending XML, you should parse it to ensure
> well-formedness or you'll break your receiver.

I understand that it is part of the SOAP XML structure, but the parsing done
by Axis is inefficient because it generates intermediate DOM objects, which
are a major cause of performance degradation. They are also redundant, since
there are other mechanisms for getting data out of a payload, and checking
well-formedness should be left to the business implementation.

> The MIME people got this right a long time ago. If you don't want to
> parse a SOAP payload, don't put it in a SOAP envelope. I would use
> SwA, or multipart mime. CDATA is a hack.
>

CDATA is not even to be considered an option, since the payload will not
support character sets outside ISO8859 during the parse routines of a UTF-8
charset.



Re: SOAP message parsing redundancy with doc-lit services.

Posted by Bill de hÓra <de...@eircom.net>.
Sharmin Choksey [comcast] wrote:


> In reference to the message posted, I am specifically interested in 
> knowing if there has been any development done in addressing the issue 
> around parsing of the XML payload sent in the SOAP body by the AXIS 
> runtime.  I see a design issue here with respect to performance since 
> parsing of the xml payload in a doc-lit style service does not add any 
> value/functionality in performing AXIS related tasks.  I see that the 
> Message object has the ability to store different formats of the SOAP 
> body element, inputstream, string, bytes etc.  Why isn't the content of 
> the SOAP body element left alone instead of parsing it.

Because the body element and its contents are inside the envelope's 
root element. That is, this is a SOAP/XML issue, not an Axis issue. 
And if you're sending XML, you should parse it to ensure 
well-formedness or you'll break your receiver.


> Is there an alternative other than using attachments ? where the payload 
> will not be parsed ? Please advise.

The MIME people got this right a long time ago. If you don't want to 
parse a SOAP payload, don't put it in a SOAP envelope. I would use 
SwA, or multipart mime. CDATA is a hack.
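
As a rough illustration of that layout, the sketch below builds a multipart/related body by hand with plain string concatenation. It is not Axis's attachment API. The class name, the `doc` reference element and the Content-ID values are invented for the example. The point is only that the SOAP envelope parser sees a small envelope with a `cid:` reference, while the payload travels untouched in its own MIME part.

```java
public class SwaMessageSketch {

    private static final String CRLF = "\r\n";

    /** Value for the HTTP Content-Type header of the whole message. */
    public static String contentType(String boundary) {
        return "multipart/related; type=\"text/xml\"; "
             + "start=\"<envelope@example.com>\"; boundary=\"" + boundary + "\"";
    }

    /** Builds the body: the SOAP envelope as the root part, the payload as a second part. */
    public static String build(String boundary, String payloadXml) {
        // The envelope only references the payload by Content-ID; "doc" is a made-up element.
        String envelope =
              "<soapenv:Envelope xmlns:soapenv=\"http://schemas.xmlsoap.org/soap/envelope/\">"
            + "<soapenv:Body><doc href=\"cid:payload@example.com\"/></soapenv:Body>"
            + "</soapenv:Envelope>";

        return "--" + boundary + CRLF
             + "Content-Type: text/xml; charset=UTF-8" + CRLF
             + "Content-ID: <envelope@example.com>" + CRLF
             + CRLF
             + envelope + CRLF
             + "--" + boundary + CRLF
             + "Content-Type: application/xml" + CRLF
             + "Content-ID: <payload@example.com>" + CRLF
             + CRLF
             + payloadXml + CRLF          // carried as-is, outside the envelope
             + "--" + boundary + "--" + CRLF;
    }

    public static void main(String[] args) {
        String boundary = "MIME_boundary";
        System.out.println("Content-Type: " + contentType(boundary));
        System.out.println();
        System.out.print(build(boundary, "<order><item/></order>"));
    }
}
```

In practice the SOAP toolkit's own attachment support would produce this structure. The hand-built version is only meant to show where the payload sits relative to the envelope.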


Re: SOAP message parsing redundancy with doc-lit services.

Posted by "Sharmin Choksey [comcast]" <sd...@comcast.net>.
Please see my comments, marked *****.

 Have you found any solution on this topic?

***** : The current solution points towards using SOAP with Attachments (SwA) over MIME, since the message carried in the attachment part will not be parsed or processed in any way by the AXIS engine implementation.

  We are using message style. The problems we found are:

  1. AXIS parses the message inside the SOAP body element but does not validate it. The only use of this is the ability to replay SAX events.
      We want to validate our XML messages. Do you know any way to validate such a message without initializing another SAX parser?

***** : I'm afraid not. To the best of my knowledge, you will need to re-initialize the SAX parser.

  2. If we use the following two message style signatures:
        public Element [] method(Element [] bodies);
        public SOAPBodyElement [] method (SOAPBodyElement [] bodies);

   In order to validate the XML message with a SAX parser, the service provider 
   has to use the toString method to convert the array of elements into a string, then feed it 
   into an InputSource. The toString method seems to double memory usage. Is that true? We are worried
   that we will run into memory problems. Plus, initiating a new SAX parser makes the already 
   started AXIS SAX parser obsolete. It seems very inefficient. 


***** : It's the DOM tree-like representation in memory, constructed by the AXIS engine implementation when the message is handed off to the service endpoint API (Element []), that should be the cause of memory concerns.  Initializing a SAX parser to parse and validate the payload has no significant impact on memory, since the parser is event-based and the XML tree is not held in memory.
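
A minimal sketch of such a separate SAX validation pass follows, assuming the payload is already available as a string and the schema is a W3C XML Schema. It uses the standard JAXP `SAXParserFactory.setSchema` and `javax.xml.validation` classes, which may postdate the Axis versions discussed here. The class and method names are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class PayloadValidator {

    /** Returns true if payloadXml is well-formed and valid against the given XSD text. */
    public static boolean isValid(String payloadXml, String xsd) throws Exception {
        Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(xsd)));

        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setSchema(schema);   // validate while streaming, no tree is built

        // DefaultHandler ignores validation errors by default, so rethrow them.
        DefaultHandler failFast = new DefaultHandler() {
            @Override
            public void error(SAXParseException e) throws SAXException {
                throw e;
            }
        };

        try {
            factory.newSAXParser()
                   .parse(new InputSource(new StringReader(payloadXml)), failFast);
            return true;
        } catch (SAXException e) {
            return false;            // not well-formed, or not valid against the schema
        }
    }
}
```

The serialized string is still a second copy of the payload, but the validating pass itself only holds the parser's current state, not a tree of the whole document.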

Good luck,
- Sharmin.







Re: SOAP message parsing redundancy with doc-lit services.

Posted by Jinghua Gu <eg...@cisco.com>.
Hi,

   Have you found any solution on this topic?

   We are using message style. The problems we found are:

   1. AXIS parses the message inside the SOAP body element but does not
validate it. The only use of this is the ability to replay SAX events.
       We want to validate our XML messages. Do you know any way to validate
such a message without initializing another SAX parser?

   2. If we use the following two message style signatures:
         public Element [] method(Element [] bodies);
         public SOAPBodyElement [] method (SOAPBodyElement [] bodies);

    In order to validate the XML message with a SAX parser, the service
    provider has to use the toString method to convert the array of elements
    into a string, then feed it into an InputSource. The toString method
    seems to double memory usage. Is that true? We are worried that we will
    run into memory problems. Plus, initiating a new SAX parser makes the
    already started AXIS SAX parser obsolete. It seems very inefficient.

  Please advise.

Thanks,
Emily






