Posted to c-dev@axis.apache.org by Gordon Brown <go...@yahoo.com> on 2009/06/05 23:15:42 UTC

Re: soap in client call contains gabage character -- A critical bug in guththila writer

OK, since no one replied to my question, I debugged the code and found that guththila has a buffer-management bug that shows up when it serializes the axiom tree (the SOAP structure) before the request is actually sent, and I have a potential fix. I think this is a really critical bug, so I hope some developers can take a look at it. I am attaching the test input data and a code snippet to reproduce the problem.

Basically, the bug occurs in guththila_xml_writer.c. The guththila_xml_writer (I call it the SOAP serializer) dynamically maintains an array of buffers as it writes the SOAP structure into them. The bug occurs in the following situation:

Let's say I have an element <ns1:doDeleteFirst>12345</ns1:doDeleteFirst> somewhere in the SOAP structure. Many other elements precede it, so when guththila_xml_writer tries to process this element, the first buffer is ALMOST full. It does not have enough space to write the whole start tag <ns1:doDeleteFirst>, so the writer has to create a new buffer: it writes <ns1: at the end of the first buffer (leaving a few bytes empty) and writes "doDeleteFirst" at the very beginning of the second buffer.

The first buffer (Buffer length 16384):
--------------------------------------------------------------------------
|**************************************************<ns1:--|

The second buffer (Buffer length 32768):
---------------------------------------------------------------------------------------------------------------------------
|doDeleteFirst-------------------------------------------------------------------------------------------------------------|
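The layout described above can be reproduced with a small toy model. This is only a sketch: the names buff, cur_buff and pre_tot_data come from the macro quoted in this thread, while toy_buffer, toy_init, toy_write, the cap and used fields, and the tiny 16/32-byte capacities are assumptions made for illustration and are not guththila's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model, NOT guththila's real structure. Only buff, cur_buff and
   pre_tot_data are names taken from the macro quoted in this thread. */
enum { TOY_BUFFS = 2, TOY_FIRST_CAP = 16, TOY_SECOND_CAP = 32 };

typedef struct {
    char   store0[TOY_FIRST_CAP];   /* backing storage for buffer 0 */
    char   store1[TOY_SECOND_CAP];  /* backing storage for buffer 1 */
    char  *buff[TOY_BUFFS];         /* start of each buffer */
    size_t cap[TOY_BUFFS];          /* capacity of each buffer (assumed field) */
    size_t used[TOY_BUFFS];         /* bytes written per buffer (assumed field) */
    int    cur_buff;                /* index of the buffer being written */
    size_t pre_tot_data;            /* bytes stored in buffers before cur_buff */
} toy_buffer;

/* Initialize in place; buff[] points into the struct's own storage. */
void toy_init(toy_buffer *tb)
{
    memset(tb, 0, sizeof *tb);
    tb->buff[0] = tb->store0;  tb->cap[0] = TOY_FIRST_CAP;
    tb->buff[1] = tb->store1;  tb->cap[1] = TOY_SECOND_CAP;
}

/* Write one token whole. If it does not fit in the current buffer, move to
   the next buffer and leave the tail of the old one unused. Returns the
   global offset of the token, counted over all data written so far. */
size_t toy_write(toy_buffer *tb, const char *s)
{
    size_t len = strlen(s);
    size_t start;

    if (tb->used[tb->cur_buff] + len > tb->cap[tb->cur_buff]) {
        assert(tb->cur_buff + 1 < TOY_BUFFS);      /* toy model: no growth */
        tb->pre_tot_data += tb->used[tb->cur_buff];
        tb->cur_buff++;
    }
    assert(len <= tb->cap[tb->cur_buff] - tb->used[tb->cur_buff]);

    start = tb->pre_tot_data + tb->used[tb->cur_buff];
    memcpy(tb->buff[tb->cur_buff] + tb->used[tb->cur_buff], s, len);
    tb->used[tb->cur_buff] += len;
    return start;
}
```

Writing ten filler bytes and then "<", "ns1", ":" and "doDeleteFirst" leaves the prefix at global offset 11 in the first buffer, while the local name starts at global offset 15, which is byte 0 of the second buffer. That is the same split as in the diagram, at a smaller scale.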

The second buffer has now become the current buffer. When the writer tries to process the end tag (</ns1:doDeleteFirst>), it uses an elem stack to track the namespace prefix and local name, as in the following code (starting from line 1396):

          elem->name = guththila_tok_list_get_token(&wr->tok_list, env);
          elem->prefix = guththila_tok_list_get_token(&wr->tok_list, env);
          elem->name->start = GUTHTHILA_BUF_POS(wr->buffer, elem_start);
          elem->name->size = elem_len;
          elem->prefix->start = GUTHTHILA_BUF_POS(wr->buffer, elem_pref_start);
          elem->prefix->size = pref_len; 

The macro GUTHTHILA_BUF_POS is defined like this:

#ifndef GUTHTHILA_BUF_POS
#define GUTHTHILA_BUF_POS(_buffer, _pos) \
 ((_buffer).buff[(_buffer).cur_buff] + _pos - (_buffer).pre_tot_data)
#endif

The bug occurs when it calculates elem->prefix->start = GUTHTHILA_BUF_POS(wr->buffer, elem_pref_start):

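Here elem_pref_start has a value of 16375 and pre_tot_data has a value of 16379 (the first buffer's length is 16384). Both values were calculated from the first buffer's data, but the current buffer is the second one, so the macro computes the start of the second buffer plus 16375 - 16379 = -4. That is 4 bytes before the start of the second buffer, so elem->prefix->start points to garbage!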
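The arithmetic can be checked in isolation. The sketch below is an illustration, not guththila code: index_in_current and check_reported_values are hypothetical names, and the calculation is done on signed integers rather than pointers, because forming a pointer before the start of a buffer is undefined behavior in C.

```c
#include <assert.h>
#include <stddef.h>

/* Index, relative to the start of the current buffer, that the quoted macro
   effectively adds to buff[cur_buff]. Signed on purpose, so a position that
   belongs to an earlier buffer shows up as a negative index. */
ptrdiff_t index_in_current(size_t pos, size_t pre_tot_data)
{
    return (ptrdiff_t)pos - (ptrdiff_t)pre_tot_data;
}

/* Self-check with the values reported in this message. */
void check_reported_values(void)
{
    /* elem_pref_start = 16375, pre_tot_data = 16379:
       the prefix lands 4 bytes before the current buffer. */
    assert(index_in_current(16375, 16379) == -4);

    /* A position equal to pre_tot_data maps to byte 0 of the current
       buffer, which is where "doDeleteFirst" was written. */
    assert(index_in_current(16379, 16379) == 0);
}
```

If the local name does start at offset 16379, as the layout above suggests, its index is 0 and it resolves correctly. That would be consistent with the captured output, where only the prefix in the end tag is corrupted.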

I hope this makes sense to you. With my test case you will see this quickly. When you run it with the XML data I attached, first set a breakpoint at line 392 in the file guththila_xml_writer_wrapper, set its hit count to 514 in the breakpoint properties (the 514th element is <ns1:doDeleteFirst>), and then step through in the debugger.

The potential fix is to define GUTHTHILA_BUF_POS as the following:
 
     if ((_buffer)->pre_tot_data > _pos)
          return ((_buffer)->buff[(_buffer)->cur_buff-1] + _pos);
     else
          return ((_buffer)->buff[(_buffer)->cur_buff] + _pos - (_buffer)->pre_tot_data);
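The fix above is written with return statements, so it reads as a function body rather than as a macro expression. The sketch below shows it as a function over a toy structure, next to a more general lookup that walks the buffers. Everything here is an assumption made for illustration: toy_buffer, toy_pos_proposed, toy_pos_walk and the per-buffer used counts are hypothetical, and the walking version is a different technique from the proposed patch, not the change that was submitted.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model, NOT guththila's real structure. */
enum { TOY_BUFFS = 3 };

typedef struct {
    char  *buff[TOY_BUFFS];   /* start of each buffer */
    size_t used[TOY_BUFFS];   /* bytes written per buffer (assumed field) */
    int    cur_buff;          /* index of the buffer being written */
    size_t pre_tot_data;      /* bytes stored in buffers before cur_buff */
} toy_buffer;

/* The proposed fix, transcribed as a function. In this model it is right
   for a position in the previous buffer only when that buffer is the first
   one, because pos is added without subtracting the bytes before it. */
char *toy_pos_proposed(toy_buffer *b, size_t pos)
{
    if (b->pre_tot_data > pos)
        return b->buff[b->cur_buff - 1] + pos;
    return b->buff[b->cur_buff] + (pos - b->pre_tot_data);
}

/* Alternative sketch: walk the buffers and subtract the bytes held by each
   earlier buffer until the one containing pos is found. */
char *toy_pos_walk(toy_buffer *b, size_t pos)
{
    size_t base = 0;   /* global offset of the first byte of buffer i */
    int i;

    assert(b->cur_buff >= 0 && b->cur_buff < TOY_BUFFS);
    for (i = 0; i <= b->cur_buff; i++) {
        if (pos < base + b->used[i] || i == b->cur_buff)
            return b->buff[i] + (pos - base);
        base += b->used[i];
    }
    return NULL;       /* not reached when cur_buff is valid */
}
```

With two buffers both functions return the same pointer for a prefix that sits in the first buffer. Once a third buffer is current, toy_pos_proposed adds a global offset to buff[cur_buff-1] directly, so in this model it can miss the intended byte, while the walking version still resolves it.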

GUTHTHILA_BUF_POS is used everywhere, so I really hope some developer can take over this case and fix it!

Thanks!
Gordon




________________________________
From: Gordon Brown <go...@yahoo.com>
To: axis-c-user@ws.apache.org
Cc: axis-c-dev@ws.apache.org
Sent: Wednesday, June 3, 2009 12:49:21 AM
Subject: soap in client call contains gabage character -- Very very puzzling


Hi All,

I need urgent help with a very puzzling issue with axis2/c 1.6 (I built axis2/c from trunk, slightly before the official release). Here is my issue:

I have a small XML document (16K) passed in as a UTF-8 string. I checked that the XML is good (I ran it through quite a few other tools to verify it). I then used the axiom APIs to parse the XML and make the web service call like this:

=========
xml_reader = axiom_xml_reader_create_for_memory(_env,
    (void*)xmlString_in.c_str(), xmlString_in.size(), "utf-8",
    AXIS2_XML_PARSER_TYPE_BUFFER);
 
om_builder = axiom_stax_builder_create(_env, xml_reader);
axiom_document_t *document = axiom_stax_builder_get_document(om_builder, _env); 
axiom_node_t * payload = axiom_document_get_root_element(document, _env);
 
........
 axiom_node_t * node = axis2_svc_client_send_receive(_wsf_service_client, _env, payload );
============
 
Now when I use tcpmon to intercept the call, I notice that the data sent out contains some garbage characters (always in an XML tag, not in an element value), like this:
 
    <ns1:doDeleteFirst>12345</ù:doDeleteFirst>
 
However, if I serialize the payload node before I make the client call, I can see the data is fine in memory. What puzzles me even more is that this only occurs with one XML file I tried; many other XML inputs (even as big as 10M bytes) work fine. I've also attached the XML I used to reproduce the problem.
 
Does anyone have a clue about this?
 
Thanks much in advance!

Gordon


      

Re: soap in client call contains gabage character -- A critical bug in guththila writer

Posted by Gordon Brown <go...@yahoo.com>.
Hi Shankar, 

Thanks much for paying attention to this. My colleague Frank Zhou has already filed a JIRA report with the input data and testing code. It also contains the suggested fix. The JIRA issue is AXIS2C-1375.

We believe this is a blocker, as the SOAP message is in plain English and it is small (~16K). This affects everyone using guththila (now the default).

Thanks!
Gordon


________________________________
From: Uthaiyashankar <sh...@wso2.com>
To: Apache AXIS C User List <ax...@ws.apache.org>
Cc: axis-c-dev@ws.apache.org
Sent: Tuesday, June 9, 2009 7:34:29 AM
Subject: Re: soap in client call contains gabage character -- A critical bug in guththila writer

Hi Gordon,

I'll have a look. Can you create a jira and attach the patch and the test code?

Regards,
Shankar

On Mon, Jun 8, 2009 at 11:40 PM, Gordon Brown<go...@yahoo.com> wrote:
> Can anyone in the development team please take a look at this one bug in
> Guththila component?
> At least the potential fix I provided in this message thread?
>
> ======================
> The potential fix is to define GUTHTHILA_BUF_POS as the following:
>
>      if ((_buffer)->pre_tot_data > _pos)
>           return ((_buffer)->buff[(_buffer)->cur_buff-1] + _pos);
>      else
>           return ((_buffer)->buff[(_buffer)->cur_buff] + _pos -
> (_buffer)->pre_tot_data);
> ======================
> It is a problem in the buffer management, so without fixing this bug, users
> should not use guththila at this point.
>
> Thanks!
> Gordon



-- 
S.Uthaiyashankar
Software Architect
WSO2 Inc.
http://wso2.com/ - "The Open Source SOA Company"



      

Re: soap in client call contains gabage character -- A critical bug in guththila writer

Posted by Uthaiyashankar <sh...@wso2.com>.
Hi Gordon,

I'll have a look. Can you create a jira and attach the patch and the test code?

Regards,
Shankar

On Mon, Jun 8, 2009 at 11:40 PM, Gordon Brown<go...@yahoo.com> wrote:
> Can anyone in the development team please take a look at this one bug in
> Guththila component?
> At least the potential fix I provided in this message thread?
>
> ======================
> The potential fix is to define GUTHTHILA_BUF_POS as the following:
>
>      if ((_buffer)->pre_tot_data > _pos)
>           return ((_buffer)->buff[(_buffer)->cur_buff-1] + _pos);
>      else
>           return ((_buffer)->buff[(_buffer)->cur_buff] + _pos -
> (_buffer)->pre_tot_data);
> ======================
> It is a problem in the buffer management, so without fixing this bug, users
> should not use guththila at this point.
>
> Thanks!
> Gordon



-- 
S.Uthaiyashankar
Software Architect
WSO2 Inc.
http://wso2.com/ - "The Open Source SOA Company"

Re: soap in client call contains gabage character -- A critical bug in guththila writer

Posted by Gordon Brown <go...@yahoo.com>.
Can anyone in the development team please take a look at this one bug in Guththila component?
At least the potential fix I provided in this message thread?

======================
The potential fix is to define GUTHTHILA_BUF_POS as the following:
 
     if ((_buffer)->pre_tot_data > _pos)
          return ((_buffer)->buff[(_buffer)->cur_buff-1] + _pos);
     else
          return ((_buffer)->buff[(_buffer)->cur_buff] + _pos - (_buffer)->pre_tot_data);
======================

It is a problem in the buffer management, so without fixing this bug, users should not use guththila at this point.

Thanks!
Gordon

________________________________
From: Gordon Brown <go...@yahoo.com>
To: axis-c-dev@ws.apache.org; shankar@wso2.com; samisa@wso2.com
Cc: axis-c-user@ws.apache.org
Sent: Friday, June 5, 2009 2:15:42 PM
Subject: Re: soap in client call contains gabage character -- A critical bug in guththila writer


OK, since no one reply to my question, I have to debug the code and found out that guththila has a bug in managing buffer when seriazlize thea axiom tree (the soap structure) before actually send out the request, and I have a potential fix. This is really a critical bug I think, so I hope some developers can take a look at this problem. I am attaching the test input data and code snappet to reproduce the problem.

Basically, the bug occurs in guththila_xml_writer.c. The guththila_xml_writer (I call it the soap serializer) maintains an array of buffers dynamically when it writes the soap structure into the buffers. The bug will occur in the following situation: 

Let's say I have an element <ns1:doDeleteFirst>12345</ns1:doDeleteFirst> somewhere in the soap structure. Now before this element, there are lots of other elements, and when the  guththila_xml_writer  trys to process this element, the first buffer is ALMOST full, it does not have enough space to write the whole element name <ns1:doDeleteFirst> (the start tag) into the buffer, it has to create a new buffer, so it writes <ns1: at the end of the first buffer (still a few more bytes left empty), and writes "doDeleteFirst" at the very beginning of the second buffer.

The first buffer (Buffer length 16384):
--------------------------------------------------------------------------
|**************************************************<ns1:--|

The second buffer (Buffer length 32768):
---------------------------------------------------------------------------------------------------------------------------
|doDeleteFirst-------------------------------------------------------------------------------------------------------------|

As the second buffer becomes the current buffer, when the writer trys to process the end tag (</ns1:doDeleteFirst>),  it uses an elem stack to track the namespace prefix and localname as in the following code: (starting from line 1396)

          elem->name = guththila_tok_list_get_token(&wr->tok_list, env);
          elem->prefix = guththila_tok_list_get_token(&wr->tok_list, env);
          elem->name->start = GUTHTHILA_BUF_POS(wr->buffer, elem_start);
          elem->name->size = elem_len;
          elem->prefix->start = GUTHTHILA_BUF_POS(wr->buffer, elem_pref_start);
          elem->prefix->size = pref_len; 

The macro GUTHTHILA_BUF_POS  is defined as this:

#ifndef GUTHTHILA_BUF_POS
#define GUTHTHILA_BUF_POS(_buffer, _pos) 
 ((_buffer).buff[(_buffer).cur_buff] + _pos - (_buffer).pre_tot_data)
#endif

The bug occurs when it calcuate elem->prefix->start = GUTHTHILA_BUF_POS(wr->buffer, elem_pref_start):

The elem_pref_start has a value of 16375, the pre_tot_data has a value of 16379 (the first buffer length is 16384), they are calculated based on the first buffer data, but the current buffer is the second one, so  elem->prefix->start points to gabage!

I hope this makes sense to you. Use my test case you will see this quickly. When you run the same XML data I attached, first set a break point at line 392 in the file guththila_xml_writer_wrapper, and set the hit count as 514 in the break properties (the 514th element in <ns1:doDeleteFirst>), then debug step by step.

The potential fix is to define GUTHTHILA_BUF_POS as the following:
 
     if ((_buffer)->pre_tot_data > _pos)
          return ((_buffer)->buff[(_buffer)->cur_buff-1] + _pos);
     else
          return ((_buffer)->buff[(_buffer)->cur_buff] + _pos - (_buffer)->pre_tot_data);

GUTHTHILA_BUF_POS is used everywhere, so I really hope some developer can take over this case and fix it!

Thanks!
Gordon




________________________________
From: Gordon Brown <go...@yahoo.com>
To: axis-c-user@ws.apache.org
Cc: axis-c-dev@ws.apache.org
Sent: Wednesday, June 3, 2009 12:49:21 AM
Subject: soap in client call contains gabage character -- Very very puzzling


Hi All,

I need urgent help with a very puzzling issue with axis2/c 1.6 (I built axis2/c using the code from trunk, slightly before the official release). Here is my issue:

I have a small piece of XML data (16K) passed in as a UTF-8 string. I checked that the XML data is good (I ran it through quite a few other tools to verify it). I then used the axiom APIs to parse the XML and make a web service call like this:

=========
xml_reader = axiom_xml_reader_create_for_memory(_env, (void*)xmlString_in.c_str(), xmlString_in.size(),
        "utf-8", AXIS2_XML_PARSER_TYPE_BUFFER);
 
om_builder = axiom_stax_builder_create(_env, xml_reader);
axiom_document_t *document = axiom_stax_builder_get_document(om_builder, _env);
axiom_node_t * payload = axiom_document_get_root_element(document, _env);
 
..........
 axiom_node_t * node = axis2_svc_client_send_receive(_wsf_service_client, _env, payload );
============
 
When I use tcpmon to intercept the call, I notice that the data sent out contains some garbage characters (always in an XML tag, not in the element value), like this:
 
    <ns1:doDeleteFirst>12345</ù:doDeleteFirst>
 
However, if I serialize the payload node before I make the client call, I can see the data is fine in memory. What puzzles me even more is that this only occurs with one XML file I tried, but works fine for many other XML inputs (even as big as 10M bytes). I've also attached the XML I used to reproduce the problem.
 
Does anyone have a clue about this?
 
Thanks much in advance!

Gordon


      

Re: soap in client call contains gabage character -- A critical bug in guththila writer

Posted by Gordon Brown <go...@yahoo.com>.
Can anyone on the development team please take a look at this bug in the Guththila component?
Or at least at the potential fix I provided in this message thread?

======================
The potential fix is to define GUTHTHILA_BUF_POS as follows:
 
     if ((_buffer)->pre_tot_data > _pos)
          return ((_buffer)->buff[(_buffer)->cur_buff-1] + _pos);
     else
          return ((_buffer)->buff[(_buffer)->cur_buff] + _pos - (_buffer)->pre_tot_data);
======================

It is a problem in the buffer management, so until this bug is fixed, users should not use guththila.

Thanks!
Gordon
