Posted to c-dev@axis.apache.org by "Bill Mitchell (JIRA)" <ji...@apache.org> on 2007/12/29 21:03:44 UTC

[jira] Created: (AXIS2C-859) guththila parser fails to handle escape sequences for ampersand, less than, greater than

guththila parser fails to handle escape sequences for ampersand, less than, greater than
----------------------------------------------------------------------------------------

                 Key: AXIS2C-859
                 URL: https://issues.apache.org/jira/browse/AXIS2C-859
             Project: Axis2-C
          Issue Type: Bug
          Components: guththila
    Affects Versions: Current (Nightly)
         Environment: Windows XP, Visual Studio 2005, guththila parser, libcurl
            Reporter: Bill Mitchell


When an incoming message contains the escaped ampersand sequence "&amp;" within text, the sequence is passed to the client as raw text instead of being converted to a single ampersand character.  Clearly, this conversion must take place at the level of the parser, as only the parser knows whether it is seeing plain text, where conversion is required, or text embedded in a CDATA section, where conversion is not allowed.  I have tested the build with the libxml parser, and the libxml parser behaves correctly: the text passed to the client contains only the single ampersand character, not the escaped sequence.  (See section 2.4 of the XML 1.0 spec.)

Looking at the code, I expect the same problem occurs with all escape sequences (less than and greater than as well as ampersand), on both input and output.  I also don't see where CDATA sections are handled, but as I am not seeing CDATA in the messages from the service I am hitting, I have not tested this case.
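For illustration, here is a minimal C sketch of the conversion the report asks for in ordinary text content, limited to the five predefined entities of XML 1.0 (section 4.6). The function name decode_predefined_entities is hypothetical, not guththila's API; numeric references such as &#38; are not handled, and CDATA content would simply never be passed to it.

```c
#include <string.h>

/* Hypothetical helper, not guththila code: decode the five predefined
 * entity references in a NUL-terminated string, in place. The result is
 * never longer than the input, so a read pointer and a write pointer
 * walking the same buffer are enough. */
static void decode_predefined_entities(char *s)
{
    static const struct { const char *ref; char ch; } entities[] = {
        { "&amp;", '&' }, { "&lt;", '<' }, { "&gt;", '>' },
        { "&quot;", '"' }, { "&apos;", '\'' }
    };
    const size_t n = sizeof entities / sizeof entities[0];
    char *r = s, *w = s;

    while (*r != '\0') {
        if (*r == '&') {
            size_t i;
            for (i = 0; i < n; i++) {
                size_t len = strlen(entities[i].ref);
                /* strncmp stops at the terminating NUL, so a truncated
                 * reference at the end of the string cannot overrun it. */
                if (strncmp(r, entities[i].ref, len) == 0) {
                    *w++ = entities[i].ch;
                    r += len;
                    break;
                }
            }
            if (i < n)
                continue;   /* a reference was replaced */
        }
        *w++ = *r++;        /* ordinary character, or an unmatched '&' */
    }
    *w = '\0';
}
```

Applied to "a &amp; b" it yields "a & b", which is what the libxml parser delivers to the client.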

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: axis-c-dev-unsubscribe@ws.apache.org
For additional commands, e-mail: axis-c-dev-help@ws.apache.org


Re: [jira] Created: (AXIS2C-859) guththila parser fails to handle escape sequences for ampersand, less than, greater than

Posted by Senaka Fernando <se...@wso2.com>.
> guththila parser fails to handle escape sequences for ampersand, less
> than, greater than
> ----------------------------------------------------------------------------------------
>
>                  Key: AXIS2C-859
>                  URL: https://issues.apache.org/jira/browse/AXIS2C-859
>              Project: Axis2-C
>           Issue Type: Bug
>           Components: guththila
>     Affects Versions: Current (Nightly)
>          Environment: Windows XP, Visual Studio 2005, guththila parser,
> libcurl
>             Reporter: Bill Mitchell

I'm +1 for a fix. It is the parser's responsibility to figure this out.

Regards,
Senaka



[jira] Updated: (AXIS2C-859) guththila parser fails to handle escape sequences for ampersand, less than, greater than

Posted by "Bill Mitchell (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AXIS2C-859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bill Mitchell updated AXIS2C-859:
---------------------------------

    Attachment:     (was: diff_2.txt)

> guththila parser fails to handle escape sequences for ampersand, less than, greater than
> ----------------------------------------------------------------------------------------
>
>                 Key: AXIS2C-859
>                 URL: https://issues.apache.org/jira/browse/AXIS2C-859
>             Project: Axis2-C
>          Issue Type: Bug
>          Components: guththila
>    Affects Versions: Current (Nightly)
>         Environment: Windows XP, Visual Studio 2005, guththila parser, libcurl
>            Reporter: Bill Mitchell
>         Attachments: diff.txt, diff_1.txt, diff_2.txt, guththila_xml_writer.diff


[jira] Commented: (AXIS2C-859) guththila parser fails to handle escape sequences for ampersand, less than, greater than

Posted by "Bill Mitchell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AXIS2C-859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12563745#action_12563745 ] 

Bill Mitchell commented on AXIS2C-859:
--------------------------------------

Thanks for picking up this issue, Lahiru.  I was thinking about starting to look at it in detail myself.    

Examining the patch, I have a couple of thoughts.

First, you allocate a block of memory, escape_char, to hold the copy of the token up to this point.  But the size of the block is sizeof(char *), the size of a pointer to char, which is only 4 bytes on a typical 32-bit machine.  So this size will frequently not be enough to copy all the characters preceding the escaped character.
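The sizing problem is the familiar mistake of applying sizeof to a pointer. A minimal sketch of the corrected allocation, assuming a hypothetical helper copy_token_prefix that copies the part of a (not NUL-terminated) token preceding the escaped character:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy the first prefix_len characters of a token
 * (which is not NUL-terminated) into a freshly allocated string.
 *
 * Buggy pattern:  char *copy = malloc(sizeof(copy));
 *   sizeof(copy) is sizeof(char *), e.g. 4 on a 32-bit build and 8 on a
 *   64-bit one, no matter how many characters must be stored. */
static char *copy_token_prefix(const char *token, size_t prefix_len)
{
    char *copy = malloc(prefix_len + 1);   /* room for the terminator */
    if (copy == NULL)
        return NULL;
    memcpy(copy, token, prefix_len);
    copy[prefix_len] = '\0';
    return copy;
}
```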

It occurs to me that guththila goes to a lot of effort to avoid allocating memory.  Having worked on some issues recently in the buffer management code, I would propose a different solution: moving the data within the buffer itself.  Although the obvious solution would be to replace the escaped sequence with the intended character and slide the remainder of the buffer down, this could be time-consuming.  A clever idea might be to replace the escaped sequence by placing the intended character at the end of the sequence, then copy the characters before it up from the start of the token, moving the token start up and reducing its size.  In most cases this would not be a large amount of data to move, and it avoids the memory allocation entirely.
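A rough sketch of the "copy up" idea, assuming a hypothetical token_view struct (a start pointer plus a length, not guththila's actual token type): the decoded character is written into the last byte of the reference, and only the bytes of the token that precede it are moved.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical token view: size bytes starting at start, not
 * NUL-terminated. Not guththila's real token struct. */
struct token_view {
    char  *start;
    size_t size;
};

/* Write the replacement character into the LAST byte of the entity
 * reference, shift the bytes that precede the reference up to meet it,
 * and advance the token start. Nothing after the token is moved. */
static void decode_by_copying_up(struct token_view *tok)
{
    static const struct { const char *ref; char ch; } entities[] = {
        { "&amp;", '&' }, { "&lt;", '<' }, { "&gt;", '>' },
        { "&quot;", '"' }, { "&apos;", '\'' }
    };
    size_t i = 0, e;

    while (i < tok->size) {
        if (tok->start[i] == '&') {
            for (e = 0; e < sizeof entities / sizeof entities[0]; e++) {
                size_t len = strlen(entities[e].ref);
                if (len <= tok->size - i &&
                    memcmp(tok->start + i, entities[e].ref, len) == 0) {
                    size_t shift = len - 1;
                    tok->start[i + shift] = entities[e].ch;
                    /* Move the i bytes before the reference up by shift. */
                    memmove(tok->start + shift, tok->start, i);
                    tok->start += shift;
                    tok->size  -= shift;
                    break;
                }
            }
        }
        i++;   /* start[i] is now the decoded char, or an ordinary byte */
    }
}
```

Each replacement moves only the prefix of the token, which is typically short; the data after the token in the input buffer is never touched.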

Second, I like where you chose to insert this code, in the token_close logic.  Although I can imagine trying to make this part of guththila_next, where it could massage the buffer contents while it was deciding where the token boundaries are, it seems best to leave that logic deciding where the edges of the tokens are without changing the characters inside the tokens.  

Third, looking at examples of character escaping in various texts, it appears that escaped character sequences can occur both in text and in attribute values.  So this logic either needs to be duplicated, which is not pretty, or pushed down into a lower-level shared routine.

Fourth, you inserted this logic in the _char_data: case.  It appears to me from the XML documentation that we are supposed to replace sequences in text, but not in comments.  guththila_next() seems to confuse this issue, as it treats them both as _char_data.  To distinguish the two, my guess is it would be better to define a new token type, rather than cheat and look at the m->guththila_event to tell them apart.  A new token type might point the direction to solving the CDATA problem, whenever that gets approached.  Maybe use _char_data for the raw char data, without processing, and a new _text_data for char data that undergoes processing of entity sequences.  
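One way to express the proposed split, assuming hypothetical enum names (TOK_CHAR_DATA and TOK_TEXT_DATA stand in for the suggested _char_data and _text_data, and TOK_ATTR_VALUE for attribute values; none are guththila's real values), is to let the token type alone decide whether entity replacement runs:

```c
#include <stdbool.h>

/* Hypothetical token kinds modelled on the suggestion above. */
enum token_kind {
    TOK_CHAR_DATA,    /* raw: comment bodies, later CDATA sections */
    TOK_TEXT_DATA,    /* element text content: replace references */
    TOK_ATTR_VALUE    /* attribute values: replace references too */
};

/* Decide in one place whether entity replacement applies, instead of
 * peeking at the parser's current event to tell comments from text. */
static bool needs_entity_replacement(enum token_kind kind)
{
    switch (kind) {
    case TOK_TEXT_DATA:
    case TOK_ATTR_VALUE:
        return true;
    case TOK_CHAR_DATA:
    default:
        return false;
    }
}
```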

Fifth, when checking the following characters after the ampersand, it would be best to check first that enough characters are left in the token, before looking at the characters themselves and perhaps falling off the end of the buffer.  
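A minimal sketch of that check, assuming a hypothetical helper match_entity that receives the token's end pointer: the length of each candidate reference is compared against the bytes remaining in the token before any bytes are compared.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: p points at an '&' inside a token that ends at
 * end (one past the last byte; the token is not NUL-terminated).
 * Returns the length of the predefined entity reference at p and stores
 * its character in *out, or returns 0 if there is none. */
static size_t match_entity(const char *p, const char *end, char *out)
{
    static const struct { const char *ref; char ch; } entities[] = {
        { "&amp;", '&' }, { "&lt;", '<' }, { "&gt;", '>' },
        { "&quot;", '"' }, { "&apos;", '\'' }
    };
    size_t avail = (size_t)(end - p);
    size_t i;

    for (i = 0; i < sizeof entities / sizeof entities[0]; i++) {
        size_t len = strlen(entities[i].ref);
        /* Make sure the whole reference fits in the token BEFORE
         * comparing bytes, so we never read past end. */
        if (len <= avail && memcmp(p, entities[i].ref, len) == 0) {
            *out = entities[i].ch;
            return len;
        }
    }
    return 0;
}
```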

Of course, I'm relatively new to this logic, so these are just my observations.  

> guththila parser fails to handle escape sequences for ampersand, less than, greater than
> ----------------------------------------------------------------------------------------
>
>                 Key: AXIS2C-859
>                 URL: https://issues.apache.org/jira/browse/AXIS2C-859
>             Project: Axis2-C
>          Issue Type: Bug
>          Components: guththila
>    Affects Versions: Current (Nightly)
>         Environment: Windows XP, Visual Studio 2005, guththila parser, libcurl
>            Reporter: Bill Mitchell
>         Attachments: diff.txt


[jira] Updated: (AXIS2C-859) guththila parser fails to handle escape sequences for ampersand, less than, greater than

Posted by "Lahiru Gunathilake (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AXIS2C-859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lahiru Gunathilake updated AXIS2C-859:
--------------------------------------

    Attachment: diff_1.txt

Hi Bill,
I've implemented this patch according to your fourth and fifth suggestions. I'm unsure how to implement support for escape characters in attribute values. We could simply reuse the code we use for text values, but as you said, that's not pretty, and I'm not sure how to move it into a lower-level routine.
The other question is performance. I ran a benchmark with this patch and saw no performance hit. Any ideas?

regs
lahiru

> guththila parser fails to handle escape sequences for ampersand, less than, greater than
> ----------------------------------------------------------------------------------------
>
>                 Key: AXIS2C-859
>                 URL: https://issues.apache.org/jira/browse/AXIS2C-859
>             Project: Axis2-C
>          Issue Type: Bug
>          Components: guththila
>    Affects Versions: Current (Nightly)
>         Environment: Windows XP, Visual Studio 2005, guththila parser, libcurl
>            Reporter: Bill Mitchell
>         Attachments: diff.txt, diff_1.txt


[jira] Commented: (AXIS2C-859) guththila parser fails to handle escape sequences for ampersand, less than, greater than

Posted by "Bill Mitchell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AXIS2C-859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12564199#action_12564199 ] 

Bill Mitchell commented on AXIS2C-859:
--------------------------------------

Lahiru, after yesterday's discussion I re-read the XML spec, and it says that character and entity references in an attribute value (here, the namespace URI) are replaced to produce the normalized value.  So it seems we have to do this character replacement on the attribute value string before we process it as a possible namespace declaration.  Just another extra wrinkle.
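A minimal sketch of that ordering, with hypothetical helper names (decode_entities, handle_attribute): references in the attribute value are replaced first, and only then is the attribute tested as a namespace declaration, so the recorded URI is the replaced value. Full attribute-value normalization (XML 1.0 section 3.3.3) also maps whitespace characters to spaces, which this sketch leaves out.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: replace the five predefined entity references in a
 * NUL-terminated string, in place. Numeric references are not handled. */
static void decode_entities(char *s)
{
    static const char *const refs[] = { "&amp;", "&lt;", "&gt;", "&quot;", "&apos;" };
    static const char chars[] = { '&', '<', '>', '"', '\'' };
    char *w = s;

    while (*s != '\0') {
        size_t i = 0, len = 0;
        if (*s == '&') {
            for (; i < 5; i++) {
                len = strlen(refs[i]);
                if (strncmp(s, refs[i], len) == 0)
                    break;
            }
        }
        if (*s == '&' && i < 5) {
            *w++ = chars[i];      /* matched reference */
            s += len;
        } else {
            *w++ = *s++;          /* ordinary char or unmatched '&' */
        }
    }
    *w = '\0';
}

/* Hypothetical attribute hook: normalize the value first, then check for a
 * namespace declaration, so *ns_uri receives the replaced value. */
static bool handle_attribute(const char *name, char *value, const char **ns_uri)
{
    decode_entities(value);
    if (strcmp(name, "xmlns") == 0 || strncmp(name, "xmlns:", 6) == 0) {
        *ns_uri = value;
        return true;
    }
    return false;
}
```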

My "second" item above alluded to a different solution, built into guththila_next() instead of guththila_token_close().  One could imagine, in the "right" loops in guththila_next where we are looking at the characters one at a time anyway, detecting the leading ampersand, checking the next 3 to 5 characters against the predefined XML entity references, and replacing the character there, again as above sliding the leading part of the token to abut the single replacement character.  This would avoid a second pass over the token characters looking for ampersands, but I suspect it would make guththila_next() much harder to understand than it already is.  So my second point above was just to say that I think you have chosen the better approach: handling XML character entities in guththila_token_close(), well separated from the token parsing in guththila_next().

> guththila parser fails to handle escape sequences for ampersand, less than, greater than
> ----------------------------------------------------------------------------------------
>
>                 Key: AXIS2C-859
>                 URL: https://issues.apache.org/jira/browse/AXIS2C-859
>             Project: Axis2-C
>          Issue Type: Bug
>          Components: guththila
>    Affects Versions: Current (Nightly)
>         Environment: Windows XP, Visual Studio 2005, guththila parser, libcurl
>            Reporter: Bill Mitchell
>         Attachments: diff.txt


[jira] Updated: (AXIS2C-859) guththila parser fails to handle escape sequences for ampersand, less than, greater than

Posted by "Dinesh Premalal (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AXIS2C-859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dinesh Premalal updated AXIS2C-859:
-----------------------------------

    Fix Version/s: 1.3.0

> guththila parser fails to handle escape sequences for ampersand, less than, greater than
> ----------------------------------------------------------------------------------------
>
>                 Key: AXIS2C-859
>                 URL: https://issues.apache.org/jira/browse/AXIS2C-859
>             Project: Axis2-C
>          Issue Type: Bug
>          Components: guththila
>    Affects Versions: Current (Nightly)
>         Environment: Windows XP, Visual Studio 2005, guththila parser, libcurl
>            Reporter: Bill Mitchell
>             Fix For: 1.3.0
>
>         Attachments: diff.txt, diff_1.txt, diff_2.txt, guththila_xml_writer.diff


[jira] Updated: (AXIS2C-859) guththila parser fails to handle escape sequences for ampersand, less than, greater than

Posted by "Bill Mitchell (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AXIS2C-859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bill Mitchell updated AXIS2C-859:
---------------------------------

    Attachment: guththila_xml_writer.diff
                diff_2.txt

Lahiru, looking at the code again, I now agree that you were right to replace the character by sliding the token data down.  I was under the mistaken impression that the code was sliding all the rest of the buffer down; as long as we are sliding only within the token, from one end or the other, there is no reason not to do the obvious slide down.
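The slide-down version, sketched under the same kind of assumption (a hypothetical token_span with a start pointer and a length, not guththila's real token type): the decoded character replaces the '&', and the remainder of the token, not the remainder of the buffer, is moved down over the reference.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical length-delimited token, not guththila's actual type. */
struct token_span {
    char  *start;
    size_t size;
};

/* Write the decoded character where the '&' was and move the rest of THIS
 * token down over the reference. Only bytes inside the token move; the
 * input buffer beyond the token's end is left alone. */
static void decode_by_sliding_down(struct token_span *tok)
{
    static const struct { const char *ref; char ch; } entities[] = {
        { "&amp;", '&' }, { "&lt;", '<' }, { "&gt;", '>' },
        { "&quot;", '"' }, { "&apos;", '\'' }
    };
    size_t i, e;

    for (i = 0; i < tok->size; i++) {
        if (tok->start[i] != '&')
            continue;
        for (e = 0; e < sizeof entities / sizeof entities[0]; e++) {
            size_t len = strlen(entities[e].ref);
            if (len <= tok->size - i &&
                memcmp(tok->start + i, entities[e].ref, len) == 0) {
                tok->start[i] = entities[e].ch;
                memmove(tok->start + i + 1, tok->start + i + len,
                        tok->size - i - len);
                tok->size -= len - 1;
                break;
            }
        }
    }
}
```

Because the scan resumes after the decoded character, input such as "&amp;lt;" correctly becomes the literal text "&lt;" rather than "<".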

In the attached diff_2.txt, I moved the code that performs the replacement into a lower-level routine.  As guththila_token_close has constructed a temp token in both the text case and the attribute value case, it is easy to perform replacement on this temp token string before further processing of the attribute for a namespace declaration.  Beware that the line number where we change the token type to _text may be different in yours; my version includes changes for AXIS2C-933 that Supun wants to review before they are applied.

Separately, the attached guththila_xml_writer.diff is a patch for the other side of this issue: inserting escape sequences into outgoing messages whose text includes an ampersand or a greater-than character.
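On the writer side, a minimal sketch of the escaping, assuming a hypothetical escape_text helper with snprintf-style sizing rather than the real guththila_xml_writer functions. In character data XML 1.0 requires '&' and '<' to be escaped (section 2.4); escaping '>' as well is always permitted.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical writer helper: escape '&', '<' and '>' in text content.
 * Returns the number of bytes the escaped text needs (excluding the NUL)
 * and, like snprintf, writes at most cap-1 bytes plus a NUL into out.
 * Escaping '>' is optional in XML but avoids an accidental "]]>". */
static size_t escape_text(const char *text, char *out, size_t cap)
{
    size_t n = 0;

    for (; *text != '\0'; text++) {
        char one[2] = { *text, '\0' };
        const char *rep;
        switch (*text) {
        case '&': rep = "&amp;"; break;
        case '<': rep = "&lt;";  break;
        case '>': rep = "&gt;";  break;
        default:  rep = one;     break;
        }
        for (; *rep != '\0'; rep++, n++)
            if (n + 1 < cap)
                out[n] = *rep;
    }
    if (cap > 0)
        out[n < cap ? n : cap - 1] = '\0';
    return n;
}
```

Calling it first with a zero capacity returns the size needed, so a caller can allocate exactly once before escaping.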

With both fixes installed, I was able to see ampersand data characters from the client arrive at the server intact, and vice versa.  

> guththila parser fails to handle escape sequences for ampersand, less than, greater than
> ----------------------------------------------------------------------------------------
>
>                 Key: AXIS2C-859
>                 URL: https://issues.apache.org/jira/browse/AXIS2C-859
>             Project: Axis2-C
>          Issue Type: Bug
>          Components: guththila
>    Affects Versions: Current (Nightly)
>         Environment: Windows XP, Visual Studio 2005, guththila parser, libcurl
>            Reporter: Bill Mitchell
>         Attachments: diff.txt, diff_1.txt, diff_2.txt, guththila_xml_writer.diff


[jira] Updated: (AXIS2C-859) guththila parser fails to handle escape sequences for ampersand, less than, greater than

Posted by "Lahiru Gunathilake (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AXIS2C-859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lahiru Gunathilake updated AXIS2C-859:
--------------------------------------

    Attachment: diff.txt

Guththila now parses escape characters. Please review the patch file carefully before applying it, since it may reduce the performance of the Guththila parser.

> guththila parser fails to handle escape sequences for ampersand, less than, greater than
> ----------------------------------------------------------------------------------------
>
>                 Key: AXIS2C-859
>                 URL: https://issues.apache.org/jira/browse/AXIS2C-859
>             Project: Axis2-C
>          Issue Type: Bug
>          Components: guththila
>    Affects Versions: Current (Nightly)
>         Environment: Windows XP, Visual Studio 2005, guththila parser, libcurl
>            Reporter: Bill Mitchell
>         Attachments: diff.txt


[jira] Updated: (AXIS2C-859) guththila parser fails to handle escape sequences for ampersand, less than, greater than

Posted by "Bill Mitchell (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AXIS2C-859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bill Mitchell updated AXIS2C-859:
---------------------------------

    Attachment: diff_2.txt

> guththila parser fails to handle escape sequences for ampersand, less than, greater than
> ----------------------------------------------------------------------------------------
>
>                 Key: AXIS2C-859
>                 URL: https://issues.apache.org/jira/browse/AXIS2C-859
>             Project: Axis2-C
>          Issue Type: Bug
>          Components: guththila
>    Affects Versions: Current (Nightly)
>         Environment: Windows XP, Visual Studio 2005, guththila parser, libcurl
>            Reporter: Bill Mitchell
>         Attachments: diff.txt, diff_1.txt, diff_2.txt, guththila_xml_writer.diff


[jira] Resolved: (AXIS2C-859) guththila parser fails to handle escape sequences for ampersand, less than, greater than

Posted by "Bill Mitchell (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AXIS2C-859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bill Mitchell resolved AXIS2C-859.
----------------------------------

    Resolution: Fixed
      Assignee: Bill Mitchell

I checked in Lahiru's and my fixes to guththila_xml_parser and guththila_xml_writer in svn rev 618378.

> guththila parser fails to handle escape sequences for ampersand, less than, greater than
> ----------------------------------------------------------------------------------------
>
>                 Key: AXIS2C-859
>                 URL: https://issues.apache.org/jira/browse/AXIS2C-859
>             Project: Axis2-C
>          Issue Type: Bug
>          Components: guththila
>    Affects Versions: Current (Nightly)
>         Environment: Windows XP, Visual Studio 2005, guththila parser, libcurl
>            Reporter: Bill Mitchell
>            Assignee: Bill Mitchell
>             Fix For: 1.3.0
>
>         Attachments: diff.txt, diff_1.txt, diff_2.txt, guththila_xml_writer.diff


[jira] Commented: (AXIS2C-859) guththila parser fails to handle escape sequences for ampersand, less than, greater than

Posted by "Lahiru Gunathilake (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AXIS2C-859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12563814#action_12563814 ] 

Lahiru Gunathilake commented on AXIS2C-859:
-------------------------------------------

Thanks for the comments, Bill. Here are my thoughts on them:

First: this is very similar to what I have done; the only difference is that your method involves no memory allocation. If it's better to avoid the extra allocation, I will do it that way.

Second: this is not very clear to me. If you could explain a bit more, that would be great.

Third, fourth, fifth: +1 for all three ideas. I'll implement them that way.

Regards
Lahiru


> guththila parser fails to handle escape sequences for ampersand, less than, greater than
> ----------------------------------------------------------------------------------------
>
>                 Key: AXIS2C-859
>                 URL: https://issues.apache.org/jira/browse/AXIS2C-859
>             Project: Axis2-C
>          Issue Type: Bug
>          Components: guththila
>    Affects Versions: Current (Nightly)
>         Environment: Windows XP, Visual Studio 2005, guththila parser, libcurl
>            Reporter: Bill Mitchell
>         Attachments: diff.txt