Posted to dev@santuario.apache.org by Ling Xiaohan <li...@cn.fujitsu.com> on 2008/10/16 07:34:32 UTC

Attribute normalization !!

Hi,

    I am using Apache XMLdsig (1.4.2) to canonicalize an XML file.

    The W3C Recommendation "Canonical XML" says that "Attribute
values are normalized, as if by a validating processor".

    Section 3.3.3, Attribute-Value Normalization, of the XML 1.1
Recommendation says that "If the attribute type is not CDATA,
then the XML processor MUST further process the normalized
attribute value by discarding any leading and trailing space (#x20)
characters, and by replacing sequences of space (#x20) characters
by a single space (#x20) character".
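
    As a rough illustration of that extra step, here is a minimal Java
sketch of the rule as quoted. It is not code from XMLdsig or from any
parser; the class and method names are made up for the example, and it
assumes the value has already been through the first normalization
pass, so only #x20 characters are touched.

```java
// Hypothetical helper, written only to illustrate the quoted rule.
class NonCdataNormalizer {

    // Applies the extra step for non-CDATA attribute types:
    // drop leading and trailing #x20, then collapse runs of #x20.
    static String normalize(String value) {
        // Remove leading and trailing space (#x20) characters only.
        String trimmed = value.replaceAll("^\\x20+|\\x20+$", "");
        // Replace each run of two or more spaces with a single space.
        return trimmed.replaceAll("\\x20{2,}", " ");
    }

    public static void main(String[] args) {
        System.out.println("[" + normalize("   abc               abc   ") + "]");
    }
}
```

    Other whitespace such as tabs is deliberately left alone here,
because the quoted text speaks only of space (#x20) characters.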

    When I input an XML fragment containing an ordinary attribute
(not explicitly declared as CDATA) such as
    ...
    <a attr="   abc               abc   ">
    ...
the canonicalized output is still
    ...
    <a attr="   abc               abc   ">
    ...
The leading and trailing spaces are not removed, and the run of
spaces between the two "abc" values is not replaced with a single space.

Could anyone tell me why?
Thank you very much.

______________________________________________________________________________________________
nolen

Re: Attribute normalization !!

Posted by Ling Xiaohan <li...@cn.fujitsu.com>.
I see. XMLdsig probably handles this case correctly, because an attribute
with no declaration is treated as CDATA. The example in my original message
contains an undeclared attribute.
  ----- Original Message ----- 
  From: Jesse Pelton
  To: security-dev@xml.apache.org
  Sent: Thursday, October 16, 2008 9:54 PM
  Subject: RE: Attribute normalization !!


  According to section 3.3.1, "XML attribute types are of three kinds: a
  string type, a set of tokenized types, and enumerated types," and string
  types are CDATA. In addition, section 3.3.3 says, "All attributes for which
  no declaration has been read SHOULD be treated by a non-validating processor
  as if declared CDATA."

  If you have declared your attribute to be one of the tokenized types or an
  enumerated type, normalization should collapse whitespace, and the behavior
  you describe sounds like a bug. Otherwise, normalization should not collapse
  whitespace.

RE: Attribute normalization !!

Posted by Jesse Pelton <js...@PKC.com>.
According to section 3.3.1
<http://www.w3.org/TR/2006/REC-xml11-20060816/#sec-attribute-types>,
"XML attribute types are of three kinds: a string type, a set of
tokenized types, and enumerated types," and string types are CDATA. In
addition, section 3.3.3 says, "All attributes for which no declaration
has been read SHOULD be treated by a non-validating processor as if
declared CDATA."
 
If you have declared your attribute to be one of the tokenized types or
an enumerated type, normalization should collapse whitespace, and the
behavior you describe sounds like a bug. Otherwise, normalization should
not collapse whitespace.
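
One way to see the difference is to parse the same attribute value twice,
once with no declaration and once with a DTD internal subset that declares
it as a tokenized type. The Java sketch below does that with the standard
JAXP DOM parser. It is only an illustration under stated assumptions: the
class name, the helper method, and the choice of NMTOKENS are made up for
the example, and it does not call the XMLdsig canonicalizer at all. Since
canonicalization works on the parsed document, the value the parser reports
is a reasonable proxy for what would be canonicalized.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of Apache XML Security.
class AttrNormalizationDemo {

    // Same attribute value as in the original question, with no declaration.
    static final String UNDECLARED =
        "<a attr=\"   abc               abc   \"/>";

    // Same value, but "attr" is declared as a tokenized type (assumed NMTOKENS).
    static final String DECLARED =
        "<!DOCTYPE a [\n"
      + "  <!ELEMENT a EMPTY>\n"
      + "  <!ATTLIST a attr NMTOKENS #IMPLIED>\n"
      + "]>\n"
      + "<a attr=\"   abc               abc   \"/>";

    // Parses the XML text and returns the value of "attr" on the root element.
    static String attrValue(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getAttribute("attr");
        } catch (Exception e) {
            // Wrap checked parser exceptions to keep the demo short.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Brackets make leading and trailing spaces visible.
        System.out.println("undeclared: [" + attrValue(UNDECLARED) + "]");
        System.out.println("declared:   [" + attrValue(DECLARED) + "]");
    }
}
```

With the parser bundled in the JDK, I would expect the undeclared attribute
to keep all of its spaces and the declared one to come back as "abc abc",
which matches the reading of sections 3.3.1 and 3.3.3 above.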
