Posted to c-dev@xerces.apache.org by Scott Cantor <ca...@osu.edu> on 2004/10/13 04:39:06 UTC

Base64 validator even more strict?

I've been running into problems for a long time with the way datatype
validation of base64Binary is done, but now it's becoming critical. ;-(

If I'm understanding correctly, the 2.6.0 changes now require datatype
normalization to be on (I note the default is off) in order to get very
typical base64 content to validate. One example is signed XML: most XMLSig
libraries will almost always include extra linefeeds between the element
tags and the values:

<SignatureValue>
Base64stuff
</SignatureValue>

With normalization, the extra linefeeds are first turned into spaces, and
the new code then strips them before the data is validated. Without it, the
new, stricter code fails because it leaves the leading linefeed in place and
then computes the byte count as a non-multiple of 4.

The problem here is that datatype normalization often *cannot* be used if:

- you're signing content that includes base64 and the initial signature is
computed over unnormalized data
- you're intending to process the base64 with, say, OpenSSL, which chokes on
the base64 data if it's in a collapsed form without linefeeds

Basically, XML-Security + Xerces 2.6.0 + schema validation is now close to
incompatible.

The Xerces-J code works because it's much less strict about whitespace
during initial validation, so leaving normalization off is a viable
workaround for these kinds of signature issues.

I guess I'm just scrambling for advice here. I can't quite see how this
works anymore in practice. I can't exactly file a bug. This is technically
correct behavior. It's just not usable behavior.

-- Scott


---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-c-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-c-dev-help@xml.apache.org


RE: Base64 validator even more strict?

Posted by Scott Cantor <ca...@osu.edu>.
>     In order to be consistent with XercesJ (and that is actually what
> XercesJ does now), the approach is to expose the normalized or
> unnormalized value in the DOM based on the DOM L3 datatype-normalization
> flag, while always normalizing the internal representation. In your
> case, base64 data containing linefeeds and multiple whitespace
> characters would be validated and accepted as long as the normalized
> base64 is valid.

Ah, now I understand, thanks. The data type validator can work on a
normalized version even if the DOM is left unnormalized. That makes sense.

-- Scott


RE: Base64 validator even more strict?

Posted by PeiYong Zhang <pe...@ca.ibm.com>.
Scott,

    In order to be consistent with XercesJ (and that is actually what
XercesJ does now), the approach is to expose the normalized or
unnormalized value in the DOM based on the DOM L3 datatype-normalization
flag, while always normalizing the internal representation. In your case,
base64 data containing linefeeds and multiple whitespace characters would
be validated and accepted as long as the normalized base64 is valid.

Rgds,
PeiYong

In reply to: "Scott Cantor" <ca...@osu.edu>, 10/22/2004 12:14 PM
To: <xe...@xml.apache.org>, please respond to xerces-c-dev
Subject: RE: Base64 validator even more strict?


RE: Base64 validator even more strict?

Posted by Scott Cantor <ca...@osu.edu>.
> The current behavior needs to change so that the flag has an effect on 
> DOM content only (more precisely, element content only. Attribute
> normalization is always done). 

Sorry if I'm being obtuse, but this is fairly critical for my library, so
I'm trying to be clear...

As far as I know, the current behavior regarding normalization is fully
correct. It normalizes when true and not when false. The fact that
normalizing breaks signatures is just a nasty side effect of the schema spec
and the canonicalization specs not mixing well. It's bad, but it can't
really be helped, unless the normalized values are exposed through a
different API from the DOM, since the xml-security code will use the DOM.

But, the problem in 2.6.0 is that the base64 datatype validator *cannot*
accept unnormalized values anymore. It did before. IMHO, this isn't
technically a bug. But it does make 2.6.0 unusable in my application and I
have to distribute my own source for Xerces to get around this (I also
needed a fix for the unfixed xml:lang bug, so I patched that too).

Xerces-J is different. They are not following the letter of the spec, I
suppose. But it also functions. Sometimes that's the overriding concern as a
programmer.

So I guess I'm pleading for Xerces-C to be consistent with Xerces-J, but if
not, I'll have to work around the problem for now and later just stop
validating. Or parse twice, but that won't be desirable.

-- Scott


RE: Base64 validator even more strict?

Posted by PeiYong Zhang <pe...@ca.ibm.com>.
Scott,

    Yes, the DOM content gets normalized as well, and therefore you can't
get your raw data back.

    The current behavior needs to change so that the flag has an effect on
DOM content only (more precisely, element content only. Attribute
normalization is always done).

From the DOM Level 3 Core DOMConfiguration parameters:

"datatype-normalization"
  true
    [optional] Expose schema normalized values in the tree, such as XML
    Schema normalized values in the case of XML Schema. Since this
    parameter requires to have schema information, the "validate"
    parameter will also be set to true. Having this parameter activated
    when "validate" is false has no effect and no schema-normalization
    will happen.
    Note: Since the document contains the result of the XML 1.0
    processing, this parameter does not apply to attribute value
    normalization as defined in section 3.3.3 of [XML 1.0] and is only
    meant for schema languages other than Document Type Definition (DTD).
  false
    [required] (default) Do not perform schema normalization on the tree.

Rgds,
PeiYong

In reply to: "Scott Cantor" <ca...@osu.edu>, 10/21/2004 04:22 PM
To: <xe...@xml.apache.org>, please respond to xerces-c-dev
Subject: RE: Base64 validator even more strict?

RE: Base64 validator even more strict?

Posted by Scott Cantor <ca...@osu.edu>.
>    Yes, this flag shall not affect content validity (in this case, the
> base64 data), and this behavior is to be changed. 

Thank you, if this gets fixed, I owe you a beverage of your choice!

>     However, I noticed that in the DOM tree, the content
> remains unnormalized regardless of the flag (of course this is not
> correct either). I think you can set the normalization on to
> get your base64 through and in the meantime still have the
> unnormalized data from the DOM tree. Can you elaborate more on this?

My code suggests this isn't the case. When I turn the datatype normalize flag
on in my parser object, it does in fact change the base64 data placed into
the DOM tree (which is certainly expected behavior). When I grab the text
node's value, it's definitely normalized.

I know this because when I set the flag on (which as you say, does get it
past the 2.6 validator), OpenSSL chokes. I pass the node values into OpenSSL
calls to create X509 objects, for example, and the missing linefeeds cause
the decoder there to complain.

The original issue I used to have that caused me to run with normalization
off was that signature computation broke. So that also suggests that it is
changing the data in the DOM.

If you're seeing different behavior, can you send me a test program to try?

-- Scott


Re: Base64 validator even more strict?

Posted by PeiYong Zhang <pe...@ca.ibm.com>.
Scott,

   Yes, this flag shall not affect content validity (in this case, the
base64 data), and this behavior is to be changed.

    However, I noticed that in the DOM tree, the content remains
unnormalized regardless of the flag (of course this is not correct
either). I think you can set the normalization on to get your base64
through and in the meantime still have the unnormalized data from the
DOM tree. Can you elaborate more on this? Thanks.

Rgds,
PeiYong

In reply to: "Scott Cantor" <ca...@osu.edu>, 10/12/2004 10:39 PM
To: <xe...@xml.apache.org>, please respond to xerces-c-dev
Subject: Base64 validator even more strict?