You are viewing a plain text version of this content. The canonical link for it is here.
Posted to c-users@xerces.apache.org by Amit K <kl...@gmail.com> on 2012/07/19 23:00:35 UTC
question regarding XMLUri::isValidRegistryBasedAuthority
Hi,
I am using the C++ distribution of xerces 2.7, the implementation of
the function in question in it is:
bool XMLUri::isValidRegistryBasedAuthority(const XMLCh* const authority,
const int authLen)
{
// check authority int index = 0;
while (index < authLen)
{
if (isUnreservedCharacter(authority[index]) ||
(XMLString::indexOf(REG_NAME_CHARACTERS, authority[index]) != -1))
{
index++;
}
else if (authority[index] == chPercent) // '%' {
*if (XMLString::isHex(authority[index+1]) && // 1st
hex XMLString::isHex(authority[index+2]) ) // 2nd
hex index +=3;*
else
return false;
}
else
return false;
} //while
return true;
}
I've boldened the lines which I want to further discuss here.
and also note that I've seen that the implementation is the same in
the latest version of xerces.
However, I noticed that the Java implementation is different in
relation to the same lines:
/**
+ * Determines whether the given string is a registry based authority.
+ *
+ * @param authority the authority component of a URI
+ *
+ * @return true if the given string is a registry based authority
+ */
+ private boolean isValidRegistryBasedAuthority(String authority) {
+ int index = 0;
+ int end = authority.length();
+ char testChar;
+
+ while (index < end) {
+ testChar = authority.charAt(index);
+
+ // check for valid escape sequence
+ if (testChar == '%') {
+ *if (index+2 >= end ||
+ !isHex(authority.charAt(index+1)) ||
+ !isHex(authority.charAt(index+2))) {*
+ return false;
+ }
+ index += 2;
+ }
+ // can check against path characters because the set
+ // is the same except for '/' which we've already excluded.
+ else if (!isPathCharacter(testChar)) {
+ return false;
+ }
+ ++index;
+ }
+ return true;
+ }
The important difference is of course, the bounds check on the string
/ character array. Why is it omitted in the C++ version ?
I thought it might be because the string can be assume to be null
terminated in the C++ version.. but I can't be sure whether it's just
a bug or not.
I would thank your reply
Sincerely,
Amit
Re: question regarding XMLUri::isValidRegistryBasedAuthority
Posted by Amit K <kl...@gmail.com>.
Hi Alberto,
Thank you for your quick reply ! I'll apply the same patch in order to
spare the same confusion from other people in the future might they bump
into the same code :)
Cheers,
Amit
On Fri, Jul 20, 2012 at 10:15 PM, Alberto Massari <
Alberto.Massari@progress.com> wrote:
> Hi Amit,
> thanks for spotting this; even if the code was not going to crash (as the
> first character after the authority is still in the allocated memory and
> must be a / or a NULL, and both would have failed the isHex test, avoiding
> that the second isHex would access memory after the end of the string) I
> added the extra check to make it clear.
>
> Thanks,
> Alberto
>
> Il 19/07/2012 23:00, Amit K ha scritto:
>
>> Hi,
>>
>>
>> I am using the C++ distribution of xerces 2.7, the implementation of
>> the function in question in it is:
>>
>>
>> bool XMLUri::**isValidRegistryBasedAuthority(**const XMLCh* const
>> authority,
>>
>> const int authLen)
>> {
>> // check authority int index = 0;
>> while (index < authLen)
>> {
>> if (isUnreservedCharacter(**authority[index]) ||
>> (XMLString::indexOf(REG_NAME_**CHARACTERS,
>> authority[index]) != -1))
>> {
>> index++;
>> }
>> else if (authority[index] == chPercent) // '%'
>> {
>> *if (XMLString::isHex(authority[**index+1]) && // 1st
>>
>> hex XMLString::isHex(authority[**index+2]) ) // 2nd
>> hex index +=3;*
>>
>> else
>> return false;
>> }
>> else
>> return false;
>> } //while
>> return true;
>> }
>>
>>
>> I've boldened the lines which I want to further discuss here.
>>
>> and also note that I've seen that the implementation is the same in
>> the latest version of xerces.
>>
>> However, I noticed that the Java implementation is different in
>> relation to the same lines:
>>
>> /**
>> + * Determines whether the given string is a registry based
>> authority.
>> + *
>> + * @param authority the authority component of a URI
>> + *
>> + * @return true if the given string is a registry based authority
>> + */
>> + private boolean isValidRegistryBasedAuthority(**String authority) {
>> + int index = 0;
>> + int end = authority.length();
>> + char testChar;
>> +
>> + while (index < end) {
>> + testChar = authority.charAt(index);
>> +
>> + // check for valid escape sequence
>> + if (testChar == '%') {
>> + *if (index+2 >= end ||
>> + !isHex(authority.charAt(index+**1)) ||
>> + !isHex(authority.charAt(index+**2))) {*
>>
>> + return false;
>> + }
>> + index += 2;
>> + }
>> + // can check against path characters because the set
>> + // is the same except for '/' which we've already excluded.
>> + else if (!isPathCharacter(testChar)) {
>> + return false;
>> + }
>> + ++index;
>> + }
>> + return true;
>> + }
>>
>>
>> The important difference is of course, the bounds check on the string
>> / character array. Why is it omitted in the C++ version ?
>>
>> I thought it might be because the string can be assume to be null
>> terminated in the C++ version.. but I can't be sure whether it's just
>> a bug or not.
>>
>> I would thank your reply
>>
>>
>> Sincerely,
>>
>> Amit
>> .
>>
>>
>
Re: question regarding XMLUri::isValidRegistryBasedAuthority
Posted by Alberto Massari <Al...@progress.com>.
Hi Amit,
thanks for spotting this; even if the code was not going to crash (as
the first character after the authority is still in the allocated memory
and must be a / or a NULL, and both would have failed the isHex test,
avoiding that the second isHex would access memory after the end of the
string) I added the extra check to make it clear.
Thanks,
Alberto
Il 19/07/2012 23:00, Amit K ha scritto:
> Hi,
>
>
> I am using the C++ distribution of xerces 2.7, the implementation of
> the function in question in it is:
>
>
> bool XMLUri::isValidRegistryBasedAuthority(const XMLCh* const authority,
>
> const int authLen)
> {
> // check authority int index = 0;
> while (index < authLen)
> {
> if (isUnreservedCharacter(authority[index]) ||
> (XMLString::indexOf(REG_NAME_CHARACTERS, authority[index]) != -1))
> {
> index++;
> }
> else if (authority[index] == chPercent) // '%' {
> *if (XMLString::isHex(authority[index+1]) && // 1st
> hex XMLString::isHex(authority[index+2]) ) // 2nd
> hex index +=3;*
> else
> return false;
> }
> else
> return false;
> } //while
> return true;
> }
>
>
> I've boldened the lines which I want to further discuss here.
>
> and also note that I've seen that the implementation is the same in
> the latest version of xerces.
>
> However, I noticed that the Java implementation is different in
> relation to the same lines:
>
> /**
> + * Determines whether the given string is a registry based authority.
> + *
> + * @param authority the authority component of a URI
> + *
> + * @return true if the given string is a registry based authority
> + */
> + private boolean isValidRegistryBasedAuthority(String authority) {
> + int index = 0;
> + int end = authority.length();
> + char testChar;
> +
> + while (index < end) {
> + testChar = authority.charAt(index);
> +
> + // check for valid escape sequence
> + if (testChar == '%') {
> + *if (index+2 >= end ||
> + !isHex(authority.charAt(index+1)) ||
> + !isHex(authority.charAt(index+2))) {*
> + return false;
> + }
> + index += 2;
> + }
> + // can check against path characters because the set
> + // is the same except for '/' which we've already excluded.
> + else if (!isPathCharacter(testChar)) {
> + return false;
> + }
> + ++index;
> + }
> + return true;
> + }
>
>
> The important difference is of course, the bounds check on the string
> / character array. Why is it omitted in the C++ version ?
>
> I thought it might be because the string can be assume to be null
> terminated in the C++ version.. but I can't be sure whether it's just
> a bug or not.
>
> I would thank your reply
>
>
> Sincerely,
>
> Amit
> .
>