Posted to c-users@xerces.apache.org by Amit K <kl...@gmail.com> on 2012/07/19 23:00:35 UTC

question regarding XMLUri::isValidRegistryBasedAuthority

Hi,


I am using the C++ distribution of Xerces 2.7. The implementation of
the function in question there is:


bool XMLUri::isValidRegistryBasedAuthority(const XMLCh* const authority,
                                           const int authLen)
{
    // check authority
    int index = 0;
    while (index < authLen)
    {
        if (isUnreservedCharacter(authority[index]) ||
            (XMLString::indexOf(REG_NAME_CHARACTERS, authority[index]) != -1))
        {
            index++;
        }
        else if (authority[index] == chPercent)               // '%'
        {
            if (XMLString::isHex(authority[index+1]) &&     // 1st hex
                XMLString::isHex(authority[index+2])  )     // 2nd hex
                index +=3;
            else
                return false;
        }
        else
            return false;
    } //while
    return true;
}


The lines I want to discuss further are the two isHex() checks in the
'%' branch.

Note also that the implementation is the same in the latest version
of Xerces.

However, the Java implementation differs in those same lines:

/**
 * Determines whether the given string is a registry based authority.
 *
 * @param authority the authority component of a URI
 *
 * @return true if the given string is a registry based authority
 */
private boolean isValidRegistryBasedAuthority(String authority) {
  int index = 0;
  int end = authority.length();
  char testChar;

  while (index < end) {
    testChar = authority.charAt(index);

    // check for valid escape sequence
    if (testChar == '%') {
      if (index+2 >= end ||
          !isHex(authority.charAt(index+1)) ||
          !isHex(authority.charAt(index+2))) {
        return false;
      }
      index += 2;
    }
    // can check against path characters because the set
    // is the same except for '/' which we've already excluded.
    else if (!isPathCharacter(testChar)) {
      return false;
    }
    ++index;
  }
  return true;
}


The important difference is, of course, the bounds check on the
string/character array (index+2 >= end). Why is it omitted in the C++
version?

I thought it might be because the string can be assumed to be
null-terminated in the C++ version, but I can't be sure whether it's
just a bug or not.
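To make the difference concrete, here is a minimal standalone sketch (not the actual Xerces code, and not the patch mentioned later in the thread) that ports the C++ loop to plain char strings, once as posted and once with the Java-style bounds check added. The helpers isHexChar and isRegNameChar are hypothetical stand-ins for XMLString::isHex, isUnreservedCharacter and REG_NAME_CHARACTERS; their character sets are assumed from the RFC 2396 reg_name rule.

```cpp
#include <cassert>
#include <cctype>
#include <cstring>

// Hypothetical stand-in for XMLString::isHex.
static bool isHexChar(char c) {
    return std::isxdigit(static_cast<unsigned char>(c)) != 0;
}

// Hypothetical stand-in for isUnreservedCharacter() plus REG_NAME_CHARACTERS:
// alphanumerics, RFC 2396 "mark" characters and the extra reg_name characters.
static bool isRegNameChar(char c) {
    if (std::isalnum(static_cast<unsigned char>(c)))
        return true;
    // strchr would match the terminator itself, so exclude '\0' explicitly.
    return c != '\0' && std::strchr("-_.!~*'()$,;:@&=+", c) != nullptr;
}

// Mirrors the C++ loop as posted: reads authority[index+1] and [index+2]
// without checking them against authLen.
bool validUnchecked(const char* authority, int authLen) {
    assert(authority != nullptr && authLen >= 0);
    int index = 0;
    while (index < authLen) {
        if (isRegNameChar(authority[index])) {
            index++;
        } else if (authority[index] == '%') {
            if (isHexChar(authority[index + 1]) &&
                isHexChar(authority[index + 2]))
                index += 3;
            else
                return false;
        } else {
            return false;
        }
    }
    return true;
}

// Same loop with the Java-style bounds check: both hex digits of the
// escape must lie inside the authority before they are read.
bool validChecked(const char* authority, int authLen) {
    assert(authority != nullptr && authLen >= 0);
    int index = 0;
    while (index < authLen) {
        if (isRegNameChar(authority[index])) {
            index++;
        } else if (authority[index] == '%') {
            if (index + 2 < authLen &&
                isHexChar(authority[index + 1]) &&
                isHexChar(authority[index + 2]))
                index += 3;
            else
                return false;
        } else {
            return false;
        }
    }
    return true;
}
```

For the authority "reg%20name" followed by "/path" (authLen = 10) both functions return true, and for "ab%" followed by "/" (authLen = 3) both return false. The versions only disagree when the data right after the authority happens to be hex digits: with the buffer "ab%4F" and authLen = 3, the unchecked loop reads indices 3 and 4, past authLen, and accepts "ab%", while the checked loop rejects it.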

I would appreciate your reply.


Sincerely,

Amit

Re: question regarding XMLUri::isValidRegistryBasedAuthority

Posted by Amit K <kl...@gmail.com>.
Hi Alberto,

Thank you for your quick reply! I'll apply the same patch to spare
other people the same confusion, should they run into this code in
the future :)

Cheers,
Amit

On Fri, Jul 20, 2012 at 10:15 PM, Alberto Massari <Alberto.Massari@progress.com> wrote:

> Hi Amit,
> thanks for spotting this. Even though the code was not going to crash
> (the first character after the authority is still within the allocated
> memory and must be either a '/' or a NULL terminator; both fail the
> isHex test, so the second isHex is never evaluated and never reads
> past the end of the string), I added the extra check to make it clear.
>
> Thanks,
> Alberto

Re: question regarding XMLUri::isValidRegistryBasedAuthority

Posted by Alberto Massari <Al...@progress.com>.
Hi Amit,
thanks for spotting this. Even though the code was not going to crash
(the first character after the authority is still within the allocated
memory and must be either a '/' or a NULL terminator; both fail the
isHex test, so the second isHex is never evaluated and never reads past
the end of the string), I added the extra check to make it clear.
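The argument above can be checked with a small instrumented sketch. It is hypothetical and uses simplified stand-in helpers rather than Xerces code: the made-up TracedBuffer wrapper records the highest index the unchecked loop reads. Because && short-circuits, the second isHex call only runs when the first character is a hex digit, so as long as the character right after the authority is a '/' or the null terminator, the loop reads at most authority[authLen].

```cpp
#include <cassert>
#include <cctype>
#include <cstring>

// Hypothetical stand-ins for the Xerces character-class helpers.
static bool isHexChar(char c) {
    return std::isxdigit(static_cast<unsigned char>(c)) != 0;
}

static bool isRegNameChar(char c) {
    if (std::isalnum(static_cast<unsigned char>(c)))
        return true;
    // Exclude '\0', which strchr would otherwise report as found.
    return c != '\0' && std::strchr("-_.!~*'()$,;:@&=+", c) != nullptr;
}

// Hypothetical wrapper that remembers the highest index read from the buffer.
struct TracedBuffer {
    const char* data;
    int maxRead;
    explicit TracedBuffer(const char* d) : data(d), maxRead(-1) {}
    char at(int i) {
        if (i > maxRead)
            maxRead = i;
        return data[i];
    }
};

// The unchecked loop from the post, with every read going through TracedBuffer.
bool validUncheckedTraced(TracedBuffer& buf, int authLen) {
    assert(authLen >= 0);
    int index = 0;
    while (index < authLen) {
        char c = buf.at(index);
        if (isRegNameChar(c)) {
            index++;
        } else if (c == '%') {
            // && short-circuits: buf.at(index + 2) runs only if the
            // character at index + 1 is a hex digit.
            if (isHexChar(buf.at(index + 1)) && isHexChar(buf.at(index + 2)))
                index += 3;
            else
                return false;
        } else {
            return false;
        }
    }
    return true;
}
```

For "host%4/path" with authLen = 6 the loop reads index 6 (the '/') and rejects; for "host%" with authLen = 5 it reads only the terminator at index 5 and rejects; for "host%41" with authLen = 7 it accepts and never reads beyond index 6. The safety therefore rests on the caller guaranteeing a non-hex character after the authority, which is what the added bounds check makes explicit.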

Thanks,
Alberto