Posted to c-dev@xerces.apache.org by David Bertoni <db...@apache.org> on 2007/07/24 02:30:56 UTC

Inconsistencies in unsigned int to XMLSize_t changes

Hi all,

I noticed that in many places, unsigned int parameters have been changed to
XMLSize_t.  However, there seem to be some places where that did not
happen.  For example, the SAX/SAX2 AttributeList and Attributes classes
still use unsigned int, as does DOMNodeList.

Is this intended, or should those classes be updated as well?  I would be 
happy to fix them up, if necessary.

Dave

---------------------------------------------------------------------
To unsubscribe, e-mail: c-dev-unsubscribe@xerces.apache.org
For additional commands, e-mail: c-dev-help@xerces.apache.org


RE: Inconsistencies in unsigned int to XMLSize_t changes

Posted by Scott Cantor <ca...@osu.edu>.
> Switching from philosophical to practical (not that being
> philosophical is necessarily a bad thing), is this a request to add new
> typedefs?

Depends on how much size independence you want between the different pieces
of code. I just know that things like DOMNodeList already return XMLSize_t
and that was really the kind of thing that I was referring to.

> E.g.: XMLCount_t, XMLLineCol_t, others...?
> What about those APIs that currently return 'int' because they use -1
> to signal 'not found'?

There's no requirement that the typedef be unsigned, but generally speaking
I'm not fond of that kind of idiom myself. XMLString::indexOf, for example,
returns a position index, whereas I would have stuck with the C idiom and
returned the pointer, or NULL for not found. People who care can just compute
result - start if they need the position.
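
A minimal sketch of that pointer-returning idiom, for illustration only.
The names findChar and indexFrom are hypothetical and not part of Xerces-C,
and char16_t merely stands in for XMLCh here:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not a Xerces-C API): returns a pointer to the first
// occurrence of ch in the null-terminated string str, or nullptr if absent.
const char16_t* findChar(const char16_t* str, char16_t ch)
{
    for (; *str != 0; ++str)
    {
        if (*str == ch)
            return str;
    }
    return nullptr;
}

// Callers that want a position compute it themselves as result - start.
// The caller must check for nullptr before calling this.
std::ptrdiff_t indexFrom(const char16_t* start, const char16_t* result)
{
    return result - start;
}
```

With this shape, "not found" is the null pointer, so no index value has to
be reserved as a sentinel and the return type never needs to be signed.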

But I don't know all the places where you're doing that, that's just one
example I know of.

-- Scott





Re: Inconsistencies in unsigned int to XMLSize_t changes

Posted by David Bertoni <db...@apache.org>.
Alberto Massari wrote:
> Hi David,
> 
> At 08.45 24/07/2007 -0700, David Bertoni wrote:
>> Scott Cantor wrote:
>>> Another point regarding this is that when you use native types, 
>>> people glue
>>> APIs and components together based on the fact that both APIs use those
>>> types. Then you have a brittle contract because neither component 
>>> *meant* to
>>> promise that they would work together through the native type. Using a
>>> typedef prevents people from overestimating what they can combine 
>>> together
>>> safely, since it requires a cast.
>> Wow, I didn't realize I would create so much controversy here! ;-)
>>
>> I have to agree with most others that the distinction between the size 
>> of a "memory buffer" and the number of items in a container seems 
>> arbitrary.
>>
>> Also, the "length" of a UTF-16 string in Xerces-C, which is now 
>> expressed in XMLSize_t is _not_ the size of a memory buffer.  Rather, 
>> it is the number of UTF-16 code units, which doesn't seem much 
>> different to me than the number of attributes in a document, or the 
>> number of items in a container.  After all, isn't it all about 
>> "counting" things?
> 
> Well, every number is expressing a "count" of something ;-)

Yes, that's exactly my point.

> What I have in mind could be expressed also in this way: a 
> size_t/XMLSize_t has a meaning only when given a starting address, i.e. 
> it's part of a tuple (void*, size_t), while an unsigned int/unsigned 
> long can be treated as a standalone number, i.e. X attributes, Y lines.

Although I understand the distinction you're making, I don't think it will 
even be remotely obvious to users.

> 
>> Finally, I have to agree with Scott that a typedef would be better.  
>> You've no idea the number of changes I had to make in Xalan-C for 
>> compatibility with the Xerces-C changes to XMLSize_t.  Luckily, about 
>> 5 years ago, we started using typedefs extensively, or I would have 
>> had to make many more changes.
> 
> Switching from philosophical to practical (not that being philosophical 
> is necessarily a bad thing), is this a request to add new typedefs?
> E.g.: XMLCount_t, XMLLineCol_t, others...?
> What about those APIs that currently return 'int' because they use -1 to 
> signal 'not found'?

Well, I would be more in favor of consistency in the APIs, so I would 
prefer XMLSize_t be used for "n" lines or "n" attributes, etc...

As for the use of int in the XMLString interface, I think it's a bad idea. 
If we're using XMLSize_t for the length of null-terminated UTF-16 strings 
in the SAX APIs, we should use XMLSize_t in the XMLString class.  To signal 
"not found," we can do what the std::string class does, and define a value 
that indicates the condition.  In fact, we can make that value compatible 
with -1, so existing code won't break.  Xalan-C's XalanDOMString class does 
just that:

class XALAN_DOM_EXPORT XalanDOMString
{
public:

...

#if defined(XALAN_INLINE_INITIALIZATION)
     static const size_type  npos = ~0u;
#else
     enum { npos = ~0u };
#endif
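
As a rough sketch of how an index-returning search could use such a value
(this is not the actual XMLString or XalanDOMString code; SizeType, npos
and indexOf below are illustrative stand-ins, with char16_t in place of
XMLCh):

```cpp
#include <cassert>
#include <cstddef>

// Assumption: SizeType plays the role of XMLSize_t / size_type.
typedef std::size_t SizeType;

// All bits set, the same convention std::string::npos uses. Converting -1
// to SizeType yields this same value.
const SizeType npos = ~SizeType(0);

// Hypothetical search: returns the index of the first occurrence of ch in
// the null-terminated string str, or npos if it does not occur.
SizeType indexOf(const char16_t* str, char16_t ch)
{
    for (SizeType i = 0; str[i] != 0; ++i)
    {
        if (str[i] == ch)
            return i;
    }
    return npos;
}
```

Because -1 converts to the all-ones unsigned value, a comparison such as
indexOf(s, c) == -1 still evaluates to true for "not found", although most
compilers will warn about the signed/unsigned comparison.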

As long as we're going through the pain of changing some of the interfaces,
we really should just bite the bullet and change them all.  In fact, I
remember a debate we had on the list more than a year ago that talked about
changing all of the unsigned int types to use size_t.  I was opposed to it
because it caused lots of turbulence, but now that we've started down that
path, I really feel we should go all the way.

Thanks!

Dave



Re: Inconsistencies in unsigned int to XMLSize_t changes

Posted by Alberto Massari <am...@datadirect.com>.
Hi David,

At 08.45 24/07/2007 -0700, David Bertoni wrote:
>Scott Cantor wrote:
>>Another point regarding this is that when you use native types, people glue
>>APIs and components together based on the fact that both APIs use those
>>types. Then you have a brittle contract because neither component *meant* to
>>promise that they would work together through the native type. Using a
>>typedef prevents people from overestimating what they can combine together
>>safely, since it requires a cast.
>Wow, I didn't realize I would create so much controversy here! ;-)
>
>I have to agree with most others that the distinction between the 
>size of a "memory buffer" and the number of items in a container 
>seems arbitrary.
>
>Also, the "length" of a UTF-16 string in Xerces-C, which is now 
>expressed in XMLSize_t is _not_ the size of a memory 
>buffer.  Rather, it is the number of UTF-16 code units, which 
>doesn't seem much different to me than the number of attributes in a 
>document, or the number of items in a container.  After all, isn't 
>it all about "counting" things?

Well, every number is expressing a "count" of something ;-)
What I have in mind could be expressed also in this way: a 
size_t/XMLSize_t has a meaning only when given a starting address, 
i.e. it's part of a tuple (void*, size_t), while an unsigned 
int/unsigned long can be treated as a standalone number, i.e. X 
attributes, Y lines.


>Finally, I have to agree with Scott that a typedef would be 
>better.  You've no idea the number of changes I had to make in 
>Xalan-C for compatibility with the Xerces-C changes to 
>XMLSize_t.  Luckily, about 5 years ago, we started using typedefs 
>extensively, or I would have had to make many more changes.

Switching from philosophical to practical (not that being 
philosophical is necessarily a bad thing), is this a request to add new typedefs?
E.g.: XMLCount_t, XMLLineCol_t, others...?
What about those APIs that currently return 'int' because they use -1 
to signal 'not found'?

Alberto


>Thanks!
>
>Dave




Re: Inconsistencies in unsigned int to XMLSize_t changes

Posted by David Bertoni <db...@apache.org>.
Scott Cantor wrote:
> Another point regarding this is that when you use native types, people glue
> APIs and components together based on the fact that both APIs use those
> types. Then you have a brittle contract because neither component *meant* to
> promise that they would work together through the native type. Using a
> typedef prevents people from overestimating what they can combine together
> safely, since it requires a cast.
Wow, I didn't realize I would create so much controversy here! ;-)

I have to agree with most others that the distinction between the size of a 
"memory buffer" and the number of items in a container seems arbitrary.

Also, the "length" of a UTF-16 string in Xerces-C, which is now expressed 
in XMLSize_t, is _not_ the size of a memory buffer.  Rather, it is the 
number of UTF-16 code units, which doesn't seem much different to me than 
the number of attributes in a document, or the number of items in a 
container.  After all, isn't it all about "counting" things?

Finally, I have to agree with Scott that a typedef would be better.  You've 
no idea the number of changes I had to make in Xalan-C for compatibility 
with the Xerces-C changes to XMLSize_t.  Luckily, about 5 years ago, we 
started using typedefs extensively, or I would have had to make many more 
changes.

Thanks!

Dave



RE: Inconsistencies in unsigned int to XMLSize_t changes

Posted by Scott Cantor <ca...@osu.edu>.
Another point regarding this is that when you use native types, people glue
APIs and components together based on the fact that both APIs use those
types. Then you have a brittle contract because neither component *meant* to
promise that they would work together through the native type. Using a
typedef prevents people from overestimating what they can combine together
safely, since it requires a cast.

-- Scott





RE: Inconsistencies in unsigned int to XMLSize_t changes

Posted by Scott Cantor <ca...@osu.edu>.
> could you elaborate on your opinion? I don't grasp what you are suggesting.

I'm saying that any containers would be better off using a typedef and not a
native type, and I'd really look long and hard at any use of int or long in
the entire public API.

Even if you want the container sizes to be implementation-defined (as
opposed to documented), it's best to make them opaque so people don't make
assumptions about the size based on a native type that they think they
understand (and in 64-bit land usually end up wrong about).
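
One way to picture this, as a sketch under stated assumptions rather than
a proposal for the real classes: CountType and NodeListSketch below are
hypothetical names, and std::size_t is just one possible underlying type.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical library-wide count typedef. The underlying type can be
// changed here, in one place, without editing any signatures below.
typedef std::size_t CountType;

// Illustrative container (not DOMNodeList): its public interface only ever
// mentions the typedef, never a native integer type.
class NodeListSketch
{
public:
    typedef CountType size_type;

    void add(int value) { items_.push_back(value); }

    size_type getLength() const
    {
        return static_cast<size_type>(items_.size());
    }

    int item(size_type index) const { return items_[index]; }

private:
    std::vector<int> items_;
};
```

Client code that declares its counters as NodeListSketch::size_type keeps
compiling unchanged if the library later picks a different underlying type.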

-- Scott





RE: Inconsistencies in unsigned int to XMLSize_t changes

Posted by Alberto Massari <am...@datadirect.com>.
Hi Scott,
could you elaborate on your opinion? I don't grasp what you are suggesting.

Thanks,
Alberto

At 10.45 24/07/2007 -0400, Scott Cantor wrote:
> > STL uses name "size_type" in the same case. So I should remark that size
> > exists not only for buffers :) Though I can live with current
> > implementation.
>
>For good reason...I would suggest changing this decision.
>
> > > The only doubt is if we want to store more than 4 billion items in a
> > > vector on 64-bit platforms; in this case, unsigned long should be used.
> >
> > AFAIK long is 32-bit on Visual C++ in 64-bit mode in contrast to g++.
>
>Exactly. If you want consistent container sizes across platforms, there is
>no built-in type for that.
>
>-- Scott




RE: Inconsistencies in unsigned int to XMLSize_t changes

Posted by Scott Cantor <ca...@osu.edu>.
> STL uses name "size_type" in the same case. So I should remark that size
> exists not only for buffers :) Though I can live with current
> implementation.

For good reason...I would suggest changing this decision.

> > The only doubt is if we want to store more than 4 billion items in a
> > vector on 64-bit platforms; in this case, unsigned long should be used.
> 
> AFAIK long is 32-bit on Visual C++ in 64-bit mode in contrast to g++.

Exactly. If you want consistent container sizes across platforms, there is
no built-in type for that.

-- Scott





Re: Inconsistencies in unsigned int to XMLSize_t changes

Posted by Vitaly Prapirny <ma...@mebius.net>.
Alberto Massari wrote:
> At 12.04 24/07/2007 +0300, Vitaly Prapirny wrote:
>> Hi Alberto,
>> Alberto Massari wrote:
>>> the change was intentional; XMLSize_t should be used when dealing 
>>> with a number that expresses the size of a memory buffer (i.e. it 
>>> complements a pointer), while unsigned int/unsigned long are to be 
>>> used when expressing standalone numbers (i.e. the number of items in 
>>> a vector/list, the line/column number).
>> The number of items in a container is the size of it, isn't it?
> 
> That's an implementation decision; the fact is that its meaning is 
> "count of items" instead of "size of buffer".

size of buffer - count of bytes in a buffer
size of container - count of items in a container

The STL uses the name "size_type" for the same case. So I should remark that
"size" is not only for buffers :) Though I can live with the current
implementation.

> The only doubt is if we want to store more than 4 billion items in a 
> vector on 64-bit platforms; in this case, unsigned long should be used.

AFAIK long is 32-bit on Visual C++ in 64-bit mode in contrast to g++.
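
A small sketch for checking this on a given compiler. The helper names
bitWidth and longNarrowerThanSizeT are made up for illustration, and the
width is the storage width (sizeof times CHAR_BIT):

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

// Storage width, in bits, of type T on the platform being compiled for.
template <typename T>
constexpr std::size_t bitWidth()
{
    return sizeof(T) * CHAR_BIT;
}

// True when unsigned long is narrower than std::size_t, which is the
// situation described above for 64-bit Visual C++ (32-bit long, 64-bit
// pointers and sizes).
constexpr bool longNarrowerThanSizeT()
{
    return bitWidth<unsigned long>() < bitWidth<std::size_t>();
}
```

On platforms where longNarrowerThanSizeT() is true, an unsigned long count
cannot represent every value that a std::size_t count can.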

Good luck!
	Vitaly



Re: Inconsistencies in unsigned int to XMLSize_t changes

Posted by Alberto Massari <am...@datadirect.com>.
At 12.04 24/07/2007 +0300, Vitaly Prapirny wrote:
>Hi Alberto,
>Alberto Massari wrote:
>>the change was intentional; XMLSize_t should be used when dealing 
>>with a number that expresses the size of a memory buffer (i.e. it 
>>complements a pointer), while unsigned int/unsigned long are to be 
>>used when expressing standalone numbers (i.e. the number of items 
>>in a vector/list, the line/column number).
>The number of items in a container is the size of it, isn't it?

That's an implementation decision; the fact is that its meaning is 
"count of items" instead of "size of buffer".
The only doubt is if we want to store more than 4 billion items in a 
vector on 64-bit platforms; in this case, unsigned long should be used.

Alberto


>Good luck!
>         Vitaly




Re: Inconsistencies in unsigned int to XMLSize_t changes

Posted by Vitaly Prapirny <ma...@mebius.net>.
Hi Alberto,
Alberto Massari wrote:
> the change was intentional; XMLSize_t should be used when dealing with a 
> number that expresses the size of a memory buffer (i.e. it complements a 
> pointer), while unsigned int/unsigned long are to be used when 
> expressing standalone numbers (i.e. the number of items in a 
> vector/list, the line/column number).
The number of items in a container is the size of it, isn't it?

Good luck!
	Vitaly



Re: Inconsistencies in unsigned int to XMLSize_t changes

Posted by Alberto Massari <am...@datadirect.com>.
Hi Dave,
the change was intentional; XMLSize_t should be used when dealing 
with a number that expresses the size of a memory buffer (i.e. it 
complements a pointer), while unsigned int/unsigned long are to be 
used when expressing standalone numbers (i.e. the number of items in 
a vector/list, the line/column number).

Alberto

At 17.30 23/07/2007 -0700, David Bertoni wrote:
>Hi all,
>
>I noticed that in many places, unsigned int parameters have been 
>changed to  XMLSize_t.  However, there seem to be some places where 
>that did not happen.  For example, the SAX/SAX2 AttributeList and 
>Attributes classes still use unsigned int, as does DOMNodeList.
>
>Is this intended, or should those classes be updated as well?  I 
>would be happy to fix them up, if necessary.
>
>Dave

