You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Stefan Fuhrmann <st...@wandisco.com> on 2015/08/11 16:50:51 UTC

Review of sizeof usage

Hi all,

The bug that lead to r1694533 prompted me to grep
for all 2500+ uses of sizeof and to review them.

I only found one other real problem and fixed it in
r1694929. However, I noticed an inconsistency in the
way we use sizeof. In my opinion, we should take the
size of the created or processed variable instead of its
type, i.e.

  abc_t *v = apr_pcalloc(pool, sizeof(*v));
  apr_hash_set(hash, key, sizeof(*key), y);
  z = apr_hash_get(hash, key, sizeof(*key));

rather than

  abc_t *v = apr_pcalloc(pool, sizeof(abc_t));
  apr_hash_set(hash, key, sizeof(key_t), y);
  z = apr_hash_get(hash, key, sizeof(key_t));

I found and fixed about 50 occurrences. If nobody
objects, I'll commit them over the weekend.

-- Stefan^2.

Re: Review of sizeof usage

Posted by Julian Foad <ju...@btopenworld.com>.
FWIW, I too have come to prefer the variable-based form, for
readability reasons. One reason is that it is easier to quickly
recognize and verify the idiom when the redundancy is in the form of
two identical names close together:

svn_my_type_t *my_var;
[...]
foo(my_var, sizeof(*my_var));  <== 'my_var' twice

than when the nearby pairing uses two different names and the
repetition of the type name is widely separated:

svn_my_type_t *my_var;
[...]
foo(my_var, sizeof(svn_my_type_t));

especially when the declaration was separate from the statement that uses it.

Another reason is that the variable name is usually shorter, the
entire statement often fitting on one line, whereas the type name is
usually a public, user-defined type (or pointer to such) with a longer
name.

- Julian

Re: Review of sizeof usage

Posted by Stefan Fuhrmann <st...@wandisco.com>.
On Thu, Aug 13, 2015 at 11:17 AM, Philip Martin <ph...@wandisco.com>
wrote:

> Stefan Fuhrmann <st...@wandisco.com> writes:
>
> >> I prefer the explicit type as it is easier to grep.
> >
> > What do you grep for, specifically? The type should
> > already show up for the variable / function argument
> > declaration - so, you should not miss a type usage
> > either way.
>
> If I want to find all the locations that a given type is allocated then
> using grep is hard on code like this:
>
>     type_t *p;
>     ...
>     p = apr_palloc(result_pool, *p);
>
> and this:
>
>     void foo(type_t **p, apr_pool_t *pool)
>     {
>        *p = apr_palloc(pool, **p);
>     }
>
> Finding sizeof(type_t) is easier.
>

So, changing apr_hash_get/set calls and allocations
where the type is specified on the same line to the
variable-based variant would not affect that use-case.

This would already cover half of the places where we
currently use explicit types. Is changing those o.k.
with you?

-- Stefan^2.

Re: Review of sizeof usage

Posted by Philip Martin <ph...@wandisco.com>.
Stefan Fuhrmann <st...@wandisco.com> writes:

>> I prefer the explicit type as it is easier to grep.
>
> What do you grep for, specifically? The type should
> already show up for the variable / function argument
> declaration - so, you should not miss a type usage
> either way.

If I want to find all the locations that a given type is allocated then
using grep is hard on code like this:

    type_t *p;
    ...
    p = apr_palloc(result_pool, *p);

and this:

    void foo(type_t **p, apr_pool_t *pool)
    {
       *p = apr_palloc(pool, **p);
    }

Finding sizeof(type_t) is easier.

-- 
Philip Martin
WANdisco

Re: Review of sizeof usage

Posted by Stefan Fuhrmann <st...@wandisco.com>.
On Tue, Aug 11, 2015 at 4:02 PM, Philip Martin <ph...@wandisco.com>
wrote:

> Stefan Fuhrmann <st...@wandisco.com> writes:
>
> > way we use sizeof. In my opinion, we should take the
> > size of the created or processed variable instead of its
> > type, i.e.
> >
> >   abc_t *v = apr_pcalloc(pool, sizeof(*v));
> >   apr_hash_set(hash, key, sizeof(*key), y);
> >   z = apr_hash_get(hash, key, sizeof(*key));
> >
> > rather than
> >
> >   abc_t *v = apr_pcalloc(pool, sizeof(abc_t));
> >   apr_hash_set(hash, key, sizeof(key_t), y);
> >   z = apr_hash_get(hash, key, sizeof(key_t));
>
> We have had problems with both styles in the past, so neither is immune
> to bugs.


Absolutely, neither way is foolproof. The variable-based
variant is prone to using the wrong level of indirection,
for instance.

Once the code has been debugged, though, the variable-
based variant should have better maintainability because
it is closer to the single point of definition principle. And
I found that usage easier to verify during my review.


> I prefer the explicit type as it is easier to grep.
>

What do you grep for, specifically? The type should
already show up for the variable / function argument
declaration - so, you should not miss a type usage
either way.

-- Stefan^2.

Re: Review of sizeof usage

Posted by Branko Čibej <br...@wandisco.com>.
On 14.08.2015 01:25, Daniel Shahaf wrote:
> Branko Čibej wrote on Wed, Aug 12, 2015 at 10:07:49 +0200:
>> On 12.08.2015 00:31, Daniel Shahaf wrote:
>>>>> We have had problems with both styles in the past, so neither is immune
>>>>> to bugs.  I prefer the explicit type as it is easier to grep.
>>>> The explicit type form is more accident-prone than the variable form
>>>> because any change requires two modifications in the same statement
>>>> instead of one.
>>> Why doesn't the compiler or buildbot catch accidents?
>> I can't imagine a way for the compiler to emit warnings for such
>> constructs without getting a far too large percentage of false
>> positives. It's perfectly valid, and in many cases required by some
>> object-like architecture, to allocate a buffer that has a different size
>> than the one implied by the pointer that stores the return value. This
>> is C, after all.
>>
> Okay, so from the compiler authors' perspective, "allocation size mismatches
> pointed-to-object size" warnings should not be on by default.  Fair enough.
> But from our perspective as Subversion maintainers, we never *intentionally*
> allocate a buffer smaller than the pointed-to object, so the warnings would be
> useful to us.  We should therefore opt-in to them.

Smaller buffer, probably not. Larger, definitely.

-- Brane


Re: Review of sizeof usage

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Branko Čibej wrote on Wed, Aug 12, 2015 at 10:07:49 +0200:
> On 12.08.2015 00:31, Daniel Shahaf wrote:
> >
> >>> We have had problems with both styles in the past, so neither is immune
> >>> to bugs.  I prefer the explicit type as it is easier to grep.
> >> The explicit type form is more accident-prone than the variable form
> >> because any change requires two modifications in the same statement
> >> instead of one.
> > Why doesn't the compiler or buildbot catch accidents?
> 
> I can't imagine a way for the compiler to emit warnings for such
> constructs without getting a far too large percentage of false
> positives. It's perfectly valid, and in many cases required by some
> object-like architecture, to allocate a buffer that has a different size
> than the one implied by the pointer that stores the return value. This
> is C, after all.
> 

Okay, so from the compiler authors' perspective, "allocation size mismatches
pointed-to-object size" warnings should not be on by default.  Fair enough.
But from our perspective as Subversion maintainers, we never *intentionally*
allocate a buffer smaller than the pointed-to object, so the warnings would be
useful to us.  We should therefore opt-in to them.

(via compiler flags, or have buildbot run static analysis, or…)

Cheers,

Daniel

> -- Brane

Re: Review of sizeof usage

Posted by Branko Čibej <br...@wandisco.com>.
On 12.08.2015 00:31, Daniel Shahaf wrote:
>
>>> We have had problems with both styles in the past, so neither is immune
>>> to bugs.  I prefer the explicit type as it is easier to grep.
>> The explicit type form is more accident-prone than the variable form
>> because any change requires two modifications in the same statement
>> instead of one.
> Why doesn't the compiler or buildbot catch accidents?

I can't imagine a way for the compiler to emit warnings for such
constructs without getting a far too large percentage of false
positives. It's perfectly valid, and in many cases required by some
object-like architecture, to allocate a buffer that has a different size
than the one implied by the pointer that stores the return value. This
is C, after all.

-- Brane

Re: Review of sizeof usage

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Branko Čibej wrote on Tue, Aug 11, 2015 at 21:55:48 +0200:
> On 11.08.2015 17:02, Philip Martin wrote:
> > Stefan Fuhrmann <st...@wandisco.com> writes:
> >
> >> way we use sizeof. In my opinion, we should take the
> >> size of the created or processed variable instead of its
> >> type, i.e.
> >>
> >>   abc_t *v = apr_pcalloc(pool, sizeof(*v));
> >>   apr_hash_set(hash, key, sizeof(*key), y);
> >>   z = apr_hash_get(hash, key, sizeof(*key));
> >>
> >> rather than
> >>
> >>   abc_t *v = apr_pcalloc(pool, sizeof(abc_t));
> >>   apr_hash_set(hash, key, sizeof(key_t), y);
> >>   z = apr_hash_get(hash, key, sizeof(key_t));

Both of these variants are redundant.  We could encapsulate the
redundancy in a macro:

    #define SVN__CALLOC(obj, pool) do { (obj) = apr_pcalloc((pool), sizeof(*(obj))); } while (0)
    {
      abc_t *v;
      SVN__CALLOC(v, pool);
    }

> > We have had problems with both styles in the past, so neither is immune
> > to bugs.  I prefer the explicit type as it is easier to grep.
> 
> The explicit type form is more accident-prone than the variable form
> because any change requires two modifications in the same statement
> instead of one.

Why doesn't the compiler or buildbot catch accidents?

Re: Review of sizeof usage

Posted by Greg Stein <gs...@gmail.com>.
On Tue, Aug 11, 2015 at 2:55 PM, Branko Čibej <br...@wandisco.com> wrote:

> On 11.08.2015 17:02, Philip Martin wrote:
> > Stefan Fuhrmann <st...@wandisco.com> writes:
> >
> >> way we use sizeof. In my opinion, we should take the
> >> size of the created or processed variable instead of its
> >> type, i.e.
> >>
> >>   abc_t *v = apr_pcalloc(pool, sizeof(*v));
> >>   apr_hash_set(hash, key, sizeof(*key), y);
> >>   z = apr_hash_get(hash, key, sizeof(*key));
> >>
> >> rather than
> >>
> >>   abc_t *v = apr_pcalloc(pool, sizeof(abc_t));
> >>   apr_hash_set(hash, key, sizeof(key_t), y);
> >>   z = apr_hash_get(hash, key, sizeof(key_t));
> > We have had problems with both styles in the past, so neither is immune
> > to bugs.  I prefer the explicit type as it is easier to grep.
>
> The explicit type form is more accident-prone than the variable form
> because any change requires two modifications in the same statement
> instead of one.
>

Agreed. I much prefer the variable-based form. It works well for locals,
arguments, fields in a structure, dereferences, etc.

Cheers,
-g

Re: Review of sizeof usage

Posted by Branko Čibej <br...@wandisco.com>.
On 11.08.2015 17:02, Philip Martin wrote:
> Stefan Fuhrmann <st...@wandisco.com> writes:
>
>> way we use sizeof. In my opinion, we should take the
>> size of the created or processed variable instead of its
>> type, i.e.
>>
>>   abc_t *v = apr_pcalloc(pool, sizeof(*v));
>>   apr_hash_set(hash, key, sizeof(*key), y);
>>   z = apr_hash_get(hash, key, sizeof(*key));
>>
>> rather than
>>
>>   abc_t *v = apr_pcalloc(pool, sizeof(abc_t));
>>   apr_hash_set(hash, key, sizeof(key_t), y);
>>   z = apr_hash_get(hash, key, sizeof(key_t));
> We have had problems with both styles in the past, so neither is immune
> to bugs.  I prefer the explicit type as it is easier to grep.

The explicit type form is more accident-prone than the variable form
because any change requires two modifications in the same statement
instead of one.

-- Brane

Re: Review of sizeof usage

Posted by Philip Martin <ph...@wandisco.com>.
Stefan Fuhrmann <st...@wandisco.com> writes:

> way we use sizeof. In my opinion, we should take the
> size of the created or processed variable instead of its
> type, i.e.
>
>   abc_t *v = apr_pcalloc(pool, sizeof(*v));
>   apr_hash_set(hash, key, sizeof(*key), y);
>   z = apr_hash_get(hash, key, sizeof(*key));
>
> rather than
>
>   abc_t *v = apr_pcalloc(pool, sizeof(abc_t));
>   apr_hash_set(hash, key, sizeof(key_t), y);
>   z = apr_hash_get(hash, key, sizeof(key_t));

We have had problems with both styles in the past, so neither is immune
to bugs.  I prefer the explicit type as it is easier to grep.

-- 
Philip Martin
WANdisco