You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Julian Foad <ju...@btopenworld.com> on 2009/06/30 12:05:50 UTC

[RFC] Simplify use of apr_hash_this()

I propose a new macro to simplify the use of apr_hash_this().

To de-reference an APR hash iterator, we want to write something like:

  for (hi = apr_hash_first(pool, entries); hi; hi = apr_hash_next(hi))
    {
      const char *item;
      svn_wc_entry_t *entry;

      apr_hash_this(hi, &item, NULL, &entry);

      ...
    }

But to avoid warnings or errors to do with pointer type conversions,
currently we have to either mess around with temporary variables:

  for (hi = apr_hash_first(pool, entries); hi; hi = apr_hash_next(hi))
    {
      const char *item;
      svn_wc_entry_t *entry;
      const void *key;
      void *val;

      apr_hash_this(hi, &key, NULL, &val);

      item = key;
      entry = val;

      ...
    }

which is what we normally do, or add explicit type casts:

  for (hi = apr_hash_first(pool, entries); hi; hi = apr_hash_next(hi))
    {
      const char *item;
      svn_wc_entry_t *entry;

      apr_hash_this(hi, (void *)&item, NULL, (void *)&entry);

      ...
    }

Whether such type casts make a detectable difference depends on what
compiler you're running, and so it is easy for us to get them wrong.
(Sometimes we write them with too much indirection: (void **), or with a
mis-placed "const".)

The attached patch "apr_hash_this-def.patch" allows us to write it as I
showed first, clean and simple. It provides a macro APR_HASH_THIS (we
can name it SVN_* if we think we should) that works like apr_hash_this()
except without the need for temporary variables or type casts at the
point of use.

The second attached patch "apr_hash_this-use.patch" demonstrates the use
of this macro in a few instances.

To me, this will enable a big improvement in code clarity. Anything
against it?

(The attached patch contains two alternative implementations, and I
haven't determined whether either one is sufficient, nor that they don't
produce any new warnings on my compiler, let alone other compilers. If
you are willing to test this, please let us know the result. If you want
to take this idea forward and tweak/commit/expand it, please do.
Otherwise I hope to get back to it at some point, but I'm putting it out
here while I concentrate on other things for an unknown time.)

- Julian

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2366677

Re: [RFC] Simplify use of apr_hash_this()

Posted by Branko Cibej <br...@xbc.nu>.
Philip Martin wrote:
> So in the same way
>
>      svn_wc_entry_t *entry;
>      apr_hash_this(... (void*)&entry);
>
> can the compiler assume that calling apr_hash_this doesn't modify
> entry?  Does it need whole program optimisation to make this
> assumption?  It's because I can't answer questions like this that I
> would rather have temporary variables and no casts.
>   

You're right that whole-program optimization doesn't figure in there;
that was a thinko on my part. The aliasing knots are too knotty as it
is, in C. Sigh ...

Well, if (the) compiler(s) issue(s) warnings, we can at least assume
that something's potentially wrong. Since I already know that the type
punning is 'orrible because of pointer type incompatibilities, that's
good enought for me. :)

-- Brane

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2366901

Re: [RFC] Simplify use of apr_hash_this()

Posted by Philip Martin <ph...@codematters.co.uk>.
Branko Čibej <br...@xbc.nu> writes:

> Strictly speaking, no, it's not valid. And aliasing *might* break
> if/when GCC does whole-program optimization. Neither is likely to cause
> real problems, though.

Is whole program optimisation required before aliasing becomes a
problem?

This is one way to initialise a pointer:

     svn_wc_entry_t *entry;
     svn_wc_entry_t **p = &entry;
     *p = 0;

but this:

     svn_wc_entry_t *entry;
     long *p = (void*)&entry;
     *p = 0;

violates the aliasing rules.  If long is the same size as a pointer
then the write should set the correct bytes to zero.  However even if
that happens an optimising compiler can assume that the write to *p
doesn't change entry and can thus assume that entry is still
unitialised.

So in the same way

     svn_wc_entry_t *entry;
     apr_hash_this(... (void*)&entry);

can the compiler assume that calling apr_hash_this doesn't modify
entry?  Does it need whole program optimisation to make this
assumption?  It's because I can't answer questions like this that I
would rather have temporary variables and no casts.

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2366802


Re: [RFC] Simplify use of apr_hash_this()

Posted by Branko Cibej <br...@xbc.nu>.
Philip Martin wrote:
> Julian Foad <ju...@btopenworld.com> writes:
>   
>> which is what we normally do, or add explicit type casts:
>>
>>   for (hi = apr_hash_first(pool, entries); hi; hi = apr_hash_next(hi))
>>     {
>>       const char *item;
>>       svn_wc_entry_t *entry;
>>
>>       apr_hash_this(hi, (void *)&item, NULL, (void *)&entry);
>>
>>       ...
>>     }
>>     
>
> I don't think that's valid, although it will work on most common
> platforms today.  The problem is that in C a void* pointer can be
> bigger than any particular data pointer.  When the compiler assigns
> to/from void* it adjusts for size as necessary, but the cast above
> defeats that.  So inside apr_hash_this a void* value will get written
> into the space occupied by entry and that space might not be big
> enough.
>   

Strictly speaking, no, it's not valid. And aliasing *might* break
if/when GCC does whole-program optimization. Neither is likely to cause
real problems, though.


-- Brane

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2366770

Re: [RFC] Simplify use of apr_hash_this()

Posted by Stefan Sperling <st...@elego.de>.
On Mon, Jul 06, 2009 at 06:25:56PM +0100, Julian Foad wrote:
> Is this a good version that I can go ahead with now?

I like it.

Stefan

Re: [RFC] Simplify use of apr_hash_this()

Posted by Julian Foad <ju...@btopenworld.com>.
Arfrever Frehtes Taifersar Arahesis wrote:
> 2009-07-09 14:45:57 Julian Foad napisał(a):
> > This version seems OK so I committed it in r38381 (implementation)
> 
> The return types of functions should be on separate lines.

Thanks.
r38385.

> Do you plan to send similar patch (with s/svn_//) to APR upstream?

I didn't have this on my to-do list but it's a good idea. I will.

- Julian

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2369431


Re: [RFC] Simplify use of apr_hash_this()

Posted by Arfrever Frehtes Taifersar Arahesis <Ar...@GMail.Com>.
2009-07-09 14:45:57 Julian Foad napisał(a):
> Julian Foad wrote:
> [...]
> > This version?
> 
> This version seems OK so I committed it in r38381 (implementation)

The return types of functions should be on separate lines.
Do you plan to send similar patch (with s/svn_//) to APR upstream?

-- 
Arfrever Frehtes Taifersar Arahesis

Re: [RFC] Simplify use of apr_hash_this()

Posted by Julian Foad <ju...@btopenworld.com>.
Julian Foad wrote:
> I may or may not continue with converting code to use it; anyone else is
> welcome to. (I'll let the current commits settle for a bit, first.)

I converted libsvn_client in r38421. I have no further conversions in
the pipeline.

- Julian

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2371274

Re: [RFC] Simplify use of apr_hash_this()

Posted by Julian Foad <ju...@btopenworld.com>.
Julian Foad wrote:
[...]
> This version?

This version seems OK so I committed it in r38381 (implementation) and
r38382 (a start on using it to clean up existing code).

I was pleased to discover a bug (fix: r38380) in the process of going
through and starting to use it. I think this usage makes it much easier
to spot and avoid that kind of bug.

I may or may not continue with converting code to use it; anyone else is
welcome to. (I'll let the current commits settle for a bit, first.)

- Julian

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2369383

Re: [RFC] Simplify use of apr_hash_this()

Posted by Julian Foad <ju...@btopenworld.com>.
On Mon, 2009-07-06 at 19:53 +0200, Greg Stein wrote:
> On Mon, Jul 6, 2009 at 19:25, Julian Foad<ju...@btopenworld.com> wrote:
> >...
> > OK, I've attached a patch, <apr_hash_this-def-2.patch>.
> 
> The concept is good, though I'd suggest changing the HI parameter to a
> const, and un-const'ing it the implementation. When these move into
> APR, they should be made const.

OK. (I wondered about this and decided that there was a de-facto
assumption in APR that all references to 'hi' should be non-const even
when logically constant. But I will be glad to have the API
const-correct. It will give us "casting away const" warnings in the
implementations but I think it's OK to have a small fixed number of them
for reasons like this.)

> > I chose the names apr_hash_index_key(), apr_hash_index_klen(),
> > apr_hash_index_val(). They are in APR namespace because, like our
> > APR_ARRAY_IDX and APR_ARRAY_PUSH, they are APR-level functions that are
> > candidates for going into APR. (APR_ARRAY_IDX and APR_ARRAY_PUSH made it
> > into APR 1.3.)
> 
> Those are preprocessor macros, and are easy to *avoid* defining when
> the correct APR comes along.
> 
> That is not the case for functions.
> 
> Consider what happens when they become defined in (say) APR 1.5. Now
> the symbol is visible in *both* libapr, and libsvn. Not good.
> 
> Not to mention duplicate declaration fiascos.

OK. The prefix "svn_hash_" is already used to refer to a text dump
format, so how about

  svn_apr_hash_index_key()
  svn_apr_hash_index_klen()
  svn_apr_hash_index_val()

Those names are getting a bit long but maybe not too long. I think
they're OK.

> > The definitions are in svn_types.h, because they are needed throughout
> > the code base and we have had other APR helper APIs there.
> 
> Yes.
> 
> > The implementations are in subversion/libsvn_subr/iter.c because,
> > although their use is an alternative rather than a complement to
> > svn_iter_*(), they are closely related.
> 
> *shrug*

(Not a great reason, I know. Just a good enough place for starters. It's
easy to move them within libsvn_subr if we think of a better place.)

> > This now works, and you can see how much simpler it makes the usage in
> > the attached <apr_hash_this-use-2.patch>. In this patch I have
> > simplified the usage in the first several C files in
> > subversion/libsvn_client/ and all the C files in subversion/svn*/.
> 
> Yup. Definitely cleaner.
> 
> I'd even suggest that in some cases, the intermediate variable can go
> away entirely. For cases like:
> 
>   const char *value = apr_hash_index_val(hi);
> 
>   some_function(value);
> 
> Could be:
> 
>   some_function(apr_hash_index_val(hi));
> 
> Since the cast will be implied by some_function's prototype.

Yes, that will be a valid option at the point of use.

> > Is this a good version that I can go ahead with now?
> 
> Not yet :-P

Thanks for the quick review.

This version?

- Julian

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2368510

Re: [RFC] Simplify use of apr_hash_this()

Posted by Greg Stein <gs...@gmail.com>.
On Mon, Jul 6, 2009 at 19:25, Julian Foad<ju...@btopenworld.com> wrote:
>...
> OK, I've attached a patch, <apr_hash_this-def-2.patch>.

The concept is good, though I'd suggest changing the HI parameter to a
const, and un-const'ing it the implementation. When these move into
APR, they should be made const.

Also...

> I chose the names apr_hash_index_key(), apr_hash_index_klen(),
> apr_hash_index_val(). They are in APR namespace because, like our
> APR_ARRAY_IDX and APR_ARRAY_PUSH, they are APR-level functions that are
> candidates for going into APR. (APR_ARRAY_IDX and APR_ARRAY_PUSH made it
> into APR 1.3.)

Those are preprocessor macros, and are easy to *avoid* defining when
the correct APR comes along.

That is not the case for functions.

Consider what happens when they become defined in (say) APR 1.5. Now
the symbol is visible in *both* libapr, and libsvn. Not good.

Not to mention duplicate declaration fiascos.

> The definitions are in svn_types.h, because they are needed throughout
> the code base and we have had other APR helper APIs there.

Yes.

> The implementations are in subversion/libsvn_subr/iter.c because,
> although their use is an alternative rather than a complement to
> svn_iter_*(), they are closely related.

*shrug*

>...
> This now works, and you can see how much simpler it makes the usage in
> the attached <apr_hash_this-use-2.patch>. In this patch I have
> simplified the usage in the first several C files in
> subversion/libsvn_client/ and all the C files in subversion/svn*/.

Yup. Definitely cleaner.

I'd even suggest that in some cases, the intermediate variable can go
away entirely. For cases like:

  const char *value = apr_hash_index_val(hi);

  some_function(value);

Could be:

  some_function(apr_hash_index_val(hi));

Since the cast will be implied by some_function's prototype.

> Is this a good version that I can go ahead with now?

Not yet :-P

Cheers,
-g

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2368457

Re: [RFC] Simplify use of apr_hash_this()

Posted by Julian Foad <ju...@btopenworld.com>.
I (Julian Foad) wrote:
> On Wed, 2009-07-01 at 11:17 -0400, Greg Hudson wrote:
> > This leads me to believe that the interfaces we really want are:
> > 
> > const void *apr_hash_key(hi)
> > void *apr_hash_val(hi)
[...]

OK, I've attached a patch, <apr_hash_this-def-2.patch>.

I chose the names apr_hash_index_key(), apr_hash_index_klen(),
apr_hash_index_val(). They are in APR namespace because, like our
APR_ARRAY_IDX and APR_ARRAY_PUSH, they are APR-level functions that are
candidates for going into APR. (APR_ARRAY_IDX and APR_ARRAY_PUSH made it
into APR 1.3.)

The definitions are in svn_types.h, because they are needed throughout
the code base and we have had other APR helper APIs there.

The implementations are in subversion/libsvn_subr/iter.c because,
although their use is an alternative rather than a complement to
svn_iter_*(), they are closely related.


> we could then write
> 
>   for (hi = apr_hash_first(pool, entries); hi; hi = apr_hash_next(hi))
>     {
>       const char *item = apr_hash_key(hi);
>       svn_wc_entry_t *entry = apr_hash_val(hi);
> 
>       ...
>     }

This now works, and you can see how much simpler it makes the usage in
the attached <apr_hash_this-use-2.patch>. In this patch I have
simplified the usage in the first several C files in
subversion/libsvn_client/ and all the C files in subversion/svn*/.

Is this a good version that I can go ahead with now?

- Julian

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2368450

Re: [RFC] Simplify use of apr_hash_this()

Posted by Julian Foad <ju...@btopenworld.com>.
On Wed, 2009-07-01 at 11:17 -0400, Greg Hudson wrote:
> On Wed, 2009-07-01 at 00:45 +0100, Julian Foad wrote:
> > In a minority of instances, we are using explicit casts:
> 
> I observe that in every instance you cited, one of the arguments passed
> is NULL.

In those instances I cited and also throughout all our uses of
apr_hash_this(), in most cases both key and value are wanted, but the
key-length argument is NULL.

> This leads me to believe that the interfaces we really want are:
> 
> const void *apr_hash_key(hi)
> void *apr_hash_val(hi)
> 
> which nicely sidestep the aliasing issue by returning the value instead
> of using a pointer argument.  Even in the (apparently rare) cases where
> we want the key and the value, it may be more elegant to make two
> function calls than to use a single call with temporaries, or a
> multi-line macro.

Even though we usually want both key and value, that could well be a
nice solution. For one thing, it enables us to initialize the variables
at their declaration. Even better than the hypothetical...

  for (hi = apr_hash_first(pool, entries); hi; hi = apr_hash_next(hi))
    {
      const char *item;
      svn_wc_entry_t *entry;

      apr_hash_this(hi, &item, NULL, &entry);  /* WISHFUL THINKING */

      ...
    }

we could then write

  for (hi = apr_hash_first(pool, entries); hi; hi = apr_hash_next(hi))
    {
      const char *item = apr_hash_key(hi);
      svn_wc_entry_t *entry = apr_hash_val(hi);

      ...
    }


> Of course, those APIs do not exist in APR (right now), so we'd have to
> define them ourselves.

Just to point out for anyone reading this who may not be familiar with
the history: That's perfectly fine, and we have a precedent of defining
useful APR extensions that are sometimes later incoprporated into APR
itself.

- Julian

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2367091

Re: [RFC] Simplify use of apr_hash_this()

Posted by Greg Hudson <gh...@mit.edu>.
On Wed, 2009-07-01 at 00:45 +0100, Julian Foad wrote:
> In a minority of instances, we are using explicit casts:

I observe that in every instance you cited, one of the arguments passed
is NULL.

This leads me to believe that the interfaces we really want are:

const void *apr_hash_key(hi)
void *apr_hash_val(hi)

which nicely sidestep the aliasing issue by returning the value instead
of using a pointer argument.  Even in the (apparently rare) cases where
we want the key and the value, it may be more elegant to make two
function calls than to use a single call with temporaries, or a
multi-line macro.

Of course, those APIs do not exist in APR (right now), so we'd have to
define them ourselves.

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2367072

Re: [RFC] Simplify use of apr_hash_this()

Posted by Steve Sisak <sg...@codewell.com>.
At 12:45 AM +0100 7/1/09, Julian Foad wrote:
>I got confused by the fact that we use temp vars in many instances but
>explicit casts in other instances (see below). This made me think the
>two approaches were equally correct. I think I understand now that:
>
>   - the temp vars are essential for a strictly correct solution;
>   - we already use temp vars and get a strictly correct solution in the
>majority of cases;
>   - I was proposing (in my "simple" solution) doing away with the temp
>vars everywhere;
>   - so that "solution" would have brought every instance down to the
>lowest common denominator of correctness.

Incidentally, it's worth adding that, not only are the temp vars 
necessary for correctness, but they're unlikely to harm performance 
and may actually improve it because they make life easier for the 
optimizer.

When you take the address of a variable (in C), it can no longer be 
placed in a register -- unless the compiler figures out you meant and 
generates a temporary variable for you.

With modern compilers, it's generally better to be explicit in your 
code so the compiler has a better chance optimize it correctly.

Cheers,

-Steve

P.S. You might want to see what warnings the LLVM static analyzer 
generates for this code.

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2366912

Re: [RFC] Simplify use of apr_hash_this()

Posted by Julian Foad <ju...@btopenworld.com>.
Philip Martin wrote:
> Julian Foad <ju...@btopenworld.com> writes:
> 
> > We're still not on the same page. That's not "the" problem, but "a"
> > problem. The problem you are describing is that apr_hash_this() is not
> > safe on mixed-pointer-size systems. I believe that is a problem with the
> > API of apr_hash_this().

Oops, it is a safe API when used with temp vars. I lost sight of that.

> It's standard C, it's even in the comp.lang.c FAQ:
> 
> http://www.c-faq.com/ptrs/genericpp.html

Thanks for persisting, Philip. That helped to clarify it.

I got confused by the fact that we use temp vars in many instances but
explicit casts in other instances (see below). This made me think the
two approaches were equally correct. I think I understand now that:

  - the temp vars are essential for a strictly correct solution;
  - we already use temp vars and get a strictly correct solution in the
majority of cases;
  - I was proposing (in my "simple" solution) doing away with the temp
vars everywhere;
  - so that "solution" would have brought every instance down to the
lowest common denominator of correctness.

Whether the present code in which some instances are invalid is better
than having all instances consistently invalid is debatable, but I see
your point.

[...]
> > I am trying to solve the problem of source code verbosity. We currently
> > use verbose code (either temporary variables or explicit casts) at the
> 
> Are we using explicit casts?  I thought we always used temporary
> variables.

In a minority of instances, we are using explicit casts:

:grep "apr_hash_this(.*(" subversion/*/*.c subversion/*/*/*.c
subversion/libsvn_client/repos_diff.c:1429:
  apr_hash_this(hi, &deleted_path, NULL, (void *)&dpn);
subversion/libsvn_ra/compat.c:811:
  apr_hash_this(hi, (void *) &path, NULL, &val);
subversion/libsvn_repos/hooks.c:439:
  apr_hash_this(hi, (void *)&token, NULL, &val);
subversion/libsvn_repos/log.c:1193:
  apr_hash_this(hi, (void *) &rp->path, NULL,
subversion/libsvn_repos/rev_hunt.c:1229:
  apr_hash_this(hi, (void *) &path, NULL, (void *) &rangelist);
subversion/svndumpfilter/main.c:541:
  apr_hash_this(hi, (const void **) &key, NULL, &val);
subversion/svn/log-cmd.c:370:
  apr_hash_this(hi, (void *) &path, NULL, &val);
subversion/svn/log-cmd.c:549:
  apr_hash_this(hi, (void *)&property, NULL, (void *)&value);
subversion/mod_dav_svn/reports/log.c:119:
  apr_hash_this(hi, (void *)&name, NULL, (void *)&value);
subversion/mod_dav_svn/reports/log.c:169:
  apr_hash_this(hi, (void *) &path, NULL, &val);

> > point of use of apr_hash_this() to achieve correct and warning-free
> > compilation (on a restricted but generally sufficient set of machine
> > architectures), whereas we could (and I want to) achieve the same with
> > terse code.
> >
> > I believe that introducing intermediate type casts to the arguments to
> > apr_hash_this() silences the "incompatible pointer" warnings without
> > being any more unsafe than the API already is, and without worsening the
> > mixed-pointer-size problem that already exists.
> 
> If you are replacing the temporary variables with casts you are making
> the code worse (strictly speaking you are making the code invalid).

Right, I think I get it now.


So, I'll go and refine the version that uses temp vars.

- Julian

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2366883

Re: [RFC] Simplify use of apr_hash_this()

Posted by Philip Martin <ph...@codematters.co.uk>.
Julian Foad <ju...@btopenworld.com> writes:

> We're still not on the same page. That's not "the" problem, but "a"
> problem. The problem you are describing is that apr_hash_this() is not
> safe on mixed-pointer-size systems. I believe that is a problem with the
> API of apr_hash_this().

It's standard C, it's even in the comp.lang.c FAQ:

http://www.c-faq.com/ptrs/genericpp.html

> I am not trying to solve that problem. I am happy to let that problem
> continue to exist.
>
> I am trying to solve the problem of source code verbosity. We currently
> use verbose code (either temporary variables or explicit casts) at the

Are we using explicit casts?  I thought we always used temporary
variables.

> point of use of apr_hash_this() to achieve correct and warning-free
> compilation (on a restricted but generally sufficient set of machine
> architectures), whereas we could (and I want to) achieve the same with
> terse code.
>
> I believe that introducing intermediate type casts to the arguments to
> apr_hash_this() silences the "incompatible pointer" warnings without
> being any more unsafe than the API already is, and without worsening the
> mixed-pointer-size problem that already exists.

If you are replacing the temporary variables with casts you are making
the code worse (strictly speaking you are making the code invalid).

> I'm not clear whether you think I'm trying to (or need to) solve the
> mixed-pointer-size problem, or whether you think my addition of an
> intermediate cast introduces the mixed-pointer-size problem or makes it
> worse, or something else.
>
> Another clarification please?

I think we should be writing valid code.

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2366795

Re: [RFC] Simplify use of apr_hash_this()

Posted by Julian Foad <ju...@btopenworld.com>.
Philip Martin wrote:
> Julian Foad <ju...@btopenworld.com> writes:
> > Thanks for the response.
> >
> > The warnings I'm talking about (that I get from gcc if I use neither
> > casts nor temp vars with apr_hash_this()) are:
> >
> >   warning: passing argument 2 [or 4] of 'apr_hash_this' from
> > incompatible pointer type
> >
> > I'm not sure if you're looking at the same issue.
> 
> That's exactly the problem I described.

Maybe the difference is that you are thinking about the fact that the
types are incompatible (true, but not my concern), whereas I am thinking
about whether a warning is issued. A warning is issued if the conversion
is implicit but can be suppressed by making that conversion explicit. My
goal is to silence the warning without affecting the correctness (or, as
you point out, incorrectness) of the code. See below on why it doesn't
worsen the (in)correctness.

[...]

> The problem is that "svn_wc_entry_t*" isn't guaranteed to be big
> enough to hold the void* value that gets written by apr_hash_this.  On
[...]

We're still not on the same page. That's not "the" problem, but "a"
problem. The problem you are describing is that apr_hash_this() is not
safe on mixed-pointer-size systems. I believe that is a problem with the
API of apr_hash_this().

I am not trying to solve that problem. I am happy to let that problem
continue to exist.

I am trying to solve the problem of source code verbosity. We currently
use verbose code (either temporary variables or explicit casts) at the
point of use of apr_hash_this() to achieve correct and warning-free
compilation (on a restricted but generally sufficient set of machine
architectures), whereas we could (and I want to) achieve the same with
terse code.

I believe that introducing intermediate type casts to the arguments to
apr_hash_this() silences the "incompatible pointer" warnings without
being any more unsafe than the API already is, and without worsening the
mixed-pointer-size problem that already exists.

I'm not clear whether you think I'm trying to (or need to) solve the
mixed-pointer-size problem, or whether you think my addition of an
intermediate cast introduces the mixed-pointer-size problem or makes it
worse, or something else.

Another clarification please?


> > The issue of whether (void **) is compatible with (&entry) is a separate
> > issue that has always existed with this APR API. Is that the
> > "type-punned" issue that you're talking about?
> 
> Type-punning is a separate issue, I find it hard to work out whether
> it's a false alarm partly because the code has this other incompatible
> pointer problem.
> 
> >
> >> > +#if 1
> >> > +#define APR_HASH_THIS(hi,key,klen,val) \
> >> > +  apr_hash_this((hi), (void *)(key), (klen), (void *)(val))
> >> 
> >> This is not valid for the reason above.
> >> 
> >> > +#else
> >> > +#define APR_HASH_THIS(hi,key,klen,val) \
> >> > +  do { \
> >> > +    const void *__svn_key; \
> >> > +    apr_ssize_t __svn_klen; \
> >> > +    void *__svn_val; \
> >> > +    apr_hash_this((hi), &__svn_key, &__svn_klen, &__svn_val); \
> >> > +    if (key) *((const void **)key) = __svn_key; \
> >> > +    if (klen) *((apr_ssize_t *)klen) = __svn_klen; \
> >> > +    if (val) *((void **)val) = __svn_val; \
> >> > +  } while (0)
> >> > +#endif
> >> 
> >> That's not valid because the casts in the three assignment lines have
> >> the same problem.  It should be OK without the casts.
> >
> > Here I agree that these casts are unsafe for systems on which pointers
> > are not all the same size. I added these casts only to be able to
> > compile this code without warnings when the "calling" code passes NULL
> > for key/klen/val. If I need to use an implementation like this one I
> > will look for a different way to achieve that.
> 
> Three macros perhaps?
> 
> APR_HASH_THIS_VAL(hi,val)
> APR_HASH_THIS_KEY_VAL(hi,key,val)
> APR_HASH_THIS_KEY_KLEN_VAL(hi,key,klen,val)

Yup, that will be a pretty good option if we can't agree on the simple
solution.

- Julian

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2366788

Re: [RFC] Simplify use of apr_hash_this()

Posted by Philip Martin <ph...@codematters.co.uk>.
Julian Foad <ju...@btopenworld.com> writes:

> Philip Martin wrote:
>> Julian Foad <ju...@btopenworld.com> writes:
>> 
>> This is gcc's "type-punned pointer will break strict-aliasing"
>> warning;
>
> Thanks for the response.
>
> The warnings I'm talking about (that I get from gcc if I use neither
> casts nor temp vars with apr_hash_this()) are:
>
>   warning: passing argument 2 [or 4] of 'apr_hash_this' from
> incompatible pointer type
>
> I'm not sure if you're looking at the same issue.

That's exactly the problem I described.

>>  I've never been able to decide whether this is a false alarm
>> or not.  The gcc documentation states that false alarms may occur but
>> that there should be very few of them.  However even if it is a false
>> alarm the code is suspect for another reason.
>> 
>> > which is what we normally do, or add explicit type casts:
>> >
>> >   for (hi = apr_hash_first(pool, entries); hi; hi = apr_hash_next(hi))
>> >     {
>> >       const char *item;
>> >       svn_wc_entry_t *entry;
>> >
>> >       apr_hash_this(hi, (void *)&item, NULL, (void *)&entry);
>> >
>> >       ...
>> >     }
>> 
>> I don't think that's valid, although it will work on most common
>> platforms today.  The problem is that in C a void* pointer can be
>> bigger than any particular data pointer.  When the compiler assigns
>> to/from void* it adjusts for size as necessary, but the cast above
>> defeats that.  So inside apr_hash_this a void* value will get written
>> into the space occupied by entry and that space might not be big
>> enough.
>
> Are you sure you're looking at the right level of indirection?

Yes, I think I am.

> apr_hash_this() is a function whose prototype ensures its last argument
> gets converted to type (void **). All we're doing here is casting
> (&entry) to type (void *) before it gets converted automatically to
> (void **), instead of letting the conversion happen in a single step. I
> assume that the result of this pair of type conversions would be the
> same as the single step, on all platforms. The result is the same, but,
> by making an intermediate explicit cast, we avoid the "converting
> between incompatible pointer target types" warning.

The problem is that "svn_wc_entry_t*" isn't guaranteed to be big
enough to hold the void* value that gets written by apr_hash_this.  On
most platforms all data pointers are the same size, but that's not
guaranteed by the standard.  Suppose svn_wc_entry_t* is 32bits but
void* is 64bits.  You are passing the address of a 32bit block of
memory into apr_hash_this but within the function the code will write
a 64bit value.  It's like passing the address of a 32bit int into a
function that writes a 64bit int; casting doesn't solve the problem:

     void foo(u64 *p) { *p = 0; }

     u32 i;
     foo(&i);        /* incompatible pointer warning */
     foo((u64*)&i); /* no warning, but invalid code */


> The issue of whether (void **) is compatible with (&entry) is a separate
> issue that has always existed with this APR API. Is that the
> "type-punned" issue that you're talking about?

Type-punning is a separate issue, I find it hard to work out whether
it's a false alarm partly because the code has this other incompatible
pointer problem.

>
>> > +#if 1
>> > +#define APR_HASH_THIS(hi,key,klen,val) \
>> > +  apr_hash_this((hi), (void *)(key), (klen), (void *)(val))
>> 
>> This is not valid for the reason above.
>> 
>> > +#else
>> > +#define APR_HASH_THIS(hi,key,klen,val) \
>> > +  do { \
>> > +    const void *__svn_key; \
>> > +    apr_ssize_t __svn_klen; \
>> > +    void *__svn_val; \
>> > +    apr_hash_this((hi), &__svn_key, &__svn_klen, &__svn_val); \
>> > +    if (key) *((const void **)key) = __svn_key; \
>> > +    if (klen) *((apr_ssize_t *)klen) = __svn_klen; \
>> > +    if (val) *((void **)val) = __svn_val; \
>> > +  } while (0)
>> > +#endif
>> 
>> That's not valid because the casts in the three assignment lines have
>> the same problem.  It should be OK without the casts.
>
> Here I agree that these casts are unsafe for systems on which pointers
> are not all the same size. I added these casts only to be able to
> compile this code without warnings when the "calling" code passes NULL
> for key/klen/val. If I need to use an implementation like this one I
> will look for a different way to achieve that.

Three macros perhaps?

APR_HASH_THIS_VAL(hi,val)
APR_HASH_THIS_KEY_VAL(hi,key,val)
APR_HASH_THIS_KEY_KLEN_VAL(hi,key,klen,val)

>
> (Why have I even shown you this particular implementation? Because I
> wrote it before I discovered the simple "#if 1" solution, and could
> hardly believe that the simple solution will be OK.)

It's not.

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2366775

Re: [RFC] Simplify use of apr_hash_this()

Posted by Julian Foad <ju...@btopenworld.com>.
Philip Martin wrote:
> Julian Foad <ju...@btopenworld.com> writes:
> 
> This is gcc's "type-punned pointer will break strict-aliasing"
> warning;

Thanks for the response.

The warnings I'm talking about (that I get from gcc if I use neither
casts nor temp vars with apr_hash_this()) are:

  warning: passing argument 2 [or 4] of 'apr_hash_this' from
incompatible pointer type

I'm not sure if you're looking at the same issue.

>  I've never been able to decide whether this is a false alarm
> or not.  The gcc documentation states that false alarms may occur but
> that there should be very few of them.  However even if it is a false
> alarm the code is suspect for another reason.
> 
> > which is what we normally do, or add explicit type casts:
> >
> >   for (hi = apr_hash_first(pool, entries); hi; hi = apr_hash_next(hi))
> >     {
> >       const char *item;
> >       svn_wc_entry_t *entry;
> >
> >       apr_hash_this(hi, (void *)&item, NULL, (void *)&entry);
> >
> >       ...
> >     }
> 
> I don't think that's valid, although it will work on most common
> platforms today.  The problem is that in C a void* pointer can be
> bigger than any particular data pointer.  When the compiler assigns
> to/from void* it adjusts for size as necessary, but the cast above
> defeats that.  So inside apr_hash_this a void* value will get written
> into the space occupied by entry and that space might not be big
> enough.

Are you sure you're looking at the right level of indirection?
apr_hash_this() is a function whose prototype ensures its last argument
gets converted to type (void **). All we're doing here is casting
(&entry) to type (void *) before it gets converted automatically to
(void **), instead of letting the conversion happen in a single step. I
assume that the result of this pair of type conversions would be the
same as the single step, on all platforms. The result is the same, but,
by making an intermediate explicit cast, we avoid the "converting
between incompatible pointer target types" warning.

The issue of whether (void **) is compatible with (&entry) is a separate
issue that has always existed with this APR API. Is that the
"type-punned" issue that you're talking about?


> > +#if 1
> > +#define APR_HASH_THIS(hi,key,klen,val) \
> > +  apr_hash_this((hi), (void *)(key), (klen), (void *)(val))
> 
> This is not valid for the reason above.
> 
> > +#else
> > +#define APR_HASH_THIS(hi,key,klen,val) \
> > +  do { \
> > +    const void *__svn_key; \
> > +    apr_ssize_t __svn_klen; \
> > +    void *__svn_val; \
> > +    apr_hash_this((hi), &__svn_key, &__svn_klen, &__svn_val); \
> > +    if (key) *((const void **)key) = __svn_key; \
> > +    if (klen) *((apr_ssize_t *)klen) = __svn_klen; \
> > +    if (val) *((void **)val) = __svn_val; \
> > +  } while (0)
> > +#endif
> 
> That's not valid because the casts in the three assignment lines have
> the same problem.  It should be OK without the casts.

Here I agree that these casts are unsafe for systems on which pointers
are not all the same size. I added these casts only to be able to
compile this code without warnings when the "calling" code passes NULL
for key/klen/val. If I need to use an implementation like this one I
will look for a different way to achieve that.

(Why have I even shown you this particular implementation? Because I
wrote it before I discovered the simple "#if 1" solution, and could
hardly believe that the simple solution will be OK.)

- Julian

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2366763

Re: [RFC] Simplify use of apr_hash_this()

Posted by Philip Martin <ph...@codematters.co.uk>.
Julian Foad <ju...@btopenworld.com> writes:

This is gcc's "type-punned pointer will break strict-aliasing"
warning; I've never been able to decide whether this is a false alarm
or not.  The gcc documentation states that false alarms may occur but
that there should be very few of them.  However even if it is a false
alarm the code is suspect for another reason.

> which is what we normally do, or add explicit type casts:
>
>   for (hi = apr_hash_first(pool, entries); hi; hi = apr_hash_next(hi))
>     {
>       const char *item;
>       svn_wc_entry_t *entry;
>
>       apr_hash_this(hi, (void *)&item, NULL, (void *)&entry);
>
>       ...
>     }

I don't think that's valid, although it will work on most common
platforms today.  The problem is that in C a void* pointer can be
bigger than any particular data pointer.  When the compiler assigns
to/from void* it adjusts for size as necessary, but the cast above
defeats that.  So inside apr_hash_this a void* value will get written
into the space occupied by entry and that space might not be big
enough.

> +#if 1
> +#define APR_HASH_THIS(hi,key,klen,val) \
> +  apr_hash_this((hi), (void *)(key), (klen), (void *)(val))

This is not valid for the reason above.

> +#else
> +#define APR_HASH_THIS(hi,key,klen,val) \
> +  do { \
> +    const void *__svn_key; \
> +    apr_ssize_t __svn_klen; \
> +    void *__svn_val; \
> +    apr_hash_this((hi), &__svn_key, &__svn_klen, &__svn_val); \
> +    if (key) *((const void **)key) = __svn_key; \
> +    if (klen) *((apr_ssize_t *)klen) = __svn_klen; \
> +    if (val) *((void **)val) = __svn_val; \
> +  } while (0)
> +#endif

That's not valid because the casts in the three assignment lines have
the same problem.  It should be OK without the casts.

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2366745