You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucy.apache.org by Marvin Humphrey <ma...@rectangular.com> on 2012/11/20 03:16:55 UTC

Re: [lucy-dev] c99-types branch

On Sun, Nov 18, 2012 at 12:01 PM, Nick Wellnhofer <we...@aevum.de> wrote:
> I created a branch 'c99-types' which makes the whole Lucy codebase switch to
> several standard macros defined in the C99 standard. This includes
>
> * Integer min/max macros from stdint.h
>   CHY_I32_MAX => INT32_MAX
> * Integer literal macros from stdint.h
>   CHY_I32_C => INT32_C
> * printf type specifiers from inttypes.h
>   I64P => PRId64
>
> If any of the standard headers isn't found, the macros will be defined by
> Charmonizer.

Nick++

I've checked out the branch and everything looks solid.  Extra kudos for that
big bool_t-to-bool commit, which I imagine was tedious to prepare!

One original design goal of the Charmonizer integer types was to eliminate the
possibility of namespace conflicts with either system headers or arbitrary
libraries.  Whenever non-standard headers might be include (for example in
our XS binding code which pulls in perl.h and friends) the full names for
Charmonizer-created types would be in effect: e.g. `chy_i32_t`.  When we could
be certain that only C standard headers would be in effect (e.g. in most of
the Lucy core code[1]), we could use "short names" like i32_t.   The short
names were still guaranteed not to conflict with anything, since only the
symbols from standard C would be competing and those are a known, finite set.

Other projects define similar types -- e.g. APR has apr_int32_t:
<https://apr.apache.org/docs/apr/trunk/group__apr__platform.html>.

Switching over to stdint.h names for most integer types -- e.g. from
chy_i32_t/i32_t to int32_t -- means that we no longer have to explain our
conventions to new contributors coming up to speed.

However, it also means that we're no longer hermetically sealed off in our own
namespace -- we're futzing around in the same standard namespace with
everybody else, and we have to deal with both incomplete standard environments
and with other people's attempts to "fix" the standard environments which may
conflict with our own.  It's sure been nice having our own 100% reliable
boolean type for this long without having to deal with the standards mess!

>From the perspective of Charmonizer as a component in isolation (which has the
potential to spin off from Lucy one day) I wonder whether it wouldn't be
advantageous to leave intrusions into the standard namespace up to the
consumer:

    #ifndef CHY_HAS_STDINT_H
      #define int32_t chy_i32_t
      ...
    #endif

>From the perspective of Lucy and Clownfish, though, exactly where the names
get defined is an implementation detail, so long as they are still defined in
a common header -- what matters more is that the codebase has migrated towards
C99.

Still, I'm a little uneasy about whether Clownfish ought to support all of C's
fifty bajillion integer types.  On one hand, I would love it if we could kill
all of them save for a single 64-bit integer primitive type and then
normalize all arguments to Clownfish methods to 64 bits.  On the other hand,
Clownfish is a low-level library and so there's an argument to be made that it
should support all system types.  (We already support va_list, int, and size_t
because it was hard to do without them.)

> On top of that, I tested switching from 'chy_bool_t' to 'bool' from
> stdbool.h (or from the C++ compiler when compiling as C++). This is a bit
> problematic since 'bool' is often defined by other headers.

The fact that we require either C99 or C++ for Lucy and Clownfish ought to
make things easier because we are guaranteed to have either stdbool.h or the
C++ `bool` type available.  Our `bool` is guaranteed to come from the system
one way or another.

Since Charmonizer only requires C89, it's not possible for Charmonizer to
resolve this problem in the general case.  If you want to fake up a bool type
for C89, you need to `typedef bool` to something -- `char` and `int` are both
popular -- and there's significant potential for conflict with someone who
has made a similar but incompatible hack.

My understanding is that the stdbool.h `bool` and the C++ `bool` are not
technically compatible, but I hope that won't matter in practice.  Hopefully
no system vendor will define them to be different sizes.

> Unfortunately, also the Perl headers define their own version of 'bool'
> under certain conditions. As a work-around, I had to add -DHAS_BOOL to the
> build options to make the Linux and Windows build work. This looks a bit
> sketchy and might cause problems with other Perl versions.

The workaround doesn't seem any sketchier to me than the fact that Perl
defines its own `bool` in the first place.

After your changes, our `bool` is either a macro defined in stdbool.h or a C++
builtin.  It's not a careless hack.

*   If the Perl headers are conflicting with stdbool.h, that's messed up --
    maybe it happens because Perl wants to use the keyword `bool`, but also
    wants to specify `-std=c89`?
*   If the Perl headers are conflicting with C++, that's more understandable.
    But if the Perl devs have seen fit to provide a flag which enables C++
    compatibility, let's take advantage of it.

> I successfully tested:
>
> - MacOS 10.8 with stock Perl 5.12 (64-bit)
> - Debian sid with stock Perl 5.14 (32-bit)
> - Windows 7 with ActivePerl 5.16 and MSVC6 (32-bit)

Nice!

Marvin Humphrey

[1] Except for places like FSFileHandle.c which need e.g. windows.h or other
    OS-specific functionality.

Re: [lucy-dev] c99-types branch

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Tue, Nov 20, 2012 at 1:09 PM, Marvin Humphrey <ma...@rectangular.com> wrote:
> On Tue, Nov 20, 2012 at 7:45 AM, Nick Wellnhofer <we...@aevum.de> wrote:
>> I'm not sure whether HAS_BOOL is really meant to be used for C++
>> compatibility. I'd really like to see whether the bool changes work with
>> oldest Perl we officially support.

Perl 5.8.3 on CentOS 5: all tests pass.

Marvin Humphrey

Re: [lucy-dev] c99-types branch

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Tue, Nov 20, 2012 at 7:45 AM, Nick Wellnhofer <we...@aevum.de> wrote:
> On 20/11/2012 03:16, Marvin Humphrey wrote:
>>  From the perspective of Charmonizer as a component in isolation (which has
>>  the potential to spin off from Lucy one day) I wonder whether it wouldn't
>>  be advantageous to leave intrusions into the standard namespace up to the
>>  consumer:
>>
>>      #ifndef CHY_HAS_STDINT_H
>>        #define int32_t chy_i32_t
>>        ...
>>      #endif
>
>
> We could keep the old Integers module, and rename the new version to
> C99_Integers or so. But I think Charmonizer users should either use
> (chy_)i32_t or int32_t. I can see no value in using both names at the same
> time.

+1

Regardless, there's no hurry to muck with all the good efforts you just put in
getting C99 working for Lucy.

Charmonizer's interface for specifying which symbols you need it to define is
insufficiently granular for a general-purpose tool.  What we have works fine
for Lucy and Clownfish today because Charmonizer's modules happen to produce
exactly what we require, but that changes as soon as Charmonizer starts
needing to support other projects.  (A fun exercise would be to try to replace
arbitrary configuration scripts.)

If and when Charmonizer gets its own independent public API, we'll probably
want to provide functions which support either C99 names, chy_ names, or names
with arbitrary prefixes like "lucy_" or "apr_".  But we can cross that bridge
when we come to it.

In the meantime, as of Lucy 0.4.0 anyone who wants to use Charmonizer will be
able to grab the amalgamated charmonizer.c file and either use it as-is or
hack it to meet their needs.  IMO that's a "good enough" informal, provisional
API for a long time to come -- allowing us to either focus on accumulating
and testing probes (the hard, important work for a configuration prober like
Charmonizer), or just rest on our laurels since Charmonizer already does
pretty much what we need it to.

> Maybe Charmonizer should also provide two separate modules for booleans.

People can always do stuff like this:

    #ifdef __cplusplus
      typedef my_bool bool;
    #elif defined(CHY_HAS_STDBOOL_H)
      #include <stdbool.h>
      typedef my_bool bool;
    #else
      typedef my_bool char;
    #endif

Sorting out that kind of logic in your header is not nearly as annoying as
figuring out how to discover CHY_HAS_STDBOOL_H in the first place on every
modern compilation platform.

But sure, we could offer a feature to define `bool` one way or another --
documented with appropriate caveats. :)

>> The workaround doesn't seem any sketchier to me than the fact that Perl
>> defines its own `bool` in the first place.
>>
>> After your changes, our `bool` is either a macro defined in stdbool.h or a
>> C++ builtin.  It's not a careless hack.
>
> Right, but compiling with HAS_BOOL might break with certain Perl versions.
> After all, we use a different definition of 'bool' than what Perl was
> compiled with.

OK, so that means that e.g. if the stdbool.h _Bool is typedef'd to char but
Perl's `bool` is typedef'd to an int or an enum, then invoking a function from
Perl's XS api which takes a `bool` argument would break.  Hrmm.... hopefully
that never happens.

> I'm not sure whether HAS_BOOL is really meant to be used for C++
> compatibility. I'd really like to see whether the bool changes work with
> oldest Perl we officially support.

We officially support back to Perl 5.8.3 (with some build-time requirements
like Module::Build and Devel::PPPort needing to be at the state they were for
5.10.0).

The logic in handy.h for 5.8.3 looks like it will still work with -DHAS_BOOL.

    http://perl5.git.perl.org/perl.git/blob/f0095d0783d416c5b117e7b415e301cfea24ac44:/handy.h

FWIW, everything seems OK with the c99-types branch on a custom-compiled Perl
5.10.0 on CentOS 5.

Marvin Humphrey

Re: [lucy-dev] c99-types branch

Posted by Nick Wellnhofer <we...@aevum.de>.
On 20/11/2012 03:16, Marvin Humphrey wrote:
> I've checked out the branch and everything looks solid.  Extra kudos for that
> big bool_t-to-bool commit, which I imagine was tedious to prepare!

With a little help from sed, it was actually quite easy.

>  From the perspective of Charmonizer as a component in isolation (which has the
> potential to spin off from Lucy one day) I wonder whether it wouldn't be
> advantageous to leave intrusions into the standard namespace up to the
> consumer:
>
>      #ifndef CHY_HAS_STDINT_H
>        #define int32_t chy_i32_t
>        ...
>      #endif

We could keep the old Integers module, and rename the new version to 
C99_Integers or so. But I think Charmonizer users should either use 
(chy_)i32_t or int32_t. I can see no value in using both names at the 
same time.

>> On top of that, I tested switching from 'chy_bool_t' to 'bool' from
>> stdbool.h (or from the C++ compiler when compiling as C++). This is a bit
>> problematic since 'bool' is often defined by other headers.
>
> The fact that we require either C99 or C++ for Lucy and Clownfish ought to
> make things easier because we are guaranteed to have either stdbool.h or the
> C++ `bool` type available.  Our `bool` is guaranteed to come from the system
> one way or another.
>
> Since Charmonizer only requires C89, it's not possible for Charmonizer to
> resolve this problem in the general case.  If you want to fake up a bool type
> for C89, you need to `typedef bool` to something -- `char` and `int` are both
> popular -- and there's significant potential for conflict with someone who
> has made a similar but incompatible hack.

Maybe Charmonizer should also provide two separate modules for booleans.

> My understanding is that the stdbool.h `bool` and the C++ `bool` are not
> technically compatible, but I hope that won't matter in practice.  Hopefully
> no system vendor will define them to be different sizes.

stdbool.h defines 'bool' as '_Bool', which is a real boolean type 
defined in C99. It can only have the two distinct values true or false 
like 'bool' in C++. I can't imagine that a single compiler suite would 
use different sizes for C++ 'bool' and C99 '_Bool', so both types should 
be compatible.

>> Unfortunately, also the Perl headers define their own version of 'bool'
>> under certain conditions. As a work-around, I had to add -DHAS_BOOL to the
>> build options to make the Linux and Windows build work. This looks a bit
>> sketchy and might cause problems with other Perl versions.
>
> The workaround doesn't seem any sketchier to me than the fact that Perl
> defines its own `bool` in the first place.
>
> After your changes, our `bool` is either a macro defined in stdbool.h or a C++
> builtin.  It's not a careless hack.

Right, but compiling with HAS_BOOL might break with certain Perl 
versions. After all, we use a different definition of 'bool' than what 
Perl was compiled with.

> *   If the Perl headers are conflicting with stdbool.h, that's messed up --
>      maybe it happens because Perl wants to use the keyword `bool`, but also
>      wants to specify `-std=c89`?

I think proper support for stdbool.h only came with Perl 5.16. 5.14 and 
below are conflicting with stdbool.h. See here:

https://metacpan.org/source/JESSE/perl-5.14.0/handy.h#L73
https://metacpan.org/source/RJBS/perl-5.16.2/handy.h#L73

Only the stock Perl on MacOS seems to be an exception where earlier 
versions are patched to use stdbool.h.

> *   If the Perl headers are conflicting with C++, that's more understandable.
>      But if the Perl devs have seen fit to provide a flag which enables C++
>      compatibility, let's take advantage of it.

I'm not sure whether HAS_BOOL is really meant to be used for C++ 
compatibility. I'd really like to see whether the bool changes work with 
oldest Perl we officially support.

Nick