You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucy.apache.org by Nick Wellnhofer <we...@aevum.de> on 2013/07/21 01:25:44 UTC

[lucy-dev] Decoupling from Charmonizer

Hello lucy-dev,

I think we should make it possible to create Clownfish-based projects which don't depend on Charmonizer. I already made some changes in that direction, but the following still needs to be done:

    * Don't include charmony.h from parcel.h but directly from
      source files.
    * Eliminate the dependency of XSBind.h on charmony.h.
    * Eliminate the dependency of CFC::Perl::Build on Charmonizer.

Any objections?

Nick


Re: [lucy-dev] Decoupling from Charmonizer

Posted by Nick Wellnhofer <we...@aevum.de>.
On 23/07/2013 06:17, Marvin Humphrey wrote:
> Endian helpers are definitely useful.  But are they an essential part of the
> core Clownfish runtime?  JSON parsing is also useful -- is it core to
> Clownfish?  Isn't Clownfish ambitious enough already?
>
> I could understand if you were making the argument that it's not worth the
> effort to move the endian stuff elsewhere, but if you're arguing that it
> really belongs where it is, we may have different visions for the scope of the
> Clownfish runtime.

I wouldn't say that the endian helpers are essential for Clownfish, but 
it is part of my vision that the Clownfish runtime provides some basics 
like that.

> Regardless, we still don't need to determine endianness with a probe.  Here's
> a static inline function for determining endianness which the compiler will
> optimize away:

> I'm not sure whether it is undefined behavior according to various C/C++
> standards to write to one member in a union then read a different member, but
> this code yielded identical results using gcc and g++ on Linux, and clang in
> both C and C++ mode on OS X.

This shouldn't be a problem. I'm more worried that some compilers might 
not optimize the probe away.

> Let's go back to your list of platform-specific code the runtime currently
> depends on:

Even if we find ways to detect every platform feature via macros or at 
runtime, what's wrong with the feature tests that we already have? They 
look more robust to me.

Nick


Re: [lucy-dev] Decoupling from Charmonizer

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Mon, Jul 22, 2013 at 1:43 PM, Nick Wellnhofer <we...@aevum.de> wrote:
> On Jul 22, 2013, at 17:30 , Marvin Humphrey <ma...@rectangular.com> wrote:
>> On Sun, Jul 21, 2013 at 12:32 PM, Nick Wellnhofer <we...@aevum.de> wrote:
>>> Besides, I already introduced platform-dependent code in cfish_parcel.h. But
>>> we can put the platform-dependent parts in a separate header file like
>>> cfish_platform.h.
>>
>> OK, that works.
>
> This is now implemented in branch 'charmonizer-decoupling'.

Nice. :)

> The endian helpers might be useful for other modules. I'd keep them in
> Clownfish.

Endian helpers are definitely useful.  But are they an essential part of the
core Clownfish runtime?  JSON parsing is also useful -- is it core to
Clownfish?  Isn't Clownfish ambitious enough already?

I could understand if you were making the argument that it's not worth the
effort to move the endian stuff elsewhere, but if you're arguing that it
really belongs where it is, we may have different visions for the scope of the
Clownfish runtime.

Regardless, we still don't need to determine endianness with a probe.  Here's
a static inline function for determining endianness which the compiler will
optimize away:

    $ cat endian.c
    #include <stdio.h>

    static inline int
    SI_machine_is_big_endian(void) {
        union { int i; char chars[sizeof(int)]; } num = {1};
        return !num.chars[0];
    }

    void
    foo(void) {
      if (SI_machine_is_big_endian()) {
        printf("Big!\n");
      }
    }

    $ gcc -fomit-frame-pointer -O2 -c endian.c -o endian.o
    $ objdump -d endian.o

    endian.o:     file format elf32-i386

    Disassembly of section .text:

    00000000 <foo>:
       0: f3 c3                 repz ret
    $

I'm not sure whether it is undefined behavior according to various C/C++
standards to write to one member in a union then read a different member, but
this code yielded identical results using gcc and g++ on Linux, and clang in
both C and C++ mode on OS X.

Let's go back to your list of platform-specific code the runtime currently
depends on:

    * (Emulated) C99 bool type
    * (Emulated) C99 integer types

Only an issue on MSVC.  Should be doable.

    * alloca

This one's tricky, but maybe we can get away with preferring
`__builtin_alloca` except under MSVC.

    * Variadic macros

By now, I think we can assume good support for ISO-style variadic macros
except on MSVC.

    * Endianness

See above.

    * __func__ macro

Defined in C99, and the rest we can deal with via _MSC_VER.

    * Symbol visibility

This I'm less sure of, but I think we may be able to switch up around the GCC
version number.

    * u64 to double conversion for MSVC6

Seems like it shouldn't be a problem (but I didn't write that code).

Marvin Humphrey

Re: [lucy-dev] Decoupling from Charmonizer

Posted by Nick Wellnhofer <we...@aevum.de>.
On Jul 22, 2013, at 17:30 , Marvin Humphrey <ma...@rectangular.com> wrote:

> On Sun, Jul 21, 2013 at 12:32 PM, Nick Wellnhofer <we...@aevum.de> wrote:
>> Besides, I already introduced platform-dependent code in cfish_parcel.h. But
>> we can put the platform-dependent parts in a separate header file like
>> cfish_platform.h.
> 
> OK, that works.

This is now implemented in branch 'charmonizer-decoupling'.

> A while back I made some changes to Charmonizer in anticipation
> of supporting custom prefixes (swapping out ConfWriter_append_conf in most of
> the probes for functions like ConfWriter_add_def).

Yes, custom Charmonizer prefixes would work, too.

> The issue is very similar to the issue of installing a config.h file generated
> by Autoconf.
> 
>    http://inaugust.com/post/68
>    https://issues.apache.org/jira/browse/THRIFT-705
>    http://www.gnu.org/software/autoconf-archive/ax_prefix_config_h.html

Exactly.

> For instance, we can simply remove endianness from all public headers.
> Endianness isn't needed by Clownfish, only Lucy -- in fact, the entire
> Clownfish::Util::NumberUtils module shouldn't really be part of Clownfish, as
> it's barely used outside of Lucy.  To solve the problem, we can move the
> endian macros into an optional Charmonizer probe, so that they are defined
> only in a Lucy-specific charmony.h file which does not get installed.

The endian helpers might be useful for other modules. I'd keep them in Clownfish.

Nick



Re: [lucy-dev] Decoupling from Charmonizer

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Sun, Jul 21, 2013 at 12:32 PM, Nick Wellnhofer <we...@aevum.de> wrote:
> Besides, I already introduced platform-dependent code in cfish_parcel.h. But
> we can put the platform-dependent parts in a separate header file like
> cfish_platform.h.

OK, that works.

> This would essentially be a minimal version of charmony.h
> which uses the CFISH namespace.

Sounds good.  A while back I made some changes to Charmonizer in anticipation
of supporting custom prefixes (swapping out ConfWriter_append_conf in most of
the probes for functions like ConfWriter_add_def).

> One of the problems I see with installing
> charmony.h is that it would clash with other user-generated versions.

The issue is very similar to the issue of installing a config.h file generated
by Autoconf.

    http://inaugust.com/post/68
    https://issues.apache.org/jira/browse/THRIFT-705
    http://www.gnu.org/software/autoconf-archive/ax_prefix_config_h.html

I say "similar" rather than "identical" because in some ways Charmonizer is
still attached to Clownfish the way Metaconfig is to Perl.  If Metaconfig had
clients other than Perl, it would need prefixes, too.

Perl gets away with installing a config.h file into
"$Config{installarchlib}/CORE", and Clownfish can get away with it too.  But
it complicates things and if Clownfish's headers can become completely
platform agnostic, that would be great.

> And I really don't want to force our users to always build charmony.h
> themselves.

Yes, absolutely.

Let's bear in mind that even if Charmonizer was always intended to be a
distinct layer -- and potentially an independent product, if justified by
user/developer interest -- it has evolved as support for a single distro.
This kind of weirdness is to be expected.  I'm glad we're cleaning things up!

> Then we could add the option to provide a special cfish_platform.h for a
> different platform to enable cross-compiling, but I don't think this should
> be a concern for now.
>
> Other platform-specific things we currently use in the generated code:
>
>     * (Emulated) C99 bool type
>     * (Emulated) C99 integer types
>     * alloca
>     * Variadic macros
>     * Endianness
>     * __func__ macro
>     * Symbol visibility
>     * u64 to double conversion for MSVC6
>
> I'm not sure we can derive all that from macros set by the compiler.

There may be creative solutions for those that prove difficult.

For instance, we can simply remove endianness from all public headers.
Endianness isn't needed by Clownfish, only Lucy -- in fact, the entire
Clownfish::Util::NumberUtils module shouldn't really be part of Clownfish, as
it's barely used outside of Lucy.  To solve the problem, we can move the
endian macros into an optional Charmonizer probe, so that they are defined
only in a Lucy-specific charmony.h file which does not get installed.

Marvin Humphrey

Re: [lucy-dev] Decoupling from Charmonizer

Posted by Nick Wellnhofer <we...@aevum.de>.
On Jul 21, 2013, at 20:40 , Marvin Humphrey <ma...@rectangular.com> wrote:

> This approach is problematic, IMO:
> 
>    https://git-wip-us.apache.org/repos/asf?p=lucy.git;a=commitdiff;h=c2ff0a87846f54d32c5928a4775470a167b74bf5
> 
> After that commit, the content of headers autogenerated by CFC depends on the
> system where CFC was built, not the platform where the extension is to be
> used.  Techniques such as cross-compiling won't work.  As far as I know, the
> output autogenerated by CFC right now is deterministic based solely on the
> .cfh input files -- I think we should preserve this invariant.

OK, I can change that, although the CFC output should never change on the same platform.

Besides, I already introduced platform-dependent code in cfish_parcel.h. But we can put the platform-dependent parts in a separate header file like cfish_platform.h. This would essentially be a minimal version of charmony.h which uses the CFISH namespace. One of the problems I see with installing charmony.h is that it would clash with other user-generated versions. And I really don't want to force our users to always build charmony.h themselves.

Then we could add the option to provide a special cfish_platform.h for a different platform to enable cross-compiling, but I don't think this should be a concern for now.

> We need atomics for LockFreeRegister, but we can probably do away with
> Clownfish::Util::Atomic and use them only within LockFreeRegister.c.

Other platform-specific things we currently use in the generated code:

    * (Emulated) C99 bool type
    * (Emulated) C99 integer types
    * alloca
    * Variadic macros
    * Endianness
    * __func__ macro
    * Symbol visibility
    * u64 to double conversion for MSVC6

I'm not sure we can derive all that from macros set by the compiler.

> It would also be nice if for MSVC we didn't require compilation in C++ mode.
> We might choose to require it for both the Clownfish runtime and Lucy
> (because we want line comments and mixed declarations and code in our
> implementation files), but we shouldn't impose that requirement on extensions
> if we can help it.

I don't see a problem with that. The generated code should already be C89-compliant. We probably missed a few things but that should be easy to fix.

Nick


Re: [lucy-dev] Decoupling from Charmonizer

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Sat, Jul 20, 2013 at 4:25 PM, Nick Wellnhofer <we...@aevum.de> wrote:
> I think we should make it possible to create Clownfish-based projects which
> don't depend on Charmonizer.

If we can pull it off so that we don't need to install `charmony.h`, that
would be fantastic!

It's going to be challenging to do without the information captured by probes
into "charmony.h" for the headers generated by CFC, though:

*   How are "inline" functions specified on the target platform?
*   What's the size of an int?
*   Is "stdint.h" available?  How about "stdbool.h"?
*   What header specifies `alloca`?
*   Atomics?
*   ...

This approach is problematic, IMO:

    https://git-wip-us.apache.org/repos/asf?p=lucy.git;a=commitdiff;h=c2ff0a87846f54d32c5928a4775470a167b74bf5

After that commit, the content of headers autogenerated by CFC depends on the
system where CFC was built, not the platform where the extension is to be
used.  Techniques such as cross-compiling won't work.  As far as I know, the
output autogenerated by CFC right now is deterministic based solely on the
.cfh input files -- I think we should preserve this invariant.

To achieve charmony-free autogenerated header files, I think the first thing
we would need to do is depend more heavily on C99 and build in a few special
cases for the one non-C99 compiler we support, MSVC.

    #if __STDC_VERSION__ >= 199901L
      #define CFISH_INLINE static inline
    #elif defined(_MSC_VER)
      #define CFISH_INLINE __inline
    #else
      #error "C99 or MSVC required"
    #endif

    #if __STDC_VERSION__ >= 199901L
      #include <stdint.h>
      #include <stdbool.h>
    #elif defined(_MSC_VER)
      // tricky compiler-version-specific code to define int32_t etc...
    #else
      #error "C99 or MSVC required"
    #endif

We need atomics for LockFreeRegister, but we can probably do away with
Clownfish::Util::Atomic and use them only within LockFreeRegister.c.

It would also be nice if for MSVC we didn't require compilation in C++ mode.
We might choose to require it for both the Clownfish runtime and Lucy
(because we want line comments and mixed declarations and code in our
implementation files), but we shouldn't impose that requirement on extensions
if we can help it.

Marvin Humphrey