Posted to dev@lucy.apache.org by Marvin Humphrey <ma...@rectangular.com> on 2012/12/17 02:48:21 UTC

[lucy-dev] Compiler flags

Greets,

Nick Wellnhofer pointed out off-list that a recent commit of mine made it
impossible to pass compiler flags on the command line via Module::Build's
`--extra_compiler_flags` param.

    http://s.apache.org/TIj

That's an undesirable side effect, but let me explain the rationale behind the
recent set of changes.

The goal of the chaz_compiler_flags branch is to consolidate as much as
possible of the compiler flag probing code into charmonizer so that it can be
shared across all host bindings -- decreasing the amount of build code which
must be written for each host.  When I set out, I had hoped to separate the
compilation of our C code into two silos:

Host-agnostic files:

*   All .c files under core/
*   autogen/source/parcel.c

Host-specific files (i.e. those which pound-include host-specific headers like
"XSUB.h"):

*   All .c files under perl/xs/
*   perl/lib/Lucy.xs (autogenerated, contains all source fragments from
    perl/buildlib/Lucy/Build/Binding/*.pm)
*   autogen/source/callbacks.c
*   autogen/source/lucy_boot.c

Ideally, the host-agnostic files would be compiled using flags specified using
charmonizer that the host wouldn't know anything about, and the host-specific
files would all be compiled using the defaults for building host extensions --
thus simplifying all host build scripts.  Things didn't work out perfectly
because it turns out MSVC needs `-TP` and `-DHAS_BOOL` when building our
extensions, but for the most part I think the initiative was a success.

The next logical step is to consolidate the compilation commands for building
the host-agnostic files, but this is annoying because the most obvious way to
do it is via Makefile proliferation.  (; Extending Charmonizer to handle
compilation might actually be the most elegant approach -- we've already
abstracted the compiler and the shell to an extent -- but heads may explode if
I propose writing build scripts in C so I'll save that for later. ;)

In any case, I'm still planning to merge the chaz_compiler_flags branch into
master.  If the ability to pass extra_compiler_flags via the command line is
important, we can hack it back in; it just means a small setback with regard
to the task of simplifying host build files.

Marvin Humphrey

Re: [lucy-dev] Compiler flags

Posted by Nick Wellnhofer <we...@aevum.de>.
On 21/12/2012 18:15, Marvin Humphrey wrote:
> Heh, generating Makefiles wasn't the approach I was thinking of. :)  I meant
> writing actual build scripts in C.

Ah, I see. That's of course possible, but I think this can't be 
implemented in plain C89 (the up_to_date part, for example). So we'd 
have to run charmonizer first to configure the build script code, making 
the whole thing a two-step process.

Another advantage of Makefiles is that they're more hackable if 
adjustments are needed. If someone wants to make changes to the build 
script, he has to alter the charmonizer C code and regenerate 
charmonizer.c. This requires a lot more insight into how the build 
system works than simply changing the generated Makefile.

GNU make also offers nice features like parallel builds which would be 
hard to recreate.

I'm not really against the build script idea, but I think it's a lot 
more effort than generating Makefiles.

>> * Windows is probably the first platform that will break because of overlong
>>    command lines (8K limit).
>
> At some point we'll probably need to start generating scripts like
> ExtUtils::CBuilder does to address that problem.

Under nmake, there's also a very simple solution using "inline files" 
(kind of like here-docs in a Makefile).

> If we want to generate Makefiles using Charmonizer, I suggest we do something
> similar, faking up inheritance with structs.
>
>      struct chaz_PosixMakefile {
>          struct chaz_Makefile base;
>          /* ... */
>      };

Maybe we don't even need separate "classes" for different make 
implementations. Most of the differences depend on the compiler and 
shell type.

> I'd hope we could stay away from parsing a template file (a la "Makefile.in"),
> because having to support parsing would complicate things a lot.

Yes, that was my idea as well.

> I'd argue against generating Makefiles.
>
> Makefile syntax is obtuse on its own, but the real problem is that generating
> Makefiles essentially means compiling down to shell code.  We do pretty well
> compiling down to C with the Clownfish compiler, but C compilers exhibit a lot
> less variability than shells and the external programs that they reference.
> Our task is made harder by the fact that we're contemplating targeting at
> least two shell environments -- POSIX-compliant sh and cmd.exe -- which means
> rearranging arguments, dealing with different quoting and splitting rules,
> invoking completely different commands, etc.

But doesn't the build script approach also mean executing shell
commands in the end? I don't really see the difference except for things
like deleting files (make clean).

Using Makefiles to build a C project also seems like the most natural 
approach to me. As you wrote somewhere else, users generally expect to 
run "./configure && make && make install".

Nick


Re: [lucy-dev] Compiler flags

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Mon, Dec 17, 2012 at 2:58 AM, Nick Wellnhofer <we...@aevum.de> wrote:
> Hey, how did you find that comment on Github? It was mainly meant as a note
> to myself ;)

I got a notification, which I'll forward to you offlist.  Somehow I must have
started following <https://github.com/apache/lucy>.  (Maybe when I forked that
repo?)

>> (; Extending Charmonizer to handle compilation might actually be the most
>> elegant approach -- we've already abstracted the compiler and the shell to
>> an extent -- but heads may explode if I propose writing build scripts in C
>> so I'll save that for later. ;)

> I've also been thinking in that direction when working on the C bindings. It
> sounds like a crazy idea, but that could be said about Charmonizer as a
> whole.  OTOH, Charmonizer does its job very well, and using C is the only
> approach that really works cross-platform.

I'm pretty happy with how Charmonizer has evolved.  "Use C to configure C"
has worked out well in practice.

> FWIW, here is the preliminary Makefile that I currently use in my work on
> the C bindings:
>
>     http://s.apache.org/fBI
>
> If you ignore the huge list of source files, it's actually quite simple.
> So it doesn't sound like too much of a stretch to generate it using C.

Heh, generating Makefiles wasn't the approach I was thinking of. :)  I meant
writing actual build scripts in C.

Here's a snippet from one of our Perl build scripts:

    for my $c_file (@$c_files) {
        my $o_file   = $c_file;
        my $ccs_file = $c_file;
        $o_file   =~ s/\.c$/$Config{_o}/ or die "no match";
        $ccs_file =~ s/\.c$/.ccs/        or die "no match";
        push @objects, $o_file;
        next if $self->up_to_date( $c_file, $o_file );
        $self->add_to_cleanup($o_file);
        $self->add_to_cleanup($ccs_file);
        $cbuilder->compile(
            source               => $c_file,
            extra_compiler_flags => $cc_flags,
            include_dirs         => $self->include_dirs,
            object_file          => $o_file,
        );
    }

Here's that code ported to C:

    /* Set extra compiler flags and include dirs. */
    chaz_CC_add_extra_cflags(cflags);
    for (i = 0; i < num_include_dirs; i++) {
        chaz_CC_add_include_dir(include_dirs[i]);
    }

    /* Compile all C source files. */
    for (i = 0; i < num_c_files; i++) {
        const char *c_file = c_files[i];
        char *o_file = chaz_Util_swap_ext(c_file, chaz_CC_obj_ext());
        objects[i] = o_file; /* The objects array takes ownership. */
        if (!chaz_Util_up_to_date(c_file, o_file)) {
            char *ccs_file = chaz_Util_swap_ext(c_file, ".ccs");
            S_add_to_cleanup(o_file);
            S_add_to_cleanup(ccs_file); /* Assumed to copy its argument. */
            free(ccs_file);
            chaz_CC_compile_obj(c_file, o_file);
        }
    }

We'd need to write a few helper subroutines to make that code sample actually
work, but the hard part -- abstracting the task of compilation -- is already
done.  (To give us a little more flexibility when writing those helper subs,
the build script could be separate from the configure script and could
pound-include charmony.h.)
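
As a rough illustration of what two such helpers might look like, here is a
minimal sketch.  The names `sketch_swap_ext` and `sketch_up_to_date` are
hypothetical and not part of Charmonizer, and the timestamp check assumes
POSIX `stat()` is available, which goes beyond plain C89.  Only forward
slashes are treated as directory separators.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>   /* POSIX, not C89: provides stat(). */

/* Hypothetical helper: return a malloc'd copy of `path` with its extension
 * replaced by `new_ext` (which includes the leading dot).  If `path` has no
 * extension, `new_ext` is appended.  The caller frees the result. */
char*
sketch_swap_ext(const char *path, const char *new_ext) {
    const char *dot;
    const char *slash;
    size_t      stem_len;
    char       *result;

    assert(path != NULL && new_ext != NULL);
    dot   = strrchr(path, '.');
    slash = strrchr(path, '/');

    /* Ignore a dot that belongs to a directory name, e.g. "a.b/file". */
    if (dot == NULL || (slash != NULL && dot < slash)) {
        stem_len = strlen(path);
    }
    else {
        stem_len = (size_t)(dot - path);
    }

    result = (char*)malloc(stem_len + strlen(new_ext) + 1);
    if (result == NULL) { return NULL; }
    memcpy(result, path, stem_len);
    strcpy(result + stem_len, new_ext);
    return result;
}

/* Hypothetical helper: return 1 if `target` exists and its modification
 * time is at least as recent as that of `source`; return 0 otherwise,
 * including when either file cannot be stat'ed. */
int
sketch_up_to_date(const char *source, const char *target) {
    struct stat source_stat;
    struct stat target_stat;
    if (stat(source, &source_stat) != 0) { return 0; }
    if (stat(target, &target_stat) != 0) { return 0; }
    return target_stat.st_mtime >= source_stat.st_mtime;
}
```

Treating equal timestamps as up to date is a choice made for this sketch;
on file systems with coarse timestamp resolution a stricter comparison would
trigger more rebuilds, while this one could miss a very quick edit.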

> * Path names can sometimes be problematic. Windows generally accepts the
> forward slash as directory separator, but there are always some cases where
> it doesn't work or needs a work-around.

That's my experience as well.

> * Windows is probably the first platform that will break because of overlong
>   command lines (8K limit).

At some point we'll probably need to start generating scripts like
ExtUtils::CBuilder does to address that problem.
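
One possible shape for that, sketched under assumptions: write the long list
of arguments to a file and hand the tool a short `@file` argument instead.
MSVC's `cl` and `link` and GCC accept arguments of that form as far as I
know, but check the toolchain docs.  The helper name
`sketch_write_response_file` is hypothetical, and the sketch assumes that no
argument contains whitespace or quotes.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: write one argument per line to `path`, so that a
 * long list of object files can be passed to a tool as "@path" rather than
 * on the command line.  Assumes no argument needs quoting.
 * Returns 0 on success, -1 on failure. */
int
sketch_write_response_file(const char *path, const char **args,
                           int num_args) {
    FILE *fh;
    int   i;

    assert(path != NULL);
    fh = fopen(path, "w");
    if (fh == NULL) { return -1; }
    for (i = 0; i < num_args; i++) {
        if (fprintf(fh, "%s\n", args[i]) < 0) {
            fclose(fh);
            return -1;
        }
    }
    return fclose(fh) == 0 ? 0 : -1;
}
```

The command line then stays short no matter how many object files there are,
at the cost of one more temporary file to clean up.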

> * External commands can be hard to emulate on Windows. Even something simple
> like "rm" can be much more complicated.

Indeed.  See the "clean" targets in the various Charmonizer Makefiles. :P

> I made an initial attempt to put the bulk of the Makefile in a
> platform-independent file which then is included by platform-dependent
> Makefiles. This approach seems to work, but some of the corner cases need
> rather inelegant solutions.

I seem to recall trying that approach for Charmonizer's Makefiles at some
point, though I don't remember whether it was an original idea or a mod on
some of Joe Schaefer's work.

The approach that seemed most appealing in the end was to model Makefiles
using OO; that's what's in devel/bin/gen_charmonizer_makefiles.pl now.

    Charmonizer::Build::Makefile          <-- base class
    Charmonizer::Build::Makefile::Posix
    Charmonizer::Build::Makefile::MSVC
    Charmonizer::Build::Makefile::MinGW

If we want to generate Makefiles using Charmonizer, I suggest we do something
similar, faking up inheritance with structs.

    struct chaz_PosixMakefile {
        struct chaz_Makefile base;
        /* ... */
    };
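
To make the struct trick concrete, here is a minimal sketch.  Only the two
struct names come from the snippet above; the fields, the function pointer,
and the `sketch_` functions are hypothetical and exist purely for
illustration.  Because the base struct is the first member, a pointer to the
"subclass" can be converted to a pointer to the base and back.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical base "class"; field names are illustrative only. */
struct chaz_Makefile {
    const char *obj_ext;   /* e.g. ".o" or ".obj" */
    void (*write_clean)(struct chaz_Makefile *self, FILE *out);
};

/* "Subclass": the base struct must be the first member. */
struct chaz_PosixMakefile {
    struct chaz_Makefile base;
    const char *rm_command;
};

/* Posix implementation of the "virtual method" write_clean. */
static void
S_posix_write_clean(struct chaz_Makefile *self, FILE *out) {
    struct chaz_PosixMakefile *posix;
    assert(self != NULL && out != NULL);
    /* Valid because `base` is the first member of chaz_PosixMakefile. */
    posix = (struct chaz_PosixMakefile*)self;
    fprintf(out, "clean:\n\t%s *%s\n", posix->rm_command, self->obj_ext);
}

/* Hypothetical initializer: fill in the base fields and the override. */
void
sketch_PosixMakefile_init(struct chaz_PosixMakefile *self) {
    self->base.obj_ext     = ".o";
    self->base.write_clean = S_posix_write_clean;
    self->rm_command       = "rm -f";
}
```

Code that holds only a `struct chaz_Makefile*` can call
`mf->write_clean(mf, out)` without knowing which variant it has; an MSVC
variant would install a different function and a different object extension.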

I'd hope we could stay away from parsing a template file (a la "Makefile.in"),
because having to support parsing would complicate things a lot.  Instead, I'd
suggest embedding the entire content of the Makefile within the generator app
-- again like gen_charmonizer_makefiles.pl -- so that we get to piggyback on
the C compiler's parser.

The downside of the OO approach is that the layout of the content does not
really look like the final Makefile.

> So I'd like to look into a way to generate Makefiles programmatically using
> parameters provided by Charmonizer.

I'd argue against generating Makefiles.

Makefile syntax is obtuse on its own, but the real problem is that generating
Makefiles essentially means compiling down to shell code.  We do pretty well
compiling down to C with the Clownfish compiler, but C compilers exhibit a lot
less variability than shells and the external programs that they reference.
Our task is made harder by the fact that we're contemplating targeting at
least two shell environments -- POSIX-compliant sh and cmd.exe -- which means
rearranging arguments, dealing with different quoting and splitting rules,
invoking completely different commands, etc.

I agree with Martin Fowler's take on the superiority of Rake's design over
that of Make:

    http://martinfowler.com/articles/rake.html#DomainSpecificLanguageForBuilds

    All my three build languages share another characteristic - they are all
    examples of a Domain Specific Language (DSL). However they are different
    kinds of DSL. In the terminology I've used before:

        * make is an external DSL using a custom syntax
        * ant (and nant) is an external DSL using an XML based syntax
        * rake is an internal DSL using Ruby.

    The fact that rake is an internal DSL for a general purpose language is a
    very important difference between it and the other two. It essentially
    allows me to use the full power of ruby any time I need it, at the cost of
    having to do a few odd looking things to ensure the rake scripts are valid
    ruby. Since ruby is a unobtrusive language, there's not much in the way of
    syntactic oddities. Furthermore since ruby is a full blown language, I
    don't need to drop out of the DSL to do interesting things - which has
    been a regular frustration using make and ant. Indeed I've come to view
    that a build language is really ideally suited to an internal DSL because
    you do need that full language power just often enough to make it
    worthwhile - and you don't get many non-programmers writing build scripts.

Similar arguments hold for Module::Build over ExtUtils::MakeMaker:

    http://perldoc.perl.org/5.16.1/Module/Build.html#MOTIVATIONS

Unfortunately, Rake has not yet achieved 100% market penetration and Ruby is
too big to bundle with Lucy. :)  However, bundling the source code for a C
library which provides some functions to support common build tasks gives
us some of the same advantages: portability problems are scoped to
individual subroutines, and we get to rely on the uniformity of C syntax
rather than shell and its quirks.  As computers have gotten bigger and faster,
compiling a bundled build tool from source incurs proportionally less
overhead.  It works for Lemon; it can work for Charmonizer, too.
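
As a small example of scoping a portability problem to one subroutine,
consider the "clean" task.  A minimal sketch, with the hypothetical name
`sketch_clean_files`: it relies only on `remove()` from the C89 standard
library, so neither `rm` nor `del` nor any shell quoting is involved.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: try to delete each file in `paths`.  Files which
 * cannot be removed (for example because they do not exist) are skipped.
 * Returns the number of files actually removed. */
int
sketch_clean_files(const char **paths, int num_paths) {
    int i;
    int removed = 0;

    assert(paths != NULL || num_paths == 0);
    for (i = 0; i < num_paths; i++) {
        if (remove(paths[i]) == 0) {
            removed++;
        }
    }
    return removed;
}
```

Directories and wildcards are not handled here; those are the cases where
platform-specific code would still be needed, but it would live inside this
one function.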

Nevertheless, I suspect that generating Makefiles is still a workable solution
for this portion of Lucy at least because, as you note, what we're doing here
isn't all that complicated.  I suspect that with generated Makefiles the slope
gets steeper the more complex the task, and thus that the approach is
self-limiting (see ExtUtils::MakeMaker), but we probably won't hit the wall.

Marvin Humphrey

Re: [lucy-dev] Compiler flags

Posted by Nick Wellnhofer <we...@aevum.de>.
On 17/12/2012 02:48, Marvin Humphrey wrote:
> Nick Wellnhofer pointed out off-list that a recent commit of mine made it
> impossible to pass compiler flags on the command line via Module::Build's
> `--extra_compiler_flags` param.

Hey, how did you find that comment on Github? It was mainly meant as a
note to myself ;)

> (; Extending Charmonizer to handle
> compilation might actually be the most elegant approach -- we've already
> abstracted the compiler and the shell to an extent -- but heads may explode if
> I propose writing build scripts in C so I'll save that for later. ;)

I've also been thinking in that direction when working on the C 
bindings. It sounds like a crazy idea, but that could be said about 
Charmonizer as a whole. OTOH, Charmonizer does its job very well, and 
using C is the only approach that really works cross-platform.

FWIW, here is the preliminary Makefile that I currently use in my work 
on the C bindings:

     http://s.apache.org/fBI

If you ignore the huge list of source files, it's actually quite simple.
So it doesn't sound like too much of a stretch to generate it using C.

I also began to port this Makefile to Windows' nmake, and here are some 
observations:

* There is a small but usable subset of features that can be used with 
nmake as well as GNU make. So some parts of the Makefile can be shared 
verbatim.

* Rules for compiling and linking can be written in a cross-platform way 
using Makefile variables for all the separate parts of the commands.

* Path names can sometimes be problematic. Windows generally accepts the 
forward slash as directory separator, but there are always some cases 
where it doesn't work or needs a work-around.

* Windows is probably the first platform that will break because of 
overlong command lines (8K limit).

* External commands can be hard to emulate on Windows. Even something 
simple like "rm" can be much more complicated.

I made an initial attempt to put the bulk of the Makefile in a 
platform-independent file which then is included by platform-dependent 
Makefiles. This approach seems to work, but some of the corner cases 
need rather inelegant solutions. So I'd like to look into a way to 
generate Makefiles programmatically using parameters provided by 
Charmonizer.

> In any case, I'm still planning to merge the chaz_compiler_flags branch into
> master.  If the ability to pass extra_compiler_flags via the command line is
> important, we can hack it back in; it just means a small setback with regard
> to the task of simplifying host build files.

Occasionally, I found it useful during development to pass compiler 
flags to Build.PL. It's also a nice feature for users with special 
compilation needs. But it shouldn't hold you back from merging the branch.

Nick