Posted to dev@buildr.apache.org by Ittay Dror <it...@gmail.com> on 2008/07/28 11:42:39 UTC

request for enhancement: compile, package and artifacts support for C++

Hi,

I'm working on adding C++ support to buildr. I already have a prototype 
that builds libraries and executables on Linux. I'd like to share some 
of the difficulties I had and request changes to buildr to accommodate 
C++ more easily. (Right now, I've created a parallel route to the one 
used for building Java-like code.)

compile
========
overview
--------------------
The compile method in project returns a CompileTask that is generic and 
uses a Compiler instance to do the actual compilation. In C++, 
compilation is also dependency based (.o => .cpp, sometimes precompiling 
headers). Also, the same code can produce several results (static and 
shared libraries, object files with debug, profiling, or preprocessor 
defines turned on and off). [1]

There is the 'build' task, which is used as a stub to attach 
dependencies to.
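
As a rough sketch of the dependency-based compilation described above 
(this is plain Rake rather than anything buildr provides; file names and 
compiler flags are made up), object files and the linked library can be 
modeled as file tasks, so only stale objects are recompiled:

  require 'rake'

  OBJECTS = ['A.o', 'B.o']

  OBJECTS.each do |obj|
    src = obj.sub(/\.o$/, '.cpp')
    hdr = obj.sub(/\.o$/, '.h')
    # each object depends on its source and header; Rake's timestamp
    # checking decides whether the compiler actually runs
    file obj => [src, hdr] do
      sh "g++ -fPIC -c #{src} -o #{obj}"
    end
  end

  # linking is a separate task that depends on all object files
  file 'libsomething.so' => OBJECTS do
    sh "g++ -shared -o libsomething.so #{OBJECTS.join(' ')}"
  end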

suggestion
---------------------
* there should be an array of compile tasks (as in packages)
* #compile should delegate the call to a factory method which returns a 
task (again, as in packages)
* generic prerequisites (like 'resources') should either be tacked onto 
'build' (relying on the order of prerequisites), or the compile task can 
be defined to be a composite (that is, from the outside it is a single 
task, but it can use other tasks to accomplish its job).

package & artifacts
=========
overview
---------------
buildr has a cool concept that all dependencies (in 'compile.with') are 
converted to tasks that are then simple rake dependencies. However, the 
conversion is not generic enough. To compile C++ code against a 
dependency one needs two paths: a folder containing headers and another 
containing libraries. To put this in a repository, these need to be 
packaged into one file. To use them after pulling from the repository, 
one needs to unpack. So a task representing a repository artifact is in 
fact an unzip task that depends on the 'Artifact' task to pull the 
package from a remote repository.
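
A minimal sketch of that unzip task (the spec, paths and shell command 
are illustrative assumptions; only 'artifact' itself is existing buildr 
API):

  downloaded = artifact('org.example:ssl:zip:1.0')  # pulls into the local repo
  unpack_dir = 'target/deps/ssl'

  # the dependency handed to the compiler is the unpacked directory,
  # which in turn depends on the download
  file unpack_dir => downloaded do
    mkdir_p unpack_dir
    sh "unzip -o #{downloaded} -d #{unpack_dir}"
  end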

Furthermore, when building against another project, there is no need to 
pack and unpack via the repository. One can simply use the artifacts 
produced in the 'build' phase of the other project.

Finally, in C++ you often rely on a system library.

In all cases the resulting dependency is two-fold: on include dir 
paths and on library paths. Note that these do not necessarily reside 
under a shared folder. For example, a dependency on another project may 
depend on two include folders: one just a folder in the source tree, the 
other a folder of generated files in the target directory.

suggestion
-------------------
While Buildr.artifacts is only a utility method, so one can easily 
write one's own implementation and use that, I think it would be nice 
to be able to get some reuse.

* when given a project, use it as is (not 'spec.packages'), or allow it 
to return its artifacts ('spec.artifacts').
* if a symbol, recursively call on the spec from the namespace
* if a struct, recursively call on its members
* otherwise, classify the artifact and call a factory method to create 
it. Classification can be by packaging (e.g. jar), but actually I don't 
have a very good idea here. Note that for C++ there needs to be a way of 
defining an artifact that looks in the system for include files and 
libraries (maybe something like 'openssl:system'? - version and group 
ids are meaningless). See the sketch after this list.
  * the factory method can create different artifacts. For C++ there 
would be RepositoryArtifact (downloads and unpacks), ProjectArtifact 
(short circuit to the project's target and source directories) and 
SystemArtifact.
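
The dispatch could look roughly like this (every name here - 
artifact_for, SystemArtifact, RepositoryArtifact - is invented for 
illustration and is not existing buildr code):

  SystemArtifact     = Struct.new(:spec)  # resolves includes/libs on the system
  RepositoryArtifact = Struct.new(:spec)  # downloads and unpacks from a repo

  def artifact_for(spec)
    case spec
    when Array then spec.map { |s| artifact_for(s) }  # recurse on lists
    when String
      spec.end_with?(':system') ? SystemArtifact.new(spec) :
                                  RepositoryArtifact.new(spec)
    else
      spec  # e.g. a project: use it as is, or ask it for spec.artifacts
    end
  end

  p artifact_for(['openssl:system', 'log4cxx:log4cxx:zip:0.10.0'])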

I think that the use of artifact namespaces can help here, as it allows 
creating a more verbose syntax for declaring artifacts while still 
allowing the user to create shorter names for them. (As an example, in 
C++ it will allow me to add to the artifact the list of flags to use 
when compiling/linking with it, assuming they're not inherent to the 
artifact, e.g. turning debug on.) The factory method receives the 
artifact definition (which can actually be defined by each plugin) and 
decides what to do with it.
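
Purely as an illustration of the kind of per-artifact metadata I mean 
(CxxDep and its fields are invented names, not a concrete API proposal):

  CxxDep = Struct.new(:spec, :include_paths, :lib_paths, :flags)

  OPENSSL = CxxDep.new('openssl:system',
                       ['/usr/include'], ['/usr/lib'], ['-DUSE_OPENSSL'])
  # a buildfile could alias this verbose definition to a short name and
  # pass OPENSSL.flags to the compiler/linker when depending on it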

I hope all this makes sense, and I'm looking forward to comments. I 
intend to share the code once I'm finished.

Thank you,
Ittay


Notes:
[1] I don't consider linking a library as packaging. First, the object 
files are not used by themselves as in other languages. Second, 
packaging is required to manage dependencies, because in order for 
project P to be built against dependency D, D needs to contain both 
headers and libraries - this is the package.

-- 
--
Ittay Dror <it...@gmail.com>



Re: request for enhancement: compile, package and artifacts support for C++

Posted by Ittay Dror <it...@gmail.com>.
Hi,

I want to continue this discussion, since I'm now at a point where I have
C++ working for me for 52 projects, and I want to make it more up to buildr
standards. 




Assaf Arkin wrote:
> 
> On Mon, Jul 28, 2008 at 2:42 AM, Ittay Dror <it...@gmail.com> wrote:
>> Hi,
>>
>> I'm working on adding C++ support to buildr. I already have a prototype
>> that
>> builds libraries and executables in Linux. I'd like to share some of the
>> difficulties I had and request changes to buildr to accommodate C++ more
>> easily. (Right now, I've created parallel route to that of building
>> Java-like code)
>>
>> compile
>> ========
>> overview
>> --------------------
>> the compile method in project returns a CompileTask that is generic and
>> uses
>> a Compiler instance to do the actual compilation. In C++, compilation is
>> also dependency based (.o => .cpp, sometimes precompiling headers). Also,
>> the same code can produce several results (static and shared libraries,
>> object
>> files with debug, profiling, preprocessor defines turned on and off). [1]
>>
>> there is the 'build' task, which is used as a stub to attach dependencies
>> to.
>>
>> suggestion
>> ---------------------
>> * there should be an array of compile tasks (as in packages)
>> * #compile should delegate the call to a factory method which returns a
>> task
>> (again, as in packages)
> 
> Yes.  And I know a few people just waiting for the change to compile
> multiple things in the same project, so here's another reason for
> adding this feature.
> 
> But I have to warn you, it's not as simple as it looks, I took a stab
> at it before and decided to downscale support to one compiler per
> project.  It's worth doing because a lot of languages would benefit
> from it, but that's also what makes it tricky.  I think it would be
> easier to get C support working without it first, and separately work
> on this feature and then improve C support using it.
> 
> 
I'm adding C++ support for a client project, introducing buildr in the
process. Since it replaces an existing build system, I must provide the same
functionality.

I was able quite easily to produce several C++ artifacts from a single build.
However, I could not use the 'compile' API, since it was not suitable for
C++, both for lacking methods (defining external includes, for example) and
because it can't produce separate artifacts (which is a must, as stated
above). Another issue is that I need to consider what platform the build is
running on, which is not supported by how the compile task chooses which
compiler to use.

What I did was:
* add a 'make' method that accepts a list of classifiers.
* for each classifier, find a factory method that returns a task creating
the appropriate artifact; the factory is called only if the task does not
already exist (similar to how packages are managed)
* the return value is a delegator object that forwards all method
invocations to the tasks.

So the project can now have:
  make(:shared, :static).with(....) 
which will create two tasks and configure both with the same dependencies.
The buildfile can then also have
  make(:shared).with(...)
which will add a dependency to the shared task only.
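
A sketch of how that 'make' dispatch could be wired up (the factory
registry and the delegator below are illustrative, not my actual code):

  require 'rake'

  MAKE_FACTORIES = {}  # classifier => factory proc, registered per platform

  class MakeDelegator
    def initialize(tasks)
      @tasks = tasks
    end

    # forward every call ('with', etc.) to all underlying tasks
    def method_missing(name, *args, &block)
      @tasks.each { |t| t.send(name, *args, &block) }
      self
    end
  end

  def make(*classifiers)
    tasks = classifiers.map do |c|
      name = "make:#{c}"
      # create the task on first use, reuse it afterwards (as packages do)
      Rake::Task.task_defined?(name) ? Rake::Task[name] : MAKE_FACTORIES[c].call
    end
    MakeDelegator.new(tasks)
  end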

Module dependency (either depending on another project or a third party
artifact) is done by writing my own 'artifacts' method which does the right
thing for C++.

What I'm mostly not happy with is that I had to write parallel
implementations of 'compile' and 'artifacts'. For the first, it is because
C++ requires a different API; for the latter, it is because dependency is
handled differently.

Ittay


Assaf Arkin wrote:
> 
> 
> 
>> * generic pre-requisites (like 'resources') should either be tacked on
>> 'build' (relying on order of prerequisites), or the compile task can be
>> defined to be a composite (that is, from the outside it is a single task,
>> but it can use other tasks to accomplish its job).
> 
> compile already is: resources is a prerequisite for compile, some
> other tasks (e.g. byte code enhancing) are tacked on to compile by
> enhancing it.
> 
> 
>> package & artifacts
>> =========
>> overview
>> ---------------
>> buildr has a cool concept that all dependencies (in 'compile.with') are
>> converted to tasks that are then simple rake dependencies. However, the
>> conversion is not generic enough. to compile C++ code against a
>> dependency
>> one needs 2 paths: a folder containing headers and another containing
>> libraries. To put this in a repository, these need to be packaged into
>> one
>> file. To use after pulling from the repository, one needs to unpack. So a
>> task representing a repository artifact is in fact an unzip task, that
>> depends on the 'Artifact' task to pull the package from a remote
>> repository.
> 
> Let's take Java for example, let's say we have a task that depends on
> the contents of another WAR.  Specifically the classes (in
> WEB-INF/classes) and libraries (WEB-INF/lib).  A generic unzipping
> artifact won't help much, you'll get the root path which is useless.
> You need the classes path for one, and each file in the lib (pointing
> to the directory itself does nothing interesting).  It won't work with
> EAR either, when you unzip those, you end up with a WAR which you need
> to unzip again.
> 
> But this hypothetical task that uses WAR could be smarter.  It
> understands the semantics of the packages it uses, and all these
> packages follow a common convention, so it only needs to unpack the
> portions of the WAR it cares about, it knows how to construct the
> relevant paths, one to class and one to every JAR inside the lib
> directory.
> 
> I think the same analogy applies to C packages.  If by convention you
> always use include and lib, you can unpack only the portion of the
> package you need, find the relevant paths and use them appropriately.
> 
> 
>> furthermore, when building against another project, there is no need to
>> pack
>> and unpack in the repository. one can simply use the artifacts produced
>> in
>> the 'build' phase of the other project.
> 
> Yes.  Right now it points to the package, which gets invoked and so
> packs everything, whether you need the packing or not.  You don't,
> however, have to unpack it, if you know the packaging type you can be
> smarter and go directly to the source.
> 
>>
>> finally, in C++ in many cases you rely on a system library.
>>
>> in all cases the resulting dependency is two-fold: on a include dir paths
>> and on a library paths. note that these do not necessarily reside under a
>> shared folder. for example, a dependency on another project may depend on
>> two include folders: one just a folder in the source tree, the other of
>> generated files in the target directory
>>
>> suggestion
>> -------------------
>> While usage of Buildr.artifacts is only as a utility method, so one can
>> easily write his own implementation and use that, I think it will be nice
>> to
>> be able to get some reuse.
>>
>> * when given a project, use it as is (not 'spec.packages'), or allow it
>> to
>> return its artifacts ('spec.artifacts').
> 
> Yes.  Except we're missing that whole dependency layer (that's
> something 1.4 will add).  Ideally the project would have dependency
> lists it can populate (at least compile and runtime), and other
> projects can get these dependency lists and pick what they want.  So
> the compile dependency list would be the place to put headers and
> libraries, without having to package them.  We don't have that right
> now.
> 
> 
>> * if a symbol, recursively call on the spec from the namespace
>> * if a struct, recursively call
>> * otherwise, classify the artifact and call a factory method to create
>> it.
>> classification can be by packaging (e.g. jar). but actually, i don't have
>> a
>> very good idea here. note that for c++, there need to be a way of
>> defining
>> an artifact to look in the system for include files and libraries  (maybe
>> something like 'openssl:system'? - version and group ids are
>> meaningless).
>>  * the factory method can create different artifacts. for c++ there would
>> be
>> RepositoryArtifact (downloads and unpacks), ProjectArtifact (short
>> circuit
>> to the project's target and source directories) and SystemArtifact.
>>
>> I think that the use of artifact namespaces can help here as it allows to
>> create a more verbose syntax for declaring artifacts, while still
>> allowing
>> the user to create shorter names for them. (as an example in C++ it will
>> allow me to add to the artifact the list of flags to use when
>> compiling/linking with it, assuming they're not inherent to the artifact,
>> e.g. turn debug on). The factory method receives the artifact definition
>> (which can actually be defined by each plugin) and decides what to do
>> with
>> it.
> 
> 1.4 will have a better dependency mechanism, and one thing I looked at
> is associating meta-data with each dependency.  So perhaps that would
> address things like compiling/linking flags.
> 
>> I hope all this makes sense, and I'm looking forward to comments. I
>> intend
>> to share the code once I'm finished.
> 
> Unfortunately, the last time I wrote C code was over ten years ago,
> so my rustiness is showing.  I'm sure I missed some points because of
> that.
> 
> Assaf
> 
> 
>>
>> Thank you,
>> Ittay
>>
>>
>> Notes:
>> [1] I don't consider linking a library as packaging. First, the obj files
>> are not used by themselves as in other languages. Second, packaging is
>> required to manage dependencies, because in order for project P to be
>> built
>> against dependency D, D needs to contain both headers and libraries -
>> this
>> is the package.
>>
>> --
>> --
>> Ittay Dror <it...@gmail.com>
>>
>>
>>
> 
> 

-- 
View this message in context: http://www.nabble.com/request-for-enhancement%3A-compile%2C-package-and-artifacts-support-for-C%2B%2B-tp18687046p19250068.html
Sent from the Buildr - Dev mailing list archive at Nabble.com.


Re: request for enhancement: compile, package and artifacts support for C++

Posted by Assaf Arkin <ar...@intalio.com>.
On Wed, Jul 30, 2008 at 12:18 AM, Ittay Dror <it...@gmail.com> wrote:
> Thank you for your reply and patience.
>
> I now understand what you meant, and you are quite right, it can be done
> this way.
>
> However, my aim was to create the task prerequisites tree before rake
> invokes the first task.
>
> First, it will make '-P' show the tree (according to your suggestion, -P
> won't show that 'compile' depends on 'libsomething.so' and 'libsomething.a',
> right). Secondly, having a complete tree of all tasks and prerequisites
> allows analyzing it.

It can build the tree during the definition or in after_define, as
long as it's only pointing to things it knows exist.  You probably want
to delay most of that work into after_define, so the definition can
add/change stuff incrementally.  For example, you can never know when
javah will be used during the definition to add another includes
directory, but you can specify that it's a definition method, so by
the time you get to after_define, if javah is used on this project it
would already have been.
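
For instance, a skeleton of that approach (Buildr::Extension with
after_define is the real extension hook; the task wiring inside it is
only a sketch with made-up paths and flags):

  module CppTasks
    include Buildr::Extension

    after_define do |project|
      # by now the buildfile has finished defining the project, so the
      # full prerequisite tree can be wired up (and shown by -P)
      objects = FileList[project.path_to(:source, :main, :cpp, '*.cpp')].map do |src|
        obj = project.path_to(:target, File.basename(src, '.cpp') + '.o')
        file(obj => src) { |t| sh "g++ -fPIC -c #{src} -o #{t.name}" }
        obj
      end
      file(project.path_to(:target, 'libsomething.so') => objects) do |t|
        sh "g++ -shared -o #{t.name} #{objects.join(' ')}"
      end
    end
  end

  class Buildr::Project
    include CppTasks
  end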

Assaf

>
> Both these reasons are non-functional of course.
>
> Ittay
>
> Assaf Arkin wrote:
>>
>> On Tue, Jul 29, 2008 at 12:59 PM, Ittay Dror <it...@gmail.com> wrote:
>>
>>>
>>> can you give an example of how a task can orchestrate other tasks? also,
>>> as
>>> far as i could tell, the 'compile' method always creates a CompileTask. i
>>> can't use it as is because it expects some compiler which i can't give it
>>> because i want to use tasks and also, i can't add dependencies to it
>>> because
>>> it depends directly on tasks like 'resources' which the prerequisites
>>> should
>>> depend on.
>>>
>>
>> If you look at the end of compile.rb you'll notice one of the things
>> it does is call  project.recursive_task('compile') which causes one
> project's compile task to execute all its child projects' compile
>> tasks.  Likewise, if you look at test.rb at the very end, you'll
>> notice that it's tacking the test task to the very end of the build
>> task (always test after build).
>>
>> Another example is the XMLBeans task (in addon) which needs to
>> generate source code, that is added as prerequisite to compile, and
>> also copy files over to the target directory, which is done by the
>> compile task at the very end.
>>
>> From the compiler you can do whatever you need to, including invoking
>> as many tasks as necessary (let Rake worry whether to execute them or
>> not).  And like XMLBeans does, you can add additional prerequisites
>> when necessary, and make additional work happen after compilation.
>>
>>
>>>
>>> At the risk of spending a lot of time on the obvious (i have a feeling
>>> we're
>>> talking about different things):
>>>
>>> say a project has 2 cpp files A.cpp and B.cpp, with matching headers, and
>>> no
>>> other headers, which compile to shared and static libraries. my
>>> dependency
>>> tree is:
>>>
>>> compile:cpp ---+--- libsomething.so ---+--- A.o --- A.cpp, A.h
>>>                |                       |
>>>                +--- libsomething.a ----+--- B.o --- B.cpp, B.h
>>>
>>> (both libraries link the same two object files)
>>>
>>>
>>> these should be rake tasks for two reasons: timestamp checking and the
>>> fact
>>> that two artifacts rely on the same set of objects. also linking and
>>> compiling are two different commands and finally, if i call the compiler
>>> twice, it will do the work twice (that is, it doesn't have any internal
>>> mechanism that tells it there's no need to recreate the obj files or
>>> libraries).
>>>
>>
>> Yes.  If all these are separate tasks wired together, then Rake will
>> only compile what is necessary.  So let's say you have two tasks, just
>> to simplify (they have other prerequisite tasks), one for
>> libsomething.so and one for libsomething.a.  You have a compile task
>> that invokes these two tasks.  Rake only executes what is necessary by
>> checking dependencies on the object files, which in turn check
>> dependencies on the cpp and header files, etc.
>>
>> So now you have one forest of dependencies in the project, all of
>> which are executed as necessary by the project's compile task.  And
>> one forest of projects, all of which are also executed as necessary by
>> the project's compile task.
>>
>> Your compiler object now has three uses:
>> a) It makes sure all these tasks exist and get invoked.  There's no
>> need for it to run a single instance compiler on all the files.  We do
>> that for Javac because it's Javac, but the compile method can do
>> whatever it deems necessary.
>> b) You get an easy way to control compiler options across all of
>> these, and inherit them from parent projects.  So you could, say, pick
>> the target architecture in the top-level project, have all the
>> compilers inherit from it.
>> c) Your compiler can run all these tasks in parallel.
>>
>> And since libsomething.so is also a task, if you want you can control
>> some of these options directly on that task.
>>
>>
>>>
>>> note that all of this tree needs to rely on the 'resources' task, since
>>> some
>>> headers may be generated. so 'resources' need to run before all the
>>> timestamp checking and compilation is done.
>>>
>>
>> The resources task is specifically for copying files to the target
>> directory that are not handled by the compiler, like images, I18N
>> resources, configuration files, etc.  It's not for generating code
>> used during compilation.
>>
>>
>>>>>
>>>>> of course the factory method can create just one task that does all the
>>>>> rest
>>>>> in its action (compile obj files and link), but i do want to use tasks
>>>>> for
>>>>> the following reasons:
>>>>> 1. it makes the logic more like make, which will assist acceptance
>>>>> 2. it can use mechanisms in unix compilers to help make. specifically,
>>>>> most
>>>>> (if not all) unix compilers have an option to spit out dependencies of
>>>>> the
>>>>> source files on headers.
>>>>> 3. it reuses timestamp checking code in rake (and if ever rake
>>>>> implements
>>>>> checksum based recompilation)
>>>>> 4. if rake will implement a job execution engine (like -j in make),
>>>>> then
>>>>> structuring compilation by tasks will allow it to parallelize the
>>>>> execution.
>>>>>
>>>>> but, i think the solution is easy: similar to the 'build' "pseudo
>>>>> task",
>>>>> i
>>>>> can create a 'compile:prepare' pseudo task that depends on 'resources'
>>>>> etc.
>>>>> then, the factory method needs only to depend on 'compile:prepare' (the
>>>>> logic is that another extension can then add other things to do before
>>>>> compile without needing to change the compile extensions)
>>>>>
>>>>>
>>>>
>>>> We had compile:prepare in the past which invokes resources and ...
>>>> well, that's about it.  It turns out that just having compile and
>>>> doing everything else as prerequisite is good enough.
>>>>
>>>>
>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> package & artifacts
>>>>>>> =========
>>>>>>> overview
>>>>>>> ---------------
>>>>>>> buildr has a cool concept that all dependencies (in 'compile.with')
>>>>>>> are
>>>>>>> converted to tasks that are then simple rake dependencies. However,
>>>>>>> the
>>>>>>> conversion is not generic enough. to compile C++ code against a
>>>>>>> dependency
>>>>>>> one needs 2 paths: a folder containing headers and another containing
>>>>>>> libraries. To put this in a repository, these need to be packaged
>>>>>>> into
>>>>>>> one
>>>>>>> file. To use after pulling from the repository, one needs to unpack.
>>>>>>> So
>>>>>>> a
>>>>>>> task representing a repository artifact is in fact an unzip task,
>>>>>>> that
>>>>>>> depends on the 'Artifact' task to pull the package from a remote
>>>>>>> repository.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> Let's take Java for example, let's say we have a task that depends on
>>>>>> the contents of another WAR.  Specifically the classes (in
>>>>>> WEB-INF/classes) and libraries (WEB-INF/lib).  A generic unzipping
>>>>>> artifact won't help much, you'll get the root path which is useless.
>>>>>> You need the classes path for one, and each file in the lib (pointing
>>>>>> to the directory itself does nothing interesting).  It won't work with
>>>>>> EAR either, when you unzip those, you end up with a WAR which you need
>>>>>> to unzip again.
>>>>>>
>>>>>> But this hypothetical task that uses WAR could be smarter.  It
>>>>>> understands the semantics of the packages it uses, and all these
>>>>>> packages follow a common convention, so it only needs to unpack the
>>>>>> portions of the WAR it cares about, it knows how to construct the
>>>>>> relevant paths, one to class and one to every JAR inside the lib
>>>>>> directory.
>>>>>>
>>>>>> I think the same analogy applies to C packages.  If by convention you
>>>>>> always use include and lib, you can unpack only the portion of the
>>>>>> package you need, find the relevant paths and use them appropriately.
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>> (note: not sure i'm following you here. )
>>>>>
>>>>>
>>>>
>>>> Artifacts by themselves are a generic mechanism for getting packages
>>>> into the local repository.  Their only responsibility if the artifact
>>>> and its metadata, so a task representing a repository artifact would
>>>> only know how to download it.
>>>>
>>>> You can have a separate task that knows how to extract an artifact
>>>> task and use it instead, that way you get the unpacking you need, but
>>>> not all downloaded artifacts have to be unpacked.
>>>>
>>>>
>>>
>>> yes, this is what i'm currently doing, as i explained below.
>>>
>>> but what i want is for me to be able to do that by integrating with the
>>> existing 'artifacts' task. right now it will only return Artifact
>>> objects.
>>> I'd like to have a more elegant solution than just to run over them and
>>> create my own objects, which i think will be more tricky with transitive
>>> dependencies (where transitivity may come from my artifacts, e.g. the
>>> project's artifacts)
>>>
>>>>
>>>>
>>>>>
>>>>> my current implementation creates classes that have methods to retrieve
>>>>> the
>>>>> include paths, the library paths and the library names. I don't use the
>>>>> task
>>>>> name, since it is useless (as you mentioned). so I have an
>>>>> ExtractedRepoArtifact FileTask class that implements these methods by
>>>>> relying on the structure of the package ('include' and 'lib'
>>>>> directories),
>>>>> it depends on the Artifact class and its action is to extract the
>>>>> artifact.
>>>>>
>>>>> When given a project dependency, i return the build task which
>>>>> implements
>>>>> the artifact methods mentioned above by returning the
>>>>> [:source,:main,:include] and [:target, Platform.id, :lib] paths. It
>>>>> also
>>>>> allows the user to add include paths (e.g., for generated files) which
>>>>> are
>>>>> then both used for compilation and returned by the artifact methods.
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> furthermore, when building against another project, there is no need
>>>>>>> to
>>>>>>> pack
>>>>>>> and unpack in the repository. one can simply use the artifacts
>>>>>>> produced
>>>>>>> in
>>>>>>> the 'build' phase of the other project.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> Yes.  Right now it points to the package, which gets invoked and so
>>>>>> packs everything, whether you need the packing or not.  You don't,
>>>>>> however, have to unpack it, if you know the packaging type you can be
>>>>>> smarter and go directly to the source.
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>> but i don't want to pack if there's no use for it. speed is critical in
>>>>> this
>>>>> project, since there's no eclipse to constantly compile code for you,
>>>>> so
>>>>> developers need to run the build after each change. having it pack
>>>>> unnecessarily wastes time.
>>>>>
>>>>>
>>>>
>>>> One step at a time.  I would worry if we can't do that at all, but if
>>>> it's just optimization, we can get to the more problematic issues
>>>> first.
>>>>
>>>>
>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> finally, in C++ in many cases you rely on a system library.
>>>>>>>
>>>>>>> in all cases the resulting dependency is two-fold: on a include dir
>>>>>>> paths
>>>>>>> and on a library paths. note that these do not necessarily reside
>>>>>>> under
>>>>>>> a
>>>>>>> shared folder. for example, a dependency on another project may
>>>>>>> depend
>>>>>>> on
>>>>>>> two include folders: one just a folder in the source tree, the other
>>>>>>> of
>>>>>>> generated files in the target directory
>>>>>>>
>>>>>>> suggestion
>>>>>>> -------------------
>>>>>>> While usage of Buildr.artifacts is only as a utility method, so one
>>>>>>> can
>>>>>>> easily write his own implementation and use that, I think it will be
>>>>>>> nice
>>>>>>> to
>>>>>>> be able to get some reuse.
>>>>>>>
>>>>>>> * when given a project, use it as is (not 'spec.packages'), or allow
>>>>>>> it
>>>>>>> to
>>>>>>> return its artifacts ('spec.artifacts').
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> Yes.  Except we're missing that whole dependency layer (that's
>>>>>> something 1.4 will add).  Ideally the project would have dependency
>>>>>> lists it can populate (at least compile and runtime), and other
>>>>>> projects can get these dependency lists and pick what they want.  So
>>>>>> the compile dependency list would be the place to put headers and
>>>>>> libraries, without having to package them.  We don't have that right
>>>>>> now.
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>> this is the purpose for the 'spec.artifacts' suggestion (that is, an
>>>>> 'artifacts' method in Project). maybe need to classify them similarly
>>>>> to
>>>>> my
>>>>> suggestion for 'compile', so the Buildr.artifacts method receives a
>>>>> 'classifier' argument, whose value can be, for example,  'java' and
>>>>> calls
>>>>> 'spec.artifacts(classifier)'. are we on the same page here?
>>>>>
>>>>>
>>>>
>>>> I'm looking at each of your use cases and trying to identify in my mind:
>>>> a)  What you can do right now to make it happen.
>>>> b)  What, if we added another feature, we should accommodate for.
>>>> c)  What new feature we would need for this.
>>>>
>>>> I'm starting with a) because you can get it working right now, it may
>>>> not be elegant and not work as fast, but we can get that out of the
>>>> way so we can focus about doing the rest.  There are some things we're
>>>> planning on changing anyway, so I'm also trying to see if future
>>>> changes would address the elegant/fast use cases, I can tell you what
>>>> I have in mind, but no code yet to make it happen.  And then identify
>>>> anything not addressed by current plans and decide how to support that
>>>> directly.
>>>>
>>>>
>>>
>>> i got it working now. but i'm doing several code paths in parallel. i
>>> have a
>>> 'make' method instead of 'compile'. the reasons are both because i need to
>>> create several tasks, not a 'compiler' object (and i want to create them
>>> before rake's execution starts) , and because i need to create different
>>> implementations per platform.
>>>
>>>>
>>>> Right now, project.packages is good enough for what you need.  It's an
>>>> array of tasks, you can throw any task you want in there and the
>>>> dependent project would pick on it.  You don't have to throw ZIP files
>>>> in there, you can add a header file or a directory of header files, or
>>>> a task that knows it's a directory of header files.
>>>>
>>>> It's inelegant because project.packages is intended to be the list of
>>>> things that get installed and released, so it's an "off the label" use
>>>> for that part of the API.  But, it will work, and if you just add
>>>> things to the end of project.packages, they won't get installed or
>>>> released.  So project.packages is the same as project.artifacts, just
>>>> with a different name.
>>>>
>>>>
>>>
>>> or i can implement my own 'artifacts' method, which is what i did because
>>> i
>>> need different artifact objects than what Buildr.artifacts returns.
>>>
>>>>
>>>> Separately, we need (and planning and working on) a smarter dependency
>>>> management, which you can populate and anything referencing the
>>>> project can access.  It won't be called artifacts but dependencies, it
>>>> will do a lot more, and it will be more elegant and documented for
>>>> specific use cases like this.
>>>>
>>>>
>>>>
>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> * if a symbol, recursively call on the spec from the namespace
>>>>>>> * if a struct, recursively call
>>>>>>> * otherwise, classify the artifact and call a factory method to
>>>>>>> create
>>>>>>> it.
>>>>>>> classification can be by packaging (e.g. jar). but actually, i don't
>>>>>>> have
>>>>>>> a
>>>>>>> very good idea here. note that for c++, there need to be a way of
>>>>>>> defining
>>>>>>> an artifact to look in the system for include files and libraries
>>>>>>>  (maybe
>>>>>>> something like 'openssl:system'? - version and group ids are
>>>>>>> meaningless).
>>>>>>>  * the factory method can create different artifacts. for c++ there
>>>>>>> would
>>>>>>> be
>>>>>>> RepositoryArtifact (downloads and unpacks), ProjectArtifact (short
>>>>>>> circuit
>>>>>>> to the project's target and source directories) and SystemArtifact.
>>>>>>>
>>>>>>> I think that the use of artifact namespaces can help here as it
>>>>>>> allows
>>>>>>> to
>>>>>>> create a more verbose syntax for declaring artifacts, while still
>>>>>>> allowing
>>>>>>> the user to create shorter names for them. (as an example in C++ it
>>>>>>> will
>>>>>>> allow me to add to the artifact the list of flags to use when
>>>>>>> compiling/linking with it, assuming they're not inherent to the
>>>>>>> artifact,
>>>>>>> e.g. turn debug on). The factory method receives the artifact
>>>>>>> definition
>>>>>>> (which can actually be defined by each plugin) and decides what to do
>>>>>>> with
>>>>>>> it.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> 1.4 will have a better dependency mechanism, and one thing I looked at
>>>>>> is associating meta-data with each dependency.  So perhaps that would
>>>>>> address things like compiling/linking flags.
>>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> ordering
>>>>>>> =========
>>>>>>> overview
>>>>>>> -------------------
>>>>>>> to support jni, one needs to first compile java classes, then run
>>>>>>> javah
>>>>>>> to
>>>>>>> generate headers and then compile c code that implements these
>>>>>>> headers.
>>>>>>> so
>>>>>>> the javah task should be able to specify it depends on the java
>>>>>>> compile
>>>>>>> task. this can't be by depending on all compile tasks of course or on
>>>>>>> 'build'.
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> Alternatively:
>>>>>>
>>>>>> compile do |task|
>>>>>>  javah task.target
>>>>>> end
>>>>>>
>>>>>> This will run javah each time the compiler runs.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>> but running each time is what i want to avoid. not only do i want to
>>>>> avoid
>>>>> the invocation of 'javah', but when invoked it will change the
>>>>> timestamp
>>>>> of
>>>>> the generated headers and so many source files will get recompiled.
>>>>>
>>>>>
>>>>
>>>> Rake separates invocation from execution.  Invoking a task tells it to
>>>> invoke its prerequisites, then use those to decide if it needs
>>>> executing, and if so execute.  Whether you put javah at the end of
>>>> compile, or a prerequisite to build, it will get invoked and it should
>>>> be smart enough to decide whether there's any work to be done.
>>>>
>>>>
>>>
>>> i think i'm missing something here. in the code snippet above, didn't you
>>> add an action to 'compile' and in that action call the javah command? to
>>> me
>>> it looks like at the end of compile javah is run.
>>>
>>>>
>>>> But there is a significant difference between the two.  If you add it
>>>> to compile, it gets invoked during compilation -- and compilation
>>>> implies there's a change to the source code which might lead to change
>>>> in the header files -- and that happens as often as is necessary.  If
>>>> you put it as a prerequisite to build, it only happens when the build
>>>> task runs.  If you run rake task, which doesn't run the build task,
>>>> you may end up testing the wrong header files.
>>>>
>>>>
>>>
>>> there should be a rule to the effect of:
>>> file jni_headers_dir => [classes] do |task|
>>>   javah classes # with whatever flags to put generated headers in jni_headers_dir
>>>   touch jni_headers_dir
>>> end
>>>
>>> so if the classes are newer than the directory (and only then) javah
>>> runs.
>>> if i run it every time it will generate headers, changing the timestamp,
>>> which will cause all dependent cpp classes to recompile which will take a
>>> lot of time.
>>>
>>
>> Again, if you do:
>>
>> compile do
>>  file(jni_headers_dir).invoke
>> end
>>
>> It gives you the same effect, except it happens earlier in the process
>> (e.g. before test, not just before build).  You invoke the task, the
>> task looks at the prerequisites, decides if anything needs to be done,
>> and executes only when necessary.
>>
>> Assaf
>>
>>
>>>>
>>>>
>>>>>
>>>>> note that compiling a C/C++ source file is a much slower process than
>>>>> compiling java.
>>>>>
>>>>>
>>>>>>>
>>>>>>> suggestion
>>>>>>> -------------------
>>>>>>> when creating a compile task (whose name can be, as in the case of
>>>>>>> c++,
>>>>>>> the
>>>>>>> result library name - to allow for dependency checking), also create
>>>>>>> a
>>>>>>> "for
>>>>>>> ordering only" task with a symbolic name (e.g., 'java:compile') which
>>>>>>> depends on the actual task. then other tasks can depend on that task
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> And yes, you'll still need that if you want to run the C compiler
>>>>>> after the Java compiler, so I think the right thing to do would be to
>>>>>> have separate compile tasks.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> I hope all this makes sense, and I'm looking forward to comments. I
>>>>>>> intend
>>>>>>> to share the code once I'm finished.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> Unfortunately, the last time I wrote C code was over ten years ago,
>>>>>> so my rustiness is showing.  I'm sure I missed some points because of
>>>>>> that.
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>> I hope I cleared things. I think it is worth investing in C/C++ as it
>>>>> is
>>>>> a
>>>>> space where there's still no solutions (that i know of) that handle
>>>>> module
>>>>> dependency.
>>>>>
>>>>>
>>>>
>>>> Definitely.
>>>>
>>>>
>>>>
>>>>>
>>>>> To make sure it is clear, I'm not asking for the buildr team to
>>>>> implement
>>>>> C/C++ building, I intend to do that, and have already made a demo of it
>>>>> working, but I do want to ask for the infrastructure in buildr to make
>>>>> it
>>>>> easier, since currently it looks like a "stepson".
>>>>>
>>>>>
>>>>
>>>> In addition, two things we should look at.
>>>>
>>>> First, find out a good intersection between C/C++ and other languages.
>>>>  There may be some changes that are only necessary for C/C++, but
>>>> hopefully most of these can be shared across languages, that way we
>>>> get better features all around.
>>>>
>>>> Second, make sure we exhausted all our options before making a change.
>>>>  If there's another way of doing something, even stop-gap measure
>>>> while we cook up a better feature all around, then we have less
>>>> changes to worry about.
>>>>
>>>> It's an exercise we did before with Groovy and Scala (earlier versions
>>>> were married to Java) and it worked out pretty well.  We started by
>>>> not making any changes in Buildr to accommodate it, instead using a
>>>> separate task specifically for compiling Scala code that relied on
>>>> some hacks and inelegant code to actually work.  Then took the time to
>>>> build multi-lingual support out of that.
>>>>
>>>>
>>>
>>> i'm already past that. i have ~20 modules compiling, with transitive
>>> dependencies on other modules and on third party modules.
>>>
>>> so i'm now at a stage where i want better integration with buildr.
>>>
>>>>
>>>> Assaf
>>>>
>>>>
>>>>
>>>>>
>>>>> Ittay
>>>>>
>>>>>
>>>>>>
>>>>>> Assaf
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Thank you,
>>>>>>> Ittay
>>>>>>>
>>>>>>>
>>>>>>> Notes:
>>>>>>> [1] I don't consider linking a library as packaging. First, the obj
>>>>>>> files
>>>>>>> are not used by themselves as in other languages. Second, packaging
>>>>>>> is
>>>>>>> required to manage dependencies, because in order for project P to be
>>>>>>> built
>>>>>>> against dependency D, D needs to contain both headers and libraries -
>>>>>>> this
>>>>>>> is the package.
>>>>>>>
>>>>>>> --
>>>>>>> --
>>>>>>> Ittay Dror <it...@gmail.com>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>
>>>>> --
>>>>> --
>>>>> Ittay Dror <it...@gmail.com>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>
>>> --
>>> --
>>> Ittay Dror <it...@gmail.com>
>>>
>>>
>>>
>
> --
> --
> Ittay Dror <it...@gmail.com>
>
>

Re: request for enhancement: compile, package and artifacts support for C++

Posted by Ittay Dror <it...@gmail.com>.
Thank you for your reply and patience.

I now understand what you meant, and you are quite right, it can be done 
this way.

However, my aim was to create the task prerequisites tree before rake 
invokes the first task.

First, it will make '-P' show the tree (according to your suggestion, -P 
won't show that 'compile' depends on 'libsomething.so' and 
'libsomething.a', right). Secondly, having a complete tree of all tasks 
and prerequisites allows analyzing it.

Both these reasons are non-functional of course.

Ittay

Assaf Arkin wrote:
> On Tue, Jul 29, 2008 at 12:59 PM, Ittay Dror <it...@gmail.com> wrote:
>   
>> can you give an example of how a task can orchestrate other tasks? also, as
>> far as i could tell, the 'compile' method always creates a CompileTask. i
>> can't use it as is because it expects some compiler which i can't give it
>> because i want to use tasks and also, i can't add dependencies to it because
>> it depends directly on tasks like 'resources' which the prerequisites should
>> depend on.
>>     
>
> If you look at the end of compile.rb you'll notice one of the things
> it does is call  project.recursive_task('compile') which causes one
> project's compile task to execute all its child projects' compile
> tasks.  Likewise, if you look at test.rb at the very end, you'll
> notice that it's tacking the test task to the very end of the build
> task (always test after build).
>
> Another example is the XMLBeans task (in addon) which needs to
> generate source code, that is added as prerequisite to compile, and
> also copy files over to the target directory, which is done by the
> compile task at the very end.
>
> From the compiler you can do whatever you need to, including invoking
> as many tasks as necessary (let Rake worry whether to execute them or
> not).  And like XMLBeans does, you can add additional prerequisites
> when necessary, and make additional work happen after compilation.
>
>   
>> At the risk of spending a lot of time on the obvious (i have a feeling we're
>> talking about different things):
>>
>> say a project has 2 cpp files A.cpp and B.cpp, with matching headers, and no
>> other headers, which compile to shared and static libraries. my dependency
>> tree is:
>>
>> compile:cpp ---+--- libsomething.so ---+--- A.o --- A.cpp, A.h
>>                |                       |
>>                +--- libsomething.a ----+--- B.o --- B.cpp, B.h
>>
>> (both libraries link the same two object files)
>>
>>
>> these should be rake tasks for two reasons: timestamp checking and the fact
>> that two artifacts rely on the same set of objects. also linking and
>> compiling are two different commands and finally, if i call the compiler
>> twice, it will do the work twice (that is, it doesn't have any internal
>> mechanism that tells it there's no need to recreate the obj files or
>> libraries).
>>     
>
> Yes.  If all these are separate tasks wired together, then Rake will
> only compile what is necessary.  So let's say you have two tasks, just
> to simplify (they have other prerequisite tasks), one for
> libsomething.so and one for libsomething.a.  You have a compile task
> that invokes these two tasks.  Rake only executes what is necessary by
> checking dependencies on the object files, which in turn check
> dependencies on the cpp and header files, etc.
>
> So now you have one forest of dependencies in the project, all of
> which are executed as necessary by the project's compile task.  And
> one forest of projects, all of which are also executed as necessary by
> the project's compile task.
>
> Your compiler object now has three uses:
> a) It makes sure all these tasks exist and get invoked.  There's no
> need for it to run a single instance compiler on all the files.  We do
> that for Javac because it's Javac, but the compile method can do
> whatever it deems necessary.
> b) You get an easy way to control compiler options across all of
> these, and inherit them from parent projects.  So you could, say, pick
> the target architecture in the top-level project, have all the
> compilers inherit from it.
> c) Your compiler can run all these tasks in parallel.
>
> And since libsomething.so is also a task, if you want you can control
> some of these options directly on that task.
>
>   
>> note that all of this tree needs to rely on the 'resources' task, since some
>> headers may be generated. so 'resources' need to run before all the
>> timestamp checking and compilation is done.
>>     
>
> The resources task is specifically for copying files to the target
> directory that are not handled by the compiler, like images, I18N
> resources, configuration files, etc.  It's not for generating code
> used during compilation.
>
>   
>>>> of course the factory method can create just one task that does all the
>>>> rest
>>>> in its action (compile obj files and link), but i do want to use tasks
>>>> for
>>>> the following reasons:
>>>> 1. it makes the logic more like make, which will assist acceptance
>>>> 2. it can use mechanisms in unix compilers to help make. specifically,
>>>> most
>>>> (if not all) unix compilers have an option to spit out dependencies of
>>>> the
>>>> source files on headers.
>>>> 3. it reuses timestamp checking code in rake (and if ever rake implements
>>>> checksum based recompilation)
>>>> 4. if rake will implement a job execution engine (like -j in make), then
>>>> structuring compilation by tasks will allow it to parallelize the
>>>> execution.
>>>>
>>>> but, i think the solution is easy: similar to the 'build' "pseudo task",
>>>> i
>>>> can create a 'compile:prepare' pseudo task that depends on 'resources'
>>>> etc.
>>>> then, the factory method needs only to depend on 'compile:prepare' (the
>>>> logic is that another extension can then add other things to do before
>>>> compile without needing to change the compile extensions)
>>>>
>>>>         
>>> We had compile:prepare in the past which invokes resources and ...
>>> well, that's about it.  It turns out that just having compile and
>>> doing everything else as prerequisite is good enough.
>>>
>>>
>>>       
>>>>>           
>>>>>> package & artifacts
>>>>>> =========
>>>>>> overview
>>>>>> ---------------
>>>>>> buildr has a cool concept that all dependencies (in 'compile.with') are
>>>>>> converted to tasks that are then simple rake dependencies. However, the
>>>>>> conversion is not generic enough. to compile C++ code against a
>>>>>> dependency
>>>>>> one needs 2 paths: a folder containing headers and another containing
>>>>>> libraries. To put this in a repository, these need to be packaged into
>>>>>> one
>>>>>> file. To use after pulling from the repository, one needs to unpack. So
>>>>>> a
>>>>>> task representing a repository artifact is in fact an unzip task, that
>>>>>> depends on the 'Artifact' task to pull the package from a remote
>>>>>> repository.
>>>>>>
>>>>>>
>>>>>>             
>>>>> Let's take Java for example, let's say we have a task that depends on
>>>>> the contents of another WAR.  Specifically the classes (in
>>>>> WEB-INF/classes) and libraries (WEB-INF/lib).  A generic unzipping
>>>>> artifact won't help much, you'll get the root path which is useless.
>>>>> You need the classes path for one, and each file in the lib (pointing
>>>>> to the directory itself does nothing interesting).  It won't work with
>>>>> EAR either, when you unzip those, you end up with a WAR which you need
>>>>> to unzip again.
>>>>>
>>>>> But this hypothetical task that uses WAR could be smarter.  It
>>>>> understands the semantics of the packages it uses, and all these
>>>>> packages follow a common convention, so it only needs to unpack the
>>>>> portions of the WAR it cares about, it knows how to construct the
>>>>> relevant paths, one to class and one to every JAR inside the lib
>>>>> directory.
>>>>>
>>>>> I think the same analogy applies to C packages.  If by convention you
>>>>> always use include and lib, you can unpack only the portion of the
>>>>> package you need, find the relevant paths and use them appropriately.
>>>>>
>>>>>
>>>>>           
>>>> (note: not sure i'm following you here. )
>>>>
>>>>         
>>> Artifacts by themselves are a generic mechanism for getting packages
>>> into the local repository.  Their only responsibility is the artifact
>>> and its metadata, so a task representing a repository artifact would
>>> only know how to download it.
>>>
>>> You can have a separate task that knows how to extract an artifact
>>> task and use it instead, that way you get the unpacking you need, but
>>> not all downloaded artifacts have to be unpacked.
>>>
>>>       
>> yes, this is what i'm currently doing, as i explained below.
>>
>> but what i want is for me to be able to do that by integrating with the
>> existing 'artifacts' task. right now it will only return Artifact objects.
>> I'd like to have a more elegant solution than just to run over them and
>> create my own objects, which i think will be more tricky with transitive
>> dependencies (where transitivity may come from my artifacts, e.g. the
>> project's artifacts)
>>     
>>>       
>>>> my current implementation creates classes that have methods to retrieve
>>>> the
>>>> include paths, the library paths and the library names. I don't use the
>>>> task
>>>> name, since it is useless (as you mentioned). so I have an
>>>> ExtractedRepoArtifact FileTask class that implements these methods by
>>>> relying on the structure of the package ('include' and 'lib'
>>>> directories),
>>>> it depends on the Artifact class and its action is to extract the
>>>> artifact.
>>>>
>>>> When given a project dependency, i return the build task which implements
>>>> the artifact methods mentioned above by returning the
>>>> [:source,:main,:include] and [:target, Platform.id, :lib] paths. It also
>>>> allows the user to add include paths (e.g., for generated files) which
>>>> are
>>>> then both used for compilation and returned by the artifact methods.
>>>>
>>>>         
>>>>>           
>>>>>> furthermore, when building against another project, there is no need to
>>>>>> pack
>>>>>> and unpack in the repository. one can simply use the artifacts produced
>>>>>> in
>>>>>> the 'build' phase of the other project.
>>>>>>
>>>>>>
>>>>>>             
>>>>> Yes.  Right now it points to the package, which gets invoked and so
>>>>> packs everything, whether you need the packing or not.  You don't,
>>>>> however, have to unpack it, if you know the packaging type you can be
>>>>> smarter and go directly to the source.
>>>>>
>>>>>
>>>>>           
>>>> but i don't want to pack if there's no use for it. speed is critical in
>>>> this
>>>> project, since there's no eclipse to constantly compile code for you, so
>>>> developers need to run the build after each change. having it pack
>>>> unnecessarily wastes time.
>>>>
>>>>         
>>> One step at a time.  I would worry if we can't do that at all, but if
>>> it's just optimization, we can get to the more problematic issues
>>> first.
>>>
>>>
>>>       
>>>>>           
>>>>>> finally, in C++ in many cases you rely on a system library.
>>>>>>
>>>>>> in all cases the resulting dependency is two-fold: on a include dir
>>>>>> paths
>>>>>> and on a library paths. note that these do not necessarily reside under
>>>>>> a
>>>>>> shared folder. for example, a dependency on another project may depend
>>>>>> on
>>>>>> two include folders: one just a folder in the source tree, the other of
>>>>>> generated files in the target directory
>>>>>>
>>>>>> suggestion
>>>>>> -------------------
>>>>>> While usage of Buildr.artifacts is only as a utility method, so one can
>>>>>> easily write his own implementation and use that, I think it will be
>>>>>> nice
>>>>>> to
>>>>>> be able to get some reuse.
>>>>>>
>>>>>> * when given a project, use it as is (not 'spec.packages'), or allow it
>>>>>> to
>>>>>> return its artifacts ('spec.artifacts').
>>>>>>
>>>>>>
>>>>>>             
>>>>> Yes.  Except we're missing that whole dependency layer (that's
>>>>> something 1.4 will add).  Ideally the project would have dependency
>>>>> lists it can populate (at least compile and runtime), and other
>>>>> projects can get these dependency lists and pick what they want.  So
>>>>> the compile dependency list would be the place to put headers and
>>>>> libraries, without having to package them.  We don't have that right
>>>>> now.
>>>>>
>>>>>
>>>>>           
>>>> this is the purpose for the 'spec.artifacts' suggestion (that is, an
>>>> 'artifacts' method in Project). maybe need to classify them similarly to
>>>> my
>>>> suggestion for 'compile', so the Buildr.artifacts method receives a
>>>> 'classifier' argument, whose value can be, for example,  'java' and calls
>>>> 'spec.artifacts(classifier)'. are we on the same page here?
>>>>
>>>>         
>>> I'm looking at each of your use cases and trying to identify in my mind:
>>> a)  What you can do right now to make it happen.
>>> b)  What, if we added another feature, we should accommodate for.
>>> c)  What new feature we would need for this.
>>>
>>> I'm starting with a) because you can get it working right now, it may
>>> not be elegant and not work as fast, but we can get that out of the
>>> way so we can focus about doing the rest.  There are some things we're
>>> planning on changing anyway, so I'm also trying to see if future
>>> changes would address the elegant/fast use cases, I can tell you what
>>> I have in mind, but no code yet to make it happen.  And then identify
>>> anything not addressed by current plans and decide how to support that
>>> directly.
>>>
>>>       
>> i got it working now. but i'm doing several code paths in parallel. i have a
>> 'make' method instead of 'compile'. the reasons are both because i need to
>> create several tasks, not a 'compiler' object (and i want to create them
>> before rake's execution starts) , and because i need to create different
>> implementations per platform.
>>     
>>> Right now, project.packages is good enough for what you need.  It's an
>>> array of tasks, you can throw any task you want in there and the
>>> dependent project would pick on it.  You don't have to throw ZIP files
>>> in there, you can add a header file or a directory of header files, or
>>> a task that knows it's a directory of header files.
>>>
>>> It's inelegant because project.packages is intended to be the list of
>>> things that get installed and released, so it's an "off the label" use
>>> for that part of the API.  But, it will work, and if you just add
>>> things to the end of project.packages, they won't get installed or
>>> released.  So project.packages is the same as project.artifacts, just
>>> with a different name.
>>>
>>>       
>> or i can implement my own 'artifacts' method, which is what i did because i
>> need different artifact objects than what Buildr.artifacts returns.
>>     
>>> Separately, we need (and planning and working on) a smarter dependency
>>> management, which you can populate and anything referencing the
>>> project can access.  It won't be called artifacts but dependencies, it
>>> will do a lot more, and it will be more elegant and documented for
>>> specific use cases like this.
>>>
>>>
>>>
>>>       
>>>>>           
>>>>>> * if a symbol, recursively call on the spec from the namespace
>>>>>> * if a struct, recursively call
>>>>>> * otherwise, classify the artifact and call a factory method to create
>>>>>> it.
>>>>>> classification can be by packaging (e.g. jar). but actually, i don't
>>>>>> have
>>>>>> a
>>>>>> very good idea here. note that for c++, there need to be a way of
>>>>>> defining
>>>>>> an artifact to look in the system for include files and libraries
>>>>>>  (maybe
>>>>>> something like 'openssl:system'? - version and group ids are
>>>>>> meaningless).
>>>>>>  * the factory method can create different artifacts. for c++ there
>>>>>> would
>>>>>> be
>>>>>> RepositoryArtifact (downloads and unpacks), ProjectArtifact (short
>>>>>> circuit
>>>>>> to the project's target and source directories) and SystemArtifact.
>>>>>>
>>>>>> I think that the use of artifact namespaces can help here as it allows
>>>>>> to
>>>>>> create a more verbose syntax for declaring artifacts, while still
>>>>>> allowing
>>>>>> the user to create shorter names for them. (as an example in C++ it
>>>>>> will
>>>>>> allow me to add to the artifact the list of flags to use when
>>>>>> compiling/linking with it, assuming they're not inherent to the
>>>>>> artifact,
>>>>>> e.g. turn debug on). The factory method receives the artifact
>>>>>> definition
>>>>>> (which can actually be defined by each plugin) and decides what to do
>>>>>> with
>>>>>> it.
>>>>>>
>>>>>>
>>>>>>             
>>>>> 1.4 will have a better dependency mechanism, and one thing I looked at
>>>>> is associating meta-data with each dependency.  So perhaps that would
>>>>> address things like compiling/linking flags.
>>>>>
>>>>>
>>>>>           
>>>>>> ordering
>>>>>> =========
>>>>>> overview
>>>>>> -------------------
>>>>>> to support jni, one needs to first compile java classes, then run javah
>>>>>> to
>>>>>> generate headers and then compile c code that implements these headers.
>>>>>> so
>>>>>> the javah task should be able to specify it depends on the java compile
>>>>>> task. this can't be by depending on all compile tasks of course or on
>>>>>> 'build'.
>>>>>>
>>>>>>             
>>>>> Alternatively:
>>>>>
>>>>> compile do |task|
>>>>>  javah task.target
>>>>> end
>>>>>
>>>>> This will run javah each time the compiler runs.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>           
>>>> but running each time is what i want to avoid. not only do i want to
>>>> avoid
>>>> the invocation of 'javah', but when invoked it will change the timestamp
>>>> of
>>>> the generated headers and so many source files will get recompiled.
>>>>
>>>>         
>>> Rake separates invocation from execution.  Invoking a task tells it to
>>> invoke its prerequisites, then use those to decide if it needs
>>> executing, and if so execute.  Whether you put javah at the end of
>>> compile, or as a prerequisite to build, it will get invoked and it should
>>> be smart enough to decide whether there's any work to be done.
>>>
>>>       
>> i think i'm missing something here. in the code snippet above, didn't you
>> add an action to 'compile' and in that action call the javah command? to me
>> it looks like javah is run at the end of compile.
>>     
>>> But there is a significant difference between the two.  If you add it
>>> to compile, it gets invoked during compilation -- and compilation
>>> implies there's a change to the source code which might lead to change
>>> in the header files -- and that happens as often as is necessary.  If
>>> you put it as a prerequisite to build, it only happens when the build
>>> task runs.  If you run a rake task which doesn't run the build task,
>>> you may end up testing the wrong header files.
>>>
>>>       
>> there should be a rule to the effect of:
>> file jni_headers_dir => [classes] do |task|
>>   javah classes # with whatever flags to put the generated headers in jni_headers_dir
>>   touch jni_headers_dir
>> end
>>
>> so if the classes are newer than the directory (and only then) javah runs.
>> if i run it every time it will generate headers, changing the timestamp,
>> which will cause all dependent cpp classes to recompile which will take a
>> lot of time.
>>     
>
> Again, if you do:
>
> compile do
>   file(jni_headers_dir).invoke
> end
>
> It gives you the same effect, except it happens earlier in the process
> (e.g. before test, not just before build).  You invoke the task, the
> task looks at the prerequisites, decides if anything needs to be done,
> and executes only when necessary.
>
> Assaf
>
>   
>>>       
>>>> note that compiling a C/C++ source file is a much slower process than
>>>> compiling java.
>>>>
>>>>         
>>>>>> suggestion
>>>>>> -------------------
>>>>>> when creating a compile task (whose name can be, as in the case of c++,
>>>>>> the
>>>>>> result library name - to allow for dependency checking), also create a
>>>>>> "for
>>>>>> ordering only" task with a symbolic name (e.g., 'java:compile') which
>>>>>> depends on the actual task. then other tasks can depend on that task
>>>>>>
>>>>>>
>>>>>>             
>>>>> And yes, you'll still need that if you want to run the C compiler
>>>>> after the Java compiler, so I think the right thing to do would have
>>>>> separate compile tasks.
>>>>>
>>>>>           
>>>>>> I hope all this makes sense, and I'm looking forward to comments. I
>>>>>> intend
>>>>>> to share the code once I'm finished.
>>>>>>
>>>>>>
>>>>>>             
>>> Unfortunately, the last time I wrote C code was over ten years ago,
>>>>> so my rustiness is showing.  I'm sure I missed some points because of
>>>>> that.
>>>>>
>>>>>
>>>>>           
>> I hope I cleared things. I think it is worth investing in C/C++ as it is a
>> space where there are still no solutions (that i know of) that handle
>> module dependency.
>>>>
>>>>         
>>> Definitely.
>>>
>>>
>>>       
>>>> To make sure it is clear, I'm not asking for the buildr team to implement
>>>> C/C++ building, I intend to do that, and have already made a demo of it
>>>> working, but I do want to ask for the infrastructure in buildr to make it
>>>> easier, since currently it looks like a "stepson".
>>>>
>>>>         
>>> In addition, two things we should look at.
>>>
>>> First, find out a good intersection between C/C++ and other languages.
>>>  There may be some changes that are only necessary for C/C++, but
>>> hopefully most of these can be shared across languages, that way we
>>> get better features all around.
>>>
>>> Second, make sure we exhausted all our options before making a change.
>>>  If there's another way of doing something, even a stop-gap measure
>>> while we cook up a better feature all around, then we have fewer
>>> changes to worry about.
>>>
>>> It's an exercise we did before with Groovy and Scala (earlier versions
>>> were married to Java) and it worked out pretty well.  We started by
>>> not making any changes in Buildr to accommodate it, instead using a
>>> separate task specifically for compiling Scala code that relied on
>>> some hacks and inelegant code to actually work.  Then took the time to
>>> build multi-lingual support out of that.
>>>
>>>       
>> i'm already past that. i have ~20 modules compiling, with transitive
>> dependencies on other modules and on third party modules.
>>
>> so i'm now at a stage where i want better integration with buildr.
>>     
>>> Assaf
>>>
>>>
>>>       
>>>> Ittay
>>>>
>>>>         
>>>>> Assaf
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>           
>>>>>> Thank you,
>>>>>> Ittay
>>>>>>
>>>>>>
>>>>>> Notes:
>>>>>> [1] I don't consider linking a library as packaging. First, the obj
>>>>>> files
>>>>>> are not used by themselves as in other languages. Second, packaging is
>>>>>> required to manage dependencies, because in order for project P to be
>>>>>> built
>>>>>> against dependency D, D needs to contain both headers and libraries -
>>>>>> this
>>>>>> is the package.
>>>>>>
>>>>>> --
>>>>>> --
>>>>>> Ittay Dror <it...@gmail.com>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>             
>>>> --
>>>> --
>>>> Ittay Dror <it...@gmail.com>
>>>>
>>>>
>>>>
>>>>
>>>>         
>> --
>> --
>> Ittay Dror <it...@gmail.com>
>>
>>
>>     

-- 
--
Ittay Dror <it...@gmail.com>


Re: request for enhancement: compile, package and artifacts support for C++

Posted by Assaf Arkin <ar...@intalio.com>.
On Tue, Jul 29, 2008 at 12:59 PM, Ittay Dror <it...@gmail.com> wrote:
>
>
> can you give an example of how a task can orchestrate other tasks? also, as
> far as i could tell, the 'compile' method always creates a CompileTask. i
> can't use it as is because it expects some compiler which i can't give it
> because i want to use tasks and also, i can't add dependencies to it because
> it depends directly on tasks like 'resources' which the prerequisites should
> depend on.

If you look at the end of compile.rb you'll notice one of the things
it does is call  project.recursive_task('compile') which causes one
project's compile task to execute all its child projects's compile
tasks.  Likewise, if you look at test.rb at the very end, you'll
notice that it's tacking the test task to the very end of the build
task (always test after build).

Another example is the XMLBeans task (in addon), which needs to
generate source code that is added as a prerequisite to compile, and
also to copy files over to the target directory, which is done by the
compile task at the very end.

From the compiler you can do whatever you need to, including invoking
as many tasks as necessary (let Rake worry whether to execute them or
not).  And like XMLBeans does, you can add additional prerequisites
when necessary, and make additional work happen after compilation.
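
To make that concrete, here is a minimal buildfile sketch of the same
wiring; the generator command and paths are made up for illustration,
not taken from the XMLBeans addon:

define 'mylib' do
  gen_dir = _('target/generated')
  # hypothetical task that writes generated sources under target/generated
  generated = file(gen_dir) do |t|
    sh "generate-sources --out #{t.name}"
  end
  compile.from gen_dir          # compile sources from the generated directory
  compile.enhance [generated]   # make generation run before the compiler
  compile do                    # extra work tacked onto the end of compile
    cp_r _('src/main/etc'), compile.target.to_s
  end
end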

>
> At the risk of spending a lot of time on the obvious (i have a feeling we're
> talking about different things):
>
> say a project has 2 cpp files A.cpp and B.cpp, with matching headers, and no
> other headers, which compile to shared and static libraries. my dependency
> tree is:
>
> compile:cpp --+-- libsomething.so --+-- A.o --+-- A.cpp
>               |                     |         \-- A.h
>               \-- libsomething.a ---+-- B.o --+-- B.cpp
>                                               \-- B.h
> (both libraries depend on both A.o and B.o)
>
>
> these should be rake tasks for several reasons: timestamp checking and the fact
> that two artifacts rely on the same set of objects. also linking and
> compiling are two different commands and finally, if i call the compiler
> twice, it will do the work twice (that is, it doesn't have any internal
> mechanism that tells it there's no need to recreate the obj files or
> libraries).

Yes.  If all these are separate tasks wired together, then Rake will
only compile what is necessary.  So let's say you have two tasks, just
to simplify (they have other prerequisite tasks), one for
libsomething.so and one for libsomething.a.  You have a compile task
that invokes these two tasks.  Rake only executes what is necessary by
checking dependencies on the object files, which in turn check
dependencies on the cpp and header files, etc.

So now you have one forest of dependencies in the project, all of
which are executed as necessary by the project's compile task.  And
one forest of projects, all of which are also executed as necessary by
the project's compile task.

Your compiler object now has three uses:
a) It makes sure all these tasks exist and get invoked.  There's no
need for it to run a single compiler instance on all the files.  We do
that for Javac because it's Javac, but the compile method can do
whatever it deems necessary.
b) You get an easy way to control compiler options across all of
these, and inherit them from parent projects.  So you could, say, pick
the target architecture in the top-level project, have all the
compilers inherit from it.
c) Your compiler can run all these tasks in parallel.

And since libsomething.so is also a task, if you want you can control
some of these options directly on that task.
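
A sketch of that orchestration, in the same invoke-from-compile style
used elsewhere in this thread (the two library tasks are assumed to be
plain file tasks defined elsewhere):

compile do
  file('libsomething.so').invoke  # each rebuilds only if out of date
  file('libsomething.a').invoke
end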

>
> note that all of this tree needs to rely on the 'resources' task, since some
> headers may be generated. so 'resources' needs to run before all the
> timestamp checking and compilation is done.

The resources task is specifically for copying files to the target
directory that are not handled by the compiler, like images, I18N
resources, configuration files, etc.  It's not for generating code
used during compilation.
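
For generated headers, a sketch of the alternative is to model generation
as its own file task and hang it off compile instead of resources (the
generator command and idl_files inputs below are hypothetical):

idl_files = FileList['src/main/idl/*.idl']   # hypothetical generator inputs
generated_headers = file(_('target/generated/include') => idl_files) do |t|
  sh "generate-headers --out #{t.name}"      # hypothetical generator command
end
compile.enhance [generated_headers]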

>>> of course the factory method can create just one task that does all the
>>> rest
>>> in its action (compile obj files and link), but i do want to use tasks
>>> for
>>> the following reasons:
>>> 1. it makes the logic more like make, which will assist acceptance
>>> 2. it can use mechanisms in unix compilers to help make. specifically,
>>> most
>>> (if not all) unix compilers have an option to spit out dependencies of
>>> the
>>> source files on headers.
>>> 3. it reuses timestamp checking code in rake (and if ever rake implements
>>> checksum based recompilation)
>>> 4. if rake will implement a job execution engine (like -j in make), then
>>> structuring compilation by tasks will allow it to parallelize the
>>> execution.
>>>
>>> but, i think the solution is easy: similar to the 'build' "pseudo task",
>>> i
>>> can create a 'compile:prepare' pseudo task that depends on 'resources'
>>> etc.
>>> then, the factory method needs only to depend on 'compile:prepare' (the
>>> logic is that another extension can then add other things to do before
>>> compile without needing to change the compile extensions)
>>>
>>
>> We had compile:prepare in the past which invokes resources and ...
>> well, that's about it.  It turns out that just having compile and
>> doing everything else as prerequisite is good enough.
>>
>>
>>>>
>>>>
>>>>>
>>>>> package & artifacts
>>>>> =========
>>>>> overview
>>>>> ---------------
>>>>> buildr has a cool concept that all dependencies (in 'compile.with') are
>>>>> converted to tasks that are then simple rake dependencies. However, the
>>>>> conversion is not generic enough. to compile C++ code against a
>>>>> dependency
>>>>> one needs 2 paths: a folder containing headers and another containing
>>>>> libraries. To put this in a repository, these need to be packaged into
>>>>> one
>>>>> file. To use after pulling from the repository, one needs to unpack. So
>>>>> a
>>>>> task representing a repository artifact is in fact an unzip task, that
>>>>> depends on the 'Artifact' task to pull the package from a remote
>>>>> repository.
>>>>>
>>>>>
>>>>
>>>> Let's take Java for example, let's say we have a task that depends on
>>>> the contents of another WAR.  Specifically the classes (in
>>>> WEB-INF/classes) and libraries (WEB-INF/lib).  A generic unzipping
>>>> artifact won't help much, you'll get the root path which is useless.
>>>> You need the classes path for one, and each file in the lib (pointing
>>>> to the directory itself does nothing interesting).  It won't work with
>>>> EAR either, when you unzip those, you end up with a WAR which you need
>>>> to unzip again.
>>>>
>>>> But this hypothetical task that uses WAR could be smarter.  It
>>>> understands the semantics of the packages it uses, and all these
>>>> packages follow a common convention, so it only needs to unpack the
>>>> portions of the WAR it cares about, it knows how to construct the
>>>> relevant paths, one to the classes and one to every JAR inside the lib
>>>> directory.
>>>>
>>>> I think the same analogy applies to C packages.  If by convention you
>>>> always use include and lib, you can unpack only the portion of the
>>>> package you need, find the relevant paths and use them appropriately.
>>>>
>>>>
>>>
>>> (note: not sure i'm following you here. )
>>>
>>
>> Artifacts by themselves are a generic mechanism for getting packages
>> into the local repository.  Their only responsibility is the artifact
>> and its metadata, so a task representing a repository artifact would
>> only know how to download it.
>>
>> You can have a separate task that knows how to extract an artifact
>> task and use it instead, that way you get the unpacking you need, but
>> not all downloaded artifacts have to be unpacked.
>>
>
> yes, this is what i'm currently doing, as i explained below.
>
> but what i want is for me to be able to do that by integrating with the
> existing 'artifacts' task. right now it will only return Artifact objects.
> I'd like to have a more elegant solution than just to run over them and
> create my own objects, which i think will be more tricky with transitive
> dependencies (where transitivity may come from my artifacts, e.g. the
> project's artifacts)
>>
>>
>>>
>>> my current implementation creates classes that have methods to retrieve
>>> the
>>> include paths, the library paths and the library names. I don't use the
>>> task
>>> name, since it is useless (as you mentioned). so I have an
>>> ExtractedRepoArtifact FileTask class that implements these methods by
>>> relying on the structure of the package ('include' and 'lib'
>>> directories),
>>> it depends on the Artifact class and its action is to extract the
>>> artifact.
>>>
>>> When given a project dependency, i return the build task which implements
>>> the artifact methods mentioned above by returning the
>>> [:source,:main,:include] and [:target, Platform.id, :lib] paths. It also
>>> allows the user to add include paths (e.g., for generated files) which
>>> are
>>> then both used for compilation and returned by the artifact methods.
>>>
>>>>
>>>>
>>>>>
>>>>> furthermore, when building against another project, there is no need to
>>>>> pack
>>>>> and unpack in the repository. one can simply use the artifacts produced
>>>>> in
>>>>> the 'build' phase of the other project.
>>>>>
>>>>>
>>>>
>>>> Yes.  Right now it points to the package, which gets invoked and so
>>>> packs everything, whether you need the packing or not.  You don't,
>>>> however, have to unpack it, if you know the packaging type you can be
>>>> smarter and go directly to the source.
>>>>
>>>>
>>>
>>> but i don't want to pack if there's no use for it. speed is critical in
>>> this
>>> project, since there's no eclipse to constantly compile code for you, so
>>> developers need to run the build after each change. having it pack
>>> unnecessarily wastes time.
>>>
>>
>> One step at a time.  I would worry if we can't do that at all, but if
>> it's just optimization, we can get to the more problematic issues
>> first.
>>
>>
>>>>
>>>>
>>>>>
>>>>> finally, in C++ in many cases you rely on a system library.
>>>>>
>>>>> in all cases the resulting dependency is two-fold: on a include dir
>>>>> paths
>>>>> and on a library paths. note that these do not necessarily reside under
>>>>> a
>>>>> shared folder. for example, a dependency on another project may depend
>>>>> on
>>>>> two include folders: one just a folder in the source tree, the other of
>>>>> generated files in the target directory
>>>>>
>>>>> suggestion
>>>>> -------------------
>>>>> While usage of Buildr.artifacts is only as a utility method, so one can
>>>>> easily write his own implementation and use that, I think it will be
>>>>> nice
>>>>> to
>>>>> be able to get some reuse.
>>>>>
>>>>> * when given a project, use it as is (not 'spec.packages'), or allow it
>>>>> to
>>>>> return its artifacts ('spec.artifacts').
>>>>>
>>>>>
>>>>
>>>> Yes.  Except we're missing that whole dependency layer (that's
>>>> something 1.4 will add).  Ideally the project would have dependency
>>>> lists it can populate (at least compile and runtime), and other
>>>> projects can get these dependency lists and pick what they want.  So
>>>> the compile dependency list would be the place to put headers and
>>>> libraries, without having to package them.  We don't have that right
>>>> now.
>>>>
>>>>
>>>
>>> this is the purpose for the 'spec.artifacts' suggestion (that is, an
>>> 'artifacts' method in Project). maybe need to classify them similarly to
>>> my
>>> suggestion for 'compile', so the Buildr.artifacts method receives a
>>> 'classifier' argument, whose value can be, for example,  'java' and calls
>>> 'spec.artifacts(classifier)'. are we on the same page here?
>>>
>>
>> I'm looking at each of your use cases and trying to identify in my mind:
>> a)  What you can do right now to make it happen.
>> b)  What, if we added another feature, we should accommodate for.
>> c)  What new feature we would need for this.
>>
>> I'm starting with a) because you can get it working right now, it may
>> not be elegant and not work as fast, but we can get that out of the
>> way so we can focus on doing the rest.  There are some things we're
>> planning on changing anyway, so I'm also trying to see if future
>> changes would address the elegant/fast use cases, I can tell you what
>> I have in mind, but no code yet to make it happen.  And then identify
>> anything not addressed by current plans and decide how to support that
>> directly.
>>
>
> i got it working now. but i'm doing several code paths in parallel. i have a
> 'make' method instead of 'compile'. the reasons are both that i need to
> create several tasks, not a 'compiler' object (and i want to create them
> before rake's execution starts), and that i need to create different
> implementations per platform.
>>
>> Right now, project.packages is good enough for what you need.  It's an
>> array of tasks, you can throw any task you want in there and the
>> dependent project would pick up on it.  You don't have to throw ZIP files
>> in there, you can add a header file or a directory of header files, or
>> a task that knows it's a directory of header files.
>>
>> It's inelegant because project.packages is intended to be the list of
>> things that get installed and released, so it's an "off the label" use
>> for that part of the API.  But, it will work, and if you just add
>> things to the end of project.packages, they won't get installed or
>> released.  So project.packages is the same as project.artifacts, just
>> with a different name.
>>
>
> or i can implement my own 'artifacts' method, which is what i did because i
> need different artifact objects than what Buildr.artifacts returns.
>>
>> Separately, we need (and are planning and working on) a smarter dependency
>> management, which you can populate and anything referencing the
>> project can access.  It won't be called artifacts but dependencies, it
>> will do a lot more, and it will be more elegant and documented for
>> specific use cases like this.
>>
>>
>>
>>>>
>>>>
>>>>>
>>>>> * if a symbol, recursively call on the spec from the namespace
>>>>> * if a struct, recursively call
>>>>> * otherwise, classify the artifact and call a factory method to create
>>>>> it.
>>>>> classification can be by packaging (e.g. jar). but actually, i don't
>>>>> have
>>>>> a
>>>>> very good idea here. note that for c++, there need to be a way of
>>>>> defining
>>>>> an artifact to look in the system for include files and libraries
>>>>>  (maybe
>>>>> something like 'openssl:system'? - version and group ids are
>>>>> meaningless).
>>>>>  * the factory method can create different artifacts. for c++ there
>>>>> would
>>>>> be
>>>>> RepositoryArtifact (downloads and unpacks), ProjectArtifact (short
>>>>> circuit
>>>>> to the project's target and source directories) and SystemArtifact.
>>>>>
>>>>> I think that the use of artifact namespaces can help here as it allows
>>>>> to
>>>>> create a more verbose syntax for declaring artifacts, while still
>>>>> allowing
>>>>> the user to create shorter names for them. (as an example in C++ it
>>>>> will
>>>>> allow me to add to the artifact the list of flags to use when
>>>>> compiling/linking with it, assuming they're not inherent to the
>>>>> artifact,
>>>>> e.g. turn debug on). The factory method receives the artifact
>>>>> definition
>>>>> (which can actually be defined by each plugin) and decides what to do
>>>>> with
>>>>> it.
>>>>>
>>>>>
>>>>
>>>> 1.4 will have a better dependency mechanism, and one thing I looked at
>>>> is associating meta-data with each dependency.  So perhaps that would
>>>> address things like compiling/linking flags.
>>>>
>>>>
>>>>>
>>>>> ordering
>>>>> =========
>>>>> overview
>>>>> -------------------
>>>>> to support jni, one needs to first compile java classes, then run javah
>>>>> to
>>>>> generate headers and then compile c code that implements these headers.
>>>>> so
>>>>> the javah task should be able to specify it depends on the java compile
>>>>> task. this can't be by depending on all compile tasks of course or on
>>>>> 'build'.
>>>>>
>>>>
>>>> Alternatively:
>>>>
>>>> compile do |task|
>>>>  javah task.target
>>>> end
>>>>
>>>> This will run javah each time the compiler runs.
>>>>
>>>>
>>>>
>>>>
>>>
>>> but running each time is what i want to avoid. not only do i want to
>>> avoid
>>> the invocation of 'javah', but when invoked it will change the timestamp
>>> of
>>> the generated headers and so many source files will get recompiled.
>>>
>>
>> Rake separates invocation from execution.  Invoking a task tells it to
>> invoke its prerequisites, then use those to decide if it needs
>> executing, and if so execute.  Whether you put javah at the end of
>> compile, or as a prerequisite to build, it will get invoked and it should
>> be smart enough to decide whether there's any work to be done.
>>
>
> i think i'm missing something here. in the code snippet above, didn't you
> add an action to 'compile' and in that action call the javah command? to me
> it looks like javah is run at the end of compile.
>>
>> But there is a significant difference between the two.  If you add it
>> to compile, it gets invoked during compilation -- and compilation
>> implies there's a change to the source code which might lead to change
>> in the header files -- and that happens as often as is necessary.  If
>> you put it as a prerequisite to build, it only happens when the build
>> task runs.  If you run a rake task which doesn't run the build task,
>> you may end up testing the wrong header files.
>>
>
> there should be a rule to the effect of:
> file jni_headers_dir => [classes] do |task|
>   javah classes # with whatever flags to put the generated headers in jni_headers_dir
>   touch jni_headers_dir
> end
>
> so if the classes are newer than the directory (and only then) javah runs.
> if i run it every time it will generate headers, changing the timestamp,
> which will cause all dependent cpp classes to recompile which will take a
> lot of time.

Again, if you do:

compile do
  file(jni_headers_dir).invoke
end

It gives you the same effect, except it happens earlier in the process
(e.g. before test, not just before build).  You invoke the task, the
task looks at the prerequisites, decides if anything needs to be done,
and executes only when necessary.

Assaf

>>
>>
>>>
>>> note that compiling a C/C++ source file is a much slower process than
>>> compiling java.
>>>
>>>>>
>>>>> suggestion
>>>>> -------------------
>>>>> when creating a compile task (whose name can be, as in the case of c++,
>>>>> the
>>>>> result library name - to allow for dependency checking), also create a
>>>>> "for
>>>>> ordering only" task with a symbolic name (e.g., 'java:compile') which
>>>>> depends on the actual task. then other tasks can depend on that task
>>>>>
>>>>>
>>>>
>>>> And yes, you'll still need that if you want to run the C compiler
>>>> after the Java compiler, so I think the right thing to do would have
>>>> separate compile tasks.
>>>>
>>>>>
>>>>> I hope all this makes sense, and I'm looking forward to comments. I
>>>>> intend
>>>>> to share the code once I'm finished.
>>>>>
>>>>>
>>>>
>>>> Unfortunately, the last time I wrote C code was over ten years ago,
>>>> so my rustiness is showing.  I'm sure I missed some points because of
>>>> that.
>>>>
>>>>
>>>
>>> I hope I cleared things. I think it is worth investing in C/C++ as it is a
>>> space where there are still no solutions (that i know of) that handle
>>> module dependency.
>>>
>>
>> Definitely.
>>
>>
>>>
>>> To make sure it is clear, I'm not asking for the buildr team to implement
>>> C/C++ building, I intend to do that, and have already made a demo of it
>>> working, but I do want to ask for the infrastructure in buildr to make it
>>> easier, since currently it looks like a "stepson".
>>>
>>
>> In addition, two things we should look at.
>>
>> First, find out a good intersection between C/C++ and other languages.
>>  There may be some changes that are only necessary for C/C++, but
>> hopefully most of these can be shared across languages, that way we
>> get better features all around.
>>
>> Second, make sure we exhausted all our options before making a change.
>>  If there's another way of doing something, even a stop-gap measure
>> while we cook up a better feature all around, then we have fewer
>> changes to worry about.
>>
>> It's an exercise we did before with Groovy and Scala (earlier versions
>> were married to Java) and it worked out pretty well.  We started by
>> not making any changes in Buildr to accommodate it, instead using a
>> separate task specifically for compiling Scala code that relied on
>> some hacks and inelegant code to actually work.  Then took the time to
>> build multi-lingual support out of that.
>>
>
> i'm already past that. i have ~20 modules compiling, with transitive
> dependencies on other modules and on third party modules.
>
> so i'm now at a stage where i want better integration with buildr.
>>
>> Assaf
>>
>>
>>>
>>> Ittay
>>>
>>>>
>>>> Assaf
>>>>
>>>>
>>>>
>>>>
>>>>>
>>>>> Thank you,
>>>>> Ittay
>>>>>
>>>>>
>>>>> Notes:
>>>>> [1] I don't consider linking a library as packaging. First, the obj
>>>>> files
>>>>> are not used by themselves as in other languages. Second, packaging is
>>>>> required to manage dependencies, because in order for project P to be
>>>>> built
>>>>> against dependency D, D needs to contain both headers and libraries -
>>>>> this
>>>>> is the package.
>>>>>
>>>>> --
>>>>> --
>>>>> Ittay Dror <it...@gmail.com>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>
>>> --
>>> --
>>> Ittay Dror <it...@gmail.com>
>>>
>>>
>>>
>>>
>
> --
> --
> Ittay Dror <it...@gmail.com>
>
>

Re: request for enhancement: compile, package and artifacts support for C++

Posted by Ittay Dror <it...@gmail.com>.

Assaf Arkin wrote:
> On Mon, Jul 28, 2008 at 11:23 PM, Ittay Dror <it...@gmail.com> wrote:
>   
>> I merged the other email (ordering) and comments. My comments inline
>>
>> Assaf Arkin wrote:
>>     
>>> On Mon, Jul 28, 2008 at 2:42 AM, Ittay Dror <it...@gmail.com> wrote:
>>>
>>>       
>>>> Hi,
>>>>
>>>> I'm working on adding C++ support to buildr. I already have a prototype
>>>> that
>>>> builds libraries and executables in Linux. I'd like to share some of the
>>>> difficulties I had and request changes to buildr to accommodate C++ more
>>>> easily. (Right now, I've created parallel route to that of building
>>>> Java-like code)
>>>>
>>>> compile
>>>> ========
>>>> overview
>>>> --------------------
>>>> the compile method in project returns a CompileTask that is generic and
>>>> uses
>>>> a Compiler instance to do the actual compilation. In C++, compilation is
>>>> also dependency based (.o => .cpp, sometimes precompiling headers). Also,
>>>> the same code can produce several results (static and shared libraries,
>>>> oj
>>>> files with debug, profiling, preprocessor defines turned on and off). [1]
>>>>
>>>> there is the 'build' task, which is used as a stub to attach dependencies
>>>> to.
>>>>
>>>> suggestion
>>>> ---------------------
>>>> * there should be an array of compile tasks (as in packages)
>>>> * #compile should delegate the call to a factory method which returns a
>>>> task
>>>> (again, as in packages)
>>>>
>>>>         
>>> Yes.  And I know a few people just waiting for the change to compile
>>> multiple things in the same project, so here's another reason for
>>> adding this feature.
>>>
>>> But I have to warn you, it's not as simple as it looks, I took a stab
>>> at it before and decided to downscale support to one compiler per
>>> project.  It's worth doing because a lot of languages would benefit
>>> from it, but that's also what makes it tricky.  I think it would be
>>> easier to get C support working without it first, and separately work
>>> on this feature and then improve C support using it.
>>>
>>>       
>> How about this: classify compile commands with symbolic names. like
>> compile('java') or compile('c++:shared') ? on bootstrap, the different
>> extensions can create compile tasks based on directory structure (so the
>> Java extension can see that the directory [:source, :main, :java] exists and
>> create compile('java') with some default values.
>>
>> All compile tasks are prerequisites of 'build'
>>
>> Then 'package :jar' can create a package that depends on compile('java'),
>> compile('groovy') or whatever makes sense to put in a jar, as long as the
>> compile task exist of course (not to create them if they don't) (BTW, I have
>> some issues with the lack of command-query separation, normally when using a
>> query method, I wouldn't want a task to be created if it doesn't exist)
>>     
>
> Rake::Task.task_defined? will tell you if a task is defined without
> creating it.  Rake::Task[] (same as calling task) would find you the
> task, creating it if necessary by looking at the rules, existing files
> or creating a generic task.
>
> I want to avoid discussing the issues with
> compile('java')/compile('groovy') here.  It's a big issue that belongs
> in its own thread and affects more than just C/C++.  I'm just pointing
> out that it looks as easy as adding a language flag to compile, but
> when you get down to look at all the details involved, it's a pretty
> damn big change.
>
> And separately, see comments below, it will not replace the generic
> compile task but add more tasks for compile to orchestrate.
>
>   
>>>       
>>>> * generic pre-requisites (like 'resources') should either be tacked on
>>>> 'build' (relying on order of prerequisites), or the compile task can be
>>>> defined to be a composite (that is, from the outside it is a single task,
>>>> but it can use other tasks to accomplish its job).
>>>>
>>>>         
>>> compile already is: resources is a prerequisite for compile, some
>>> other tasks (e.g. byte code enhancing) are tacked on to compile by
>>> enhancing it.
>>>
>>>
>>>       
>> yes, but the compilation of the java family of languages is one task
>> (calling javac), while compiling c++ is several tasks: task per obj file and
>> task per link. so there's a chain of tasks already. having a generic method
>> receive a task from the factory method and make it depend on 'resources'
>> won't do, since the lower level tasks should be the ones that depend.
>>     
>
> I don't see why the existing compile task can't orchestrate all the
> smaller compile tasks.  It already orchestrates several tasks,
> compiling a project will compile all its sub-projects, dependencies,
> resources, etc.  Think of it as the compile stage of the build, more
> than just running the compiler.  In fact all top-level projects have a
> compile task, but many don't have anything to compile, and just use it to
> orchestrate compilation of all their child projects.
>
> If you let compile orchestrate smaller tasks, you can get the Rake
> dependency mechanism working for you to handle individual object
> files, compiling only that which is necessary, but also get the Buildr
> dependency mechanism orchestrating the different steps of the build
> and dependencies between projects.
>   
can you give an example of how a task can orchestrate other tasks? also, 
as far as i could tell, the 'compile' method always creates a 
CompileTask. i can't use it as is because it expects some compiler which 
i can't give it because i want to use tasks and also, i can't add 
dependencies to it because it depends directly on tasks like 'resources' 
which the prerequisites should depend on.

At the risk of spending a lot of time on the obvious (i have a feeling 
we're talking about different things):

say a project has 2 cpp files A.cpp and B.cpp, with matching headers, 
and no other headers, which compile to shared and static libraries. my 
dependency tree is:

compile:cpp --+-- libsomething.so --+-- A.o --+-- A.cpp
              |                     |         \-- A.h
              \-- libsomething.a ---+-- B.o --+-- B.cpp
                                              \-- B.h
(both libraries depend on both A.o and B.o)


these should be rake tasks for several reasons: timestamp checking and the 
fact that two artifacts rely on the same set of objects. also linking 
and compiling are two different commands and finally, if i call the 
compiler twice, it will do the work twice (that is, it doesn't have any 
internal mechanism that tells it there's no need to recreate the obj 
files or libraries).

note that all of this tree needs to rely on the 'resources' task, since 
some headers may be generated. so 'resources' needs to run before all the 
timestamp checking and compilation is done.
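
As a rough sketch, that tree maps onto plain Rake file tasks along these
lines (compiler flags kept minimal; a real shared library would also need
-fPIC and so on):

objects = ['A.o', 'B.o']
['A', 'B'].each do |name|
  file "#{name}.o" => ["#{name}.cpp", "#{name}.h"] do |t|
    sh "g++ -c -o #{t.name} #{name}.cpp"
  end
end
file 'libsomething.so' => objects do |t|
  sh "g++ -shared -o #{t.name} #{objects.join(' ')}"
end
file 'libsomething.a' => objects do |t|
  sh "ar rcs #{t.name} #{objects.join(' ')}"
end
task 'compile:cpp' => ['libsomething.so', 'libsomething.a']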
>
>   
>> of course the factory method can create just one task that does all the rest
>> in its action (compile obj files and link), but i do want to use tasks for
>> the following reasons:
>> 1. it makes the logic more like make, which will assist acceptance
>> 2. it can use mechanisms in unix compilers to help make. specifically, most
>> (if not all) unix compilers have an option to spit out dependencies of the
>> source files on headers.
>> 3. it reuses timestamp checking code in rake (and if ever rake implements
>> checksum based recompilation)
>> 4. if rake will implement a job execution engine (like -j in make), then
>> structuring compilation by tasks will allow it to parallelize the execution.
>>
>> but, i think the solution is easy: similar to the 'build' "pseudo task", i
>> can create a 'compile:prepare' pseudo task that depends on 'resources' etc.
>> then, the factory method needs only to depend on 'compile:prepare' (the
>> logic is that another extension can then add other things to do before
>> compile without needing to change the compile extensions)
>>     
>
> We had compile:prepare in the past which invokes resources and ...
> well, that's about it.  It turns out that just having compile and
> doing everything else as prerequisite is good enough.
>
>   
>>>       
>>>> package & artifacts
>>>> =========
>>>> overview
>>>> ---------------
>>>> buildr has a cool concept that all dependencies (in 'compile.with') are
>>>> converted to tasks that are then simple rake dependencies. However, the
>>>> conversion is not generic enough. to compile C++ code against a
>>>> dependency
>>>> one needs 2 paths: a folder containing headers and another containing
>>>> libraries. To put this in a repository, these need to be packaged into
>>>> one
>>>> file. To use after pulling from the repository, one needs to unpack. So a
>>>> task representing a repository artifact is in fact an unzip task, that
>>>> depends on the 'Artifact' task to pull the package from a remote
>>>> repository.
>>>>
>>>>         
>>> Let's take Java for example, let's say we have a task that depends on
>>> the contents of another WAR.  Specifically the classes (in
>>> WEB-INF/classes) and libraries (WEB-INF/lib).  A generic unzipping
>>> artifact won't help much, you'll get the root path which is useless.
>>> You need the classes path for one, and each file in the lib (pointing
>>> to the directory itself does nothing interesting).  It won't work with
>>> EAR either, when you unzip those, you end up with a WAR which you need
>>> to unzip again.
>>>
>>> But this hypothetical task that uses WAR could be smarter.  It
>>> understands the semantics of the packages it uses, and all these
>>> packages follow a common convention, so it only needs to unpack the
>>> portions of the WAR it cares about, it knows how to construct the
>>> relevant paths, one to the classes and one to every JAR inside the lib
>>> directory.
>>>
>>> I think the same analogy applies to C packages.  If by convention you
>>> always use include and lib, you can unpack only the portion of the
>>> package you need, find the relevant paths and use them appropriately.
>>>
>>>       
>> (note: not sure i'm following you here. )
>>     
>
> Artifacts by themselves are a generic mechanism for getting packages
> into the local repository.  Their only responsibility is the artifact
> and its metadata, so a task representing a repository artifact would
> only know how to download it.
>
> You can have a separate task that knows how to extract an artifact
> task and use it instead, that way you get the unpacking you need, but
> not all downloaded artifacts have to be unpacked.
>   
yes, this is what i'm currently doing, as i explained below.

but what i want is for me to be able to do that by integrating with the 
existing 'artifacts' task. right now it will only return Artifact 
objects. I'd like to have a more elegant solution than just to run over 
them and create my own objects, which i think will be more tricky with 
transitive dependencies (where transitivity may come from my artifacts, 
e.g. the project's artifacts)
>
>   
>> my current implementation creates classes that have methods to retrieve the
>> include paths, the library paths and the library names. I don't use the task
>> name, since it is useless (as you mentioned). so I have an
>> ExtractedRepoArtifact FileTask class that implements these methods by
>> relying on the structure of the package ('include' and 'lib' directories),
>> it depends on the Artifact class and its action is to extract the artifact.
>>
>> When given a project dependency, i return the build task which implements
>> the artifact methods mentioned above by returning the
>> [:source,:main,:include] and [:target, Platform.id, :lib] paths. It also
>> allows the user to add include paths (e.g., for generated files) which are
>> then both used for compilation and returned by the artifact methods.
>>     
>>>       
>>>> furthermore, when building against another project, there is no need to
>>>> pack
>>>> and unpack in the repository. one can simply use the artifacts produced
>>>> in
>>>> the 'build' phase of the other project.
>>>>
>>>>         
>>> Yes.  Right now it points to the package, which gets invoked and so
>>> packs everything, whether you need the packing or not.  You don't,
>>> however, have to unpack it, if you know the packaging type you can be
>>> smarter and go directly to the source.
>>>
>>>       
>> but i don't want to pack if there's no use for it. speed is critical in this
>> project, since there's no eclipse to constantly compile code for you, so
>> developers need to run the build after each change. having it pack
>> unnecessarily wastes time.
>>     
>
> One step at a time.  I would worry if we can't do that at all, but if
> it's just optimization, we can get to the more problematic issues
> first.
>
>   
>   
>>>       
>>>> finally, in C++ in many cases you rely on a system library.
>>>>
>>>> in all cases the resulting dependency is two-fold: on a include dir paths
>>>> and on a library paths. note that these do not necessarily reside under a
>>>> shared folder. for example, a dependency on another project may depend on
>>>> two include folders: one just a folder in the source tree, the other of
>>>> generated files in the target directory
>>>>
>>>> suggestion
>>>> -------------------
>>>> While usage of Buildr.artifacts is only as a utility method, so one can
>>>> easily write his own implementation and use that, I think it will be nice
>>>> to
>>>> be able to get some reuse.
>>>>
>>>> * when given a project, use it as is (not 'spec.packages'), or allow it
>>>> to
>>>> return its artifacts ('spec.artifacts').
>>>>
>>>>         
>>> Yes.  Except we're missing that whole dependency layer (that's
>>> something 1.4 will add).  Ideally the project would have dependency
>>> lists it can populate (at least compile and runtime), and other
>>> projects can get these dependency lists and pick what they want.  So
>>> the compile dependency list would be the place to put headers and
>>> libraries, without having to package them.  We don't have that right
>>> now.
>>>
>>>       
>> this is the purpose for the 'spec.artifacts' suggestion (that is, an
>> 'artifacts' method in Project). maybe need to classify them similarly to my
>> suggestion for 'compile', so the Buildr.artifacts method receives a
>> 'classifier' argument, whose value can be, for example,  'java' and calls
>> 'spec.artifacts(classifier)'. are we on the same page here?
>>     
>
> I'm looking at each of your use cases and trying to identify in my mind:
> a)  What you can do right now to make it happen.
> b)  What, if we added another feature, we should accommodate for.
> c)  What new feature we would need for this.
>
> I'm starting with a) because you can get it working right now, it may
> not be elegant and not work as fast, but we can get that out of the
> way so we can focus on doing the rest.  There are some things we're
> planning on changing anyway, so I'm also trying to see if future
> changes would address the elegant/fast use cases, I can tell you what
> I have in mind, but no code yet to make it happen.  And then identify
> anything not addressed by current plans and decide how to support that
> directly.
>   
i got it working now. but i'm doing several code paths in parallel. i 
have a 'make' method instead of 'compile'. the reasons are both that i 
need to create several tasks, not a 'compiler' object (and i want to 
create them before rake's execution starts), and that i need to 
create different implementations per platform.
>
> Right now, project.packages is good enough for what you need.  It's an
> array of tasks, you can throw any task you want in there and the
> dependent project would pick up on it.  You don't have to throw ZIP files
> in there, you can add a header file or a directory of header files, or
> a task that knows it's a directory of header files.
>
> It's inelegant because project.packages is intended to be the list of
> things that get installed and released, so it's an "off the label" use
> for that part of the API.  But, it will work, and if you just add
> things to the end of project.packages, they won't get installed or
> released.  So project.packages is the same as project.artifacts, just
> with a different name.
>   
or i can implement my own 'artifacts' method, which is what i did 
because i need different artifact objects than what Buildr.artifacts 
returns.
> Separately, we need (and are planning and working on) a smarter dependency
> management, which you can populate and anything referencing the
> project can access.  It won't be called artifacts but dependencies, it
> will do a lot more, and it will be more elegant and documented for
> specific use cases like this.
>
>
>   
>>>       
>>>> * if a symbol, recursively call on the spec from the namespace
>>>> * if a struct, recursively call
>>>> * otherwise, classify the artifact and call a factory method to create
>>>> it.
>>>> classification can be by packaging (e.g. jar). but actually, i don't have
>>>> a
>>>> very good idea here. note that for c++, there need to be a way of
>>>> defining
>>>> an artifact to look in the system for include files and libraries  (maybe
>>>> something like 'openssl:system'? - version and group ids are
>>>> meaningless).
>>>>  * the factory method can create different artifacts. for c++ there would
>>>> be
>>>> RepositoryArtifact (downloads and unpacks), ProjectArtifact (short
>>>> circuit
>>>> to the project's target and source directories) and SystemArtifact.
>>>>
>>>> I think that the use of artifact namespaces can help here as it allows to
>>>> create a more verbose syntax for declaring artifacts, while still
>>>> allowing
>>>> the user to create shorter names for them. (as an example in C++ it will
>>>> allow me to add to the artifact the list of flags to use when
>>>> compiling/linking with it, assuming they're not inherent to the artifact,
>>>> e.g. turn debug on). The factory method receives the artifact definition
>>>> (which can actually be defined by each plugin) and decides what to do
>>>> with
>>>> it.
>>>>
>>>>         
>>> 1.4 will have a better dependency mechanism, and one thing I looked at
>>> is associating meta-data with each dependency.  So perhaps that would
>>> address things like compiling/linking flags.
>>>
>>>       
>>>> ordering
>>>> =========
>>>> overview
>>>> -------------------
>>>> to support jni, one needs to first compile java classes, then run javah
>>>> to
>>>> generate headers and then compile c code that implements these headers.
>>>> so
>>>> the javah task should be able to specify it depends on the java compile
>>>> task. this can't be by depending on all compile tasks of course or on
>>>> 'build'.
>>>>         
>>> Alternatively:
>>>
>>> compile do |task|
>>>  javah task.target
>>> end
>>>
>>> This will run javah each time the compiler runs.
>>>
>>>
>>>
>>>       
>> but running each time is what i want to avoid. not only do i want to avoid
>> the invocation of 'javah', but when invoked it will change the timestamp of
>> the generated headers and so many source files will get recompiled.
>>     
>
> Rake separates invocation from execution.  Invoking a task tells it to
> invoke its prerequisites, then use those to decide if it needs
> executing, and if so execute.  Whether you put javah at the end of
> compile, or as a prerequisite to build, it will get invoked and it should
> be smart enough to decide whether there's any work to be done.
>   
i think i'm missing something here. in the code snippet above, didn't 
you add an action to 'compile' and in that action call the javah 
command? to me it looks like javah is run at the end of compile.
> But there is a significant difference between the two.  If you add it
> to compile, it gets invoked during compilation -- and compilation
> implies there's a change to the source code which might lead to change
> in the header files -- and that happens as often as is necessary.  If
> you put it as a prerequisite to build, it only happens when the build
> task runs.  If you run a rake task which doesn't run the build task,
> you may end up testing the wrong header files.
>   
there should be a rule to the effect of:
file jni_headers_dir => [classes] do |task|
  javah classes # with whatever flags to put the generated headers in jni_headers_dir
  touch jni_headers_dir
end

so if the classes are newer than the directory (and only then) javah 
runs. if i run it every time it will generate headers, changing the 
timestamp, which will cause all dependent cpp classes to recompile which 
will take a lot of time.
>
>   
>> note that compiling a C/C++ source file is a much slower process than
>> compiling java.
>>     
>>>> suggestion
>>>> -------------------
>>>> when creating a compile task (whose name can be, as in the case of c++,
>>>> the
>>>> result library name - to allow for dependency checking), also create a
>>>> "for
>>>> ordering only" task with a symbolic name (e.g., 'java:compile') which
>>>> depends on the actual task. then other tasks can depend on that task
>>>>
>>>>         
>>> And yes, you'll still need that if you want to run the C compiler
>>> after the Java compiler, so I think the right thing to do would have
>>> separate compile tasks.
>>>       
>>>> I hope all this makes sense, and I'm looking forward to comments. I
>>>> intend
>>>> to share the code once I'm finished.
>>>>
>>>>         
>>> Unfortunately, the last time I wrote C code was over ten years ago,
>>> so my rustiness is showing.  I'm sure I missed some points because of
>>> that.
>>>
>>>       
>> I hope I cleared things. I think it is worth investing in C/C++ as it is a
>> space where there are still no solutions (that i know of) that handle module
>> dependency.
>>     
>
> Definitely.
>
>   
>> To make sure it is clear, I'm not asking for the buildr team to implement
>> C/C++ building, I intend to do that, and have already made a demo of it
>> working, but I do want to ask for the infrastructure in buildr to make it
>> easier, since currently it looks like a "stepson".
>>     
>
> In addition, two things we should look at.
>
> First, find out a good intersection between C/C++ and other languages.
>  There may be some changes that are only necessary for C/C++, but
> hopefully most of these can be shared across languages, that way we
> get better features all around.
>
> Second, make sure we exhausted all our options before making a change.
>  If there's another way of doing something, even a stop-gap measure
> while we cook up a better feature all around, then we have fewer
> changes to worry about.
>
> It's an exercise we did before with Groovy and Scala (earlier versions
> were married to Java) and it worked out pretty well.  We started by
> not making any changes in Buildr to accommodate it, instead using a
> separate task specifically for compiling Scala code that relied on
> some hacks and inelegant code to actually work.  Then took the time to
> build multi-lingual support out of that.
>   
i'm already past that. i have ~20 modules compiling, with transitive 
dependencies on other modules and on third party modules.

so i'm now at a stage where i want better integration with buildr.
> Assaf
>
>   
>> Ittay
>>     
>>> Assaf
>>>
>>>
>>>
>>>       
>>>> Thank you,
>>>> Ittay
>>>>
>>>>
>>>> Notes:
>>>> [1] I don't consider linking a library as packaging. First, the obj files
>>>> are not used by themselves as in other languages. Second, packaging is
>>>> required to manage dependencies, because in order for project P to be
>>>> built
>>>> against dependency D, D needs to contain both headers and libraries -
>>>> this
>>>> is the package.
>>>>
>>>> --
>>>> --
>>>> Ittay Dror <it...@gmail.com>
>>>>
>>>>
>>>>
>>>>
>>>>         
>> --
>> --
>> Ittay Dror <it...@gmail.com>
>>
>>
>>
>>     

-- 
--
Ittay Dror <it...@gmail.com>


Re: request for enhancement: compile, package and artifacts support for C++

Posted by Assaf Arkin <ar...@intalio.com>.
On Mon, Jul 28, 2008 at 11:23 PM, Ittay Dror <it...@gmail.com> wrote:
> I merged the other email (ordering) and comments. My comments inline
>
> Assaf Arkin wrote:
>>
>> On Mon, Jul 28, 2008 at 2:42 AM, Ittay Dror <it...@gmail.com> wrote:
>>
>>>
>>> Hi,
>>>
>>> I'm working on adding C++ support to buildr. I already have a prototype
>>> that
>>> builds libraries and executables in Linux. I'd like to share some of the
>>> difficulties I had and request changes to buildr to accommodate C++ more
>>> easily. (Right now, I've created parallel route to that of building
>>> Java-like code)
>>>
>>> compile
>>> ========
>>> overview
>>> --------------------
>>> the compile method in project returns a CompileTask that is generic and
>>> uses
>>> a Compiler instance to do the actual compilation. In C++, compilation is
>>> also dependency based (.o => .cpp, sometimes precompiling headers). Also,
>>> the same code can produce several results (static and shared libraries,
>>> oj
>>> files with debug, profiling, preprocessor defines turned on and off). [1]
>>>
>>> there is the 'build' task, which is used as a stub to attach dependencies
>>> to.
>>>
>>> suggestion
>>> ---------------------
>>> * there should be an array of compile tasks (as in packages)
>>> * #compile should delegate the call to a factory method which returns a
>>> task
>>> (again, as in packages)
>>>
>>
>> Yes.  And I know a few people just waiting for the change to compile
>> multiple things in the same project, so here's another reason for
>> adding this feature.
>>
>> But I have to warn you, it's not as simple as it looks, I took a stab
>> at it before and decided to downscale support to one compiler per
>> project.  It's worth doing because a lot of languages would benefit
>> from it, but that's also what makes it tricky.  I think it would be
>> easier to get C support working without it first, and separately work
>> on this feature and then improve C support using it.
>>
>
> How about this: classify compile commands with symbolic names. like
> compile('java') or compile('c++:shared') ? on bootstrap, the different
> extensions can create compile tasks based on directory structure (so the
> Java extension can see that the directory [:source, :main, :java] exists and
> create compile('java') with some default values.
>
> All compile tasks are prerequisites of 'build'
>
> Then 'package :jar' can create a package that depends on compile('java'),
> compile('groovy') or whatever makes sense to put in a jar, as long as the
> compile task exist of course (not to create them if they don't) (BTW, I have
> some issues with the lack of command-query separation, normally when using a
> query method, I wouldn't want a task to be created if it doesn't exist)

Rake::Task.task_defined? will tell you if a task is defined without
creating it.  Rake::Task[] (same as calling task) would find you the
task, creating it if necessary by looking at the rules, existing files
or creating a generic task.
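
For example:

Rake::Task.task_defined?('compile')  # => true/false; never creates the task
Rake::Task['compile']                # looks it up, creating it from rules or
                                     # existing files if it doesn't exist yet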

I want to avoid discussing the issues with
compile('java')/compile('groovy') here.  It's a big issue that belongs
in its own thread and affects more than just C/C++.  I'm just pointing
out that it looks as easy as adding a language flag to compile, but
when you get down to look at all the details involved, it's a pretty
damn big change.

And separately, see comments below, it will not replace the generic
compile task but add more tasks for compile to orchestrate.

>>
>>
>>>
>>> * generic pre-requisites (like 'resources') should either be tacked on
>>> 'build' (relying on order of prerequisites), or the compile task can be
>>> defined to be a composite (that is, from the outside it is a single task,
>>> but it can use other tasks to accomplish its job).
>>>
>>
>> compile already is: resources is a prerequisite for compile, some
>> other tasks (e.g. byte code enhancing) are tacked on to compile by
>> enhancing it.
>>
>>
>
> yes, but the compilation of the java family of languages is one task
> (calling javac), while compiling c++ is several tasks: task per obj file and
> task per link. so there's a chain of tasks already. having a generic method
> receive a task from the factory method and make it depend on 'resources'
> won't do, since the lower level tasks should be the ones that depend.

I don't see why the existing compile task can't orchestrate all the
smaller compile tasks.  It already orchestrates several tasks:
compiling a project will compile all its sub-projects, dependencies,
resources, etc.  Think of it as the compile stage of the build, more
than just running the compiler.  In fact all top-level projects have a
compile task, but many don't have anything to compile; they just use it
to orchestrate compilation of all their child projects.

If you let compile orchestrate smaller tasks, you can get the Rake
dependency mechanism working for you to handle individual object
files, compiling only that which is necessary, but also get the Buildr
dependency mechanism orchestrating the different steps of the build
and dependencies between projects.
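As a rough sketch of that split (tool, paths and flags are all made up),
the per-file work lives in Rake file tasks and compile merely
orchestrates them, inside a project definition:

directory 'target/obj'

objects = FileList['src/main/cpp/*.cpp'].map do |src|
  obj = src.pathmap('target/obj/%n.o')
  file(obj => [src, 'target/obj']) do      # Rake recompiles only what changed
    sh 'g++', '-c', src, '-o', obj
  end
  obj
end

file('target/libmylib.so' => objects) do |t|
  sh 'g++', '-shared', '-o', t.name, *objects
end

compile { file('target/libmylib.so').invoke }  # compile only orchestrates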


> of course the factory method can create just one task that does all the rest
> in its action (compile obj files and link), but i do want to use tasks for
> the following reasons:
> 1. it makes the logic more like make, which will assist acceptance
> 2. it can use mechanisms unix compilers provide to help make: specifically,
> most (if not all) unix compilers have an option to spit out the dependencies
> of source files on headers (see the sketch below)
> 3. it reuses the timestamp checking code in rake (and will benefit if rake
> ever implements checksum-based recompilation)
> 4. if rake ever implements a job execution engine (like -j in make), then
> structuring compilation by tasks will allow it to parallelize the execution.
>
> but, i think the solution is easy: similar to the 'build' "pseudo task", i
> can create a 'compile:prepare' pseudo task that depends on 'resources' etc.
> then, the factory method needs only to depend on 'compile:prepare' (the
> logic is that another extension can then add other things to do before
> compile without needing to change the compile extensions)

We had compile:prepare in the past, which invoked resources and ...
well, that's about it.  It turns out that just having compile and
making everything else a prerequisite is good enough.
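Point 2 in the list above is easy to sketch: gcc's -MM flag prints each
source file's header dependencies, and the result can be handed straight
to a Rake file task (paths hypothetical):

out = `g++ -MM src/main/cpp/foo.cpp`        # e.g. "foo.o: foo.cpp foo.h bar.h"
prereqs = out.sub(/^.*?:/, '').split(/[\s\\]+/).reject(&:empty?)
file('target/obj/foo.o' => prereqs) do |t|  # header edits trigger recompilation
  sh 'g++', '-c', 'src/main/cpp/foo.cpp', '-o', t.name
end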

>>
>>
>>>
>>> package & artifacts
>>> =========
>>> overview
>>> ---------------
>>> buildr has a cool concept that all dependencies (in 'compile.with') are
>>> converted to tasks that are then simple rake dependencies. However, the
>>> conversion is not generic enough. to compile C++ code against a dependency
>>> one needs 2 paths: a folder containing headers and another containing
>>> libraries. To put this in a repository, these need to be packaged into one
>>> file. To use after pulling from the repository, one needs to unpack. So a
>>> task representing a repository artifact is in fact an unzip task, that
>>> depends on the 'Artifact' task to pull the package from a remote
>>> repository.
>>>
>>
>> Let's take Java for example, let's say we have a task that depends on
>> the contents of another WAR.  Specifically the classes (in
>> WEB-INF/classes) and libraries (WEB-INF/lib).  A generic unzipping
>> artifact won't help much, you'll get the root path which is useless.
>> You need the classes path for one, and each file in the lib (pointing
>> to the directory itself does nothing interesting).  It won't work with
>> EAR either: when you unzip those, you end up with a WAR which you need
>> to unzip again.
>>
>> But this hypothetical task that uses WAR could be smarter.  It
>> understands the semantics of the packages it uses, and all these
>> packages follow a common convention, so it only needs to unpack the
>> portions of the WAR it cares about, it knows how to construct the
>> relevant paths, one to class and one to every JAR inside the lib
>> directory.
>>
>> I think the same analogy applies to C packages.  If by convention you
>> always use include and lib, you can unpack only the portion of the
>> package you need, find the relevant paths and use them appropriately.
>>
>
> (note: not sure i'm following you here.)

Artifacts by themselves are a generic mechanism for getting packages
into the local repository.  Their only responsibility is the artifact
and its metadata, so a task representing a repository artifact would
only know how to download it.

You can have a separate task that knows how to extract an artifact,
and use it instead; that way you get the unpacking you need, but
not all downloaded artifacts have to be unpacked.
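A minimal sketch of that separation, assuming Buildr's artifact and
unzip helpers (the spec and paths are invented):

zip = artifact('com.example:openssl:zip:0.9.8')     # only downloads to the repo
extracted = unzip(_('target/deps/openssl') => zip)  # unpacking is a separate task
compile.enhance [extracted.target]                  # unpack only where needed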


>
> my current implementation creates classes that have methods to retrieve the
> include paths, the library paths and the library names. I don't use the task
> name, since it is useless (as you mentioned). so I have an
> ExtractedRepoArtifact FileTask class that implements these methods by
> relying on the structure of the package ('include' and 'lib' directories),
> it depends on the Artifact class and its action is to extract the artifact.
>
> When given a project dependency, i return the build task which implements
> the artifact methods mentioned above by returning the
> [:source,:main,:include] and [:target, Platform.id, :lib] paths. It also
> allows the user to add include paths (e.g., for generated files) which are
> then both used for compilation and returned by the artifact methods.
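The classes described could look something like this sketch (the class
name comes from the description above, everything else is assumed):

# a file task whose name is the extraction directory; dependents never
# use the name directly, they call these accessors instead
class ExtractedRepoArtifact < Rake::FileTask
  def include_paths; [File.join(name, 'include')]; end
  def library_paths; [File.join(name, 'lib')];     end
end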
>>
>>
>>>
>>> furthermore, when building against another project, there is no need to
>>> pack and unpack in the repository. one can simply use the artifacts
>>> produced in the 'build' phase of the other project.
>>>
>>
>> Yes.  Right now it points to the package, which gets invoked and so
>> packs everything, whether you need the packing or not.  You don't,
>> however, have to unpack it; if you know the packaging type you can be
>> smarter and go directly to the source.
>>
>
> but i don't want to pack if there's no use for it. speed is critical in this
> project, since there's no Eclipse to constantly compile code for you, so
> developers need to run the build after each change. having it pack
> unnecessarily wastes time.

One step at a time.  I would worry if we couldn't do that at all, but
since it's just an optimization, we can get to the more problematic
issues first.


>>
>>
>>>
>>> finally, in C++ in many cases you rely on a system library.
>>>
>>> in all cases the resulting dependency is two-fold: on include dir paths
>>> and on library paths. note that these do not necessarily reside under a
>>> shared folder. for example, a dependency on another project may depend on
>>> two include folders: one just a folder in the source tree, the other of
>>> generated files in the target directory
>>>
>>> suggestion
>>> -------------------
>>> While Buildr.artifacts is used only as a utility method, so one can
>>> easily write one's own implementation and use that, I think it would be
>>> nice to be able to get some reuse.
>>>
>>> * when given a project, use it as is (not 'spec.packages'), or allow it
>>> to return its artifacts ('spec.artifacts').
>>>
>>
>> Yes.  Except we're missing that whole dependency layer (that's
>> something 1.4 will add).  Ideally the project would have dependency
>> lists it can populate (at least compile and runtime), and other
>> projects could get these dependency lists and pick what they want.  So
>> the compile dependency list would be the place to put headers and
>> libraries, without having to package them.  We don't have that right
>> now.
>>
>
> this is the purpose of the 'spec.artifacts' suggestion (that is, an
> 'artifacts' method in Project). maybe we need to classify them similarly to
> my suggestion for 'compile', so the Buildr.artifacts method receives a
> 'classifier' argument, whose value can be, for example, 'java', and calls
> 'spec.artifacts(classifier)'. are we on the same page here?

I'm looking at each of your use cases and trying to identify in my mind:
a)  What you can do right now to make it happen.
b)  What we should accommodate for, if we add another feature.
c)  What new feature we would need for this.

I'm starting with a) because you can get it working right now; it may
not be elegant or work as fast, but we can get that out of the
way so we can focus on doing the rest.  There are some things we're
planning on changing anyway, so I'm also trying to see whether future
changes would address the elegant/fast use cases; I can tell you what
I have in mind, but there's no code yet to make it happen.  And then I'll
identify anything not addressed by current plans and decide how to
support that directly.


Right now, project.packages is good enough for what you need.  It's an
array of tasks; you can throw any task you want in there and the
dependent project will pick up on it.  You don't have to throw ZIP files
in there: you can add a header file, a directory of header files, or
a task that knows it's a directory of header files.

It's inelegant because project.packages is intended to be the list of
things that get installed and released, so it's an "off the label" use
for that part of the API.  But it will work, and if you just add
things to the end of project.packages, they won't get installed or
released.  So project.packages is the same as project.artifacts, just
with a different name.
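Concretely, the "off the label" use could look like this (names
hypothetical):

define 'mylib' do
  # visible to dependent projects, but never installed or released,
  # since these were not added via package()
  packages << file(_('src/main/include'))
  packages << file(_('target/libmylib.so'))
end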

Separately, we need (and are planning and working on) smarter dependency
management, which you can populate and anything referencing the
project can access.  It won't be called artifacts but dependencies; it
will do a lot more, and it will be more elegant and documented for
specific use cases like this.


>>
>>
>>>
>>> * if a symbol, recursively call on the spec from the namespace
>>> * if a struct, recursively call
>>> * otherwise, classify the artifact and call a factory method to create it.
>>> classification can be by packaging (e.g. jar). but actually, i don't have a
>>> very good idea here. note that for c++, there needs to be a way of defining
>>> an artifact to look in the system for include files and libraries (maybe
>>> something like 'openssl:system'? - version and group ids are meaningless).
>>> * the factory method can create different artifacts. for c++ there would be
>>> RepositoryArtifact (downloads and unpacks), ProjectArtifact (short circuit
>>> to the project's target and source directories) and SystemArtifact.
>>>
>>> I think that the use of artifact namespaces can help here, as it allows
>>> one to create a more verbose syntax for declaring artifacts, while still
>>> allowing the user to create shorter names for them. (as an example in C++
>>> it will allow me to add to the artifact the list of flags to use when
>>> compiling/linking with it, assuming they're not inherent to the artifact,
>>> e.g. turn debug on). The factory method receives the artifact definition
>>> (which can actually be defined by each plugin) and decides what to do
>>> with it.
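For illustration only, the verbose-definition-plus-short-name idea might
read like this (proposal-level pseudocode, not the current namespace API):

artifact_ns do |ns|
  # 'system' packaging is part of the proposal above, not a real packaging type
  ns.openssl = 'openssl:openssl:system:0'
end
compile.with artifact_ns.openssl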
>>>
>>
>> 1.4 will have a better dependency mechanism, and one thing I looked at
>> is associating meta-data with each dependency.  So perhaps that would
>> address things like compiling/linking flags.
>>
>> > ordering
>> > =========
>> > overview
>> > -------------------
>> > to support jni, one needs to first compile java classes, then run javah
>> > to
>> > generate headers and then compile c code that implements these headers.
>> > so
>> > the javah task should be able to specify it depends on the java compile
>> > task. this can't be by depending on all compile tasks of course or on
>> > 'build'.
>>
>>
>> Alternatively:
>>
>> compile do |task|
>>  javah task.target
>> end
>>
>> This will run javah each time the compiler runs.
>>
>>
>>
>
> but running each time is what i want to avoid. not only do i want to avoid
> the invocation of 'javah', but when invoked it will change the timestamp of
> the generated headers and so many source files will get recompiled.

Rake separates invocation from execution.  Invoking a task tells it to
invoke its prerequisites, then use those to decide if it needs
executing, and if so execute.  Whether you put javah at the end of
compile or as a prerequisite to build, it will get invoked, and it should
be smart enough to decide whether there's any work to be done.

But there is a significant difference between the two.  If you add it
to compile, it gets invoked during compilation -- and compilation
implies there's a change to the source code which might lead to a change
in the header files -- and that happens as often as is necessary.  If
you put it as a prerequisite to build, it only happens when the build
task runs.  If you run a task which doesn't run the build task,
you may end up testing against the wrong header files.
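A sketch of the file-task variant, which addresses the timestamp concern
raised above (class and paths are made up):

jni_header = file('target/native/com_example_Native.h' =>
                  'target/classes/com/example/Native.class') do
  # javah runs, and the header's timestamp changes, only when the class
  # file is newer than the generated header
  sh 'javah', '-d', 'target/native', '-classpath', 'target/classes',
     'com.example.Native'
end
compile { jni_header.invoke }   # invoked every compile, executed only when stale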


> note that compiling a C/C++ source file is a much slower process than
> compiling java.
>>>
>>> suggestion
>>> -------------------
>>> when creating a compile task (whose name can be, as in the case of c++,
>>> the result library name - to allow for dependency checking), also create
>>> a "for ordering only" task with a symbolic name (e.g., 'java:compile')
>>> which depends on the actual task. then other tasks can depend on that task
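A sketch of that suggestion (library and task names invented):

objects = FileList['target/obj/*.o']            # hypothetical object files
file('target/libmylib.so' => objects) do |t|    # named for its output, so Rake
  sh 'g++', '-shared', '-o', t.name, *objects   # can do timestamp checking
end
task 'cpp:compile' => 'target/libmylib.so'      # the "for ordering only" alias
# a javah task could now depend on 'java:compile' the same way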
>>>
>>
>> And yes, you'll still need that if you want to run the C compiler
>> after the Java compiler, so I think the right thing to do would be to
>> have separate compile tasks.
>>>
>>> I hope all this makes sense, and I'm looking forward to comments. I
>>> intend to share the code once I'm finished.
>>>
>>
>> Unfortunately, the last time I wrote C code was over ten years ago,
>> so my rustiness is showing.  I'm sure I missed some points because of
>> that.
>>
>
> I hope I cleared things up. I think it is worth investing in C/C++ as it is
> a space where there are still no solutions (that i know of) that handle
> module dependencies.

Definitely.

>
> To make sure it is clear, I'm not asking the buildr team to implement
> C/C++ building; I intend to do that, and have already made a demo of it
> working. But I do want to ask for the infrastructure in buildr to make it
> easier, since currently it looks like a "stepson".

In addition, two things we should look at.

First, find a good intersection between C/C++ and other languages.
There may be some changes that are only necessary for C/C++, but
hopefully most of these can be shared across languages; that way we
get better features all around.

Second, make sure we've exhausted all our options before making a change.
If there's another way of doing something, even a stop-gap measure
while we cook up a better feature all around, then we have fewer
changes to worry about.

It's an exercise we did before with Groovy and Scala (earlier versions
were married to Java) and it worked out pretty well.  We started by
not making any changes in Buildr to accommodate it, instead using a
separate task specifically for compiling Scala code that relied on
some hacks and inelegant code to actually work.  Then we took the time
to build multi-lingual support out of that.

Assaf

>
> Ittay
>>
>> Assaf
>>
>>
>>
>>>
>>> Thank you,
>>> Ittay
>>>
>>>
>>> Notes:
>>> [1] I don't consider linking a library as packaging. First, the obj files
>>> are not used by themselves as in other languages. Second, packaging is
>>> required to manage dependencies, because in order for project P to be
>>> built against dependency D, D needs to contain both headers and
>>> libraries - this is the package.
>>>
>>> --
>>> --
>>> Ittay Dror <it...@gmail.com>
>>>
>>>
>>>
>>>
>
> --
> --
> Ittay Dror <it...@gmail.com>
>
>
>
