Posted to m2-dev@maven.apache.org by Brett Porter <br...@apache.org> on 2005/01/06 10:57:29 UTC

#1 thoughts on the goal chain

Hi,

This and the following documents are a bunch of thoughts I brainstormed 
the other morning. I've since spent more time thinking about them and 
cleaning them up.
The earlier ones I think are the more solid ones, and there are 
definitely some dud ideas in here, but anything people agree on I'd be 
happy to document properly and look towards implementing.

This particular one discusses the goal chain, thinking through the 
different use cases for adding new plugins down the track.

Cheers,
Brett

Re: #1 thoughts on the goal chain

Posted by "Mark H. Wilkinson" <mh...@kremvax.net>.
On Wed, 2005-01-12 at 20:55 +1100, Brett Porter wrote:
> Mark H. Wilkinson wrote:
> > [virtual targets]
> Isn't this what goals are now? What are we losing by using only them to 
> build the dependencies?

Hmm; kind of, I suppose. I think you're suggesting that we have a two-
level dependency system: one set of dependencies between goals as now,
used to build the goal execution chain, and a second set of file
dependencies on a per-goal basis, used to work out whether a goal should
be executed or not. This would be similar to the way that ant (and
maven) work currently, but with the file dependency checking refactored
from the goals back into the maven core.

I guess what I'm saying is that once you've pushed the file dependency
checking into the core, it seems to make sense to have a single
dependency graph rather than a two-level hierarchy of smaller graphs.
From a functional point of view though, I think we're getting pretty
close to the two approaches being equivalent.
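
To make the two-level scheme concrete, here is a minimal sketch in Python
(not m2 code). The goal names come from this thread; the file lists, the
timestamp table and the helpers chain, is_stale and plan are assumptions
made up for illustration. Goal prerequisites decide the order, and the
per-goal file sets decide whether each goal actually runs.

```python
# Level one: hypothetical goal prerequisites (goal -> goals it depends on).
GOAL_DEPS = {
    "compiler:compile": [],
    "surefire:test": ["compiler:compile"],
    "jar:jar": ["surefire:test"],
}

# Level two: hypothetical per-goal file sets (goal -> (inputs, outputs)).
GOAL_FILES = {
    "compiler:compile": (["src/A.java"], ["target/classes/A.class"]),
    "surefire:test": (["target/classes/A.class"], ["target/surefire-reports/A.txt"]),
    "jar:jar": (["target/classes/A.class"], ["target/app.jar"]),
}


def chain(goal, deps, ordered=None):
    """Return the goal execution chain: prerequisites first, each goal once."""
    ordered = [] if ordered is None else ordered
    for prerequisite in deps[goal]:
        chain(prerequisite, deps, ordered)
    if goal not in ordered:
        ordered.append(goal)
    return ordered


def is_stale(inputs, outputs, mtimes):
    """A goal must run if an output is missing or older than any input."""
    if any(path not in mtimes for path in outputs):
        return True
    newest_input = max((mtimes.get(path, 0) for path in inputs), default=0)
    return any(mtimes[path] < newest_input for path in outputs)


def plan(target, mtimes):
    """Walk the goal chain and keep only the goals whose files are stale."""
    mtimes = dict(mtimes)  # work on a copy; running a goal refreshes its outputs
    clock = max(mtimes.values(), default=0)
    to_run = []
    for goal in chain(target, GOAL_DEPS):
        inputs, outputs = GOAL_FILES[goal]
        if is_stale(inputs, outputs, mtimes):
            to_run.append(goal)
            clock += 1  # pretend the goal ran just now
            for path in outputs:
                mtimes[path] = clock
    return to_run
```

With timestamps given as a plain dict (path -> number), plan("jar:jar", ...)
returns an empty list when every output is newer than its inputs, and the
full chain when only src/A.java has changed.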

I have the feeling that we might wind up with inconsistencies between
the two sets of dependencies specified by the goals. For example, the
goal dependencies might say that jar depends on javac, but the files
that must be up-to-date for jar to run don't include all those specified
as being touched by javac. I don't know how significant that might turn
out to be though.
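
That kind of mismatch could at least be detected mechanically. A rough
Python sketch, assuming each goal declares its inputs and outputs as
explicit file lists (the function name and the sample declarations are
hypothetical, not anything m2 provides):

```python
def uncovered_outputs(goal_deps, goal_files):
    """Map (goal, prerequisite) to prerequisite outputs the goal never lists as inputs."""
    problems = {}
    for goal, prerequisites in goal_deps.items():
        declared_inputs = set(goal_files[goal]["inputs"])
        for prerequisite in prerequisites:
            missing = set(goal_files[prerequisite]["outputs"]) - declared_inputs
            if missing:
                problems[(goal, prerequisite)] = sorted(missing)
    return problems


# Hypothetical declarations: jar depends on javac but only lists one class file.
goal_deps = {"javac": [], "jar": ["javac"]}
goal_files = {
    "javac": {"inputs": ["src/A.java", "src/B.java"],
              "outputs": ["target/classes/A.class", "target/classes/B.class"]},
    "jar": {"inputs": ["target/classes/A.class"],
            "outputs": ["target/app.jar"]},
}

print(uncovered_outputs(goal_deps, goal_files))
```

Here the report would flag target/classes/B.class: javac touches it, but
jar never waits on it.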

> Me too, but there's no way I'm typing that. I'd prefer the current m2 
> surefire:test -Dtestcase=org.codehaus.plexus.PlexusTestCaseTest.
> It's roughly the same length, but easier to read and unambiguous.

True. But what's the equivalent command for regenerating
target/docs/mail-lists.html, say? I think being able to regenerate any
file produced during the build process using a consistent command syntax
is a tidier solution.

>  In your example,
> 
> m2 target/surefire-reports/org.codehaus.plexus.PlexusTestCaseTest.*xml*
> 
> would do the same thing (and generate both the txt and xml versions 
> again most likely). I think that kind of ambiguity is harmful.

Yes, it would generate both files. I don't really see that as ambiguity
though: building the .xml file also builds the .txt file as a side
effect. We're not saying that running that command will touch only the
file mentioned on the command line, just that we'll do the minimum work
necessary to bring that file up to date. Other files may be changed in
the process.

> I'm not sure how passionately you want to stick to integrating file 
> dependencies into the current mojo dependency scheme, but I can think of 
> a couple of things that would be helpful to get out from this:
> 1) apply timestamp checking rules to the existing goal based resolution 
> to achieve the benefits you are aiming for
> 2) look for any specific flaws in the current resolution docs based on 
> this argument so alternatives can be investigated
> 
> WDYT?

Well, my aim for bringing this up in the first place was to see whether
there was any interest in the approach. Given that it seems like there
is, I'll have a bit more of a dig to see what can be done. I'll probably
look at prototyping the file-based dependencies and see where the goal-
based and file-based approaches conflict with each other. If nothing
else, that should give us the two-level dependency approach (with
timestamp checking) I mentioned above.

From a quick look at goal-resolution.apt I don't see any obvious flaws
with the current system. That said, the goal model has always seemed to
be overly complex to me: with pre-goals, post-goals and prerequisites
there seem to be more types of dependency than I'd expect. As a result
it seems difficult to know how to add new steps to the build process, or
to predict how changes will affect things. This is probably what led me
to thinking about alternative dependency strategies.

-Mark.


Re: #1 thoughts on the goal chain

Posted by Brett Porter <br...@apache.org>.
Mark H. Wilkinson wrote:

>On Tue, 2005-01-11 at 20:20 +1100, Brett Porter wrote:
>  
>
>This is typically handled (in Unix make) by using virtual (or 'phony')
>targets: targets that exist in the dependency graph, but which don't map
>to real files in the file system. 
>
Isn't this what goals are now? What are we losing by using only them to 
build the dependencies?
Don't get me wrong - I think having each goal report on its required 
files and modified files is fantastic, but I don't think it should be 
the basis for connecting the mojos.

>There are two possible ways to handle this that I'm aware of:
>     1. The configuration information is held in a file somewhere, so
>        include that file (or files) in the dependency graph. At the
>        worst, touching ~/build.properties will cause everything to
>        rebuild.
>     2. Ignore configuration information held in files. Make the users
>        aware that changing configuration will normally require a build
>        from clean afterwards.
>  
>
I agree. I think we can afford to start at (2) and work at implementing (1).

>I don't imagine
>that this would be a problem for maven as I'm not suggesting that a
>representation of the dependency graph become part of the project source
>tree.
>  
>
Correct. There will probably be a static, installation wide cache of the 
base DAG, but it does get modified project to project, and caching that 
is probably not worthwhile.

>For the end users? I would expect that the standard plugins would
>include a number of convenience virtual targets. In the example in my
>previous email I included a 'jar:jar' virtual target that depended on
>the project's artifact being up to date. Normal operation for the end
>user will likely use the short names they are using today.
>  
>
I don't know that it's good to leave such a wide gap between end users 
and developers. Users should be able to easily write plugins, and 
developers should be using the system day to day as a user would.

>>>     * The user can create any intermediate target. For example, you can
>>>       'm2 target/surefire-reports/org.codehaus.plexus.PlexusTestCaseTest.txt'
>>>      
>>>
>>I'm not sure this is entirely useful, sorry.
>>    
>>
>
>Our experiences must differ then - I've often wanted to regenerate a
>single report in the generated site over and over again (e.g. getting
>the checkstyle errors out of a source tree).
>  
>
Me too, but there's no way I'm typing that. I'd prefer the current m2 
surefire:test -Dtestcase=org.codehaus.plexus.PlexusTestCaseTest.
It's roughly the same length, but easier to read and unambiguous. In 
your example,

m2 target/surefire-reports/org.codehaus.plexus.PlexusTestCaseTest.*xml*

would do the same thing (and generate both the txt and xml versions 
again most likely). I think that kind of ambiguity is harmful.

>Agreed - if antlr were required for the build it should be included in
>the POM. Perhaps that was a bad example. There are some other use cases
>where being able to just name a plugin on the command line would be
>useful:
>  
>
I think those given are all useful, and definitely are allowed to be run 
from the command line.

I'm not sure how passionately you want to stick to integrating file 
dependencies into the current mojo dependency scheme, but I can think of 
a couple of things that would be helpful to get out from this:
1) apply timestamp checking rules to the existing goal based resolution 
to achieve the benefits you are aiming for
2) look for any specific flaws in the current resolution docs based on 
this argument so alternatives can be investigated

WDYT?

Thanks,
Brett


Re: #1 thoughts on the goal chain

Posted by "Mark H. Wilkinson" <mh...@kremvax.net>.
On Tue, 2005-01-11 at 20:20 +1100, Brett Porter wrote:
> I'm all for plugins acting appropriately on the correct set of files, 
> and for correctly defining their inputs and outputs. But what about the 
> following situations:
> - a plugin doesn't actually affect any files (scm, deployment, ...)

This is typically handled (in Unix make) by using virtual (or 'phony')
targets: targets that exist in the dependency graph, but which don't map
to real files in the file system. They are always assumed to be out of
date with respect to their dependents, so they will always be built when
the user asks for them to be built. So, jar:deploy would be a virtual
target that depends on jar:jar; the action to bring jar:deploy 'up to
date' would be to deploy the jar file to the remote repository.

Effectively this gets the user to do the decision making about when
virtual targets should be built.
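
As a rough illustration of the rule, in Python: the function name, the
timestamp dict and the example graph are assumptions for the sketch, not
m2 or make internals. A virtual target is always treated as out of date; a
file target is out of date when it is missing, older than a dependency, or
downstream of something that is itself out of date.

```python
def out_of_date(target, deps, mtimes, virtual):
    """Decide whether a target needs building.

    deps maps a target to its dependencies, mtimes maps existing files to
    timestamps, and virtual is the set of targets with no file behind them.
    """
    if target in virtual or target not in mtimes:
        return True
    return any(
        out_of_date(dep, deps, mtimes, virtual) or mtimes[dep] > mtimes[target]
        for dep in deps.get(target, [])
    )


# Hypothetical graph: jar:deploy -> jar:jar -> the jar file -> a class file.
deps = {
    "jar:deploy": ["jar:jar"],
    "jar:jar": ["target/app.jar"],
    "target/app.jar": ["target/classes/A.class"],
}
virtual = {"jar:deploy", "jar:jar"}
mtimes = {"target/classes/A.class": 1, "target/app.jar": 2}
```

With these timestamps the jar file itself is up to date, yet asking for
jar:deploy still reports work to do, which is the point: the user decides
when to deploy.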

> - the timestamp of the file is not relevant to when a file needs to be 
> processed - for instance configuration has changed that is within the 
> scope of the plugin, not the file itself.

There are two possible ways to handle this that I'm aware of:
     1. The configuration information is held in a file somewhere, so
        include that file (or files) in the dependency graph. At the
        worst, touching ~/build.properties will cause everything to
        rebuild.
     2. Ignore configuration information held in files. Make the users
        aware that changing configuration will normally require a build
        from clean afterwards.
I've seen both approaches used in Unix make build systems. The first
approach tends to work better for more complex builds, and is used by
the GNU automake program. It can be confusing in that environment
because the generation of the Makefile (i.e. the dependency graph) is
included in the Makefile, which is somewhat circular. I don't imagine
that this would be a problem for maven as I'm not suggesting that a
representation of the dependency graph become part of the project source
tree.
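
The first approach amounts to adding the configuration file as an extra
source of every target. A small Python sketch, assuming rules are a dict of
target -> sources and timestamps are a dict of path -> number (both
representations, and the helper names, are made up for illustration):

```python
def with_config(rules, config_files):
    """Return a copy of the rules where every target also depends on the config files."""
    return {target: list(sources) + list(config_files)
            for target, sources in rules.items()}


def stale_targets(rules, mtimes):
    """Targets that are missing or older than at least one of their sources."""
    return sorted(
        target for target, sources in rules.items()
        if target not in mtimes
        or any(mtimes.get(source, 0) > mtimes[target] for source in sources)
    )


rules = {
    "target/classes/A.class": ["src/A.java"],
    "target/app.jar": ["target/classes/A.class"],
}
# ~/build.properties was touched after everything else was built.
mtimes = {
    "src/A.java": 1,
    "target/classes/A.class": 2,
    "target/app.jar": 3,
    "~/build.properties": 4,
}
```

Without the configuration file in the graph nothing is stale; with it,
every target is, which is the "worst case" rebuild described above.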

> I also don't think that targeting individual artifacts for a build is 
> very easy to use.

For the end users? I would expect that the standard plugins would
include a number of convenience virtual targets. In the example in my
previous email I included a 'jar:jar' virtual target that depended on
the project's artifact being up to date. Normal operation for the end
user will likely use the short names they are using today.

> >      * The user can create any intermediate target. For example, you can
> >        'm2 target/surefire-reports/org.codehaus.plexus.PlexusTestCaseTest.txt'

> I'm not sure this is entirely useful, sorry.

Our experiences must differ then - I've often wanted to regenerate a
single report in the generated site over and over again (e.g. getting
the checkstyle errors out of a source tree).

> >An addition is that the user could explicitly force a plugin to be
> >loaded by referring to one of its targets on the command line. Hence 'm2
> >antlr:translate jar:jar' would force the antlr plugin to load and its
> >dependencies to be added to the dependency graph.
> >
> That's not autonomous enough. A user should be able to pick up any 
> source tree and run "m2 jar:jar" and get a complete jar.
> If antlr:translate were required - they'd get compile errors and 
> confusion first up. The need for antlr must be defined somewhere in the POM.

Agreed - if antlr were required for the build it should be included in
the POM. Perhaps that was a bad example. There are some other use cases
where being able to just name a plugin on the command line would be
useful:
      * When you have no source tree checked out. For example, building
        new project templates, checking out source trees and so on.
      * When you want to apply a tool to a project without integrating
        it into the build. For example, you might download a project and
        run 'm2 clover:report' to have clover do its stuff just that one
        time. If things work out you might then include it in the pom,
        but being able to trigger things from the command line allows
        experimentation.
      * Limiting the amount of work that m2 has to do. I guess most
        maven projects currently have two main goal chains: build the
        jar file and build the web site. If the user says 'm2 jar:jar'
        we don't need to load the site generation plugin, so we don't
        need to process all the extra dependency information.
        Conversely, 'm2 site:generate' would force the site plugin to
        load, pulling in the extra dependencies that it defines.
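
The last point could look something like the following Python sketch. The
registry layout, the "requires" field and the function names are all
hypothetical; the only idea taken from the text is that the plugin prefix
of each target named on the command line decides what gets loaded.

```python
# Hypothetical plugin registry: what each plugin needs and which edges it adds.
PLUGINS = {
    "compiler": {"requires": [], "edges": {"compiler:compile": []}},
    "jar": {"requires": ["compiler"], "edges": {"jar:jar": ["compiler:compile"]}},
    "site": {"requires": [], "edges": {"site:generate": []}},
}


def load_for(targets):
    """Load only the plugins named on the command line, plus what they require."""
    loaded = {}
    pending = [target.split(":", 1)[0] for target in targets]
    while pending:
        name = pending.pop()
        if name in loaded:
            continue
        loaded[name] = PLUGINS[name]["edges"]
        pending.extend(PLUGINS[name]["requires"])
    return loaded


def dependency_graph(targets):
    """Merge the edges contributed by the loaded plugins into one graph."""
    graph = {}
    for edges in load_for(targets).values():
        graph.update(edges)
    return graph
```

dependency_graph(["jar:jar"]) never sees the site plugin's edges, while
dependency_graph(["site:generate"]) pulls them in.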

-Mark.


Re: #1 thoughts on the goal chain

Posted by Brett Porter <br...@apache.org>.
Hi Mark,

Mark H. Wilkinson wrote:

>What I'm thinking of is for m2 to build a dependency graph containing
>all the files used in the build process, in a similar way to the
>venerable Unix make utility. Recast in this model, each plugin would
>declare the set of files that it depends on and the set of files that it
>will create, based on the existing information in the pom and the
>contents of the file system. m2 would then schedule plugin execution
>based on the timestamps of the files in the file system.
>  
>
I'm all for plugins acting appropriately on the correct set of files, 
and for correctly defining their inputs and outputs. But what about the 
following situations:
- a plugin doesn't actually affect any files (scm, deployment, ...)
- the timestamp of the file is not relevant to when a file needs to be 
processed - for instance configuration has changed that is within the 
scope of the plugin, not the file itself.

I also don't think that targeting individual artifacts for a build is 
very easy to use.

>      * The user can create any intermediate target. For example, you can
>        'm2 target/surefire-reports/org.codehaus.plexus.PlexusTestCaseTest.txt'
>  
>
I'm not sure this is entirely useful, sorry.

>I said the timing seemed right to bring this issue up: currently m2
>doesn't do any dependency checking to reduce the amount of work that it
>does (last time I checked, anyway). I can't imagine this situation will
>persist, but I'm guessing the likely solution will be to add
>dependency checking code to the mojos that seem to require it. Surely a
>better approach would be to pull this basic behaviour into the m2 core
>rather than duplicating it throughout the mojos?
>  
>
I agree - or at least providing it as a service to the mojos so it isn't 
duplicated.

>An addition is that the user could explicitly force a plugin to be
>loaded by referring to one of its targets on the command line. Hence 'm2
>antlr:translate jar:jar' would force the antlr plugin to load and its
>dependencies to be added to the dependency graph.
>  
>
That's not autonomous enough. A user should be able to pick up any 
source tree and run "m2 jar:jar" and get a complete jar.
If antlr:translate were required - they'd get compile errors and 
confusion first up. The need for antlr must be defined somewhere in the POM.

Cheers,
Brett

Re: #1 thoughts on the goal chain

Posted by "Mark H. Wilkinson" <mh...@kremvax.net>.
On Thu, 2005-01-06 at 20:57 +1100, Brett Porter wrote:
> Currently, we have the following series of stages when building a JAR:
> 
>                            jar:jar
>                               |
>         +---------------------+-------------------+
>         |                                         |
>         v                                         |
>   surefire:test                                   |
>         |                                         |
>         +------------+------------+----------+    |
>         |            |            |          |    |
>         v            |            |          v    v
> compiler:testCompile |            |    resources:resources
>         |            |            |
>         v            v            v
>        compiler:compile  resources:testResources
> 
> result:
> compiler:compile -> compiler:testCompile -> resources:resources -> resources:testResources -> surefire:test -> jar:jar
> 
> in stages:
> compile -> resources -> test -> jar
> (this is the plugin chain, not necessarily the particular goals)
> 
> Building a WAR:
> compile -> resources -> test -> war
> 
> Building an artifact as determined by type
> <type> -> build
> eg. jar -> build
> 
> Installing an artifact
> build -> install
> 
> Each goal has its inputs (sources/resources/classes directory) and
> outputs (classes directories/jar).

This seems like as good a time as any to bring up a basic issue that I
have with maven (and ant). I realise that I'm questioning fundamental
principles of the design, but the timing seems right given the current
position of m2's development.

Basically, I'm not convinced that specifying dependencies between
plugins is the right model to use for a build system. It works, sure,
but it leaves plugin developers the job of thinking about the actions of
other plugins as though they are side-effects. I'd argue that it's not
particularly transparent to the end users either: I find it difficult to
work out how to extend the maven build system because I don't have a
good overview of how it fits together. Better documentation would help
here (Brett mentions having plugins create graphs showing their
dependencies further on), but an approach based on concrete and
observable artifacts might be more transparent to everyone.

What I'm thinking of is for m2 to build a dependency graph containing
all the files used in the build process, in a similar way to the
venerable Unix make utility. Recast in this model, each plugin would
declare the set of files that it depends on and the set of files that it
will create, based on the existing information in the pom and the
contents of the file system. m2 would then schedule plugin execution
based on the timestamps of the files in the file system.

I think this would be an improvement because it grounds the build system
on concrete things that are visible to the user (files), rather than the
abstract concept of 'goal'. So, restated as file dependencies, the jar
build might become:

  [dependencies from pom]           [test dependencies from pom]
           |             \                   |
           |              \                  |
           |               \                 |
           v                \                v
  src/main/java/**/*.java    \      src/test/java/**/*.java
           |                  \              |
           | java:compile      \             | java:compile
           |                    ----         |
           v                        \        v
  target/classes/**/*.class ------> target/classes/test-classes/**/*.class
           |               \                 |
           | jar:jar        \                | surefire:test
           |                 -------         |
           v                        \        v
  target/${artifactId}.jar <------- target/surefire-reports/*.txt
           |
           |
           |
           v
        jar:jar

Some immediate advantages: 
      * m2 always does the minimum amount of work possible. Only sources
        that have changed get recompiled, for example. This could be a big
        win for large reactored builds.
      * The user can create any intermediate target. For example, you can
        'm2 target/surefire-reports/org.codehaus.plexus.PlexusTestCaseTest.txt'
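
A minimal Python sketch of the scheduling idea, under some loud
assumptions: rules are (goal, inputs, outputs) tuples, single files stand
in for the ** patterns, target/app.jar stands in for
target/${artifactId}.jar, the test-compile branch of the diagram is left
out, and timestamps are simulated with a dict instead of the real file
system. None of the names below exist in m2.

```python
# (goal, inputs, outputs); goal names follow the diagram above.
RULES = [
    ("java:compile", ["src/main/java/A.java"], ["target/classes/A.class"]),
    ("surefire:test", ["target/classes/A.class"], ["target/surefire-reports/A.txt"]),
    ("jar:jar", ["target/classes/A.class", "target/surefire-reports/A.txt"],
     ["target/app.jar"]),
]


def schedule(target, rules, mtimes):
    """Return the goals to run, in order, to bring one file up to date."""
    mtimes = dict(mtimes)  # simulated file system timestamps
    clock = max(mtimes.values(), default=0)
    to_run = []

    def visit(path):
        nonlocal clock
        rule = next((r for r in rules if path in r[2]), None)
        if rule is None:
            return  # a source file: nothing builds it
        goal, inputs, outputs = rule
        for source in inputs:
            visit(source)  # bring dependencies up to date first
        newest = max((mtimes.get(source, 0) for source in inputs), default=0)
        if path not in mtimes or mtimes[path] < newest:
            to_run.append(goal)
            clock += 1  # pretend the goal ran just now
            for produced in outputs:
                mtimes[produced] = clock

    visit(target)
    return to_run
```

Asking for target/app.jar when nothing has changed schedules no goals;
asking for a missing surefire report schedules only surefire:test, provided
the classes are already current.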

I said the timing seemed right to bring this issue up: currently m2
doesn't do any dependency checking to reduce the amount of work that it
does (last time I checked, anyway). I can't imagine this situation will
persist, but I'm guessing the likely solution will be to add
dependency checking code to the mojos that seem to require it. Surely a
better approach would be to pull this basic behaviour into the m2 core
rather than duplicating it throughout the mojos?

> What if we add, say, antlr? The antlr:antlr goal creates sources for a
> parser definition.
> 
> antlr -> compile -> ...

With file dependencies you'd tackle this by adding this to the dependency
graph:

   src/main/antlr/*
           |
           | antlr:translate
           |
           v
   target/antlr-src/**/*.java
           |
           | java:compile
           |
           v
   target/classes/**/*.class

I'd imagine these additional dependencies being declared by the antlr mojo.
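
Roughly, in Python, with the graph stored as output file -> (goal, input
files). The representation, the file names and the goals_for helper are
assumptions for the sketch; only the goal names and directory layout come
from the diagram.

```python
def goals_for(target, graph, done=None):
    """List the goals needed to build target from scratch, dependencies first."""
    done = [] if done is None else done
    if target in graph:
        goal, inputs = graph[target]
        for source in inputs:
            goals_for(source, graph, done)
        if goal not in done:
            done.append(goal)
    return done


# Default graph: output file -> (goal that makes it, input files).
graph = {
    "target/classes/App.class": ("java:compile", ["src/main/java/App.java"]),
}

# Edges a (hypothetical) antlr mojo would declare once it is activated.
antlr_edges = {
    "target/antlr-src/Parser.java": ("antlr:translate", ["src/main/antlr/grammar.g"]),
    "target/classes/Parser.class": ("java:compile", ["target/antlr-src/Parser.java"]),
}
graph.update(antlr_edges)
```

Once the antlr edges are merged in, building target/classes/Parser.class
needs antlr:translate followed by java:compile, without the java plugin
knowing anything about antlr.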

> Should the user just have to declare "I'm using antlr"?

That's the conclusion I'd come to when thinking about the same problem:
with file dependencies you need some way to work out which plugins get
to add dependencies to the dependency graph. We might have a default
list that covers things like java and site generation, but having some
way to say that non-default plugins should be activated seems like the
sensible thing to do.

An addition is that the user could explicitly force a plugin to be
loaded by referring to one of its targets on the command line. Hence 'm2
antlr:translate jar:jar' would force the antlr plugin to load and its
dependencies to be added to the dependency graph.

> What about a plugin that needs to be in between two existing goals?

This one's more tricky: the swizzle plugin is really overriding the
normal behaviour of the modello plugin. Using file dependencies I can
think of a couple of ways to tackle this: we could have some way to tell
the modello plugin that its generated sources need to be placed somewhere
different, and then have the swizzle plugin do its work putting the
result where the modello files would have gone originally. Alternatively
the swizzle plugin could provide its own set of dependencies, but we'd
still need some way to say that the swizzle dependencies should be used
in preference to the modello ones.
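
One conceivable way to express that preference is a priority on each
declaration, so that when two plugins claim the same output the higher
priority wins. This is only a Python sketch of the idea: the priority
field, the tuple layout and the file names are all invented here, and I'm
not claiming this is how modello or swizzle lay out their files.

```python
def effective_rules(declarations):
    """Pick one producer per output: the declaration with the highest priority wins."""
    chosen = {}
    for plugin, priority, output, inputs in declarations:
        if output not in chosen or priority > chosen[output][1]:
            chosen[output] = (plugin, priority, inputs)
    return {output: (plugin, inputs)
            for output, (plugin, _, inputs) in chosen.items()}


# Hypothetical declarations: (plugin, priority, output file, input files).
declarations = [
    ("modello", 0, "target/generated-sources/Model.java",
     ["src/main/mdo/model.mdo"]),
    ("swizzle", 10, "target/generated-sources/Model.java",
     ["src/main/mdo/model.mdo"]),
    ("modello", 0, "target/generated-sources/Reader.java",
     ["src/main/mdo/model.mdo"]),
]
```

With these declarations swizzle produces Model.java, while modello still
produces Reader.java because nothing overrides it.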

These examples highlight the need for the plugins that ship with m2 to
implement a logical and extensible build process. I don't think file-
based dependencies remove the need, but I do think they would make it
easier to make the build process extensible. I've done quite a bit of
thinking about how this could be made to work, and skimmed over some
details here, so please ask questions...

-Mark.


RE: #1 thoughts on the goal chain

Posted by Vincent Massol <vm...@pivolis.com>.
On the same topic, another idea (discussed some long time back) is to have a
goalmap (aka goal workflow), i.e. the definition of goal chains. There would
be a default goalmap which maps to the goal chain you describe in your
goalchain.txt document. If the user wants to have a different goalmap, he
would be able to define the flow of goals he wants, possibly inserting an
antlr goal, an xdoclet one, etc.

If we do this, it means that plugins should make no assumptions about what
goals are called before or after each other. But in some cases, they'll need
to know that. The solution could then be to introduce interface goals (this
is what I've tried to do with the caller plugin), so that they can still do
prereqs on those goals.

So in practice the goalmap will be the mapping of real goals on goal
interfaces (same as the mapping of aspects on pointcuts for the AOP
frameworks).

I guess the hard part here is the creation of interface goals as it requires
refactoring to existing plugins. But I guess it can be done and I suspect
this solution is a clean way of doing it. It also allows people to develop
replacement goals (for example, I may want to develop a compiler plugin that
takes groovy code and generates bytecode, and replace the default java plugin
with this one, etc).
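
To sketch what I mean, in Python: the interface goal names, the default
mapping and the groovy:compile goal are hypothetical, made up for the
example. A goalmap binds real goals to interface goals, and the build
simply walks the interface goals in order.

```python
# Interface goals in execution order (names are hypothetical).
INTERFACE_CHAIN = ["generate-sources", "compile", "test", "package"]

# Default goalmap: interface goal -> real goals bound to it.
DEFAULT_GOALMAP = {
    "generate-sources": [],
    "compile": ["java:compile"],
    "test": ["surefire:test"],
    "package": ["jar:jar"],
}


def resolve(goalmap, chain=INTERFACE_CHAIN):
    """Flatten a goalmap into the list of real goals to execute."""
    return [goal for interface in chain for goal in goalmap.get(interface, [])]


# A user goalmap that inserts antlr and swaps in a (hypothetical) groovy compiler.
custom = dict(DEFAULT_GOALMAP)
custom["generate-sources"] = ["antlr:antlr"]
custom["compile"] = ["groovy:compile"]
```

A plugin can then declare a prereq on the "compile" interface goal without
knowing whether java:compile or groovy:compile is bound to it.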

I would envision something similar to the Eclipse plugin descriptor, where
the plugin says which extension points it extends.

Thoughts?

Thanks
-Vincent
