Posted to dev@ant.apache.org by "Craeg K. Strong" <cs...@arielpartners.com> on 2001/07/19 01:15:05 UTC

[SUBMIT] New task, a task to manage arbitrary dependencies between files

Enclosed please find the code and docs for a new proposed Ant task,
<dependset>, a task to manage arbitrary dependencies between files.

DESCRIPTION

The dependset task compares a set of source files with a set of target 
files. If any of the source files is more recent than any of the target 
files, all of the target files are removed.
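In plain Java, the rule above can be sketched roughly as follows. This is a minimal illustration, not the code attached to this submission. It assumes the filesets have already been resolved to lists of java.io.File, and the class and method names are hypothetical.

```java
import java.io.File;
import java.util.List;

/** Hypothetical sketch of the <dependset> rule; not the submitted Ant code. */
public class DependSetRule {

    /** True if any existing source is newer than any existing target. */
    public static boolean targetsOutOfDate(List<File> sources, List<File> targets) {
        long newestSource = Long.MIN_VALUE;
        for (File source : sources) {
            if (source.exists()) {
                newestSource = Math.max(newestSource, source.lastModified());
            }
        }
        long oldestTarget = Long.MAX_VALUE;
        for (File target : targets) {
            if (target.exists()) {
                oldestTarget = Math.min(oldestTarget, target.lastModified());
            }
        }
        // "Any source newer than any target" is equivalent to
        // "the newest source is newer than the oldest target".
        return newestSource > oldestTarget;
    }

    /** If the rule fires, deletes ALL existing targets; returns the number deleted. */
    public static int apply(List<File> sources, List<File> targets) {
        if (!targetsOutOfDate(sources, targets)) {
            return 0;
        }
        int deleted = 0;
        for (File target : targets) {
            if (target.exists() && target.delete()) {
                deleted++;
            }
        }
        return deleted;
    }
}
```

Comparing the newest source with the oldest target gives the same answer as checking every source/target pair, but needs only one pass over each list.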

Source files and target files are specified via nested FileSets.
Arbitrarily many source and target filesets may be specified, but at
least one of each is required.

DependSet is useful to capture dependencies that are not or cannot be
determined algorithmically. For example, the <style> task only compares
the source XML file and XSLT stylesheet against the target file to
determine whether to restyle the source. Using dependset you can extend
this dependency checking to include a DTD or XSD file as well as other
stylesheets imported by the main stylesheet.

IMPLEMENTATION

There are many different options for how this could be implemented.
Since I am so lazy :-) I decided to reuse all of that wonderful
functionality in the SourceFileScanner class. It fits in principle,
since SourceFileScanner compares two sets of files and returns a
restricted subset.

The problem is that I needed to invert the logic in SourceFileScanner.
The comparison algorithm is hardcoded in SourceFileScanner to look for
newer files, when in this case I actually want older files.

If this were my own private project, I would probably have generalized
SourceFileScanner and made two subclasses. However, I wanted to avoid
breaking backward compatibility. Therefore, I chose to parameterize
SourceFileScanner's comparison algorithm. By default it works as
before, but you can invert it by choosing a different FileComparator
a la Strategy pattern.
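A rough sketch of that idea, assuming a hypothetical FileComparator interface with a single select() method (the real patch's signatures may well differ). The scanning loop stays the same; only the comparison object passed in changes.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/**
 * Hypothetical sketch of parameterizing a SourceFileScanner-style check
 * with a comparison strategy. Names and signatures are illustrative only.
 */
public class StrategySketch {

    /** Strategy: should 'file' be selected, given the file it maps to? */
    public interface FileComparator {
        boolean select(File file, File mapped);
    }

    /** Default behaviour: select files newer than their mapped file, or whose mapped file is missing. */
    public static final FileComparator NEWER_THAN =
            (file, mapped) -> !mapped.exists() || file.lastModified() > mapped.lastModified();

    /** Inverted behaviour: select files older than an existing mapped file. */
    public static final FileComparator OLDER_THAN =
            (file, mapped) -> mapped.exists() && file.lastModified() < mapped.lastModified();

    /** Selects each file for which the strategy accepts at least one mapped file. */
    public static List<File> restrict(List<File> files,
                                      Function<File, List<File>> mapper,
                                      FileComparator comparator) {
        List<File> selected = new ArrayList<>();
        for (File file : files) {
            for (File mapped : mapper.apply(file)) {
                if (comparator.select(file, mapped)) {
                    selected.add(file);
                    break; // one match is enough to select this file
                }
            }
        }
        return selected;
    }
}
```

Existing callers would keep the NEWER_THAN behaviour by default, while <dependset> would pass OLDER_THAN.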

You will notice that I also had to enhance MergingMapper slightly to
accommodate multiple files. This tiny change does not break backward
compatibility and (I hope) is still in keeping with the fundamental
concept of the MergingMapper.
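The patch itself is not reproduced here, so the following is only one plausible reading of "multiple files" (an assumption): a merging mapper whose "to" value may name several files, so that every input name maps to all of them. The class name is hypothetical; the String[] return type mirrors the shape of Ant's mapFileName method, but this class does not implement Ant's interface.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

/**
 * Hypothetical multi-file merging mapper: every input name maps to the
 * same fixed list of output names. An illustration, not the actual patch.
 */
public class MultiMergingMapper {
    private String[] targets = new String[0];

    /** Accepts one or more names separated by commas and/or whitespace. */
    public void setTo(String to) {
        List<String> names = new ArrayList<>();
        StringTokenizer tok = new StringTokenizer(to, ", \t\n\r\f");
        while (tok.hasMoreTokens()) {
            names.add(tok.nextToken());
        }
        targets = names.toArray(new String[0]);
    }

    /** Ignores the input name and returns a copy of every configured name. */
    public String[] mapFileName(String sourceFileName) {
        return targets.clone();
    }
}
```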

I am very interested in feedback on design, implementation, etc.

Thanks,

--Craeg

Re: [SUBMIT] New task, a task to manage arbitrary dependencies between files

Posted by "Craeg K. Strong" <cs...@arielpartners.com>.
>>[snip]
>>The dependset task compares a set of source files with a set of target
>>files. If any of the source files is more recent than any of the target
>>files, all of the target files are removed. [snip]
>>
>
>I think this would be cool--at least I've had the mentioned problem with
><style> and had to work around it by writing one hack target which runs
><uptodate>, another which runs the first and then deletes the target file(s)
>if the first reported it was not up-to-date, and then the real target which
>runs both of the preceding before running <style>. Simplifying the idiom here
>would be nice.
>
>-Jesse
>
Exactly so! But-- there are other cases that would affect many more
Ant users. For example,

1) What happens when you change the Ant buildfile? Absolute
correctness demands that a change to the buildfile forces a complete
rebuild. Of course it would be nice to know whether that change
actually affects the build in question (or if it is simply a doc
change or some other innocuous edit). However, it is better to be
safe than sorry.

2) What about large property files?
   <property file="lots-of-global-properties.properties"/>
If this file changes, we would probably like to force a complete
rebuild as well. See the arguments above.

By judicious use of the <dependset> task together with the existing
<uptodate> and <depend> tasks, I feel we can now handle all of these
types of issues correctly...

--Craeg




Re: [SUBMIT] New task, a task to manage arbitrary dependencies between files

Posted by Jesse Glick <Je...@netbeans.com>.
"Craeg K. Strong" wrote:
> [snip]
> The dependset task compares a set of source files with a set of target
> files. If any of the source files is more recent than any of the target
> files, all of the target files are removed. [snip]

I think this would be cool--at least I've had the mentioned problem with
<style> and had to work around it by writing one hack target which runs
<uptodate>, another which runs the first and then deletes the target file(s)
if the first reported it was not up-to-date, and then the real target which
runs both of the preceding before running <style>. Simplifying the idiom here
would be nice.

-Jesse

-- 
Jesse Glick   <ma...@netbeans.com>
NetBeans, Open APIs  <http://www.netbeans.org/>
tel (+4202) 3300-9161 Sun Micro x49161 Praha CR

Re: [SUBMIT] New task, a task to manage arbitrary dependencies between files

Posted by Stefan Bodewig <bo...@apache.org>.
On Fri, 03 Aug 2001, Craeg K. Strong <cs...@arielpartners.com>
wrote:

> OK-- Just to confirm... your preference is for a new DataType rather
> than extending FileSets to support explicitly named files,

Correct.

> DependSet will support an arbitrary number of filesets and
> filelists, for both source and target files.

Sounds good.

Stefan

Re: [SUBMIT] New task, a task to manage arbitrary dependencies between files

Posted by "Craeg K. Strong" <cs...@arielpartners.com>.
Stefan Bodewig wrote:

>>- you could create a new DataType of ExplicitFileList or something
>>that accepted no wildcards
>>
>Yep.
>
OK-- Just to confirm... your preference is for a new DataType rather
than extending FileSets to support explicitly named files, something
like:

<fileset dir="bar">
  <patternset includes="**/*.html" excludes="not-me.html"/>
  <namedfile file="blat.inc"/>
</fileset>

I understand now that FileSets are really filters, not lists of files.
I therefore propose to create a new data type called FileList. For
now, FileList will simply support a comma-separated list of files. I
will send a patch for both the FileSet and new FileList documentation
to try to make the distinction between the two clearer.
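A minimal sketch of such a FileList, assuming only what is described here: a base directory plus a comma-separated list of names, with no wildcards. The class and method names are hypothetical. Unlike a FileSet, it reports every named file, including ones that do not exist yet.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical FileList-style data type: explicit names relative to a base
 * directory. No wildcards and no filtering; every name is reported whether
 * or not the file exists yet.
 */
public class FileListSketch {
    private final File dir;
    private final List<String> names = new ArrayList<>();

    /** 'files' is a comma-separated list such as "foo.dtd,bar.dtd". */
    public FileListSketch(File dir, String files) {
        this.dir = dir;
        for (String part : files.split(",")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
    }

    /** The names exactly as listed (trimmed), in order. */
    public List<String> getNames() {
        return Collections.unmodifiableList(names);
    }

    /** One File per listed name, resolved against the base directory. */
    public List<File> getFiles() {
        List<File> result = new ArrayList<>();
        for (String name : names) {
            result.add(new File(dir, name));
        }
        return result;
    }

    /** The listed files that do not exist on disk. */
    public List<File> getMissingFiles() {
        List<File> missing = new ArrayList<>();
        for (File f : getFiles()) {
            if (!f.exists()) {
                missing.add(f);
            }
        }
        return missing;
    }
}
```

getMissingFiles() is exactly the information a pattern-based FileSet cannot provide.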

DependSet will support an arbitrary number of filesets and filelists,
for both source and target files.

This means that an example usage of <dependset> would look something
like the following:

<dependset>
   <srcfileset dir="buildfiles" includes="*.xml"/>
   <srcfilelist dir="dtdfiles" files="foo.dtd,bar.dtd"/>
   <targetfileset dir="${basedir}" includes="**/*.html"/>
   <targetfilelist dir="other-dir" files="one.html, two.html"/>
</dependset>

--Craeg


Re: [SUBMIT] New task, a task to manage arbitrary dependencies between files

Posted by Stefan Bodewig <bo...@apache.org>.
On Thu, 02 Aug 2001, Craeg K. Strong <cs...@arielpartners.com>
wrote:

> This is why you mentioned the possibility of including both FileSets
> and FileLists, isn't it?   Now I get it.

Good ;-)

> - you could have some sort of option that would make
> DirectoryScanner return explicitly-named files even if they did not
> exist

Not DirectoryScanner's responsibility IMHO, it is supposed to discover
files that follow certain patterns - not to list arbitrary files.

> - you could create a new DataType of ExplicitFileList or something
> that accepted no wildcards

Yep.

> - you could use FileSets, but have them behave a little differently
> in the case of <dependset>

I wouldn't like that too much either, but it is not without precedent,
see <zipfileset> and <tarfileset>, although they just "decorate" plain
filesets.

Stefan

Re: [SUBMIT] New task, a task to manage arbitrary dependencies between files

Posted by "Craeg K. Strong" <cs...@arielpartners.com>.
>> But I see your point of using a fileset for target files as well,
>> there are situations where you don't know all the names of the target
>> files or it would be cumbersome to list them.
>>
>> Maybe we'd need a targetfileset and a targetfilelist or similar, where
>> a missing file from the targetfilelist would be treated the same way
>> as an out-of-date target file?
>>
Uh-oh. I just carefully read the source code for DirectoryScanner and
noticed that using this class in the usual way:

DirectoryScanner ds = sourceFileSet.getDirectoryScanner(project);
String[] sourceFiles = ds.getIncludedFiles();

will not return files that do not exist, even if they were explicitly
named like so:

<srcfiles dir="foo" includes="this-file-does-not-exist.xml"/>    

Holy incorrect assumptions, Batman!
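The same effect is easy to reproduce outside Ant. The sketch below uses java.nio's Files.newDirectoryStream with a glob as a stand-in for DirectoryScanner (it is not Ant code). Because it enumerates directory entries, a "pattern" that is really a literal file name simply matches nothing when that file is missing.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/**
 * Illustration using java.nio, not Ant's DirectoryScanner: a pattern-based
 * scan enumerates what is on disk, so a "pattern" that is really a literal
 * file name matches nothing when that file is missing.
 */
public class ScanVersusList {

    /** Names of entries in dir that match the glob; only existing entries can appear. */
    public static List<String> scan(Path dir, String glob) throws IOException {
        List<String> found = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, glob)) {
            for (Path entry : stream) {
                found.add(entry.getFileName().toString());
            }
        }
        return found;
    }
}
```

In a directory containing only present.xml, scan(dir, "*.xml") returns [present.xml], while scan(dir, "this-file-does-not-exist.xml") returns an empty list rather than reporting the missing name.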

This is why you mentioned the possibility of including both FileSets
and FileLists, isn't it? Now I get it.
OK, this being the case, how to proceed? There are several options I
can imagine:

- you could have some sort of option that would make DirectoryScanner
  return explicitly-named files even if they did not exist

- you could create a new DataType of ExplicitFileList or something
  that accepted no wildcards

- you could use FileSets, but have them behave a little differently
  in the case of <dependset>

I guess I am leaning towards option "B" (the new DataType), based on
the principle of least surprise...

Something like:

<dependset missingSource="outofdate">
  <srcfile dir="bar" file="common.xsl"/>
  <srcfile dir="bar" file="tables.xsl"/>
  <srcfiles dir="foo" includes="*.xml"/>
  <targetfiles dir="mumble" includes="**/*.html"/>
  <targetfile dir="blat" file="generate-me.txt"/>
</dependset>

...thoughts?

--Craeg





Re: [SUBMIT] New task, a task to manage arbitrary dependencies between files

Posted by "Craeg K. Strong" <cs...@arielpartners.com>.
>Let's take a silly example - my <style> (using redirect extensions or
>something) generates two .html files - the name of one of them will be
>passed in as a parameter, something like
>
><style in="source.xml" 
>       out="output1.html"
>       style="style-with-redirect.xsl">
>  <param name="other-file"
>         value="${a-property-only-known-at-build-time}" />
></style>
>
<snip/>

>- maybe output1.html contains a
>link to the second file?  If any target file is out of date with
>respect to a given source, chances are high that all of them are IMHO.
>
I liked your example-- point taken. I will modify the code and docs to
delete ALL targetfiles if ANY are out of date.

>But I see your point of using a fileset for target files as well,
>there are situations where you don't know all the names of the target
>files or it would be cumbersome to list them.
>
>Maybe we'd need a targetfileset and a targetfilelist or similar, where
>a missing file from the targetfilelist would be treated the same way
>as an out-of-date target file?
>
I see your point here as well. If someone has gone to all the trouble
to explicitly name target files, as in

<targetfiles dir="mumble" includes="foo.html,bar.html,baz.html"/>

rather than simply

<targetfiles dir="mumble" includes="*.html"/>

then if one of them does not exist, ALL should be removed in the
interest of correctness. I will try to make a special note of this
intricacy somehow in the docs...

>>A real issue: what should we do when one or more SOURCES (not targets)
>>don't exist?  In the example above, that would mean that a
>>buildfile does not exist.  This is not so preposterous when you
>>realize that I might be generating my buildfile using XSLT or
>>whatnot.
>>
>Then place the dependset task after the one that generates the build
>file?
>
Yes, in general I think you are right-- you would place the call to
<dependset> in logical order -after- the rule that generates the
buildfile or common XSLT include, etc... but see below.

>>Other times you might want this situation to be interpreted as if
>>the target files are NOT out of date.  For example, you might have a
>>boilerplate <dependset> rule that includes "*.dtd".  But one of your
>>projects does not yet happen to have any DTD files.  Setting
>>ignore=true means that this will not result in any target files
>>being removed.
>>
>Searching for sources that have an out-of-date target instead of
>targets that are out of date would eliminate this problem; you
>wouldn't even consider the *.dtd part.
>
I don't understand your argument here. If <srcfiles> specifies "*.dtd"
and there aren't any DTD files, I wouldn't consider it either way. The
problem arises when <srcfiles> specifies an _explicit_ file like
"foo.dtd" but "foo.dtd" does not exist. (I should have used that for
my earlier example...)

I still think that you may want to give the buildfile writer a choice
in how to handle this situation, don't you agree?

--Craeg


Re: [SUBMIT] New task, a task to manage arbitrary dependencies between files

Posted by Stefan Bodewig <bo...@apache.org>.
On Fri, 27 Jul 2001, Craeg K. Strong <cs...@arielpartners.com>
wrote:

> an example might help.  
>
> <dependset>
>    <srcfiles dir="bar" includes="my-buildfile.xml"/>
>    <targetfiles dir="foo" includes="**/*.html"/>
>    <targetfiles dir="baz" includes="**/*.class"/>
> </dependset>
> 
> Here we want to ensure that our Java and XML source files get
> recompiled if the buildfile changes.
> 
> If any of the HTML or class files does not exist yet, no problem--
> they will be generated in the usual way using <javac>, <style>, or
> whatever.

In your case, yes, but what if the task doesn't know it has to run
because it cannot compute the name of the missing file?

Let's take a silly example - my <style> (using redirect extensions or
something) generates two .html files - the name of one of them will be
passed in as a parameter, something like

<style in="source.xml" 
       out="output1.html"
       style="style-with-redirect.xsl">
  <param name="other-file"
         value="${a-property-only-known-at-build-time}" />
</style>

This style task won't run if it gets invoked in two consecutive builds
with different values of the property (as the task doesn't know about
the second file).  If you use a targetfiles-fileset, you won't catch
that either.

> If any of the HTML or class files is NEWER than the buildfile, I
> don't need to remove them.

Also not true in my scenario above - maybe output1.html contains a
link to the second file?  If any target file is out of date with
respect to a given source, chances are high that all of them are IMHO.

But I see your point of using a fileset for target files as well,
there are situations where you don't know all the names of the target
files or it would be cumbersome to list them.

Maybe we'd need a targetfileset and a targetfilelist or similar, where
a missing file from the targetfilelist would be treated the same way
as an out-of-date target file?

I still would remove all targets if one is out-of-date.

> The reason I use a fileset for the sources and not just a list of
> files is for convenience.

No argument here.

> A real issue: what should we do when one or more SOURCES (not targets)
> don't exist?  In the example above, that would mean that a
> buildfile does not exist.  This is not so preposterous when you
> realize that I might be generating my buildfile using XSLT or
> whatnot.

Then place the dependset task after the one that generates the build
file?

> Other times you might want this situation to be interpreted as if
> the target files are NOT out of date.  For example, you might have a
> boilerplate <dependset> rule that includes "*.dtd".  But one of your
> projects does not yet happen to have any DTD files.  Setting
> ignore=true means that this will not result in any target files
> being removed.

Searching for sources that have an out-of-date target instead of
targets that are out of date would eliminate this problem; you
wouldn't even consider the *.dtd part.

Stefan

Re: [SUBMIT] New task, a task to manage arbitrary dependencies between files

Posted by "Craeg K. Strong" <cs...@arielpartners.com>.
>this task (which will be
>a great built-in task IMHO).
>
Thanks!

>>DESCRIPTION
>>
>>The dependset task compares a set of source files with a set of
>>target files. If any of the source files is more recent than any of
>>the target files, all of the target files are removed.
>>
>
>Yes, this is the way it would be most useful. 8-) But the way you've
>implemented it, seems to be backwards to me.
>
>You compare each file from the target filesets to all files from the
>source filesets with some special treatment for target files that
>don't have a corresponding source.  This will not capture missing
>target files.  You then determine all target files that are out of
>date with respect to any source file and delete these files.
>
>To me the more logical approach would be to compare source files to
>target files, not giving targets as filesets but as a list of
>filenames for the targets and remove all targets (not just the
>outdated ones) if any of the source files is newer than the target
>files or a target file doesn't exist?
>
The logic _is_ a bit confusing. I had to wrestle with this a bit
myself. I believe an example might help.

<dependset>
   <srcfiles dir="bar" includes="my-buildfile.xml"/>
   <targetfiles dir="foo" includes="**/*.html"/>
   <targetfiles dir="baz" includes="**/*.class"/>
</dependset>

Here we want to ensure that our Java and XML source files get
recompiled if the buildfile changes.

If any of the HTML or class files does not exist yet, no problem-- they
will be generated in the usual way using <javac>, <style>, or whatever.

If any of the HTML or class files is NEWER than the buildfile, I don't
need to remove them. They are OK.

If any of the HTML or class files is OLDER than the buildfile, nuke 'em.

So-- I pass the target filesets to the SourceFileScanner as the files
parameter. I pass the source fileset ("my-buildfile.xml") to the
SourceFileScanner using a MergingMapper. I tell the SourceFileScanner
to restrict using the OLDER-THAN comparator. I get back the set of
files that are OLDER than the buildfile. I delete them.
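The steps above can be sketched without any Ant classes, assuming sources and targets are already resolved to java.io.File lists (class and method names are hypothetical). The merging mapper becomes "every target maps to the whole source list", and the OLDER-THAN comparator becomes a timestamp test.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the inverted scan, without Ant classes: targets play
 * the role of the scanner's input files, every target is "merged" onto the
 * same list of sources, and a target is selected when it is OLDER than any
 * existing source.
 */
public class InvertedScan {

    /** Returns existing targets that are older than at least one existing source. */
    public static List<File> staleTargets(List<File> targets, List<File> sources) {
        List<File> stale = new ArrayList<>();
        for (File target : targets) {
            if (!target.exists()) {
                continue; // a missing target is never reported by this approach
            }
            // Merging-mapper step: every target maps to the whole source list.
            for (File source : sources) {
                // OLDER-THAN comparator step.
                if (source.exists() && target.lastModified() < source.lastModified()) {
                    stale.add(target);
                    break;
                }
            }
        }
        return stale;
    }

    /** Final step: delete the selected files; returns how many were removed. */
    public static int delete(List<File> files) {
        int removed = 0;
        for (File f : files) {
            if (f.delete()) {
                removed++;
            }
        }
        return removed;
    }
}
```

This follows the procedure as described at this point, deleting only the stale targets; elsewhere in this thread the behaviour is changed to delete ALL targets if ANY are out of date.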

The reason I use a fileset for the sources and not just a list of files 
is for convenience.
- you might have a whole directory of buildfiles, sub-buildfiles, 
included buildfiles, etc.
- you might have a set of included XSLT "common" transformers
- you might have a whole directory of DTD files
- etc.

Using FileSets enables you to easily account for all of these in a
single invocation of <dependset>. In fact, this just about sums up our
own build environment :-)

***********************

A real issue: what should we do when one or more SOURCES (not targets)
don't exist? In the example above, that would mean that a buildfile
does not exist. This is not so preposterous when you realize that I
might be generating my buildfile using XSLT or whatnot.

This is where I believe you need a choice: to ignore or not to ignore.

NOT IGNORE: Sometimes you want this situation to be interpreted as
though ALL target files are out of date. For example, there is some
intermediate file (a copyright notice?) that has not been generated
yet but needs to be incorporated into all target files-- so kill 'em
all...

IGNORE: Other times you might want this situation to be interpreted
as if the target files are NOT out of date. For example, you might
have a boilerplate <dependset> rule that includes "*.dtd". But one of
your projects does not yet happen to have any DTD files. Setting
ignore=true means that this will not result in any target files being
removed.
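A minimal sketch of that choice, assuming a hypothetical two-valued policy. The attribute name and values were still under discussion (compare missingSource="outofdate" and ignore=true in this thread), so Mode and its constants are illustrative only.

```java
import java.io.File;
import java.util.List;

/** Hypothetical sketch of the ignore / not-ignore choice for missing sources. */
public class MissingSourcePolicy {

    /** IGNORE skips missing sources; OUT_OF_DATE lets any missing source condemn all targets. */
    public enum Mode { IGNORE, OUT_OF_DATE }

    /** Returns true if all targets should be removed. */
    public static boolean removeTargets(List<File> sources, List<File> targets, Mode mode) {
        long oldestTarget = Long.MAX_VALUE;
        for (File target : targets) {
            if (target.exists()) {
                oldestTarget = Math.min(oldestTarget, target.lastModified());
            }
        }
        for (File source : sources) {
            if (!source.exists()) {
                if (mode == Mode.OUT_OF_DATE) {
                    return true; // NOT IGNORE: e.g. a not-yet-generated copyright notice
                }
                continue; // IGNORE: behave as if this source were not listed
            }
            if (source.lastModified() > oldestTarget) {
                return true;
            }
        }
        return false;
    }
}
```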

>>Therefore, I chose to parameterize SourceFileScanner's comparison
>>algorithm.  By default it works as before, but you can invert it by
>>choosing a different FileComparator a la Strategy pattern.
>>
>Even though you wouldn't need it if you followed my route, I think
>this is a useful extension and a better approach than subclassing
>IMHO.  We may need to compare other things than timestamps in the
>future ...
>
agreed

>>You will notice that I also had to enhance MergingMapper slightly to
>>accommodate multiple files.
>>
>Here I might have created a new Mapper type instead, but it works for me
>both ways.
>
agreed.  Does anyone in ant-dev land have a strong opinion about this?

>Stefan
>
Craeg



Re: [SUBMIT] New task, a task to manage arbitrary dependencies between files

Posted by Stefan Bodewig <bo...@apache.org>.
I have some issues with the implementation of this task (which will be
a great built-in task IMHO).

On Wed, 18 Jul 2001, Craeg K. Strong <cs...@arielpartners.com>
wrote:

> DESCRIPTION
> 
> The dependset task compares a set of source files with a set of
> target files. If any of the source files is more recent than any of
> the target files, all of the target files are removed.

Yes, this is the way it would be most useful. 8-) But the way you've
implemented it, seems to be backwards to me.

You compare each file from the target filesets to all files from the
source filesets with some special treatment for target files that
don't have a corresponding source.  This will not capture missing
target files.  You then determine all target files that are out of
date with respect to any source file and delete these files.

To me the more logical approach would be to compare source files to
target files, not giving targets as filesets but as a list of
filenames for the targets and remove all targets (not just the
outdated ones) if any of the source files is newer than the target
files or a target file doesn't exist?
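A rough sketch of this alternative, assuming the targets arrive as an explicit list of names already resolved against a directory (class and method names are hypothetical). Because the target list is explicit, a missing target can be detected and treated as out of date, and ALL targets are then removed.

```java
import java.io.File;
import java.util.List;

/**
 * Hypothetical sketch of the source-driven alternative: targets are an
 * explicit list, so a missing one can be detected; all targets are removed
 * when any target is missing or any source is newer than any target.
 */
public class SourceDrivenCheck {

    public static boolean mustRebuild(List<File> sources, List<File> namedTargets) {
        long oldestTarget = Long.MAX_VALUE;
        for (File target : namedTargets) {
            if (!target.exists()) {
                return true; // a named target that does not exist counts as out of date
            }
            oldestTarget = Math.min(oldestTarget, target.lastModified());
        }
        for (File source : sources) {
            if (source.exists() && source.lastModified() > oldestTarget) {
                return true;
            }
        }
        return false;
    }

    /** Removes every named target, not just the outdated ones. */
    public static void removeAll(List<File> namedTargets) {
        for (File target : namedTargets) {
            target.delete(); // File.delete() simply returns false for a missing file
        }
    }
}
```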

Maybe I'm a little bit miswired this morning and totally wrong.

> Therefore, I chose to parameterize SourceFileScanner's comparison
> algorithm.  By default it works as before, but you can invert it by
> choosing a different FileComparator a la Strategy pattern.

Even though you wouldn't need it if you followed my route, I think
this is a useful extension and a better approach than subclassing
IMHO.  We may need to compare other things than timestamps in the
future ...

> You will notice that I also had to enhance MergingMapper slightly to
> accommodate multiple files.

Here I might have created a new Mapper type instead, but it works for me
both ways.

Stefan