You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@ant.apache.org by Nicolas Lalevée <ni...@hibnet.org> on 2010/06/14 19:28:17 UTC

A performance issue in 'restrict'

When working on IvyDE's build to get dependencies from Eclipse mirrors, I tried to use resourcelist, as documented in the examples of the doc [1].

But it was very slow because Ant is checking every urls and some mirrors are not very responsive. So I had to workaround and filter the urls to select only the first one and hoping it is available:

<copy todir="${basedir}">
    <first>
        <restrict>
            <resourcelist>
                <!-- get the xml mirror list -->
                <url url="${hudson.download.baseurl}?file=@{dropdir}/@{file}&amp;protocol=http&amp;format=xml" />
                <filterchain> <!-- change the xml into flat url list -->
                    <linecontainsregexp>
                        <regexp pattern="^.*mirror url=&quot;([^&quot;]*)&quot;.*$" />
                    </linecontainsregexp>
                    <tokenfilter>
                        <replaceregex pattern="^.*mirror url=&quot;([^&quot;]*)&quot;.*$" replace="\1" flags="gi" />
                    </tokenfilter>
                    <headfilter lines="1" /> <!-- HACK to force Ant to not check every urls -->
                </filterchain>
            </resourcelist>
            <exists/> <!-- restrict to only responsive mirror--> 
        </restrict>
    </first> <!-- copy the first responsive one -->
    <flattenmapper/> <!-- avoid the useless creation of folders-->
</copy>

I looked into the code, it seems org.apache.tools.ant.types.resources.Restrict is the culprit. It uses BaseResourceCollectionWrapper which loads the entire underlying resource collection.
I searched for the use of BaseResourceCollectionWrapper in Ant and I think that this performance issue may also affect 
org.apache.tools.ant.types.resources.SizeLimitCollection and org.apache.tools.ant.types.resources.Tokens.

I started to think about a fix but it seems non trivial. If I'm not mistaken we would have to implement an iterator which should deal with an optionnal cache and concurrency.
I am also worried about the isFilesystemOnly implementation in Restrict. Should it return true if actually selected resources are files, or return true if the entire set of candidates are files ? In the first case it would make the iterator useless.

Side note: I didn't checked that piece of script in as our Hudson instance doesn't have Ant 1.8 installed. Not yet. I'll ask for it.

Nicolas

[1] http://ant.apache.org/manual/Types/resources.html#resourcelist

PS: I started recently to intensively use loadresource and all the resource related stuff in my build scripts, it is amazingly porwerful ! It reminds me when I was 'pipelining' in cocoon ;)


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@ant.apache.org
For additional commands, e-mail: dev-help@ant.apache.org

Re: A performance issue in 'restrict'

Posted by Nicolas Lalevée <ni...@hibnet.org>.

Le 19 juin 2010 à 15:05, Nicolas Lalevée a écrit :

> 
> Le 15 juin 2010 à 18:38, Matt Benson a écrit :
> 
>> On 6/15/10, Nicolas Lalevée <ni...@hibnet.org> wrote:
>>> On Monday 14 June 2010 20:01:08 Matt Benson wrote:
>>>> Hi, Nicolas:
>>>> 
>>>> [SNIP]
>>>> 
>>>>> I looked into the code, it seems
>>>>> org.apache.tools.ant.types.resources.Restrict is the culprit. It uses
>>>>> BaseResourceCollectionWrapper which loads the entire underlying resource
>>>>> collection.
>>>>> I searched for the use of BaseResourceCollectionWrapper in Ant and I
>>>>> think that this performance issue may also affect
>>>>> org.apache.tools.ant.types.resources.SizeLimitCollection and
>>>>> org.apache.tools.ant.types.resources.Tokens.
>>>>> 
>>>>> I started to think about a fix but it seems non trivial. If I'm not
>>>>> mistaken we would have to implement an iterator which should deal with
>>>>> an
>>>>> optionnal cache and concurrency.
>>>> 
>>>> While it would be possible to reimplement Restrict's internal wrapper
>>>> such that it would dynamically calculate the included resources while
>>>> iterating, I don't see any way to do the same when the *size* of the
>>>> collection is requested...
>>> 
>>> Yep if size is called then the whole collection has to be "fetched".
>>> I just hope that in my use case where I just want the first element of the
>>> restrict, the whole resource collection won't be fetched. And as I keep
>>> looking into the code, it seems that the implementation of
>>> org.apache.tools.ant.types.resources.First does that. So size() shouldn't be
>>> an issue here.
>>> 
>>>>> I am also worried about the isFilesystemOnly implementation in Restrict.
>>>>> Should it return true if actually selected resources are files, or
>>>>> return
>>>>> true if the entire set of candidates are files ? In the first case it
>>>>> would make the iterator useless.
>>>> 
>>>> Typically the implementation of #isFilesystemOnly() in
>>>> BaseResourceCollectionWrapper and BaseResourceCollectionContainer is a
>>>> quick check whether the underlying collection/s is/are known to be
>>>> entirely filesystem-based.  If not, each resource is checked in turn
>>>> until one is found that is *not* filesystem-based.  The reason for the
>>>> distinction is to provide a means of recognizing whether it should be
>>>> possible to handle a given resourcecollection using a task that can
>>>> only work with files.
>>> 
>>> So I am suggesting a little change on the implementation: removing the check
>>> isFilesystemOnly on the underlying resource collection; just keep the
>>> iteration on the actual resources of the underlying collection.
>>> I don't think it will change the semantic of the function, right?
>>> About performance, in my use case it should behave better as the <first>
>>> will
>>> only load one resource in the collection.
>>> I think there could be a performance impact where a resource collection is
>>> known to be file system only, and it is used with a task that cannot support
>>> that. The resources will be loaded but not actually used as the build will
>>> fail.
>>> So it seems a choice between:
>>> * useless iteration on some resources which can have high latency
>>> * useless iteration on some resources which should be on a filsytem and
>>> trigger a failed build.
>>> 
>>> I prefer the second choice. I might not be impartial though ;)
>>> 
>> 
>> I feel like if we get stuck on #isFilesystemOnly(), we're going to
>> lose sight of the important issue here.  This kind of diagnostic
>> method must, by definition, ultimately survey each resource, but
>> checking against the nested collection/s is a valuable optimization.
>> Consider the case where you may wrap a <first> in a <sort> in a
>> <restrict> of some other resourcecollection.  If each of these has to
>> fully iterate, that amounts to greater latency than attempting to
>> delegate that call all the way back to the innermost nested
>> collection, does it not?
> 
> When we use the iterator on the top resource collection, we will only get the necessary resources. <first> will only fetch one resource. Because of <sort>, actually every resource will be loaded, but it is only due to the sort function, which algorithm force use to fetch every resource even if we want only one. So in that case, yes, asking for the underlying collection if it is filesystem only is an optimization. This is the second case I pointed above. The optimization is valid only on structurally filesystem only resource collection.
> 
> Maybe the optimization should be more precise when querying the underlying resource collection. Today there is a check if the underlying collection actually contains filesystem only resources. Whereas it could just check that the underlying resource collection is structurally a filesystem resource collection; and if not then the top most resource collection will iterate on the resources.
> The improvement should then to add a new function isFileSystemOnlyCollection which should not check actual resources but check only its own "type". ResourceCollection being an interface we cannot change it without breaking backward compatibility. So we are back to the above two "bad" choices.
> 
> 
>> 
>> The bigger issue is whether we can figure out a way to increase
>> throughput by delegating iterator calls.  The problem is that
>> BaseResourceCollectionWrapper and BaseResourceCollectionContainer both
>> implement the cache attribute by caching a collection of resources.
>> We could explore making the implementation of the cached collection
>> more efficient.  Once again, my understanding of your wider purpose
>> here is, given:
>> 
>> <first>
>> <restrict>
>>   <someotherresourcecollection />
>> </restrict>
>> </first>
>> 
>> You would like for the innermost collection to iterate only until
>> restrict accepts one of its resources.  Perhaps the original
>> implementation is naive in its simplicity.  It has been several years
>> now since the bulk of that code was added to Ant;
> 
> There is always room for some optimization; I prefer a lot seeing not-yet-improved code that already buggy code ;)
> 
>> if we can figure out
>> how to optimize it now I would be perfectly happy with that, but I
>> don't think #isFilesystemOnly() is germane to that efficiency
>> discussion.
> 
> you are right. I wondered about isFilesystemOnly because if a task happens to call that function, then a smart and lazy iterator would be useless.
> 
> I'll try to implement that iterator.

I have implemented it in r958669.
I did some unit tests and some real test with the Eclipse mirror use case. It works definitively better.

Nicolas

> 
> Nicolas
> 
> 
>> 
>> -Matt
>> 
>>> Nicolas
>>> 
>>> 
>>>> 
>>>> -Matt
>>>> 
>>>>> Side note: I didn't checked that piece of script in as our Hudson
>>>>> instance doesn't have Ant 1.8 installed. Not yet. I'll ask for it.
>>>>> 
>>>>> Nicolas
>>>>> 
>>>>> [1] http://ant.apache.org/manual/Types/resources.html#resourcelist
>>>>> 
>>>>> PS: I started recently to intensively use loadresource and all the
>>>>> resource related stuff in my build scripts, it is amazingly porwerful !
>>>>> It reminds me when I was 'pipelining' in cocoon ;)
>>>>> 
>>>>> 
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: dev-unsubscribe@ant.apache.org
>>>>> For additional commands, e-mail: dev-help@ant.apache.org
>>>> 
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscribe@ant.apache.org
>>>> For additional commands, e-mail: dev-help@ant.apache.org
>>> 
>>> 
>>> 
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe@ant.apache.org
>>> For additional commands, e-mail: dev-help@ant.apache.org
>>> 
>>> 
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@ant.apache.org
>> For additional commands, e-mail: dev-help@ant.apache.org
>> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@ant.apache.org
> For additional commands, e-mail: dev-help@ant.apache.org
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@ant.apache.org
For additional commands, e-mail: dev-help@ant.apache.org

Re: A performance issue in 'restrict'

Posted by Nicolas Lalevée <ni...@hibnet.org>.

Le 15 juin 2010 à 18:38, Matt Benson a écrit :

> On 6/15/10, Nicolas Lalevée <ni...@hibnet.org> wrote:
>> On Monday 14 June 2010 20:01:08 Matt Benson wrote:
>>> Hi, Nicolas:
>>> 
>>> [SNIP]
>>> 
>>>> I looked into the code, it seems
>>>> org.apache.tools.ant.types.resources.Restrict is the culprit. It uses
>>>> BaseResourceCollectionWrapper which loads the entire underlying resource
>>>> collection.
>>>> I searched for the use of BaseResourceCollectionWrapper in Ant and I
>>>> think that this performance issue may also affect
>>>> org.apache.tools.ant.types.resources.SizeLimitCollection and
>>>> org.apache.tools.ant.types.resources.Tokens.
>>>> 
>>>> I started to think about a fix but it seems non trivial. If I'm not
>>>> mistaken we would have to implement an iterator which should deal with
>>>> an
>>>> optionnal cache and concurrency.
>>> 
>>> While it would be possible to reimplement Restrict's internal wrapper
>>> such that it would dynamically calculate the included resources while
>>> iterating, I don't see any way to do the same when the *size* of the
>>> collection is requested...
>> 
>> Yep if size is called then the whole collection has to be "fetched".
>> I just hope that in my use case where I just want the first element of the
>> restrict, the whole resource collection won't be fetched. And as I keep
>> looking into the code, it seems that the implementation of
>> org.apache.tools.ant.types.resources.First does that. So size() shouldn't be
>> an issue here.
>> 
>>>> I am also worried about the isFilesystemOnly implementation in Restrict.
>>>> Should it return true if actually selected resources are files, or
>>>> return
>>>> true if the entire set of candidates are files ? In the first case it
>>>> would make the iterator useless.
>>> 
>>> Typically the implementation of #isFilesystemOnly() in
>>> BaseResourceCollectionWrapper and BaseResourceCollectionContainer is a
>>> quick check whether the underlying collection/s is/are known to be
>>> entirely filesystem-based.  If not, each resource is checked in turn
>>> until one is found that is *not* filesystem-based.  The reason for the
>>> distinction is to provide a means of recognizing whether it should be
>>> possible to handle a given resourcecollection using a task that can
>>> only work with files.
>> 
>> So I am suggesting a little change on the implementation: removing the check
>> isFilesystemOnly on the underlying resource collection; just keep the
>> iteration on the actual resources of the underlying collection.
>> I don't think it will change the semantic of the function, right?
>> About performance, in my use case it should behave better as the <first>
>> will
>> only load one resource in the collection.
>> I think there could be a performance impact where a resource collection is
>> known to be file system only, and it is used with a task that cannot support
>> that. The resources will be loaded but not actually used as the build will
>> fail.
>> So it seems a choice between:
>> * useless iteration on some resources which can have high latency
>> * useless iteration on some resources which should be on a filsytem and
>> trigger a failed build.
>> 
>> I prefer the second choice. I might not be impartial though ;)
>> 
> 
> I feel like if we get stuck on #isFilesystemOnly(), we're going to
> lose sight of the important issue here.  This kind of diagnostic
> method must, by definition, ultimately survey each resource, but
> checking against the nested collection/s is a valuable optimization.
> Consider the case where you may wrap a <first> in a <sort> in a
> <restrict> of some other resourcecollection.  If each of these has to
> fully iterate, that amounts to greater latency than attempting to
> delegate that call all the way back to the innermost nested
> collection, does it not?

When we use the iterator on the top resource collection, we will only get the necessary resources. <first> will only fetch one resource. Because of <sort>, actually every resource will be loaded, but it is only due to the sort function, which algorithm force use to fetch every resource even if we want only one. So in that case, yes, asking for the underlying collection if it is filesystem only is an optimization. This is the second case I pointed above. The optimization is valid only on structurally filesystem only resource collection.

Maybe the optimization should be more precise when querying the underlying resource collection. Today there is a check if the underlying collection actually contains filesystem only resources. Whereas it could just check that the underlying resource collection is structurally a filesystem resource collection; and if not then the top most resource collection will iterate on the resources.
The improvement should then to add a new function isFileSystemOnlyCollection which should not check actual resources but check only its own "type". ResourceCollection being an interface we cannot change it without breaking backward compatibility. So we are back to the above two "bad" choices.


> 
> The bigger issue is whether we can figure out a way to increase
> throughput by delegating iterator calls.  The problem is that
> BaseResourceCollectionWrapper and BaseResourceCollectionContainer both
> implement the cache attribute by caching a collection of resources.
> We could explore making the implementation of the cached collection
> more efficient.  Once again, my understanding of your wider purpose
> here is, given:
> 
> <first>
>  <restrict>
>    <someotherresourcecollection />
>  </restrict>
> </first>
> 
> You would like for the innermost collection to iterate only until
> restrict accepts one of its resources.  Perhaps the original
> implementation is naive in its simplicity.  It has been several years
> now since the bulk of that code was added to Ant;

There is always room for some optimization; I prefer a lot seeing not-yet-improved code that already buggy code ;)

> if we can figure out
> how to optimize it now I would be perfectly happy with that, but I
> don't think #isFilesystemOnly() is germane to that efficiency
> discussion.

you are right. I wondered about isFilesystemOnly because if a task happens to call that function, then a smart and lazy iterator would be useless.

I'll try to implement that iterator.

Nicolas


> 
> -Matt
> 
>> Nicolas
>> 
>> 
>>> 
>>> -Matt
>>> 
>>>> Side note: I didn't checked that piece of script in as our Hudson
>>>> instance doesn't have Ant 1.8 installed. Not yet. I'll ask for it.
>>>> 
>>>> Nicolas
>>>> 
>>>> [1] http://ant.apache.org/manual/Types/resources.html#resourcelist
>>>> 
>>>> PS: I started recently to intensively use loadresource and all the
>>>> resource related stuff in my build scripts, it is amazingly porwerful !
>>>> It reminds me when I was 'pipelining' in cocoon ;)
>>>> 
>>>> 
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscribe@ant.apache.org
>>>> For additional commands, e-mail: dev-help@ant.apache.org
>>> 
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe@ant.apache.org
>>> For additional commands, e-mail: dev-help@ant.apache.org
>> 
>> 
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@ant.apache.org
>> For additional commands, e-mail: dev-help@ant.apache.org
>> 
>> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@ant.apache.org
> For additional commands, e-mail: dev-help@ant.apache.org
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@ant.apache.org
For additional commands, e-mail: dev-help@ant.apache.org

Re: A performance issue in 'restrict'

Posted by Matt Benson <gu...@gmail.com>.

On 6/15/10, Nicolas Lalevée <ni...@hibnet.org> wrote:
> On Monday 14 June 2010 20:01:08 Matt Benson wrote:
>> Hi, Nicolas:
>>
>> [SNIP]
>>
>> > I looked into the code, it seems
>> > org.apache.tools.ant.types.resources.Restrict is the culprit. It uses
>> > BaseResourceCollectionWrapper which loads the entire underlying resource
>> > collection.
>> > I searched for the use of BaseResourceCollectionWrapper in Ant and I
>> > think that this performance issue may also affect
>> > org.apache.tools.ant.types.resources.SizeLimitCollection and
>> > org.apache.tools.ant.types.resources.Tokens.
>> >
>> > I started to think about a fix but it seems non trivial. If I'm not
>> > mistaken we would have to implement an iterator which should deal with
>> > an
>> > optionnal cache and concurrency.
>>
>> While it would be possible to reimplement Restrict's internal wrapper
>> such that it would dynamically calculate the included resources while
>> iterating, I don't see any way to do the same when the *size* of the
>> collection is requested...
>
> Yep if size is called then the whole collection has to be "fetched".
> I just hope that in my use case where I just want the first element of the
> restrict, the whole resource collection won't be fetched. And as I keep
> looking into the code, it seems that the implementation of
> org.apache.tools.ant.types.resources.First does that. So size() shouldn't be
> an issue here.
>
>> > I am also worried about the isFilesystemOnly implementation in Restrict.
>> > Should it return true if actually selected resources are files, or
>> > return
>> > true if the entire set of candidates are files ? In the first case it
>> > would make the iterator useless.
>>
>> Typically the implementation of #isFilesystemOnly() in
>> BaseResourceCollectionWrapper and BaseResourceCollectionContainer is a
>> quick check whether the underlying collection/s is/are known to be
>> entirely filesystem-based.  If not, each resource is checked in turn
>> until one is found that is *not* filesystem-based.  The reason for the
>> distinction is to provide a means of recognizing whether it should be
>> possible to handle a given resourcecollection using a task that can
>> only work with files.
>
> So I am suggesting a little change on the implementation: removing the check
> isFilesystemOnly on the underlying resource collection; just keep the
> iteration on the actual resources of the underlying collection.
> I don't think it will change the semantic of the function, right?
> About performance, in my use case it should behave better as the <first>
> will
> only load one resource in the collection.
> I think there could be a performance impact where a resource collection is
> known to be file system only, and it is used with a task that cannot support
> that. The resources will be loaded but not actually used as the build will
> fail.
> So it seems a choice between:
>  * useless iteration on some resources which can have high latency
>  * useless iteration on some resources which should be on a filsytem and
> trigger a failed build.
>
> I prefer the second choice. I might not be impartial though ;)
>

I feel like if we get stuck on #isFilesystemOnly(), we're going to
lose sight of the important issue here.  This kind of diagnostic
method must, by definition, ultimately survey each resource, but
checking against the nested collection/s is a valuable optimization.
Consider the case where you may wrap a <first> in a <sort> in a
<restrict> of some other resourcecollection.  If each of these has to
fully iterate, that amounts to greater latency than attempting to
delegate that call all the way back to the innermost nested
collection, does it not?

The bigger issue is whether we can figure out a way to increase
throughput by delegating iterator calls.  The problem is that
BaseResourceCollectionWrapper and BaseResourceCollectionContainer both
implement the cache attribute by caching a collection of resources.
We could explore making the implementation of the cached collection
more efficient.  Once again, my understanding of your wider purpose
here is, given:

<first>
  <restrict>
    <someotherresourcecollection />
  </restrict>
</first>

You would like for the innermost collection to iterate only until
restrict accepts one of its resources.  Perhaps the original
implementation is naive in its simplicity.  It has been several years
now since the bulk of that code was added to Ant; if we can figure out
how to optimize it now I would be perfectly happy with that, but I
don't think #isFilesystemOnly() is germane to that efficiency
discussion.

-Matt

> Nicolas
>
>
>>
>> -Matt
>>
>> > Side note: I didn't checked that piece of script in as our Hudson
>> > instance doesn't have Ant 1.8 installed. Not yet. I'll ask for it.
>> >
>> > Nicolas
>> >
>> > [1] http://ant.apache.org/manual/Types/resources.html#resourcelist
>> >
>> > PS: I started recently to intensively use loadresource and all the
>> > resource related stuff in my build scripts, it is amazingly porwerful !
>> > It reminds me when I was 'pipelining' in cocoon ;)
>> >
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: dev-unsubscribe@ant.apache.org
>> > For additional commands, e-mail: dev-help@ant.apache.org
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@ant.apache.org
>> For additional commands, e-mail: dev-help@ant.apache.org
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@ant.apache.org
> For additional commands, e-mail: dev-help@ant.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@ant.apache.org
For additional commands, e-mail: dev-help@ant.apache.org

Re: A performance issue in 'restrict'

Posted by Nicolas Lalevée <ni...@hibnet.org>.

On Monday 14 June 2010 20:01:08 Matt Benson wrote:
> Hi, Nicolas:
>
> [SNIP]
>
> > I looked into the code, it seems
> > org.apache.tools.ant.types.resources.Restrict is the culprit. It uses
> > BaseResourceCollectionWrapper which loads the entire underlying resource
> > collection.
> > I searched for the use of BaseResourceCollectionWrapper in Ant and I
> > think that this performance issue may also affect
> > org.apache.tools.ant.types.resources.SizeLimitCollection and
> > org.apache.tools.ant.types.resources.Tokens.
> >
> > I started to think about a fix but it seems non trivial. If I'm not
> > mistaken we would have to implement an iterator which should deal with an
> > optionnal cache and concurrency.
>
> While it would be possible to reimplement Restrict's internal wrapper
> such that it would dynamically calculate the included resources while
> iterating, I don't see any way to do the same when the *size* of the
> collection is requested...

Yep if size is called then the whole collection has to be "fetched".
I just hope that in my use case where I just want the first element of the 
restrict, the whole resource collection won't be fetched. And as I keep 
looking into the code, it seems that the implementation of 
org.apache.tools.ant.types.resources.First does that. So size() shouldn't be 
an issue here.

> > I am also worried about the isFilesystemOnly implementation in Restrict.
> > Should it return true if actually selected resources are files, or return
> > true if the entire set of candidates are files ? In the first case it
> > would make the iterator useless.
>
> Typically the implementation of #isFilesystemOnly() in
> BaseResourceCollectionWrapper and BaseResourceCollectionContainer is a
> quick check whether the underlying collection/s is/are known to be
> entirely filesystem-based.  If not, each resource is checked in turn
> until one is found that is *not* filesystem-based.  The reason for the
> distinction is to provide a means of recognizing whether it should be
> possible to handle a given resourcecollection using a task that can
> only work with files.

So I am suggesting a little change on the implementation: removing the check 
isFilesystemOnly on the underlying resource collection; just keep the 
iteration on the actual resources of the underlying collection.
I don't think it will change the semantic of the function, right?
About performance, in my use case it should behave better as the <first> will 
only load one resource in the collection.
I think there could be a performance impact where a resource collection is 
known to be file system only, and it is used with a task that cannot support 
that. The resources will be loaded but not actually used as the build will 
fail.
So it seems a choice between:
 * useless iteration on some resources which can have high latency
 * useless iteration on some resources which should be on a filsytem and 
trigger a failed build.

I prefer the second choice. I might not be impartial though ;)

Nicolas

>
> -Matt
>
> > Side note: I didn't checked that piece of script in as our Hudson
> > instance doesn't have Ant 1.8 installed. Not yet. I'll ask for it.
> >
> > Nicolas
> >
> > [1] http://ant.apache.org/manual/Types/resources.html#resourcelist
> >
> > PS: I started recently to intensively use loadresource and all the
> > resource related stuff in my build scripts, it is amazingly porwerful !
> > It reminds me when I was 'pipelining' in cocoon ;)
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@ant.apache.org
> > For additional commands, e-mail: dev-help@ant.apache.org
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@ant.apache.org
> For additional commands, e-mail: dev-help@ant.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@ant.apache.org
For additional commands, e-mail: dev-help@ant.apache.org

Re: A performance issue in 'restrict'

Posted by Matt Benson <gu...@gmail.com>.

Hi, Nicolas:

[SNIP]
> I looked into the code, it seems
> org.apache.tools.ant.types.resources.Restrict is the culprit. It uses
> BaseResourceCollectionWrapper which loads the entire underlying resource
> collection.
> I searched for the use of BaseResourceCollectionWrapper in Ant and I think
> that this performance issue may also affect
> org.apache.tools.ant.types.resources.SizeLimitCollection and
> org.apache.tools.ant.types.resources.Tokens.
>
> I started to think about a fix but it seems non trivial. If I'm not mistaken
> we would have to implement an iterator which should deal with an optionnal
> cache and concurrency.

While it would be possible to reimplement Restrict's internal wrapper
such that it would dynamically calculate the included resources while
iterating, I don't see any way to do the same when the *size* of the
collection is requested...

> I am also worried about the isFilesystemOnly implementation in Restrict.
> Should it return true if actually selected resources are files, or return
> true if the entire set of candidates are files ? In the first case it would
> make the iterator useless.
>

Typically the implementation of #isFilesystemOnly() in
BaseResourceCollectionWrapper and BaseResourceCollectionContainer is a
quick check whether the underlying collection/s is/are known to be
entirely filesystem-based.  If not, each resource is checked in turn
until one is found that is *not* filesystem-based.  The reason for the
distinction is to provide a means of recognizing whether it should be
possible to handle a given resourcecollection using a task that can
only work with files.

-Matt

> Side note: I didn't checked that piece of script in as our Hudson instance
> doesn't have Ant 1.8 installed. Not yet. I'll ask for it.
>
> Nicolas
>
> [1] http://ant.apache.org/manual/Types/resources.html#resourcelist
>
> PS: I started recently to intensively use loadresource and all the resource
> related stuff in my build scripts, it is amazingly porwerful ! It reminds me
> when I was 'pipelining' in cocoon ;)
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@ant.apache.org
> For additional commands, e-mail: dev-help@ant.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@ant.apache.org
For additional commands, e-mail: dev-help@ant.apache.org