Posted to dev@lucene.apache.org by Michael McCandless <lu...@mikemccandless.com> on 2007/07/02 17:41:15 UTC

ant jar-core is slow

I've noticed that "ant jar-core" walks the full subtree; this is very
slow and in fact is deadly when wikipedia XML has been unpacked (~3.2
million files under contrib/benchmark/work/enwiki!).

I've tracked it down to this nested element in the jar task in
common-build.xml: 

  <metainf dir=".">
    <patternset refid="metainf.includes"/>
  </metainf>

which references this "metainf.includes" patternset:

  <patternset id="metainf.includes">
    <exclude name="**/*"/>
  </patternset>
	
Does this rule just exclude everything?  (I'm not very familiar w/ the
syntax here).  EG I don't see anything besides LICENSE.txt and NOTICE.txt
and MANIFEST.MF under the "META-INF" dir in the released 2.2.0 core JAR, so
it seems like this rule isn't doing anything?

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: ant jar-core is slow

Posted by Chris Hostetter <ho...@fucit.org>.
: That doesn't seem to change things.  However, this seems to
: successfully match nothing without taking a long time doing so :)
:
:   <patternset id="metainf.includes">
:     <include name="FIND_NOTHING"/>
:   </patternset>

ah ... yes, because ant first compares files to the <include> directives,
and only if they match does it compare with the <exclude> directives ...
and if there is no <include> directive a default one is assumed to match
all files.
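
Spelled out, the original patternset would then behave roughly like the sketch below. The "**" include is written in here only to show the assumed implicit default; it is not copied from any build file.

```xml
<patternset id="metainf.includes">
  <!-- assumed implicit default: with no include given, every path is a
       candidate, so every subdirectory has to be visited -->
  <include name="**"/>
  <!-- only consulted for paths that already matched an include -->
  <exclude name="**/*"/>
</patternset>
```

The end result is still an empty match, but only after the whole tree has been scanned.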

So if we really want to be safe (just in case we ever add a file named
"FIND_NOTHING" to the tree), this should be functionally equivalent to
what we had before, but more efficient ...

  <patternset id="metainf.includes">
    <!-- use an explicit include to prevent expensive walking of subdirs
         that default include triggers
    -->
    <include name="FIND_NOTHING"/>
    <exclude name="**/*"/>
  </patternset>


...i've also double checked that there are no other patternsets in any
build file that have an exclude without an include.

Committed revision 552589.




-Hoss




Re: ant jar-core is slow

Posted by Michael McCandless <lu...@mikemccandless.com>.
"Chris Hostetter" <ho...@fucit.org> wrote:
> : which references this "metainf.includes" patternset:
> :
> :   <patternset id="metainf.includes">
> :     <exclude name="**/*"/>
> :   </patternset>
> :
> : Does this rule just exclude everything?  (I'm not very familiar w/ the
> : syntax here).  EG I don't see anything besides LICENSE.txt and NOTICE.txt
> : and MANIFEST.MF under the "META-INF" dir in the released 2.2.0 core JAR, so
> : it seems like this rule isn't doing anything?
> 
> that's the patternset as defined in common-build.xml, but it can be
> overridden in the individual build files -- it looks like it was put in
> place for the snowball contrib...
> 
> chrish@asimov:~/svn/lucene-clean$ find contrib -name \*.xml | xargs grep -A3 metainf.includes
> contrib/snowball/build.xml:  <patternset id="metainf.includes">
> contrib/snowball/build.xml-    <include name="SNOWBALL-LICENSE.txt"/>
> contrib/snowball/build.xml-  </patternset>
> contrib/snowball/build.xml-

Ahhh, OK.

> ...we could eliminate that patternset and change the jarify macro to take
> in a nested element instead, but before trying that i'm curious: does
> adding...
>    <exclude name="*" />
> ..to the default patternset improve the performance?

That doesn't seem to change things.  However, this seems to
successfully match nothing without taking a long time doing so :)

  <patternset id="metainf.includes">
    <include name="FIND_NOTHING"/>
  </patternset>

And building in contrib/snowball successfully includes its
SNOWBALL-LICENSE.txt.

> (i'm not actually sure how to tell which files ant walks so i haven't
> tried it myself)

I'm just using "strace ant jar-core" (on Linux) and watching all the
system calls that happen.

Mike



Re: ant jar-core is slow

Posted by Chris Hostetter <ho...@fucit.org>.
: which references this "metainf.includes" patternset:
:
:   <patternset id="metainf.includes">
:     <exclude name="**/*"/>
:   </patternset>
:
: Does this rule just exclude everything?  (I'm not very familiar w/ the
: syntax here).  EG I don't see anything besides LICENSE.txt and NOTICE.txt
: and MANIFEST.MF under the "META-INF" dir in the released 2.2.0 core JAR, so
: it seems like this rule isn't doing anything?

that's the patternset as defined in common-build.xml, but it can be
overridden in the individual build files -- it looks like it was put in
place for the snowball contrib...

chrish@asimov:~/svn/lucene-clean$ find contrib -name \*.xml | xargs grep -A3 metainf.includes
contrib/snowball/build.xml:  <patternset id="metainf.includes">
contrib/snowball/build.xml-    <include name="SNOWBALL-LICENSE.txt"/>
contrib/snowball/build.xml-  </patternset>
contrib/snowball/build.xml-

...we could eliminate that patternset and change the jarify macro to take
in a nested element instead, but before trying that i'm curious: does
adding...
   <exclude name="*" />
..to the default patternset improve the performance?

(i'm not actually sure how to tell which files ant walks so i haven't
tried it myself)
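
For reference, the nested-element alternative mentioned above might look something like the sketch below. This is only an illustration: the real jarify macro is not shown in this thread, so the attribute names, the "metainf-includes" element name, and the property values are all hypothetical.

```xml
<!-- hypothetical rewrite of the jarify macro; names are illustrative only -->
<macrodef name="jarify">
  <attribute name="destfile"/>
  <attribute name="basedir"/>
  <!-- callers may pass extra META-INF content; most would pass nothing -->
  <element name="metainf-includes" optional="true"/>
  <sequential>
    <jar destfile="@{destfile}" basedir="@{basedir}">
      <!-- expands to whatever the caller nested, or to nothing at all -->
      <metainf-includes/>
    </jar>
  </sequential>
</macrodef>

<!-- hypothetical call site, e.g. in contrib/snowball/build.xml -->
<jarify destfile="${build.dir}/snowball.jar" basedir="${build.dir}/classes">
  <metainf-includes>
    <metainf dir=".">
      <include name="SNOWBALL-LICENSE.txt"/>
    </metainf>
  </metainf-includes>
</jarify>
```

With this shape, a build file that nests nothing creates no metainf fileset at all, so there would be no default patternset left to scan the tree.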



-Hoss

