You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@tomee.apache.org by David Blevins <da...@gmail.com> on 2012/04/15 22:21:53 UTC

XBean and scanning performance

(decision and 4 choices at the bottom -- feedback requested)

I did some studying of the zip file format and determined that part of the reworked xbean-finder Archive API was plain wrong.

Using maps as an analogy here is how we were effectively scanning zips (jars):

    "Style A"

    Map<String, InputStream> zip = new HashMap<String, InputStream>();
    for (String entryName : zip.keySet()) {
        InputStream inputStream = zip.get(entryName);
        // scan the stream
    }

While there is some indexing in a zip file in what is called the central directory, it isn't nearly good enough to support this type of random access.  The actual reading is done in C code when a zip file is randomly accessed in this way, but basically it seems about as slow as starting at the beginning of a stream and reading ahead in the stream until the index is hit and then reading for "real".  I doubt it's doing exactly that as in C code you should be able to start in the middle of a file, but let's put it this way... at the very minimum you are reading the Central Directory each and every single random access.

I've reworked the Archive API so that when you iterate over it, you iterate over actual entries.  Using map again as an analogy it looks like this now:

    "Style B"

    for (Map.Entry<String, InputStream> entry : zip.entrySet()) {
        String className = entry.getKey();
        InputStream inputStream = entry.getValue();
        // scan the stream
    }


Using Altassian Confluence as a driver to benchmark only the call to 'new AnnotationFinder(archive)' which is where our scanning happens, here are the results before (style A) and after (style b):


  StyleA: 8.89s - 9.02s
  StyleB: 3.33s - 3.52s

Now unfortunately the 'link()' call used to resolve parent classes that are not in the jars scanned as well as to resolve meta-annotations still needs the StyleA random access.  These things don't involve going in "jar order", but definitely are random access.  With the new and improved code that scans Confluence at around 3.4s, here is the time with 'link()' added

  StyleB scan + StyleA link: 15.61s - 15.75s

That link() call adds another 12 seconds.  Roughly equivalent to the cost of 4 more scans.

So the good news is we don't need the link.  We very much like the link, but we don't need the link for Java EE 6 certification.  We have two very excellent features associated with that linking.

  - Meta-Annotations
  - Discovery JAX-RS of non-annotated Application subclasses (Application is a concrete class you subclass, like HttpServlet)

We have more or less 4 kinds of choices on how we deal with this:

  1. Link() is always called.  (always slow, extra features always enabled)
  2. Link() can be disabled but is enabled by default.   (slow, w/optional fast flag, extra features enabled by default)
  3. Link() can be enabled but is disabled by default.   (fast, w/optional slow flag, extra features disabled by default)
  4. Link is never enabled.  (always fast, extra features permanently disabled)


Thoughts, preferences?


-David


Re: XBean and scanning performance

Posted by Romain Manni-Bucau <rm...@gmail.com>.
i imported on trunk the logic David added in the branch and added the
@ApplicationPath scanning. This way using the hack @ApplicationPath("/") on
the application subclass the deployement is the same (and the cost is not
important at all for the user).

well, xbean snapshot is not deployed? seems it breaks the build

- Romain


2012/4/16 Jean-Louis MONTEIRO <je...@gmail.com>

> Forgot to give my opinion in case you want it :D
>
> +1 to remove REST scanning by default and give a system property to
> activate it if necessary.
>
> Jean-Louis
>
>
> Le 16 avril 2012 13:35, Jean-Louis MONTEIRO <je...@gmail.com> a écrit :
>
> > Hey guys,
> >
> > Great job.
> > First feedbacks seem really nice.
> >
> > Jean-Louis
> >
> >
> > Le 16 avril 2012 01:38, Romain Manni-Bucau <rm...@gmail.com> a
> > écrit :
> >
> > hmm,
> >>
> >> thinking a bit of it we have a nice default behavior = implicit scanning
> >> for rest so i guess we can avoid link by default.
> >>
> >> still to avoid it we can use web.xml...
> >>
> >> so a flag for tck should be enough....but if you do so some tests need
> to
> >> be fixed.
> >>
> >> - Romain
> >>
> >>
> >> 2012/4/16 David Blevins <da...@gmail.com>
> >>
> >> > Ok, fixed the getAnnotatedClasses() bug.  Added XBEAN-206 and more
> code
> >> > like it to JarArchive.
> >> >
> >> > As well i've split out the 'link()' method so we can see the times of
> >> the
> >> > related functionality:
> >> >
> >> > 2.87 = scan
> >> > 1.62 = linkSubclasses
> >> > 4.05 = linkImplementations
> >> > 0.03 = linkMetaAnnotations
> >> > 8.57 = total
> >> >
> >> > (times are in seconds)
> >> >
> >> > Most the cost is linkImplementations for enabling
> 'findImplementations'
> >> > methods, which we don't even use.  So those can easily go without
> >> debate.
> >> >
> >> > The linkMetaAnnotations call is negligible, even still, we could only
> >> call
> >> > it if there are meta-annotations in the app.  We can happily disable
> >> that
> >> > unless it's needed.
> >> >
> >> > That leaves linkSubclasses which at the very least should be
> >> disableable.
> >> >
> >> >
> >> > -David
> >> >
> >> >
> >> > On Apr 15, 2012, at 2:16 PM, Romain Manni-Bucau wrote:
> >> >
> >> > > added a patch: https://issues.apache.org/jira/browse/XBEAN-206
> >> > >
> >> > > can you test it against your mini bench please?
> >> > >
> >> > > - Romain
> >> > >
> >> > >
> >> > > 2012/4/15 Romain Manni-Bucau <rm...@gmail.com>
> >> > >
> >> > >> Hi David,
> >> > >>
> >> > >> for me only 1 should be done.
> >> > >>
> >> > >> well, i didnt understand the whole mail: why do we need to browse
> the
> >> > zip
> >> > >> file multiple times? only for the getbytecode method? i think we
> can
> >> get
> >> > >> rid of multiple scannings and keep the link() features. Another
> >> point is
> >> > >> getAnnotatedClasses() should be able to return sthg even when
> link()
> >> was
> >> > >> not called.
> >> > >>
> >> > >> If the zip parsing is badly done by the jre (if it doesn't use
> fseek
> >> for
> >> > >> instance) we simply have to rewrite it.
> >> > >>
> >> > >> well in
> >> > org.apache.xbean.finder.archive.JarArchive.JarIterator#JarIterator
> >> > >> why Jarfile is not used when possible?
> >> > >>
> >> > >> - Romain
> >> > >>
> >> > >>
> >> > >>
> >> > >> 2012/4/15 David Blevins <da...@gmail.com>
> >> > >>
> >> > >>> (decision and 4 choices at the bottom -- feedback requested)
> >> > >>>
> >> > >>> I did some studying of the zip file format and determined that
> part
> >> of
> >> > >>> the reworked xbean-finder Archive API was plain wrong.
> >> > >>>
> >> > >>> Using maps as an analogy here is how we were effectively scanning
> >> zips
> >> > >>> (jars):
> >> > >>>
> >> > >>>   "Style A"
> >> > >>>
> >> > >>>   Map<String, InputStream> zip = new HashMap<String,
> InputStream>();
> >> > >>>   for (String entryName : zip.keySet()) {
> >> > >>>       InputStream inputStream = zip.get(entryName);
> >> > >>>       // scan the stream
> >> > >>>   }
> >> > >>>
> >> > >>> While there is some indexing in a zip file in what is called the
> >> > central
> >> > >>> directory, it isn't nearly good enough to support this type of
> >> random
> >> > >>> access.  The actual reading is done in C code when a zip file is
> >> > randomly
> >> > >>> accessed in this way, but basically it seems about as slow as
> >> starting
> >> > at
> >> > >>> the beginning of a stream and reading ahead in the stream until
> the
> >> > index
> >> > >>> is hit and then reading for "real".  I doubt it's doing exactly
> that
> >> > as in
> >> > >>> C code you should be able to start in the middle of a file, but
> >> let's
> >> > put
> >> > >>> it this way... at the very minimum you are reading the Central
> >> > Directory
> >> > >>> each and every single random access.
> >> > >>>
> >> > >>> I've reworked the Archive API so that when you iterate over it,
> you
> >> > >>> iterate over actual entries.  Using map again as an analogy it
> looks
> >> > like
> >> > >>> this now:
> >> > >>>
> >> > >>>   "Style B"
> >> > >>>
> >> > >>>   for (Map.Entry<String, InputStream> entry : zip.entrySet()) {
> >> > >>>       String className = entry.getKey();
> >> > >>>       InputStream inputStream = entry.getValue();
> >> > >>>       // scan the stream
> >> > >>>   }
> >> > >>>
> >> > >>>
> >> > >>> Using Altassian Confluence as a driver to benchmark only the call
> to
> >> > 'new
> >> > >>> AnnotationFinder(archive)' which is where our scanning happens,
> here
> >> > are
> >> > >>> the results before (style A) and after (style b):
> >> > >>>
> >> > >>>
> >> > >>> StyleA: 8.89s - 9.02s
> >> > >>> StyleB: 3.33s - 3.52s
> >> > >>>
> >> > >>> Now unfortunately the 'link()' call used to resolve parent classes
> >> that
> >> > >>> are not in the jars scanned as well as to resolve meta-annotations
> >> > still
> >> > >>> needs the StyleA random access.  These things don't involve going
> in
> >> > "jar
> >> > >>> order", but definitely are random access.  With the new and
> improved
> >> > code
> >> > >>> that scans Confluence at around 3.4s, here is the time with
> 'link()'
> >> > added
> >> > >>>
> >> > >>> StyleB scan + StyleA link: 15.61s - 15.75s
> >> > >>>
> >> > >>> That link() call adds another 12 seconds.  Roughly equivalent to
> the
> >> > cost
> >> > >>> of 4 more scans.
> >> > >>>
> >> > >>> So the good news is we don't need the link.  We very much like the
> >> > link,
> >> > >>> but we don't need the link for Java EE 6 certification.  We have
> two
> >> > very
> >> > >>> excellent features associated with that linking.
> >> > >>>
> >> > >>> - Meta-Annotations
> >> > >>> - Discovery JAX-RS of non-annotated Application subclasses
> >> (Application
> >> > >>> is a concrete class you subclass, like HttpServlet)
> >> > >>>
> >> > >>> We have more or less 4 kinds of choices on how we deal with this:
> >> > >>>
> >> > >>> 1. Link() is always called.  (always slow, extra features always
> >> > enabled)
> >> > >>> 2. Link() can be disabled but is enabled by default.   (slow,
> >> > w/optional
> >> > >>> fast flag, extra features enabled by default)
> >> > >>> 3. Link() can be enabled but is disabled by default.   (fast,
> >> > w/optional
> >> > >>> slow flag, extra features disabled by default)
> >> > >>> 4. Link is never enabled.  (always fast, extra features
> permanently
> >> > >>> disabled)
> >> > >>>
> >> > >>>
> >> > >>> Thoughts, preferences?
> >> > >>>
> >> > >>>
> >> > >>> -David
> >> > >>>
> >> > >>>
> >> > >>
> >> >
> >> >
> >>
> >
> >
>

Re: XBean and scanning performance

Posted by Jean-Louis MONTEIRO <je...@gmail.com>.
Forgot to give my opinion in case you want it :D

+1 to remove REST scanning by default and give a system property to
activate it if necessary.

Jean-Louis


Le 16 avril 2012 13:35, Jean-Louis MONTEIRO <je...@gmail.com> a écrit :

> Hey guys,
>
> Great job.
> First feedbacks seem really nice.
>
> Jean-Louis
>
>
> Le 16 avril 2012 01:38, Romain Manni-Bucau <rm...@gmail.com> a
> écrit :
>
> hmm,
>>
>> thinking a bit of it we have a nice default behavior = implicit scanning
>> for rest so i guess we can avoid link by default.
>>
>> still to avoid it we can use web.xml...
>>
>> so a flag for tck should be enough....but if you do so some tests need to
>> be fixed.
>>
>> - Romain
>>
>>
>> 2012/4/16 David Blevins <da...@gmail.com>
>>
>> > Ok, fixed the getAnnotatedClasses() bug.  Added XBEAN-206 and more code
>> > like it to JarArchive.
>> >
>> > As well i've split out the 'link()' method so we can see the times of
>> the
>> > related functionality:
>> >
>> > 2.87 = scan
>> > 1.62 = linkSubclasses
>> > 4.05 = linkImplementations
>> > 0.03 = linkMetaAnnotations
>> > 8.57 = total
>> >
>> > (times are in seconds)
>> >
>> > Most the cost is linkImplementations for enabling 'findImplementations'
>> > methods, which we don't even use.  So those can easily go without
>> debate.
>> >
>> > The linkMetaAnnotations call is negligible, even still, we could only
>> call
>> > it if there are meta-annotations in the app.  We can happily disable
>> that
>> > unless it's needed.
>> >
>> > That leaves linkSubclasses which at the very least should be
>> disableable.
>> >
>> >
>> > -David
>> >
>> >
>> > On Apr 15, 2012, at 2:16 PM, Romain Manni-Bucau wrote:
>> >
>> > > added a patch: https://issues.apache.org/jira/browse/XBEAN-206
>> > >
>> > > can you test it against your mini bench please?
>> > >
>> > > - Romain
>> > >
>> > >
>> > > 2012/4/15 Romain Manni-Bucau <rm...@gmail.com>
>> > >
>> > >> Hi David,
>> > >>
>> > >> for me only 1 should be done.
>> > >>
>> > >> well, i didnt understand the whole mail: why do we need to browse the
>> > zip
>> > >> file multiple times? only for the getbytecode method? i think we can
>> get
>> > >> rid of multiple scannings and keep the link() features. Another
>> point is
>> > >> getAnnotatedClasses() should be able to return sthg even when link()
>> was
>> > >> not called.
>> > >>
>> > >> If the zip parsing is badly done by the jre (if it doesn't use fseek
>> for
>> > >> instance) we simply have to rewrite it.
>> > >>
>> > >> well in
>> > org.apache.xbean.finder.archive.JarArchive.JarIterator#JarIterator
>> > >> why Jarfile is not used when possible?
>> > >>
>> > >> - Romain
>> > >>
>> > >>
>> > >>
>> > >> 2012/4/15 David Blevins <da...@gmail.com>
>> > >>
>> > >>> (decision and 4 choices at the bottom -- feedback requested)
>> > >>>
>> > >>> I did some studying of the zip file format and determined that part
>> of
>> > >>> the reworked xbean-finder Archive API was plain wrong.
>> > >>>
>> > >>> Using maps as an analogy here is how we were effectively scanning
>> zips
>> > >>> (jars):
>> > >>>
>> > >>>   "Style A"
>> > >>>
>> > >>>   Map<String, InputStream> zip = new HashMap<String, InputStream>();
>> > >>>   for (String entryName : zip.keySet()) {
>> > >>>       InputStream inputStream = zip.get(entryName);
>> > >>>       // scan the stream
>> > >>>   }
>> > >>>
>> > >>> While there is some indexing in a zip file in what is called the
>> > central
>> > >>> directory, it isn't nearly good enough to support this type of
>> random
>> > >>> access.  The actual reading is done in C code when a zip file is
>> > randomly
>> > >>> accessed in this way, but basically it seems about as slow as
>> starting
>> > at
>> > >>> the beginning of a stream and reading ahead in the stream until the
>> > index
>> > >>> is hit and then reading for "real".  I doubt it's doing exactly that
>> > as in
>> > >>> C code you should be able to start in the middle of a file, but
>> let's
>> > put
>> > >>> it this way... at the very minimum you are reading the Central
>> > Directory
>> > >>> each and every single random access.
>> > >>>
>> > >>> I've reworked the Archive API so that when you iterate over it, you
>> > >>> iterate over actual entries.  Using map again as an analogy it looks
>> > like
>> > >>> this now:
>> > >>>
>> > >>>   "Style B"
>> > >>>
>> > >>>   for (Map.Entry<String, InputStream> entry : zip.entrySet()) {
>> > >>>       String className = entry.getKey();
>> > >>>       InputStream inputStream = entry.getValue();
>> > >>>       // scan the stream
>> > >>>   }
>> > >>>
>> > >>>
>> > >>> Using Altassian Confluence as a driver to benchmark only the call to
>> > 'new
>> > >>> AnnotationFinder(archive)' which is where our scanning happens, here
>> > are
>> > >>> the results before (style A) and after (style b):
>> > >>>
>> > >>>
>> > >>> StyleA: 8.89s - 9.02s
>> > >>> StyleB: 3.33s - 3.52s
>> > >>>
>> > >>> Now unfortunately the 'link()' call used to resolve parent classes
>> that
>> > >>> are not in the jars scanned as well as to resolve meta-annotations
>> > still
>> > >>> needs the StyleA random access.  These things don't involve going in
>> > "jar
>> > >>> order", but definitely are random access.  With the new and improved
>> > code
>> > >>> that scans Confluence at around 3.4s, here is the time with 'link()'
>> > added
>> > >>>
>> > >>> StyleB scan + StyleA link: 15.61s - 15.75s
>> > >>>
>> > >>> That link() call adds another 12 seconds.  Roughly equivalent to the
>> > cost
>> > >>> of 4 more scans.
>> > >>>
>> > >>> So the good news is we don't need the link.  We very much like the
>> > link,
>> > >>> but we don't need the link for Java EE 6 certification.  We have two
>> > very
>> > >>> excellent features associated with that linking.
>> > >>>
>> > >>> - Meta-Annotations
>> > >>> - Discovery JAX-RS of non-annotated Application subclasses
>> (Application
>> > >>> is a concrete class you subclass, like HttpServlet)
>> > >>>
>> > >>> We have more or less 4 kinds of choices on how we deal with this:
>> > >>>
>> > >>> 1. Link() is always called.  (always slow, extra features always
>> > enabled)
>> > >>> 2. Link() can be disabled but is enabled by default.   (slow,
>> > w/optional
>> > >>> fast flag, extra features enabled by default)
>> > >>> 3. Link() can be enabled but is disabled by default.   (fast,
>> > w/optional
>> > >>> slow flag, extra features disabled by default)
>> > >>> 4. Link is never enabled.  (always fast, extra features permanently
>> > >>> disabled)
>> > >>>
>> > >>>
>> > >>> Thoughts, preferences?
>> > >>>
>> > >>>
>> > >>> -David
>> > >>>
>> > >>>
>> > >>
>> >
>> >
>>
>
>

Re: XBean and scanning performance

Posted by Jean-Louis MONTEIRO <je...@gmail.com>.
Hey guys,

Great job.
First feedbacks seem really nice.

Jean-Louis


Le 16 avril 2012 01:38, Romain Manni-Bucau <rm...@gmail.com> a écrit :

> hmm,
>
> thinking a bit of it we have a nice default behavior = implicit scanning
> for rest so i guess we can avoid link by default.
>
> still to avoid it we can use web.xml...
>
> so a flag for tck should be enough....but if you do so some tests need to
> be fixed.
>
> - Romain
>
>
> 2012/4/16 David Blevins <da...@gmail.com>
>
> > Ok, fixed the getAnnotatedClasses() bug.  Added XBEAN-206 and more code
> > like it to JarArchive.
> >
> > As well i've split out the 'link()' method so we can see the times of the
> > related functionality:
> >
> > 2.87 = scan
> > 1.62 = linkSubclasses
> > 4.05 = linkImplementations
> > 0.03 = linkMetaAnnotations
> > 8.57 = total
> >
> > (times are in seconds)
> >
> > Most the cost is linkImplementations for enabling 'findImplementations'
> > methods, which we don't even use.  So those can easily go without debate.
> >
> > The linkMetaAnnotations call is negligible, even still, we could only
> call
> > it if there are meta-annotations in the app.  We can happily disable that
> > unless it's needed.
> >
> > That leaves linkSubclasses which at the very least should be disableable.
> >
> >
> > -David
> >
> >
> > On Apr 15, 2012, at 2:16 PM, Romain Manni-Bucau wrote:
> >
> > > added a patch: https://issues.apache.org/jira/browse/XBEAN-206
> > >
> > > can you test it against your mini bench please?
> > >
> > > - Romain
> > >
> > >
> > > 2012/4/15 Romain Manni-Bucau <rm...@gmail.com>
> > >
> > >> Hi David,
> > >>
> > >> for me only 1 should be done.
> > >>
> > >> well, i didnt understand the whole mail: why do we need to browse the
> > zip
> > >> file multiple times? only for the getbytecode method? i think we can
> get
> > >> rid of multiple scannings and keep the link() features. Another point
> is
> > >> getAnnotatedClasses() should be able to return sthg even when link()
> was
> > >> not called.
> > >>
> > >> If the zip parsing is badly done by the jre (if it doesn't use fseek
> for
> > >> instance) we simply have to rewrite it.
> > >>
> > >> well in
> > org.apache.xbean.finder.archive.JarArchive.JarIterator#JarIterator
> > >> why Jarfile is not used when possible?
> > >>
> > >> - Romain
> > >>
> > >>
> > >>
> > >> 2012/4/15 David Blevins <da...@gmail.com>
> > >>
> > >>> (decision and 4 choices at the bottom -- feedback requested)
> > >>>
> > >>> I did some studying of the zip file format and determined that part
> of
> > >>> the reworked xbean-finder Archive API was plain wrong.
> > >>>
> > >>> Using maps as an analogy here is how we were effectively scanning
> zips
> > >>> (jars):
> > >>>
> > >>>   "Style A"
> > >>>
> > >>>   Map<String, InputStream> zip = new HashMap<String, InputStream>();
> > >>>   for (String entryName : zip.keySet()) {
> > >>>       InputStream inputStream = zip.get(entryName);
> > >>>       // scan the stream
> > >>>   }
> > >>>
> > >>> While there is some indexing in a zip file in what is called the
> > central
> > >>> directory, it isn't nearly good enough to support this type of random
> > >>> access.  The actual reading is done in C code when a zip file is
> > randomly
> > >>> accessed in this way, but basically it seems about as slow as
> starting
> > at
> > >>> the beginning of a stream and reading ahead in the stream until the
> > index
> > >>> is hit and then reading for "real".  I doubt it's doing exactly that
> > as in
> > >>> C code you should be able to start in the middle of a file, but let's
> > put
> > >>> it this way... at the very minimum you are reading the Central
> > Directory
> > >>> each and every single random access.
> > >>>
> > >>> I've reworked the Archive API so that when you iterate over it, you
> > >>> iterate over actual entries.  Using map again as an analogy it looks
> > like
> > >>> this now:
> > >>>
> > >>>   "Style B"
> > >>>
> > >>>   for (Map.Entry<String, InputStream> entry : zip.entrySet()) {
> > >>>       String className = entry.getKey();
> > >>>       InputStream inputStream = entry.getValue();
> > >>>       // scan the stream
> > >>>   }
> > >>>
> > >>>
> > >>> Using Altassian Confluence as a driver to benchmark only the call to
> > 'new
> > >>> AnnotationFinder(archive)' which is where our scanning happens, here
> > are
> > >>> the results before (style A) and after (style b):
> > >>>
> > >>>
> > >>> StyleA: 8.89s - 9.02s
> > >>> StyleB: 3.33s - 3.52s
> > >>>
> > >>> Now unfortunately the 'link()' call used to resolve parent classes
> that
> > >>> are not in the jars scanned as well as to resolve meta-annotations
> > still
> > >>> needs the StyleA random access.  These things don't involve going in
> > "jar
> > >>> order", but definitely are random access.  With the new and improved
> > code
> > >>> that scans Confluence at around 3.4s, here is the time with 'link()'
> > added
> > >>>
> > >>> StyleB scan + StyleA link: 15.61s - 15.75s
> > >>>
> > >>> That link() call adds another 12 seconds.  Roughly equivalent to the
> > cost
> > >>> of 4 more scans.
> > >>>
> > >>> So the good news is we don't need the link.  We very much like the
> > link,
> > >>> but we don't need the link for Java EE 6 certification.  We have two
> > very
> > >>> excellent features associated with that linking.
> > >>>
> > >>> - Meta-Annotations
> > >>> - Discovery JAX-RS of non-annotated Application subclasses
> (Application
> > >>> is a concrete class you subclass, like HttpServlet)
> > >>>
> > >>> We have more or less 4 kinds of choices on how we deal with this:
> > >>>
> > >>> 1. Link() is always called.  (always slow, extra features always
> > enabled)
> > >>> 2. Link() can be disabled but is enabled by default.   (slow,
> > w/optional
> > >>> fast flag, extra features enabled by default)
> > >>> 3. Link() can be enabled but is disabled by default.   (fast,
> > w/optional
> > >>> slow flag, extra features disabled by default)
> > >>> 4. Link is never enabled.  (always fast, extra features permanently
> > >>> disabled)
> > >>>
> > >>>
> > >>> Thoughts, preferences?
> > >>>
> > >>>
> > >>> -David
> > >>>
> > >>>
> > >>
> >
> >
>

Re: XBean and scanning performance

Posted by Romain Manni-Bucau <rm...@gmail.com>.
hmm,

thinking a bit of it we have a nice default behavior = implicit scanning
for rest so i guess we can avoid link by default.

still to avoid it we can use web.xml...

so a flag for tck should be enough....but if you do so some tests need to
be fixed.

- Romain


2012/4/16 David Blevins <da...@gmail.com>

> Ok, fixed the getAnnotatedClasses() bug.  Added XBEAN-206 and more code
> like it to JarArchive.
>
> As well i've split out the 'link()' method so we can see the times of the
> related functionality:
>
> 2.87 = scan
> 1.62 = linkSubclasses
> 4.05 = linkImplementations
> 0.03 = linkMetaAnnotations
> 8.57 = total
>
> (times are in seconds)
>
> Most the cost is linkImplementations for enabling 'findImplementations'
> methods, which we don't even use.  So those can easily go without debate.
>
> The linkMetaAnnotations call is negligible, even still, we could only call
> it if there are meta-annotations in the app.  We can happily disable that
> unless it's needed.
>
> That leaves linkSubclasses which at the very least should be disableable.
>
>
> -David
>
>
> On Apr 15, 2012, at 2:16 PM, Romain Manni-Bucau wrote:
>
> > added a patch: https://issues.apache.org/jira/browse/XBEAN-206
> >
> > can you test it against your mini bench please?
> >
> > - Romain
> >
> >
> > 2012/4/15 Romain Manni-Bucau <rm...@gmail.com>
> >
> >> Hi David,
> >>
> >> for me only 1 should be done.
> >>
> >> well, i didnt understand the whole mail: why do we need to browse the
> zip
> >> file multiple times? only for the getbytecode method? i think we can get
> >> rid of multiple scannings and keep the link() features. Another point is
> >> getAnnotatedClasses() should be able to return sthg even when link() was
> >> not called.
> >>
> >> If the zip parsing is badly done by the jre (if it doesn't use fseek for
> >> instance) we simply have to rewrite it.
> >>
> >> well in
> org.apache.xbean.finder.archive.JarArchive.JarIterator#JarIterator
> >> why Jarfile is not used when possible?
> >>
> >> - Romain
> >>
> >>
> >>
> >> 2012/4/15 David Blevins <da...@gmail.com>
> >>
> >>> (decision and 4 choices at the bottom -- feedback requested)
> >>>
> >>> I did some studying of the zip file format and determined that part of
> >>> the reworked xbean-finder Archive API was plain wrong.
> >>>
> >>> Using maps as an analogy here is how we were effectively scanning zips
> >>> (jars):
> >>>
> >>>   "Style A"
> >>>
> >>>   Map<String, InputStream> zip = new HashMap<String, InputStream>();
> >>>   for (String entryName : zip.keySet()) {
> >>>       InputStream inputStream = zip.get(entryName);
> >>>       // scan the stream
> >>>   }
> >>>
> >>> While there is some indexing in a zip file in what is called the
> central
> >>> directory, it isn't nearly good enough to support this type of random
> >>> access.  The actual reading is done in C code when a zip file is
> randomly
> >>> accessed in this way, but basically it seems about as slow as starting
> at
> >>> the beginning of a stream and reading ahead in the stream until the
> index
> >>> is hit and then reading for "real".  I doubt it's doing exactly that
> as in
> >>> C code you should be able to start in the middle of a file, but let's
> put
> >>> it this way... at the very minimum you are reading the Central
> Directory
> >>> each and every single random access.
> >>>
> >>> I've reworked the Archive API so that when you iterate over it, you
> >>> iterate over actual entries.  Using map again as an analogy it looks
> like
> >>> this now:
> >>>
> >>>   "Style B"
> >>>
> >>>   for (Map.Entry<String, InputStream> entry : zip.entrySet()) {
> >>>       String className = entry.getKey();
> >>>       InputStream inputStream = entry.getValue();
> >>>       // scan the stream
> >>>   }
> >>>
> >>>
> >>> Using Altassian Confluence as a driver to benchmark only the call to
> 'new
> >>> AnnotationFinder(archive)' which is where our scanning happens, here
> are
> >>> the results before (style A) and after (style b):
> >>>
> >>>
> >>> StyleA: 8.89s - 9.02s
> >>> StyleB: 3.33s - 3.52s
> >>>
> >>> Now unfortunately the 'link()' call used to resolve parent classes that
> >>> are not in the jars scanned as well as to resolve meta-annotations
> still
> >>> needs the StyleA random access.  These things don't involve going in
> "jar
> >>> order", but definitely are random access.  With the new and improved
> code
> >>> that scans Confluence at around 3.4s, here is the time with 'link()'
> added
> >>>
> >>> StyleB scan + StyleA link: 15.61s - 15.75s
> >>>
> >>> That link() call adds another 12 seconds.  Roughly equivalent to the
> cost
> >>> of 4 more scans.
> >>>
> >>> So the good news is we don't need the link.  We very much like the
> link,
> >>> but we don't need the link for Java EE 6 certification.  We have two
> very
> >>> excellent features associated with that linking.
> >>>
> >>> - Meta-Annotations
> >>> - Discovery JAX-RS of non-annotated Application subclasses (Application
> >>> is a concrete class you subclass, like HttpServlet)
> >>>
> >>> We have more or less 4 kinds of choices on how we deal with this:
> >>>
> >>> 1. Link() is always called.  (always slow, extra features always
> enabled)
> >>> 2. Link() can be disabled but is enabled by default.   (slow,
> w/optional
> >>> fast flag, extra features enabled by default)
> >>> 3. Link() can be enabled but is disabled by default.   (fast,
> w/optional
> >>> slow flag, extra features disabled by default)
> >>> 4. Link is never enabled.  (always fast, extra features permanently
> >>> disabled)
> >>>
> >>>
> >>> Thoughts, preferences?
> >>>
> >>>
> >>> -David
> >>>
> >>>
> >>
>
>

Re: XBean and scanning performance

Posted by David Blevins <da...@gmail.com>.
Ok, fixed the getAnnotatedClasses() bug.  Added XBEAN-206 and more code like it to JarArchive.

As well i've split out the 'link()' method so we can see the times of the related functionality:

2.87 = scan
1.62 = linkSubclasses
4.05 = linkImplementations
0.03 = linkMetaAnnotations
8.57 = total

(times are in seconds)

Most the cost is linkImplementations for enabling 'findImplementations' methods, which we don't even use.  So those can easily go without debate.

The linkMetaAnnotations call is negligible, even still, we could only call it if there are meta-annotations in the app.  We can happily disable that unless it's needed.

That leaves linkSubclasses which at the very least should be disableable.


-David


On Apr 15, 2012, at 2:16 PM, Romain Manni-Bucau wrote:

> added a patch: https://issues.apache.org/jira/browse/XBEAN-206
> 
> can you test it against your mini bench please?
> 
> - Romain
> 
> 
> 2012/4/15 Romain Manni-Bucau <rm...@gmail.com>
> 
>> Hi David,
>> 
>> for me only 1 should be done.
>> 
>> well, i didnt understand the whole mail: why do we need to browse the zip
>> file multiple times? only for the getbytecode method? i think we can get
>> rid of multiple scannings and keep the link() features. Another point is
>> getAnnotatedClasses() should be able to return sthg even when link() was
>> not called.
>> 
>> If the zip parsing is badly done by the jre (if it doesn't use fseek for
>> instance) we simply have to rewrite it.
>> 
>> well in org.apache.xbean.finder.archive.JarArchive.JarIterator#JarIterator
>> why Jarfile is not used when possible?
>> 
>> - Romain
>> 
>> 
>> 
>> 2012/4/15 David Blevins <da...@gmail.com>
>> 
>>> (decision and 4 choices at the bottom -- feedback requested)
>>> 
>>> I did some studying of the zip file format and determined that part of
>>> the reworked xbean-finder Archive API was plain wrong.
>>> 
>>> Using maps as an analogy here is how we were effectively scanning zips
>>> (jars):
>>> 
>>>   "Style A"
>>> 
>>>   Map<String, InputStream> zip = new HashMap<String, InputStream>();
>>>   for (String entryName : zip.keySet()) {
>>>       InputStream inputStream = zip.get(entryName);
>>>       // scan the stream
>>>   }
>>> 
>>> While there is some indexing in a zip file in what is called the central
>>> directory, it isn't nearly good enough to support this type of random
>>> access.  The actual reading is done in C code when a zip file is randomly
>>> accessed in this way, but basically it seems about as slow as starting at
>>> the beginning of a stream and reading ahead in the stream until the index
>>> is hit and then reading for "real".  I doubt it's doing exactly that as in
>>> C code you should be able to start in the middle of a file, but let's put
>>> it this way... at the very minimum you are reading the Central Directory
>>> each and every single random access.
>>> 
>>> I've reworked the Archive API so that when you iterate over it, you
>>> iterate over actual entries.  Using map again as an analogy it looks like
>>> this now:
>>> 
>>>   "Style B"
>>> 
>>>   for (Map.Entry<String, InputStream> entry : zip.entrySet()) {
>>>       String className = entry.getKey();
>>>       InputStream inputStream = entry.getValue();
>>>       // scan the stream
>>>   }
>>> 
>>> 
>>> Using Altassian Confluence as a driver to benchmark only the call to 'new
>>> AnnotationFinder(archive)' which is where our scanning happens, here are
>>> the results before (style A) and after (style b):
>>> 
>>> 
>>> StyleA: 8.89s - 9.02s
>>> StyleB: 3.33s - 3.52s
>>> 
>>> Now unfortunately the 'link()' call used to resolve parent classes that
>>> are not in the jars scanned as well as to resolve meta-annotations still
>>> needs the StyleA random access.  These things don't involve going in "jar
>>> order", but definitely are random access.  With the new and improved code
>>> that scans Confluence at around 3.4s, here is the time with 'link()' added
>>> 
>>> StyleB scan + StyleA link: 15.61s - 15.75s
>>> 
>>> That link() call adds another 12 seconds.  Roughly equivalent to the cost
>>> of 4 more scans.
>>> 
>>> So the good news is we don't need the link.  We very much like the link,
>>> but we don't need the link for Java EE 6 certification.  We have two very
>>> excellent features associated with that linking.
>>> 
>>> - Meta-Annotations
>>> - Discovery JAX-RS of non-annotated Application subclasses (Application
>>> is a concrete class you subclass, like HttpServlet)
>>> 
>>> We have more or less 4 kinds of choices on how we deal with this:
>>> 
>>> 1. Link() is always called.  (always slow, extra features always enabled)
>>> 2. Link() can be disabled but is enabled by default.   (slow, w/optional
>>> fast flag, extra features enabled by default)
>>> 3. Link() can be enabled but is disabled by default.   (fast, w/optional
>>> slow flag, extra features disabled by default)
>>> 4. Link is never enabled.  (always fast, extra features permanently
>>> disabled)
>>> 
>>> 
>>> Thoughts, preferences?
>>> 
>>> 
>>> -David
>>> 
>>> 
>> 


Re: XBean and scanning performance

Posted by Romain Manni-Bucau <rm...@gmail.com>.
added a patch: https://issues.apache.org/jira/browse/XBEAN-206

can you test it against your mini bench please?

- Romain


2012/4/15 Romain Manni-Bucau <rm...@gmail.com>

> Hi David,
>
> for me only 1 should be done.
>
> well, i didnt understand the whole mail: why do we need to browse the zip
> file multiple times? only for the getbytecode method? i think we can get
> rid of multiple scannings and keep the link() features. Another point is
> getAnnotatedClasses() should be able to return sthg even when link() was
> not called.
>
> If the zip parsing is badly done by the jre (if it doesn't use fseek for
> instance) we simply have to rewrite it.
>
> well in org.apache.xbean.finder.archive.JarArchive.JarIterator#JarIterator
> why Jarfile is not used when possible?
>
> - Romain
>
>
>
> 2012/4/15 David Blevins <da...@gmail.com>
>
>> (decision and 4 choices at the bottom -- feedback requested)
>>
>> I did some studying of the zip file format and determined that part of
>> the reworked xbean-finder Archive API was plain wrong.
>>
>> Using maps as an analogy here is how we were effectively scanning zips
>> (jars):
>>
>>    "Style A"
>>
>>    Map<String, InputStream> zip = new HashMap<String, InputStream>();
>>    for (String entryName : zip.keySet()) {
>>        InputStream inputStream = zip.get(entryName);
>>        // scan the stream
>>    }
>>
>> While there is some indexing in a zip file in what is called the central
>> directory, it isn't nearly good enough to support this type of random
>> access.  The actual reading is done in C code when a zip file is randomly
>> accessed in this way, but basically it seems about as slow as starting at
>> the beginning of a stream and reading ahead in the stream until the index
>> is hit and then reading for "real".  I doubt it's doing exactly that as in
>> C code you should be able to start in the middle of a file, but let's put
>> it this way... at the very minimum you are reading the Central Directory
>> each and every single random access.
>>
>> I've reworked the Archive API so that when you iterate over it, you
>> iterate over actual entries.  Using map again as an analogy it looks like
>> this now:
>>
>>    "Style B"
>>
>>    for (Map.Entry<String, InputStream> entry : zip.entrySet()) {
>>        String className = entry.getKey();
>>        InputStream inputStream = entry.getValue();
>>        // scan the stream
>>    }
>>
>>
>> Using Altassian Confluence as a driver to benchmark only the call to 'new
>> AnnotationFinder(archive)' which is where our scanning happens, here are
>> the results before (style A) and after (style b):
>>
>>
>>  StyleA: 8.89s - 9.02s
>>  StyleB: 3.33s - 3.52s
>>
>> Now unfortunately the 'link()' call used to resolve parent classes that
>> are not in the jars scanned as well as to resolve meta-annotations still
>> needs the StyleA random access.  These things don't involve going in "jar
>> order", but definitely are random access.  With the new and improved code
>> that scans Confluence at around 3.4s, here is the time with 'link()' added
>>
>>  StyleB scan + StyleA link: 15.61s - 15.75s
>>
>> That link() call adds another 12 seconds.  Roughly equivalent to the cost
>> of 4 more scans.
>>
>> So the good news is we don't need the link.  We very much like the link,
>> but we don't need the link for Java EE 6 certification.  We have two very
>> excellent features associated with that linking.
>>
>>  - Meta-Annotations
>>  - Discovery JAX-RS of non-annotated Application subclasses (Application
>> is a concrete class you subclass, like HttpServlet)
>>
>> We have more or less 4 kinds of choices on how we deal with this:
>>
>>  1. Link() is always called.  (always slow, extra features always enabled)
>>  2. Link() can be disabled but is enabled by default.   (slow, w/optional
>> fast flag, extra features enabled by default)
>>  3. Link() can be enabled but is disabled by default.   (fast, w/optional
>> slow flag, extra features disabled by default)
>>  4. Link is never enabled.  (always fast, extra features permanently
>> disabled)
>>
>>
>> Thoughts, preferences?
>>
>>
>> -David
>>
>>
>

Re: XBean and scanning performance

Posted by Romain Manni-Bucau <rm...@gmail.com>.
Hi David,

for me only 1 should be done.

well, i didnt understand the whole mail: why do we need to browse the zip
file multiple times? only for the getbytecode method? i think we can get
rid of multiple scannings and keep the link() features. Another point is
getAnnotatedClasses() should be able to return sthg even when link() was
not called.

If the zip parsing is badly done by the jre (if it doesn't use fseek for
instance) we simply have to rewrite it.

well in org.apache.xbean.finder.archive.JarArchive.JarIterator#JarIterator
why Jarfile is not used when possible?

- Romain


2012/4/15 David Blevins <da...@gmail.com>

> (decision and 4 choices at the bottom -- feedback requested)
>
> I did some studying of the zip file format and determined that part of the
> reworked xbean-finder Archive API was plain wrong.
>
> Using maps as an analogy here is how we were effectively scanning zips
> (jars):
>
>    "Style A"
>
>    Map<String, InputStream> zip = new HashMap<String, InputStream>();
>    for (String entryName : zip.keySet()) {
>        InputStream inputStream = zip.get(entryName);
>        // scan the stream
>    }
>
> While there is some indexing in a zip file in what is called the central
> directory, it isn't nearly good enough to support this type of random
> access.  The actual reading is done in C code when a zip file is randomly
> accessed in this way, but basically it seems about as slow as starting at
> the beginning of a stream and reading ahead in the stream until the index
> is hit and then reading for "real".  I doubt it's doing exactly that as in
> C code you should be able to start in the middle of a file, but let's put
> it this way... at the very minimum you are reading the Central Directory
> each and every single random access.
>
> I've reworked the Archive API so that when you iterate over it, you
> iterate over actual entries.  Using map again as an analogy it looks like
> this now:
>
>    "Style B"
>
>    for (Map.Entry<String, InputStream> entry : zip.entrySet()) {
>        String className = entry.getKey();
>        InputStream inputStream = entry.getValue();
>        // scan the stream
>    }
>
>
> Using Altassian Confluence as a driver to benchmark only the call to 'new
> AnnotationFinder(archive)' which is where our scanning happens, here are
> the results before (style A) and after (style b):
>
>
>  StyleA: 8.89s - 9.02s
>  StyleB: 3.33s - 3.52s
>
> Now unfortunately the 'link()' call used to resolve parent classes that
> are not in the jars scanned as well as to resolve meta-annotations still
> needs the StyleA random access.  These things don't involve going in "jar
> order", but definitely are random access.  With the new and improved code
> that scans Confluence at around 3.4s, here is the time with 'link()' added
>
>  StyleB scan + StyleA link: 15.61s - 15.75s
>
> That link() call adds another 12 seconds.  Roughly equivalent to the cost
> of 4 more scans.
>
> So the good news is we don't need the link.  We very much like the link,
> but we don't need the link for Java EE 6 certification.  We have two very
> excellent features associated with that linking.
>
>  - Meta-Annotations
>  - Discovery JAX-RS of non-annotated Application subclasses (Application
> is a concrete class you subclass, like HttpServlet)
>
> We have more or less 4 kinds of choices on how we deal with this:
>
>  1. Link() is always called.  (always slow, extra features always enabled)
>  2. Link() can be disabled but is enabled by default.   (slow, w/optional
> fast flag, extra features enabled by default)
>  3. Link() can be enabled but is disabled by default.   (fast, w/optional
> slow flag, extra features disabled by default)
>  4. Link is never enabled.  (always fast, extra features permanently
> disabled)
>
>
> Thoughts, preferences?
>
>
> -David
>
>