Posted to dev@harmony.apache.org by Alexei Fedotov <al...@gmail.com> on 2008/02/13 18:01:49 UTC

[classlib][archive] java.util.jar specialists/authors wanted to clarify manifest chunks

Hello folks,

Do we have the original
working_classlib/modules/archive/src/main/java/java/util/jar/ module
contributors on board? Could anyone clarify the reasons behind the heavy
solution of copying manifest chunks into a separate hash table, described
at HARMONY-4569 [1]? Isn't the entity hash table the only object that
should be populated?

-- 
With best regards,
Alexei

[1] http://issues.apache.org/jira/browse/HARMONY-4569

Re: [classlib][archive] java.util.jar specialists/authors wanted to clarify manifest chunks

Posted by Alexey Varlamov <al...@gmail.com>.

2008/2/20, Alexei Zakharov <al...@gmail.com>:
> Pavel,
>
> Have you ever seen a jar of such size? Or even close to it?
> Well, I also agree we should keep them in mind. But if we can really
> speed up processing of small jars, let's do it.

Just for the record, I had to move a BTI installation to another host a
few months ago, and it took a few gigabytes zipped. Anyway, there's nothing
unusual about huge files nowadays. So I second Pavel here: while
optimizing for everyday use cases we should still keep a path for
handling valid corner cases.
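
To make the trade-off concrete, here is a minimal sketch of reading an entry through a fixed-size buffer instead of inflating it into a single byte[]. It is written against the public java.util.zip API of a modern JDK, not against Harmony's internal inflateEntryImpl2 path, and the class and method names (StreamingEntryDemo, countEntryBytes, writeSampleZip) are made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class; not part of Harmony or the JDK.
class StreamingEntryDemo {

    /** Reads one entry through a fixed-size buffer and returns its uncompressed length. */
    static long countEntryBytes(Path zip, String entryName) throws IOException {
        try (ZipFile zf = new ZipFile(zip.toFile())) {
            ZipEntry entry = zf.getEntry(entryName);
            if (entry == null) {
                throw new IOException("no such entry: " + entryName);
            }
            long total = 0;              // long, so totals above Integer.MAX_VALUE still fit
            byte[] buf = new byte[8192]; // memory use stays constant regardless of entry size
            try (InputStream in = zf.getInputStream(entry)) {
                int n;
                while ((n = in.read(buf)) != -1) {
                    total += n;
                }
            }
            return total;
        }
    }

    /** Writes a small temporary archive holding a single zero-filled entry. */
    static Path writeSampleZip(String entryName, int size) throws IOException {
        Path zip = Files.createTempFile("sample", ".zip");
        try (OutputStream out = Files.newOutputStream(zip);
             ZipOutputStream zos = new ZipOutputStream(out)) {
            zos.putNextEntry(new ZipEntry(entryName));
            zos.write(new byte[size]);
            zos.closeEntry();
        }
        return zip;
    }

    public static void main(String[] args) throws IOException {
        Path zip = writeSampleZip("data.bin", 100_000);
        System.out.println(countEntryBytes(zip, "data.bin"));
        Files.delete(zip);
    }
}
```

A byte[]-backed fast path for small entries and a streaming path like this one for large entries could coexist; which threshold separates them is a design decision this sketch does not try to answer.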

--
Alexey

>
> Regards,
> Alexei
>
> 2008/2/20, Pavel Pervov <pm...@gmail.com>:
> > Alexei,
> >
> > I generally agree with Alexei Z, but large zip entries should be kept
> > in mind while implementing current optimizations to java.util.jar, so
> > it wouldn't lead to rewriting the code again when faced with large
> > entries.
> >
> > WBR,
> >     Pavel.
> >
> > On 2/20/08, Alexei Fedotov <al...@gmail.com> wrote:
> > > Alexei,
> > > Thanks for sharing your opinion! Let me note that I mistakenly
> > > mentioned 4GB. Actually, the maximum size of an uncompressed entry is
> > > limited to 2GB (Integer.MAX_VALUE).
> > >
> > > Any other votes?
> > >
> > > On Feb 20, 2008 12:19 PM, Alexei Zakharov <al...@gmail.com> wrote:
> > > > Hi Alexei,
> > > >
> > > > I don't think we should really care about such huge zip files now,
> > > > especially if the assumption that our zip file is smaller than 4Gb
> > > > can give us performance benefits. IMO it is enough to file a
> > > > low-priority JIRA (something like "Harmony can't deal with 16Gb zip
> > > > files") and continue the optimizations, keeping in mind that we will
> > > > never meet zip files larger than 4Gb.
> > > >
> > > > Regards,
> > > > Alexei
> > > >
> > > > 2008/2/19, Alexei Fedotov <al...@gmail.com>:
> > > >
> > > > > Hello folks,
> > > > >
> > > > > Let me continue with my questions about our archive implementation. I
> > > > > have noticed that our zip input stream is constructed as follows:
> > > > >
> > > > >         byte[] buf = inflateEntryImpl2(descriptor, entry.getName());
> > > > >         return new ByteArrayInputStream(buf);
> > > > >
> > > > > Does it mean that we strategically want to work only with zip entries
> > > > > smaller than 4Gb? This would allow specific optimizations using the
> > > > > underlying byte array. Or is it just a bug, and strategically we want
> > > > > to handle entries as big as the zip file format allows?
> > > > >
> > > > > Thank you for sharing your opinion.
> > > > > Alexei
> > > > >
> > > > >
> > > > >
> > > > > On Feb 17, 2008 4:46 PM, Alexei Fedotov <al...@gmail.com> wrote:
> > > > > > Thanks Tim for taking care of the patch! I have another question about
> > > > > > this module. According to the specification, attributes of individual
> > > > > > entry sections for the same entry name should be merged. Which bytes
> > > > > > should be checked against the digest of this merged entry?
> > > > > >
> > > > > > Thanks!
> > > > > >
> > > > > >
> > > > > > On Feb 15, 2008 3:52 PM, Alexei Fedotov <al...@gmail.com> wrote:
> > > > > > > Hello folks,
> > > > > > >
> > > > > > > Alexey Zakharov kindly shared a hint with me that shorter letters have
> > > > > > > a better chance of being read. That is why I prepared a shorter letter
> > > > > > > asking again about manifest encodings, in the form of a patch; see
> > > > > > > HARMONY-5517 [1].
> > > > > > >
> > > > > > > I would really appreciate it if the people who touched the code before
> > > > > > > me (Nathan, Tim, or Evgeniya) would take a look.
> > > > > > > Thank you in advance.
> > > > > > >
> > > > > > > [1] http://issues.apache.org/jira/browse/HARMONY-5517
> > > > > > >
> > > > > > >
> > > > > > > On Thu, Feb 14, 2008 at 2:15 PM, Alexei Fedotov
> > > > > > >
> > > > > > > <al...@gmail.com> wrote:
> > > > > > > > Hello, Nathan,
> > > > > > > >  Thanks for your interest. I'm trying to resolve a performance problem
> > > > > > > >  described at HARMONY-4569. Gregory mentions that methods write() from
> > > > > > > >  nextChunk() are called too many times, see lines 187, 201 of
> > > > > > > >  working_classlib/modules/archive/src/main/java/java/util/jar/InitManifest.java
> > > > > > > >  This slows down Harmony VM in debug and interpreter modes and may
> > > > > > > >  affect overall Eclipse startup since many jars are read in the
> > > > > > > >  process. I'm trying to collect more data.
> > > > > > > >
> > > > > > > > >  As far as I could get in reviewing this complex code, it seems
> > > > > > > > >  that either the code or my understanding could be improved.
> > > > > > > > >   * The "chunks" hash table is used only for jar verification. Do we
> > > > > > > > >  need to initialize it for every manifest when this costs us so many
> > > > > > > > >  invocations? Instead of using write() methods to create chunks, one
> > > > > > > > >  could remember chunk positions in the stream, which would be read
> > > > > > > > >  into a byte array using big buffers instead of individual writes.
> > > > > > > > >   * It seems that manifests longer than 1024 characters may result in
> > > > > > > > >  a "string too long" exception: the buffer they are read into just
> > > > > > > > >  takes as many characters from the stream as possible, and an error
> > > > > > > > >  is reported if the stream is not read fully.
> > > > > > > > >   * I don't know why manifests are read in different encodings. The
> > > > > > > > >  spec [1] mentions UTF-8 only. It would be nice to know.
> > > > > > > > >   * The similar functionality of readLines and nextChunk, both
> > > > > > > > >  containing long conditional sequences, could be rewritten in a more
> > > > > > > > >  transparent and documented way. In general, the idea behind
> > > > > > > > >  "rewriting" chunks is beyond my understanding: I have not noticed
> > > > > > > > >  anything in the specification saying that line breaks or anything
> > > > > > > > >  else should be "rewritten" using an eight-if algorithm instead of
> > > > > > > > >  being taken as is. BTW, I have noticed that Tim was behind
> > > > > > > > >  readability improvements of the code. I wonder what was there
> > > > > > > > >  before and will check it after lunch.
> > > > > > > > >   * The whole InitManifest class seems to be redundant and could be
> > > > > > > > >  replaced with a set of static methods. It seems that the
> > > > > > > > >  functionality specific to the two calls to InitManifest should be
> > > > > > > > >  kept in the places where InitManifest is called rather than passed
> > > > > > > > >  to InitManifest as a parameter for an internal check.
> > > > > > > >
> > > > > > > >  I appreciate your comments and help.
> > > > > > > >
> > > > > > > >  [1] http://java.sun.com/j2se/1.5.0/docs/guide/jar/jar.html
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >  On Feb 14, 2008 6:00 AM, Nathan Beyer <nd...@apache.org> wrote:
> > > > > > > >  > Can you point out the painful bits (line numbers, etc)?
> > > > > > > >  >
> > > > > > > >  >
> > > > > > > >  > On Feb 13, 2008 11:01 AM, Alexei Fedotov <al...@gmail.com> wrote:
> > > > > > > >  > > Hello folks,
> > > > > > > >  > >
> > > > > > > > >  > > Do we have the original
> > > > > > > > >  > > working_classlib/modules/archive/src/main/java/java/util/jar/ module
> > > > > > > > >  > > contributors on board? Could anyone clarify the reasons behind the heavy
> > > > > > > > >  > > solution of copying manifest chunks into a separate hash table, described
> > > > > > > > >  > > at HARMONY-4569 [1]? Isn't the entity hash table the only object that
> > > > > > > > >  > > should be populated?
> > > > > > > >  > >
> > > > > > > >  > > --
> > > > > > > >  > > With best regards,
> > > > > > > >  > > Alexei
> > > > > > > >  > >
> > > > > > > >  > > [1] http://issues.apache.org/jira/browse/HARMONY-4569
> > > > > > > >  > >
> > > > > > > >  >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >  --
> > > > > > > >  With best regards,
> > > > > > > >  Alexei
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > With best regards,
> > > > > > > Alexei
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > With best regards,
> > > > > > Alexei
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > With best regards,
> > > > > Alexei
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > With best regards,
> > > Alexei
> > >
> >
> >
> > --
> > Pavel Pervov,
> > Intel Enterprise Solutions Software Division
> >
>
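
The suggestion in the quoted Feb 14 message, remembering chunk positions instead of copying each chunk with write(), could look roughly like the sketch below. It assumes the whole manifest has already been read into one byte array and that sections are separated by an empty line; the class and method names (ManifestSectionOffsets, Range, findSections) are invented for illustration and do not come from InitManifest. Whether the separating empty line belongs to the range that is digested during verification is left open here; the ranges below stop just before it.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not the Harmony InitManifest implementation.
class ManifestSectionOffsets {

    /** Half-open byte range [start, end) of one manifest section. */
    static final class Range {
        final int start;
        final int end;

        Range(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    /**
     * Scans the manifest bytes once and records where each section starts and
     * ends. A section ends after the terminator of its last non-empty line.
     * Lines may end with CR LF, LF, or CR. No bytes are copied.
     */
    static List<Range> findSections(byte[] manifest) {
        List<Range> sections = new ArrayList<>();
        int sectionStart = -1; // -1 while we are between sections
        int pos = 0;
        while (pos < manifest.length) {
            int lineStart = pos;
            // Advance to the end of the line content.
            while (pos < manifest.length && manifest[pos] != '\r' && manifest[pos] != '\n') {
                pos++;
            }
            boolean emptyLine = (pos == lineStart);
            // Consume the line terminator, treating CR LF as one terminator.
            if (pos < manifest.length) {
                if (manifest[pos] == '\r' && pos + 1 < manifest.length && manifest[pos + 1] == '\n') {
                    pos += 2;
                } else {
                    pos++;
                }
            }
            if (emptyLine) {
                if (sectionStart >= 0) {
                    sections.add(new Range(sectionStart, lineStart));
                    sectionStart = -1;
                }
            } else if (sectionStart < 0) {
                sectionStart = lineStart;
            }
        }
        if (sectionStart >= 0) {
            // Last section was not followed by an empty line.
            sections.add(new Range(sectionStart, manifest.length));
        }
        return sections;
    }

    public static void main(String[] args) {
        byte[] mf = ("Manifest-Version: 1.0\r\n\r\n"
                + "Name: a/B.class\r\nSHA1-Digest: xyz\r\n\r\n").getBytes(StandardCharsets.UTF_8);
        for (Range r : findSections(mf)) {
            System.out.println(r.start + ".." + r.end);
        }
    }
}
```

With ranges like these, a verifier could feed manifest[start..end) straight into a MessageDigest only when a jar is actually being verified, so unsigned jars would pay for one scan and no per-chunk copies.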

Re: [classlib][archive] java.util.jar specialists/authors wanted to clarify manifest chunks

Posted by Alexei Zakharov <al...@gmail.com>.

Pavel,

Have you ever seen a jar of such size? Or even close to it?
Well, I also agree we should keep them in mind. But if we can really
speed up processing of small jars, let's do it.

Regards,
Alexei
