Posted to dev@harmony.apache.org by Alexei Fedotov <al...@gmail.com> on 2008/02/13 18:01:49 UTC

[classlib][archive] java.util.jar specialists/authors wanted to clarify manifest chunks

Hello folks,

Do we have the original contributors to the
working_classlib/modules/archive/src/main/java/java/util/jar/ module
on board? Could anyone clarify the reasons behind the heavyweight
solution of copying manifest chunks into a separate hash table, as
described in HARMONY-4569 [1]? Isn't the entity hash table the only
object that should be populated?
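
For illustration only, here is a minimal sketch of the alternative raised
in this thread: remembering chunk positions in the manifest bytes instead
of copying every chunk into the table. The class ManifestChunkIndex, its
Range type and the index() method are hypothetical names, not Harmony
code. The parser assumes LF or CRLF line endings and that "Name: " values
are not continued across lines.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

final class ManifestChunkIndex {
    /** Half-open byte range [start, end) of one named manifest section. */
    static final class Range {
        final int start;
        final int end;

        Range(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    /**
     * Scans the manifest once and records where each "Name: ..." section
     * lives. No section bytes are copied; a verifier could later digest
     * manifest[start..end) directly from the original array.
     */
    static Map<String, Range> index(byte[] m) {
        Map<String, Range> ranges = new LinkedHashMap<>();
        int sectionStart = -1; // -1 means "not inside a section"
        String name = null;    // stays null for the main (unnamed) section
        int pos = 0;
        while (pos < m.length) {
            int eol = pos;
            while (eol < m.length && m[eol] != '\n') {
                eol++;
            }
            int next = Math.min(eol + 1, m.length);
            // Exclude a trailing '\r' so that CRLF blank lines count as blank.
            int lineEnd = (eol > pos && m[eol - 1] == '\r') ? eol - 1 : eol;
            if (lineEnd == pos) {
                // A blank line closes the current section; this sketch chooses
                // to include the blank line in the recorded range.
                if (sectionStart >= 0 && name != null) {
                    ranges.put(name, new Range(sectionStart, next));
                }
                sectionStart = -1;
                name = null;
            } else if (sectionStart < 0) {
                // The first line of a section decides whether it is a named one.
                sectionStart = pos;
                String line = new String(m, pos, lineEnd - pos, StandardCharsets.UTF_8);
                if (line.startsWith("Name: ")) {
                    name = line.substring("Name: ".length());
                }
            }
            pos = next;
        }
        // The last section may end without a trailing blank line.
        if (sectionStart >= 0 && name != null) {
            ranges.put(name, new Range(sectionStart, m.length));
        }
        return ranges;
    }
}
```

With such an index the per-entry cost is two ints, and the bytes are
touched again only if the jar is actually verified.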

-- 
With best regards,
Alexei

[1] http://issues.apache.org/jira/browse/HARMONY-4569

Re: [classlib][archive] java.util.jar specialists/authors wanted to clarify manifest chunks

Posted by Alexey Varlamov <al...@gmail.com>.
2008/2/20, Alexei Zakharov <al...@gmail.com>:
> Pavel,
>
> Have you ever seen a jar of such size? Or even close to it?
> Well, I also agree we should keep them in mind. But if we can really
> speed up processing of small jars, let's do it.

Just for the record, I had to move a BTI installation to another host a
few months ago, and it took a few gigabytes zipped. Anyway, there's nothing
unusual about huge files nowadays. So I second Pavel here: while
optimizing for everyday use cases, we should still keep a path for
handling valid corner cases.

--
Alexey

>
> Regards,
> Alexei
>
> 2008/2/20, Pavel Pervov <pm...@gmail.com>:
> > Alexei,
> >
> > I generally agree with Alexei Z, but large zip entries should be kept
> > in mind while implementing the current optimizations to java.util.jar, so
> > that we don't have to rewrite the code again when faced with large
> > entries.
> >
> > WBR,
> >     Pavel.
> >
> > On 2/20/08, Alexei Fedotov <al...@gmail.com> wrote:
> > > Alexei,
> > > Thanks for sharing your opinion! Let me note that I mistakenly said
> > > 4GB. Actually, the maximum size of an uncompressed entry is limited
> > > to 2GB (Integer.MAX_VALUE).
> > >
> > > Any other votes?
> > >
> > > On Feb 20, 2008 12:19 PM, Alexei Zakharov <al...@gmail.com> wrote:
> > > > Hi Alexei,
> > > >
> > > > I don't think we should really care about such huge zip files now,
> > > > especially if the assumption that our zip file is smaller than
> > > > 4GB can give us performance benefits. IMO it is enough just to file a
> > > > low-priority JIRA (something like "Harmony can't deal with 16GB zip
> > > > files") and continue the optimizations, keeping in mind that we will
> > > > never meet zip files larger than 4GB.
> > > >
> > > > Regards,
> > > > Alexei
> > > >
> > > > 2008/2/19, Alexei Fedotov <al...@gmail.com>:
> > > >
> > > > > Hello folks,
> > > > >
> > > > > Let me continue with my questions about our archive implementation. I
> > > > > have noticed that our zip input stream is constructed as follows:
> > > > >
> > > > >         byte[] buf = inflateEntryImpl2(descriptor, entry.getName());
> > > > >         return new ByteArrayInputStream(buf);
> > > > >
> > > > > Does it mean that we strategically want to work with zip entries smaller
> > > > > than 4GB? This would allow specific optimizations using the underlying
> > > > > byte array. Or is it just a bug, and strategically we want to
> > > > > handle entries as big as the zip file format allows?
> > > > >
> > > > > Thank you for sharing your opinion.
> > > > > Alexei
> > > > >
> > > > >
> > > > >
> > > > > On Feb 17, 2008 4:46 PM, Alexei Fedotov <al...@gmail.com> wrote:
> > > > > > Thanks, Tim, for taking care of the patch! I have another question about
> > > > > > this module. According to the specification, attributes of individual
> > > > > > entry sections for the same entry name should be merged. Which bytes
> > > > > > should be checked against the digest of this merged entry?
> > > > > >
> > > > > > Thanks!
> > > > > >
> > > > > >
> > > > > > On Feb 15, 2008 3:52 PM, Alexei Fedotov <al...@gmail.com> wrote:
> > > > > > > Hello folks,
> > > > > > >
> > > > > > > Alexey Zakharov kindly shared a hint with me that shorter letters have
> > > > > > > a better chance of being read. That is why I prepared a shorter letter
> > > > > > > asking again about manifest encodings in the form of a patch, see
> > > > > > > HARMONY-5517 [1].
> > > > > > >
> > > > > > > I would really appreciate it if people who touched the code before me (Nathan,
> > > > > > > Tim, or Evgeniya) would take a look.
> > > > > > > Thank you in advance.
> > > > > > >
> > > > > > > [1] http://issues.apache.org/jira/browse/HARMONY-5517
> > > > > > >
> > > > > > >
> > > > > > > On Thu, Feb 14, 2008 at 2:15 PM, Alexei Fedotov <al...@gmail.com> wrote:
> > > > > > > > Hello, Nathan,
> > > > > > > >  Thanks for your interest. I'm trying to resolve a performance problem
> > > > > > > >  described in HARMONY-4569. Gregory mentions that the write() methods called
> > > > > > > >  from nextChunk() are invoked too many times; see lines 187 and 201 of
> > > > > > > >  working_classlib/modules/archive/src/main/java/java/util/jar/InitManifest.java.
> > > > > > > >  This slows down the Harmony VM in debug and interpreter modes and may
> > > > > > > >  affect overall Eclipse startup, since many jars are read in the
> > > > > > > >  process. I'm trying to collect more data.
> > > > > > > >
> > > > > > > >  As far as I have been able to get in reviewing this complex code, it seems
> > > > > > > >  that either the code or my understanding could be improved.
> > > > > > > >   * The "chunks" hash table is used only for jar verification. Do we need
> > > > > > > >  to initialize it for every manifest when this costs us so many invocations?
> > > > > > > >  Instead of using write() methods for creating chunks, one might think of
> > > > > > > >  remembering chunk positions in the stream, which would be read into a
> > > > > > > >  byte array using big buffers instead of individual writes.
> > > > > > > >   * It seems that manifests longer than 1024 characters may result in a
> > > > > > > >  "string too long" exception: the buffer they are read into just gets as
> > > > > > > >  many characters from the stream as possible, and an error is reported if
> > > > > > > >  the stream is not read fully.
> > > > > > > >   * I don't know why manifests are read in different
> > > > > > > >  encodings. The spec [1] mentions UTF-8 only. It would be nice to know.
> > > > > > > >   * The closely related functionality of readLines and nextChunk, which
> > > > > > > >  contain long conditional sequences, could be rewritten in a more transparent
> > > > > > > >  and documented way. Generally, the idea behind "rewriting" chunks is beyond
> > > > > > > >  my understanding: I have not noticed in the specification that line
> > > > > > > >  breaks or anything else should be "rewritten" using an eight-if algorithm
> > > > > > > >  instead of being taken as is. BTW, I have noticed that Tim was behind the
> > > > > > > >  readability improvements to the code. I wonder what was there before
> > > > > > > >  and will check it after lunch.
> > > > > > > >   * The whole InitManifest class seems to be redundant and could be
> > > > > > > >  replaced with a set of static methods. It seems that the specific
> > > > > > > >  functionality for the two calls to InitManifest should be kept in the
> > > > > > > >  place where InitManifest is called rather than passed to InitManifest
> > > > > > > >  as a parameter for an internal check.
> > > > > > > >
> > > > > > > >  I appreciate your comments and help.
> > > > > > > >
> > > > > > > >  [1] http://java.sun.com/j2se/1.5.0/docs/guide/jar/jar.html
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >  On Feb 14, 2008 6:00 AM, Nathan Beyer <nd...@apache.org> wrote:
> > > > > > > >  > Can you point out the painful bits (line numbers, etc)?
> > > > > > > >  >
> > > > > > > >  >
> > > > > > > >  > On Feb 13, 2008 11:01 AM, Alexei Fedotov <al...@gmail.com> wrote:
> > > > > > > >  > > Hello folks,
> > > > > > > >  > >
> > > > > > > >  > > Do we have the original contributors to the
> > > > > > > >  > > working_classlib/modules/archive/src/main/java/java/util/jar/ module
> > > > > > > >  > > on board? Could anyone clarify the reasons behind the heavyweight
> > > > > > > >  > > solution of copying manifest chunks into a separate hash table, as
> > > > > > > >  > > described in HARMONY-4569 [1]? Isn't the entity hash table the only
> > > > > > > >  > > object that should be populated?
> > > > > > > >  > >
> > > > > > > >  > > --
> > > > > > > >  > > With best regards,
> > > > > > > >  > > Alexei
> > > > > > > >  > >
> > > > > > > >  > > [1] http://issues.apache.org/jira/browse/HARMONY-4569
> > > > > > > >  > >
> > > > > > > >  >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >  --
> > > > > > > >  With best regards,
> > > > > > > >  Alexei
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > With best regards,
> > > > > > > Alexei
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > With best regards,
> > > > > > Alexei
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > With best regards,
> > > > > Alexei
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > With best regards,
> > > Alexei
> > >
> >
> >
> > --
> > Pavel Pervov,
> > Intel Enterprise Solutions Software Division
> >
>

Re: [classlib][archive] java.util.jar specialists/authors wanted to clarify manifest chunks

Posted by Alexei Zakharov <al...@gmail.com>.
Pavel,

Have you ever seen a jar of such size? Or even close to it?
Well, I also agree we should keep them in mind. But if we can really
speed up processing of small jars, let's do it.

Regards,
Alexei

Re: [classlib][archive] java.util.jar specialists/authors wanted to clarify manifest chunks

Posted by Pavel Pervov <pm...@gmail.com>.
Alexei,

I generally agree with Alexei Z, but large zip entries should be kept
in mind while implementing the current optimizations to java.util.jar, so
that we don't have to rewrite the code again when faced with large
entries.

WBR,
    Pavel.

-- 
Pavel Pervov,
Intel Enterprise Solutions Software Division

Re: [classlib][archive] java.util.jar specialists/authors wanted to clarify manifest chunks

Posted by Alexei Fedotov <al...@gmail.com>.
Alexei,
Thanks for sharing your opinion! Let me note that I mistakenly said
4GB. Actually, the maximum size of an uncompressed entry is limited
to 2GB (Integer.MAX_VALUE).
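
To illustrate the limit (a sketch only, not Harmony code): a Java array
is indexed by int, so a single byte[] cannot hold more than
Integer.MAX_VALUE bytes, whereas a stream that inflates on demand has no
such bound. The class EntryStreamSketch and its methods are hypothetical
names; java.util.zip.InflaterInputStream and DeflaterOutputStream are the
standard JDK classes, used here with their default zlib settings.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

final class EntryStreamSketch {
    /** Inflates lazily: only the caller's read buffer is held in memory. */
    static InputStream openStreaming(InputStream compressed) {
        return new InflaterInputStream(compressed);
    }

    /** Drains a stream and returns its length as a long, which may exceed Integer.MAX_VALUE. */
    static long countBytes(InputStream in) {
        byte[] buf = new byte[8192];
        long total = 0;
        try {
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    /** Demo helper: compresses raw bytes so there is something to inflate. */
    static byte[] deflate(byte[] raw) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DeflaterOutputStream out = new DeflaterOutputStream(bos)) {
            out.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }
}
```

The trade-off is the one debated here: the byte[] approach allows
array-based optimizations for small entries, while the streaming form
keeps a path open for large ones.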

Any other votes?

-- 
With best regards,
Alexei

Re: [classlib][archive] java.util.jar specialists/authors wanted to clarify manifest chunks

Posted by Alexei Zakharov <al...@gmail.com>.
Hi Alexei,

I don't think we should really care about such huge zip files now,
especially if the assumption that our zip file is smaller than
4GB can give us performance benefits. IMO it is enough just to file a
low-priority JIRA (something like "Harmony can't deal with 16GB zip
files") and continue the optimizations, keeping in mind that we will
never meet zip files larger than 4GB.

Regards,
Alexei

2008/2/19, Alexei Fedotov <al...@gmail.com>:
> Hello folks,
>
> Let me continue with my questions about our archive implementation. I
> have noticed that our zip input stream is constructed as follows:
>
>         byte[] buf = inflateEntryImpl2(descriptor, entry.getName());
>         return new ByteArrayInputStream(buf);
>
> Does it mean that we strategically want to work with zip entries smaller
> than 4GB? This would allow specific optimizations using the underlying
> byte array. Or is it just a bug, and strategically we want to
> handle entries as big as the zip file format allows?
>
> Thank you for sharing your opinion.
> Alexei
>
>
>
> On Feb 17, 2008 4:46 PM, Alexei Fedotov <al...@gmail.com> wrote:
> > Thanks, Tim, for taking care of the patch! I have another question about
> > this module. According to the specification, attributes of individual
> > entry sections for the same entry name should be merged. Which bytes
> > should be checked against the digest of this merged entry?
> >
> > Thanks!
> >
> >
> > On Feb 15, 2008 3:52 PM, Alexei Fedotov <al...@gmail.com> wrote:
> > > Hello folks,
> > >
> > > Alexey Zakharov kindly shared a hint with me that shorter letters have
> > > a better chance of being read. That is why I prepared a shorter letter
> > > asking again about manifest encodings in a form of patch, see
> > > HARMONY-5517.
> > >
> > > I really appreciate if people who touched the code before me (Nathan,
> > > Tim, or Evgeniya) would take a look.
> > > Thank you in advance.
> > >
> > > [1] http://issues.apache.org/jira/browse/HARMONY-5517
> > >
> > >
> > > On Thu, Feb 14, 2008 at 2:15 PM, Alexei Fedotov
> > >
> > > <al...@gmail.com> wrote:
> > > > Hello, Nathan,
> > > >  Thanks for your interest. I'm trying to resolve a performance problem
> > > >  described at HARMONY-4569. Gregory mentions that methods write() from
> > > >  nextChunk() are called too many times, see lines 187, 201 of
> > > >  working_classlib/modules/archive/src/main/java/java/util/jar/InitManifest.java
> > > >  This slows down Harmony VM in debug and interpreter modes and may
> > > >  affect overall Eclipse startup since many jars are read in the
> > > >  process. I'm trying to collect more data.
> > > >
> > > >  As far as I was able to advance reviewing the complex code it seemed
> > > >  that either code or my understanding may be improved.
> > > >   * "chunks" hash table is used only for jar verification. Do we need
> > > >  to initialize it for any manifest when this cost us much invocations?
> > > >  Instead of using write() methods for creating chunks one may think of
> > > >  remembering chunk positions in the stream, which should be read into
> > > >  byte array using big buffers instead of individual writes.
> > > >   * It seems that manifests longer than 1024 characters may result in
> > > >  "string too long" exception - the buffer they are read in just gets as
> > > >  much characters from stream as possible, and reports error if the
> > > >  stream is not read fully.
> > > >   * I don't know a reason why manifests are read in different
> > > >  encodings. The spec [1] mentions UTF-8 only. Nice to know.
> > > >   * Close functionality of readLines and nextChunk containing long
> > > >  conditional sequences may be rewritten in more transparent and
> > > >  documented way. Generally idea behind "rewriting" of chunks is above
> > > >  of my understanding: I have not noticed in the specification that line
> > > >  breaks or anything else should be "rewritten" using eight-if algorithm
> > > >  instead of taken as is. BTW, I have noticed that Tim was behind
> > > >  readability improvements of the code. I wonder what was there before
> > > >  and will check it after lunch.
> > > >   * The whole class InitManifest seems to be redundant and may be
> > > >  replaced with a set of static methods. It seems that specific
> > > >  functionality for two calls to InitManifest should be kept in the
> > > >  place where InitManifest is called rather than passed to InitManifest
> > > >  as a parameter for internal check.
> > > >
> > > >  I appreciate your comments and help.
> > > >
> > > >  [1] http://java.sun.com/j2se/1.5.0/docs/guide/jar/jar.html
> > > >
> > > >
> > > >
> > > >  On Feb 14, 2008 6:00 AM, Nathan Beyer <nd...@apache.org> wrote:
> > > >  > Can you point out the painful bits (line numbers, etc)?
> > > >  >
> > > >  >
> > > >  > On Feb 13, 2008 11:01 AM, Alexei Fedotov <al...@gmail.com> wrote:
> > > >  > > Hello folks,
> > > >  > >
> > > >  > > Do we have original
> > > >  > > working_classlib/modules/archive/src/main/java/java/util/jar/ module
> > > >  > > contributors on board? Could anyone clarify the reasons behind heavy
> > > >  > > solution to copy manifest chunks into a separate hash table descried
> > > >  > > at HARMONY-4569? Aren't entity hash table the only object which should
> > > >  > > be populated?
> > > >  > >
> > > >  > > --
> > > >  > > With best regards,
> > > >  > > Alexei
> > > >  > >
> > > >  > > [1] http://issues.apache.org/jira/browse/HARMONY-4569
> > > >  > >
> > > >  >
> > > >
> > > >
> > > >
> > > >  --
> > > >  With best regards,
> > > >  Alexei
> > > >
> > >
> > >
> > >
> > > --
> > > With best regards,
> > > Alexei
> > >
> >
> >
> >
> > --
> > With best regards,
> > Alexei
> >
>
>
>
> --
> With best regards,
> Alexei
>

Re: [classlib][archive] java.util.jar specialists/authors wanted to clarify manifest chunks

Posted by Alexei Fedotov <al...@gmail.com>.
Hello folks,

Let me continue with my questions about our archive implementation. I
have noticed that our zip input stream is constructed as follows:

        byte[] buf = inflateEntryImpl2(descriptor, entry.getName());
        return new ByteArrayInputStream(buf);

Does this mean that we strategically want to work only with zip entries
smaller than 4GB? That would allow specific optimizations that use the
underlying byte array. Or is it just a bug, and strategically we want to
handle entries as large as the zip file format allows?

Thank you for sharing your opinion.
Alexei
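
For illustration, here is a minimal sketch of the two strategies the question
contrasts. It is written against the standard java.util.zip classes, not
against Harmony's native inflateEntryImpl2, whose internals are not shown in
this thread. The names EntryStreams, buffered, streaming, readAll and deflate
are hypothetical. The buffered variant materializes the whole entry in one
byte[], so the entry size is bounded by the maximum Java array length (at most
Integer.MAX_VALUE elements); the streaming variant inflates on demand and has
no such bound.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

class EntryStreams {

    // Buffered strategy: inflate the whole entry into a single byte[] first.
    // The entry can never be larger than one Java array.
    static InputStream buffered(InputStream compressed) throws IOException {
        return new ByteArrayInputStream(readAll(new InflaterInputStream(compressed)));
    }

    // Streaming strategy: inflate lazily while the caller reads.
    static InputStream streaming(InputStream compressed) {
        return new InflaterInputStream(compressed);
    }

    // Drains a stream using a fixed-size transfer buffer.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }

    // Demo helper only. Real zip entries use raw deflate data; this sketch
    // uses the default zlib framing on both the deflate and inflate side.
    static byte[] deflate(byte[] raw) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        DeflaterOutputStream dos = new DeflaterOutputStream(out);
        dos.write(raw);
        dos.close();
        return out.toByteArray();
    }
}
```

Both variants return the same bytes for a small entry; they differ only in how
much memory is held at once and in the largest entry they can represent.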



On Feb 17, 2008 4:46 PM, Alexei Fedotov <al...@gmail.com> wrote:
> Thanks, Tim, for taking care of the patch! I have another question about
> this module. According to the specification, attributes of individual
> sections with the same entry name should be merged. Which bytes
> should be used to compute the digest of such a merged entry?
>
> Thanks!



-- 
With best regards,
Alexei

Re: [classlib][archive] java.util.jar specialists/authors wanted to clarify manifest chunks

Posted by Alexei Fedotov <al...@gmail.com>.
Thanks, Tim, for taking care of the patch! I have another question about
this module. According to the specification, attributes of individual
sections with the same entry name should be merged. Which bytes
should be used to compute the digest of such a merged entry?

Thanks!
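
The question above is left open here. As a hedged illustration only, the
sketch below shows the mechanics that any answer has to build on: a digest is
computed over a raw byte range of the manifest, so two sections with the same
Name, even if their attributes are merged, still occupy two distinct byte
ranges. The class name SectionDigest, the sample manifest text and the choice
of SHA-1 are assumptions made for the example. It does not claim which range a
verifier should use.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

class SectionDigest {

    // Digest of one byte range of the manifest, taken exactly as stored.
    static byte[] digest(byte[] manifest, int offset, int length)
            throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-1");
        md.update(manifest, offset, length);
        return md.digest();
    }

    public static void main(String[] args) throws Exception {
        // Two individual sections for the same entry name (ASCII only, so
        // String.length() equals the number of UTF-8 bytes).
        String first = "Name: a/B.class\r\nX: 1\r\n\r\n";
        String second = "Name: a/B.class\r\nY: 2\r\n\r\n";
        byte[] manifest = (first + second).getBytes(StandardCharsets.UTF_8);

        byte[] d1 = digest(manifest, 0, first.length());
        byte[] d2 = digest(manifest, first.length(), second.length());

        // The attributes may be merged, but the byte ranges and therefore
        // the digests remain different.
        System.out.println(Arrays.equals(d1, d2)); // prints "false"
    }
}
```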

On Feb 15, 2008 3:52 PM, Alexei Fedotov <al...@gmail.com> wrote:
> Hello folks,
>
> Alexey Zakharov kindly gave me a hint that shorter letters have
> a better chance of being read. That is why I have prepared a shorter
> letter that asks again about manifest encodings, this time in the form
> of a patch; see HARMONY-5517 [1].
>
> I would really appreciate it if the people who touched the code before me
> (Nathan, Tim, or Evgeniya) would take a look.
> Thank you in advance.
>
> [1] http://issues.apache.org/jira/browse/HARMONY-5517



-- 
With best regards,
Alexei

Re: [classlib][archive] java.util.jar specialists/authors wanted to clarify manifest chunks

Posted by Alexei Fedotov <al...@gmail.com>.
Hello folks,

Alexey Zakharov kindly gave me a hint that shorter letters have
a better chance of being read. That is why I have prepared a shorter
letter that asks again about manifest encodings, this time in the form
of a patch; see HARMONY-5517 [1].

I would really appreciate it if the people who touched the code before me
(Nathan, Tim, or Evgeniya) would take a look.
Thank you in advance.

[1] http://issues.apache.org/jira/browse/HARMONY-5517


On Thu, Feb 14, 2008 at 2:15 PM, Alexei Fedotov
<al...@gmail.com> wrote:
> Hello, Nathan,
>  Thanks for your interest. I'm trying to resolve a performance problem
>  described in HARMONY-4569. Gregory mentions that the write() methods
>  called from nextChunk() are invoked too many times; see lines 187 and 201 of
>  working_classlib/modules/archive/src/main/java/java/util/jar/InitManifest.java.
>  This slows down the Harmony VM in debug and interpreter modes and may
>  affect overall Eclipse startup, since many jars are read in the
>  process. I'm trying to collect more data.
>
>  As far as I got in reviewing this complex code, it seems that either
>  the code or my understanding could be improved:
>   * The "chunks" hash table is used only for jar verification. Do we need
>  to initialize it for every manifest when this costs us so many invocations?
>  Instead of creating chunks with write() calls, one could remember
>  chunk positions in the stream, which would be read into a byte array
>  using big buffers instead of individual writes.
>   * It seems that manifests longer than 1024 characters may result in a
>  "string too long" exception: the buffer they are read into just takes as
>  many characters from the stream as possible, and an error is reported if
>  the stream is not read fully.
>   * I don't know why manifests are read in different
>  encodings. The spec [1] mentions UTF-8 only. It would be nice to know
>  the reason.
>   * The similar functionality of readLines and nextChunk, both containing
>  long conditional sequences, could be rewritten in a more transparent and
>  documented way. In general, the idea behind "rewriting" chunks is beyond
>  my understanding: I have not noticed anything in the specification saying
>  that line breaks or anything else should be "rewritten" using an eight-if
>  algorithm instead of being taken as is. BTW, I have noticed that Tim was
>  behind the readability improvements to the code. I wonder what was there
>  before and will check after lunch.
>   * The whole InitManifest class seems redundant and could be
>  replaced with a set of static methods. It seems that the functionality
>  specific to each of the two calls to InitManifest should be kept at the
>  call site rather than passed to InitManifest as a parameter for an
>  internal check.
>
>  I appreciate your comments and help.
>
>  [1] http://java.sun.com/j2se/1.5.0/docs/guide/jar/jar.html



-- 
With best regards,
Alexei

Re: [classlib][archive] java.util.jar specialists/authors wanted to clarify manifest chunks

Posted by Alexei Fedotov <al...@gmail.com>.
Hello, Nathan,
Thanks for your interest. I'm trying to resolve a performance problem
described in HARMONY-4569. Gregory mentions that the write() methods
called from nextChunk() are invoked too many times; see lines 187 and 201 of
working_classlib/modules/archive/src/main/java/java/util/jar/InitManifest.java.
This slows down the Harmony VM in debug and interpreter modes and may
affect overall Eclipse startup, since many jars are read in the
process. I'm trying to collect more data.

As far as I got in reviewing this complex code, it seems that either
the code or my understanding could be improved:
  * The "chunks" hash table is used only for jar verification. Do we need
to initialize it for every manifest when this costs us so many invocations?
Instead of creating chunks with write() calls, one could remember
chunk positions in the stream, which would be read into a byte array
using big buffers instead of individual writes.
  * It seems that manifests longer than 1024 characters may result in a
"string too long" exception: the buffer they are read into just takes as
many characters from the stream as possible, and an error is reported if
the stream is not read fully.
  * I don't know why manifests are read in different
encodings. The spec [1] mentions UTF-8 only. It would be nice to know
the reason.
  * The similar functionality of readLines and nextChunk, both containing
long conditional sequences, could be rewritten in a more transparent and
documented way. In general, the idea behind "rewriting" chunks is beyond
my understanding: I have not noticed anything in the specification saying
that line breaks or anything else should be "rewritten" using an eight-if
algorithm instead of being taken as is. BTW, I have noticed that Tim was
behind the readability improvements to the code. I wonder what was there
before and will check after lunch.
  * The whole InitManifest class seems redundant and could be
replaced with a set of static methods. It seems that the functionality
specific to each of the two calls to InitManifest should be kept at the
call site rather than passed to InitManifest as a parameter for an
internal check.
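
A minimal sketch of the first three points above, under stated assumptions: the
manifest is read fully with a large transfer buffer (so there is no fixed size
limit), section boundaries are recorded as {offset, length} pairs instead of
being copied with per-byte write() calls, and text is decoded as UTF-8 only
when needed. The names ManifestChunks, readFully, sectionRanges and text are
hypothetical and are not Harmony's API. The sketch only finds section
boundaries and does not parse attributes or line continuations.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class ManifestChunks {

    // Reads the whole stream in large blocks.
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    // Returns {offset, length} for each section. A section runs up to and
    // including the blank line that terminates it; CR, LF and CRLF are
    // all accepted as line terminators.
    static List<int[]> sectionRanges(byte[] data) {
        List<int[]> ranges = new ArrayList<int[]>();
        int sectionStart = 0;
        boolean hasContent = false;
        int pos = 0;
        while (pos < data.length) {
            int eol = pos;
            while (eol < data.length && data[eol] != '\r' && data[eol] != '\n') {
                eol++;
            }
            boolean blank = (eol == pos);
            int next = eol;
            if (next < data.length) {
                boolean crlf = data[next] == '\r'
                        && next + 1 < data.length && data[next + 1] == '\n';
                next += crlf ? 2 : 1;
            }
            if (!blank) {
                hasContent = true;
            } else {
                if (hasContent) {
                    ranges.add(new int[] {sectionStart, next - sectionStart});
                }
                sectionStart = next;
                hasContent = false;
            }
            pos = next;
        }
        if (hasContent) {
            // Last section without a terminating blank line.
            ranges.add(new int[] {sectionStart, data.length - sectionStart});
        }
        return ranges;
    }

    // Decodes one recorded range as UTF-8, the only encoding the spec mentions.
    static String text(byte[] data, int[] range) {
        return new String(data, range[0], range[1], StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        byte[] raw = "Manifest-Version: 1.0\r\n\r\nName: a\r\nX: 1\r\n\r\n"
                .getBytes(StandardCharsets.UTF_8);
        byte[] data = readFully(new ByteArrayInputStream(raw));
        for (int[] r : sectionRanges(data)) {
            System.out.println(r[0] + "+" + r[1]);
        }
    }
}
```

With this layout a verifier can digest a section directly from the shared byte
array, and code paths that never verify signatures pay only for the offsets.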

I appreciate your comments and help.

[1] http://java.sun.com/j2se/1.5.0/docs/guide/jar/jar.html

On Feb 14, 2008 6:00 AM, Nathan Beyer <nd...@apache.org> wrote:
> Can you point out the painful bits (line numbers, etc)?



-- 
With best regards,
Alexei

Re: [classlib][archive] java.util.jar specialists/authors wanted to clarify manifest chunks

Posted by Nathan Beyer <nd...@apache.org>.
Can you point out the painful bits (line numbers, etc)?

On Feb 13, 2008 11:01 AM, Alexei Fedotov <al...@gmail.com> wrote:
> Hello folks,
>
> Do we have the original
> working_classlib/modules/archive/src/main/java/java/util/jar/ module
> contributors on board? Could anyone clarify the reasons behind the heavy
> solution of copying manifest chunks into a separate hash table, described
> in HARMONY-4569 [1]? Isn't the entity hash table the only object that should
> be populated?
>
> --
> With best regards,
> Alexei
>
> [1] http://issues.apache.org/jira/browse/HARMONY-4569
>