You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@harmony.apache.org by Alexei Fedotov <al...@gmail.com> on 2008/03/18 11:58:14 UTC

Delayed entry parsing Was: [classlib][luni] optimized InputStream reader (HARMONY-5522)

Hello folks,

I was thinking about delaying parsing of Manifest entries to get even
more speed up while merging Tim's inlinable ByteStream (it is called
ByteSnapshot now) and faced a design choice. Manifest#getEntries() is
not reported to throw IOException. It means that if I delay parsing
entries till this call and get a parse error, I can no longer throw
conventional IOException.

What should I do?
1. Throw new RuntimeException
2. Set all entries to null (and probably get NPE very soon)
3. Ignore a particular attribute (name: value) with a parsing error as
specification might be suggesting [1]
4. Parse entries during read() method / constructor invocation as RI does.

Suggestions?

[1] http://java.sun.com/j2se/1.5.0/docs/guide/jar/jar.html, search for
"Attributes which are not understood are ignored."

On Fri, Mar 14, 2008 at 8:30 PM, Alexei Fedotov
<al...@gmail.com> wrote:
> I just want to have the following written somewhere until I forget: in
>  one of hot spots Eclipse guys parse the whole manifest to get two main
>  attributes, and then drop the object. There is no need to parse 100k
>  of ICU manifst to populate entry signatures for that use case. In
>  other words one may think of delaying parsing of other sections then
>  the main one.
>
>  On Fri, Mar 14, 2008 at 8:22 PM, Alexei Fedotov
>
> <al...@gmail.com> wrote:
>
>
> > Tim, thank you for reading and understanding. I like when my code
>  >  passes a review and I am eager to try arguments I've created for
>  >  myself. I thought of your variant and liked it because it would be
>  >  possible to JIT to inline your version of ByteStream.readFully()
>  >  method.
>  >
>  >  There is some complication caused with a public API exposed by
>  >  Manifest class: the constructor and read methods accept a general
>  >  InputStream as a source for Manifest. Other applications, such as
>  >  Eclipse, read a file themselves and then call Manifest API directly to
>  >  get few main attributes. There is a better chance that external
>  >  developers would use use documented ByteArrayInputStream to wrap their
>  >  bytes than "our possibly optimized internal child" of that class.
>  >
>  >  I also believe that I decrease overall project astonishment by
>  >  encapsulating more complex technique in one place instead of spreading
>  >  less conventional things around. Reflection is hidden in a class with
>  >  conventional API, and no one would question when reading
>  >  ZipFile#getInputStream(ZipEntry) code why he should wrap a resulted
>  >  byte buffer in "our possibly optimized internal child" instead of
>  >  ByteArrayInputStream.
>  >
>  >  Finally two these methods may be combined: in addition to
>  >  ByteArrayInputStream one may check in readFully(InputStream) if "our
>  >  possibly optimized internal child" is supplied an use a quick path
>  >  without reflection. I wonder what you and other java gurus think about
>  >  this.
>  >
>  >
>  >
>  >
>  >  On Fri, Mar 14, 2008 at 7:47 PM, Tim Ellison <t....@gmail.com> wrote:
>  >  > Alexei Fedotov wrote:
>  >  >  > Let me write something more understandable. Breathe, Alexei, breathe slowly.
>  >  >
>  >  >  Code is good ;-)
>  >  >
>  >  >
>  >  >  > 1. Our native code returns us Zip entries in a form of a byte array
>  >  >  > which is later wrapped in ByteArrayInputSteam object.
>  >  >
>  >  >  So this is my point.  While I see that you are writing the ByteStream
>  >  >  util to reach into a regular ByteArrayInputStream to get the backing
>  >  >  array -- my question is why not change the
>  >  >  ZipFile#getInputStream(ZipEntry) implementation to return a ByteStream
>  >  >  (which will be a subclass of ByteArrayInputStream that will simply
>  >  >  return the backing array when asked without reflection).
>  >  >
>  >  >  I agree with the rest of the arguments below.
>  >  >
>  >  >  Regards,
>  >  >  Tim
>  >  >
>  >  >
>  >  >
>  >  >  > 2. Instead of copying portions of this ByteArrayInputSteam via
>  >  >  > read(buffer) call it is quicker to get a reference to the underlying
>  >  >  > buffer using reflection and work with this array.
>  >  >  > 3. This big buffer is useful for further manifest verification. This
>  >  >  > is a subject of separate [1]. Instead of copying manifest sections
>  >  >  > into smaller byte buffers, we maintain a list of relative positions in
>  >  >  > the big buffer, and update the digests using these positions.
>  >  >  > Generally, storing two relative positions is much cheaper than
>  >  >  > allocating memory and populating it with bytes. That is why I refer to
>  >  >  > this change as to implementation cleanup, and not an optimization.
>  >  >  > 4.  I wrote ideas of possible further cleanup in TODO comments. They
>  >  >  > could be described as follows: I removed two of six copies (acts of
>  >  >  > copying) of a manifest byte before it becomes accessible as a key or
>  >  >  > value in a java hash table. Another two copies are easy to be removed,
>  >  >  > so there is enough place for improvement here.
>  >  >  >
>  >  >  > If we would only know the algorithm which will be used for digesting,
>  >  >  > we may replace a manifest buffer with few signatures which might be
>  >  >  > updated while manifest is read. After algorithm is known the manifest
>  >  >  > may be verified in one pass.
>  >  >  >
>  >  >  >
>  >  >  >
>  >  >  > On Thu, Mar 13, 2008 at 6:03 PM, Tim Ellison <t....@gmail.com> wrote:
>  >  >  >> Alexei Fedotov wrote:
>  >  >  >>  > I see the question. The general InoouttStream cannot be optimized, but
>  >  >  >>  > our implementation returns ByteArrayInputStream. We can choose a fast
>  >  >  >>  > path without using arraycopy  if we are first to read from this
>  >  >  >>  > ByteArrayInputStream object. We just get get a reference to the
>  >  >  >>  > underlying buffer and return it without copying the contents. This
>  >  >  >>  > works only for unmodified ByteArrayInputStream object, but this is
>  >  >  >>  > exactly our case.
>  >  >  >>
>  >  >  >>  Yep, I see that.  Now how will you use it (i.e. can you show the code
>  >  >  >>  that references ByteStream)?  I can see where you are going but want to
>  >  >  >>  see the whole picture.
>  >  >  >>
>  >  >  >>  Regards,
>  >  >  >>  Tim
>  >  >  >>
>  >  >  >>
>  >  >  >
>  >  >  >
>  >  >  >
>  >  >
>  >
>  >
>  >
>  >  --
>  >  With best regards,
>  >  Alexei
>  >
>
>
>
>  --
>  With best regards,
>  Alexei
>



-- 
With best regards,
Alexei

Re: [classlib][archive] Re: Delayed entry parsing

Posted by Tim Ellison <t....@gmail.com>.

Alexei Fedotov wrote:
> Actually I was thinking of benefits of extending ByteBuffer and
> returning ByteBuffer instance from the static method. Imagine a time
> when the same ByteBuffer would be used through the whole
> implementation starting from a memory mapped file till UTF-8 decoder
> input, and only relative positions would be passed around.
> 
> The reason why I did not use the ByteBuffer for return type was I was
> unable to find enough arguments for that complication right now. May
> be Alexey's MANIFEST would be the case when it would be convenient
> just to shift the last buffer position a over a nul character.
> 
> Folks, I need your opinion to either rename the class or change return
> type to a slightly heavier one.

Rename.  Let's move one step at a time.

Regards,
Tim

Re: [classlib][archive] Re: Delayed entry parsing

Posted by Alexei Fedotov <al...@gmail.com>.

Tim,

Actually I was thinking of benefits of extending ByteBuffer and
returning ByteBuffer instance from the static method. Imagine a time
when the same ByteBuffer would be used through the whole
implementation starting from a memory mapped file till UTF-8 decoder
input, and only relative positions would be passed around.

The reason why I did not use the ByteBuffer for return type was I was
unable to find enough arguments for that complication right now. May
be Alexey's MANIFEST would be the case when it would be convenient
just to shift the last buffer position a over a nul character.

Folks, I need your opinion to either rename the class or change return
type to a slightly heavier one.
Thanks!

On Fri, Mar 28, 2008 at 1:57 AM, Tim Ellison <t....@gmail.com> wrote:
> Alexei Fedotov wrote:
>  > I have completed changes [1] based on your ideas of using
>  > ByteArrayInputStream child for exposing its underlying buffer as well
>  > as fixed few of my earlier TODO comments. I found the contract of the
>  > underlying buffer capturing close to ByteBuffer.wrap() method and
>  > named the class correspondingly.
>
>  Nah, ByteBuffer.wrap(byte[]) would return a ByteBuffer.  This is
>  reaching in and getting the implementation (i.e. almost the opposite).
>
>  We need to rename it.  How about InputStreamExposer#expose(InputStream).
>
>  Regards,
>  Tim
>
>

-- 
With best regards,
Alexei

Re: [classlib][archive] Re: Delayed entry parsing

Posted by Tim Ellison <t....@gmail.com>.

Alexei Fedotov wrote:
> I have completed changes [1] based on your ideas of using
> ByteArrayInputStream child for exposing its underlying buffer as well
> as fixed few of my earlier TODO comments. I found the contract of the
> underlying buffer capturing close to ByteBuffer.wrap() method and
> named the class correspondingly.

Nah, ByteBuffer.wrap(byte[]) would return a ByteBuffer.  This is 
reaching in and getting the implementation (i.e. almost the opposite).

We need to rename it.  How about InputStreamExposer#expose(InputStream).

Regards,
Tim

Re: [classlib][archive] Re: Delayed entry parsing Was: [classlib][luni] optimized InputStream reader (HARMONY-5522)

Posted by Alexei Fedotov <al...@gmail.com>.

Tim,
I have completed changes [1] based on your ideas of using
ByteArrayInputStream child for exposing its underlying buffer as well
as fixed few of my earlier TODO comments. I found the contract of the
underlying buffer capturing close to ByteBuffer.wrap() method and
named the class correspondingly.

As for your question about IOException during manifest parsing, known
data shows that RI uses the exception when Manifest cannot be either
parsed or read. For example, this exception is used when line breaks
violate the required pattern.I have not thoroughly tested that though,
and will do it when get more time.

Thanks.

[1] http://issues.apache.org/jira/browse/HARMONY-4569

On Tue, Mar 18, 2008 at 2:13 PM, Tim Ellison <t....@gmail.com> wrote:
> Alexei Fedotov wrote:
>  > I was thinking about delaying parsing of Manifest entries to get even
>  > more speed up while merging Tim's inlinable ByteStream (it is called
>  > ByteSnapshot now) and faced a design choice. Manifest#getEntries() is
>  > not reported to throw IOException. It means that if I delay parsing
>  > entries till this call and get a parse error, I can no longer throw
>  > conventional IOException.
>
>  Although you are deferring the parsing, I assume you throw an
>  IOException if the manifest stream is not readable?
>
>
>  > What should I do?
>  > 1. Throw new RuntimeException
>  > 2. Set all entries to null (and probably get NPE very soon)
>  > 3. Ignore a particular attribute (name: value) with a parsing error as
>  > specification might be suggesting [1]
>  > 4. Parse entries during read() method / constructor invocation as RI does.
>
>  What does the RI do if the manifest is readable, but not parsable?  Does
>  it throw an IOException?
>
>  Regards,
>  Tim
>
>
>
>  > Suggestions?
>  >
>  > [1] http://java.sun.com/j2se/1.5.0/docs/guide/jar/jar.html, search for
>  > "Attributes which are not understood are ignored."
>
>

-- 
With best regards,
Alexei

[classlib][archive] Re: Delayed entry parsing Was: [classlib][luni] optimized InputStream reader (HARMONY-5522)

Posted by Tim Ellison <t....@gmail.com>.

Alexei Fedotov wrote:
> I was thinking about delaying parsing of Manifest entries to get even
> more speed up while merging Tim's inlinable ByteStream (it is called
> ByteSnapshot now) and faced a design choice. Manifest#getEntries() is
> not reported to throw IOException. It means that if I delay parsing
> entries till this call and get a parse error, I can no longer throw
> conventional IOException.

Although you are deferring the parsing, I assume you throw an 
IOException if the manifest stream is not readable?

> What should I do?
> 1. Throw new RuntimeException
> 2. Set all entries to null (and probably get NPE very soon)
> 3. Ignore a particular attribute (name: value) with a parsing error as
> specification might be suggesting [1]
> 4. Parse entries during read() method / constructor invocation as RI does.

What does the RI do if the manifest is readable, but not parsable?  Does 
it throw an IOException?

Regards,
Tim

> Suggestions?
> 
> [1] http://java.sun.com/j2se/1.5.0/docs/guide/jar/jar.html, search for
> "Attributes which are not understood are ignored."