Posted to dev@poi.apache.org by Chris Nokleberg <ch...@sixlegs.com> on 2003/07/25 02:21:20 UTC

Event-driven POIFS API

I've been cleaning up our OLE code and have started to look at the
existing POIFS API.

http://jakarta.apache.org/poi/poifs/how-to.html#Event-Driven+Reading
says this:

  "The event-driven API for reading documents is a little more
   complicated and requires that your application know, in advance,
   which files it wants to read. The benefit of using this API is that
   each document is in memory just long enough for your application to
   read it, and documents that you never read at all are not in memory
   at all. When you're finished reading the documents you wanted, the
   file system has no data structures associated with it at all and can
   be discarded."

I think this is a little misleading, especially the part "documents that
you never read at all are not in memory at all". Due to the nature of
OLE, the table of contents (the directory and allocation tables) could
very well be at the end of the file. When reading from an InputStream,
this means you need to buffer the entire contents, since you can't tell
what data to discard as you are reading it in.
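
To illustrate the point about the table of contents, here is a minimal
sketch (not POI code) that computes where the directory starts from two
header fields. The class and method names are hypothetical, and the field
offsets (30 for the sector shift, 48 for the first directory sector) are
taken from my reading of the published compound file layout, so treat
them as assumptions to verify.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, not part of POI: inspects two fields of an OLE2 header.
class OleHeaderSketch {
    // Assumed header layout: 2-byte sector size exponent at offset 30,
    // 4-byte index of the first directory sector at offset 48.
    static final int SECTOR_SHIFT_OFFSET = 30;
    static final int FIRST_DIR_SECTOR_OFFSET = 48;

    // Returns the byte offset in the file where the directory begins.
    static long directoryOffset(byte[] header) {
        ByteBuffer buf = ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN);
        int sectorShift = buf.getShort(SECTOR_SHIFT_OFFSET) & 0xFFFF;
        long firstDirSector = buf.getInt(FIRST_DIR_SECTOR_OFFSET) & 0xFFFFFFFFL;
        // Sector 0 starts right after the header, which occupies one sector.
        return (firstDirSector + 1) << sectorShift;
    }

    // Builds an otherwise empty 512-byte header with just these two fields set.
    static byte[] sampleHeader(int sectorShift, int firstDirSector) {
        ByteBuffer buf = ByteBuffer.allocate(512).order(ByteOrder.LITTLE_ENDIAN);
        buf.putShort(SECTOR_SHIFT_OFFSET, (short) sectorShift);
        buf.putInt(FIRST_DIR_SECTOR_OFFSET, firstDirSector);
        return buf.array();
    }
}
```

With 512-byte sectors (shift 9) and the directory in sector 1000, the
directory starts at byte 512512. A forward-only stream has to hold on to
everything before that offset, because until the directory has been read
it cannot know which sectors belong to which document.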

The code seems to bear this out (not surprising), but it does raise the
question of how useful an event-driven API actually is. You are not
reducing the peak memory required to read in a file, just "releasing"
the parts that you do not want a little sooner. The same effect could be
achieved with a simple addition to the "conventional" API.

In any case, the best choice for low memory situations going forward
will be to stream the data to disk and use a RandomAccessFile-based
reader (mmap or otherwise). So, I don't see any benefit to keeping the
event-based API.
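
A rough sketch of what I mean by streaming to disk, using only the JDK.
The class and method names are hypothetical and this is not an existing
POI API. It spools the stream to a temp file and then reads only the
byte range that is asked for.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch: spool a stream to disk, then seek instead of buffering.
class SpoolToDisk {
    // Copies the whole stream into a temp file without keeping it on the heap.
    static Path spool(InputStream in) {
        try {
            Path tmp = Files.createTempFile("ole-spool", ".bin");
            tmp.toFile().deleteOnExit();
            // The temp file already exists, so it must be replaced explicitly.
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            return tmp;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads exactly `length` bytes starting at `offset`, touching nothing else.
    static byte[] readRange(Path file, long offset, int length) {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            byte[] out = new byte[length];
            raf.seek(offset);
            raf.readFully(out);
            return out;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A reader built this way would seek to the header, then to the directory,
then to only the sectors of the documents the caller asks for, so heap
use depends on what is read rather than on the size of the file.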

Chris

Re: Event-driven POIFS API

Posted by "Andrew C. Oliver" <ac...@apache.org>.
*shrug* We did profiling on it at the time, and it came out with a smaller
memory footprint and was a bit quicker in the use cases. That was almost 2
years ago, IIRC.

-Andy



-- 
Andrew C. Oliver
http://www.superlinksoftware.com/poi.jsp
Custom enhancements and Commercial Implementation for Jakarta POI

http://jakarta.apache.org/poi
For Java and Excel, Got POI?