Posted to user@poi.apache.org by Daniel Noll <da...@nuix.com> on 2008/01/22 23:22:25 UTC
HSSF: Middle-ground API for reading an Excel spreadsheet
Hi all.
I was wondering if anyone had experimented with doing lazy parsing via the
eventusermodel interface. I've had an attempt at it myself but am running
into various troubles.
The first one which is really problematic is that once I get a FormulaRecord,
I can't find a way to convert that into the formula string. Thankfully
getting the value result is relatively simple.
Have the HSSF developers considered making an API half way between usermodel
and eventusermodel, which can return HSSFCell instances one at a time without
instantiating the entire spreadsheet? It would be a really nice thing for
saving memory. (Although an implementation of the records which doesn't
create copies of everything in memory would probably solve the memory
problems almost as well.)
Daniel
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org
Re: HSSF: Middle-ground API for reading an Excel spreadsheet
Posted by Nick Burch <ni...@torchbox.com>.
On Tue, 12 Feb 2008, Daniel Noll wrote:
> - The file loaded from disk is merely one big ByteBuffer. (easy)
>
> - A block in the file would be a ByteBuffer created as a subset over the
> larger file ByteBuffer (easy, Java allows for this already)
This looks like it might be a little bit of work. It looks to me like most
of the block creation/reading is done on the input stream one block at a
time, with eof checks etc in there. So, I guess we'd need to change to
just reading the whole lot into some sort of growable byte array, wrap
that as a ByteBuffer, then change the block code to work on that.
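The "growable byte array, wrap that as a ByteBuffer" step could look roughly like this. It is a minimal sketch, assuming the source is an arbitrary InputStream; StreamBuffers and readFully are hypothetical names, not POI API.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

// Hypothetical helper, not part of POI.
final class StreamBuffers {
    private StreamBuffers() {}

    /** Drains the stream into a growable array and exposes it as a read-only ByteBuffer. */
    static ByteBuffer readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        // toByteArray() makes one final copy; wrap() itself does not copy.
        return ByteBuffer.wrap(out.toByteArray()).asReadOnlyBuffer();
    }
}
```

Every byte is still read once here, so this only removes later copies; it does not avoid the initial read.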
> - A document would be a ByteBuffer created as a composite ByteBuffer over
> the blocks which make it up (slightly less easy, requires custom
> ByteBuffer subclass to be written but such a thing will be a useful
> utility and probably should be in Commons if not the JRE itself.)
I guess we could do the blocks first, then have them return byte arrays to
maintain current behaviour. Then, we write the new bytebuffer stuff, and I
guess finally we tweak things like RecordInputStream.
> Of course if someone writes to a document it's a different story. You
> would need to create a new ByteBuffer so as not to damage the original
> file (unless you design it to write to the original file -- probably
> harder.)
Currently, we just dump it all into a fresh output stream. I guess we
could keep going with that, or possibly dump into a fresh ByteBuffer,
which we can also pass into an output stream if wanted?
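Passing a ByteBuffer on to an output stream needs only the standard java.nio.channels bridge. A minimal sketch; BufferDump and writeTo are hypothetical names, and nothing here is POI API.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;

// Hypothetical helper, not part of POI.
final class BufferDump {
    private BufferDump() {}

    /** Writes the remaining bytes of source to out without moving source's position. */
    static void writeTo(ByteBuffer source, OutputStream out) throws IOException {
        // duplicate() shares the bytes but has its own position and limit.
        ByteBuffer view = source.duplicate();
        WritableByteChannel channel = Channels.newChannel(out);
        while (view.hasRemaining()) {
            channel.write(view);
        }
        // The channel is deliberately not closed: closing it would close out.
    }
}
```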
Nick
---------------------------------------------------------------------
Re: HSSF: Middle-ground API for reading an Excel spreadsheet
Posted by Daniel Noll <da...@nuix.com>.
On Saturday 09 February 2008 05:37:12 Nick Burch wrote:
> I've been doing some reading up on ByteBuffer, and was wondering:
>
> On Mon, 4 Feb 2008, Daniel Noll wrote:
> > 1. Lower memory usage due to not keeping a byte[] copy of all data at
> > the POIFS level.
>
> How would this work? Surely we'll still need to read all the bytes that
> make up the whole poifs stream, then pass those into our underlying
> ByteBuffer? I couldn't figure out a way to do it without processing all
> the input stream at least once, since most of them won't support zipping
> about to different places.
>
> > 2. If you don't ask for a DocumentInputStream for a given Document, the
> > bytes don't even get read. If you open a stream for a given
> > Document and only read the first part, the rest of the bytes don't even
> > get read.
>
> Again, not sure about that. I can see how we could possibly use a
> ByteBuffer to ensure we always use the same set of bytes in all the bits
> of poifs (and on up as required), but surely we'll still need to save the
> bytes of each DocumentInputStream, otherwise they'll be gone?
I don't follow. Here's what I was thinking in more detail:
At the POIFS level:
- The file loaded from disk is merely one big ByteBuffer. (easy)
- A block in the file would be a ByteBuffer created as a subset over the
larger file ByteBuffer (easy, Java allows for this already)
- A document would be a ByteBuffer created as a composite ByteBuffer over
the blocks which make it up (slightly less easy, requires custom
ByteBuffer subclass to be written but such a thing will be a useful
utility and probably should be in Commons if not the JRE itself.)
- A new kind of DocumentInputStream is created which creates a fresh copy
of the ByteBuffer state and uses that to implement an InputStream. (easy)
With this, even if callers read every input stream, it will use only slightly
more memory than what they store themselves. The main memory usage at the
POIFS level would be the storage of which block offsets make up which
documents, and the directory tree information.
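The block and document views described above can be sketched with plain java.nio. One caveat: ByteBuffer's constructors are package-private, so a custom subclass cannot be written outside java.nio; this sketch instead chains per-block slices behind an InputStream. BlockChainInputStream, blockOffsets and blockSize are hypothetical names, and the block layout is an assumption for illustration, not POIFS's actual structures.

```java
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: reads one document as a chain of block views over a file buffer.
final class BlockChainInputStream extends InputStream {
    private final List<ByteBuffer> blocks = new ArrayList<>();
    private int current = 0;

    BlockChainInputStream(ByteBuffer fileBuffer, int[] blockOffsets, int blockSize) {
        for (int offset : blockOffsets) {
            // Each view shares the file's bytes; only position and limit are private.
            ByteBuffer view = fileBuffer.duplicate();
            view.limit(offset + blockSize);
            view.position(offset);
            blocks.add(view.slice());
        }
    }

    @Override
    public int read() {
        while (current < blocks.size()) {
            ByteBuffer block = blocks.get(current);
            if (block.hasRemaining()) {
                return block.get() & 0xFF;
            }
            current++; // this block is exhausted, move to the next one in the chain
        }
        return -1;
    }

    @Override
    public int read(byte[] dst, int off, int len) {
        if (len == 0) {
            return 0;
        }
        int total = 0;
        while (total < len && current < blocks.size()) {
            ByteBuffer block = blocks.get(current);
            if (!block.hasRemaining()) {
                current++;
                continue;
            }
            int n = Math.min(len - total, block.remaining());
            block.get(dst, off + total, n);
            total += n;
        }
        return total == 0 ? -1 : total;
    }
}
```

Each stream holds only small view objects; the document bytes themselves exist once, in the file buffer.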
Of course if someone writes to a document it's a different story. You would
need to create a new ByteBuffer so as not to damage the original file (unless
you design it to write to the original file -- probably harder.)
> > Of course the main beef I have with ByteBuffer is that it is limited to
> > Integer.MAX_VALUE size, but I guess with OLE2 this isn't, in practice,
> > going to be reached. I imagine the maximum size for an OLE2 document is
> > somewhat lower, although I don't actually know.
>
> Nor do I, but I have a feeling it could well be 2gb too. Surely we have
> that 2gb limit already though, since we're reading the poifs data into a
> byte array, which has the same restriction?
True enough.
Daniel
---------------------------------------------------------------------
Re: HSSF: Middle-ground API for reading an Excel spreadsheet
Posted by Nick Burch <ni...@torchbox.com>.
I've been doing some reading up on ByteBuffer, and was wondering:
On Mon, 4 Feb 2008, Daniel Noll wrote:
> 1. Lower memory usage due to not keeping a byte[] copy of all data at the
> POIFS level.
How would this work? Surely we'll still need to read all the bytes that
make up the whole poifs stream, then pass those into our underlying
ByteBuffer? I couldn't figure out a way to do it without processing all
the input stream at least once, since most of them won't support zipping
about to different places.
> 2. If you don't ask for a DocumentInputStream for a given Document, the
> bytes don't even get read. If you open a stream for a given Document and
> only read the first part, the rest of the bytes don't even get read.
Again, not sure about that. I can see how we could possibly use a
ByteBuffer to ensure we always use the same set of bytes in all the bits
of poifs (and on up as required), but surely we'll still need to save the
bytes of each DocumentInputStream, otherwise they'll be gone?
> Of course the main beef I have with ByteBuffer is that it is limited to
> Integer.MAX_VALUE size, but I guess with OLE2 this isn't, in practice,
> going to be reached. I imagine the maximum size for an OLE2 document is
> somewhat lower, although I don't actually know.
Nor do I, but I have a feeling it could well be 2gb too. Surely we have
that 2gb limit already though, since we're reading the poifs data into a
byte array, which has the same restriction?
If we can get some memory savings without too much work by switching to
nio / bytebuffer stuff, I am keen to do it. I'm just struggling, almost
certainly due to being new to it all, to see how it'll deliver much of a
saving just yet. Do please educate me :)
Nick
---------------------------------------------------------------------
Re: HSSF: Middle-ground API for reading an Excel spreadsheet
Posted by Daniel Noll <da...@nuix.com>.
On Friday 01 February 2008 05:10:30 Avik Sengupta wrote:
> We've looked at an NIO based POIFS earlier, which is not simple
> (relatively), but doable, but doesn't help at all ...
It's true that it's not simple, I made an attempt to do it once before but
failed.
I wouldn't say that it doesn't help at all though.
1. Lower memory usage due to not keeping a byte[] copy of all data at the
POIFS level.
2. If you don't ask for a DocumentInputStream for a given Document, the
bytes don't even get read. If you open a stream for a given Document and
only read the first part, the rest of the bytes don't even get read.
3. Not everyone is reading OLE2 documents from a File in the first place.
All three of these benefits apply *even if* the changes don't cascade into
HSSF and the other libraries which sit on top of POIFS.
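For the file case, a memory-mapped buffer is one way to get the "bytes don't even get read" behaviour, since the operating system typically pages the file in on access. A minimal sketch; MappedFiles and mapReadOnly are hypothetical names, not POI API.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of POI.
final class MappedFiles {
    private MappedFiles() {}

    /** Maps the whole file read-only; fails for files larger than Integer.MAX_VALUE bytes. */
    static ByteBuffer mapReadOnly(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // The mapping stays valid after the channel is closed.
            return channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        }
    }
}
```

Callers that do not start from a file can hand over ByteBuffer.wrap(bytes) instead, so the same reading code would serve both cases.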
Of course the main beef I have with ByteBuffer is that it is limited to
Integer.MAX_VALUE size, but I guess with OLE2 this isn't, in practice, going
to be reached. I imagine the maximum size for an OLE2 document is somewhat
lower, although I don't actually know.
Daniel
---------------------------------------------------------------------
Re: HSSF: Middle-ground API for reading an Excel spreadsheet
Posted by Avik Sengupta <av...@lab49.com>.
On Thursday 31 January 2008 17:37:02 Nick Burch wrote:
> On Tue, 29 Jan 2008, Daniel Noll wrote:
> >> Is your formula related eventusermodel code in a format suitable for
... snip ...
>
> > And as far as POIFS keeping a copy, yes... POIFS is full of issues like
> > that. For instance, even if all you need to read is the CLSID, you still
> > have to read the entire file. If POIFSFileSystem could construct from a
> > ByteBuffer and not take unnecessary copies, it could speed things up
> > dramatically for that situation... but ultimately that would need to
> > propagate to the whole framework for it to really show benefits.
>
> Do feel free to submit patches for that sort of thing :)
>
> I haven't played with ByteBuffer before, so do feel free to suggest how it
> might help + point at code examples / patches that show it
>
We've looked at an NIO based POIFS earlier, which is not simple (relatively),
but doable, but doesn't help at all ... as you say, it needs to propagate up
to HSSF, which will be a significant amount of work....
---------------------------------------------------------------------
RE: XLS files with no header
Posted by Marwan Gedeon <ma...@zaradoustra.com>.
I attached the file that is Excel 2.1, it seems it uses BIFF 2.0 format,
where no documentation about the format is available anywhere online.
I just need to pull out the data in there, but first POI would complain
about the headers. Any way to skip that part, and just extract the data,
would be awesome.
-----Original Message-----
From: Nick Burch [mailto:nick@torchbox.com]
Sent: Tuesday, February 05, 2008 7:29 PM
To: POI Users List
Subject: RE: XLS files with no header
On Tue, 5 Feb 2008, Marwan Gedeon wrote:
> The file I'm unable to read is an excel 2.1 file, which is really old.
Wow, that is old
> But POI as I understand does not support this, any easy way to make it
> support this format, since this format is still actively used by some
> carriers for sending invoices to their customers?
Depends what you need to do with the file. Just get some simple numeric
data out? Get formulas out? Get formatting out?
Many of the more complex records will certainly have changed, but you
might be able to bodge something to work just with the numeric records.
Try using the eventusermodel code (it's much simpler), and disable all the
records in RecordFactory except NumberRecord. If that works, you'll have
the cell numeric values, and you can add in other records as needed
Nick
---------------------------------------------------------------------
RE: XLS files with no header
Posted by Nick Burch <ni...@torchbox.com>.
On Tue, 5 Feb 2008, Marwan Gedeon wrote:
> The file I'm unable to read is an excel 2.1 file, which is really old.
Wow, that is old
> But POI as I understand does not support this, any easy way to make it
> support this format, since this format is still actively used by some
> carriers for sending invoices to their customers?
Depends what you need to do with the file. Just get some simple numeric
data out? Get formulas out? Get formatting out?
Many of the more complex records will certainly have changed, but you
might be able to bodge something to work just with the numeric records.
Try using the eventusermodel code (it's much simpler), and disable all the
records in RecordFactory except NumberRecord. If that works, you'll have
the cell numeric values, and you can add in other records as needed
Nick
---------------------------------------------------------------------
RE: XLS files with no header
Posted by Marwan Gedeon <ma...@zaradoustra.com>.
The file I'm unable to read is an excel 2.1 file, which is really old. I
figured that out after removing the extension, opening it in excel, then
trying to save, and having Excel prompting if I want to save the 2.1 format
or not.
But POI as I understand does not support this, any easy way to make it
support this format, since this format is still actively used by some
carriers for sending invoices to their customers?
-----Original Message-----
From: Nick Burch [mailto:nick@torchbox.com]
Sent: Friday, February 01, 2008 3:42 PM
To: POI Users List
Subject: Re: XLS files with no header
On Thu, 31 Jan 2008, Marwan Gedeon wrote:
> I'm running through constraints in the format of an Excel file I have at
> hand, as it's being downloaded from a carrier directly. My application
> needs to read the excel file as is without preopening in Excel, then convert
> it to CSV. POI fails to open it with the error:
>
> java.io.IOException: Invalid header signature; read 4503629692403721,
> expected -2226271756974174256
This error means that your file isn't a valid OLE2 document.
One thing you could try doing is saving the file, and looking at it.
Perhaps it's not in excel format after all, but really something else?
If it is an excel file, but without the normal OLE2 wrapper (rare and odd,
but not un-heard of) you'll need to wrap it up as OLE2 before passing to
HSSF. Check the list archives for the appropriate few lines of POIFS code
to call.
Nick
---------------------------------------------------------------------
Re: XLS files with no header
Posted by Nick Burch <ni...@torchbox.com>.
On Thu, 31 Jan 2008, Marwan Gedeon wrote:
> I'm running through constraints in the format of an Excel file I have at
> hand, as it's being downloaded from a carrier directly. My application
> needs to read the excel file as is without preopening in Excel, then convert
> it to CSV. POI fails to open it with the error:
>
> java.io.IOException: Invalid header signature; read 4503629692403721,
> expected -2226271756974174256
This error means that your file isn't a valid OLE2 document.
One thing you could try doing is saving the file, and looking at it.
Perhaps it's not in excel format after all, but really something else?
If it is an excel file, but without the normal OLE2 wrapper (rare and odd,
but not un-heard of) you'll need to wrap it up as OLE2 before passing to
HSSF. Check the list archives for the appropriate few lines of POIFS code
to call.
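The two numbers in the error message are the first eight bytes of the file read as a little-endian long. The sketch below checks for the OLE2 magic bytes and turns a reported value back into bytes so it can be inspected. Ole2Signature and its methods are hypothetical names, not POIFS API.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, not part of POI.
final class Ole2Signature {
    private Ole2Signature() {}

    /** Bytes D0 CF 11 E0 A1 B1 1A E1 as a little-endian long (-2226271756974174256). */
    static final long OLE2_SIGNATURE = 0xE11AB1A1E011CFD0L;

    /** Reads the first eight bytes as a little-endian long. */
    static long firstEightBytes(byte[] fileStart) {
        return ByteBuffer.wrap(fileStart, 0, 8).order(ByteOrder.LITTLE_ENDIAN).getLong();
    }

    static boolean looksLikeOle2(byte[] fileStart) {
        return fileStart.length >= 8 && firstEightBytes(fileStart) == OLE2_SIGNATURE;
    }

    /** Turns a value from the error message back into the bytes that were in the file. */
    static byte[] toLittleEndianBytes(long value) {
        return ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN).putLong(value).array();
    }
}
```

Applied to the value 4503629692403721 from the error above, toLittleEndianBytes gives 09 00 04 00 07 00 10 00. Read as a record id of 0x0009 followed by a length of 4, that would be consistent with a bare BIFF record stream with no OLE2 wrapper. That reading is an interpretation, not something the error message itself states.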
Nick
---------------------------------------------------------------------
XLS files with no header
Posted by Marwan Gedeon <ma...@zaradoustra.com>.
I'm running through constraints in the format of an Excel file I have at
hand, as it's being downloaded from a carrier directly. My application
needs to read the excel file as is without preopening in Excel, then convert
it to CSV. POI fails to open it with the error:
java.io.IOException: Invalid header signature; read 4503629692403721, expected -2226271756974174256
    at org.apache.poi.poifs.storage.HeaderBlockReader.<init>(HeaderBlockReader.java:100)
    at org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java:84)
    at com.cme.billtools.ExcelReader.main(ExcelReader.java:36)
I have noticed many threads on the net mentioning that the headers can be
set through the contenttype, but I do not have control over the carrier's
website to do that.
So my other alternative is to preprocess the Excel file in java to insert
headers, then save it, and reopen it with POI. However, I do not see any
information on doing that through the API docs. Particularly, I do not know
how to manipulate the different blocks.
If anyone has some insight on this, it would be greatly appreciated.
--Marwan
---------------------------------------------------------------------
Re: HSSF: Middle-ground API for reading an Excel spreadsheet
Posted by Nick Burch <ni...@torchbox.com>.
On Tue, 29 Jan 2008, Daniel Noll wrote:
>> Is your formula related eventusermodel code in a format suitable for
>> contributing back? It'd be handy to be able to put something in svn
>> that would make dealing with the formula stuff much simpler. I'd be
>> happy to spend a bit of time tidying it up / writing tests for it, if
>> you could contribute it?
>
> If I ever figure out how to handle it, I probably would contribute it
> back because it would mean changes to how shared formulas work. At the
> moment as you say, it does require a Workbook. At the moment I don't
> have a Workbook to work with. Maybe I can store off the first however
> many records and then create the Workbook from those -- I haven't tried
> so I don't know what happens if you feed in a list of records without
> the ones which make up the rest of the file.
I think you might be able to get away with that. If not, shout and we can
tweak things.
If it gets you close, then we should probably come up with something like
a WorkbookRecordSource interface, which model.Workbook implements. Tweak
the formula code to use those instead, then it's easier for you to pass in
the records that matter. Let us know if that looks like being worth doing.
> Memory is indeed cheap, but unless you have the luxury of a 64-bit JVM,
> there is an upper limit of somewhere around 1.4GB, sometimes less.
> This would normally be nearly 2GB but Windows allocates some DLLs in
> weird positions on some systems, and Sun insist on allocating a
> contiguous block of memory for the heap which sometimes causes a huge
> unusable memory hole above that.
Have you tried tweaking your windows box to use a 1gb/3gb split, instead
of the usual 2gb/2gb one? Might help out in the absence of a 64 bit jvm /
a licence for a non-hobbled 32 bit version of windows.
http://www.microsoft.com/whdc/system/platform/server/PAE/PAEmem.mspx
> In actual fact for us, something closer to RecordInputStream would be
> even better, where we can just say nextRecord() and have it return a
> properly constructed Record. Then we have control over the loop, which
> is ideal when you need to return a Reader.
Does the newly added org.apache.poi.hssf.eventusermodel.HSSFRecordStream
look roughly like what you need? I've converted the existing
eventusermodel code to use it under the hood, so it ought to behave
pretty much the same, except with pull instead of push.
> As far as the records keeping a copy, could they not instead keep an
> offset and a reference to the original buffer? Then if someone calls a
> setter, it would need to create a new buffer, set the offset to 0 and
> copy the data before doing the actual set.
In many cases, they only keep the parsed data in memory, and not the
source bytes. That's certainly one of the advantages of the (not so) new
RecordInputStream method
> And as far as POIFS keeping a copy, yes... POIFS is full of issues like
> that. For instance, even if all you need to read is the CLSID, you still
> have to read the entire file. If POIFSFileSystem could construct from a
> ByteBuffer and not take unnecessary copies, it could speed things up
> dramatically for that situation... but ultimately that would need to
> propagate to the whole framework for it to really show benefits.
Do feel free to submit patches for that sort of thing :)
I haven't played with ByteBuffer before, so do feel free to suggest how it
might help + point at code examples / patches that show it
Nick
---------------------------------------------------------------------
Re: HSSF: Middle-ground API for reading an Excel spreadsheet
Posted by Daniel Noll <da...@nuix.com>.
On Friday 25 January 2008 02:38:14 Nick Burch wrote:
> I did a bit, the core of which is now in svn as
> MissingRecordAwareHSSFListener
I discovered that, it's a great help for handling the blank cells, line
endings and so forth while iterating through the cells.
> Is your formula related eventusermodel code in a format suitable for
> contributing back? It'd be handy to be able to put something in svn that
> would make dealing with the formula stuff much simpler. I'd be happy to
> spend a bit of time tidying it up / writing tests for it, if you could
> contribute it?
If I ever figure out how to handle it, I probably would contribute it back
because it would mean changes to how shared formulas work. At the moment as
you say, it does require a Workbook. At the moment I don't have a Workbook
to work with. Maybe I can store off the first however many records and then
create the Workbook from those -- I haven't tried so I don't know what
happens if you feed in a list of records without the ones which make up the
rest of the file.
> I think there was some talk a few years back, but nothing really came of
> it. The problem is that it'd take a large amount of programmer time, and
> memory seems to be fairly cheap.
Memory is indeed cheap, but unless you have the luxury of a 64-bit JVM, there
is an upper limit of somewhere around 1.4GB, sometimes less. This would
normally be nearly 2GB but Windows allocates some DLLs in weird positions on
some systems, and Sun insist on allocating a contiguous block of memory for
the heap which sometimes causes a huge unusable memory hole above that.
"Normal" spreadsheets, where the number of cells isn't excessive, are not
really a problem. The problem is where some spreadsheet does have thousands
of rows and/or dozens of columns. Usually these will cause an OOME, but
allocation which gets close to an OOME without causing one is actually more
dangerous (some other thread suffers, too bad if it's something really
important.)
> I'm not sure how that'd work though. If we don't hold the contents of the
> records in memory, then how are we going to be able to do anything with
> them? (Maybe I'm missing something in your suggestion though)
To convert an Excel spreadsheet to text (or another format), all you need to
do is for each cell, store a text version somewhere (in a StringBuilder, in a
temp file, etc.) If you don't need to modify a cell then there is no reason
to have it in memory.
In actual fact for us, something closer to RecordInputStream would be even
better, where we can just say nextRecord() and have it return a properly
constructed Record. Then we have control over the loop, which is ideal when
you need to return a Reader.
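A pull-style loop of this kind can be sketched over raw bytes. The sketch relies only on the BIFF framing of a 2-byte record id and a 2-byte payload length, both little-endian. RawRecordReader and RawRecord are hypothetical names; this is not POI's RecordInputStream, and it neither parses records nor stitches CONTINUE records together.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch of a pull-style reader over a raw BIFF record stream.
final class RawRecordReader {
    static final class RawRecord {
        final int sid;         // record id
        final ByteBuffer data; // view of the payload, shares bytes with the source

        RawRecord(int sid, ByteBuffer data) {
            this.sid = sid;
            this.data = data;
        }
    }

    private final ByteBuffer stream;

    RawRecordReader(ByteBuffer workbookStream) {
        // duplicate() resets the byte order to big-endian, so set it again.
        this.stream = workbookStream.duplicate().order(ByteOrder.LITTLE_ENDIAN);
    }

    /** Returns the next record, or null once fewer than four header bytes remain. */
    RawRecord nextRecord() {
        if (stream.remaining() < 4) {
            return null;
        }
        int sid = stream.getShort() & 0xFFFF;
        int length = stream.getShort() & 0xFFFF;
        if (length > stream.remaining()) {
            throw new IllegalStateException("Truncated record 0x" + Integer.toHexString(sid));
        }
        ByteBuffer data = stream.slice();
        data.limit(length);
        stream.position(stream.position() + length);
        return new RawRecord(sid, data.order(ByteOrder.LITTLE_ENDIAN));
    }
}
```

The caller owns the loop: call nextRecord() until it returns null, and stop early whenever enough has been read.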
> My hunch is that we'll have a peak use of somewhere around 3-5 times the
> size of the excel file in memory, except for very small files. There'll be
> one copy of the file in poifs, another in hssf, then each record will take
> a copy as it parses itself.
There was one 40MB file which hit the 1GB memory limit. It turns out the file
had a huge number of cells per row, but opening the file showed most of them
to be empty (they probably had styles or something on them which prompted
HSSF to store something about it.)
The underlying issue here is that even if a cell doesn't exist, sometimes there is
still memory allocated for it. HSSFRow stores the cells in an array which
means holes in the middle are still allocated a small amount of space. And
every HSSFCell holds references to many things. All these eat up memory when
you have a spreadsheet with a huge number of cells.
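One way to avoid paying for holes is to key cells by column index instead of storing them in an array. This is a minimal sketch of that idea only; SparseRow is a hypothetical class and not how HSSFRow is implemented.

```java
import java.util.TreeMap;

// Hypothetical sketch: a row that stores only the cells that actually exist.
final class SparseRow<C> {
    private final TreeMap<Integer, C> cells = new TreeMap<>();

    void set(int column, C cell) {
        cells.put(column, cell);
    }

    /** Returns the cell at the column, or null if none was ever stored there. */
    C get(int column) {
        return cells.get(column);
    }

    int storedCells() {
        return cells.size();
    }

    /** Highest column that holds a cell, or -1 for an empty row. */
    int lastColumn() {
        return cells.isEmpty() ? -1 : cells.lastKey();
    }
}
```

The trade-off is that each map entry costs more than one array slot, so this would only help for rows that are mostly empty.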
As far as the records keeping a copy, could they not instead keep an offset
and a reference to the original buffer? Then if someone calls a setter, it
would need to create a new buffer, set the offset to 0 and copy the data
before doing the actual set.
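The offset-plus-reference idea with copy-on-write can be sketched as follows. LazyNumberRecord is a hypothetical record holding a single 8-byte little-endian double; it is not POI's NumberRecord and the layout is an assumption for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch: a record that reads from a shared buffer until it is modified.
final class LazyNumberRecord {
    private static final int SIZE = 8; // one IEEE 754 double

    private ByteBuffer buffer; // shared with the file until the first write
    private int offset;
    private boolean owned;

    LazyNumberRecord(ByteBuffer shared, int offset) {
        // duplicate() shares the bytes; the byte order must be set again on the duplicate.
        this.buffer = shared.duplicate().order(ByteOrder.LITTLE_ENDIAN);
        this.offset = offset;
    }

    double getValue() {
        return buffer.getDouble(offset);
    }

    boolean ownsData() {
        return owned;
    }

    void setValue(double value) {
        if (!owned) {
            // First write: copy this record's bytes into a private buffer at offset 0.
            ByteBuffer copy = ByteBuffer.allocate(SIZE).order(ByteOrder.LITTLE_ENDIAN);
            for (int i = 0; i < SIZE; i++) {
                copy.put(i, buffer.get(offset + i));
            }
            buffer = copy;
            offset = 0;
            owned = true;
        }
        buffer.putDouble(offset, value);
    }
}
```

Records that are only read never allocate anything beyond the small view object, and the original buffer is left untouched by setters.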
And as far as POIFS keeping a copy, yes... POIFS is full of issues like that.
For instance, even if all you need to read is the CLSID, you still have to
read the entire file. If POIFSFileSystem could construct from a ByteBuffer
and not take unnecessary copies, it could speed things up dramatically for
that situation... but ultimately that would need to propagate to the whole
framework for it to really show benefits.
Daniel
---------------------------------------------------------------------
Re: HSSF: Middle-ground API for reading an Excel spreadsheet
Posted by Nick Burch <ni...@torchbox.com>.
On Wed, 23 Jan 2008, Daniel Noll wrote:
> I was wondering if anyone had experimented with doing lazy parsing via
> the eventusermodel interface. I've had an attempt at it myself but am
> running into various troubles.
I did a bit, the core of which is now in svn as
MissingRecordAwareHSSFListener
> The first one which is really problematic is that once I get a
> FormulaRecord, I can't find a way to convert that into the formula
> string. Thankfully getting the value result is relatively simple.
Formulas are surprisingly tricky. They're stored as a series of ptgs, and
turning them back into strings is quite hard. Then you have the fun of
shared formulas, so you'll have to track all the formulas to be able to
resolve those. Comes a point that you're holding so many records that you
might as well just give in and use usermodel :/
If you have a fairly simple formula, then you can probably turn them into
strings without needing a hssf.model.Workbook, using
hssf.model.FormulaParser. However, there are some ptgs that need the
workbook to turn into strings, so you might have problems with those.
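For the simple case, turning a token sequence back into a string is a stack walk over postfix (RPN) tokens. The sketch below uses plain strings as stand-ins for ptgs and handles only operands and the four arithmetic operators. RpnFormulaPrinter is a hypothetical class, not hssf.model.FormulaParser, and it ignores everything that makes real formulas hard (functions, names, shared formulas, workbook-dependent references).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch: renders postfix tokens as an infix formula string.
final class RpnFormulaPrinter {
    private static final int OPERAND = 3;

    private static final class Expr {
        final String text;
        final int precedence;

        Expr(String text, int precedence) {
            this.text = text;
            this.precedence = precedence;
        }
    }

    private RpnFormulaPrinter() {}

    private static int precedence(String token) {
        switch (token) {
            case "+": case "-": return 1;
            case "*": case "/": return 2;
            default: return OPERAND; // anything else is treated as an operand
        }
    }

    static String toInfix(List<String> rpnTokens) {
        Deque<Expr> stack = new ArrayDeque<>();
        for (String token : rpnTokens) {
            int prec = precedence(token);
            if (prec == OPERAND) {
                stack.push(new Expr(token, OPERAND));
                continue;
            }
            if (stack.size() < 2) {
                throw new IllegalArgumentException("Operator " + token + " lacks operands");
            }
            Expr right = stack.pop();
            Expr left = stack.pop();
            String l = left.precedence < prec ? "(" + left.text + ")" : left.text;
            // The right side also needs brackets at equal precedence, e.g. A1-(B1-C1).
            String r = right.precedence <= prec ? "(" + right.text + ")" : right.text;
            stack.push(new Expr(l + token + r, prec));
        }
        if (stack.size() != 1) {
            throw new IllegalArgumentException("Malformed token sequence");
        }
        return stack.pop().text;
    }
}
```

For example, the tokens A1, B1, +, 2, * come out as (A1+B1)*2.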
Is your formula related eventusermodel code in a format suitable for
contributing back? It'd be handy to be able to put something in svn that
would make dealing with the formula stuff much simpler. I'd be happy to
spend a bit of time tidying it up / writing tests for it, if you could
contribute it?
> Have the HSSF developers considered making an API half way between
> usermodel and eventusermodel, which can return HSSFCell instances one at
> a time without instantiating the entire spreadsheet? It would be a
> really nice thing for saving memory.
I think there was some talk a few years back, but nothing really came of
it. The problem is that it'd take a large amount of programmer time, and
memory seems to be fairly cheap.
(From my perspective, I can buy a staggering amount of memory for all my
production servers for a couple of days billable rate. I suspect that
that holds for many of the other poi developers, so in the absence of
external sponsorship, I can't see it being a great priority for anyone.
Alas I think most of us have larger poi 'itches' than memory)
> (Although an implementation of the records which doesn't create copies
> of everything in memory would probably solve the memory problems almost
> as well.)
I'm not sure how that'd work though. If we don't hold the contents of the
records in memory, then how are we going to be able to do anything with
them? (Maybe I'm missing something in your suggestion though)
My hunch is that we'll have a peak use of somewhere around 3-5 times the
size of the excel file in memory, except for very small files. There'll be
one copy of the file in poifs, another in hssf, then each record will take
a copy as it parses itself.
Does anyone have a good memory profiling tool? While I can't see us
re-architecting poi any time soon (unless someone wants to sponsor it...),
if there are a few quick wins then I'm sure we can sort those. If someone
could spot where most of the memory does go, or any points in processing
when we use very large amounts of memory for a short spell, that'd be
helpful to know
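Short of a full profiler, the JVM's own management beans give a rough view of short-lived peaks. A minimal sketch using java.lang.management; HeapPeakProbe is a hypothetical name. Summing per-pool peaks only gives an upper bound on the true heap peak, since the pools need not peak at the same moment.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Hypothetical helper: rough peak-heap measurement around a piece of work.
final class HeapPeakProbe {
    private HeapPeakProbe() {}

    /** Resets each pool's recorded peak to its current usage. */
    static void resetPeaks() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            pool.resetPeakUsage();
        }
    }

    /** Sum of the peak "used" values of all heap pools since the last reset, in bytes. */
    static long sumOfHeapPoolPeaks() {
        long total = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue;
            }
            MemoryUsage peak = pool.getPeakUsage();
            if (peak != null) {
                total += peak.getUsed();
            }
        }
        return total;
    }
}
```

Calling resetPeaks() before loading a workbook and sumOfHeapPoolPeaks() afterwards would give a figure to compare against the file size.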
Nick