You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@poi.apache.org by "Height, Jason" <JH...@subcorp.com.au> on 2003/02/24 03:52:27 UTC

RE: ValueRecordsAggregate performance branch

Andy,

Ok I guess that I don't completely understand what is going on then! I
thought that the existing insertCell method would be all that is required,
without the need to decomposing the cell inorder to save the cell ie simply
save the object reference as was originally being done, but in a structure
that would enable fast row lookup and fast row cell iteration.

Since I don't exactly know what is going on I think that it may be best if
you do the merge of this class into the main branch. After which I will
supply another patch to 9576.

Jason


-----Original Message-----
From: Andrew C. Oliver [mailto:acoliver@apache.org] 
Sent: Thursday, 20 February 2003 23:42
To: Height, Jason
Subject: Re: ValueRecordsAggregate performance branch [Scanned For Viruses]

Height, Jason wrote:

> Andy,
>
> Ok I think that I roughly understand what you are trying to do on the 
> performance branch wrt ValueRecordsAggregate in that you are trying to 
> store the contents of a cell as a primitive rather than as an object 
> (LabelSST etc). I can see that this will provide memory improvements, 
> according to OptimizeIt a LabelSST object occupies 30bytes whereas you 
> are only storing the index to the string which is a double ie 8 bytes, 
> so there is a 22 byte overhead per LabelSST object.
>
Note that I do this slightly wastefully, Only 4 bytes are really 
required, but in order to make things clean and not have yet another 
array, its stored in the same place as #s.

> However if I am going to iterate over each cell wont there will be 
> some performance problems in that the ValueRecordsAggregate will need 
> to instanciate and then GC a lot of objects? So although you have 
> saved memory for storing the workbook, additional memory and 
> processing is required to use the workbook!
>
Eventually.  I'm into refactoring and stuff.  So this change is pretty 
much a huge refactoring.  Doing it all at once seems wrought with 
disaster (at least to me).  My thought was 1st graft the new method into 
existing structures.  Then later make another pass of optimizations 
(maybe after 2.0, not sure). 

> I think that the VRAIterator is FANTASTIC. It fixes my performance 
> problems with my DBCell etc patch.
>  
>
Cool.

>
> Why couldn't we just implement the data structures for quick access 
> without decomposing them from the java objects? In my opinion (for 
> what it is worth) the conversion from and to the java object 
> representations is not a good idea, it makes the whole thing much more 
> complicated, needs to expose additional methods, will be harder to 
> maintain and the CPU time required to create and maintain the data in 
> both directions may be more significant (and not worth the memory saving).
>
In due time.  Note though that some decomposition is necessary to make 
the code manageable (even if it wastes memory).  I can't imagine bit 
twiddling on byte arrays of this stuff.  Furtheremore if the file were 
treated too flatly, we'd end up doing lots of extra array copies 
whenever an insert was done in the middle...  Still I think memory 
mapping should be a goal for a post-2.0 release (formulas and graphing 
first).

> PS Ever noticed that the Record object holds onto the byte data array 
> from which it is created. It doesn't need to. Basically it is wasting 
> as much memory as the size of the excel file that was opened. Is this 
> intentional?
>
Every record does?  If so, that is a big mistake IMNSHO.

Anyhow, no need to write this privately.  I'm thick skinned when the 
conversation is technical :-).  We sould eliminate the record holding 
onto the data if it indeed is doing that.  Thats a monumental waste of 
memory.

Oh yeah.. I meant to propose you a committer....gotta do that this morning.

-Andy

> Jason
>
>
>
----------------------------------------------------------------------------
----------------------------------------
> This e-mail (including attachments) is confidential information of 
> Australian Submarine Corporation Pty Limited (ASC). It may also be 
> legally privileged. Unauthorised use and disclosure is prohibited. ASC 
> is not taken to have waived confidentiality or privilege if this 
> e-mail was sent to you in error. If you have received it in error, 
> please notify the sender promptly. While ASC takes steps to identify 
> and eliminate viruses, it cannot confirm that this e-mail is free from 
> them. You should scan this e-mail for viruses before it is used. The 
> statements in this e-mail are those of the sender only, unless 
> specifically stated to be those of ASC by someone with authority to do so.



--------------------------------------------------------------------------------------------------------------------
This e-mail (including attachments) is confidential information of Australian Submarine Corporation Pty Limited (ASC).  It may also be legally privileged.  Unauthorised use and disclosure is prohibited.  ASC is not taken to have waived confidentiality or privilege if this e-mail was sent to you in error. If you have received it in error, please notify the sender promptly.  While ASC takes steps to identify and eliminate viruses, it cannot confirm that this e-mail is free from them.  You should scan this e-mail for viruses before it is used.  The statements in this e-mail are those of the sender only, unless specifically stated to be those of ASC by someone with authority to do so.

Re: ValueRecordsAggregate performance branch

Posted by "Andrew C. Oliver" <ac...@apache.org>.
Height, Jason wrote:

>Andy,
>
>Ok I guess that I don't completely understand what is going on then! I
>thought that the existing insertCell method would be all that is required,
>without the need to decomposing the cell inorder to save the cell ie simply
>save the object reference as was originally being done, but in a structure
>that would enable fast row lookup and fast row cell iteration.
>  
>
Thats what takes too much memory.

>Since I don't exactly know what is going on I think that it may be best if
>you do the merge of this class into the main branch. After which I will
>supply another patch to 9576.
>  
>
Okay.  Danny is looking into it.  If I do it, it will have to wait.

-Andy

>Jason
>
>
>-----Original Message-----
>From: Andrew C. Oliver [mailto:acoliver@apache.org] 
>Sent: Thursday, 20 February 2003 23:42
>To: Height, Jason
>Subject: Re: ValueRecordsAggregate performance branch [Scanned For Viruses]
>
>Height, Jason wrote:
>
>  
>
>>Andy,
>>
>>Ok I think that I roughly understand what you are trying to do on the 
>>performance branch wrt ValueRecordsAggregate in that you are trying to 
>>store the contents of a cell as a primitive rather than as an object 
>>(LabelSST etc). I can see that this will provide memory improvements, 
>>according to OptimizeIt a LabelSST object occupies 30bytes whereas you 
>>are only storing the index to the string which is a double ie 8 bytes, 
>>so there is a 22 byte overhead per LabelSST object.
>>
>>    
>>
>Note that I do this slightly wastefully, Only 4 bytes are really 
>required, but in order to make things clean and not have yet another 
>array, its stored in the same place as #s.
>
>  
>
>>However if I am going to iterate over each cell wont there will be 
>>some performance problems in that the ValueRecordsAggregate will need 
>>to instanciate and then GC a lot of objects? So although you have 
>>saved memory for storing the workbook, additional memory and 
>>processing is required to use the workbook!
>>
>>    
>>
>Eventually.  I'm into refactoring and stuff.  So this change is pretty 
>much a huge refactoring.  Doing it all at once seems wrought with 
>disaster (at least to me).  My thought was 1st graft the new method into 
>existing structures.  Then later make another pass of optimizations 
>(maybe after 2.0, not sure). 
>
>  
>
>>I think that the VRAIterator is FANTASTIC. It fixes my performance 
>>problems with my DBCell etc patch.
>> 
>>
>>    
>>
>Cool.
>
>  
>
>>Why couldn't we just implement the data structures for quick access 
>>without decomposing them from the java objects? In my opinion (for 
>>what it is worth) the conversion from and to the java object 
>>representations is not a good idea, it makes the whole thing much more 
>>complicated, needs to expose additional methods, will be harder to 
>>maintain and the CPU time required to create and maintain the data in 
>>both directions may be more significant (and not worth the memory saving).
>>
>>    
>>
>In due time.  Note though that some decomposition is necessary to make 
>the code manageable (even if it wastes memory).  I can't imagine bit 
>twiddling on byte arrays of this stuff.  Furtheremore if the file were 
>treated too flatly, we'd end up doing lots of extra array copies 
>whenever an insert was done in the middle...  Still I think memory 
>mapping should be a goal for a post-2.0 release (formulas and graphing 
>first).
>
>  
>
>>PS Ever noticed that the Record object holds onto the byte data array 
>>from which it is created. It doesn't need to. Basically it is wasting 
>>as much memory as the size of the excel file that was opened. Is this 
>>intentional?
>>
>>    
>>
>Every record does?  If so, that is a big mistake IMNSHO.
>
>Anyhow, no need to write this privately.  I'm thick skinned when the 
>conversation is technical :-).  We sould eliminate the record holding 
>onto the data if it indeed is doing that.  Thats a monumental waste of 
>memory.
>
>Oh yeah.. I meant to propose you a committer....gotta do that this morning.
>
>-Andy
>
>  
>
>>Jason
>>
>>
>>
>>    
>>
>----------------------------------------------------------------------------
>----------------------------------------
>  
>
>>This e-mail (including attachments) is confidential information of 
>>Australian Submarine Corporation Pty Limited (ASC). It may also be 
>>legally privileged. Unauthorised use and disclosure is prohibited. ASC 
>>is not taken to have waived confidentiality or privilege if this 
>>e-mail was sent to you in error. If you have received it in error, 
>>please notify the sender promptly. While ASC takes steps to identify 
>>and eliminate viruses, it cannot confirm that this e-mail is free from 
>>them. You should scan this e-mail for viruses before it is used. The 
>>statements in this e-mail are those of the sender only, unless 
>>specifically stated to be those of ASC by someone with authority to do so.
>>    
>>
>
>
>
>--------------------------------------------------------------------------------------------------------------------
>This e-mail (including attachments) is confidential information of Australian Submarine Corporation Pty Limited (ASC).  It may also be legally privileged.  Unauthorised use and disclosure is prohibited.  ASC is not taken to have waived confidentiality or privilege if this e-mail was sent to you in error. If you have received it in error, please notify the sender promptly.  While ASC takes steps to identify and eliminate viruses, it cannot confirm that this e-mail is free from them.  You should scan this e-mail for viruses before it is used.  The statements in this e-mail are those of the sender only, unless specifically stated to be those of ASC by someone with authority to do so.
>
>  
>