Posted to user@poi.apache.org by Chris Bamford <cb...@mimecast.com> on 2013/07/29 15:16:38 UTC

Event driven file handling for POI-HSLF Powerpoint '97(-2007)

Dear all,

I am running into OOM issues when trying to handle PowerPoint files containing large sound and image files.  The problem occurs during the constructor phase as it appears to attempt to load all files into memory at once.
Is there an alternative approach I can use whereby the data of embedded files / images can be written out to streams instead?  I notice Word / HWPF has such a facility with this method:

  /**
   * Returns picture object tied to specified CharacterRun
   * @param run
   * @param fillBytes if true, Picture will be returned with filled byte array that represent picture's contents. If you don't want
   * to have that byte array in memory but only write picture's contents to stream, pass false and then use Picture.writeImageContent
   * @see Picture#writeImageContent(java.io.OutputStream)
   * @return a Picture object if picture exists for specified CharacterRun, null otherwise. PicturesTable.hasPicture is used to determine this.
   * @see #hasPicture(org.apache.poi.hwpf.usermodel.CharacterRun)
   */
  public Picture extractPicture(CharacterRun run, boolean fillBytes) {
    if (hasPicture(run)) {
      return new Picture(run.getPicOffset(), _dataStream, fillBytes);
    }
    return null;
  }

Thanks so much for any pointers.

- Chris
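
For readers following the thread: the fillBytes=false idea in the Javadoc quoted above amounts to copying a known-length region of a source stream to an OutputStream through a small fixed buffer, so the whole image or sound never sits in one byte array. The sketch below shows only that general pattern using the JDK; it does not call POI, and the class name StreamCopySketch and the method copy are hypothetical names chosen for illustration. Whether HSLF can expose embedded data this way is exactly the open question in this thread.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper, not part of POI: streams a bounded region in chunks.
public class StreamCopySketch {

    /**
     * Copies at most {@code length} bytes from {@code in} to {@code out}
     * using an 8 KiB buffer, and returns the number of bytes actually copied.
     */
    public static long copy(InputStream in, OutputStream out, long length) throws IOException {
        byte[] buffer = new byte[8192];
        long remaining = length;
        while (remaining > 0) {
            int want = (int) Math.min(buffer.length, remaining);
            int read = in.read(buffer, 0, want);
            if (read < 0) {
                break; // source ended before 'length' bytes were available
            }
            out.write(buffer, 0, read);
            remaining -= read;
        }
        return length - remaining;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for an embedded picture or sound; a real source would be a file-backed stream.
        byte[] standIn = new byte[100_000];
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(standIn), sink, standIn.length);
        System.out.println("copied " + copied + " bytes");
    }
}
```

With this shape, peak memory for the copy is bounded by the buffer size rather than by the size of the embedded object, provided the source stream itself is not backed by an in-memory array.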


Re: Event driven file handling for POI-HSLF Powerpoint '97(-2007)

Posted by Chris Bamford <cb...@mimecast.com>.
Thanks Nick, will look into that.
Cheers,

- Chris

On 29 Jul 2013, at 15:27, Nick Burch wrote:

> On Mon, 29 Jul 2013, Chris Bamford wrote:
>> I am running into OOM issues when trying to handle PowerPoint files containing large sound and image files.  The problem occurs during the constructor phase as it appears to attempt to load all files into memory at once.
> 
> HSLF is DOM-like. However, it should only need a few times the size of the file in memory to process it, especially if you load it from an NPOIFSFileSystem created from a File rather than an InputStream.
> 
> Are you able to bump up your heap size a lot so it loads, then use a profiler to track down which bits are using all the memory? It's possible that one or two large bits could be converted to lazy-loading or something like that, to reduce the footprint.
> 
> Nick
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
> For additional commands, e-mail: user-help@poi.apache.org
> 



Re: Event driven file handling for POI-HSLF Powerpoint '97(-2007)

Posted by Nick Burch <ap...@gagravarr.org>.
On Mon, 29 Jul 2013, Chris Bamford wrote:
> I am running into OOM issues when trying to handle PowerPoint files 
> containing large sound and image files.  The problem occurs during the 
> constructor phase as it appears to attempt to load all files into memory 
> at once.

HSLF is DOM-like. However, it should only need a few times the size of the 
file in memory to process it, especially if you load it from an 
NPOIFSFileSystem created from a File rather than an InputStream.

Are you able to bump up your heap size a lot so it loads, then use a 
profiler to track down which bits are using all the memory? It's possible 
that one or two large bits could be converted to lazy-loading or something 
like that, to reduce the footprint.

Nick
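
Before reaching for a full profiler, it can help to confirm that a larger heap setting actually took effect and to get a rough figure for how much heap the load step retains. The following is a minimal sketch using only the JDK's Runtime class; HeapProbe and its methods are hypothetical names, the byte array in main is a stand-in for the real slideshow-loading call, and the numbers are approximate because System.gc() is only a hint to the JVM.

```java
import java.util.function.Supplier;

// Hypothetical helper, not part of POI: coarse heap readings around one step.
public class HeapProbe {

    /** Maximum heap the JVM will attempt to use (reflects -Xmx when set). */
    public static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    /** Heap currently in use: allocated heap minus the free part of it. */
    public static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Runs the step, prints used heap before and after, and returns the step's result. */
    public static <T> T measure(String label, Supplier<T> step) {
        System.gc(); // a hint only; readings remain approximate
        long before = usedHeapBytes();
        T result = step.get();
        long after = usedHeapBytes();
        System.out.printf("%s: used heap %d MiB -> %d MiB (max %d MiB)%n",
                label, before >> 20, after >> 20, maxHeapBytes() >> 20);
        return result;
    }

    public static void main(String[] args) {
        // Stand-in for the real load; replace the lambda with the slideshow constructor call.
        byte[] standIn = measure("load", () -> new byte[16 * 1024 * 1024]);
        System.out.println("holding " + standIn.length + " bytes");
    }
}
```

Comparing the reported growth against the size of the .ppt file on disk gives a quick check of the "few times the size of the file" expectation; a heap dump or profiler is still needed to see which objects account for it.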