Posted to user@poi.apache.org by Mohsen <mo...@gmail.com> on 2012/11/06 06:47:14 UTC

Apache POI fails to save (HWPFDocument.write) large Word .doc files

I want to remove Word metadata from .doc files. My .docx files work fine
with XWPFDocument, but the following code for removing metadata fails for
large (> 1MB) .doc files. For example, given a 6MB .doc file with images, it
outputs a 4.5MB file in which some of the images are missing.

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import org.apache.poi.hpsf.DocumentSummaryInformation;
import org.apache.poi.hpsf.SummaryInformation;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

public static InputStream removeMetaData(InputStream inputStream) throws IOException {
    POIFSFileSystem fss = new POIFSFileSystem(inputStream);
    HWPFDocument doc = new HWPFDocument(fss);

    // It fails on large files even if you remove everything from here to 'until' below
    SummaryInformation si = doc.getSummaryInformation();
    si.removeAuthor();
    si.removeComments();
    si.removeLastAuthor();
    si.removeKeywords();
    si.removeSubject();
    si.removeTitle();

    DocumentSummaryInformation dsi = doc.getDocumentSummaryInformation();
    dsi.removeCategory();
    dsi.removeCompany();
    dsi.removeManager();
    try {
        dsi.removeCustomProperties();
    } catch (Exception e) {
        // the custom properties could not be removed; carry on without failing
    }
    // until

    // ByteArrayOutputStream needs no flush() or close(); both are no-ops
    ByteArrayOutputStream os = new ByteArrayOutputStream();
    doc.write(os);
    return new ByteArrayInputStream(os.toByteArray());
}
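
To put a number on the size loss described above, here is a minimal sketch that uses only the JDK and does not touch POI. The class name StreamSizeCheck and the helpers readAll and shrinkRatio are hypothetical names made up for this illustration; they are not part of POI. The idea is to buffer the input fully so its exact size is known, run the document through removeMetaData, and compare the two byte counts. It only measures the problem, it does not fix it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper class for illustration only; not part of Apache POI.
final class StreamSizeCheck {

    // Reads the whole stream into memory so that its exact size is known.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return buffer.toByteArray();
    }

    // Fraction of the original size that is missing from the written output.
    // A negative value means the output grew.
    static double shrinkRatio(long originalSize, long writtenSize) {
        if (originalSize <= 0) {
            throw new IllegalArgumentException("originalSize must be positive");
        }
        return (double) (originalSize - writtenSize) / originalSize;
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in data; with a real file one would do instead:
        //   byte[] original = readAll(new FileInputStream(path));
        //   byte[] written  = readAll(removeMetaData(new ByteArrayInputStream(original)));
        byte[] original = readAll(new ByteArrayInputStream(new byte[100_000]));
        byte[] written = new byte[75_000];
        System.out.println("original bytes: " + original.length);
        System.out.println("written bytes:  " + written.length);
        System.out.println("shrink ratio:   " + shrinkRatio(original.length, written.length));
    }
}
```

Running the same comparison with the block between the two marker comments removed would show whether the loss comes from the plain read/write round trip or from the metadata calls.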




