Posted to dev@poi.apache.org by Barry Lagerweij <bl...@gmail.com> on 2013/03/14 20:54:12 UTC

[Bug 52949] New: How to extract VBA Macros code from Excel file by using POI?

Hi,

I've been looking for a way to extract the source code of VBA modules and
macros using POI. Since POI does not provide access to it, I've written a
class that extracts the source code as text.

The two attached classes can be used together with POI (I've tested with
3.8 and 3.9) to process the vbaProject.bin part (for OOXML files) or an XLS
file and retrieve the sources.

The RLEDecompressingInputStream is an InputStream that decompresses the
chunks as described in the MS-OVBA specification. It wraps a compressed
input stream (usually a DocumentInputStream from POIFS) and decompresses
on the fly to conserve memory.
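
To illustrate the chunk format, here is a minimal sketch of the MS-OVBA
decompression as I understand the specification. It is not the attached
RLEDecompressingInputStream: the class name OvbaDecompressSketch is made up
for this mail, and it works on a whole byte array instead of streaming, so
it shows the decoding steps rather than the memory-saving design.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class, not one of the attached sources.
public class OvbaDecompressSketch {

    public static byte[] decompress(byte[] in) {
        // A compressed container starts with the signature byte 0x01.
        if (in.length == 0 || in[0] != 0x01) {
            throw new IllegalArgumentException("missing signature byte 0x01");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int pos = 1;
        while (pos + 1 < in.length) {
            // 2-byte little-endian chunk header: low 12 bits = chunk size - 3
            // (size includes the header), top bit = "chunk is compressed".
            int header = (in[pos] & 0xFF) | ((in[pos + 1] & 0xFF) << 8);
            pos += 2;
            int dataLen = (header & 0x0FFF) + 1;
            int end = Math.min(pos + dataLen, in.length);
            if ((header & 0x8000) == 0) {
                // Raw chunk: the data is copied through unchanged.
                out.write(in, pos, end - pos);
                pos = end;
                continue;
            }
            byte[] chunk = new byte[4096]; // a chunk decompresses to at most 4096 bytes
            int n = 0;                     // bytes produced for this chunk so far
            while (pos < end) {
                int flags = in[pos++] & 0xFF; // one flag bit per following token
                for (int bit = 0; bit < 8 && pos < end; bit++) {
                    if ((flags & (1 << bit)) == 0) {
                        if (n >= chunk.length) throw new IllegalArgumentException("chunk overflow");
                        chunk[n++] = in[pos++]; // literal byte
                        continue;
                    }
                    if (pos + 1 >= end) throw new IllegalArgumentException("truncated copy token");
                    int token = (in[pos] & 0xFF) | ((in[pos + 1] & 0xFF) << 8);
                    pos += 2;
                    // The offset/length bit split depends on how much of the
                    // chunk has been produced: between 4 and 12 offset bits.
                    int bitCount = 4;
                    while ((1 << bitCount) < n) bitCount++;
                    int length = (token & (0xFFFF >> bitCount)) + 3;
                    int offset = (token >> (16 - bitCount)) + 1;
                    if (offset > n || n + length > chunk.length) {
                        throw new IllegalArgumentException("bad copy token");
                    }
                    // Byte-by-byte copy, because source and target may overlap.
                    for (int i = 0; i < length; i++, n++) {
                        chunk[n] = chunk[n - offset];
                    }
                }
            }
            out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // One literal 'a' followed by a copy token that repeats it.
        byte[] compressed = {0x01, 0x03, (byte) 0xB0, 0x02, 0x61, 0x45, 0x00};
        byte[] plain = decompress(compressed);
        System.out.println(plain.length + " bytes: "
                + new String(plain, StandardCharsets.US_ASCII));
    }
}
```

A streaming version would keep only the current 4096-byte chunk buffer and
refill it from the wrapped stream on demand, which is where the memory
saving comes from.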

The VBAMacroExtractor processes the OLE binary stream records, records the
CodePage (needed to convert byte arrays to Strings) and stores the
ModuleOffset. This offset specifies the location in the MemoryStream where
the source code starts. The VBAMacroExtractor automatically detects whether
the input is XLSM or XLS, and uses POIFSReader so that the file is processed
only once and memory use stays low.
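
The two ideas in this paragraph can be sketched without POI. The code below
is an assumption about one way to do it, not a copy of the attached
VBAMacroExtractor: it tells the containers apart by their leading magic
bytes (XLS is an OLE2 compound file, XLSM is a ZIP package), and it cuts the
compressed source out of a module's stream starting at ModuleOffset. The
names VbaContainerSketch, detect and compressedSource are invented for
illustration.

```java
import java.util.Arrays;

// Hypothetical illustration class, not one of the attached sources.
public class VbaContainerSketch {

    public enum Container { OLE2, ZIP, UNKNOWN }

    // First 8 bytes of every OLE2 compound file (XLS, vbaProject.bin).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // Local file header signature at the start of a ZIP package (XLSM).
    private static final byte[] ZIP_MAGIC = {'P', 'K', 0x03, 0x04};

    // Classifies a file by the first bytes read from it.
    public static Container detect(byte[] head) {
        if (startsWith(head, OLE2_MAGIC)) return Container.OLE2;
        if (startsWith(head, ZIP_MAGIC)) return Container.ZIP;
        return Container.UNKNOWN;
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }

    // Everything before moduleOffset is skipped; the remainder is the
    // compressed source code, still to be decompressed and then decoded
    // with the charset that matches the recorded CodePage.
    public static byte[] compressedSource(byte[] moduleStream, int moduleOffset) {
        if (moduleOffset < 0 || moduleOffset > moduleStream.length) {
            throw new IllegalArgumentException("offset outside stream");
        }
        return Arrays.copyOfRange(moduleStream, moduleOffset, moduleStream.length);
    }

    public static void main(String[] args) {
        byte[] zipHead = {'P', 'K', 0x03, 0x04, 0x14, 0x00};
        System.out.println(detect(zipHead));
    }
}
```

For an XLSM, the ZIP entry holding the macros (conventionally
xl/vbaProject.bin) is itself an OLE2 file, so after unzipping that one entry
the same record processing applies to both formats.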

It might be worthwhile to enhance the POI workbook with classes that
provide access to the VBA modules; see Andrey Yesyev's contributions to the
Nabble mailing list.

I hope it's useful. Feel free to use the sources under the Apache License 2.0.

With kind regards,

Barry