Posted to dev@poi.apache.org by bu...@apache.org on 2012/03/20 07:37:14 UTC

DO NOT REPLY [Bug 52949] New: How to extract VBA Macros code from Excel file by using POI?

https://issues.apache.org/bugzilla/show_bug.cgi?id=52949

             Bug #: 52949
           Summary: How to extract VBA Macros code from Excel file by
                    using POI?
           Product: POI
           Version: 3.8-dev
          Platform: PC
        OS/Version: Windows XP
            Status: NEW
          Severity: normal
          Priority: P2
         Component: HSSF
        AssignedTo: dev@poi.apache.org
        ReportedBy: z.wx.js.c@qq.com
    Classification: Unclassified


Hi Nick Burch,
Recently, I have wanted to extract VBA macro code from an Excel file.
Is it possible to extract the macro code from the InputStream of the workbook
using POI's methods? When the parameter [preserveNodes] of the constructor
HSSFWorkbook(InputStream s, boolean preserveNodes) is true, it will preserve
the macro nodes. Is there a method to extract the macros from these nodes?

Thanks a lot!

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


Re: [Bug 52949] How to extract VBA Macros code from Excel file by using POI?

Posted by Sandipkaria <sk...@cimcon.com>.
Has the VBA code extraction feature also been added to NPOI?



--
View this message in context: http://apache-poi.1045710.n5.nabble.com/DO-NOT-REPLY-Bug-52949-New-How-to-extract-VBA-Macros-code-from-Excel-file-by-using-POI-tp5579175p5716647.html
Sent from the POI - Dev mailing list archive at Nabble.com.



[Bug 52949] How to extract VBA Macros code from Excel file by using POI?

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=52949

--- Comment #3 from Barry Lagerweij <bl...@gmail.com> ---
Created attachment 30052
  --> https://issues.apache.org/bugzilla/attachment.cgi?id=30052&action=edit
Class which can extract macro source code

Since POI does not provide access to this, I've written a class which allows
you to extract the source code as text.

The two attached classes can be used together with POI (I've tested with 3.8
and 3.9) to process the xl/vbaProject.bin (for ooxml) or XLS file and retrieve
the sources.
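
For the OOXML case, a minimal sketch of getting at xl/vbaProject.bin with
nothing but java.util.zip might look like the following. This is not the
attached class: the names VbaProjectLocator and readVbaProject are made up for
illustration, and the only detail taken from this thread is the entry name
xl/vbaProject.bin.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class VbaProjectLocator {

    /**
     * Scans an OOXML package (a zip stream) and returns the raw bytes of
     * xl/vbaProject.bin, or null if the package has no such entry.
     */
    public static byte[] readVbaProject(InputStream ooxml) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(ooxml)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if ("xl/vbaProject.bin".equals(entry.getName())) {
                    // Copy the current entry into memory.
                    ByteArrayOutputStream out = new ByteArrayOutputStream();
                    byte[] buf = new byte[8192];
                    int n;
                    while ((n = zip.read(buf)) != -1) {
                        out.write(buf, 0, n);
                    }
                    return out.toByteArray();
                }
            }
        }
        return null;
    }
}
```

The returned bytes are themselves an OLE2 container, so they could then be
handed to POIFS in the same way as an XLS file.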

The RLEDecompressingInputStream is an InputStream which can be used to
decompress the chunks as described in the MS-OVBA specification. It wraps
a compressed input stream (usually a DocumentInputStream from POIFS)
and decompresses on the fly to save memory.
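
To illustrate the chunk format, here is a rough sketch of the decompression as
I read it in the Compression and Decompression part of MS-OVBA. It is not the
attached RLEDecompressingInputStream: it works on a whole byte array instead of
streaming, does no validation of malformed input, and the class name
OvbaDecompressSketch and the helper hex are invented for this example. The
sample bytes in main were assembled by hand and contain literal tokens only.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class OvbaDecompressSketch {

    /** Decompresses a container: signature byte 0x01 followed by chunks. */
    public static byte[] decompress(byte[] in) {
        if (in.length == 0 || in[0] != 0x01) {
            throw new IllegalArgumentException("missing signature byte 0x01");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];   // a decompressed chunk holds at most 4096 bytes
        int pos = 1;
        while (pos + 2 <= in.length) {
            // 16-bit little-endian header: bits 0-11 = chunk size - 3, bit 15 = compressed flag
            int header = (in[pos] & 0xFF) | ((in[pos + 1] & 0xFF) << 8);
            int chunkEnd = Math.min(in.length, pos + (header & 0x0FFF) + 3);
            boolean compressed = (header & 0x8000) != 0;
            pos += 2;
            int len = 0;                 // bytes written into 'chunk' so far
            if (!compressed) {
                // raw chunk: data is stored verbatim
                len = Math.min(4096, chunkEnd - pos);
                System.arraycopy(in, pos, chunk, 0, len);
            } else {
                while (pos < chunkEnd) {
                    int flags = in[pos++] & 0xFF;   // one flag bit per token, LSB first
                    for (int bit = 0; bit < 8 && pos < chunkEnd; bit++) {
                        if ((flags & (1 << bit)) == 0) {
                            chunk[len++] = in[pos++];          // literal token
                        } else {
                            int token = (in[pos] & 0xFF) | ((in[pos + 1] & 0xFF) << 8);
                            pos += 2;
                            // the offset/length split depends on how much of the chunk is decoded
                            int bitCount = 4;
                            while ((1 << bitCount) < len) bitCount++;
                            int copyLen = (token & (0xFFFF >> bitCount)) + 3;
                            int offset = (token >> (16 - bitCount)) + 1;
                            // byte-by-byte copy, because source and target may overlap
                            for (int i = 0; i < copyLen; i++, len++) {
                                chunk[len] = chunk[len - offset];
                            }
                        }
                    }
                }
            }
            out.write(chunk, 0, len);
            pos = chunkEnd;
        }
        return out.toByteArray();
    }

    /** Parses space-separated hex bytes such as "01 19 B0"; only used to build sample input. */
    public static byte[] hex(String s) {
        String[] parts = s.trim().split("\\s+");
        byte[] b = new byte[parts.length];
        for (int i = 0; i < parts.length; i++) {
            b[i] = (byte) Integer.parseInt(parts[i], 16);
        }
        return b;
    }

    public static void main(String[] args) {
        byte[] sample = hex("01 19 B0 00 61 62 63 64 65 66 67 68 "
                + "00 69 6A 6B 6C 6D 6E 6F 70 00 71 72 73 74 75 76 2E");
        // prints abcdefghijklmnopqrstuv.
        System.out.println(new String(decompress(sample), StandardCharsets.US_ASCII));
    }
}
```

A streaming version would keep the same 4096-byte chunk buffer and refill it on
demand, which is presumably why wrapping the DocumentInputStream keeps memory
use low.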

The VBAMacroExtractor processes the OLE binary stream records, records the
CodePage (in order to convert byte arrays to Strings) and stores the
ModuleOffset. This offset specifies the location in the module stream where the
source code starts. The VBAMacroExtractor has been written to automatically
detect XLSM or XLS, and uses POIFSReader to process the file only once and
save memory.
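
As a small illustration of how the CodePage and ModuleOffset are used, the
sketch below slices a module stream at the offset and picks a Java charset for
a Windows code page number. The class and method names are hypothetical, and
the code page mapping is an assumption that only covers UTF-8 (65001) and names
of the form windows-125x; other code pages would need their own mapping.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ModuleSourceSketch {

    /** Maps a Windows code page number to a Java charset (assumed, incomplete mapping). */
    public static Charset charsetFor(int codePage) {
        if (codePage == 65001) {
            return StandardCharsets.UTF_8;
        }
        // e.g. 1252 -> "windows-1252"; throws if the JVM does not know the name
        return Charset.forName("windows-" + codePage);
    }

    /** Returns the part of the module stream that holds the compressed source code. */
    public static byte[] compressedSource(byte[] moduleStream, int moduleOffset) {
        return Arrays.copyOfRange(moduleStream, moduleOffset, moduleStream.length);
    }

    /** Converts already decompressed source bytes to text using the project's code page. */
    public static String toText(byte[] decompressed, int codePage) {
        return new String(decompressed, charsetFor(codePage));
    }
}
```

The bytes returned by compressedSource still have to be decompressed before
toText gives readable source.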

It might be worthwhile to enhance the POI workbook with classes which provide
access to the VBA modules, see Andrey Yesyev's contributions to the poi-dev
mailing list.

I hope it's useful, feel free to use the sources under the Apache 2 license.



DO NOT REPLY [Bug 52949] How to extract VBA Macros code from Excel file by using POI?

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=52949

Yegor Kozlov <ye...@dinom.ru> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |INVALID

--- Comment #1 from Yegor Kozlov <ye...@dinom.ru> 2012-03-20 08:29:22 UTC ---
Unfortunately POI cannot read macro code. The main difficulty is that VBA isn't
stored as plain text; instead MS Office uses a fairly complex format,
described in [MS-OVBA].pdf.

At minimum, you can grab the node holding the VBA code and try to parse it
yourself. The main source of information on how to do that is [MS-OVBA].pdf,
which you can download from the Microsoft site.

Both HSSF and XSSF preserve macro nodes. This means that you can create
templates with macros in MS Office and then populate them with data using POI.

Yegor



Re: DO NOT REPLY [Bug 52949] How to extract VBA Macros code from Excel file by using POI?

Posted by Andrey Yesyev <an...@gmail.com>.
I will for sure.

Regards,
Andrey!


On 28 April 2012 11:03, Yegor Kozlov-4 [via Apache POI] <
ml-node+s1045710n5672249h92@n5.nabble.com> wrote:

> In case you figure out how to extract VBA , you are very much welcome
> to contribute this code back to POI.
>
> Yegor


--
View this message in context: http://apache-poi.1045710.n5.nabble.com/DO-NOT-REPLY-Bug-52949-New-How-to-extract-VBA-Macros-code-from-Excel-file-by-using-POI-tp5579175p5672252.html
Sent from the POI - Dev mailing list archive at Nabble.com.

Re: DO NOT REPLY [Bug 52949] How to extract VBA Macros code from Excel file by using POI?

Posted by Yegor Kozlov <ye...@dinom.ru>.
In case you figure out how to extract VBA, you are very much welcome
to contribute this code back to POI.

Yegor

On Sat, Apr 28, 2012 at 11:56 AM, Andrey Yesyev <an...@gmail.com> wrote:
> Yes, you're right. It is compressed with the LZW algorithm (
> http://en.wikipedia.org/wiki/LZW ). But according to the spec, only the source
> code part is compressed, not the whole module.
> Well, gonna dig deeper.
> Thanks for the reply!


Re: DO NOT REPLY [Bug 52949] How to extract VBA Macros code from Excel file by using POI?

Posted by Andrey Yesyev <an...@gmail.com>.
Yes, you're right. It is compressed with the LZW algorithm (
http://en.wikipedia.org/wiki/LZW ). But according to the spec, only the source
code part is compressed, not the whole module.
Well, gonna dig deeper.
Thanks for the reply!

On 28 April 2012 10:51, Yegor Kozlov-4 [via Apache POI] <
ml-node+s1045710n5672237h20@n5.nabble.com> wrote:

> All I can suggest is to study the spec and write a decoder of VBA. I
> recall that macro code is compressed but it is not PKZIP but some
> other compression algorithm.
>
> If it is so, try to decompress your files first.
>
> I can't tell you exactly what to do - extracting VBA code is a big
> task and potentially can take days or weeks of work.
>
> Cheers,
> Yegor


--
View this message in context: http://apache-poi.1045710.n5.nabble.com/DO-NOT-REPLY-Bug-52949-New-How-to-extract-VBA-Macros-code-from-Excel-file-by-using-POI-tp5579175p5672244.html
Sent from the POI - Dev mailing list archive at Nabble.com.

Re: DO NOT REPLY [Bug 52949] How to extract VBA Macros code from Excel file by using POI?

Posted by Andrey Yesyev <an...@gmail.com>.
Forgot to add diff file.
Here it is. patch.zip
<http://apache-poi.1045710.n5.nabble.com/file/n5712308/patch.zip>  



--
View this message in context: http://apache-poi.1045710.n5.nabble.com/DO-NOT-REPLY-Bug-52949-New-How-to-extract-VBA-Macros-code-from-Excel-file-by-using-POI-tp5579175p5712308.html
Sent from the POI - Dev mailing list archive at Nabble.com.



Re: DO NOT REPLY [Bug 52949] How to extract VBA Macros code from Excel file by using POI?

Posted by Andrey Yesyev <an...@gmail.com>.
Sorry guys, it took me so long...
Here is the patch for macro extraction, I hope it'll be useful.

patch.tar.gz
<http://apache-poi.1045710.n5.nabble.com/file/n5712300/patch.tar.gz>  



--
View this message in context: http://apache-poi.1045710.n5.nabble.com/DO-NOT-REPLY-Bug-52949-New-How-to-extract-VBA-Macros-code-from-Excel-file-by-using-POI-tp5579175p5712300.html
Sent from the POI - Dev mailing list archive at Nabble.com.



Re: DO NOT REPLY [Bug 52949] How to extract VBA Macros code from Excel file by using POI?

Posted by Yegor Kozlov <ye...@dinom.ru>.
Great! Can you create a patch in the "svn diff" format and upload it
in Bugzilla? You can re-open Bug 52949 or create a new one.

Basic contribution guidance can be found here:
http://poi.apache.org/guidelines.html

Yegor

On Tue, Jun 19, 2012 at 8:08 PM, Andrey Yesyev <an...@gmail.com> wrote:
> Yegor,
>
> I have implemented macro extraction from XLS and DOC files.
> Want to add it to XLSX and DOCX files.
>
> How can I contribute these changes?
>
> Regards,
> Andrey!


Re: DO NOT REPLY [Bug 52949] How to extract VBA Macros code from Excel file by using POI?

Posted by Andrey Yesyev <an...@gmail.com>.
Yegor,

I have implemented macro extraction from XLS and DOC files.
I want to add it for XLSX and DOCX files as well.

How can I contribute these changes?

Regards,
Andrey!


On 28 April 2012 08:30, Nick Burch-11 [via Apache POI] <
ml-node+s1045710n5672539h82@n5.nabble.com> wrote:

> On Sat, 28 Apr 2012, Yegor Kozlov wrote:
> > All I can suggest is to study the spec and write a decoder of VBA. I
> > recall that macro code is compressed but it is not PKZIP but some other
> > compression algorithm.
>
> I think it might well be a tweaked LZW. We already have two LZW
> implementations handled in POI, as some formats use the flag bit one way,
> some the other... See org.apache.poi.util.LZWDecompresser. You may need
> some trial + error and/or looking at hex dumps with known text to work out
> what the options used for VBA are.
>
> Nick


--
View this message in context: http://apache-poi.1045710.n5.nabble.com/DO-NOT-REPLY-Bug-52949-New-How-to-extract-VBA-Macros-code-from-Excel-file-by-using-POI-tp5579175p5710243.html
Sent from the POI - Dev mailing list archive at Nabble.com.

Re: DO NOT REPLY [Bug 52949] How to extract VBA Macros code from Excel file by using POI?

Posted by Nick Burch <ni...@alfresco.com>.
On Sat, 28 Apr 2012, Yegor Kozlov wrote:
> All I can suggest is to study the spec and write a decoder of VBA. I 
> recall that macro code is compressed but it is not PKZIP but some other 
> compression algorithm.

I think it might well be a tweaked LZW. We already have two LZW 
implementations handled in POI, as some formats use the flag bit one way, 
some the other... See org.apache.poi.util.LZWDecompresser. You may need 
some trial + error and/or looking at hex dumps with known text to work out 
what the options used for VBA are.

Nick



Re: DO NOT REPLY [Bug 52949] How to extract VBA Macros code from Excel file by using POI?

Posted by Yegor Kozlov <ye...@dinom.ru>.
All I can suggest is to study the spec and write a decoder of VBA. I
recall that macro code is compressed but it is not PKZIP but some
other compression algorithm.

If it is so, try to decompress your files first.

I can't tell you exactly what to do - extracting VBA code is a big
task and potentially can take days or weeks of work.

Cheers,
Yegor

On Sat, Apr 28, 2012 at 11:32 AM, Andrey Yesyev <an...@gmail.com> wrote:
> Hey guys!
>
> I'm working on the same issue, getting macros code from XLS file.
> I got VBA modules raw data htis way
>
> That gave me 4 files: Sheet1_data.txt to Sheet3_data.txt and
> ThisWorkbook_data.txt.
> My macro is in ThisWorkbook_data.txt. The problem is that I don't
> recognize the format of these files as the VBA module format described in
> [MS-OVBA].pdf.
>
> Any ideas?



Re: DO NOT REPLY [Bug 52949] How to extract VBA Macros code from Excel file by using POI?

Posted by Andrey Yesyev <an...@gmail.com>.
Hey guys!

I'm working on the same issue, getting the macro code from an XLS file.
I got the raw data of the VBA modules this way:

    // Needs: java.io.FileOutputStream, org.apache.poi.hssf.usermodel.HSSFWorkbook,
    // org.apache.poi.hssf.extractor.ExcelExtractor, org.apache.poi.poifs.filesystem.*,
    // org.apache.poi.util.IOUtils
    HSSFWorkbook wb = new HSSFWorkbook(is);
    ExcelExtractor extractor = new ExcelExtractor(wb);

    // Look under the OLE2 root for a storage that contains a "VBA" directory
    DirectoryEntry root = extractor.getRoot();
    for (Entry entry : root) {
        if (!entry.isDirectoryEntry()) {
            continue;
        }
        DirectoryEntry dirEntry = (DirectoryEntry) entry;
        if (!dirEntry.hasEntry("VBA")) {
            continue;
        }
        DirectoryNode node = (DirectoryNode) dirEntry.getEntry("VBA");
        for (Entry e : node) {
            String name = e.getName();
            // Keep only module streams: skip sub-directories and the
            // "dir", "_VBA_PROJECT" and "__SRP" streams
            if (!e.isDocumentEntry() || name.startsWith("dir")
                    || name.startsWith("_VBA_PROJECT") || name.startsWith("__SRP")) {
                continue;
            }
            System.out.println(name);
            // Dump the raw module stream to disk, closing both streams afterwards
            DocumentInputStream dis = new DocumentInputStream((DocumentNode) e);
            FileOutputStream fos = new FileOutputStream("C:/" + name + "_data.txt");
            try {
                fos.write(IOUtils.toByteArray(dis));
            } finally {
                fos.close();
                dis.close();
            }
        }
    }

That gave me 4 files: Sheet1_data.txt to Sheet3_data.txt and
ThisWorkbook_data.txt.
My macro is in ThisWorkbook_data.txt. The problem is that I don't
recognize the format of these files as the VBA module format described in
[MS-OVBA].pdf.
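
A possible explanation, going by [MS-OVBA]: a module stream is not plain text. It starts with a compiled performance cache, and the compressed source code only begins at the text offset recorded for that module in the "dir" stream. Parsing "dir" is the reliable way to get that offset. As a rough stopgap, the Java sketch below guesses it instead: it looks for a 0x01 signature byte followed by a chain of chunk headers that ends exactly at the end of the stream. The class and method names are hypothetical, and the assumption that the compressed source runs to the end of the stream should be checked against the spec before relying on it.

```java
/** Heuristic sketch; VbaSourceLocator is a hypothetical helper, not a POI class. */
final class VbaSourceLocator {

    /** Returns the first offset that looks like a compressed container, or -1 if none does. */
    static int findSourceOffset(byte[] stream) {
        for (int i = 0; i + 2 < stream.length; i++) {
            if (stream[i] == 0x01 && chunksReachEnd(stream, i + 1)) {
                return i;
            }
        }
        return -1;
    }

    /** Follows the chunk headers from pos and checks that they land exactly on the stream end. */
    private static boolean chunksReachEnd(byte[] s, int pos) {
        while (pos + 1 < s.length) {
            int header = (s[pos] & 0xFF) | ((s[pos + 1] & 0xFF) << 8);
            // Bits 12-14 of every chunk header carry the fixed signature 0b011
            if ((header & 0x7000) != 0x3000) {
                return false;
            }
            // Bits 0-11 hold the chunk size minus 3, header included
            pos += (header & 0x0FFF) + 3;
        }
        return pos == s.length;
    }
}
```

A false positive is possible whenever bytes inside the performance cache happen to form a valid-looking chain, so any offset found this way is only a candidate to test by decompressing from it.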

Any ideas?




DO NOT REPLY [Bug 52949] How to extract VBA Macros code from Excel file by using POI?

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=52949

--- Comment #2 from shh <z....@qq.com> 2012-03-23 12:17:56 UTC ---
Hi, Yegor
Thanks for your answer!
I also noticed your email at
[http://apache-poi.1045710.n5.nabble.com/Google-Summer-of-Code-Apache-POI-td5582557.html].
Following your tips, I got some ideas from this
website [http://www.cpearson.com/Excel/vbe.aspx]. With this approach it is
possible to export a VBComponent code module to a text file (of course, this
is not a pure Java approach).
First, use JACOB (Java COM Bridge) to call a macro in the VBA project.
Second, that macro exports the existing VBComponent code module
to a text file.
Finally, extract the source code from the exported text
file (".frm", ".bas", etc.).
Unfortunately, I can't make the VBA source code visible when the VBA
project has been encrypted and I can't supply the password.
Is there any way to make the VBA source code fully visible without
providing the password?

Any ideas are welcome.
Thanks a lot!



[Bug 52949] How to extract VBA Macros code from Excel file by using POI?

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=52949

--- Comment #4 from Parshant Sehrawat <se...@gmail.com> ---
How would I extract the VBA macro code from a doc file and from an OOXML docx
file? Is it necessary to store them in a macro-enabled format? Maybe this is
the wrong place to post this question, but I don't know whether I should create
another bug or continue in this one, since the two are related.
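
As background to this question: in OOXML files the VBA project lives in a package part that is normally named vbaProject.bin (for example word/vbaProject.bin in a .docm or xl/vbaProject.bin in an .xlsm), and that part is itself an OLE2 container. A plain .docx does not carry macros. The Java sketch below only pulls that part out of the zip package with the JDK classes. The class and method names are hypothetical, and matching on the part name is a simplifying assumption; a robust version would follow the package relationships instead.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Illustrative sketch; OoxmlVbaPart is a hypothetical helper, not a POI class. */
final class OoxmlVbaPart {

    /** Returns the bytes of the first part named vbaProject.bin, or null if there is none. */
    static byte[] readVbaProject(InputStream ooxml) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(ooxml)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!entry.getName().endsWith("vbaProject.bin")) {
                    continue;
                }
                // Copy the current entry fully into memory
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[8192];
                int n;
                while ((n = zip.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
                return out.toByteArray();
            }
        }
        return null;
    }
}
```

The returned bytes can then be opened like any other OLE2 file, for example by wrapping them in a ByteArrayInputStream and passing that to POIFSFileSystem, after which the "VBA" directory can be walked in the same way as for a binary .xls or .doc.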
