Posted to user@poi.apache.org by Raindog <ra...@macrohmasheen.com> on 2012/01/28 01:38:32 UTC

Extracting a byte[] of a record.

Hello,

I've just found POI so I am quite new to using it (and I'm learning Java 
as I go).

Anyway, my particular use case is that I need to extract a byte[] of an 
object inserted into an office document and apply some logic to that 
byte[] (I'm looking for embedded PE files). I have a test spreadsheet 
in which I embedded notepad.exe to test with.
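Raindog's goal is to scan extracted bytes for an embedded PE (Windows executable) image. A minimal JDK-only sketch of such a check follows. It assumes the standard PE layout: the data starts with the "MZ" DOS header, and the little-endian 32-bit e_lfanew field at offset 0x3C points at the "PE\0\0" signature. The class and method names (PeSniffer, looksLikePe) are hypothetical and not part of POI.

```java
/** Hypothetical helper: a quick structural test for a PE (Windows executable) image. */
final class PeSniffer {
    private PeSniffer() {}

    static boolean looksLikePe(byte[] data) {
        // The DOS header is 64 (0x40) bytes long and must start with "MZ"
        if (data == null || data.length < 0x40 || data[0] != 'M' || data[1] != 'Z') {
            return false;
        }
        // e_lfanew: little-endian 32-bit offset of the PE header, stored at 0x3C
        int peOffset = (data[0x3C] & 0xFF)
                | (data[0x3D] & 0xFF) << 8
                | (data[0x3E] & 0xFF) << 16
                | (data[0x3F] & 0xFF) << 24;
        if (peOffset < 0 || peOffset > data.length - 4) {
            return false;
        }
        // The PE header begins with the signature "PE\0\0"
        return data[peOffset] == 'P' && data[peOffset + 1] == 'E'
                && data[peOffset + 2] == 0 && data[peOffset + 3] == 0;
    }
}
```

This is a heuristic rather than a validator: a truncated or malformed executable can still pass it.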

The following is my code:

package cs.harvester.office;

import java.util.Base64;
import java.util.List;

import org.apache.poi.hssf.eventusermodel.HSSFListener;
import org.apache.poi.hssf.record.CommonObjectDataSubRecord;
import org.apache.poi.hssf.record.EmbeddedObjectRefSubRecord;
import org.apache.poi.hssf.record.ObjRecord;
import org.apache.poi.hssf.record.Record;
import org.apache.poi.hssf.record.SubRecord;

// Event-model listener that dumps every OBJ record and its sub-records.
public class EventExample implements HSSFListener {

    @Override
    public void processRecord(Record record) {
        switch (record.getSid()) {
            case ObjRecord.sid:
                ObjRecord objRec = (ObjRecord) record;

                System.out.println("Obj: ");
                List<SubRecord> subRecords = objRec.getSubRecords();
                for (SubRecord subRecord : subRecords) {
                    System.out.print("    SubRecord: "
                            + subRecord.getClass().getSimpleName() + ": ");
                    System.out.println(subRecord.serialize().length);

                    if (subRecord instanceof CommonObjectDataSubRecord) {
                        // Holds the object type and id; nothing is extracted here yet
                    } else if (subRecord instanceof EmbeddedObjectRefSubRecord) {
                        EmbeddedObjectRefSubRecord embedded =
                                (EmbeddedObjectRefSubRecord) subRecord;
                        System.out.println("        EmbeddedObject: "
                                + embedded.getOLEClassName()
                                + " Stream ID: " + embedded.getStreamId()
                                + " ObjectData Length: "
                                + embedded.getObjectData().length);
                        // java.util.Base64 replaces the JDK-internal
                        // com.sun.org.apache...Base64 class, which is not public API
                        System.out.println("        EmbeddedObject: "
                                + Base64.getEncoder().encodeToString(embedded.getObjectData()));
                    }
                }
                break;
            default:
                System.out.println(record.getClass().getSimpleName());
        }
    }
}
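The listener above only receives records once it is registered with the HSSF event API. A minimal sketch of that wiring, assuming a recent POI release (where POIFSFileSystem is Closeable): HSSFRequest.addListenerForAllRecords() subscribes the listener to every record, and HSSFEventFactory.processWorkbookEvents() streams the workbook through it. The EventRunner class name is hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;

import org.apache.poi.hssf.eventusermodel.HSSFEventFactory;
import org.apache.poi.hssf.eventusermodel.HSSFListener;
import org.apache.poi.hssf.eventusermodel.HSSFRequest;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

/** Hypothetical helper: pushes every BIFF record of an .xls stream through a listener. */
final class EventRunner {
    private EventRunner() {}

    static void run(InputStream xls, HSSFListener listener) throws IOException {
        try (POIFSFileSystem fs = new POIFSFileSystem(xls)) {
            HSSFRequest request = new HSSFRequest();
            // Subscribe the listener to every record type, not just selected sids
            request.addListenerForAllRecords(listener);
            // Reads the workbook stream and calls listener.processRecord() per record
            new HSSFEventFactory().processWorkbookEvents(request, fs);
        }
    }
}
```

For example, EventRunner.run(new FileInputStream("test.xls"), new EventExample()) would print the OBJ dump for the test spreadsheet (the file name is illustrative).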


The problem I have is that the ObjectData byte[] is always length 0. How 
should I go about getting the correct byte[] for the object?

Thanks,
Raindog

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org


Re: Extracting a byte[] of a record.

Posted by Raindog <ra...@macrohmasheen.com>.
On 1/28/2012 6:06 AM, Nick Burch wrote:
> On Fri, 27 Jan 2012, Raindog wrote:
>> The problem I have is that the ObjectData byte[] is always length 0. 
>> How should I go about getting the correct byte[] for the object?
>
> Are you sure that's the record with the data in it? You can use 
> org.apache.poi.hssf.dev.BiffViewer to check where your known content 
> lives
>
> Also, Excel normally stores embedded resources in other POIFS streams, 
> rather than in the record structure. Look for POIFS entries starting 
> with MBD and those should have your data in
>
> Nick
>

Nick,

I tried what you said, but BiffViewer didn't give me any further clues. 
This is the corresponding output for the object:

SUBRECORD: org.apache.poi.hssf.record.SubRecord$UnknownSubRecord [sid=0x0007 size=2 : [02, 00]]
SUBRECORD: org.apache.poi.hssf.record.SubRecord$UnknownSubRecord [sid=0x0008 size=2 : [01, 00]]
SUBRECORD: [ftPictFmla]
     .f2unknown     = 0x04B46900
     .f3unknown     = [02, 00, C8, E2, 01]
     .unicodeFlag   = false
     .oleClassname  = Packager Shell Object
     .streamId      = 0x01D9346F
[/ftPictFmla]SUBRECORD: [ftEnd]
[/ftEnd]
[/OBJ]

I'm guessing now that I need to somehow open the stream based on the 
streamId. Any pointers on how to do this? Using POIFSFileSystem, it was 
not obvious how to do this.
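Following Nick's hint, the streamId from the ftPictFmla sub-record can be turned into the name of the POIFS storage that holds the object. A minimal sketch, assuming the naming convention that POI's own HSSFObjectData.getDirectory() relies on: "MBD" followed by the stream id as eight upper-case hex digits, so 0x01D9346F maps to MBD01D9346F. The helper name is hypothetical; the resulting name could then be passed to DirectoryEntry.getEntry() on the filesystem root.

```java
/** Hypothetical helper: maps an EmbeddedObjectRefSubRecord stream id to its storage name. */
final class MbdNames {
    private MbdNames() {}

    static String storageName(int streamId) {
        // %08X pads to eight upper-case hex digits; negative ids print as unsigned
        return String.format("MBD%08X", streamId);
    }
}
```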

Thanks,
Josh


Re: Extracting a byte[] of a record.

Posted by raindog <ra...@macrohmasheen.com>.
Thanks, I'll try that.
-- 
Sent from my Android phone with K-9 Mail. Please excuse my brevity.

Nick Burch <ni...@alfresco.com> wrote:

On Fri, 27 Jan 2012, Raindog wrote:
> The problem I have is that the ObjectData byte[] is always length 0. How 
> should I go about getting the correct byte[] for the object?

Are you sure that's the record with the data in it? You can use 
org.apache.poi.hssf.dev.BiffViewer to check where your known content lives

Also, Excel normally stores embedded resources in other POIFS streams, 
rather than in the record structure. Look for POIFS entries starting with 
MBD and those should have your data in

Nick


Re: Extracting a byte[] of a record.

Posted by Nick Burch <ni...@alfresco.com>.
On Fri, 27 Jan 2012, Raindog wrote:
> The problem I have is that the ObjectData byte[] is always length 0. How 
> should I go about getting the correct byte[] for the object?

Are you sure that's the record with the data in it? You can use 
org.apache.poi.hssf.dev.BiffViewer to check where your known content lives

Also, Excel normally stores embedded resources in other POIFS streams, 
rather than in the record structure. Look for POIFS entries starting with 
MBD and those should have your data in

Nick
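Nick's second suggestion can be checked directly by listing the top-level POIFS entries whose names start with MBD. A minimal sketch, assuming a recent POI release (where POIFSFileSystem is Closeable and directories are Iterable over their entries); the MbdLister name is hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

import org.apache.poi.poifs.filesystem.Entry;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

/** Hypothetical helper: finds the storages Excel uses for embedded objects. */
final class MbdLister {
    private MbdLister() {}

    static List<String> listMbdStorages(InputStream xls) throws IOException {
        List<String> names = new ArrayList<>();
        try (POIFSFileSystem fs = new POIFSFileSystem(xls)) {
            // Only the root level is scanned; embedded objects live there
            for (Entry entry : fs.getRoot()) {
                if (entry.isDirectoryEntry() && entry.getName().startsWith("MBD")) {
                    names.add(entry.getName());
                }
            }
        }
        return names;
    }
}
```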


Re: Extracting a byte[] of a record.

Posted by Raindog <ra...@macrohmasheen.com>.
On 1/31/2012 3:39 AM, Nick Burch wrote:
> On Mon, 30 Jan 2012, Raindog wrote:
>> Thanks for your suggestion. The following code from your example is 
>> close to what I wrote, but leaves me with the same problem I had: How 
>> to access the raw bytes of the item from a DocumentEntry.
>
> DocumentInputStream is what you're looking for here, you create one 
> from a DocumentEntry and then you read from it as a normal InputStream
>
> See <http://poi.apache.org/poifs/how-to.html> for more on working with 
> POIFS to read (and write) entries, and 
> <http://poi.apache.org/poifs/embeded.html> for some general info on 
> embedded documents
>
> Nick
>
Thanks a lot for your help, guys. I now have working code.

P.S. This is an amazing library.


Re: Extracting a byte[] of a record.

Posted by Mark Beardsley <ma...@tiscali.co.uk>.
Thanks Nick, I was just about to post that myself. I found it after playing
around with the API this morning, using an older copy of 3.5 final I still
had on the PC.

Yours

Mark B

--
View this message in context: http://apache-poi.1045710.n5.nabble.com/Extracting-a-byte-of-a-record-tp5437114p5444524.html
Sent from the POI - User mailing list archive at Nabble.com.


Re: Extracting a byte[] of a record.

Posted by Nick Burch <ni...@alfresco.com>.
On Mon, 30 Jan 2012, Raindog wrote:
> Thanks for your suggestion. The following code from your example is 
> close to what I wrote, but leaves me with the same problem I had: How to 
> access the raw bytes of the item from a DocumentEntry.

DocumentInputStream is what you're looking for here, you create one from a 
DocumentEntry and then you read from it as a normal InputStream

See <http://poi.apache.org/poifs/how-to.html> for more on working with 
POIFS to read (and write) entries, and 
<http://poi.apache.org/poifs/embeded.html> for some general info on 
embedded documents

Nick
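Nick's answer in code: a minimal sketch, assuming a recent POI release, that copies the whole stream behind a DocumentEntry into a byte[] by sizing the buffer from DocumentEntry.getSize() and filling it with DocumentInputStream.readFully(). The DocumentBytes name is hypothetical. In Raindog's loop it would stand where the "How do I extract the object data" comment is, as byte[] bytes = DocumentBytes.read(docEntry);

```java
import java.io.IOException;

import org.apache.poi.poifs.filesystem.DocumentEntry;
import org.apache.poi.poifs.filesystem.DocumentInputStream;

/** Hypothetical helper: reads an entire POIFS document into memory. */
final class DocumentBytes {
    private DocumentBytes() {}

    static byte[] read(DocumentEntry entry) throws IOException {
        // The entry knows its own length, so the buffer can be sized up front
        byte[] data = new byte[entry.getSize()];
        try (DocumentInputStream in = new DocumentInputStream(entry)) {
            in.readFully(data);
        }
        return data;
    }
}
```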


Re: Extracting a byte[] of a record.

Posted by Mark Beardsley <ma...@tiscali.co.uk>.
Well, to use a very English phrase, bu**er me. It really does seem as if this
object is being treated in a 'special' way and you will have to dig deeper
into the record structure. I no longer have access to the API on my PC but
will take a look through the javadocs today - I am in the office so should
be able to take a good dig around - and see if anything obvious presents
itself. From the looks of what you have, we need to find a way to iterate
through the lower-level data structures to dig out the specific record you
need. Will post again when I have an answer; I will have to ask you to do
the testing, however, as I no longer can.

Yours

Mark B

--
View this message in context: http://apache-poi.1045710.n5.nabble.com/Extracting-a-byte-of-a-record-tp5437114p5443965.html
Sent from the POI - User mailing list archive at Nabble.com.


Re: Extracting a byte[] of a record.

Posted by Raindog <ra...@macrohmasheen.com>.
On 1/30/2012 11:31 PM, Mark Beardsley wrote:
> Well, if you look at the original example code, these two lines are the key
> I think;
>
> HSSFObjectData obj : workbook.getAllEmbeddedObjects()
>
> which is actually a shorter way of saying this;
>
> List<HSSFObjectData> objs = workbook.getAllEmbeddedObjects();
> for(HSSFObjectData obj : objs) {
> ...
> }
>
> That retrieves all of the embedded objects as instances of the
> HSSFObjectData class.
>
> Then this line
>
> byte[] objectData = obj.getObjectData();
>
> allows you to get at the data of each embedded object. At its most basic,
> all I think you need to do is this;
>
> 1. Get the embedded objects
> 2. Look at their data, and so
>
> List<HSSFObjectData> objs = workbook.getAllEmbeddedObjects();
> for(HSSFObjectData obj : objs) {
>      byte[] objectData = obj.getObjectData();
>      byte firstByte = objectData.length > 0 ? objectData[0] : 0; // array may be empty
> }
>
> You do not need any of the other code IMO - unless you are doing something
> more involved. Give those four lines a try and see what happens.
>
> Yours
>
> Mark B
>    


If I do that, the only objects I get are:

Paint.Picture
Packager Shell Object

The byte[] that getObjectData() returns has length 0.

If I iterate over the entries in the Packager Shell Object (which is the 
object containing the embedded exe), I get the following children (the 
last number is the entry size from DocumentEntry.getSize()):

Packager Shell Object.CompObj 76
Packager Shell Object.Ole10Native 180012   <--- these are the bytes I want

The Ole10Native entry is not contained in the output from 
workbook.getAllEmbeddedObjects().
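For a Packager Shell Object the interesting bytes are in the Ole10Native stream, but as far as POI's parser describes it, that stream wraps the file in a small header (a length prefix plus the packager's label, file name and command), so its raw bytes will generally not start with "MZ". A minimal sketch using POI's Ole10Native class, assuming a POI version that ships org.apache.poi.poifs.filesystem.Ole10Native: createFromEmbeddedOleObject() parses the stream of a given storage and getDataBuffer() returns just the packaged file. The real entry name starts with the control character U+0001, which the listing above most likely does not show. The PackagerPayload class is hypothetical.

```java
import java.io.IOException;

import org.apache.poi.poifs.filesystem.DirectoryNode;
import org.apache.poi.poifs.filesystem.Ole10Native;
import org.apache.poi.poifs.filesystem.Ole10NativeException;

/** Hypothetical helper: pulls the packaged file out of an embedded-object storage. */
final class PackagerPayload {
    private PackagerPayload() {}

    // The stream name begins with the control character U+0001
    private static final String OLE10_NATIVE = "\u0001Ole10Native";

    /** Returns the packaged file's bytes, or null if the storage has no Ole10Native stream. */
    static byte[] extract(DirectoryNode storage) throws IOException, Ole10NativeException {
        if (!storage.hasEntry(OLE10_NATIVE)) {
            return null;
        }
        // Parses the packager header and keeps only the embedded file's data
        Ole10Native nativeData = Ole10Native.createFromEmbeddedOleObject(storage);
        return nativeData.getDataBuffer();
    }
}
```

The returned array is what a check such as an MZ/PE signature test would be applied to.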


Re: Extracting a byte[] of a record.

Posted by Mark Beardsley <ma...@tiscali.co.uk>.
Well, if you look at the original example code, these two lines are the key
I think;

HSSFObjectData obj : workbook.getAllEmbeddedObjects()

which is actually a shorter way of saying this;

List<HSSFObjectData> objs = workbook.getAllEmbeddedObjects();
for(HSSFObjectData obj : objs) {
...
}

That retrieves all of the embedded objects as instances of the
HSSFObjectData class.

Then this line

byte[] objectData = obj.getObjectData();

allows you to get at the data of each embedded object. At its most basic,
all I think you need to do is this;

1. Get the embedded objects
2. Look at their data, and so

List<HSSFObjectData> objs = workbook.getAllEmbeddedObjects();
for(HSSFObjectData obj : objs) {
    byte[] objectData = obj.getObjectData();
    byte firstByte = objectData.length > 0 ? objectData[0] : 0; // array may be empty
}

You do not need any of the other code IMO - unless you are doing something
more involved. Give those four lines a try and see what happens.
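Mark's loop reads getObjectData() for every object, but as Raindog's reply shows, a Packager Shell Object keeps its data in a separate storage, so that array comes back empty. A sketch that distinguishes the two cases, assuming a recent POI release: it reports the storage name when hasDirectoryEntry() is true and the record-level data length otherwise. The EmbeddedSummary name and its output format are hypothetical.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.poi.hssf.usermodel.HSSFObjectData;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;

/** Hypothetical helper: one line per embedded object, saying where its data lives. */
final class EmbeddedSummary {
    private EmbeddedSummary() {}

    static List<String> describe(HSSFWorkbook workbook) throws IOException {
        List<String> lines = new ArrayList<>();
        for (HSSFObjectData obj : workbook.getAllEmbeddedObjects()) {
            String oleName = obj.getOLE2ClassName();
            if (obj.hasDirectoryEntry()) {
                // Data lives in its own POIFS storage (an MBD... directory)
                lines.add(oleName + ": storage " + obj.getDirectory().getName());
            } else {
                // Data, if any, is stored inside the record itself
                lines.add(oleName + ": " + obj.getObjectData().length + " bytes in record");
            }
        }
        return lines;
    }
}
```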

Yours

Mark B


--
View this message in context: http://apache-poi.1045710.n5.nabble.com/Extracting-a-byte-of-a-record-tp5437114p5443859.html
Sent from the POI - User mailing list archive at Nabble.com.


Re: Extracting a byte[] of a record.

Posted by Raindog <ra...@macrohmasheen.com>.
On 1/29/2012 7:25 AM, Mark Beardsley wrote:
> Have you taken a look at the quick guide yet? There is an example that shows
> how to extract embedded objects from an Excel sheet;
> http://poi.apache.org/spreadsheet/quick-guide.html#Embedded
>
> Admittedly, I wrote it a while back but the fact that it remains there does
> suggest it may still be valid.
>
Thanks for your suggestion. The following code from your example is 
close to what I wrote, but leaves me with the same problem I had: How to 
access the raw bytes of the item from a DocumentEntry. I've added a 
comment to the code below to show my particular trouble area.

                    } else {
                        if (obj.hasDirectoryEntry()) {
                            // The DirectoryEntry is a DirectoryNode. Examine its
                            // entries to find out what it is
                            DirectoryNode dn = (DirectoryNode) obj.getDirectory();
                            for (Iterator<Entry> entries = dn.getEntries();
                                    entries.hasNext();) {
                                Entry entry = entries.next();
                                if (entry.isDocumentEntry()) {
                                    // How do I extract the object data from this object?
                                    System.out.println(oleName + "." + entry.getName());
                                    DocumentEntry docEntry = (DocumentEntry) entry;
                                }
                            }
                        } else {
                            // There is no DirectoryEntry.
                            // Recover the object's data from the HSSFObjectData
                            // instance.
                            byte[] objectData = obj.getObjectData();
                            // Print the length; printing the array itself only
                            // shows its type and hash code
                            System.out.println(objectData.length);
                        }


Thanks,
Josh


Re: Extracting a byte[] of a record.

Posted by Mark Beardsley <ma...@tiscali.co.uk>.
Have you taken a look at the quick guide yet? There is an example that shows
how to extract embedded objects from an Excel sheet;
http://poi.apache.org/spreadsheet/quick-guide.html#Embedded

Admittedly, I wrote it a while back but the fact that it remains there does
suggest it may still be valid.

--
View this message in context: http://apache-poi.1045710.n5.nabble.com/Extracting-a-byte-of-a-record-tp5437114p5439538.html
Sent from the POI - User mailing list archive at Nabble.com.
