Posted to user@poi.apache.org by Mike Frederick <mi...@nuview.com> on 2005/03/15 17:52:30 UTC

Any info on the format of embedded objects?

Is there any info available on the format of embedded objects in Office files? 
I see the objects via the POIBrowser...


---------------------------------------------------------------------
To unsubscribe, e-mail: poi-user-unsubscribe@jakarta.apache.org
Mailing List:     http://jakarta.apache.org/site/mail2.html#poi
The Apache Jakarta Poi Project:  http://jakarta.apache.org/poi/


Re: Any info on the format of embedded objects?

Posted by an...@superlinksoftware.com.
This would make a nice addition to this: 
http://jakarta.apache.org/poi/poifs/fileformat.html





Re: Any info on the format of embedded objects?

Posted by Michael Zalewski <za...@optonline.net>.
If you liked my comments about how objects are embedded in structured files,
you might enjoy the following link:

Jon Honeyball, Windows IT Pro, Exploring Cairo, November 1995
http://www.windowsitpro.com/Windows/Articles/ArticleID/2312/pg/1/1.html

This article was written nearly 10 years ago! It is amusing to me that the 
article seems to speculate how all this stuff is about to change with the new 
version of NT (NT 4.0...)




RE: Any info on the format of embedded objects?

Posted by Kurt Stein <ku...@liquio.com>.
excellent post!

-----Original Message-----
From: news [mailto:news@sea.gmane.org]On Behalf Of Michael Zalewski
Sent: Friday, March 18, 2005 7:39 AM
To: poi-user@jakarta.apache.org
Subject: Re: Any info on the format of embedded objects?







Re: Any info on the format of embedded objects?

Posted by Michael Zalewski <mj...@lucent.com>.
In the case of Excel, each embedded object goes into a Storage (a Storage is the
POIFS equivalent of a file folder). The Storage will have a name like
MBDxxxxxxxx, where xxxxxxxx is a hexadecimal number.
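
A minimal sketch of recognising such a Storage by name, using only the Java
standard library. The class and method names are hypothetical, and the number
of hex digits is deliberately left open: the pattern above shows eight, while
the example name in the trees below has nine.

```java
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EmbeddedStorageName {
    // "MBD" followed by one or more hex digits (digit count is an assumption).
    private static final Pattern MBD = Pattern.compile("MBD([0-9A-Fa-f]+)");

    /** True if the entry name looks like an embedded-object Storage. */
    static boolean isEmbeddedStorage(String entryName) {
        return entryName != null && MBD.matcher(entryName).matches();
    }

    /** The hexadecimal part of the name as a number, or empty if it does not match. */
    static OptionalLong hexId(String entryName) {
        if (entryName == null) {
            return OptionalLong.empty();
        }
        Matcher m = MBD.matcher(entryName);
        // More than 15 hex digits might not fit in a signed long, so give up.
        if (!m.matches() || m.group(1).length() > 15) {
            return OptionalLong.empty();
        }
        return OptionalLong.of(Long.parseLong(m.group(1), 16));
    }

    public static void main(String[] args) {
        System.out.println(isEmbeddedStorage("MBD000006B0B"));
        System.out.println(hexId("MBD000006B0B"));
        System.out.println(isEmbeddedStorage("Workbook"));
    }
}
```

The entry names themselves would come from whatever you use to walk the file,
for example the directory listing that the POIBrowser displays.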

If the embedded object is another Office document, or any other OLE2 document,
the streams in the MBDxxxxxxxx Storage will have the same names as they would
in the object's own OLE2 file. For example, if you embed an Excel file inside
another Excel file, you will find the following structure:

  Root Property
  |
  +-- CompObj
  |
  +-- Workbook (these are the BIFF records of the outer workbook)
  |
  +-- SummaryInformation (POIFS Properties)
  |
  +-- DocumentSummaryInformation (POIFS Properties for Office Documents)
  |
  +-- MBD000006B0B
      |
      +-- CompObj
      |
      +-- Workbook (these are the BIFF records of the embedded workbook)
      |
      +-- SummaryInformation (POIFS Properties for embedded workbook)
      |
      +-- DocumentSummaryInformation (POIFS Properties for Office Documents)


If the embedded object is not an OLE2 file, the structure is slightly
different. It still gets put into an MBDxxxxxxxx Storage, but the structure
looks like the one below. This is the format used if you embed a Bitmap or JPEG
file, for example:

  Root Property
  |
  +-- CompObj
  |
  +-- Workbook (these are the BIFF records of the outer workbook)
  |
  +-- SummaryInformation (POIFS Properties)
  |
  +-- DocumentSummaryInformation (POIFS Properties for Office Documents)
  |
  +-- MBD000006B0B
      |
      +-- Ole
      |
      +-- CompObj
      |
      +-- Ole10Native (Embedded file)
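
The two layouts above can be told apart by the entry names inside the MBD
Storage. The following is only a sketch under assumptions: the class, enum and
method names are hypothetical, only the Excel case from the trees is covered,
and a single leading control character is stripped from each name because
stored names such as CompObj often carry one that the trees do not show.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class EmbeddedObjectKind {
    enum Kind { OLE2_WORKBOOK, NON_OLE2_FILE, UNKNOWN }

    /** Drops one leading control character so "\u0001CompObj" equals "CompObj". */
    static String normalize(String name) {
        boolean hasPrefix = !name.isEmpty() && name.charAt(0) < 0x20;
        return hasPrefix ? name.substring(1) : name;
    }

    /** Classifies an MBD Storage from the names of its direct children. */
    static Kind classify(Iterable<String> childNames) {
        Set<String> names = new HashSet<>();
        for (String n : childNames) {
            names.add(normalize(n));
        }
        if (names.contains("Ole10Native")) {
            return Kind.NON_OLE2_FILE;   // second tree: Ole, CompObj, Ole10Native
        }
        if (names.contains("Workbook")) {
            return Kind.OLE2_WORKBOOK;   // first tree: an embedded Excel workbook
        }
        return Kind.UNKNOWN;             // anything the post does not describe
    }

    public static void main(String[] args) {
        System.out.println(classify(Arrays.asList(
                "CompObj", "Workbook", "SummaryInformation", "DocumentSummaryInformation")));
        System.out.println(classify(Arrays.asList("Ole", "CompObj", "Ole10Native")));
    }
}
```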


The 'CompObj' stream is actually important when a file is embedded. It contains
the CLSID of the OLE2 server which will render the object. For example, for
Excel, it is always the same 109 bytes.
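
To compare a CLSID from the stream with one from the registry, the 16 raw
bytes have to be turned into the usual text form. The sketch below assumes the
standard GUID layout (the first three fields little-endian, the last eight
bytes in order). Where those 16 bytes sit inside the CompObj stream is not
stated here, so the offset is a parameter; the class name and the sample bytes
are illustrative only.

```java
final class ClsidFormat {
    /** Formats the 16 bytes starting at offset as XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX. */
    static String format(byte[] buf, int offset) {
        if (buf == null || offset < 0 || buf.length - offset < 16) {
            throw new IllegalArgumentException("need 16 bytes at the given offset");
        }
        StringBuilder sb = new StringBuilder(36);
        appendHex(sb, buf, offset, 4, true);       // Data1, little-endian
        sb.append('-');
        appendHex(sb, buf, offset + 4, 2, true);   // Data2, little-endian
        sb.append('-');
        appendHex(sb, buf, offset + 6, 2, true);   // Data3, little-endian
        sb.append('-');
        appendHex(sb, buf, offset + 8, 2, false);  // first 2 bytes of Data4, in order
        sb.append('-');
        appendHex(sb, buf, offset + 10, 6, false); // last 6 bytes of Data4, in order
        return sb.toString();
    }

    private static void appendHex(StringBuilder sb, byte[] b, int off, int len, boolean reversed) {
        for (int i = 0; i < len; i++) {
            int index = reversed ? off + len - 1 - i : off + i;
            sb.append(String.format("%02X", b[index] & 0xFF));
        }
    }

    public static void main(String[] args) {
        byte[] sample = {
            0x20, 0x08, 0x02, 0x00, 0x00, 0x00, 0x00, 0x00,
            (byte) 0xC0, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x46
        };
        System.out.println(format(sample, 0));
    }
}
```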

The 'Ole10Native' stream for non-OLE2 documents contains a header followed by 
the contents of the file.
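
The header layout is not spelled out here. As a rough sketch only, the code
below assumes the stream begins with a 4-byte little-endian count of the bytes
that follow, and returns those bytes. For packaged files the returned data may
itself start with further header fields (such as a label and a file name)
before the raw file contents; this sketch does not parse them. The class and
method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

final class Ole10NativePrefix {
    /** Returns the bytes after the assumed 4-byte little-endian length prefix. */
    static byte[] nativeData(byte[] stream) {
        if (stream == null || stream.length < 4) {
            throw new IllegalArgumentException("stream too short for a length prefix");
        }
        int declared = ByteBuffer.wrap(stream, 0, 4)
                .order(ByteOrder.LITTLE_ENDIAN)
                .getInt();
        // Reject a count that is negative or runs past the end of the stream.
        if (declared < 0 || declared > stream.length - 4) {
            throw new IllegalArgumentException("declared size does not fit the stream");
        }
        return Arrays.copyOfRange(stream, 4, 4 + declared);
    }

    public static void main(String[] args) {
        byte[] stream = {3, 0, 0, 0, 'a', 'b', 'c'};
        System.out.println(nativeData(stream).length);
    }
}
```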

The 'Ole' stream is always short, and seems to be present whenever the embedded 
object is not of the same type as the object in which it is embedded.


