Posted to dev@poi.apache.org by Robert Paris <rp...@hotmail.com> on 2005/04/21 19:04:27 UTC

RE: Getting a PIC - And errors found in EscherDump (got fixes if want)

No, they are stored in the Data stream, BUT not in the format that the 
documentation states. But using the Escher format you should be able to grab 
MOST (not all) picture data, as images inserted into a Word file from Word 
'97 or later are now stored as Escher objects, even if they're not drawings 
but jpegs, etc.

The documentation states that the file is saved as a PIC header followed by 
the filename as a Pascal string and then the file data. That is not even 
remotely close to what actually exists there. Instead, there's the PIC 
header structure, then IF it's an Escher object, you've got the insane 
Escher heading structure (similar to, but even worse than, the grppls of 
sprms) and then the actual file data.
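
The layout described above (PIC header, then an Escher record header, then 
the data) can be sketched roughly as follows. This is only an illustration 
under stated assumptions, not POI's actual code: the class and method names 
(PicSketch, picHeaderSize, readEscherHeader, sample) are made up for this 
example, and it assumes the Escher record starts immediately at 
picOffset + cbHeader, which may not hold for every picture (a linked file 
may, I believe, carry a name string in between). What it relies on is that 
the PIC starts with a 32-bit lcb followed by a 16-bit cbHeader, and that an 
Escher record header is 8 bytes, little-endian: a 16-bit options word 
(version in the low 4 bits, instance in the high 12), a 16-bit record type, 
and a 32-bit length.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class PicSketch {

    /** The fields of an 8-byte Escher record header. */
    static final class EscherHeader {
        final int version;    // low 4 bits of the options word
        final int instance;   // high 12 bits of the options word
        final int recordType; // e.g. 0xF007 for a BSE record
        final long length;    // number of bytes following the header

        EscherHeader(int version, int instance, int recordType, long length) {
            this.version = version;
            this.instance = instance;
            this.recordType = recordType;
            this.length = length;
        }
    }

    /** Reads cbHeader: an unsigned 16-bit value at offset 4, after the 32-bit lcb. */
    static int picHeaderSize(byte[] data, int picOffset) {
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        return buf.getShort(picOffset + 4) & 0xFFFF;
    }

    /** Decodes the 8-byte Escher record header found at the given offset. */
    static EscherHeader readEscherHeader(byte[] data, int offset) {
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        int options = buf.getShort(offset) & 0xFFFF;
        int type = buf.getShort(offset + 2) & 0xFFFF;
        long length = buf.getInt(offset + 4) & 0xFFFFFFFFL;
        return new EscherHeader(options & 0x0F, options >>> 4, type, length);
    }

    /** Builds synthetic bytes (not taken from a real document) for a dry run. */
    static byte[] sample() {
        ByteBuffer buf = ByteBuffer.allocate(68 + 8).order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(0, 68 + 8 + 36);         // lcb: made-up total size
        buf.putShort(4, (short) 68);        // cbHeader: size of the PIC header
        buf.putShort(68, (short) 0x0052);   // options: version 2, instance 5
        buf.putShort(70, (short) 0xF007);   // record type
        buf.putInt(72, 36);                 // length of the record data
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] data = sample();
        int escherStart = picHeaderSize(data, 0); // assumption: Escher data follows directly
        EscherHeader h = readEscherHeader(data, escherStart);
        System.out.printf("type=0x%04X version=%d instance=%d length=%d%n",
                h.recordType, h.version, h.instance, h.length);
    }
}
```

On a real document the offset passed in would be the one taken from 
sprmCPicLocation, and the record found there may well be a container whose 
children have to be walked before reaching the picture bytes.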

Hope this helps!

(BTW, did anyone notice the oddly sexual nature of the Word naming 
structure? A whole host of sprms everywhere, which are linked to STDs, which 
of course require a PAP to discern, and it was all preceded by a whole lot of 
grppl-ing) -JK


>From: "Kais Dukes" <k....@complexar.com>
>Reply-To: "POI Developers List" <po...@jakarta.apache.org>
>To: "POI Developers List" <po...@jakarta.apache.org>
>Subject: RE: Getting a PIC
>Date: Wed, 20 Apr 2005 23:33:59 +0100
>
>Hi Robert,
>
>I am most interested in what you have found. Are you saying that the
>picture data for some Escher images are not stored in the expected place
>(the document's Data stream?) but are instead embedded as part of the
>complex stream?
>
>Kind Regards,
>
>Kais
>
>-----Original Message-----
>From: Robert Paris [mailto:rpjava@hotmail.com]
>Sent: 20 April 2005 22:06
>To: poi-dev@jakarta.apache.org
>Subject: RE: Getting a PIC
>
>
>Thanks for the reply.
>
>OH, if only it were so simple. I believe I found it, and as with all other
>Word formats, the thing is a mess. You have to loop through and when you
>find the right record (and check a thousand fWhateverBooleans and option
>shorts), you then have to parse the complex data, and it appears to be
>stored in there.
>
>Of course, none of this follows the MS Binary Format writings and is found
>pretty much nowhere on the web. Ugh. But thankfully it appears the good
>folks at POI (non-scratchpad area) have done some great work in this area
>to get me started.
>
>Thanks again!
>
> >From: "Kais Dukes" <k....@complexar.com>
> >Reply-To: "POI Developers List" <po...@jakarta.apache.org>
> >To: "POI Developers List" <po...@jakarta.apache.org>
> >Subject: RE: Getting a PIC
> >Date: Wed, 20 Apr 2005 18:49:02 +0100
> >
> >Hi Robert,
> >
> >Although I have not looked at the BSE record code myself, I have some
> >information from my own work on Escher diagrams.
> >A BSE record contains a fixed size header, and then may be followed by an
> >optional string (2 bytes per character). Could this string be the file
> >name you have described?
> >
> >-- Kais
> >
> >-----Original Message-----
> >From: Robert Paris [mailto:rpjava@hotmail.com]
> >Sent: 20 April 2005 18:26
> >To: poi-dev@jakarta.apache.org
> >Subject: Re: Getting a PIC
> >
> >
> >Thanks for the reply. Yes, it does appear to be an Escher BSE Record,
> >however, there seems to be an issue with grabbing some of the info inside
> >it.
> >
> >When I look at the actual data in the byte stream, I can see the file
> >path and name in the data (e.g. D : \ F i l e s \ S o m e I m a g e . j p g ),
> >yet I cannot find that data anywhere inside either POI's EscherBSE Record
> >reading (from 0xF007) nor in any other documentation I've found on that.
> >None of the tags seem to hold that info. Any idea where I read it from?
> >
> >Attempts to read from the case 0xF007 don't work because by the time it
> >hits that tag marker, it's already past the path/filename string and when
> >it reads the name length (at offset 33), it always has length = 0.
> >
> >Thanks again for your help and time!
> >
> >
> >
> > >From: Avik Sengupta <av...@itellix.com>
> > >Reply-To: "POI Developers List" <po...@jakarta.apache.org>
> > >To: POI Developers List <po...@jakarta.apache.org>
> > >Subject: Re: Getting a PIC
> > >Date: Wed, 20 Apr 2005 12:39:53 +0530
> > >
> > >Have you seen the drawing code in HSSF? Maybe it's similar/same?
> > >
> > >On Wed, 2005-04-20 at 03:02 +0000, Robert Paris wrote:
> > > > I'm working on the part of Word that stores pictures and I've run
> > > > into a problem. I'm able to grab the PIC structure (from the SPRM
> > > > sprmCPicLocation). However, once I've gone through that, I have a
> > > > chunk of data that I believe is an "Office Shape Format".
> > > > Unfortunately, I am unable to find the definition for this structure
> > > > anywhere. Does anyone know where it is?
> > > >
> > > > The documentation for Word 97 says that all pictures "inserted with
> > > > Word 97 are in the new Office shape format (documented elsewhere)".
> > > > Without that documentation, I have no way to read this data!
> > > >
> > > > Anyone?
> > > >
> > > >
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: poi-dev-unsubscribe@jakarta.apache.org
> > > > Mailing List:    http://jakarta.apache.org/site/mail2.html#poi
> > > > The Apache Jakarta POI Project: http://jakarta.apache.org/poi/
> > > >


RE: Getting a PIC - And errors found in EscherDump (got fixes if want)

Posted by Kais Dukes <k....@complexar.com>.
The first time I read about sprms in the Word docs, I also was amused :-)
