Posted to user@poi.apache.org by Carl Lee <lj...@gmail.com> on 2011/07/30 06:20:23 UTC

Extracting OLE objects from Word document

Hi, I am trying to extract the equations embedded in a Word document and put
them into a PowerPoint document (I want to repeat this several hundred times).
I got the following listing using the POIFSFileSystem and POIFSLister
classes:

Root Entry -
  SummaryInformation <(0x05)SummaryInformation> [424 / 0x1a8]
  DocumentSummaryInformation <(0x05)DocumentSummaryInformation> [220 / 0xdc]
  WordDocument [33475 / 0x82c3]
  1Table [13981 / 0x369d]
  ObjectPool -
    _1343968404 -
      CompObj <(0x01)CompObj> [105 / 0x69]
      ObjInfo <(0x03)ObjInfo> [6 / 0x6]
      Equation Native [71 / 0x47]
      Ole <(0x01)Ole> [20 / 0x14]

My question is this: when I extract the document as plain text using
WordExtractor, each embedded equation shows up as text like this:

!!EMBED Equation.3

How exactly do I match that text to the objects in the list above? Only by
order?

Another question: can I insert the OLE object directly into an
HSLFSlideShow, to avoid the time spent on extracting it and re-inserting it
into a .ppt? Or is it better to extract the equations as images and insert
those into the HSLFSlideShow instead of the OLE object?

Re: Extracting OLE objects from Word document

Posted by Carl Lee <lj...@gmail.com>.
thank you

On Sat, Jul 30, 2011 at 5:52 PM, Sergey Vladimirov <vl...@gmail.com> wrote:

> Carl,
>
> Try using the CharacterRun.isOle2() and CharacterRun.getPicOffset()
> methods. They are mapped to the fOle2 and fcPic properties of CHP(X).

Re: Extracting OLE objects from Word document

Posted by Sergey Vladimirov <vl...@gmail.com>.
Carl,

Try using the CharacterRun.isOle2() and CharacterRun.getPicOffset()
methods. They are mapped to the fOle2 and fcPic properties of CHP(X).
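
To make the idea concrete, here is a minimal sketch of the filtering step:
walk the character runs in document order and keep the fcPic value of every
run that is flagged as OLE2. The OleRun interface and the OleRunFilter class
are hypothetical stand-ins invented for this example so that it compiles
without POI on the classpath; they only mirror the two CharacterRun methods
named above. With POI itself you would presumably read the same two values
from each CharacterRun of the document's Range.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in (not a POI type) for the two CharacterRun accessors
// mentioned in this thread.
interface OleRun {
    boolean isOle2();
    int getPicOffset();
}

final class OleRunFilter {
    // Returns the fcPic values of all runs flagged fOle2, in document order.
    static List<Integer> ole2PicOffsets(List<? extends OleRun> runs) {
        List<Integer> ids = new ArrayList<Integer>();
        for (OleRun run : runs) {
            if (run.isOle2()) {
                ids.add(run.getPicOffset());
            }
        }
        return ids;
    }

    // Small factory for demo runs; purely illustrative.
    static OleRun run(final boolean ole2, final int picOffset) {
        return new OleRun() {
            public boolean isOle2() { return ole2; }
            public int getPicOffset() { return picOffset; }
        };
    }

    public static void main(String[] args) {
        List<OleRun> runs = Arrays.asList(
            run(false, 0),          // ordinary text before the equation
            run(true, 1343968404),  // the run that anchors the equation
            run(false, 0));         // ordinary text after it
        System.out.println(ole2PicOffsets(runs)); // prints [1343968404]
    }
}
```

The point of collecting the IDs rather than counting "EMBED" placeholders is
that each equation is then identified by its own fcPic value and not by its
position in the text.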

-- 
Best regards,
Sergey

On Sat, Jul 30, 2011 at 1:24 PM, Carl Lee <lj...@gmail.com> wrote:
> But which class exactly in the POI library should I use?

-- 
Sergey Vladimirov

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org


Re: Extracting OLE objects from Word document

Posted by Carl Lee <lj...@gmail.com>.
But which class exactly in the POI library should I use?

On Sat, Jul 30, 2011 at 5:23 PM, Carl Lee <lj...@gmail.com> wrote:

> Thanks, I'll look into it

Re: Extracting OLE objects from Word document

Posted by Carl Lee <lj...@gmail.com>.
Thanks, I'll look into it

On Sat, Jul 30, 2011 at 5:20 PM, Sergey Vladimirov <vl...@gmail.com> wrote:

> Carl,
>
> Working on low-level structures: if the CHP associated with the character
> (the beginning of the equation) has fOle2=1, then fcPic will contain a
> unique integer pointing to the OLE substream. For details see p. 11 of
> "Microsoft Office Word 97-2007 Binary File Format (.doc) Specification".
>
> Best regards,
> Sergey.

Re: Extracting OLE objects from Word document

Posted by Sergey Vladimirov <vl...@gmail.com>.
Carl,

Working on low-level structures: if the CHP associated with the character
(the beginning of the equation) has fOle2=1, then fcPic will contain a
unique integer pointing to the OLE substream. For details see p. 11 of
"Microsoft Office Word 97-2007 Binary File Format (.doc) Specification".
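
As a rough illustration of how that integer can be matched against the
ObjectPool listing in the original question, here is a small sketch. It
assumes that the storage name is "_" followed by the decimal value of fcPic,
which is consistent with the "_1343968404" entry in the listing but should be
checked against the specification for your files. ObjectPoolNames and its
methods are names made up for this example and are not part of POI. If your
POI version has HWPFDocument.getObjectsPool(), I believe a lookup by the same
name is available there as well, but treat that as an assumption too.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class ObjectPoolNames {
    // Assumed naming rule: ObjectPool storage name = "_" + decimal fcPic.
    static String storageName(int fcPic) {
        return "_" + fcPic;
    }

    // True if the ObjectPool listing contains a storage for this fcPic.
    static boolean hasStorage(Set<String> objectPoolEntries, int fcPic) {
        return objectPoolEntries.contains(storageName(fcPic));
    }

    public static void main(String[] args) {
        // Entry names as they appear under ObjectPool in the POIFS listing.
        Set<String> pool = new HashSet<String>(Arrays.asList("_1343968404"));

        System.out.println(storageName(1343968404));      // prints _1343968404
        System.out.println(hasStorage(pool, 1343968404)); // prints true
    }
}
```

Matching by ID this way answers the "only by order?" part of the question:
the order of the ObjectPool entries does not have to be relied on.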

Best regards,
Sergey.

-- 
Sergey Vladimirov
