Posted to user@poi.apache.org by Carl Lee <lj...@gmail.com> on 2011/07/30 06:20:23 UTC
Extracting OLE objects from Word document
Hi, I am trying to copy embedded equations from a Word document into a
PowerPoint document (I want to repeat this several hundred times).
I got the following listing from the POIFSFileSystem and POIFSLister
classes:
Root Entry -
  SummaryInformation <(0x05)SummaryInformation> [424 / 0x1a8]
  DocumentSummaryInformation <(0x05)DocumentSummaryInformation> [220 / 0xdc]
  WordDocument [33475 / 0x82c3]
  1Table [13981 / 0x369d]
  ObjectPool -
    _1343968404 -
      CompObj <(0x01)CompObj> [105 / 0x69]
      ObjInfo <(0x03)ObjInfo> [6 / 0x6]
      Equation Native [71 / 0x47]
      Ole <(0x01)Ole> [20 / 0x14]
My question is: when I extract the text with WordExtractor, each embedded
equation shows up as plain text like this:
!!EMBED Equation.3
How do I match that text to the corresponding object in the list above?
Only by order?
Another question: can I insert the OLE object directly into an
HSLFSlideShow, to avoid the time spent extracting it and inserting it into a
.ppt? Or is it better to extract the equations as images and insert those
into the HSLFSlideShow instead?
Re: Extracting OLE objects from Word document
Posted by Carl Lee <lj...@gmail.com>.
thank you
Re: Extracting OLE objects from Word document
Posted by Sergey Vladimirov <vl...@gmail.com>.
Carl,
Try the CharacterRun.isOle2() and CharacterRun.getPicOffset() methods.
They map to the fOle2 and fcPic properties of CHP(X).
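A minimal sketch of the scan this suggests: walk the character runs in document order and keep the fcPic value of every run flagged as OLE2. So that it runs without POI or a .doc file, RunInfo below is a hypothetical stand-in exposing only the two accessors named above; OleRunScan, picOffsets and run are illustrative names too, not part of POI.

```java
import java.util.ArrayList;
import java.util.List;

class OleRunScan {
    /** Hypothetical stand-in for the two CharacterRun accessors mentioned above. */
    interface RunInfo {
        boolean isOle2();
        int getPicOffset();
    }

    /** Returns the fcPic value of every run flagged as OLE2, in document order. */
    static List<Integer> picOffsets(List<? extends RunInfo> runs) {
        List<Integer> offsets = new ArrayList<>();
        for (RunInfo r : runs) {
            if (r.isOle2()) {
                offsets.add(r.getPicOffset());
            }
        }
        return offsets;
    }

    /** Builds a fake run so the sketch can be exercised without a document. */
    static RunInfo run(final boolean ole2, final int picOffset) {
        return new RunInfo() {
            public boolean isOle2() { return ole2; }
            public int getPicOffset() { return picOffset; }
        };
    }

    public static void main(String[] args) {
        List<RunInfo> runs = new ArrayList<>();
        runs.add(run(false, 0));          // ordinary text run
        runs.add(run(true, 1343968404));  // run that starts an embedded object
        System.out.println(picOffsets(runs));
    }
}
```

With POI on the classpath, the same loop would presumably iterate the document's Range using numCharacterRuns() and getCharacterRun(i) and call the two methods on each CharacterRun directly.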
--
Best regards,
Sergey
--
Sergey Vladimirov
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org
Re: Extracting OLE objects from Word document
Posted by Carl Lee <lj...@gmail.com>.
But which class exactly in POI library should I use?
Re: Extracting OLE objects from Word document
Posted by Carl Lee <lj...@gmail.com>.
Thanks, I'll look into it
Re: Extracting OLE objects from Word document
Posted by Sergey Vladimirov <vl...@gmail.com>.
Carl,
Working with the low-level structures: if the CHP associated with the
character (the beginning of the equation) has fOle2=1, then fcPic will
contain a unique integer pointing to the OLE substream. For details see
p. 11 of "Microsoft Office Word 97-2007 Binary File Format (.doc)
Specification".
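To make the link to the ObjectPool listing concrete, here is a small sketch of the name mapping. It rests on an assumption not stated in this thread: that each ObjectPool storage is named "_" followed by the fcPic value in decimal, which would make _1343968404 the storage for fcPic = 1343968404. Please verify that against your own files. ObjectPoolNames, nameFor and fcPicFrom are illustrative names, not POI API.

```java
import java.util.OptionalInt;

class ObjectPoolNames {
    /** Assumed convention: storage name is "_" followed by fcPic in decimal. */
    static String nameFor(int fcPic) {
        return "_" + fcPic;
    }

    /** Reverse direction: recover fcPic from a storage name, or empty if the name does not fit. */
    static OptionalInt fcPicFrom(String storageName) {
        if (storageName == null || !storageName.matches("_\\d+")) {
            return OptionalInt.empty();
        }
        try {
            return OptionalInt.of(Integer.parseInt(storageName.substring(1)));
        } catch (NumberFormatException tooLarge) {
            // Digits only, but the value does not fit in a 32-bit int.
            return OptionalInt.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(nameFor(1343968404));                   // name to look up under ObjectPool
        System.out.println(fcPicFrom("_1343968404").isPresent());  // entry name from the listing
        System.out.println(fcPicFrom("WordDocument").isPresent()); // not an ObjectPool storage name
    }
}
```

If the assumption holds, matching an equation in the text to its storage no longer depends on order: the run that starts the equation carries the fcPic, and the fcPic gives the storage name.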
Best regards,
Sergey.