Posted to user@poi.apache.org by Carl Lee <lj...@gmail.com> on 2011/07/27 10:55:20 UTC

How to extract Embedded Equations inside a .doc file?

I am trying to use Apache POI to extract embedded equations and text from a
.doc MS Word file into a .ppt MS PowerPoint file. I have successfully
extracted the text, but how do I extract the embedded equations?

The embedded equations come out like this if I extract them only as text:

!!EMBED Equation.3

Re: How to extract Embedded Equations inside a .doc file?

Posted by Carl Lee <lj...@gmail.com>.
Hi, I got the following listing from the POIFSFileSystem and
POIFSLister classes:

Root Entry -
  SummaryInformation <(0x05)SummaryInformation> [424 / 0x1a8]
  DocumentSummaryInformation <(0x05)DocumentSummaryInformation> [220 / 0xdc]
  WordDocument [33475 / 0x82c3]
  1Table [13981 / 0x369d]
  ObjectPool -
    _1343968404 -
      CompObj <(0x01)CompObj> [105 / 0x69]
      ObjInfo <(0x03)ObjInfo> [6 / 0x6]
      Equation Native [71 / 0x47]
      Ole <(0x01)Ole> [20 / 0x14]
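
For reading the listing: each storage directly under ObjectPool (here
_1343968404) holds one embedded object, and a storage that contains an
"Equation Native" stream holds equation data. As a rough sketch only, the
stdlib-only Java below picks out such storages from listing text in the format
shown above. The class name ListingScan and the method equationStorages are
made up for illustration, and the two-space indentation per nesting level is an
assumption taken from this one listing. It does not call the POI API itself.

```java
import java.util.ArrayList;
import java.util.List;

public class ListingScan {
    /** Counts leading spaces; the listing appears to indent each level by two. */
    static int indentOf(String line) {
        int i = 0;
        while (i < line.length() && line.charAt(i) == ' ') {
            i++;
        }
        return i;
    }

    /** Returns the names of ObjectPool storages that contain an "Equation Native" stream. */
    static List<String> equationStorages(String listing) {
        List<String> found = new ArrayList<>();
        int poolIndent = -1;   // indent of the "ObjectPool -" line, -1 while outside it
        String storage = null; // current storage directly under ObjectPool
        for (String line : listing.split("\n")) {
            String text = line.trim();
            if (text.isEmpty()) {
                continue;
            }
            int indent = indentOf(line);
            boolean isStorage = text.endsWith(" -"); // storages are printed as "name -"
            String name = isStorage ? text.substring(0, text.length() - 2) : text;

            // A line at or above the ObjectPool indent means we have left ObjectPool.
            if (poolIndent >= 0 && indent <= poolIndent) {
                poolIndent = -1;
                storage = null;
            }
            if (poolIndent < 0) {
                if (isStorage && name.equals("ObjectPool")) {
                    poolIndent = indent;
                }
                continue;
            }
            if (isStorage && indent == poolIndent + 2) {
                storage = name; // e.g. "_1343968404"
            } else if (!isStorage && storage != null && indent == poolIndent + 4
                    && name.startsWith("Equation Native")) {
                found.add(storage);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String listing = String.join("\n",
                "Root Entry -",
                "  WordDocument [33475 / 0x82c3]",
                "  ObjectPool -",
                "    _1343968404 -",
                "      Equation Native [71 / 0x47]");
        System.out.println(equationStorages(listing));
    }
}
```

With a live POIFSFileSystem the same walk would be done on the directory
entries rather than on printed text; the text version is used here only so the
sketch runs without any sample file.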

But my question is: how exactly do I match text like this

!!EMBED Equation.3

to the objects in the POIFS listing above? Only by order?
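
On the ordering idea, a minimal sketch: in the Word binary format a field is
delimited by the control characters 0x13 (field begin), 0x14 (separator) and
0x15 (field end), and the stray characters in front of EMBED above are possibly
such markers rendered as text. Assuming the raw document text is available as a
String that still contains these markers (how to obtain it is not shown), the
stdlib-only Java below lists the EMBED field codes in document order. The names
EmbedFieldScan and embedFieldCodes are hypothetical. Pairing the n-th EMBED
field with the n-th ObjectPool storage is an assumption that this thread does
not confirm.

```java
import java.util.ArrayList;
import java.util.List;

public class EmbedFieldScan {
    static final char FIELD_BEGIN = 0x13;     // starts a field
    static final char FIELD_SEPARATOR = 0x14; // splits field code from field result
    static final char FIELD_END = 0x15;       // ends a field

    /** Returns the codes of all EMBED fields, in the order they occur in rawText. */
    static List<String> embedFieldCodes(String rawText) {
        List<String> codes = new ArrayList<>();
        int pos = 0;
        while (true) {
            int begin = rawText.indexOf(FIELD_BEGIN, pos);
            if (begin < 0) {
                break;
            }
            // The field code runs until the next field marker of any kind.
            int stop = begin + 1;
            while (stop < rawText.length()) {
                char c = rawText.charAt(stop);
                if (c == FIELD_BEGIN || c == FIELD_SEPARATOR || c == FIELD_END) {
                    break;
                }
                stop++;
            }
            String code = rawText.substring(begin + 1, stop).trim();
            if (code.startsWith("EMBED ")) {
                codes.add(code); // e.g. "EMBED Equation.3"
            }
            pos = stop; // continue scanning; a nested field begin is picked up next
        }
        return codes;
    }

    public static void main(String[] args) {
        String raw = "Text " + FIELD_BEGIN + " EMBED Equation.3 " + FIELD_SEPARATOR
                + FIELD_END + " more text";
        System.out.println(embedFieldCodes(raw));
    }
}
```

The index of each entry in the returned list gives the position of the
equation in the text, which is the piece needed to test whether order-based
matching actually holds for a given document.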

Another question: can I insert the OLE object directly into an
HSLFSlideShow, to avoid the time spent extracting it and re-inserting it into
a .ppt? Or is it better to extract the equations as images and insert those
into the HSLFSlideShow instead of inserting the OLE object directly?

Thank you in advance

On Thu, Jul 28, 2011 at 9:51 PM, Carl Lee <lj...@gmail.com> wrote:

> I looked at the HWPF APIs but can't find any class correspondent for "OLE
> objects" , could you please instruct me further on which class should I use
> for extracting "OLE objects" from a HWPF document?
>
>
> On Thu, Jul 28, 2011 at 9:39 PM, Carl Lee <lj...@gmail.com> wrote:
>
>> I don't know how to insert images into PPT already, but there's an example
>> for that in the source code, I'll look into that. So thank you very much for
>> replying, I'll try extract OLE objects from my document, I'll let you know
>> if I encountered any more problem.
>>
>>
>> On Thu, Jul 28, 2011 at 1:30 AM, Sergey Vladimirov <vl...@gmail.com>wrote:
>>
>>> Carl Lee,
>>>
>>> Each equation is OLE object, more specifically MathType Equation OLE
>>> object. You can extract those OLE objects from Word document and/or
>>> you can extract image related to particular equation.
>>>
>>> Do you know how to insert image into PPT already?
>>>
>>> Best regards,
>>> Sergey
>>>
>>> On Wed, Jul 27, 2011 at 12:55 PM, Carl Lee <lj...@gmail.com> wrote:
>>> > I am trying to use "Apache POI" to extract embedded equation and text
>>> from a
>>> > .doc MS Word file into a .ppt MS Powerpoint file, I have successfully
>>> > extracted text, but how do I extract embedded equations?
>>> >
>>> > the Embedded Equations comes out like this if I only extract it as
>>> text:
>>> >
>>> > !!EMBED Equation.3
>>> >
>>>
>>>
>>>
>>> --
>>> Sergey Vladimirov
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
>>> For additional commands, e-mail: user-help@poi.apache.org
>>>
>>>
>>
>

Re: How to extract Embedded Equations inside a .doc file?

Posted by Carl Lee <lj...@gmail.com>.
I looked at the HWPF APIs but can't find any class corresponding to "OLE
objects". Could you please tell me which class I should use to extract
OLE objects from an HWPF document?


Re: How to extract Embedded Equations inside a .doc file?

Posted by Carl Lee <lj...@gmail.com>.
I don't know how to insert images into a PPT yet, but there's an example
for that in the source code, so I'll look into it. Thank you very much for
replying. I'll try to extract the OLE objects from my document and will let
you know if I run into any more problems.


Re: How to extract Embedded Equations inside a .doc file?

Posted by Sergey Vladimirov <vl...@gmail.com>.
Carl Lee,

Each equation is an OLE object. The Equation.3 ProgID indicates Microsoft
Equation 3.0 (Equation Editor), which is derived from MathType. You can
extract those OLE objects from the Word document, and/or you can extract the
image associated with a particular equation.

Do you already know how to insert an image into a PPT?

Best regards,
Sergey




-- 
Sergey Vladimirov
