Posted to user@poi.apache.org by Yury Batrakov <ba...@gmail.com> on 2008/03/31 14:20:01 UTC
Microsoft Graph objects
Is there any way to extract MS Graph objects from Office documents using POI?
I'm primarily interested in MS Graph charts, Organisation charts and WordArt.
It looks like HSLF can extract these objects, but the other components cannot. Am I right?
Are there plans to support them in the other formats?
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org
Re: Re[2]: Microsoft Graph objects
Posted by Yury Batrakov <ba...@gmail.com>.
Looks like it is Escher: this chart can be extracted and saved using
getPicturesTable() and company, but this code:
InputStream stream = fs.createDocumentInputStream(dir,
workbookName); // modified version, tested ok on Word and Excel embeds
EventRecordFactory factory = new EventRecordFactory();
List records = RecordFactory.createRecords(stream);
causes:
Exception in thread "main" java.lang.RuntimeException:
org.apache.poi.hssf.record.RecordFormatException: Unable to construct
record instance
at ru.mera.ofa.ReadOLE$MyPOIFSReaderListener.processPOIFSReaderEvent(ReadOLE.java:118)
at org.apache.poi.poifs.eventfilesystem.POIFSReader.processProperties(POIFSReader.java:261)
at org.apache.poi.poifs.eventfilesystem.POIFSReader.processProperties(POIFSReader.java:230)
at org.apache.poi.poifs.eventfilesystem.POIFSReader.processProperties(POIFSReader.java:230)
at org.apache.poi.poifs.eventfilesystem.POIFSReader.read(POIFSReader.java:97)
at ru.mera.ofa.ReadOLE.main(ReadOLE.java:84)
Caused by: org.apache.poi.hssf.record.RecordFormatException: Unable to
construct record instance
at org.apache.poi.hssf.record.RecordFactory.createRecord(RecordFactory.java:199)
at org.apache.poi.hssf.record.RecordFactory.createRecords(RecordFactory.java:117)
at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:285)
at ru.mera.ofa.ReadOLE$MyPOIFSReaderListener.processPOIFSReaderEvent(ReadOLE.java:111)
... 5 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
at java.lang.reflect.Constructor.newInstance(Unknown Source)
at org.apache.poi.hssf.record.RecordFactory.createRecord(RecordFactory.java:187)
... 8 more
Caused by: java.lang.ArrayIndexOutOfBoundsException
at org.apache.poi.hssf.record.RecordInputStream.checkRecordPosition(RecordInputStream.java:132)
at org.apache.poi.hssf.record.RecordInputStream.readShort(RecordInputStream.java:152)
at org.apache.poi.hssf.record.WindowOneRecord.fillFields(WindowOneRecord.java:94)
at org.apache.poi.hssf.record.Record.<init>(Record.java:55)
at org.apache.poi.hssf.record.WindowOneRecord.<init>(WindowOneRecord.java:76)
... 13 more
I'll try org.apache.poi.ddf.EscherDump, thank you!
On 4/1/08, Yegor Kozlov <ye...@dinom.ru> wrote:
> It should be either in Excel or in Escher format.
>
> In the first case it should be readable by
> org.apache.poi.hssf.record.RecordFactory.createRecords(inputstream).
> See if you can read the contents of the Workbook stream this way:
>
> List records = RecordFactory.createRecords(new ByteArrayInputStream(ole_bytes));
> where ole_bytes is what you read from the OLE stream.
>
> In the second case try the same idea with org.apache.poi.ddf.EscherDump.
>
>
> Yegor
Re[2]: Microsoft Graph objects
Posted by Yegor Kozlov <ye...@dinom.ru>.
It should be either in Excel or in Escher format.
In the first case it should be readable by
org.apache.poi.hssf.record.RecordFactory.createRecords(inputstream).
See if you can read the contents of the Workbook stream this way:
List records = RecordFactory.createRecords(new ByteArrayInputStream(ole_bytes));
where ole_bytes is what you read from the OLE stream.
In the second case try the same idea with org.apache.poi.ddf.EscherDump.
Yegor
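Before choosing between the two paths above, a quick look at the first bytes of the stream can suggest which one applies. The sketch below is a heuristic only, written in plain Java without POI. It assumes that a BIFF5/BIFF8 stream opens with a BOF record (sid 0x0809) and that an Escher stream opens with a record whose type lies in the 0xF000-0xFFFF range. The names StreamFormatSniffer and sniff are made up for illustration, and an MS Graph stream may well match neither pattern.

```java
class StreamFormatSniffer {
    enum Kind { BIFF, ESCHER, UNKNOWN }

    /** Reads an unsigned little-endian 16-bit value at offset pos. */
    static int readUShort(byte[] data, int pos) {
        return (data[pos] & 0xFF) | ((data[pos + 1] & 0xFF) << 8);
    }

    /** Guesses the stream format from its first four bytes (heuristic, not a validator). */
    static Kind sniff(byte[] data) {
        if (data.length < 4) {
            return Kind.UNKNOWN;
        }
        int first = readUShort(data, 0);
        int second = readUShort(data, 2);
        // Assumption: BIFF5/BIFF8 streams start with a BOF record, sid 0x0809.
        if (first == 0x0809) {
            return Kind.BIFF;
        }
        // Assumption: an Escher header is (options, record type, length),
        // with the record type in the range 0xF000-0xFFFF.
        if ((second & 0xF000) == 0xF000) {
            return Kind.ESCHER;
        }
        return Kind.UNKNOWN;
    }

    public static void main(String[] args) {
        // Hypothetical sample bytes: the start of a BOF record.
        byte[] sample = {0x09, 0x08, 0x10, 0x00};
        System.out.println(sniff(sample));
    }
}
```

If the result is UNKNOWN, dumping the first few dozen bytes in hex is probably more informative than handing the stream to either parser.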
> Nick, Yegor,
> Thanks for your replies. I've also tried to get Charts from Word: they
> are stored as a Workbook stream in the OLE file system, but contain neither
> \005DocumentSummaryInformation nor \005SummaryInformation. I've
> commented out readProperties() in the HSSFWorkbook constructor, but it still
> failed to construct an Excel spreadsheet from the stream. Is it reasonable
> for me to continue investigating in this direction, or are these objects
> not Excel at all?
Re: Microsoft Graph objects
Posted by Yury Batrakov <ba...@gmail.com>.
Nick, Yegor,
Thanks for your replies. I've also tried to get Charts from Word: they
are stored as a Workbook stream in the OLE file system, but contain neither
\005DocumentSummaryInformation nor \005SummaryInformation. I've
commented out readProperties() in the HSSFWorkbook constructor, but it still
failed to construct an Excel spreadsheet from the stream. Is it reasonable
for me to continue investigating in this direction, or are these objects
not Excel at all?
On 3/31/08, Yegor Kozlov <ye...@dinom.ru> wrote:
>
>
> In Excel, MS Graph and Organization Chart are intrinsic objects. Basically, you need to
> iterate through the worksheet records and look for the appropriate ones
> (ChartRecord, ChartTitleFormatRecord, ChartFormatRecord, etc.).
>
> In HSLF, MS Graph is an embedded OLE object. You can get it in raw
> format using HSLFSlideShow.getEmbeddedObjects(). For now that's all;
> we don't have a high-level API for it.
> An Organization Chart is just a group of shapes. It should be accessible
> using Slide.getShapes(). There should be a flag indicating that the
> group is an Organization Chart, but I haven't figured it out.
>
>
> Regards,
>
> Yegor
Re: Microsoft Graph objects
Posted by Yegor Kozlov <ye...@dinom.ru>.
In Excel, MS Graph and Organization Chart are intrinsic objects. Basically, you need to
iterate through the worksheet records and look for the appropriate ones
(ChartRecord, ChartTitleFormatRecord, ChartFormatRecord, etc.).
In HSLF, MS Graph is an embedded OLE object. You can get it in raw
format using HSLFSlideShow.getEmbeddedObjects(). For now that's all;
we don't have a high-level API for it.
An Organization Chart is just a group of shapes. It should be accessible
using Slide.getShapes(). There should be a flag indicating that the
group is an Organization Chart, but I haven't figured it out.
Regards,
Yegor
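The record iteration described in this reply can be sketched at the byte level, without going through POI at all. This is an illustration under stated assumptions, not POI's own implementation: it relies on the BIFF layout of a 2-byte little-endian sid, a 2-byte body length, then the body, and the three sid values are the ones believed to belong to ChartRecord, ChartFormatRecord and ChartTitleFormatRecord, so check them against the sid constants in your POI version. The class name BiffRecordScan and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class BiffRecordScan {
    // Assumed sid values; verify against the sid constants in your POI version.
    static final int CHART = 0x1002;              // ChartRecord
    static final int CHART_FORMAT = 0x1014;       // ChartFormatRecord
    static final int CHART_TITLE_FORMAT = 0x1050; // ChartTitleFormatRecord

    /** Reads an unsigned little-endian 16-bit value at offset pos. */
    static int readUShort(byte[] data, int pos) {
        return (data[pos] & 0xFF) | ((data[pos + 1] & 0xFF) << 8);
    }

    /** Returns the sids of the chart-related records found, in stream order. */
    static List<Integer> findChartRecords(byte[] data) {
        List<Integer> found = new ArrayList<>();
        int pos = 0;
        // Each record: 2-byte sid, 2-byte body length, then the body itself.
        while (pos + 4 <= data.length) {
            int sid = readUShort(data, pos);
            int length = readUShort(data, pos + 2);
            if (pos + 4 + length > data.length) {
                break; // truncated record: stop rather than read past the end
            }
            if (sid == CHART || sid == CHART_FORMAT || sid == CHART_TITLE_FORMAT) {
                found.add(sid);
            }
            pos += 4 + length;
        }
        return found;
    }

    public static void main(String[] args) {
        // Hypothetical sample: a BOF record (sid 0x0809) with a 2-byte body,
        // followed by an empty record carrying the assumed ChartRecord sid.
        byte[] sample = {0x09, 0x08, 0x02, 0x00, 0x00, 0x06, 0x02, 0x10, 0x00, 0x00};
        for (int sid : findChartRecords(sample)) {
            System.out.println("chart record sid=0x" + Integer.toHexString(sid));
        }
    }
}
```

A scan like this only locates the records. Decoding their bodies is still the job of the corresponding record classes.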
> Is there any way to extract MS Graph objects from Office documents using POI?
> I'm primarily interested in MS Graph charts, Organisation charts and WordArt.
> It looks like HSLF can extract these objects, but the other components cannot. Am I right?
> Are there plans to support them in the other formats?
Re: Microsoft Graph objects
Posted by Nick Burch <ni...@torchbox.com>.
On Mon, 31 Mar 2008, Yury Batrakov wrote:
> Is there any way to extract MS Graph objects from Office documents using
> POI?
Yes, but generally only in their low level (record) form. There isn't yet
any support for turning the records into high level, easy to work with
objects.
You'll want to read up on Escher, which is what Microsoft call their
Office drawing stuff.
http://www.microsoft.com/interop/docs/OfficeBinaryFormats.mspx#ECC
Be warned though, it's all pretty nasty stuff. I had a try at processing
some of the simpler shape records into higher level objects, and was
defeated :(
Nick
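To give a feel for the low-level form mentioned in this reply, here is a minimal sketch that walks Escher record headers by hand. It assumes the usual 8-byte header (a 16-bit options field whose low 4 bits are the version, a 16-bit record type, and a 32-bit body length, all little-endian) and treats version 0xF as the marker for a container record. The class EscherHeaderWalk and the sample bytes are hypothetical; org.apache.poi.ddf.EscherDump is the tool to use for real files.

```java
import java.util.ArrayList;
import java.util.List;

class EscherHeaderWalk {
    static int readUShort(byte[] d, int p) {
        return (d[p] & 0xFF) | ((d[p + 1] & 0xFF) << 8);
    }

    static long readUInt(byte[] d, int p) {
        return (d[p] & 0xFFL) | ((d[p + 1] & 0xFFL) << 8)
             | ((d[p + 2] & 0xFFL) << 16) | ((d[p + 3] & 0xFFL) << 24);
    }

    /** Appends one line per record in data[start, end), recursing into containers. */
    static void walk(byte[] data, int start, int end, int depth, List<String> out) {
        int pos = start;
        while (pos + 8 <= end) {
            int options = readUShort(data, pos);
            int recordId = readUShort(data, pos + 2);
            long length = readUInt(data, pos + 4);
            long bodyEnd = pos + 8 + length;
            if (bodyEnd > end) {
                break; // truncated, or not Escher data at all
            }
            // Assumption: version nibble 0xF marks a container record.
            boolean container = (options & 0x000F) == 0x000F;
            out.add(String.format("%d 0x%04X len=%d%s",
                    depth, recordId, length, container ? " container" : ""));
            if (container) {
                walk(data, pos + 8, (int) bodyEnd, depth + 1, out);
            }
            pos = (int) bodyEnd;
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample: a container (type 0xF004) holding one empty child (type 0xF00A).
        byte[] sample = {
            0x0F, 0x00, 0x04, (byte) 0xF0, 0x08, 0x00, 0x00, 0x00,
            0x02, 0x00, 0x0A, (byte) 0xF0, 0x00, 0x00, 0x00, 0x00
        };
        List<String> lines = new ArrayList<>();
        walk(sample, 0, sample.length, 0, lines);
        for (String line : lines) {
            System.out.println(line);
        }
    }
}
```

Each output line carries the nesting depth, the record type and the body length, which is roughly the skeleton a dump tool builds on before it interprets the individual record bodies.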