You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@poi.apache.org by bu...@apache.org on 2009/08/10 08:36:01 UTC

DO NOT REPLY [Bug 47668] New: OOXML is parsed as tree, but PPTX is a graph

https://issues.apache.org/bugzilla/show_bug.cgi?id=47668

           Summary: OOXML is parsed as tree, but PPTX is a graph
           Product: POI
           Version: 3.5-dev
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: critical
          Priority: P2
         Component: POI Overall
        AssignedTo: dev@poi.apache.org
        ReportedBy: stefan.stern@mind8.com


--- Comment #0 from Stefan Stern <st...@mind8.com> 2009-08-09 23:35:58 PDT ---
The 'POIXMLDocumentPart' and 'POIXMLDocument' parse an OOXML document by
seeking the main part of the OOXML, represented by an instance of a subclass of
'POIXMLDocument', and then invoke the method "read(POIXMLFactory factory)"
recursively on all relations to other PackageParts. 

This works fine in Excel and Word files, as these seem to be trees to a far
extent. In PowerPoint, the Slide, SlideLayout and SlideMaster form a graph. The
Slides have a relationship to the SlideLayout. SlideLayout has a relationship
to the SlideMaster. SlideMaster has a relationship to all SlideLayouts. And the
presentation has relations to all Slides and SlideMaster classes. 

When using the existing classes in my current XSLF-Implementation, I end up in
an endless loop. The only option to avoid this, is to pass a context object to
the loading classes, where all loaded PackagePart and their corresponding
XSLF-classes are chached. This allows to avoid any loops and every
POIXML-instance can be linked with its related parts. 

Storing is analogue, although a Set is sufficient to prevent endless loops.
When storing the document, a Set is passed as context. When invoking all
related parts recursively, only those not yet referenced by the Set are stored. 

The change in this behavior implies additional changes: Images used in multiple
places refernce from several places to the same PackagePart. There is no 1:1
mapping from appearance in the document and existence of PackageParts.
Currently, every relation ends up in a separate instance each time an image is
referenced, but all images point to the same PackagePart. This may cause weired
behavior when saving the document. 

Example: You have an Excel xlsx-file. This contains two sheets. On each sheet,
you have the same JPG image inserted. If you look into the file, you will see
that both sheet-object refer to the very same image-PackagePart. But in POI,
these references end up in two instances of XSSFPicture. But these two
instances refer to the same PackagePart and yet represent two different objects
in Excel. In case one of the images is altered, depending on the store-sequence
you may encounter an effect to the other image as well or your modification is
not persisted at all. 

See the patch attached to this bug to enable a 1:1 mapping between PackageParts
and POIXMLDocumentPart-objects. I will try to come up with a test-case for the
Excel example.

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


DO NOT REPLY [Bug 47668] OOXML is parsed as tree, but PPTX is a graph

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=47668


Yegor Kozlov <ye...@dinom.ru> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


--- Comment #3 from Yegor Kozlov <ye...@dinom.ru> 2009-08-12 12:03:00 PDT ---
fixed in r803667 with some tweaks:

 - POIXMLDocument.load is protected and final. User should neither override it
nor call via usermodel API, this method should only be called when constructing
documents. I made XSSFWorkbook and XWPFDocument use it.
 - entries in the context are keyed by PackageRelationship, not by PackagePart.
PackageRelationship is more appropriate as it overrides hashCode() and equals()
 - removed the old context-less onSave() and write()
 - added the unit test to TestXSSFWorkbook and also created
org.apache.poi.TestPOIXMLDocument to test generic parsing / saving of OOXML
packages

Regards,
Yegor

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


DO NOT REPLY [Bug 47668] OOXML is parsed as tree, but PPTX is a graph

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=47668


Stefan Stern <st...@mind8.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |VERIFIED


--- Comment #6 from Stefan Stern <st...@mind8.com> 2009-08-17 03:16:46 PDT ---
ok, works fine now :)

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


DO NOT REPLY [Bug 47668] OOXML is parsed as tree, but PPTX is a graph

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=47668


Stefan Stern <st...@mind8.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |


--- Comment #4 from Stefan Stern <st...@mind8.com> 2009-08-14 06:31:44 PDT ---
>  - entries in the context are keyed by PackageRelationship, not by PackagePart.
> PackageRelationship is more appropriate as it overrides hashCode() and equals()

PackageRelationship.hashCode method makes use of the ID of the relationship.
This will most likely always result in different hashes for different
PackageRelationship-objects, although these refer to the same PackagePart. 

In the context-map and context-set for loading and writing, the hashcode is
used when adding and checking whether element is already contained in list. 

Result is a very similiar behavior then before: each PackagePart has several
instances in memory, means I can find more than one XSLFSlide object for the
same PackagePart '/ppt/slide/slide1.xml', each one referenced by another part
(notesPart, slideMasterPart, etc). 

The only benefit now is, that each PackageRelationship-instance is added only
once, so there is not an endless-loop any more. 

Is there a major benefit in using PackageRelationship instead of the
PackagePart itself? Or is there a need to instantiate more than one XSLFSLide
object for the very same Slide-PackagePart?

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


DO NOT REPLY [Bug 47668] OOXML is parsed as tree, but PPTX is a graph

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=47668



--- Comment #2 from Stefan Stern <st...@mind8.com> 2009-08-10 00:26:02 PDT ---
Created an attachment (id=24121)
Eclipse project with a small testcase

This Project contains a testcase. As I was not able to find POI API to modify
the XSSFPictureData, the testcases purpose is to show that there are several
parts pointing to the same Picture, but  below is the very same PackagePart.
Once there is API to modify the Package, the underlying code must be aware that
manipulating the image might affect other parts.

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


DO NOT REPLY [Bug 47668] OOXML is parsed as tree, but PPTX is a graph

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=47668


Yegor Kozlov <ye...@dinom.ru> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|                            |FIXED


--- Comment #5 from Yegor Kozlov <ye...@dinom.ru> 2009-08-14 10:31:55 PDT ---
I see.
Use of PackageRelationship was a wrong decision, I changed it back to use
PackagePart as keys. 

I also improved TestPOIXMLDocument to assert that same logical parts correspond
to the same physical instances of POIXMLDocumentPart

Yegor

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


DO NOT REPLY [Bug 47668] OOXML is parsed as tree, but PPTX is a graph

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=47668



--- Comment #1 from Stefan Stern <st...@mind8.com> 2009-08-10 00:23:17 PDT ---
Created an attachment (id=24120)
Suggested Patch on trunk revision 802651

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org