Posted to dev@poi.apache.org by bu...@apache.org on 2013/05/01 04:07:56 UTC

[Bug 54916] New: [PATCH] POI does not always read all the slides in pptx files

https://issues.apache.org/bugzilla/show_bug.cgi?id=54916

            Bug ID: 54916
           Summary: [PATCH] POI does not always read all the slides in
                    pptx files
           Product: POI
           Version: 4.0-dev
          Hardware: PC
                OS: All
            Status: NEW
          Severity: major
          Priority: P2
         Component: XSLF
          Assignee: dev@poi.apache.org
          Reporter: dustin@virtualroadside.com
    Classification: Unclassified

Created attachment 30248
  --> https://issues.apache.org/bugzilla/attachment.cgi?id=30248&action=edit
Patch for bug

I believe this bug may affect all XML-based formats that POI supports, but my
specific use case is with pptx files.

For some reason, a number of pptx files that I was given contain slides whose
rels files (ppt/slides/_rels/slideX.xml.rels) contain relationships to other
slides. Microsoft Office 2010 is able to deal with these files and open them
correctly.

When POI encounters such a file, it will generally omit one or more of the
slides when you call getSlides() on the XMLSlideShow object. When this occurs,
error messages such as "Slide with r:id 256 was defined, but didn't exist in
package, skipping" are logged.

I believe what is happening is that, while building the parse tree, POI treats
all references/relationships as equal: if two or more relationships in the
package point to the same part, POI uses the first occurrence to set that
part's package relationship field. If POI first encounters the part as a child
reference (e.g. from another slide), the package relationship field is set
incorrectly, and later on POI sometimes cannot resolve the part correctly.

I've determined that the simplest fix is to first parse all elements at a given
level before trying to resolve elements at lower levels. This fixes my problem
and doesn't break any unit tests.
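The effect of the visiting order can be sketched in Java. This is a simplified model, not POI code: the class TraversalOrderDemo, its methods and the part names are hypothetical, and each part's .rels file is reduced to a list of target part names. Both traversals keep the first relationship through which a part is reached (the "first occurrence wins" behaviour described above); only the order differs.

```java
import java.util.*;

// Hypothetical model (not POI's API): part name -> parts referenced by its .rels, in order.
class TraversalOrderDemo {

    // Depth-first: descend into a child's relationships before visiting its siblings.
    static Map<String, String> firstParentDfs(Map<String, List<String>> rels, String root) {
        Map<String, String> firstParent = new LinkedHashMap<>();
        dfs(rels, root, firstParent);
        return firstParent;
    }

    private static void dfs(Map<String, List<String>> rels, String part,
                            Map<String, String> firstParent) {
        for (String child : rels.getOrDefault(part, List.of())) {
            if (!firstParent.containsKey(child)) {
                firstParent.put(child, part);   // first occurrence wins
                dfs(rels, child, firstParent);
            }
        }
    }

    // Breadth-first: register every child of one level before going a level deeper.
    static Map<String, String> firstParentBfs(Map<String, List<String>> rels, String root) {
        Map<String, String> firstParent = new LinkedHashMap<>();
        Deque<String> queue = new ArrayDeque<>(List.of(root));
        while (!queue.isEmpty()) {
            String part = queue.poll();
            for (String child : rels.getOrDefault(part, List.of())) {
                if (!firstParent.containsKey(child)) {
                    firstParent.put(child, part);   // first occurrence wins here too
                    queue.add(child);
                }
            }
        }
        return firstParent;
    }

    public static void main(String[] args) {
        Map<String, List<String>> rels = Map.of(
            "presentation.xml", List.of("slide1.xml", "slide2.xml"),
            "slide1.xml", List.of("slide2.xml"));   // slide-to-slide link, as in the report
        // DFS records slide2.xml as reached from slide1.xml (a child reference).
        System.out.println("DFS: " + firstParentDfs(rels, "presentation.xml"));
        // BFS records slide2.xml as reached from presentation.xml.
        System.out.println("BFS: " + firstParentBfs(rels, "presentation.xml"));
    }
}
```

With the level-by-level order, every slide is first reached through presentation.xml, so its recorded relationship is the one getSlides() expects.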

The attached patch contains the fix, and a sample pptx file that exhibits this
behavior with a very simple test case.

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 54916] [PATCH] POI does not always read all the slides in pptx files

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=54916

--- Comment #8 from Tim Allison <ta...@mitre.org> ---
http://social.msdn.microsoft.com/Forums/office/en-US/8982d0bb-9469-4f9c-985d-57fadcc9a655/apparent-duplicate-id-in-relationship-office-open-xml-elements?forum=oxmlsdk

Looks like we'll need to store relationship ids per "relationships" object? 
Ugh...
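A minimal sketch of storing ids per relationships object, assuming a hypothetical ScopedRelationships class (not part of POI): ids are looked up by (source part, rId), so the same rId1 in two different .rels files resolves to two different targets. The sample data mirrors the slide layout rels quoted in comment #7, with paths simplified.

```java
import java.util.*;

// Hypothetical sketch, not POI's API: relationship ids are only unique within one
// .rels file, so they are stored and resolved per source part.
class ScopedRelationships {
    private final Map<String, Map<String, String>> bySource = new HashMap<>();

    void add(String sourcePart, String rId, String targetPart) {
        Map<String, String> rels = bySource.computeIfAbsent(sourcePart, k -> new HashMap<>());
        if (rels.putIfAbsent(rId, targetPart) != null) {
            // A duplicate within the SAME .rels file would be a genuine error.
            throw new IllegalArgumentException("Duplicate " + rId + " in rels of " + sourcePart);
        }
    }

    // Returns the target of rId as seen from sourcePart, or null if it is undefined there.
    String resolve(String sourcePart, String rId) {
        return bySource.getOrDefault(sourcePart, Map.of()).get(rId);
    }

    public static void main(String[] args) {
        ScopedRelationships rels = new ScopedRelationships();
        rels.add("slideLayouts/slideLayout1.xml", "rId1", "slideMasters/slideMaster1.xml");
        rels.add("slideLayouts/slideLayout17.xml", "rId1", "slideMasters/slideMaster2.xml");
        System.out.println(rels.resolve("slideLayouts/slideLayout1.xml", "rId1"));
        System.out.println(rels.resolve("slideLayouts/slideLayout17.xml", "rId1"));
    }
}
```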


[Bug 54916] [PATCH] POI does not always read all the slides in pptx files

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=54916

vladk <vl...@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |NEW

--- Comment #7 from vladk <vl...@gmail.com> ---
There are two slide masters, slideMaster1.xml (rId8) and slideMaster2.xml
(rId9), but both are referenced as rId1 from their respective slideLayouts.

presentation.xml.rels:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships
xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
    <Relationship Id="rId8"
Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/slideMaster"
Target="slideMasters/slideMaster1.xml"/>
    <Relationship Id="rId13"
Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/slide"
Target="slides/slide4.xml"/>
    <Relationship Id="rId18"
Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/slide"
Target="slides/slide9.xml"/>
    <Relationship Id="rId26"
Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/slide"
Target="slides/slide17.xml"/>
    <Relationship Id="rId39"
Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/tags"
Target="tags/tag1.xml"/>
    <Relationship Id="rId3"
Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/customXml"
Target="../customXml/item3.xml"/>
    <Relationship Id="rId21"
Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/slide"
Target="slides/slide12.xml"/>
    <Relationship Id="rId34"
Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/slide"
Target="slides/slide25.xml"/>
    <Relationship Id="rId42"
Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/theme"
Target="theme/theme1.xml"/>
    <Relationship Id="rId7"
Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/customXml"
Target="../customXml/item7.xml"/>
    <Relationship Id="rId12"
Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/slide"
Target="slides/slide3.xml"/>
    <Relationship Id="rId17"
Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/slide"
Target="slides/slide8.xml"/>
    <Relationship Id="rId25"
Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/slide"
Target="slides/slide16.xml"/>
    <Relationship Id="rId33"
Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/slide"
Target="slides/slide24.xml"/>
    <Relationship Id="rId38"
Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/notesMaster"
Target="notesMasters/notesMaster1.xml"/>
    <Relationship Id="rId2"
Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/customXml"
Target="../customXml/item2.xml"/>
    <Relationship Id="rId16"
Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/slide"
Target="slides/slide7.xml"/>
    <Relationship Id="rId20"
Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/slide"
Target="slides/slide11.xml"/>
    <Relationship Id="rId29"
Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/slide"
Target="slides/slide20.xml"/>
    <Relationship Id="rId41"
Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/viewProps"
Target="viewProps.xml"/>
    <Relationship Id="rId1"
Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/customXml"
Target="../customXml/item1.xml"/>
    <Relationship Id="rId6"
Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/customXml"
Target="../customXml/item6.xml"/>
    <Relationship Id="rId11"
Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/slide"
Target="slides/slide2.xml"/>
    <Relationship Id="rId24"
Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/slide"
Target="slides/slide15.xml"/>
    <Relationship Id="rId32"
Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/slide"
Target="slides/slide23.xml"/>
    <Relationship Id="rId37"
Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/slide"
Target="slides/slide28.xml"/>
    <Relationship Id="rId40"
Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/presProps"
Target="presProps.xml"/>
    <Relationship Id="rId5"
Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/customXml"
Target="../customXml/item5.xml"/>
    <Relationship Id="rId15"
Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/slide"
Target="slides/slide6.xml"/>
    <Relationship Id="rId23"
Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/slide"
Target="slides/slide14.xml"/>
    <Relationship Id="rId28"
Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/slide"
Target="slides/slide19.xml"/>
    <Relationship Id="rId36"
Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/slide"
Target="slides/slide27.xml"/>
    <Relationship Id="rId10"
Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/slide"
Target="slides/slide1.xml"/>
    <Relationship Id="rId19"
Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/slide"
Target="slides/slide10.xml"/>
    <Relationship Id="rId31"
Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/slide"
Target="slides/slide22.xml"/>
    <Relationship Id="rId4"
Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/customXml"
Target="../customXml/item4.xml"/>
    <Relationship Id="rId9"
Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/slideMaster"
Target="slideMasters/slideMaster2.xml"/>
    <Relationship Id="rId14"
Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/slide"
Target="slides/slide5.xml"/>
    <Relationship Id="rId22"
Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/slide"
Target="slides/slide13.xml"/>
    <Relationship Id="rId27"
Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/slide"
Target="slides/slide18.xml"/>
    <Relationship Id="rId30"
Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/slide"
Target="slides/slide21.xml"/>
    <Relationship Id="rId35"
Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/slide"
Target="slides/slide26.xml"/>
    <Relationship Id="rId43"
Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/tableStyles"
Target="tableStyles.xml"/>
</Relationships>

slideLayout1.xml.rels:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships
xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
    <Relationship Id="rId3"
Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image"
Target="../media/image3.emf"/>
    <Relationship Id="rId2"
Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image"
Target="../media/image2.jpeg"/>
    <Relationship Id="rId1"
Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/slideMaster"
Target="../slideMasters/slideMaster1.xml"/>
</Relationships>

slideLayout17.xml.rels:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships
xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
    <Relationship Id="rId2"
Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image"
Target="../media/image3.emf"/>
    <Relationship Id="rId1"
Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/slideMaster"
Target="../slideMasters/slideMaster2.xml"/>
</Relationships>
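The duplicate-id observation can be checked by parsing the two slide layout rels files with the JDK's DOM API. The helpers RelsParser.parse and RelsParser.masterRels are hypothetical; the inputs are reduced versions of the files above containing only the slideMaster relationship (the image relationships are omitted). Each parsed map is only meaningful relative to its own .rels file.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class RelsParser {
    // Namespace of relationship parts, as declared in the files above.
    static final String NS = "http://schemas.openxmlformats.org/package/2006/relationships";
    static final String MASTER_TYPE =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships/slideMaster";

    // Parses one .rels document into an rId -> Target map; ids are local to this file.
    static Map<String, String> parse(String relsXml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder()
            .parse(new ByteArrayInputStream(relsXml.getBytes(StandardCharsets.UTF_8)));
        NodeList nodes = doc.getElementsByTagNameNS(NS, "Relationship");
        Map<String, String> targets = new LinkedHashMap<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element rel = (Element) nodes.item(i);
            targets.put(rel.getAttribute("Id"), rel.getAttribute("Target"));
        }
        return targets;
    }

    // Builds a minimal .rels document holding a single slideMaster relationship.
    static String masterRels(String rId, String target) {
        return "<Relationships xmlns=\"" + NS + "\"><Relationship Id=\"" + rId
            + "\" Type=\"" + MASTER_TYPE + "\" Target=\"" + target + "\"/></Relationships>";
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> layout1 = parse(masterRels("rId1", "../slideMasters/slideMaster1.xml"));
        Map<String, String> layout17 = parse(masterRels("rId1", "../slideMasters/slideMaster2.xml"));
        // Same id, different targets: rId1 only has meaning within its own .rels file.
        System.out.println("slideLayout1  rId1 -> " + layout1.get("rId1"));
        System.out.println("slideLayout17 rId1 -> " + layout17.get("rId1"));
    }
}
```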


[Bug 54916] [PATCH] POI does not always read all the slides in pptx files

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=54916

--- Comment #5 from vladk <vl...@gmail.com> ---
I am observing a related issue when loading one specific PPTX file, which,
unfortunately, I cannot post here.

Some slideMasters cannot be retrieved via XMLSlideShow.getSlideMasters(). The
problem is in org.apache.poi.xslf.usermodel.XMLSlideShow.onDocumentRead(),
where the _masters instance variable is populated.

In my case a possible fix would be to replace:

_masters.put(p.getPackageRelationship().getId(), master);

by

_masters.put(getRelationId(p), master);

This change should fix the problem for slideMasters only.
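The collision can be reproduced with a plain HashMap. In this hypothetical sketch, strings stand in for the slide master objects and the ids come from comment #7: keyed by the id seen from the slide layouts (rId1 for both masters), one master overwrites the other; keyed by the ids in presentation.xml.rels (rId8, rId9), both survive. It assumes that getRelationId(p) returns the presentation-relative id, which the thread implies but does not spell out.

```java
import java.util.*;

// Hypothetical illustration of the _masters collision; not POI code.
class MasterKeyDemo {
    // Builds a master map keyed by whichever relationship id the caller supplies.
    static Map<String, String> buildMasters(Map<String, String> keyByMaster) {
        Map<String, String> masters = new HashMap<>();
        for (Map.Entry<String, String> e : keyByMaster.entrySet()) {
            masters.put(e.getValue(), e.getKey());   // a later put silently replaces an earlier one
        }
        return masters;
    }

    public static void main(String[] args) {
        // Ids as seen from the slide layouts: both masters are rId1.
        Map<String, String> layoutSideIds = new LinkedHashMap<>();
        layoutSideIds.put("slideMaster1.xml", "rId1");
        layoutSideIds.put("slideMaster2.xml", "rId1");
        // Ids as seen from presentation.xml.rels.
        Map<String, String> presentationIds = new LinkedHashMap<>();
        presentationIds.put("slideMaster1.xml", "rId8");
        presentationIds.put("slideMaster2.xml", "rId9");

        System.out.println(buildMasters(layoutSideIds).size());     // 1: one master lost
        System.out.println(buildMasters(presentationIds).size());   // 2
    }
}
```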


[Bug 54916] [PATCH] POI does not always read all the slides in pptx files

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=54916

--- Comment #1 from virtuald <du...@virtualroadside.com> ---
Created attachment 30249
  --> https://issues.apache.org/bugzilla/attachment.cgi?id=30249&action=edit
Updated patch with diagnostic warning log messages

Some testing has shown that other element types sometimes get duplicated,
particularly slideMaster elements. One difficulty in diagnosing this is that
POI doesn't try to detect when it occurs; such detection would help in
figuring out why corrupted pptx files aren't working. I've updated my patch
with some log messages that can help when things aren't quite as they should
be.
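A diagnostic of this kind might look like the following. It is a hypothetical illustration, not the code in the attached patch: DuplicateRelationshipDetector remembers which source part first reached each target part and logs a warning through java.util.logging whenever a different source reaches the same target. Slide masters are legitimately referenced from several parts (presentation.xml and the slide layouts), so such warnings are hints for diagnosis, not errors.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.logging.Logger;

// Hypothetical diagnostic helper; the class and message format are illustrative only.
class DuplicateRelationshipDetector {
    private static final Logger LOG =
        Logger.getLogger(DuplicateRelationshipDetector.class.getName());
    private final Map<String, String> firstSource = new HashMap<>();

    // Returns true (and logs a warning) if targetPart was already reached from another source.
    boolean record(String sourcePart, String rId, String targetPart) {
        String previous = firstSource.putIfAbsent(targetPart, sourcePart);
        if (previous != null && !previous.equals(sourcePart)) {
            LOG.warning(targetPart + " is referenced by " + previous
                + " and again by " + sourcePart + " (" + rId + ")");
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        DuplicateRelationshipDetector detector = new DuplicateRelationshipDetector();
        detector.record("presentation.xml", "rId8", "slideMaster1.xml");
        detector.record("slideLayout1.xml", "rId1", "slideMaster1.xml");   // logs a warning
    }
}
```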


[Bug 54916] [PATCH] POI does not always read all the slides in pptx files

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=54916

Nick Burch <ap...@gagravarr.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO

--- Comment #6 from Nick Burch <ap...@gagravarr.org> ---
(In reply to vladk from comment #5)
> I am observing a related issue when loading one specific PPTX file, that I
> may not post here, unfortunately.

Any chance you could unzip the .pptx file, then zip up just the _rels files and
post that? That should let us check if it's the same issue, without sharing any
of the actual content.
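One way to produce such an archive is to copy only the .rels entries of the .pptx into a new zip. A minimal sketch using java.util.zip; the class name ExtractRels is made up, the input and output paths come from the command line, and InputStream.transferTo requires Java 9 or later.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Copies only the relationship parts (entries ending in ".rels") of a .pptx into a new zip.
class ExtractRels {
    static int copyRels(Path pptx, Path out) throws IOException {
        int copied = 0;
        try (ZipInputStream in = new ZipInputStream(Files.newInputStream(pptx));
             ZipOutputStream zos = new ZipOutputStream(Files.newOutputStream(out))) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                if (!entry.isDirectory() && entry.getName().endsWith(".rels")) {
                    zos.putNextEntry(new ZipEntry(entry.getName()));
                    in.transferTo(zos);   // reads until the end of the current entry only
                    zos.closeEntry();
                    copied++;
                }
            }
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        int n = copyRels(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("Copied " + n + " .rels entries");
    }
}
```

For example: java ExtractRels.java deck.pptx rels-only.zip (single-file launch needs Java 11 or later). The resulting zip keeps the package's relationship structure but none of the slide XML or media.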


[Bug 54916] [PATCH] POI does not always read all the slides in pptx files

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=54916

--- Comment #3 from Andreas Beeker <an...@gmx.de> ---
That's really a pity ... I've implemented exactly the same patch (apart from
the checks in XMLSlideShow ...) and was just about to commit it ...

I think the duplicated relations (with different parent/ref-ids) should be
handled through some kind of proxy or delegate, so that when the part is called
from the outside, it behaves (i.e. returns its ref-ids) as if it belonged to
the enclosing part ...

I'll have a look and hope to patch this soon ...


[Bug 54916] [PATCH] POI does not always read all the slides in pptx files

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=54916

vladk <vl...@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |vladk.dev@gmail.com


[Bug 54916] [PATCH] POI does not always read all the slides in pptx files

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=54916

--- Comment #2 from Nick Burch <ap...@gagravarr.org> ---
I've added a disabled unit test in r1496696, based on your supplied file

Patch still needs a little bit more thinking, and possibly a deeper fix...


[Bug 54916] [PATCH] POI does not always read all the slides in pptx files

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=54916

--- Comment #4 from Andreas Beeker <an...@gmx.de> ---
Created attachment 31296
  --> https://issues.apache.org/bugzilla/attachment.cgi?id=31296&action=edit
delegate-example to handle parent relations

In the attached example I'd like to demonstrate what I want to change in the
many subclasses of POIXMLDocumentPart.

The main problem (as stated in the bug thread) is that the current
implementation doesn't track the (incoming or parent) package relations
correctly: a referenced child has only one package relation, which is
definitely not true for shared objects like slide masters/layouts.

The straightforward implementation would be to deprecate
POIXMLDocumentPart.getPackageRelationship() and introduce a
POIXMLDocumentPart.getPackageRelationship(POIXMLDocumentPart parent) method
instead, but as the code is already out in the wild, this is probably not the
way to go.

So instead I'd like to introduce a delegate mechanism. When the objects are
linked together in POIXMLDocumentPart.read(), this would take care of creating
the necessary delegates.

The delegate itself is a simple inner class extending the outer class so
instance-of checks stay valid and the delegate can be used the same way the
original class was.

Instead of duplicating all method calls in the inner class, and possibly
forgetting to add a call when a new method is introduced, I think it's better
to handle it via the following method prefix ...

public sometype methodname(args ...) {
   if (isDelegate()) {
      return getThis().methodname(args ...);
   }
   // and now the normal method content
}

So what do you think about this?
(... or maybe I missed a point of how these ooxml classes work ...)
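The proposal can be modelled in a few lines of Java. This is a simplified, hypothetical sketch, not POI code: PartSketch stands in for a POIXMLDocumentPart subclass, and getRelationId, getTitle/setTitle and viaRelation are illustrative names. The delegate is a nested subclass, so instanceof checks still hold; it answers the relationship-specific question itself and forwards every other call to the shared original through the isDelegate()/getThis() prefix.

```java
// Hypothetical model of the delegate idea; names are illustrative, not POI's API.
class PartSketch {
    private String title;              // shared content, lives only in the original
    private final String relationId;   // id of the relationship this instance was reached by
    private final PartSketch original; // null for the real part, set for delegates

    PartSketch(String relationId) {
        this(relationId, null);
    }

    private PartSketch(String relationId, PartSketch original) {
        this.relationId = relationId;
        this.original = original;
    }

    boolean isDelegate() { return original != null; }

    PartSketch getThis() { return isDelegate() ? original : this; }

    // Relationship-specific: NOT forwarded, each delegate answers for its own parent.
    String getRelationId() { return relationId; }

    // Every other method starts with the proposed prefix and forwards to the original.
    String getTitle() {
        if (isDelegate()) {
            return getThis().getTitle();
        }
        return title;
    }

    void setTitle(String title) {
        if (isDelegate()) {
            getThis().setTitle(title);
            return;
        }
        this.title = title;
    }

    // A view of the same part as reached through another parent's relationship.
    PartSketch viaRelation(String otherRelationId) {
        return new Delegate(getThis(), otherRelationId);
    }

    // Nested subclass of the outer class, so instanceof checks stay valid.
    private static final class Delegate extends PartSketch {
        Delegate(PartSketch original, String relationId) {
            super(relationId, original);
        }
    }

    public static void main(String[] args) {
        PartSketch master = new PartSketch("rId8");              // as reached from presentation.xml
        PartSketch fromLayout = master.viaRelation("rId1");      // as reached from a slide layout
        fromLayout.setTitle("Office Theme");
        System.out.println(master.getTitle());                   // Office Theme
        System.out.println(fromLayout.getRelationId());          // rId1
        System.out.println(fromLayout instanceof PartSketch);    // true
    }
}
```

Even at this size the trade-off is visible: each forwarding method needs the prefix, while only the relationship accessors differ per delegate.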

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org