Posted to dev@tika.apache.org by "Jeremy Anderson (Created) (JIRA)" <ji...@apache.org> on 2011/11/29 18:17:40 UTC

[jira] [Created] (TIKA-795) [PATCH] NoSuchMethod - XSLFPowerPointExtractorDecorator.buildXHTML POI - XSLFSlide.getMasterSheet()

[PATCH] NoSuchMethod - XSLFPowerPointExtractorDecorator.buildXHTML POI - XSLFSlide.getMasterSheet()
---------------------------------------------------------------------------------------------------

                 Key: TIKA-795
                 URL: https://issues.apache.org/jira/browse/TIKA-795
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 1.1
            Reporter: Jeremy Anderson


A POI 3.8-beta5 daily build exposed this bug after POI revision 1190347.  (POI Bugzilla bug #52262 is already open for the root cause.)

The bug was discovered using daily builds of both Tika and POI.  The root cause lies in POI: an accidental change to the return type of XSLFSlide.getMasterSheet().  Tika is affected because XSLFPowerPointExtractorDecorator.buildXHTML calls this method to assign a variable that is never used.

I've attached a patch that removes the unused variable, along with an example multi-embedded Word document that exposes the issue when parsed with a Tika-based RecursiveMetadataParser.

java.lang.NoSuchMethodError: org.apache.poi.xslf.usermodel.XSLFSlide.getMasterSheet()Lorg/apache/poi/xslf/usermodel/XSLFSlideMaster;
	at org.apache.tika.parser.microsoft.ooxml.XSLFPowerPointExtractorDecorator.buildXHTML(XSLFPowerPointExtractorDecorator.java:81)
	at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:110)
	at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:97)
	at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:69)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
	at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:91)
	at com.eastportanalytics.services.textextract.TikaTextExtractionService$RecursiveMetadataParser.parse(TikaTextExtractionService.java:364)
	at org.apache.tika.parser.DelegatingParser.parse(DelegatingParser.java:72)
	at org.apache.tika.extractor.ParsingEmbeddedDocumentExtractor.parseEmbedded(ParsingEmbeddedDocumentExtractor.java:109)
	at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.handleEmbeddedFile(AbstractOOXMLExtractor.java:228)
	at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.handleEmbeddedParts(AbstractOOXMLExtractor.java:148)
	at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:113)
	at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:97)
	at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:69)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
	at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:91)
	at com.eastportanalytics.services.textextract.TikaTextExtractionService$RecursiveMetadataParser.parse(TikaTextExtractionService.java:364)
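
The NoSuchMethodError above names the method together with its return type (the trailing Lorg/apache/poi/xslf/usermodel/XSLFSlideMaster; part), because the JVM links a call by name, parameter types and return type. The following is only a minimal sketch of that behaviour, not Tika or POI code: SlideMaster, SlideLayout and Slide are hypothetical stand-ins for the POI classes, and the lookup through MethodHandles is used here just because it matches on the exact method type, much as linking does.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class DescriptorDemo {
    // Hypothetical stand-ins for the POI types; not the real classes.
    static class SlideMaster {}
    static class SlideLayout {}

    static class Slide {
        // The method as it looks after the change: it now returns the layout.
        public SlideLayout getMasterSheet() {
            return new SlideLayout();
        }
    }

    // True if Slide has a getMasterSheet() with exactly this return type.
    static boolean resolves(Class<?> returnType) {
        try {
            MethodHandles.lookup().findVirtual(
                    Slide.class, "getMasterSheet", MethodType.methodType(returnType));
            return true;
        } catch (NoSuchMethodException | IllegalAccessException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Descriptor shape that a caller compiled against the old signature refers to.
        System.out.println(MethodType.methodType(SlideMaster.class).toMethodDescriptorString());
        System.out.println("old return type resolves: " + resolves(SlideMaster.class));
        System.out.println("new return type resolves: " + resolves(SlideLayout.class));
    }
}
```

Under these assumptions, a caller compiled against the old return type no longer finds the method even though a method with the same name still exists. That is consistent with the error appearing only when an older Tika build runs against a newer POI jar; recompiling the caller against the new signature, or dropping the call, avoids it.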




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (TIKA-795) [PATCH] NoSuchMethod - XSLFPowerPointExtractorDecorator.buildXHTML POI - XSLFSlide.getMasterSheet()

Posted by "Jeremy Anderson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160050#comment-13160050 ] 

Jeremy Anderson commented on TIKA-795:
--------------------------------------

Thanks Nick.

Do you know whether Yegor's change in POI to the return type of XSLFSlide.getMasterSheet(), from XSLFSlideMaster to XSLFSlideLayout, was deliberate, or is it truly a bug? (I did miss that getSlideMaster() also exposes the XSLFSlideMaster, thanks.)

If you think that this change in POI was correct, then I'll go ahead and close my POI Bugzilla bug 52262.

For reference, these are the return types declared by the classes overriding XSLFSheet.getMasterSheet():

CLASS             RETURN TYPE
XSLFNotes         XSLFSheet
XSLFNotesMaster   XSLFSheet
XSLFSlide         XSLFSlideLayout
XSLFSlideLayout   XSLFSlideMaster
XSLFSlideMaster   XSLFSheet

You can probably also go ahead and close this issue, noting that it is fixed by the TIKA-700 patch.
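
The return types listed above are covariant overrides: each subclass narrows the type declared by the base method. As a rough illustration only, using hypothetical stand-in classes rather than the POI sources, the sketch below lists the getMasterSheet() variants that javac emits for such an override. Whether the POI jars contain exactly these methods is an assumption.

```java
import java.lang.reflect.Method;
import java.util.Set;
import java.util.TreeSet;

public class CovariantReturnDemo {
    // Hypothetical stand-ins mirroring part of the hierarchy in the table.
    static class Sheet {
        public Sheet getMasterSheet() { return null; }
    }

    static class SlideMaster extends Sheet {}

    static class SlideLayout extends Sheet {
        @Override
        public SlideMaster getMasterSheet() { return new SlideMaster(); }
    }

    static class Slide extends Sheet {
        // Covariant override: narrower return type than the base method.
        @Override
        public SlideLayout getMasterSheet() { return new SlideLayout(); }
    }

    // Return types of every getMasterSheet() declared directly on the class,
    // with compiler-generated bridge methods marked as such.
    static Set<String> declaredReturnTypes(Class<?> type) {
        Set<String> names = new TreeSet<>();
        for (Method m : type.getDeclaredMethods()) {
            if (m.getName().equals("getMasterSheet")) {
                names.add(m.getReturnType().getSimpleName() + (m.isBridge() ? " (bridge)" : ""));
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("Slide:       " + declaredReturnTypes(Slide.class));
        System.out.println("SlideLayout: " + declaredReturnTypes(SlideLayout.class));
    }
}
```

In this sketch the compiler adds a bridge method with the base return type next to each narrowed override, so code written against the base type keeps linking when a subclass changes its narrowed type. Only callers compiled against the narrowed type itself, as Tika was against XSLFSlideMaster, are affected by such a change.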
                
> [PATCH] NoSuchMethod - XSLFPowerPointExtractorDecorator.buildXHTML POI - XSLFSlide.getMasterSheet()
> ---------------------------------------------------------------------------------------------------
>
>                 Key: TIKA-795
>                 URL: https://issues.apache.org/jira/browse/TIKA-795
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.1
>            Reporter: Jeremy Anderson
>              Labels: patch, poi
>         Attachments: Patch_795_XSLF.patch, testWORD_embeded.docx

        

[jira] [Resolved] (TIKA-795) [PATCH] NoSuchMethod - XSLFPowerPointExtractorDecorator.buildXHTML POI - XSLFSlide.getMasterSheet()

Posted by "Nick Burch (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nick Burch resolved TIKA-795.
-----------------------------

    Resolution: Duplicate

Resolving as a duplicate of TIKA-700: the POI change was deliberate, and the patch on TIKA-700 covers this.
                

        

[jira] [Commented] (TIKA-795) [PATCH] NoSuchMethod - XSLFPowerPointExtractorDecorator.buildXHTML POI - XSLFSlide.getMasterSheet()

Posted by "Nick Burch (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13159397#comment-13159397 ] 

Nick Burch commented on TIKA-795:
---------------------------------

We are going to want this variable, though, as it's needed for TIKA-712 (once we're able to re-enable that).

I'd suggest you use the patch I uploaded to TIKA-700.
                

        

[jira] [Updated] (TIKA-795) [PATCH] NoSuchMethod - XSLFPowerPointExtractorDecorator.buildXHTML POI - XSLFSlide.getMasterSheet()

Posted by "Jeremy Anderson (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeremy Anderson updated TIKA-795:
---------------------------------

    Comment: was deleted

(was: Thanks Nick.

Do you know whether Yegor's change in POI to the return type of XSLFSlide.getMasterSheet(), from XSLFSlideMaster to XSLFSlideLayout, was deliberate, or is it truly a bug? (I did miss that getSlideMaster() also exposes the XSLFSlideMaster, thanks.)

If you think that this change in POI was correct, then I'll go ahead and close my POI Bugzilla bug 52262.

For reference, these are the return types declared by the classes overriding XSLFSheet.getMasterSheet():

CLASS             RETURN TYPE
XSLFNotes         XSLFSheet
XSLFNotesMaster   XSLFSheet
XSLFSlide         XSLFSlideLayout
XSLFSlideLayout   XSLFSlideMaster
XSLFSlideMaster   XSLFSheet

You can probably also go ahead and close this issue, noting that it is fixed by the TIKA-700 patch.)
    

        

[jira] [Updated] (TIKA-795) [PATCH] NoSuchMethod - XSLFPowerPointExtractorDecorator.buildXHTML POI - XSLFSlide.getMasterSheet()

Posted by "Jeremy Anderson (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeremy Anderson updated TIKA-795:
---------------------------------

    Description: 
A POI 3.8-beta5 daily build exposed this bug after POI revision 1198658.  (POI Bugzilla bug #52262 is already open for the root cause.)

The bug was discovered using daily builds of both Tika and POI.  The root cause lies in POI: an accidental change to the return type of XSLFSlide.getMasterSheet().  Tika is affected because XSLFPowerPointExtractorDecorator.buildXHTML calls this method to assign a variable that is never used.

I've attached a patch that removes the unused variable, along with an example multi-embedded Word document that exposes the issue when parsed with a Tika-based RecursiveMetadataParser.

java.lang.NoSuchMethodError: org.apache.poi.xslf.usermodel.XSLFSlide.getMasterSheet()Lorg/apache/poi/xslf/usermodel/XSLFSlideMaster;
	at org.apache.tika.parser.microsoft.ooxml.XSLFPowerPointExtractorDecorator.buildXHTML(XSLFPowerPointExtractorDecorator.java:81)
	at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:110)
	at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:97)
	at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:69)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
	at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:91)
	at com.eastportanalytics.services.textextract.TikaTextExtractionService$RecursiveMetadataParser.parse(TikaTextExtractionService.java:364)
	at org.apache.tika.parser.DelegatingParser.parse(DelegatingParser.java:72)
	at org.apache.tika.extractor.ParsingEmbeddedDocumentExtractor.parseEmbedded(ParsingEmbeddedDocumentExtractor.java:109)
	at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.handleEmbeddedFile(AbstractOOXMLExtractor.java:228)
	at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.handleEmbeddedParts(AbstractOOXMLExtractor.java:148)
	at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:113)
	at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:97)
	at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:69)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
	at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:91)
	at com.eastportanalytics.services.textextract.TikaTextExtractionService$RecursiveMetadataParser.parse(TikaTextExtractionService.java:364)




    

        

[jira] [Updated] (TIKA-795) [PATCH] NoSuchMethod - XSLFPowerPointExtractorDecorator.buildXHTML POI - XSLFSlide.getMasterSheet()

Posted by "Jeremy Anderson (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeremy Anderson updated TIKA-795:
---------------------------------

    Attachment: testWORD_embeded.docx
                Patch_795_XSLF.patch

Attached a patch that removes the unused variable, along with an example Word document containing multiple embedded documents, which exposes the issue when parsed with a RecursiveMetadataParser.
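
For anyone wondering why a call whose result is never used can still fail: the JVM links a call site by method name plus full descriptor, and the descriptor includes the return type (note the trailing Lorg/apache/poi/xslf/usermodel/XSLFSlideMaster; in the NoSuchMethodError message). The sketch below is only an illustration of that point, not the Tika or POI code; DescriptorDemo, SlideMaster and SlideLayout are hypothetical stand-ins, and which type POI returns after the change is not asserted here.

```java
import java.lang.invoke.MethodType;

public class DescriptorDemo {
    // Hypothetical stand-ins for an "old" and a "new" return type.
    // These are NOT the real POI classes.
    static class SlideMaster {}
    static class SlideLayout {}

    /** Builds the JVM descriptor of a no-argument method with the given return type. */
    static String descriptorFor(Class<?> returnType) {
        return MethodType.methodType(returnType).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        // Descriptor the caller was compiled against.
        String compiledAgainst = descriptorFor(SlideMaster.class);
        // Descriptor actually present in the library at runtime.
        String presentAtRuntime = descriptorFor(SlideLayout.class);

        System.out.println(compiledAgainst);
        System.out.println(presentAtRuntime);
        // The two strings differ, so a call site compiled against the first
        // cannot be linked to a method that only exists with the second.
        System.out.println(compiledAgainst.equals(presentAtRuntime));
    }
}
```

Because the mismatch is detected when the call is linked, discarding the return value does not help; removing the call (or recompiling against the new jar) does.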
                
> [PATCH] NoSuchMethod - XSLFPowerPointExtractorDecorator.buildXHTML POI - XSLFSlide.getMasterSheet()
> ---------------------------------------------------------------------------------------------------
>
>                 Key: TIKA-795
>                 URL: https://issues.apache.org/jira/browse/TIKA-795
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.1
>            Reporter: Jeremy Anderson
>              Labels: patch, poi
>         Attachments: Patch_795_XSLF.patch, testWORD_embeded.docx
>
>
> POI-3.8-beta5-daily exposed bug after poi.revision 1190347.  (POI bugzilla bug #52262 already opened for root cause).
> The bug was discovered using daily builds of both Tika and POI.  The root cause lies within POI: an accidental change to the return type of XSLFSlide.getMasterSheet().  Tika is affected because it makes this call only to populate an unused variable.
> I've included a patch file which removes the unused variable.  An example Word document with multiple embedded documents, used with a Tika-based RecursiveMetadataParser, is also included.
> java.lang.NoSuchMethodError: org.apache.poi.xslf.usermodel.XSLFSlide.getMasterSheet()Lorg/apache/poi/xslf/usermodel/XSLFSlideMaster;
> 	at org.apache.tika.parser.microsoft.ooxml.XSLFPowerPointExtractorDecorator.buildXHTML(XSLFPowerPointExtractorDecorator.java:81)
> 	at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:110)
> 	at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:97)
> 	at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:69)
> 	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
> 	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
> 	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
> 	at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:91)
> 	at com.eastportanalytics.services.textextract.TikaTextExtractionService$RecursiveMetadataParser.parse(TikaTextExtractionService.java:364)
> 	at org.apache.tika.parser.DelegatingParser.parse(DelegatingParser.java:72)
> 	at org.apache.tika.extractor.ParsingEmbeddedDocumentExtractor.parseEmbedded(ParsingEmbeddedDocumentExtractor.java:109)
> 	at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.handleEmbeddedFile(AbstractOOXMLExtractor.java:228)
> 	at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.handleEmbeddedParts(AbstractOOXMLExtractor.java:148)
> 	at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:113)
> 	at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:97)
> 	at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:69)
> 	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
> 	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
> 	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
> 	at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:91)
> 	at com.eastportanalytics.services.textextract.TikaTextExtractionService$RecursiveMetadataParser.parse(TikaTextExtractionService.java:364)


        

[jira] [Commented] (TIKA-795) [PATCH] NoSuchMethod - XSLFPowerPointExtractorDecorator.buildXHTML POI - XSLFSlide.getMasterSheet()

Posted by "Jeremy Anderson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160052#comment-13160052 ] 

Jeremy Anderson commented on TIKA-795:
--------------------------------------

Thanks, Nick.  Yegor followed up on this issue on POI's side and confirmed that the changes made there were intentional, and that using the other method, as the TIKA-700 patch does, is correct.

Can you close this issue, noting that TIKA-700 resolves it?

Thanks again.
                
