Posted to dev@tika.apache.org by "Brian McColgan (JIRA)" <ji...@apache.org> on 2018/02/24 16:01:00 UTC
[jira] [Created] (TIKA-2588) Tika detecting/parsing pptx with
embedded Excel worksheet(s)...
Brian McColgan created TIKA-2588:
------------------------------------
Summary: Tika detecting/parsing pptx with embedded Excel worksheet(s)...
Key: TIKA-2588
URL: https://issues.apache.org/jira/browse/TIKA-2588
Project: Tika
Issue Type: Bug
Components: detector, parser
Affects Versions: 1.17
Environment:
Reporter: Brian McColgan
Attachments: foo.out, pptEmbedExcelDoubleClickFromWorkbook.PNG, pptEmbedExcelInEmptyWorkbook.PNG, tikaSample.pptx
Hello tika-developers,
First, a big thank-you for creating and maintaining Apache Tika! It is a really useful capability/service that can be used in so many different ways. You folks are the true Debabelizer (h2g2.com).
On to the issue I encountered: using Tika 1.17 to extract an embedded Excel object from a pptx is causing problems. A simple example is attached to this Jira issue ([^tikaSample.pptx]); running it against Tika 1.17 (with verbose/list-parsers/list-detectors) produces the output in [^foo.out]. The deck contains a title slide and a single slide with an embedded Excel object on it.
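As a sanity check that does not depend on Tika, the embedded parts of the deck can be listed directly, since a pptx is a ZIP package. The following is a minimal sketch, not Tika's own method. It assumes the embedded objects sit under the conventional ppt/embeddings/ folder that PowerPoint normally uses (strictly, the package relationships decide the location). The class name EmbeddedPartLister and the method listEmbeddedParts are hypothetical names introduced for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper, not part of Tika: lists embedded parts in a pptx.
class EmbeddedPartLister {

    // Assumption: PowerPoint's conventional folder for embedded objects.
    private static final String EMBEDDINGS_PREFIX = "ppt/embeddings/";

    /** Returns the names of all ZIP entries stored under ppt/embeddings/. */
    static List<String> listEmbeddedParts(InputStream pptx) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(pptx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                // Skip directory entries; keep only files in the embeddings folder.
                if (!entry.isDirectory() && entry.getName().startsWith(EMBEDDINGS_PREFIX)) {
                    names.add(entry.getName());
                }
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the attachment name from this issue; pass another path as args[0].
        String path = args.length > 0 ? args[0] : "tikaSample.pptx";
        try (InputStream in = Files.newInputStream(Paths.get(path))) {
            for (String name : listEmbeddedParts(in)) {
                System.out.println(name);
            }
        }
    }
}
```

If the listing shows an entry for the worksheet but Tika's output does not mention it, the problem is more likely in detection or parsing than in the file itself.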
As noted to [~gagravarr] on Stack Overflow, I grabbed the unit-test data which you use in your parser/office JUnit suite (test_ppt_embedded_two_slides.pptx) and tried opening it in Office/PPT 2016. I selected the embedded sheet with the mouse (it had the Alfresco logo in it) and pasted it into an empty Office/Excel 2016 workbook. When I tried to interact with it, I had to double-click to make it active. As a result, I ended up with two Excel instances on my Windows 10 desktop (the original object in one, the Excel worksheet in the other). I have included a picture of the embedded Excel object pasted into the workbook:
!pptEmbedExcelInEmptyWorkbook.PNG!
followed by the worksheet opened inside the workbook (which required a double-click within the black-bordered area in the first picture above):
!pptEmbedExcelDoubleClickFromWorkbook.PNG!
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)