Posted to user@tika.apache.org by Tucker B <ba...@gmail.com> on 2019/05/29 10:56:49 UTC

StreamingZipContainerDetector XLSX template workbook

After upgrading to Tika 1.21, I have noticed that several known XLSX
files are detected by Tika as "application/x-tika-ooxml". I think I've
narrowed it down to the new StreamingZipContainerDetector. The
"[Content_Types].xml" of these XLSX files contains no reference to any
of the content types configured for XLSX in OOXML_CONTENT_TYPES in
StreamingZipContainerDetector. Specifically, none of these appears:

"application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml"
"application/vnd.ms-excel.sheet.macroEnabled.main+xml"
"application/vnd.ms-excel.sheet.binary.macroEnabled.main"

I do see a content type of

"application/vnd.openxmlformats-officedocument.spreadsheetml.template.main+xml"

in "[Content_Types].xml". Is the StreamingZipContainerDetector missing
the XSSFRelation TEMPLATE_WORKBOOK in OOXML_CONTENT_TYPES?
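
For anyone who wants to repeat the inspection described above, here is a minimal sketch using only the JDK. It lists the ContentType of every Override element in "[Content_Types].xml" and reports whether any of them is one of the three XLSX types quoted above. The class name ContentTypesDump, its method names, and the constant XLSX_TYPES_FROM_POST are hypothetical helpers made up for illustration. The list is copied from this message, not read from Tika's OOXML_CONTENT_TYPES, and the code is not how StreamingZipContainerDetector itself works.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration; not part of Tika or POI.
class ContentTypesDump {

    // The three XLSX-related content types quoted in this message
    // (an assumption copied from the post, not loaded from Tika).
    static final List<String> XLSX_TYPES_FROM_POST = Arrays.asList(
        "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml",
        "application/vnd.ms-excel.sheet.macroEnabled.main+xml",
        "application/vnd.ms-excel.sheet.binary.macroEnabled.main");

    // Returns the ContentType attribute of each Override element
    // found in the [Content_Types].xml entry of the given zip stream.
    static List<String> overrideContentTypes(InputStream ooxml) throws Exception {
        List<String> types = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(ooxml)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!"[Content_Types].xml".equals(entry.getName())) {
                    continue;
                }
                // Copy the entry into memory so the XML parser
                // never reads from (or closes) the zip stream.
                ByteArrayOutputStream xml = new ByteArrayOutputStream();
                byte[] buf = new byte[8192];
                int n;
                while ((n = zip.read(buf)) != -1) {
                    xml.write(buf, 0, n);
                }
                DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
                dbf.setNamespaceAware(true);
                NodeList overrides = dbf.newDocumentBuilder()
                        .parse(new ByteArrayInputStream(xml.toByteArray()))
                        .getElementsByTagNameNS("*", "Override");
                for (int i = 0; i < overrides.getLength(); i++) {
                    types.add(((Element) overrides.item(i)).getAttribute("ContentType"));
                }
                break; // only one such entry is expected
            }
        }
        return types;
    }

    // True if any listed type is one of the three quoted in the post.
    static boolean hasKnownXlsxType(List<String> types) {
        for (String t : types) {
            if (XLSX_TYPES_FROM_POST.contains(t)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            List<String> types = overrideContentTypes(in);
            types.forEach(System.out::println);
            System.out.println("matches a listed XLSX type: " + hasKnownXlsxType(types));
        }
    }
}
```

Run it with the path to one of the affected workbooks as the only argument. A file whose only workbook content type is the template.main+xml one would report no match against the three listed types.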

Re: StreamingZipContainerDetector XLSX template workbook

Posted by Tim Allison <ta...@apache.org>.
Tucker,

This should be fixed now in branch_1x and master.  Let me know if
you'd like to try with a nightly build.  Many thanks for the report!

Cheers,

          Tim

Re: StreamingZipContainerDetector XLSX template workbook

Posted by Tucker B <ba...@gmail.com>.
Thanks for creating the JIRA ticket.

On Wed, May 29, 2019 at 6:23 PM Tim Allison <ta...@apache.org> wrote:
>
> Ugh.  Thank you!
>
> https://issues.apache.org/jira/browse/TIKA-2886

Re: StreamingZipContainerDetector XLSX template workbook

Posted by Tim Allison <ta...@apache.org>.
Ugh.  Thank you!

https://issues.apache.org/jira/browse/TIKA-2886