Posted to user@tika.apache.org by Tucker B <ba...@gmail.com> on 2019/05/29 10:56:49 UTC
StreamingZipContainerDetector XLSX template workbook
After upgrading to Tika 1.21, I have noticed that several known XLSX files
are detected by Tika as "application/x-tika-ooxml". I think I've
narrowed it down to the new StreamingZipContainerDetector. After
inspecting the "[Content_Types].xml" of these XLSX files, I found no
reference to any of the content types configured for XLSX in
OOXML_CONTENT_TYPES in StreamingZipContainerDetector. Specifically,
none of the following appear:

"application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml"
"application/vnd.ms-excel.sheet.macroEnabled.main+xml"
"application/vnd.ms-excel.sheet.binary.macroEnabled.main"

I do see a content type of

"application/vnd.openxmlformats-officedocument.spreadsheetml.template.main+xml"

in "[Content_Types].xml". Is the StreamingZipContainerDetector missing
the XSSFRelation TEMPLATE_WORKBOOK in OOXML_CONTENT_TYPES?
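
To make the suspected failure mode concrete, here is a minimal Java sketch of a table-driven lookup over "[Content_Types].xml". It is not Tika's code: the class ContentTypesSketch, its methods, and the media types the table maps to are all assumptions for illustration, and a regular expression stands in for a real XML parser. It only shows how a missing table entry would yield the generic "application/x-tika-ooxml" result.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only: this is NOT Tika's StreamingZipContainerDetector.
class ContentTypesSketch {

    // Generic type reported when no known main-part content type is found.
    static final String FALLBACK = "application/x-tika-ooxml";

    // Matches ContentType="..." attributes (a regex keeps the sketch short).
    private static final Pattern CONTENT_TYPE =
            Pattern.compile("ContentType\\s*=\\s*\"([^\"]+)\"");

    // Hypothetical table holding only the three content types listed above.
    // The media types on the right-hand side are assumed for illustration.
    static Map<String, String> tableWithoutTemplate() {
        Map<String, String> table = new HashMap<>();
        table.put("application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml",
                "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet");
        table.put("application/vnd.ms-excel.sheet.macroEnabled.main+xml",
                "application/vnd.ms-excel.sheet.macroEnabled.12");
        table.put("application/vnd.ms-excel.sheet.binary.macroEnabled.main",
                "application/vnd.ms-excel.sheet.binary.macroEnabled.12");
        return table;
    }

    // Same table plus the template workbook content type seen in the files.
    static Map<String, String> tableWithTemplate() {
        Map<String, String> table = tableWithoutTemplate();
        table.put("application/vnd.openxmlformats-officedocument.spreadsheetml.template.main+xml",
                "application/vnd.openxmlformats-officedocument.spreadsheetml.template");
        return table;
    }

    // Returns the media type of the first ContentType found in the table,
    // or the generic fallback when none of them is known.
    static String detect(String contentTypesXml, Map<String, String> table) {
        Matcher m = CONTENT_TYPE.matcher(contentTypesXml);
        while (m.find()) {
            String mediaType = table.get(m.group(1));
            if (mediaType != null) {
                return mediaType;
            }
        }
        return FALLBACK;
    }

    public static void main(String[] args) {
        // Hand-written stand-in for the "[Content_Types].xml" of an affected file.
        String xml = "<Types xmlns=\"http://schemas.openxmlformats.org/package/2006/content-types\">"
                + "<Default Extension=\"xml\" ContentType=\"application/xml\"/>"
                + "<Override PartName=\"/xl/workbook.xml\" ContentType=\""
                + "application/vnd.openxmlformats-officedocument.spreadsheetml.template.main+xml\"/>"
                + "</Types>";
        System.out.println(detect(xml, tableWithoutTemplate())); // falls back to the generic type
        System.out.println(detect(xml, tableWithTemplate()));    // resolves to the template type
    }
}
```

With the template entry absent, the lookup falls through to the generic type, which matches the behaviour described above; adding the entry resolves it.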
Re: StreamingZipContainerDetector XLSX template workbook
Posted by Tim Allison <ta...@apache.org>.
Tucker,
This should be fixed now in branch_1x and master. Let me know if
you'd like to try with a nightly build. Many thanks for the report!
Cheers,
Tim
Re: StreamingZipContainerDetector XLSX template workbook
Posted by Tucker B <ba...@gmail.com>.
Thanks for creating the JIRA ticket.
On Wed, May 29, 2019 at 6:23 PM Tim Allison <ta...@apache.org> wrote:
>
> Ugh. Thank you!
>
> https://issues.apache.org/jira/browse/TIKA-2886
Re: StreamingZipContainerDetector XLSX template workbook
Posted by Tim Allison <ta...@apache.org>.
Ugh. Thank you!
https://issues.apache.org/jira/browse/TIKA-2886