Posted to dev@tika.apache.org by "Nick Burch (JIRA)" <ji...@apache.org> on 2014/12/22 06:35:13 UTC
[jira] [Commented] (TIKA-976) Inaccurate XLS detection through POIFSContainerDetector
[ https://issues.apache.org/jira/browse/TIKA-976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14255450#comment-14255450 ]
Nick Burch commented on TIKA-976:
---------------------------------
This is now being handled fully through TIKA-1490.
> Inaccurate XLS detection through POIFSContainerDetector
> -------------------------------------------------------
>
> Key: TIKA-976
> URL: https://issues.apache.org/jira/browse/TIKA-976
> Project: Tika
> Issue Type: Improvement
> Components: mime
> Affects Versions: 1.2
> Reporter: Marco Quaranta
> Labels: detection, mime, poi, xls
> Fix For: 1.3
>
> Attachments: test_book.xls
>
>
> I've found an inaccurate detection with the attached XLS file. POIFSContainerDetector is unable to determine the exact MIME type (vnd.ms-excel) and returns the generic "x-tika-msoffice". This happens because the file's root entry names are [Book, DocumentSummaryInformation, SummaryInformation], while POIFSContainerDetector only checks whether the names contain "Workbook".
> Would it be possible to add a further or-check like this:
> if (names.contains("Workbook") || names.contains("Book"))
> Thank you,
> Marco
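
For illustration, the or-check proposed above could be sketched as a small standalone Java class. This is only a sketch under stated assumptions, not Tika's actual code: the class name RootEntrySketch, the detect helper, and the constant names are hypothetical, and the full media type strings ("application/vnd.ms-excel", "application/x-tika-msoffice") are assumed from the short names quoted in the report.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical standalone sketch; this is NOT the real POIFSContainerDetector.
class RootEntrySketch {
    // Assumed full media type strings for the short names used in the report.
    static final String XLS = "application/vnd.ms-excel";
    static final String GENERIC_OLE2 = "application/x-tika-msoffice";

    // Maps the root entry names of an OLE2 container to a media type.
    // Only the Excel check from the report is modelled here.
    static String detect(Set<String> names) {
        // The file in the report uses a "Book" entry instead of "Workbook",
        // so both names are accepted.
        if (names.contains("Workbook") || names.contains("Book")) {
            return XLS;
        }
        // Anything else falls back to the generic OLE2 type.
        return GENERIC_OLE2;
    }

    public static void main(String[] args) {
        // Root entry names reported for the attached test_book.xls.
        Set<String> names = new HashSet<>(Arrays.asList(
                "Book", "DocumentSummaryInformation", "SummaryInformation"));
        System.out.println(detect(names));
    }
}
```

With the root names from the report, the sketch picks the Excel type. Without the extra "Book" check, the same input would fall through to the generic type, which is the behaviour described in the issue.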