Posted to dev@tika.apache.org by "Keith R. Bennett (JIRA)" <ji...@apache.org> on 2007/10/18 05:00:57 UTC
[jira] Updated: (TIKA-79) Mime type detection from file header appears to be failing.

     [ https://issues.apache.org/jira/browse/TIKA-79?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Keith R. Bennett updated TIKA-79:
---------------------------------

    Attachment: AutoDetectParser.patch

The attached patch file reorganizes the MIME type determination in AutoDetectParser so that it is easier to print out the types found by the various methods, and the logic for choosing the predominant result is confined to a smaller area (assuming I understood the intent correctly, that is).  In other words, I found it easier to debug.  If you like, I can commit it, minus the print statements.
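
The patch itself is not reproduced here. As a hypothetical illustration of keeping the choice in one small place, the sketch below collects the three candidate types in a single method. It prints them in the same format as the debug output quoted further down, then picks the first non-null one in the order header, resource name, content-type hint. That precedence order and the application/octet-stream fallback are assumptions, not necessarily what AutoDetectParser does, and the class and method names are invented for illustration.

```java
import java.util.Arrays;
import java.util.List;

public class TypeChooser {

    /** Fallback when no method yields a type (assumed; not taken from Tika). */
    static final String DEFAULT_TYPE = "application/octet-stream";

    /**
     * Hypothetical precedence rule: the first non-null candidate wins,
     * checked in the order header, resource name, content-type hint.
     */
    static String chooseType(String typeFromContentTypeHint,
                             String typeFromResourceName,
                             String typeFromHeader,
                             boolean debug) {
        if (debug) {
            System.out.println("typeFromContentTypeHint = " + typeFromContentTypeHint);
            System.out.println("typeFromResourceName = " + typeFromResourceName);
            System.out.println("typeFromHeader = " + typeFromHeader);
        }
        // All decision logic lives here, so changing the rule touches one spot.
        List<String> candidates = Arrays.asList(
                typeFromHeader, typeFromResourceName, typeFromContentTypeHint);
        String type = DEFAULT_TYPE;
        for (String candidate : candidates) {
            if (candidate != null) {
                type = candidate;
                break;
            }
        }
        if (debug) {
            System.out.println("type = " + type);
        }
        return type;
    }

    public static void main(String[] args) {
        // Mirrors the Excel case from the issue: no type from the header.
        chooseType("application/vnd.ms-excel", "application/vnd.ms-excel", null, true);
    }
}
```

Passing debug=false turns the print statements off without touching the selection logic, which is one way to keep them out of a committed version.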

I also found it helpful to comment out the LOG.info() call in MimeTypes.load(). (Is there a better way to disable it, such as pointing that logger at some kind of null appender?)
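
One possible answer, sketched under the assumption that Tika is logging through log4j 1.2 (as the log4j:WARN lines in the test output suggest): a log4j.properties file on the test classpath can give the root logger an appender and raise its threshold. That both drops INFO messages and silences the "No appenders could be found" warning. log4j 1.2 also ships org.apache.log4j.varia.NullAppender for discarding output entirely. The MimeTypes logger name in the last line is an assumption, and the "(root)" in the warning hints that at least some messages go through the root logger rather than a class-named one.

```properties
# Assumed: log4j 1.2 picks this file up as log4j.properties from the classpath.
# One console appender plus a WARN threshold on the root logger:
# INFO messages are dropped and the "No appenders" warning goes away.
log4j.rootLogger=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%-5p %c - %m%n

# To discard all output instead, swap in the NullAppender:
# log4j.appender.console=org.apache.log4j.varia.NullAppender

# If MimeTypes logs through a class-named logger (name assumed here),
# it can be silenced on its own:
# log4j.logger.org.apache.tika.mime.MimeTypes=OFF
```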


> Mime type detection from file header appears to be failing.
> -----------------------------------------------------------
>
>                 Key: TIKA-79
>                 URL: https://issues.apache.org/jira/browse/TIKA-79
>             Project: Tika
>          Issue Type: Bug
>          Components: general
>    Affects Versions: 0.1-incubator
>            Reporter: Keith R. Bennett
>             Fix For: 0.1-incubator
>
>         Attachments: AutoDetectParser.patch
>
>
> Unit tests of AutoDetectParser fail when byte-header detection is needed.  When the correct resource names and MIME types are passed in the Metadata object, the values below show what each method found.  Note that several document types have null for typeFromHeader:
> typeFromContentTypeHint = application/vnd.ms-excel
> typeFromResourceName = application/vnd.ms-excel
> typeFromHeader = null
> type = application/vnd.ms-excel
> typeFromContentTypeHint = text/html
> typeFromResourceName = text/html
> typeFromHeader = text/html
> type = text/html
> typeFromContentTypeHint = application/vnd.oasis.opendocument.text
> typeFromResourceName = application/vnd.oasis.opendocument.text
> typeFromHeader = application/vnd.oasis.opendocument.text
> type = application/vnd.oasis.opendocument.text
> typeFromContentTypeHint = application/pdf
> typeFromResourceName = application/pdf
> typeFromHeader = application/pdf
> type = application/pdf
> typeFromContentTypeHint = application/vnd.ms-powerpoint
> typeFromResourceName = application/vnd.ms-powerpoint
> typeFromHeader = null
> type = application/vnd.ms-powerpoint
> log4j:WARN No appenders could be found for logger (root).
> log4j:WARN Please initialize the log4j system properly.
> typeFromContentTypeHint = application/rtf
> typeFromResourceName = application/rtf
> typeFromHeader = null
> type = application/rtf
> typeFromContentTypeHint = text/plain
> typeFromResourceName = text/plain
> typeFromHeader = null
> type = text/plain
> typeFromContentTypeHint = application/msword
> typeFromResourceName = application/msword
> typeFromHeader = null
> type = application/msword
> typeFromContentTypeHint = application/xml
> typeFromResourceName = application/xml
> typeFromHeader = null
> type = application/xml
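
For context on what header-based detection does: it compares the first bytes of a stream against known signatures. The minimal sketch below (a hypothetical MagicSniffer class with a hand-picked set of well-known signatures, not Tika's MIME-info database or API) shows the idea for some of the types listed above. It does not reproduce or diagnose the failure. It does illustrate two limits of the approach: Word, Excel and PowerPoint binary files share one OLE2 container signature, so the leading bytes alone cannot tell them apart, and plain text has no signature at all.

```java
import java.nio.charset.StandardCharsets;

public class MagicSniffer {

    // OLE2 compound document signature; Word, Excel and PowerPoint
    // binary files all start with these same eight bytes.
    static final byte[] OLE2 = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    static final byte[] PDF = "%PDF-".getBytes(StandardCharsets.US_ASCII);
    static final byte[] RTF = "{\\rtf".getBytes(StandardCharsets.US_ASCII);
    // Only matches XML that starts with a declaration and no byte-order mark.
    static final byte[] XML = "<?xml".getBytes(StandardCharsets.US_ASCII);

    /** True if data begins with every byte of prefix. */
    static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    /** True for any OLE2 container (.doc, .xls, .ppt, ...). */
    static boolean isOle2Container(byte[] header) {
        return startsWith(header, OLE2);
    }

    /**
     * Returns a MIME type when the leading bytes identify exactly one type,
     * or null otherwise (no known signature, or an OLE2 container whose
     * signature alone cannot separate .doc, .xls and .ppt).
     */
    static String sniff(byte[] header) {
        if (startsWith(header, PDF)) {
            return "application/pdf";
        }
        if (startsWith(header, RTF)) {
            return "application/rtf";
        }
        if (startsWith(header, XML)) {
            return "application/xml";
        }
        return null;
    }

    public static void main(String[] args) {
        byte[] pdf = "%PDF-1.4".getBytes(StandardCharsets.US_ASCII);
        byte[] text = "plain text has no signature".getBytes(StandardCharsets.US_ASCII);
        System.out.println(sniff(pdf));            // application/pdf
        System.out.println(sniff(text));           // null
        System.out.println(isOle2Container(OLE2)); // true
    }
}
```

Distinguishing the three Office formats would require reading entries inside the OLE2 container, which is beyond this sketch.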

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.