Posted to dev@tika.apache.org by "David Hara (JIRA)" <ji...@apache.org> on 2013/07/03 01:31:21 UTC

[jira] [Created] (TIKA-1141) javascript files that contain "<html" are detected as text/html
David Hara created TIKA-1141:
--------------------------------

             Summary: javascript files that contain "<html" are detected as text/html
                 Key: TIKA-1141
                 URL: https://issues.apache.org/jira/browse/TIKA-1141
             Project: Tika
          Issue Type: Bug
          Components: mime
    Affects Versions: 1.2
            Reporter: David Hara
            Priority: Minor


The MimeTypes detector returns text/html as the MIME type for any JavaScript file that contains the string "<html". I believe this is due to the rule <match value="&lt;html" type="string" offset="0:8192"/> in the tika-mimetypes.xml file, which as I read it matches that string at any offset from 0 to 8192, regardless of what the rest of the file looks like.
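
To illustrate why such a rule would fire on script files, here is a minimal sketch in plain Java. It is not Tika's implementation: the class name HtmlMagicSketch and the method matchesHtmlRule are hypothetical, and the sketch assumes that offset="0:8192" means the pattern may start at any byte offset from 0 through 8192 and that a type="string" match is a case-sensitive byte comparison.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the magic rule
// <match value="&lt;html" type="string" offset="0:8192"/>.
// This is a simplified sketch, not code taken from Tika.
public class HtmlMagicSketch {

    // Assumption: the pattern may start anywhere in offsets 0..8192 inclusive.
    static final int MAX_OFFSET = 8192;
    static final byte[] PATTERN = "<html".getBytes(StandardCharsets.US_ASCII);

    // Returns true if PATTERN occurs starting at any offset in [0, MAX_OFFSET].
    static boolean matchesHtmlRule(byte[] data) {
        int lastStart = Math.min(MAX_OFFSET, data.length - PATTERN.length);
        for (int start = 0; start <= lastStart; start++) {
            boolean found = true;
            for (int i = 0; i < PATTERN.length; i++) {
                if (data[start + i] != PATTERN[i]) {
                    found = false;
                    break;
                }
            }
            if (found) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // A JavaScript snippet that merely embeds HTML in a string literal.
        byte[] jsWithHtml = "var page = '<html><body></body></html>';"
                .getBytes(StandardCharsets.UTF_8);
        // A JavaScript snippet with no HTML in it.
        byte[] plainJs = "function add(a, b) { return a + b; }"
                .getBytes(StandardCharsets.UTF_8);

        System.out.println(matchesHtmlRule(jsWithHtml)); // prints true
        System.out.println(matchesHtmlRule(plainJs));    // prints false
    }
}
```

Under these assumptions, a rule of this shape cannot tell an HTML document from a script that only quotes HTML in a string literal, which is consistent with the behaviour reported above.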

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira