Posted to dev@tika.apache.org by "Nick Burch (JIRA)" <ji...@apache.org> on 2016/02/03 18:30:40 UTC

[jira] [Commented] (TIKA-1141) javascript files that contain "<html" are detected as text/html
    [ https://issues.apache.org/jira/browse/TIKA-1141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15130723#comment-15130723 ] 

Nick Burch commented on TIKA-1141:
----------------------------------

I've tweaked the mime magic for HTML so that a "<html" match gets a lower priority when it isn't near the start of the file. As long as the .js filename is given, Tika can now correctly identify these jQuery files as application/javascript. Without the filename it can't, as we don't have any JavaScript magic. I'm not sure we could add any either, given the format, but if someone wants to take a stab, that'd be great!
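
As a rough illustration of the idea (not Tika's actual code or its real rule values), the sketch below models detection in Python as "collect every matching rule, then pick the highest priority". The offsets, priorities, the `detect` helper, and the way magic and filename matches are combined are all assumptions made up for this example.

```python
import fnmatch

# (pattern, first offset, last offset, mime type, priority)
# All numbers here are illustrative assumptions, not Tika's real values.
MAGIC_RULES = [
    (b"<html", 0, 64, "text/html", 60),    # near the start: strong evidence
    (b"<html", 0, 8192, "text/html", 20),  # anywhere in the prefix: weak evidence
]

# (filename glob, mime type, priority), again purely illustrative.
GLOB_RULES = [("*.js", "application/javascript", 40)]


def detect(data, filename=None):
    """Return the mime type of the highest-priority matching rule."""
    candidates = []
    for pattern, start, end, mime, priority in MAGIC_RULES:
        # bytes.find needs the whole match inside [start, end + len(pattern))
        if data.find(pattern, start, end + len(pattern)) != -1:
            candidates.append((priority, mime))
    if filename is not None:
        for glob, mime, priority in GLOB_RULES:
            if fnmatch.fnmatch(filename, glob):
                candidates.append((priority, mime))
    if not candidates:
        return "application/octet-stream"
    return max(candidates)[1]


# A script whose "<html" appears well past the first 64 bytes.
js_data = b"/*! library */\n" + b"var x = 1;\n" * 10 + b"var tpl = '<html></html>';\n"

print(detect(js_data, "library.js"))  # filename outranks the weak HTML match
print(detect(js_data))                # no filename: only the weak HTML match is left
```

In this toy model the late "<html" match still fires, but the filename rule outranks it, which mirrors the behaviour described above: correct with the .js name, still text/html without it.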

> javascript files that contain "<html" are detected as text/html
> ---------------------------------------------------------------
>
>                 Key: TIKA-1141
>                 URL: https://issues.apache.org/jira/browse/TIKA-1141
>             Project: Tika
>          Issue Type: Bug
>          Components: mime
>    Affects Versions: 1.2
>            Reporter: David Hara
>            Priority: Minor
>
> The Mimetypes detector will return text/html as the mimetype for any javascript file that contains the string "<html" in it. I believe this is due to the rule <match value="&lt;html" type="string" offset="0:8192"/> in the tika-mimetypes.xml file.
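
To show why a rule of that shape misfires, here is a minimal Python sketch of a string match over an offset range. It is not Tika's implementation: `matches_magic` is a hypothetical helper, and reading offset="0:8192" as "the pattern may begin at any byte offset from 0 to 8192" is an assumption about the rule's semantics.

```python
def matches_magic(data: bytes, pattern: bytes, start: int, end: int) -> bool:
    """Return True if `pattern` begins at any offset in [start, end]."""
    # bytes.find requires the whole match to lie inside the slice, so the
    # upper bound is extended by the pattern length.
    return data.find(pattern, start, end + len(pattern)) != -1


html_doc = b"<!DOCTYPE html>\n<html><body>hi</body></html>"
js_file = b"/* library */\nvar tpl = '<html><body></body></html>';\n"

# Both inputs satisfy the rule, even though only one of them is HTML.
print(matches_magic(html_doc, b"<html", 0, 8192))
print(matches_magic(js_file, b"<html", 0, 8192))
```

Because the window covers the first 8 KB of the file, any script that embeds an HTML template string early enough satisfies the same rule as a real HTML document.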


