Posted to dev@tika.apache.org by "Ji-Hyun Oh (JIRA)" <ji...@apache.org> on 2015/05/22 00:39:17 UTC

[jira] [Created] (TIKA-1634) Detecting problem with Matlab source code

Ji-Hyun Oh created TIKA-1634:
--------------------------------

             Summary: Detecting problem with Matlab source code
                 Key: TIKA-1634
                 URL: https://issues.apache.org/jira/browse/TIKA-1634
             Project: Tika
          Issue Type: Improvement
          Components: mime
    Affects Versions: 1.8
            Reporter: Ji-Hyun Oh
            Priority: Trivial


Both Matlab source code and Objective-C source code use the same suffix, .m. Therefore, the Matlab entry in tika-mimetypes.xml relies on an additional magic match value instead of a glob pattern.

In tika-mimetypes.xml Matlab is defined as:

  <mime-type type="text/x-matlab">
    <_comment>Matlab source code</_comment>
    <magic priority="50">
      <match value="function [" type="string" offset="0"/>
    </magic>
    <!-- <glob pattern="*.m"/> - conflicts with text/x-objcsrc -->
    <sub-class-of type="text/plain"/>
  </mime-type>

However, Matlab code does not always start with "function [". Therefore, some Matlab files are detected as text/x-objcsrc. Based on the source code collected from NOAA Paleoclimatology Software Resources, many Matlab files start with "function" (without a following "[") or with a "%" comment, so match values like these would cover them (problematic files are attached as examples):

  <mime-type type="text/x-matlab">
    <_comment>Matlab source code</_comment>
    <magic priority="50">
      <match value="function" type="string" offset="0"/>
      <match value="%" type="string" offset="0"/>
    </magic>
    <!-- <glob pattern="*.m"/> - conflicts with text/x-objcsrc -->
    <sub-class-of type="text/plain"/>
  </mime-type>
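
To illustrate the difference between the two definitions, here is a minimal Java sketch that emulates a string match at offset 0. It does not use Tika's own detector; the class name MagicPrefixSketch, its methods, and the sample file contents are hypothetical and exist only for this illustration. It also assumes that sibling match elements inside one magic element act as alternatives.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stand-in for an offset-0 string match; not Tika's implementation.
class MagicPrefixSketch {

    // True when data begins, at offset 0, with the ASCII bytes of value.
    static boolean startsWith(byte[] data, String value) {
        byte[] magic = value.getBytes(StandardCharsets.US_ASCII);
        if (data.length < magic.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOfRange(data, 0, magic.length), magic);
    }

    // Current definition: only "function [" at offset 0.
    static boolean matchesCurrentRule(String source) {
        return startsWith(toBytes(source), "function [");
    }

    // Proposed definition: "function" or "%" at offset 0 (assumed to be alternatives).
    static boolean matchesProposedRule(String source) {
        byte[] data = toBytes(source);
        return startsWith(data, "function") || startsWith(data, "%");
    }

    private static byte[] toBytes(String source) {
        return source.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // Made-up file beginnings, not taken from the attached files.
        String[] samples = {
            "function [a, b] = f(x)\n",
            "function y = f(x)\n",
            "% Computes a running mean\nx = 1;\n",
            "#import <Foundation/Foundation.h>\n"
        };
        for (String sample : samples) {
            String firstLine = sample.substring(0, sample.indexOf('\n'));
            System.out.println(firstLine
                + " | current rule: " + matchesCurrentRule(sample)
                + " | proposed rule: " + matchesProposedRule(sample));
        }
    }
}
```

Under these assumptions, a file beginning with "function y = ..." or with a "%" comment fails the current rule but satisfies the proposed one, while the Objective-C style sample satisfies neither.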




