Posted to dev@tika.apache.org by "Suman Kashyap (JIRA)" <ji...@apache.org> on 2016/03/04 04:04:40 UTC

[jira] [Updated] (TIKA-1892) Mime Magic for application/x-mobipocket-ebook and application/x-shapefile

     [ https://issues.apache.org/jira/browse/TIKA-1892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suman Kashyap updated TIKA-1892:
--------------------------------
    Description: 
Our FHT analysis of Mobipocket ebook files and shapefiles shows a high correlation among the initial header bytes. Further inspection of such files, drawn from publicly available sources and the TREC polar data sets, revealed common byte sequences that can be used for MIME type identification.

Patch content:
<mime-type type="application/x-netcdf">
  <acronym>NETCDF</acronym>
  <_comment>Network Common Data Format</_comment>
  <magic priority="60">
      <match value="CDF" type="string" offset="0" />
  </magic>
  <glob pattern="*.nc"/>
</mime-type>
<mime-type type="application/x-mobipocket-ebook">
  <acronym>MOBI</acronym>
  <_comment>Mobipocket Ebook</_comment>
  <magic priority="60">
      <match value="BOOKMOBI" type="string" offset="23" />
  </magic>
  <glob pattern="*.mobi"/>
</mime-type>
<mime-type type="application/x-shapefile">
  <acronym>ESRI Shapefiles</acronym>
  <_comment>ESRI Shapefiles</_comment>
  <magic priority="60">
      <match value="0x0000270a" type="big32" offset="2" />
  </magic>
  <glob pattern="*.shp"/>
</mime-type>
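
For readers who want to try the rules outside Tika, here is a minimal sketch of how magic entries like the ones above could be applied to a byte buffer. It is not Tika's implementation: the names RULES, matches and detect are hypothetical, the priority and glob attributes are ignored, and the offsets and values are copied verbatim from the patch without being checked against the format specifications.

```python
import struct

# Rules transcribed from the patch as posted:
# (mime type, match type, offset, expected value).
# Hypothetical table for illustration; offsets are taken verbatim from the patch.
RULES = [
    ("application/x-netcdf", "string", 0, b"CDF"),
    ("application/x-mobipocket-ebook", "string", 23, b"BOOKMOBI"),
    ("application/x-shapefile", "big32", 2, 0x0000270A),
]


def matches(data, kind, offset, expected):
    """Return True if `data` satisfies a single magic rule."""
    if kind == "string":
        # Compare the raw bytes found at the given offset.
        return data[offset:offset + len(expected)] == expected
    if kind == "big32":
        chunk = data[offset:offset + 4]
        # Input that is too short cannot hold a 32-bit value.
        return len(chunk) == 4 and struct.unpack(">I", chunk)[0] == expected
    raise ValueError(f"unsupported match type: {kind}")


def detect(data, default="application/octet-stream"):
    """Return the MIME type of the first rule that matches, else `default`."""
    for mime, kind, offset, expected in RULES:
        if matches(data, kind, offset, expected):
            return mime
    return default
```

Calling detect() on the first few dozen bytes of a sample file is enough to exercise each rule, since the deepest match ends at byte 31.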
 


  was: Our FHT analysis of Mobipocket ebook files and shapefiles shows a high correlation among the initial header bytes. Further inspection of such files, drawn from publicly available sources and the TREC polar data sets, revealed common byte sequences that can be used for MIME type identification.


> Mime Magic for application/x-mobipocket-ebook and application/x-shapefile
> -------------------------------------------------------------------------
>
>                 Key: TIKA-1892
>                 URL: https://issues.apache.org/jira/browse/TIKA-1892
>             Project: Tika
>          Issue Type: Improvement
>          Components: mime
>    Affects Versions: 1.12
>            Reporter: Suman Kashyap
>            Priority: Minor
>
> Our FHT analysis of Mobipocket ebook files and shapefiles shows a high correlation among the initial header bytes. Further inspection of such files, drawn from publicly available sources and the TREC polar data sets, revealed common byte sequences that can be used for MIME type identification.
> Patch content:
> <mime-type type="application/x-netcdf">
>   <acronym>NETCDF</acronym>
>   <_comment>Network Common Data Format</_comment>
>   <magic priority="60">
>       <match value="CDF" type="string" offset="0" />
>   </magic>
>   <glob pattern="*.nc"/>
> </mime-type>
> <mime-type type="application/x-mobipocket-ebook">
>   <acronym>MOBI</acronym>
>   <_comment>Mobipocket Ebook</_comment>
>   <magic priority="60">
>       <match value="BOOKMOBI" type="string" offset="23" />
>   </magic>
>   <glob pattern="*.mobi"/>
> </mime-type>
> <mime-type type="application/x-shapefile">
>   <acronym>ESRI Shapefiles</acronym>
>   <_comment>ESRI Shapefiles</_comment>
>   <magic priority="60">
>       <match value="0x0000270a" type="big32" offset="2" />
>   </magic>
>   <glob pattern="*.shp"/>
> </mime-type>
>  


