Posted to dev@tika.apache.org by "Andreas Meier (JIRA)" <ji...@apache.org> on 2018/03/16 14:29:00 UTC

[jira] [Created] (TIKA-2609) Refine Emacs Lisp file recognition (.elc)

Andreas Meier created TIKA-2609:
-----------------------------------

             Summary: Refine Emacs Lisp file recognition (.elc)
                 Key: TIKA-2609
                 URL: https://issues.apache.org/jira/browse/TIKA-2609
             Project: Tika
          Issue Type: Improvement
          Components: core
            Reporter: Andreas Meier


Some newer .elc files are not recognized by the current matcher
(tested with Emacs 24.4 files from [https://github.com/jwiegley/emacs-release/tree/master/lisp]).

I created a regex match that should handle these files, modeled on the Linux magic rules (shown first, followed by the proposed Tika definition):
{code:java}
# Emacs 18 - this is always correct, but not very magical.
0 string \012( Emacs v18 byte-compiled Lisp data
!:mime application/x-elc
# Emacs 19+ - ver. recognition added by Ian Springer
# Also applies to XEmacs 19+ .elc files; could tell them apart with regexs
# - Chris Chittleborough <cc...@yahoo.com.au>
0 string ;ELC
>4 byte >18
>4 byte <32 Emacs/XEmacs v%d byte-compiled Lisp data
!:mime application/x-elc{code}
{code:xml}
<mime-type type="application/x-elc">
  <_comment>Emacs Lisp bytecode</_comment>
  <magic priority="50">
    <!-- Emacs 18 -->
    <match value="\012(" type="string" offset="0" />
    <!-- Emacs 19+ (version byte 19..31, i.e. 0x13 to 0x1F) -->
    <match value=";ELC" type="string" offset="0" >
      <match value="[\\x13-\\x1F]" type="regex" offset="4"/>
    </match>
  </magic>
  <glob pattern="*.elc"/>
</mime-type>
{code}
Please verify the hex values (0x13 to 0x1F, corresponding to version bytes 19 to 31 in the magic rules) before committing.
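
As a cross-check of the range, here is a minimal Java sketch of what the two rules are meant to accept, written directly from the magic rules quoted above (">4 byte >18" and ">4 byte <32"). The class and method names (ElcMagicSketch, matchesEmacs18, matchesEmacs19Plus, looksLikeElc) are hypothetical and not part of Tika; this only illustrates the intended byte comparison and does not show how Tika evaluates the XML definition.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not Tika code: mirrors the magic rules quoted above.
final class ElcMagicSketch {

    private static final byte[] ELC_PREFIX = ";ELC".getBytes(StandardCharsets.US_ASCII);

    /** Emacs 18 rule: newline (octal 012 = 0x0A) followed by '(' at offset 0. */
    static boolean matchesEmacs18(byte[] head) {
        return head.length >= 2 && head[0] == 0x0A && head[1] == '(';
    }

    /** Emacs 19+ rule: ";ELC" at offset 0, then a version byte with 18 < v < 32. */
    static boolean matchesEmacs19Plus(byte[] head) {
        if (head.length < ELC_PREFIX.length + 1) {
            return false;
        }
        for (int i = 0; i < ELC_PREFIX.length; i++) {
            if (head[i] != ELC_PREFIX[i]) {
                return false;
            }
        }
        int version = head[4] & 0xFF;          // treat the byte as unsigned
        return version > 18 && version < 32;   // 19..31, i.e. 0x13..0x1F
    }

    /** True if either rule matches the first bytes of the file. */
    static boolean looksLikeElc(byte[] head) {
        return matchesEmacs18(head) || matchesEmacs19Plus(head);
    }

    public static void main(String[] args) {
        // Print the decimal bounds from the magic rules as hex for comparison
        // with the character class used in the regex.
        System.out.println("lower bound: 0x" + Integer.toHexString(19));
        System.out.println("upper bound: 0x" + Integer.toHexString(31));

        // Illustrative header bytes only (version byte 0x17 chosen as an example).
        byte[] sample = {';', 'E', 'L', 'C', 0x17, 0, 0, 0};
        System.out.println("sample matches: " + looksLikeElc(sample));
    }
}
```

Running the same checks against the first five bytes of a few real .elc files would show whether the regex range covers them.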

 

Regards

 

Andreas



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)