Posted to dev@tika.apache.org by "Andreas Meier (JIRA)" <ji...@apache.org> on 2018/03/22 08:01:00 UTC

[jira] [Commented] (TIKA-2609) Refine Emacs Lisp file recognition (.elc)

    [ https://issues.apache.org/jira/browse/TIKA-2609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16409192#comment-16409192 ] 

Andreas Meier commented on TIKA-2609:
-------------------------------------

Test files for Emacs 18 and earlier can be found at https://github.com/larsbrinkhoff/emacs-16.56
(the .elc files there are from Emacs 16, but Emacs 16 and Emacs 18 files should have the same structure).

> Refine Emacs Lisp file recognition (.elc)
> -----------------------------------------
>
>                 Key: TIKA-2609
>                 URL: https://issues.apache.org/jira/browse/TIKA-2609
>             Project: Tika
>          Issue Type: Improvement
>          Components: core
>            Reporter: Andreas Meier
>            Priority: Minor
>
> Some newer .elc files are not recognized properly by the current matcher.
>  (Tested with Emacs 24.4 files from [https://github.com/jwiegley/emacs-release/tree/master/lisp])
> I created a regex that should handle these files in the same way as the Linux magic entries:
> {code:java}
> # Emacs 18 - this is always correct, but not very magical.
> 0 string \012( Emacs v18 byte-compiled Lisp data
> !:mime application/x-elc
> # Emacs 19+ - ver. recognition added by Ian Springer
> # Also applies to XEmacs 19+ .elc files; could tell them apart with regexs
> # - Chris Chittleborough <cc...@yahoo.com.au>
> 0 string ;ELC
> >4 byte >18
> >4 byte <32 Emacs/XEmacs v%d byte-compiled Lisp data
> !:mime application/x-elc{code}
> {code:xml}
> <mime-type type="application/x-elc">
>   <_comment>Emacs Lisp bytecode</_comment>
>   <magic priority="50">
>     <!-- Emacs 18 -->
>     <match value="\012(" type="string" offset="0" />
>     <!-- Emacs 19 -->
>     <match value=";ELC" type="string" offset="0" >
>       <match value="[\\x13-\\x1F]" type="regex" offset="4"/>
>     </match>
>   </magic>
>   <glob pattern="*.elc"/>
> </mime-type>
> {code}
> Please verify the hex values before committing.
>  
> Regards
>  
> Andreas
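
For checking test files by hand, the two rules quoted above can be restated as a minimal sketch in plain Java. This is not Tika's detector API: the class name ElcSniffer and the method looksLikeElc are hypothetical and only mirror the quoted magic entries (a leading newline plus "(" for Emacs 18 and earlier, or ";ELC" followed by a version byte greater than 18 and less than 32, i.e. 0x13 to 0x1F).

```java
public class ElcSniffer {

    /**
     * Hypothetical helper, not part of Tika: returns true if the first bytes
     * of a file match one of the two .elc signatures from the quoted magic.
     */
    static boolean looksLikeElc(byte[] head) {
        // Emacs 18 and earlier: newline (octal 012, hex 0A) followed by '('.
        if (head.length >= 2 && head[0] == 0x0A && head[1] == '(') {
            return true;
        }
        // Emacs 19+ (and XEmacs 19+): ";ELC" followed by a version byte.
        if (head.length >= 5
                && head[0] == ';' && head[1] == 'E'
                && head[2] == 'L' && head[3] == 'C') {
            int version = head[4] & 0xFF;        // treat the byte as unsigned
            return version > 18 && version < 32; // same range as [\x13-\x1F]
        }
        return false;
    }
}
```

Passing the first five bytes of a file is enough, for example ElcSniffer.looksLikeElc(java.util.Arrays.copyOf(fileBytes, 5)), where fileBytes is assumed to hold the file contents. The bounds 19 and 31 correspond to 0x13 and 0x1F, which is the range used in the proposed regex.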


