Posted to dev@tika.apache.org by "Andreas Meier (JIRA)" <ji...@apache.org> on 2018/03/22 08:01:00 UTC
[jira] [Commented] (TIKA-2609) Refine Emacs Lisp file recognition (.elc)
[ https://issues.apache.org/jira/browse/TIKA-2609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16409192#comment-16409192 ]
Andreas Meier commented on TIKA-2609:
-------------------------------------
Test files for Emacs 18 and earlier can be found at https://github.com/larsbrinkhoff/emacs-16.56
(the .elc files there are from Emacs 16, but the structure of Emacs 16 and Emacs 18 files should be the same)
> Refine Emacs Lisp file recognition (.elc)
> -----------------------------------------
>
> Key: TIKA-2609
> URL: https://issues.apache.org/jira/browse/TIKA-2609
> Project: Tika
> Issue Type: Improvement
> Components: core
> Reporter: Andreas Meier
> Priority: Minor
>
> Some newer .elc files are not recognized properly by the current matcher.
> (Tested with Emacs 24.4 files from [https://github.com/jwiegley/emacs-release/tree/master/lisp])
> I created a regex-based matcher that should handle these files, similar to the Linux magic rules:
> {code:java}
> # Emacs 18 - this is always correct, but not very magical.
> 0 string \012( Emacs v18 byte-compiled Lisp data
> !:mime application/x-elc
> # Emacs 19+ - ver. recognition added by Ian Springer
> # Also applies to XEmacs 19+ .elc files; could tell them apart with regexs
> # - Chris Chittleborough <cc...@yahoo.com.au>
> 0 string ;ELC
> >4 byte >18
> >4 byte <32 Emacs/XEmacs v%d byte-compiled Lisp data
> !:mime application/x-elc
> {code}
> {code:xml}
> <mime-type type="application/x-elc">
> <_comment>Emacs Lisp bytecode</_comment>
> <magic priority="50">
> <!-- Emacs 18 -->
> <match value="\012(" type="string" offset="0" />
> <!-- Emacs 19 -->
> <match value=";ELC" type="string" offset="0" >
> <match value="[\\x13-\\x1F]" type="regex" offset="4"/>
> </match>
> </magic>
> <glob pattern="*.elc"/>
> </mime-type>
> {code}
> Please verify the hex values before committing.
>
> Regards
>
> Andreas
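
For anyone checking the hex values, the two rules quoted above amount to the byte-level test sketched below. This is only an illustration, not Tika code: the class and method names (ElcMagicSketch, looksLikeElc) are made up for this example, and it assumes the version byte range 19..31 taken from the Linux magic entry (byte >18 and <32), which corresponds to 0x13..0x1F.

```java
// Illustrative sketch only; ElcMagicSketch and looksLikeElc are hypothetical
// names and are not part of Tika.
final class ElcMagicSketch {

    // Returns true if the leading bytes match either quoted rule.
    static boolean looksLikeElc(byte[] head) {
        // Emacs 18 and earlier: newline (octal 012 = 0x0A) followed by '('.
        if (head.length >= 2 && head[0] == 0x0A && head[1] == '(') {
            return true;
        }
        // Emacs/XEmacs 19+: ";ELC" at offset 0, then a version byte at offset 4.
        if (head.length >= 5
                && head[0] == ';' && head[1] == 'E'
                && head[2] == 'L' && head[3] == 'C') {
            int version = head[4] & 0xFF;               // treat the byte as unsigned
            return version >= 0x13 && version <= 0x1F;  // 19..31, as in the magic entry
        }
        return false;
    }

    public static void main(String[] args) {
        byte[] oldStyle = {0x0A, '(', 'd', 'e', 'f'};
        byte[] newStyle = {';', 'E', 'L', 'C', 0x17};
        byte[] outOfRange = {';', 'E', 'L', 'C', 0x20};
        System.out.println(looksLikeElc(oldStyle));
        System.out.println(looksLikeElc(newStyle));
        System.out.println(looksLikeElc(outOfRange));
    }
}
```

The sample version bytes in main are arbitrary values chosen inside and outside the range; they are not taken from real .elc files.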
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)