Posted to dev@tika.apache.org by "Andreas Meier (JIRA)" <ji...@apache.org> on 2018/03/15 09:52:00 UTC

[jira] [Commented] (TIKA-2574) Extend PCX detection in tika-mimetypes.xml

    [ https://issues.apache.org/jira/browse/TIKA-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16400138#comment-16400138 ] 

Andreas Meier commented on TIKA-2574:
-------------------------------------

Added a link to the original published specification, taken from the IANA registration.

> Extend PCX detection in tika-mimetypes.xml
> ------------------------------------------
>
>                 Key: TIKA-2574
>                 URL: https://issues.apache.org/jira/browse/TIKA-2574
>             Project: Tika
>          Issue Type: Sub-task
>          Components: detector
>    Affects Versions: 1.17
>            Reporter: Andreas Meier
>            Priority: Major
>         Attachments: IUC10-da-Q.UTF-16LE.without-BOM, IUC10-da-Q.UTF-32LE.without-BOM, IUC10-da.UTF-16LE.without-BOM, IUC10-it.UTF-16LE.without-BOM, Test.pcx, Test_without_filehandle
>
>
> The matcher for PCX should be reworked to avoid false positives on UTF-16LE and UTF-32LE text files.
> I suggest also matching the filler bytes from the header, as described in the original [pcx specification|https://www.iana.org/assignments/media-types/image/vnd.zbrush.pcx].
>  
> {code:xml}
> <mime-type type="image/vnd.zbrush.pcx">
>   <acronym>PCX</acronym>
>   <_comment>ZSoft Paintbrush PiCture eXchange</_comment>
>   <alias type="image/x-pcx"/>
>   <alias type="image/x-pc-paintbrush"/>
>   <magic priority="40">
>   <match value="0x0A" type="string" offset="0">
>     <!-- Bytes 74 to 127 (54 bytes) are filler that pads the header to 128 bytes. All of them are set to 0. -->
>     <!-- Requiring them avoids false positives for text/plain;charset=UTF-16LE and text/plain;charset=UTF-32LE -->
>     <match value="0x000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000" type="string" offset="74">
>       <match value="0x00" type="string" offset="1"/>
>       <match value="0x02" type="string" offset="1"/>
>       <match value="0x03" type="string" offset="1"/>
>       <match value="0x04" type="string" offset="1"/>
>       <match value="0x05" type="string" offset="1"/>
>     </match>
>   </match>
>   </magic>
>   <glob pattern="*.pcx"/>
> </mime-type>
> {code}
>  
> I attached some test files.
> [~gagravarr] Can you please check this?
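
To illustrate why the filler check helps, here is a minimal Python sketch of the two checks. It is not Tika's matcher code: the function names, the prefix-only check used for comparison, and both sample inputs are assumptions made up for this illustration. It only mirrors the offsets and values from the XML above.

```python
# Illustrative sketch only; names and sample data are hypothetical.
PCX_VERSIONS = {0x00, 0x02, 0x03, 0x04, 0x05}  # version bytes listed in the proposed magic
FILLER = slice(74, 128)                        # 54 filler bytes that should all be zero


def prefix_only_match(data: bytes) -> bool:
    """Check only byte 0 (0x0A) and the version byte at offset 1."""
    return len(data) >= 2 and data[0] == 0x0A and data[1] in PCX_VERSIONS


def proposed_match(data: bytes) -> bool:
    """Same prefix check, plus the requirement that bytes 74..127 are zero."""
    return (
        prefix_only_match(data)
        and len(data) >= 128
        and data[FILLER] == bytes(54)
    )


# Assumed sample: BOM-less UTF-16LE text that happens to start with a newline,
# so its first two bytes are 0x0A 0x00.
utf16_text = ("\n" + "This is a test. " * 10).encode("utf-16-le")

# Assumed sample: a bare 128-byte PCX-like header (0x0A, version 5, rest zero).
pcx_header = bytes([0x0A, 0x05]) + bytes(126)

print(prefix_only_match(utf16_text))  # prefix alone cannot tell text from PCX
print(proposed_match(utf16_text))     # filler region contains text bytes
print(proposed_match(pcx_header))
```

A text file only passes the prefix check when it starts with a line feed, but once it does, nothing else in the first two bytes distinguishes it from a PCX header. The filler region is a much stronger signal because encoded text of that length has non-zero bytes there.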


