Posted to dev@tika.apache.org by "Hudson (Jira)" <ji...@apache.org> on 2023/06/14 23:16:00 UTC

[jira] [Commented] (TIKA-4074) Add magic for TeX Virtual Font format

    [ https://issues.apache.org/jira/browse/TIKA-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17732787#comment-17732787 ] 

Hudson commented on TIKA-4074:
------------------------------

SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1116 (See [https://ci-builds.apache.org/job/Tika/job/tika-main-jdk11/1116/])
TIKA-4074 -- TeX Virtual Font format (tallison: [https://github.com/apache/tika/commit/50b8532c40edf092c3363125fac5ebbe9fee7c80])
* (edit) tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml


> Add magic for TeX Virtual Font format
> -------------------------------------
>
>                 Key: TIKA-4074
>                 URL: https://issues.apache.org/jira/browse/TIKA-4074
>             Project: Tika
>          Issue Type: Sub-task
>            Reporter: Gregory Lepore
>            Priority: Minor
>         Attachments: aebx10.vf, aebx12.vf, aebxsl10.vf
>
>
> The TeX Virtual Font format occurs 6,047 times in the second most recent Common Crawl dataset (and over 3,000 times in the latest one). It has no known MIME type. The magic is:
>  
> F7CA{9}F300{4}0010 at offset 0.
>  
> The above signature will catch most TeX vf files, but some will be missed. There were no false positives, though, so I think it's a good compromise that catches the majority of sample files.
>  
> It would be nice to see the results of additional testing.
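
For anyone who wants to try the signature quoted above against their own sample files, here is a minimal sketch in Python. It assumes that each {n} in the pattern means "skip n arbitrary bytes" and that the hex runs are literal bytes anchored from offset 0. The names SIGNATURE, compile_signature and matches_signature are made up for this illustration; this is not how Tika itself evaluates magic.

```python
import re

# Pattern as quoted in the issue; {n} is ASSUMED to mean "skip n arbitrary bytes".
SIGNATURE = "F7CA{9}F300{4}0010"


def compile_signature(pattern):
    """Turn a hex pattern with {n} gaps into (offset, literal_bytes) pairs."""
    parts = []
    offset = 0
    # Each token is either a run of hex digits or a {n} gap.
    for hex_run, gap in re.findall(r"([0-9A-Fa-f]+)|\{(\d+)\}", pattern):
        if hex_run:
            literal = bytes.fromhex(hex_run)
            parts.append((offset, literal))
            offset += len(literal)
        else:
            offset += int(gap)
    return parts


def matches_signature(data, pattern=SIGNATURE):
    """Return True if every literal run appears at its expected offset in data."""
    return all(
        data[off:off + len(lit)] == lit
        for off, lit in compile_signature(pattern)
    )


if __name__ == "__main__":
    import sys

    # Usage (hypothetical): python check_vf.py aebx10.vf aebx12.vf
    for path in sys.argv[1:]:
        with open(path, "rb") as fh:
            head = fh.read(64)  # the pattern only spans the first 19 bytes
        print(path, matches_signature(head))
```

Under that reading, the literal runs sit at offsets 0, 11 and 17, so only the first 19 bytes of a file need to be read. Running the script over a mixed directory of files would be one way to produce the additional testing asked for above.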



--
This message was sent by Atlassian Jira
(v8.20.10#820010)