Posted to dev@tika.apache.org by "Tetiana Tvardovska (Jira)" <ji...@apache.org> on 2021/11/11 19:11:00 UTC

[jira] [Updated] (TIKA-3590) OSX DMG files wrong MIME type detection (wrong MediaType and Supertype)

     [ https://issues.apache.org/jira/browse/TIKA-3590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tetiana Tvardovska updated TIKA-3590:
-------------------------------------
    Component/s: detector

> OSX DMG files wrong MIME type detection (wrong MediaType and Supertype)
> -----------------------------------------------------------------------
>
>                 Key: TIKA-3590
>                 URL: https://issues.apache.org/jira/browse/TIKA-3590
>             Project: Tika
>          Issue Type: Bug
>          Components: core, detector
>    Affects Versions: 1.26, 1.27, 2.0.0-ALPHA, 2.0.0-BETA, 2.1.0
>            Reporter: Tetiana Tvardovska
>            Priority: Major
>
> Calling {{mimeSupport.detectMimeTypes}} for OSX DMG files returns the wrong value.
> DMG files are detected as MIME type *{{"application/zlib"}}* or *{{"application/x-bzip"}}*
> instead of the expected *{{"application/x-apple-diskimage"}}*.
>  
> The error is caused by the {{getSupertype}} method, which returns a type that is too general ({{MediaType.OCTET_STREAM}}) for OSX DMG files instead of *{{"application/zlib"}}* or *{{"application/x-bzip"}}*.
>  
> For reference, the DMG MIME type is detected correctly when debugging the following method:
>  
> {code:java}
> org/apache/tika/mime/MimeTypes.java:484  public MediaType detect(...
> 522:  MimeType hint = getMimeType(name); 
> {code}
> Here {{hint}} gets the correct value, *{{"application/x-apple-diskimage"}}*.
> Later, however, {{hint}} is not taken into account for {{possibleTypes}}, as the result of {{applyHint}} shows:
>  
> {code:java}
> 529:  possibleTypes = applyHint(possibleTypes, hint);{code}
>  
> This wrong value is then returned to:
>  
> {code:java}
> // repository/org/apache/tika/tika-core/1.26/tika-core-1.26-sources.jar!/org/apache/tika/detect/CompositeDetector.java:84
> MediaType detected = detector.detect(input, metadata);
> if (registry.isSpecializationOf(detected, type)) {
>     type = detected;
> }
> {code}
>  
>  
> h3. Possible solution: add more precise supertype detection for the *{{application/x-apple-diskimage}}* type
> Add one more check to the {{MediaTypeRegistry.getSupertype}} method, for example (in a diff-like format):
> {{org/apache/tika/tika-core/1.26/tika-core-1.26-sources.jar}}
> {{org/apache/tika/mime/MediaTypeRegistry.java:187}}
>  
> {code:java}
> public MediaType getSupertype(MediaType type) {
>  ...
> +    } else if (type.getSubtype().endsWith("x-apple-diskimage")) { 
> +        return MediaType.application("x-bzip");
> +    }
> ...
> }
> {code}
>  
> or
> {code:java}
> public MediaType getSupertype(MediaType type) {
>  ...
> +    } else if (type.getSubtype().endsWith("x-apple-diskimage")) { 
> +        return MediaType.APPLICATION_ZIP;
> +    }
> ...
> }
> {code}
>  
>  
> ---
> Tested with the [Sonatype Nexus|https://github.com/sonatype/nexus-public/] project, {{release-3.36.0-01}}, on a RAW repository with "Strict Content Type Validation" turned ON, when trying to upload *.dmg files.
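
The hint behaviour described above can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions, not Tika's actual code: the class {{HintModel}} and its methods are hypothetical, media types are plain strings, and the fallback of an undeclared type to {{application/octet-stream}} mirrors what the report says {{getSupertype}} returns for DMG files. It shows why a file-name hint is dropped when it is not a specialization of the magic-detected type, and how declaring a closer supertype would let the hint win.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Simplified, hypothetical model of the hint logic; NOT Tika's real classes. */
public class HintModel {
    private static final String OCTET_STREAM = "application/octet-stream";

    // Assumed registry: child type -> explicitly declared supertype.
    private final Map<String, String> supertypes = new HashMap<>();

    public void declare(String type, String supertype) {
        supertypes.put(type, supertype);
    }

    // Types with no declared parent fall back to octet-stream (the root).
    public String supertypeOf(String type) {
        if (type.equals(OCTET_STREAM)) {
            return null;
        }
        return supertypes.getOrDefault(type, OCTET_STREAM);
    }

    // True if b appears anywhere in a's supertype chain.
    public boolean isSpecializationOf(String a, String b) {
        for (String s = supertypeOf(a); s != null; s = supertypeOf(s)) {
            if (s.equals(b)) {
                return true;
            }
        }
        return false;
    }

    // The hint is used only if it equals or specializes a magic-detected type.
    public String applyHint(List<String> possibleTypes, String hint) {
        if (possibleTypes.isEmpty()) {
            return hint;
        }
        for (String t : possibleTypes) {
            if (hint.equals(t) || isSpecializationOf(hint, t)) {
                return hint;
            }
        }
        return possibleTypes.get(0); // hint did not help
    }

    public static void main(String[] args) {
        String dmg = "application/x-apple-diskimage";
        List<String> magic = Collections.singletonList("application/zlib");

        HintModel current = new HintModel(); // DMG has no declared supertype
        System.out.println(current.applyHint(magic, dmg));

        HintModel patched = new HintModel(); // DMG declared as a kind of zlib
        patched.declare(dmg, "application/zlib");
        System.out.println(patched.applyHint(magic, dmg));
    }
}
```

In this model the first call prints {{application/zlib}} and the second prints {{application/x-apple-diskimage}}. Whether the closer supertype is best expressed in {{getSupertype}} or elsewhere in the type registry is left open here.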


