Posted to dev@tika.apache.org by "Tetiana Tvardovska (Jira)" <ji...@apache.org> on 2021/11/19 16:54:00 UTC

[jira] [Comment Edited] (TIKA-3590) OSX DMG files wrong MIME type detection (wrong MediaType and Supertype)

    [ https://issues.apache.org/jira/browse/TIKA-3590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17446576#comment-17446576 ] 

Tetiana Tvardovska edited comment on TIKA-3590 at 11/19/21, 4:53 PM:
---------------------------------------------------------------------

[~nick], unfortunately, I do not know how to _generate_ DMG test files.

I can only provide links to the DMG files I used for testing DMG uploads:
 # ~10 MB DMG file: [https://magnumbytes.com/downloads/releases/nimble-commander.dmg]
 # ~19 MB DMG file: [https://global.download.synology.com/download/Utility/Assistant/7.0.1-50044/Mac/synology-assistant-7.0.1-50044.dmg?model=DS1621%2B&bays=6&dsm_version=7.0.1&build_number=42218]

 


> OSX DMG files wrong MIME type detection (wrong MediaType and Supertype)
> -----------------------------------------------------------------------
>
>                 Key: TIKA-3590
>                 URL: https://issues.apache.org/jira/browse/TIKA-3590
>             Project: Tika
>          Issue Type: Bug
>          Components: core, detector
>    Affects Versions: 1.26, 1.27, 2.0.0-ALPHA, 2.0.0-BETA, 2.1.0
>            Reporter: Tetiana Tvardovska
>            Priority: Major
>
> Calling {{mimeSupport.detectMimeTypes}} for OSX DMG files returns a wrong value.
> DMG files are detected as MIME type *{{"application/zlib"}}* or *{{"application/x-bzip"}}*
> instead of the expected *{{"application/x-apple-diskimage"}}*.
>  
> The error is caused by the {{getSupertype}} method, which returns a supertype that is too general ({{MediaType.OCTET_STREAM}}) for OSX DMG files instead of *{{"application/zlib"}}* or *{{"application/x-bzip"}}*.
>  
> For information, the DMG MIME type is correctly detected when debugging the {{detect}} method:
>  
> {code:java}
> org/apache/tika/mime/MimeTypes.java:484  public MediaType detect(...
> 522:  MimeType hint = getMimeType(name); 
> {code}
> Here the {{hint}} value is correctly set to *{{"application/x-apple-diskimage"}}*.
> But later {{applyHint}} does not take the {{hint}} value into account when computing {{possibleTypes}}:
>  
> {code:java}
> 529:  possibleTypes = applyHint(possibleTypes, hint);{code}
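
The behaviour described above can be reproduced with a small stand-alone model. This is only a sketch, not Tika's code: plain strings stand in for {{MimeType}}/{{MediaType}}, the {{HintSketch}} class and its {{PARENTS}} map are hypothetical, and the rule inside {{applyHint}} (the hint wins only if it equals or specializes one of the magic-based candidates) is an assumption about how the real method behaves.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the hint step; strings stand in for Tika's MimeType/MediaType.
final class HintSketch {
    static final String OCTET_STREAM = "application/octet-stream";

    // Hypothetical registry: child type -> registered parent. A type with no
    // entry is treated as a direct child of application/octet-stream.
    static final Map<String, String> PARENTS = new HashMap<>();

    static String supertypeOf(String type) {
        if (type.equals(OCTET_STREAM)) {
            return null; // root of the hierarchy
        }
        return PARENTS.getOrDefault(type, OCTET_STREAM);
    }

    // True when 'b' appears somewhere above 'a' in the parent chain.
    static boolean isSpecializationOf(String a, String b) {
        for (String t = supertypeOf(a); t != null; t = supertypeOf(t)) {
            if (t.equals(b)) {
                return true;
            }
        }
        return false;
    }

    // Assumed rule: the name-based hint replaces the magic-based candidates
    // only if it equals one of them or specializes one of them.
    static List<String> applyHint(List<String> possibleTypes, String hint) {
        if (possibleTypes.isEmpty()) {
            return Collections.singletonList(hint);
        }
        for (String type : possibleTypes) {
            if (hint.equals(type) || isSpecializationOf(hint, type)) {
                return Collections.singletonList(hint);
            }
        }
        return possibleTypes; // the hint did not help
    }

    public static void main(String[] args) {
        List<String> fromMagic = Collections.singletonList("application/zlib");
        // The DMG type has no registered parent here, so the hint is dropped.
        System.out.println(applyHint(fromMagic, "application/x-apple-diskimage"));
        // prints [application/zlib]
    }
}
```

In this model the hint is discarded because the only ancestor of {{application/x-apple-diskimage}} is {{application/octet-stream}}, which matches the supertype problem reported above.
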
>  
> This wrong value is then returned to {{CompositeDetector}}:
>  
> {code:java}
> // repository/org/apache/tika/tika-core/1.26/tika-core-1.26-sources.jar!/org/apache/tika/detect/CompositeDetector.java:84
> MediaType detected = detector.detect(input, metadata);
> if (registry.isSpecializationOf(detected, type)) {
>     type = detected;
> }
> {code}
>  
>  
> h3. Possible solution: add a more precise supertype detection for the {{application/x-apple-diskimage}} type
> Just add one more check to the {{MediaTypeRegistry.getSupertype}} method, for example, in a diff-like format:
> {{org/apache/tika/tika-core/1.26/tika-core-1.26-sources.jar}}
> {{org/apache/tika/mime/MediaTypeRegistry.java:187}}
>  
> {code:java}
> public MediaType getSupertype(MediaType type) {
>  ...
> +    } else if (type.getSubtype().endsWith("x-apple-diskimage")) {
> +        return MediaType.application("x-bzip");
> +    }
> ...
> }
> {code}
>  
> or
> {code:java}
> public MediaType getSupertype(MediaType type) {
>  ...
> +    } else if (type.getSubtype().endsWith("x-apple-diskimage")) { 
> +        return MediaType.APPLICATION_ZIP;
> +    }
> ...
> }
> {code}
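
The effect of such an extra branch on the specialization check can be illustrated with another stand-alone model. Again this is a sketch under assumptions: {{SupertypeSketch}} is a hypothetical class, strings stand in for {{MediaType}}, and the branch order is not taken from the real {{MediaTypeRegistry.getSupertype}} body. Only the first variant (parent {{application/x-bzip}}) is modelled.

```java
// Toy model only: strings stand in for Tika's MediaType.
final class SupertypeSketch {
    static final String OCTET_STREAM = "application/octet-stream";

    // withProposedBranch switches the extra x-apple-diskimage branch on or off.
    static String getSupertype(String type, boolean withProposedBranch) {
        if (type.equals(OCTET_STREAM)) {
            return null; // root of the hierarchy
        } else if (withProposedBranch && type.endsWith("x-apple-diskimage")) {
            return "application/x-bzip"; // first variant of the proposal
        }
        return OCTET_STREAM;
    }

    // Walks the parent chain of 'a' and reports whether 'b' is on it.
    static boolean isSpecializationOf(String a, String b, boolean withProposedBranch) {
        String t = getSupertype(a, withProposedBranch);
        while (t != null) {
            if (t.equals(b)) {
                return true;
            }
            t = getSupertype(t, withProposedBranch);
        }
        return false;
    }

    public static void main(String[] args) {
        String dmg = "application/x-apple-diskimage";
        System.out.println(isSpecializationOf(dmg, "application/x-bzip", false)); // false
        System.out.println(isSpecializationOf(dmg, "application/x-bzip", true));  // true
        System.out.println(isSpecializationOf(dmg, "application/zlib", true));    // false
    }
}
```

In this model a single parent lets the DMG type specialize only one of the two container types reported above, so a file whose magic bytes say {{application/zlib}} would still fail the check with the first variant. The second variant returns {{MediaType.APPLICATION_ZIP}} ({{application/zip}}) as the parent instead, which this model would treat the same way.
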
>  
>  
> ---
> Tested with the [Sonatype Nexus|https://github.com/sonatype/nexus-public/] project, {{release-3.36.0-01}}, using a RAW repository with "Strict Content Type Validation" set ON, when trying to upload *.dmg files.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)