Posted to dev@tika.apache.org by "Tetiana Tvardovska (Jira)" <ji...@apache.org> on 2021/11/19 16:54:00 UTC
[jira] [Comment Edited] (TIKA-3590) OSX DMG files wrong MIME type detection (wrong MediaType and Supertype)
[ https://issues.apache.org/jira/browse/TIKA-3590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17446576#comment-17446576 ]
Tetiana Tvardovska edited comment on TIKA-3590 at 11/19/21, 4:53 PM:
---------------------------------------------------------------------
[~nick], unfortunately, I do not know how to _generate_ DMG test files.
I can only provide links to the DMG files I used for testing DMG uploads:
# ~10Mb DMG file: [https://magnumbytes.com/downloads/releases/nimble-commander.dmg]
# ~19Mb DMG file: [https://global.download.synology.com/download/Utility/Assistant/7.0.1-50044/Mac/synology-assistant-7.0.1-50044.dmg?model=DS1621%2B&bays=6&dsm_version=7.0.1&build_number=42218]
was (Author: JIRAUSER280065):
[~nick], unfortunately, I do not know how to _generate_ DMG test files.
I can only provide links to the DMG files I used for testing DMG uploads:
# ~10Bm DMG file: [https://magnumbytes.com/downloads/releases/nimble-commander.dmg]
# ~19Mb DMG file: [https://global.download.synology.com/download/Utility/Assistant/7.0.1-50044/Mac/synology-assistant-7.0.1-50044.dmg?model=DS1621%2B&bays=6&dsm_version=7.0.1&build_number=42218]
> OSX DMG files wrong MIME type detection (wrong MediaType and Supertype)
> -----------------------------------------------------------------------
>
> Key: TIKA-3590
> URL: https://issues.apache.org/jira/browse/TIKA-3590
> Project: Tika
> Issue Type: Bug
> Components: core, detector
> Affects Versions: 1.26, 1.27, 2.0.0-ALPHA, 2.0.0-BETA, 2.1.0
> Reporter: Tetiana Tvardovska
> Priority: Major
>
> Calling {{mimeSupport.detectMimeTypes}} for OSX DMG files returns a wrong value.
> DMG files are detected as MIME type: {{*"application/zlib"*}} or *{{"application/x-bzip"}}*
> instead of the expected: *{{"application/x-apple-diskimage"}}*.
>
> The error is caused by the {{getSupertype}} method, which returns a wrong type (too "super": {{MediaType.OCTET_STREAM}}) for OSX DMG files instead of *{{"application/zlib"}}* or *{{"application/x-bzip"}}*.
>
> For information, the DMG MIME type is correctly detected when debugging the method:
>
> {code:java}
> // org/apache/tika/mime/MimeTypes.java:484
> public MediaType detect(...
> // line 522:
> MimeType hint = getMimeType(name);
> {code}
> Here {{hint}} gets the correct *{{"application/x-apple-diskimage"}}* value.
> But later the {{hint}} value is not carried over into {{possibleTypes}} by the {{applyHint}} call:
>
> {code:java}
> // line 529:
> possibleTypes = applyHint(possibleTypes, hint);
> {code}
>
> This wrong value is then returned to:
>
> {code:java}
> // repository/org/apache/tika/tika-core/1.26/tika-core-1.26-sources.jar!/org/apache/tika/detect/CompositeDetector.java:84
> MediaType detected = detector.detect(input, metadata);
> if (registry.isSpecializationOf(detected, type)) {
>     type = detected;
> }
> {code}
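The effect of that specialization check can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Tika's code: the class `CompositeLoopSketch`, its string-based types, and the methods `addSuperType` and `combine` are hypothetical names invented for illustration, and every type without a registered parent is assumed to fall back to `application/octet-stream`, as the report describes for DMG.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of a composite detection loop; NOT Tika's code. All names are hypothetical. */
final class CompositeLoopSketch {
    static final String OCTET_STREAM = "application/octet-stream";

    // child type -> registered parent type; unregistered types fall back to OCTET_STREAM
    private final Map<String, String> parents = new HashMap<>();

    void addSuperType(String child, String parent) {
        parents.put(child, parent);
    }

    /** Parent of a type, or null for the root type. */
    String supertypeOf(String type) {
        if (OCTET_STREAM.equals(type)) {
            return null;
        }
        return parents.getOrDefault(type, OCTET_STREAM);
    }

    /** True if a is a strict descendant of b along the parent chain. */
    boolean isSpecializationOf(String a, String b) {
        for (String t = supertypeOf(a); t != null; t = supertypeOf(t)) {
            if (t.equals(b)) {
                return true;
            }
        }
        return false;
    }

    /** Keeps a detector result only if it specializes the best type found so far. */
    String combine(List<String> detectorResults) {
        String type = OCTET_STREAM;
        for (String detected : detectorResults) {
            if (isSpecializationOf(detected, type)) {
                type = detected;
            }
        }
        return type;
    }

    public static void main(String[] args) {
        CompositeLoopSketch registry = new CompositeLoopSketch();
        // The DMG type is not a descendant of zlib here, so the earlier result is kept.
        System.out.println(registry.combine(
                List.of("application/zlib", "application/x-apple-diskimage")));
    }
}
```

In this model, once a result such as `application/zlib` has been accepted, a later `application/x-apple-diskimage` result is ignored unless the DMG type is registered as a descendant of it.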
>
>
> h3. Possible solution - Add a more precise Supertype detection for the "*{{application/x-apple-diskimage}}*" type
> Just add one more check to the {{MediaTypeRegistry.getSupertype}} method, for example, in a 'diff'-like format:
> {{org/apache/tika/tika-core/1.26/tika-core-1.26-sources.jar}}
> {{org/apache/tika/mime/MediaTypeRegistry.java:187}}
>
> {code:java}
> public MediaType getSupertype(MediaType type) {
>     ...
> +   } else if (type.getSubtype().endsWith("x-apple-diskimage")) {
> +       return MediaType.application("x-bzip");
> +   }
>     ...
> }
> {code}
>
> or
> {code:java}
> public MediaType getSupertype(MediaType type) {
>     ...
> +   } else if (type.getSubtype().endsWith("x-apple-diskimage")) {
> +       return MediaType.APPLICATION_ZIP;
> +   }
>     ...
> }
> {code}
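Why an extra supertype relation would let the file-name hint win can be shown with another toy model. It is a sketch of the behavior described in this report, not Tika's actual source: the class `HintSketch`, its string-based types, and its `addSuperType`, `isSpecializationOf`, and `applyHint` methods are hypothetical stand-ins, and the rule that a hint is kept only when it equals or specializes a magic-based candidate is an assumption taken from the description above.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of hint handling; NOT Tika's code. All names are hypothetical. */
final class HintSketch {
    static final String OCTET_STREAM = "application/octet-stream";
    static final String DMG = "application/x-apple-diskimage";

    // child type -> registered parent type; unregistered types fall back to OCTET_STREAM
    private final Map<String, String> parents = new HashMap<>();

    void addSuperType(String child, String parent) {
        parents.put(child, parent);
    }

    /** True if a is a strict descendant of b along the parent chain. */
    boolean isSpecializationOf(String a, String b) {
        String t = a;
        while (!OCTET_STREAM.equals(t)) {
            t = parents.getOrDefault(t, OCTET_STREAM);
            if (t.equals(b)) {
                return true;
            }
        }
        return false;
    }

    /** The name-based hint replaces the candidates only if it equals or specializes one of them. */
    List<String> applyHint(List<String> possibleTypes, String hint) {
        if (possibleTypes.isEmpty()) {
            return List.of(hint);
        }
        for (String type : possibleTypes) {
            if (hint.equals(type) || isSpecializationOf(hint, type)) {
                return List.of(hint);
            }
        }
        return possibleTypes; // the hint did not help
    }

    public static void main(String[] args) {
        HintSketch registry = new HintSketch();
        List<String> magic = List.of("application/x-bzip");
        System.out.println(registry.applyHint(magic, DMG)); // hint dropped
        registry.addSuperType(DMG, "application/x-bzip");   // the proposed relation
        System.out.println(registry.applyHint(magic, DMG)); // hint kept
    }
}
```

Note that in this model a single parent covers only one of the two magic results: with `application/x-bzip` as the parent, a DMG whose content is detected as `application/zlib` would still lose its hint.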
>
>
> ---
> Tested with the [Sonatype Nexus|https://github.com/sonatype/nexus-public/] project, {{release-3.36.0-01}}, on a RAW repository with "Strict Content Type Validation" set ON, when trying to upload *.dmg files.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)