Posted to dev@sling.apache.org by "Lars Trieloff (JIRA)" <ji...@apache.org> on 2009/07/21 18:56:14 UTC

[jira] Created: (SLING-1059) Mime Type Detection for WebDAV should use Apache Tika

Mime Type Detection for WebDAV should use Apache Tika
-----------------------------------------------------

                 Key: SLING-1059
                 URL: https://issues.apache.org/jira/browse/SLING-1059
             Project: Sling
          Issue Type: Improvement
            Reporter: Lars Trieloff
            Priority: Minor


Sling's WebDAV servlet currently has its own MIME type detection mechanism. Given that Tika is a dependency of the soon-to-be-added Jackrabbit 2.0, we can reuse Tika's MIME type detection mechanism for greater accuracy and a better-maintained list of MIME-type-to-extension mappings.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SLING-1059) Mime Type Detection for WebDAV should use Apache Tika

Posted by "Philipp Koch (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SLING-1059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12760555#action_12760555 ] 

Philipp Koch commented on SLING-1059:
-------------------------------------

The fact is that there are many MIME type mapping/detection "libs" available (Sling, Jackrabbit, Tika) which require more or less the same functionality. From my point of view it makes absolute sense to have ONE lib that is used by all projects that require MIME type mapping/detection, and since Jackrabbit already uses Tika and Sling runs on top of Jackrabbit, it makes even more sense to me! The advantages are obvious:
- single place to update the MIME type table
- well-known API for MIME type detection/mapping

So what I suggest is to gather all the different requirements, discuss them, implement what is needed in the Tika lib (since I believe that this is the right place), make sure that the footprint is small, and use that library in Sling and Jackrabbit.




[jira] Commented: (SLING-1059) Mime Type Detection for WebDAV should use Apache Tika

Posted by "Felix Meschberger (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SLING-1059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734033#action_12734033 ] 

Felix Meschberger commented on SLING-1059:
------------------------------------------

This is probably wrong ;-)

Sling's WebDAV uses Sling's MIME type service to resolve names to MIME types, and the MIME type list of the Sling MIME type service is based on the MIME types list maintained by Roy Fielding for the Apache httpd project. So this list is probably the best maintained list.

The problem with Tika is that it is anything but lean -- in fact it is quite heavyweight, at least the last time I looked at it. In addition, I see no added functionality in Tika either, except for a few non-standard (read "private") MIME types for RAW image formats. In fact, the Sling MIME type service is extensible enough to add the missing Tika-specific name-to-type mappings as an extension to the Sling MIME type service.

So -1 to using tika for now.
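
The extension point mentioned in this comment can be pictured with a minimal sketch. It is not Sling code: NameBasedMimeTypeProvider is a hypothetical stand-in that mirrors the getMimeType(String) and getExtension(String) methods of Sling's MimeTypeProvider so that the example compiles on its own, and the RAW image type names are illustrative assumptions rather than values taken from Tika or Sling.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in mirroring the two methods of Sling's
// MimeTypeProvider, so the sketch compiles without Sling on the classpath.
interface NameBasedMimeTypeProvider {
    String getMimeType(String name);
    String getExtension(String mimeType);
}

// Supplies only the extra mappings; anything it does not know is answered
// with null so that other providers can be consulted instead.
class RawImageMimeTypeProvider implements NameBasedMimeTypeProvider {
    private final Map<String, String> typeByExtension = new LinkedHashMap<>();

    RawImageMimeTypeProvider() {
        // Illustrative, non-standard type names (assumptions for this sketch).
        typeByExtension.put("cr2", "image/x-canon-cr2");
        typeByExtension.put("nef", "image/x-nikon-nef");
    }

    @Override
    public String getMimeType(String name) {
        if (name == null) {
            return null;
        }
        // Accepts both "photo.cr2" and a bare extension such as "cr2".
        String ext = name.substring(name.lastIndexOf('.') + 1);
        return typeByExtension.get(ext.toLowerCase(Locale.ROOT));
    }

    @Override
    public String getExtension(String mimeType) {
        for (Map.Entry<String, String> entry : typeByExtension.entrySet()) {
            if (entry.getValue().equalsIgnoreCase(mimeType)) {
                return entry.getKey();
            }
        }
        return null;
    }
}
```

In a real deployment such a class would implement Sling's own interface and be registered as a service; how that registration is wired up is outside the scope of this sketch.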



[jira] Commented: (SLING-1059) Mime Type Detection for WebDAV should use Apache Tika

Posted by "Jukka Zitting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SLING-1059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12760197#action_12760197 ] 

Jukka Zitting commented on SLING-1059:
--------------------------------------

Starting with version 0.4, Tika has become more modular, so you can get the type detection functionality and all the related default configuration from the reasonably sized tika-core component, which has no external dependencies.

The latest Tika trunk also contains all the type information (both mime.types and magic) from the Apache HTTP Server.

I looked at the MimeTypeProvider interface in Sling. The getMimeType(String) method could be implemented with the latest Tika 0.5-SNAPSHOT like this:

    public String getMimeType(String name) {
        return new Tika().detect(name);
    }

The same functionality is available also in Tika 0.4, but with ten lines of code instead of one.

For deeper integration, see the Detector interface [1] that could be used as a MimeTypeProvider replacement to add features like content-based type detection.

Note that Tika does not come with a MimeTypeProvider.getExtension(String) feature, but I couldn't find any place in Sling (apart from test cases) where that functionality is actually being used.

[1] https://svn.apache.org/repos/asf/lucene/tika/trunk/tika-core/src/main/java/org/apache/tika/detect/Detector.java




[jira] Commented: (SLING-1059) Mime Type Detection for WebDAV should use Apache Tika

Posted by "Bertrand Delacretaz (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SLING-1059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12760556#action_12760556 ] 

Bertrand Delacretaz commented on SLING-1059:
--------------------------------------------

Content-based mime type detection is more costly than filename-based in all cases (even if just a bit), so I'd suggest making the content-based part optional.

Ideally in a transparent way: keep the existing interfaces from sling.commons.mime, and if Tika (or another content-based MIME type provider) is active, use it.
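
One way to picture this suggestion is the sketch below. All names in it (ContentDetector, MimeTypeResolver) are hypothetical and belong to neither Sling nor Tika. The name-based lookup always stays available, and the costlier content-based step runs only when a detector has been plugged in and some content is at hand.

```java
import java.util.function.Function;

// Hypothetical hook for an optional content-based detector.
interface ContentDetector {
    // Inspects the first bytes of a file; returns null if not recognized.
    String detect(byte[] head);
}

class MimeTypeResolver {
    private final Function<String, String> nameBased;
    private final ContentDetector contentBased; // null when none is active

    MimeTypeResolver(Function<String, String> nameBased, ContentDetector contentBased) {
        this.nameBased = nameBased;
        this.contentBased = contentBased;
    }

    String resolve(String name, byte[] head) {
        // The content-based step is skipped entirely when no detector is
        // active, so callers pay for it only if they opted in.
        if (contentBased != null && head != null) {
            String type = contentBased.detect(head);
            if (type != null) {
                return type;
            }
        }
        // Fall back to the existing name-based mapping.
        return nameBased.apply(name);
    }
}
```

Callers keep using a single resolve call either way, which is what makes the switch transparent: passing null as the detector gives the current name-only behaviour.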



[jira] Commented: (SLING-1059) Mime Type Detection for WebDAV should use Apache Tika

Posted by "Felix Meschberger (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SLING-1059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12760552#action_12760552 ] 

Felix Meschberger commented on SLING-1059:
------------------------------------------

The point is that this does not help much: we have the up-to-date list in Sling, the code works reliably, and if size matters, the current Sling module wins, too ;-)

So for the purely name-based MIME type mapping I don't -- currently -- see a pressing need to go for Tika.

And as I said before, the current Sling-based approach is extensible at run time.

It is a different matter if we talk about content-based content type recognition. In this case, I would definitely go for something like Tika -- even though this should probably be a standalone module, which you may take or leave ... and I would think that it would be even better if Tika provided extensible bundles without Sling having to package something up ...



[jira] Commented: (SLING-1059) Mime Type Detection for WebDAV should use Apache Tika

Posted by "Alexander Klimetschek (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SLING-1059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734057#action_12734057 ] 

Alexander Klimetschek commented on SLING-1059:
----------------------------------------------

I think the major improvement Tika would bring is MIME type detection based on the binary content (magic bytes etc.) instead of just the file extension to MIME type mapping, which fails if people assign wrong extensions or if it's an unspecific container format.
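
As a rough illustration of what magic-byte detection means, the sketch below checks a file's leading bytes against three well-known signatures (PNG, PDF, ZIP). The class name MagicByteSniffer is hypothetical, and the sketch makes no claim about how Tika implements this internally.

```java
class MagicByteSniffer {
    // Well-known file signatures found at offset 0.
    private static final byte[] PNG = {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A};
    private static final byte[] PDF = {'%', 'P', 'D', 'F', '-'};
    private static final byte[] ZIP = {'P', 'K', 0x03, 0x04};

    // Returns a type based on the leading bytes only; the file name is
    // never consulted, so a wrong extension cannot mislead it.
    static String detect(byte[] head) {
        if (startsWith(head, PNG)) {
            return "image/png";
        }
        if (startsWith(head, PDF)) {
            return "application/pdf";
        }
        if (startsWith(head, ZIP)) {
            return "application/zip";
        }
        return "application/octet-stream";
    }

    private static boolean startsWith(byte[] data, byte[] magic) {
        if (data == null || data.length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if (data[i] != magic[i]) {
                return false;
            }
        }
        return true;
    }
}
```

The ZIP case also shows the limit of a plain signature check: formats that are ZIP containers underneath all start with the same bytes, so telling them apart needs a look inside the archive or a hint from the file name.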



[jira] Updated: (SLING-1059) Mime Type Detection for WebDAV should use Apache Tika

Posted by "Carsten Ziegeler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SLING-1059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carsten Ziegeler updated SLING-1059:
------------------------------------

    Component/s: JCR
