Posted to dev@tika.apache.org by "Jukka Zitting (JIRA)" <ji...@apache.org> on 2010/01/27 00:00:35 UTC

[jira] Resolved: (TIKA-141) Mime Content Type detection of a web document from its URL.

     [ https://issues.apache.org/jira/browse/TIKA-141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting resolved TIKA-141.
--------------------------------

       Resolution: Fixed
    Fix Version/s: 0.7
         Assignee: Jukka Zitting

This one was mostly solved already by the recently introduced Tika.detect(URL) utility method. However, the suggestion here to use the extra input metadata available via the URLConnection object was a good one, so in revision 903470 I extended Tika.detect(URL), and all other entry points that are given a URL or a File, to automatically extract more such input metadata.
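
For readers unfamiliar with the idea, here is a minimal sketch of what "input metadata available via the URLConnection object" can look like, using only java.net from the JDK. It is not the code from revision 903470: the class name, the helper methods and the map keys are all hypothetical, chosen for illustration, and the exact fields Tika collects are not listed in this message.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration class; not part of Tika.
class UrlInputMetadata {

    // Last path segment of a URL path, usable as a file name hint.
    // Returns an empty string when the path ends in '/' or is empty.
    static String resourceNameFromPath(String path) {
        int slash = path.lastIndexOf('/');
        return slash >= 0 ? path.substring(slash + 1) : path;
    }

    // Strips parameters such as "; charset=UTF-8" from a Content-Type value
    // and lower-cases the rest. Returns null for a null input.
    static String baseType(String contentType) {
        if (contentType == null) {
            return null;
        }
        int semi = contentType.indexOf(';');
        String base = semi >= 0 ? contentType.substring(0, semi) : contentType;
        return base.trim().toLowerCase(Locale.ROOT);
    }

    // Collects the hints a URLConnection exposes before any content is parsed.
    // The map keys are assumptions made for this sketch.
    static Map<String, String> collectHints(URL url) throws IOException {
        Map<String, String> hints = new LinkedHashMap<>();

        String name = resourceNameFromPath(url.getPath());
        if (!name.isEmpty()) {
            hints.put("resourceName", name);
        }

        URLConnection connection = url.openConnection();
        // Opening the stream connects, which makes the header fields available;
        // try-with-resources closes the stream again afterwards.
        try (InputStream in = connection.getInputStream()) {
            String type = baseType(connection.getContentType());
            if (type != null) {
                hints.put("contentType", type);
            }
            String encoding = connection.getContentEncoding();
            if (encoding != null) {
                hints.put("contentEncoding", encoding);
            }
            long length = connection.getContentLengthLong();
            if (length >= 0) {
                hints.put("contentLength", Long.toString(length));
            }
        }
        return hints;
    }
}
```

Calling UrlInputMetadata.collectHints(url) on an HTTP URL would return whichever of these hints the server supplied. A detector could then weigh them against the bytes of the document itself, since a server-declared content type may be wrong.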

I think that covers pretty much everything there is to be done about this issue, so resolving as fixed.

> Mime Content Type detection of a web document from its URL.
> -----------------------------------------------------------
>
>                 Key: TIKA-141
>                 URL: https://issues.apache.org/jira/browse/TIKA-141
>             Project: Tika
>          Issue Type: Improvement
>          Components: mime
>            Reporter: Durville
>            Assignee: Jukka Zitting
>            Priority: Trivial
>             Fix For: 0.7
>
>   Original Estimate: 0.08h
>  Remaining Estimate: 0.08h
>
> When determining the content type of a document from its URL, it would be useful to use the content type reported by java.net.URLConnection. In particular, for remote web documents, this would retrieve the true MIME type instead of returning the root MIME type (as is currently the case).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.