Posted to user@tika.apache.org by Lewis John Mcgibbney <le...@gmail.com> on 2012/02/26 17:15:23 UTC

Removing use of deprecated .getMimeType(url) towards detect in Tika 0.10

Hi,

I'm currently working on a patch for Nutch which removes usage of
deprecated classes. My compiler is flagging the code snippet below,
indicating that 'this.mimeTypes.getMimeType(url)' is deprecated.

    // if returned null, or if it's the default type then try url resolution
    if (type == null
        || (type != null && type.getName().equals(MimeTypes.OCTET_STREAM))) {
      // If no mime-type header, or cannot find a corresponding registered
      // mime-type, then guess a mime-type from the url pattern
      type = this.mimeTypes.getMimeType(url) != null ? this.mimeTypes
          .getMimeType(url) : type;
    }

Looking @ Tika 0.10 Javadoc I see that in particular it's the call to
.getMimeType(url) which has been deprecated and we should be using
Tika.detect(URL) instead.

Can anyone please provide me with the correct syntax to fit this in here?
Thanks in advance for any help.

Kind Regards

Lewis

P.S. Please can you copy me in on any email reply, as I'm not officially
registered to this list. Thank you.



-- 
*Lewis*

Re: Removing use of deprecated .getMimeType(url) towards detect in Tika 0.10

Posted by Nick Burch <ni...@alfresco.com>.
On Mon, 27 Feb 2012, Lewis John Mcgibbney wrote:
> After compiling I get
>
>    [javac] MimeUtil.java:165: incompatible types
>    [javac] found   : java.lang.Object&java.io.Serializable&java.lang.Comparable<? extends java.lang.Object&java.io.Serializable&java.lang.Comparable<?>>
>    [javac] required: org.apache.tika.mime.MimeType
>    [javac]       type = mt != null ? mt : type;

Tika.detect(URL) returns the mimetype as a String, so the result can't be
assigned to your MimeType variable.

The detectors themselves return a MediaType.
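
To spell out where the javac message comes from: in `mt != null ? mt : type`, one branch is a String and the other a MimeType, so the whole expression takes their common supertype (the Object & Serializable & Comparable intersection javac prints), and that can't be assigned to a MimeType. Below is a minimal sketch with no Tika dependency. `NamedType` is a hypothetical stand-in for MimeType, not a Tika class, and `resolveName` is an illustrative helper name.

```java
import java.io.Serializable;

public class TernaryTypeDemo {

    // Hypothetical stand-in for org.apache.tika.mime.MimeType (not a Tika class).
    static final class NamedType implements Serializable, Comparable<NamedType> {
        private final String name;

        NamedType(String name) {
            this.name = name;
        }

        String getName() {
            return name;
        }

        @Override
        public int compareTo(NamedType other) {
            return name.compareTo(other.name);
        }
    }

    // Does not compile if uncommented: the two branches are String and
    // NamedType, so the expression is neither of those types.
    // static NamedType broken(String mt, NamedType type) {
    //     return mt != null ? mt : type;
    // }

    // Compiles: every value returned here is a String.
    static String resolveName(String mt, NamedType type) {
        if (mt != null) {
            return mt;
        }
        return type != null ? type.getName() : null;
    }

    public static void main(String[] args) {
        NamedType fallback = new NamedType("application/octet-stream");
        System.out.println(resolveName("text/html", fallback));
        System.out.println(resolveName(null, fallback));
    }
}
```

The point is only that both sides of the conditional need to be the same kind of thing, whichever representation you settle on.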

Depending on what you want your code to do, the options are probably:
* Switch your code to use a mimetype String
* Switch your code to use MediaType rather than MimeType, and call
   DefaultDetector directly (rather than using the Tika facade class)
* If you get back a String (not null) for the mimetype, create a MimeType
   object for it

The right answer for you depends on the code you've got around the Tika
call.
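
As a rough illustration of the first option (keeping a mimetype String throughout), something like the sketch below might do. It has no Tika dependency: the `detector` parameter is a hypothetical stand-in for a call such as tika.detect(url), and the class and method names are made up for the example.

```java
import java.util.function.Function;

public class UrlFallbackSketch {

    static final String OCTET_STREAM = "application/octet-stream";

    // 'detector' is a hypothetical stand-in for a call such as tika.detect(url).
    static String resolveWithUrlFallback(String typeName, String url,
                                         Function<String, String> detector) {
        // Only fall back to the URL when there is no type, or only the default type.
        if (typeName == null || typeName.equals(OCTET_STREAM)) {
            // Detect once and reuse the result.
            String detected = detector.apply(url);
            return detected != null ? detected : typeName;
        }
        return typeName;
    }

    public static void main(String[] args) {
        // Toy detector that only knows the .html suffix.
        Function<String, String> toy =
            u -> u.endsWith(".html") ? "text/html" : null;

        System.out.println(
            resolveWithUrlFallback(null, "http://example.com/a.html", toy));
        System.out.println(
            resolveWithUrlFallback("application/pdf", "http://example.com/a.html", toy));
    }
}
```

If a MimeType object really is needed afterwards (the third option), looking the String up in the MimeTypes registry, for example with forName(String), is probably the way to do it, but check the Javadoc for the Tika version in use.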

Nick

Re: Removing use of deprecated .getMimeType(url) towards detect in Tika 0.10

Posted by Lewis John Mcgibbney <le...@gmail.com>.
Hmmm...

Taking these comments on board; after instantiating
  /* creates a Tika facade using the default configuration */
  private Tika tika;

I now include a more verbose snippet of the method giving me problems here.

  /**
   * A facade interface to trying all the possible mime type resolution
   * strategies available within Tika. First, the mime type provided in
   * <code>typeName</code> is cleaned, with {@link #cleanMimeType(String)}.
   * Then the cleaned mime type is looked up in the underlying Tika
   * {@link MimeTypes} registry, by its cleaned name. If the {@link MimeType}
   * is found, then that mime type is used, otherwise {@link URL} resolution
   * is used to try and determine the mime type. If that means is
   * unsuccessful, and if <code>mime.type.magic</code> is enabled in
   * {@link NutchConfiguration}, then mime type magic resolution is used to
   * try and obtain a better-than-the-default approximation of the
   * {@link MimeType}.
   *
   * @param typeName
   *          The original mime type, returned from a {@link ProtocolOutput}.
   * @param url
   *          The given {@link URL}, that Nutch was trying to crawl. The given
   *          name can also be a URL or a full file path. In such cases only
   *          the file name part of the string is used for type detection.
   * @param data
   *          The byte data, returned from the crawl, if any.
   * @return The correctly, automatically guessed {@link MimeType} name.
   */
  public String autoResolveContentType(String typeName, String url,
      byte[] data) {
    MimeType type = null;
    String cleanedMimeType = null;
....

    // if returned null, or if it's the default type then try url resolution
    if (type == null
        || (type != null && type.getName().equals(MimeTypes.OCTET_STREAM))) {
      // If no mime-type header, or cannot find a corresponding registered
      // mime-type, then guess a mime-type from the url pattern
      String mt = tika.detect(url);
      type = mt != null ? mt : type;
    }

You will notice that the final two lines in the last code block contain the
'new' code you suggested. (thanks for this btw)
In this case we utilise 'String url' in the method parameter because the
given name can also be a URL or a full file path. In such cases only the
file name part of the string is used for type detection (from Javadoc :)).
After compiling I get

    [javac] MimeUtil.java:165: incompatible types
    [javac] found   : java.lang.Object&java.io.Serializable&java.lang.Comparable<? extends java.lang.Object&java.io.Serializable&java.lang.Comparable<?>>
    [javac] required: org.apache.tika.mime.MimeType
    [javac]       type = mt != null ? mt : type;
    [javac]                                ^

There is something which I am not quite getting right here :0| Any
suggestions please.

Thank you

Lewis

>
>
> On Mon, Feb 27, 2012 at 7:34 AM, Nick Burch <ni...@alfresco.com> wrote:
>
>>
>> How about:
>>  String mt = Tika.detect(URL);
>>  type = mt != null ? mt : type;
>>
>> That uses the new style call, and avoids detecting twice which your old
>> code did
>>
>> Nick
>>
>
>

Re: Removing use of deprecated .getMimeType(url) towards detect in Tika 0.10

Posted by Lewis John Mcgibbney <le...@gmail.com>.
Excellent Nick, absolutely excellent.

Thank you

Lewis

On Mon, Feb 27, 2012 at 7:34 AM, Nick Burch <ni...@alfresco.com> wrote:

> On Sun, 26 Feb 2012, Lewis John Mcgibbney wrote:
>
>>     // If no mime-type header, or cannot find a corresponding registered
>>     // mime-type, then guess a mime-type from the url pattern
>>     type = this.mimeTypes.getMimeType(url) != null ? this.mimeTypes
>>         .getMimeType(url) : type;
>>   }
>>
>> Looking @ Tika 0.10 Javadoc I see that in particular it's the call to
>> .getMimeType(url) which has been deprecated and we should be using
>> Tika.detect(URL) instead.
>>
>
> How about:
>  String mt = Tika.detect(URL);
>  type = mt != null ? mt : type;
>
> That uses the new style call, and avoids detecting twice which your old
> code did
>
> Nick
>



-- 
*Lewis*

Re: Removing use of deprecated .getMimeType(url) towards detect in Tika 0.10

Posted by Nick Burch <ni...@alfresco.com>.
On Sun, 26 Feb 2012, Lewis John Mcgibbney wrote:
>      // If no mime-type header, or cannot find a corresponding registered
>      // mime-type, then guess a mime-type from the url pattern
>      type = this.mimeTypes.getMimeType(url) != null ? this.mimeTypes
>          .getMimeType(url) : type;
>    }
>
> Looking @ Tika 0.10 Javadoc I see that in particular it's the call to
> .getMimeType(url) which has been deprecated and we should be using
> Tika.detect(URL) instead.

How about:
   String mt = Tika.detect(URL);
   type = mt != null ? mt : type;

That uses the new style call, and avoids detecting twice which your old 
code did

Nick