Posted to user@tika.apache.org by si...@fastmail.com on 2015/02/08 18:54:34 UTC

Add custom mime type programmatically

Hi,

Thanks for Apache Tika! It's a great project.

I'm trying to add a custom mime type. I've seen solutions that involve
writing a custom-mimetypes.xml file, but I'd really prefer to add my
custom type programmatically. Mostly this is because the magic bytes for
the file format are already defined elsewhere in code (which I'd prefer
not to duplicate), and I also want to leave the possibility open for
user-defined mime types at runtime.

I'm running 1.7 and doing detection like this:

TikaConfig config = TikaConfig.getDefaultConfig();
Detector detector = config.getDetector();
TikaInputStream stream = TikaInputStream.get(in);
Metadata metadata = new Metadata();
MediaType mediatype = detector.detect(stream, metadata);
System.out.println(mediatype.toString());

And this works great for my needs.

Instead of getting the Detector through the TikaConfig, I've tried
instantiating a new MagicDetector with the desired byte pattern and
MediaType, grabbing a DefaultDetector, and adding my new MagicDetector
to the DefaultDetector's getDetectors() List, and then performing
detection. Unfortunately this doesn't appear to make any difference.

Is there another way to add the custom mime type via a programmatic
interface?

silverchange
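
One possible explanation for the behaviour described above, offered as an
assumption and not confirmed anywhere in this thread, is that the accessor
hands back a fresh list on each call, so add() lands on a copy that the
detector never reads. The sketch below shows that pitfall using only the JDK.
CopyPitfall and its methods are hypothetical names; this is not Tika's code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in, NOT Tika's CompositeDetector: it only shows the copy pitfall.
final class CopyPitfall {
    private final List<String> names = new ArrayList<>(List.of("zip", "pdf"));

    // Returns a fresh copy on each call, so callers cannot change internal state.
    List<String> getNames() {
        return new ArrayList<>(names);
    }

    int size() {
        return names.size();
    }

    public static void main(String[] args) {
        CopyPitfall composite = new CopyPitfall();
        composite.getNames().add("custom"); // mutates the returned copy only
        System.out.println(composite.size()); // prints 2, the addition was lost
    }
}
```

If an API behaves like this, the only reliable way to extend it is to build a
new object from the returned list plus the extra entry.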

Re: Add custom mime type programmatically

Posted by si...@fastmail.com.
On Mon, Feb 9, 2015, at 06:52 PM, Nick Burch wrote:

> Would you anticipate adding these additional mimetypes once, when getting 
> your Tika Config object, or do you foresee wanting to add them on the fly?
> 

For my use case, adding once at startup is fine.

> I'm not sure that'll work; I'm not sure the detectors list can be
> modified like that. What happens if you get a default TikaConfig object,
> grab the normal detectors from that, build your custom one, then create
> a CompositeDetector from that list + new one, and use that
> CompositeDetector from then on?

That worked great! Thanks for the tip. I didn't use the TikaConfig
because I couldn't see how to get a list of detectors from the config.
Here's the code:

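import java.util.List;
import org.apache.tika.detect.CompositeDetector;
import org.apache.tika.detect.DefaultDetector;
import org.apache.tika.detect.Detector;
import org.apache.tika.detect.MagicDetector;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.mime.MediaType;

// Assumes `stream` is the TikaInputStream from my first message.
final byte[] magicBytes = {'a', 'b', 'c'};
MediaType customType = new MediaType("application", "custom");
Detector customDetector = new MagicDetector(customType, magicBytes);
DefaultDetector defaultDetector = new DefaultDetector();
// Take the default detectors and append the custom one.
List<Detector> detectors = defaultDetector.getDetectors();
detectors.add(customDetector);
CompositeDetector detector = new CompositeDetector(detectors);
Metadata metadata = new Metadata();
MediaType mediaType = detector.detect(stream, metadata);
System.out.println(mediaType.toString());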

Thanks!

silverchange

Re: Add custom mime type programmatically

Posted by Nick Burch <ap...@gagravarr.org>.
On Sun, 8 Feb 2015, silverchange@fastmail.com wrote:
> I'm trying to add a custom mime type. I've seen solutions that involve
> writing a custom-mimetypes.xml file, but I'd really prefer to add my
> custom type programmatically.

Currently, I think we only support loading magic from a combination of 
(one) main mimetypes file and (many) custom mimetypes files. That loading 
does some sanity checking in the process.

Originally, Tika only supported the single core magic file. It took a 
little bit of re-jigging to handle the custom ones too. More rejigging is 
possible!

> Mostly this is because the magic bytes for the file format are already 
> defined elsewhere in code (which I'd prefer not to duplicate), and I 
> also want to leave the possibility open for user-defined mime types at 
> runtime.

Would you anticipate adding these additional mimetypes once, when getting 
your Tika Config object, or do you foresee wanting to add them on the fly?

> Instead of getting the Detector through the TikaConfig, I've tried
> instantiating a new MagicDetector with the desired byte pattern and
> MediaType, grabbing a DefaultDetector, and adding my new MagicDetector
> to the DefaultDetector's getDetectors() List, and then performing
> detection.

I'm not sure that'll work; I'm not sure the detectors list can be modified 
like that. What happens if you get a default TikaConfig object, grab the 
normal detectors from that, build your custom one, then create a 
CompositeDetector from that list + new one, and use that CompositeDetector 
from then on?

Nick
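
The "existing list + new one" idea suggested above can be sketched without
Tika on the classpath. Everything here is a hypothetical stand-in:
SimpleDetector, magic() and composite() are illustrative names, and the
first-match rule is a simplification that is not claimed to match how Tika's
CompositeDetector chooses between results.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class CompositeSketch {
    // Hypothetical minimal interface; Tika's Detector takes an InputStream and Metadata.
    interface SimpleDetector {
        String detect(byte[] header); // returns null when there is no match
    }

    // Matches when the header starts with the given magic bytes.
    static SimpleDetector magic(String type, byte[] magicBytes) {
        return header -> {
            if (header.length < magicBytes.length) {
                return null;
            }
            for (int i = 0; i < magicBytes.length; i++) {
                if (header[i] != magicBytes[i]) {
                    return null;
                }
            }
            return type;
        };
    }

    // Asks each detector in order; falls back to a generic type (simplified rule).
    static SimpleDetector composite(List<SimpleDetector> detectors) {
        List<SimpleDetector> copy = new ArrayList<>(detectors);
        return header -> {
            for (SimpleDetector d : copy) {
                String type = d.detect(header);
                if (type != null) {
                    return type;
                }
            }
            return "application/octet-stream";
        };
    }

    public static void main(String[] args) {
        List<SimpleDetector> existing = List.of(
                magic("application/pdf", "%PDF".getBytes(StandardCharsets.US_ASCII)));
        // Build a new list from the existing detectors plus the custom one.
        List<SimpleDetector> all = new ArrayList<>(existing);
        all.add(magic("application/custom", new byte[] {'a', 'b', 'c'}));
        SimpleDetector detector = composite(all);
        System.out.println(detector.detect("abcdef".getBytes(StandardCharsets.US_ASCII)));
    }
}
```

The point of the design is that the combined detector is built from a list the
caller owns, so it does not matter whether the original list can be modified.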