Posted to user@tika.apache.org by David Patterson <dp...@i-a-i.com> on 2012/09/28 16:42:20 UTC

Trying to create a new mime-type entry

I want to process a maven pom.xml with special code.

I added the following to the existing xml file of mimetypes:

  <mime-type type="application/maven-pom">
    <glob pattern="pom.xml" />
  </mime-type>

I used the MimeTypesFactory to process the file.
    MimeTypes mt = MimeTypesFactory.create( new FileInputStream( f ) );

Using MediaType.parse works:
    MediaType pomType = MediaType.parse( "application/maven-pom; charset=UTF-8");
    System.out.println( "type: " + pomType.getType());
    System.out.println( "subtype: " + pomType.getSubtype());
Results in
type: application
subtype: maven-pom

Then I created a Tika object with:
    Tika tika = new Tika( mt );
    File pom = new File( "~somewhere~/pom.xml");
    String pomTypeString = tika.detect( pom);
    System.out.println( "Tika thinks a pom is a " + pomTypeString);
    String pomStreamTypeString = tika.detect( new FileInputStream( pom ) );
    System.out.println( "Tika thinks pom stream is a " + pomStreamTypeString );
produces

Tika thinks a pom is a text/plain
Tika thinks pom stream is a text/plain

If I create a default Tika with no args, I get
Tika thinks a pom is a application/xml
Tika thinks pom stream is a text/plain

What have I missed?

Thanks.

Dave P




Re: Trying to create a new mime-type entry

Posted by Nick Burch <ap...@gagravarr.org>.
On Fri, 28 Sep 2012, David Patterson wrote:
> I want to process a maven pom.xml with special code.
>
> I added the following to the existing xml file of mimetypes:
>
>  <mime-type type="application/maven-pom">
>    <glob pattern="pom.xml" />
>  </mime-type>

You'll be much better off adding it to a custom mimetypes file, rather
than hacking the built-in one. Call your file
org/apache/tika/mime/custom-mimetypes.xml to have it auto-loaded.

Also, as pom files are XML-based, you might want to add the XML root
definition too. There should be some examples to crib from in the main
file.
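
A minimal sketch of what such a custom file might look like, assuming it
is placed on the classpath as org/apache/tika/mime/custom-mimetypes.xml.
The root element name "project" and the namespace URI are taken from the
standard Maven POM format, not from this thread, and the sub-class-of
line is an assumption about how the type should relate to plain XML.
The root-XML entry matters for stream detection in particular: a bare
InputStream carries no file name, so a glob alone cannot match there.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<mime-info>
  <!-- Type name as chosen in the original post -->
  <mime-type type="application/maven-pom">
    <!-- Assumption: treat a POM as a specialisation of generic XML -->
    <sub-class-of type="application/xml"/>
    <!-- Content-based match on the XML root element (Maven's <project>) -->
    <root-XML localName="project"
              namespaceURI="http://maven.apache.org/POM/4.0.0"/>
    <!-- Name-based match; only usable when a file name is available -->
    <glob pattern="pom.xml"/>
  </mime-type>
</mime-info>
```

With a file like this in place, a default Tika created with no arguments
should pick the definition up, so building a separate MimeTypes object
by hand would likely no longer be needed. A POM that omits the namespace
declaration may not match the root-XML rule as written.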

Nick