Posted to user@tika.apache.org by David Patterson <dp...@i-a-i.com> on 2012/09/28 16:42:20 UTC
Trying to create a new mime-type entry
I want to process a maven pom.xml with special code.
I added the following to the existing xml file of mimetypes:
<mime-type type="application/maven-pom">
<glob pattern="pom.xml" />
</mime-type>
I used the MimeTypesFactory to process the file.
MimeTypes mt = MimeTypesFactory.create( new FileInputStream( f ) );
Using MediaType.parse works:
MediaType pomType = MediaType.parse( "application/maven-pom; charset=UTF-8");
System.out.println( "type: " + pomType.getType());
System.out.println( "subtype: " + pomType.getSubtype());
Results in
type: application
subtype: maven-pom
Then I created a Tika object with:
Tika tika = new Tika( mt );
File pom = new File( "~somewhere~/pom.xml");
String pomTypeString = tika.detect( pom);
System.out.println( "Tika thinks a pom is a " + pomTypeString);
String pomStreamTypeString = tika.detect( new FileInputStream( pom ) );
System.out.println( "Tika thinks pom stream is a " + pomStreamTypeString );
produces
Tika thinks a pom is a text/plain
Tika thinks pom stream is a text/plain
If I create a default Tika with no args, I get
Tika thinks a pom is a application/xml
Tika thinks pom stream is a text/plain
What have I missed?
Thanks.
Dave P
Re: Trying to create a new mime-type entry
Posted by Nick Burch <ap...@gagravarr.org>.
On Fri, 28 Sep 2012, David Patterson wrote:
> I want to process a maven pom.xml with special code.
>
> I added the following to the existing xml file of mimetypes:
>
> <mime-type type="application/maven-pom">
> <glob pattern="pom.xml" />
> </mime-type>
You'll be much better off adding it to a custom mimetypes file, rather
than hacking the built-in one. Call your file
org/apache/tika/mime/custom-mimetypes.xml to have it auto-loaded.
Also, as pom files are XML based, you might want to add the XML root
definition too. There should be some examples to crib from in the main
file.
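As a rough illustration of both suggestions, a custom file might look something like the sketch below. This is only an assumption of how the entry could be written, not a tested definition: the sub-class-of line is an addition that was not in the original entry, and the root element name "project" and the namespace URI are taken from the standard Maven POM 4.0.0 format, so they may not match every pom.xml.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch of org/apache/tika/mime/custom-mimetypes.xml (place it on the classpath) -->
<mime-info>
  <mime-type type="application/maven-pom">
    <!-- Assumption: treat a POM as a specialisation of generic XML -->
    <sub-class-of type="application/xml"/>
    <!-- XML root definition: match on the document's root element.
         Assumes the standard Maven 4.0.0 namespace is declared in the POM. -->
    <root-XML localName="project"
              namespaceURI="http://maven.apache.org/POM/4.0.0"/>
    <!-- File name match, as in the original entry -->
    <glob pattern="pom.xml"/>
  </mime-type>
</mime-info>
```

Note that a glob can only match when the detector is given a file name, so a bare InputStream gives it nothing to match against. The root-XML entry is what would give content-based detection a chance, though whether it fires may depend on the Tika version and on how the file itself starts.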
Nick