Posted to user@tika.apache.org by "Withanage, Dulip" <wi...@asia-europe.uni-heidelberg.de> on 2011/04/06 11:53:18 UTC

RE: XMP Metadata extraction

Hi Nick,

1. I tried calling the --metadata option and it gives me the metadata name:value pairs.
So this looks promising to me, if I could format the above output as XML.
What is your advice on the best way to do it?

2. I have seen the XMP class org.apache.tika.parser.image.xmp.XMPPacketScanner.
Is there any way to use this as the default parser for JPEG and TIFF? Can it be done by configuring tika-mimetypes.xml?

Thanks,

Dulip




---------------------------------------
Dulip Withanage
Senior Software Developer
Karl Jaspers Center
University of Heidelberg
Germany
________________________________________
From: Nick Burch [nick.burch@alfresco.com]
Sent: Monday, March 28, 2011 4:43 PM
To: user@tika.apache.org
Cc: oesterg@gmail.com
Subject: RE: XMP Metadata extraction

On Mon, 28 Mar 2011, Withanage, Dulip wrote:
> Thank you for your prompt help, that looks promising.
>
> Here is my use case:
> 1. Build the Tika app from the source.
> 2. Integrate it into a third-party application.
> 3. Use the Tika extraction functionality for images.
> 4. I have attached the image, its raw metadata and the Tika results.

How are you calling Tika?

Also, I think that all the metadata you want is available in the Metadata
object, so you can likely just read it out of there. Try having a play
with the --metadata option to tika-app.jar (the demonstration command-line
Tika program).

Nick

RE: XMP Metadata extraction

Posted by Nick Burch <ni...@alfresco.com>.
On Wed, 6 Apr 2011, Withanage, Dulip wrote:
> 1. I tried calling the --metadata option and it gives me the
> metadata name:value pairs. So this looks promising to me, if I could format
> the above output as XML. What is your advice on the best way to do it?

You'll probably want to write some Java code at this point, rather than
just calling the Tika app on the command line. Grab the Metadata object
back, then loop over the entries and output them as XML in whatever format 
you want them to be in.
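As a rough illustration of that last step, here is a minimal sketch that writes name/value pairs as XML using only the JDK's StAX API (javax.xml.stream). It assumes the metadata has already been copied into a Map<String, List<String>>; with Tika, one way to do that is to loop over metadata.names() and add Arrays.asList(metadata.getValues(name)) for each name. The class name MetadataXml and the <metadata>/<entry>/<value> layout are hypothetical choices, not a format defined by Tika.

```java
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper: writes metadata name/value pairs as a small XML document.
class MetadataXml {

    static String toXml(Map<String, List<String>> entries) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            xml.writeStartDocument();
            xml.writeStartElement("metadata");
            for (Map.Entry<String, List<String>> entry : entries.entrySet()) {
                // Keys may contain spaces or colons, so store them in an attribute
                // rather than using them as element names.
                xml.writeStartElement("entry");
                xml.writeAttribute("name", entry.getKey());
                // One key can carry several values; write one <value> element per value.
                for (String value : entry.getValue()) {
                    xml.writeStartElement("value");
                    xml.writeCharacters(value); // the writer escapes markup characters such as < and &
                    xml.writeEndElement();
                }
                xml.writeEndElement();
            }
            xml.writeEndElement();
            xml.writeEndDocument();
            xml.flush();
            xml.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not write metadata XML", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Sample data for illustration only. With Tika, the map could instead be filled with
        // map.put(name, Arrays.asList(metadata.getValues(name))) for each name in metadata.names().
        Map<String, List<String>> entries = new LinkedHashMap<>();
        entries.put("Content-Type", List.of("image/jpeg"));
        entries.put("Keywords", List.of("first", "second"));
        System.out.println(toXml(entries));
    }
}
```

Putting the metadata name in an attribute keeps the output well-formed even for keys with spaces, which are not allowed in XML element names; a different layout is easy to swap in if a particular schema is required.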

> 2. I have seen the XMP class
> org.apache.tika.parser.image.xmp.XMPPacketScanner. Is there any way to
> use this as the default parser for JPEG and TIFF? Can it be done by
> configuring tika-mimetypes.xml?

Nope, tika-mimetypes.xml is only used for detection (working out what type
a file is); choosing and configuring parsers is handled separately.
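To make the distinction concrete: detection only answers "what type of file is this?", typically by matching leading magic bytes (and file name patterns), and it extracts nothing. The toy detector below is a hypothetical sketch, not Tika's detector; it only knows the standard JPEG (FF D8 FF) and TIFF ("II*\0" / "MM\0*") signatures.

```java
// Toy illustration of detection by magic bytes; NOT Tika's detector.
class MagicSketch {

    // Returns a media type guessed from the first bytes of a file.
    static String detect(byte[] head) {
        if (startsWith(head, 0xFF, 0xD8, 0xFF)) {
            return "image/jpeg"; // JPEG files start with FF D8 FF
        }
        if (startsWith(head, 0x49, 0x49, 0x2A, 0x00)
                || startsWith(head, 0x4D, 0x4D, 0x00, 0x2A)) {
            return "image/tiff"; // "II*\0" (little-endian) or "MM\0*" (big-endian)
        }
        return "application/octet-stream"; // unknown: generic binary
    }

    // True if 'data' begins with the given unsigned byte values.
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] jpegHead = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xE0};
        byte[] tiffHead = {0x49, 0x49, 0x2A, 0x00};
        System.out.println(detect(jpegHead)); // image/jpeg
        System.out.println(detect(tiffHead)); // image/tiff
    }
}
```

Deciding which parser handles the detected type, and what that parser extracts, is a separate step, which is why a parser cannot be plugged in through the detection file.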

XMPPacketScanner isn't a parser, though; instead, it is called by the
existing parsers when they detect XMP metadata within a file. The existing
JPEG and TIFF parsers already do this for you.
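For a rough idea of what finding XMP metadata within a file involves: an XMP packet is XML wrapped between an <?xpacket begin=...?> header and an <?xpacket end=...?> trailer, so it can be located by scanning the raw bytes for those markers. The sketch below is a simplified, hypothetical illustration of that idea (it assumes a UTF-8 packet and returns only the first one found); it is not Tika's own scanner class.

```java
import java.nio.charset.StandardCharsets;
import java.util.Optional;

// Simplified, hypothetical illustration of locating an embedded XMP packet in raw bytes.
// Assumes a UTF-8 packet and returns only the first one found; not Tika's scanner.
class XmpPacketSketch {

    private static final byte[] HEADER = ascii("<?xpacket begin=");
    private static final byte[] TRAILER = ascii("<?xpacket end=");
    private static final byte[] PI_CLOSE = ascii("?>");

    static Optional<String> findPacket(byte[] data) {
        int start = indexOf(data, HEADER, 0);
        if (start < 0) {
            return Optional.empty();
        }
        int trailer = indexOf(data, TRAILER, start + HEADER.length);
        if (trailer < 0) {
            return Optional.empty();
        }
        // Include the trailer processing instruction up to its closing "?>".
        int close = indexOf(data, PI_CLOSE, trailer);
        if (close < 0) {
            return Optional.empty();
        }
        int end = close + PI_CLOSE.length;
        return Optional.of(new String(data, start, end - start, StandardCharsets.UTF_8));
    }

    // Naive byte-pattern search starting at index 'from'; returns -1 if absent.
    private static int indexOf(byte[] data, byte[] pattern, int from) {
        outer:
        for (int i = from; i <= data.length - pattern.length; i++) {
            for (int j = 0; j < pattern.length; j++) {
                if (data[i + j] != pattern[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }

    private static byte[] ascii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // Made-up file content: binary junk around a tiny XMP packet.
        String file = "binary-junk<?xpacket begin=\"\" id=\"W5M0MpCehiHzreSzNTczkc9d\"?>"
                + "<x:xmpmeta xmlns:x=\"adobe:ns:meta/\"/>"
                + "<?xpacket end=\"w\"?>more-junk";
        findPacket(file.getBytes(StandardCharsets.UTF_8)).ifPresent(System.out::println);
    }
}
```

Once a packet has been extracted like this, a parser would still need to read the XML inside it and map the properties onto metadata fields, which is the part the existing JPEG and TIFF parsers take care of.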

Nick