Posted to user@tika.apache.org by "Withanage, Dulip" <wi...@asia-europe.uni-heidelberg.de> on 2011/03/28 14:58:51 UTC

XMP Metadata extraction

Dear developers and users,
We are interested in extracting the raw metadata (not formatted in XHTML as SAX events) from files using Tika. Is there any way that we could bypass the formatting mechanism for specific file formats?

Best,
Dulip Withanage
Senior software developer
Karl Jaspers Center,
University of Heidelberg.
Germany

RE: XMP Metadata extraction

Posted by Nick Burch <ni...@alfresco.com>.
On Wed, 6 Apr 2011, Withanage, Dulip wrote:
> 1. I tried calling the --metadata option and it gives me the
> metadataname:value. So this looks promising to me, if I could format the
> above output as XML. What is your advice on the best way to do it?

You'll probably want to write some Java code at this point, rather than
just calling the Tika app on the command line. Grab the Metadata object
back, then loop over the entries and output them as XML in whatever format
you want them to be in.
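
A minimal sketch of that loop, using only the JDK so that it runs without Tika on the classpath. The map is a stand-in for the Metadata object, and the "metadata" and "entry" element names are made up for illustration; pick whatever XML layout suits you.

```java
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

/** Sketch: serialise metadata name/value pairs as simple XML. */
final class MetadataXmlSketch {

    // Assumption: with Tika available, the map could be filled from a
    // populated org.apache.tika.metadata.Metadata object, roughly:
    //   for (String name : metadata.names())
    //       entries.put(name, Arrays.asList(metadata.getValues(name)));
    static String toXml(Map<String, List<String>> entries) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        xml.writeStartDocument();
        xml.writeStartElement("metadata");           // hypothetical root element
        for (Map.Entry<String, List<String>> e : entries.entrySet()) {
            for (String value : e.getValue()) {      // one element per value
                xml.writeStartElement("entry");      // hypothetical element name
                xml.writeAttribute("name", e.getKey());
                xml.writeCharacters(value);          // the writer escapes &, < and >
                xml.writeEndElement();
            }
        }
        xml.writeEndElement();
        xml.writeEndDocument();
        xml.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        // Sample values only; real names and values come from the parsed file.
        Map<String, List<String>> sample = new LinkedHashMap<>();
        sample.put("dc:title", List.of("Test & image"));
        sample.put("dc:subject", List.of("one", "two"));
        System.out.println(toXml(sample));
    }
}
```

Going through an XML writer rather than concatenating strings means values containing characters such as & or < are escaped for you.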

> 2. I have seen the XMP class
> org.apache.tika.parser.image.xmp.XMPPacketScanner. Is there any way to
> use this as the default parser for JPEG and TIFF? Can it be done by
> configuring it in tika-mimetypes.xml?

Nope, tika-mimetypes.xml is used for detection, and the parser stuff is 
different.

XMPPacketScanner isn't a parser though, but instead is called by the
existing parsers when they detect XMP metadata within a file. The existing
JPEG and TIFF parsers already do this for you.
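
To give a rough idea of what scanning for an XMP packet involves, here is a small JDK-only sketch. It is not Tika's implementation of the scanner class discussed above, just an illustration that assumes the packet is stored contiguously in the file and wrapped in the usual `<?xpacket begin=...?>` and `<?xpacket end=...?>` processing instructions. The class and method names are made up.

```java
import java.nio.charset.StandardCharsets;

/** Sketch: locate the first raw XMP packet inside a byte array. */
final class XmpPacketScanSketch {
    private static final byte[] BEGIN = "<?xpacket begin=".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] END = "<?xpacket end=".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] PI_CLOSE = "?>".getBytes(StandardCharsets.US_ASCII);

    /** Returns the packet (wrapper included) as text, or null if none is found. */
    static String findPacket(byte[] data) {
        int start = indexOf(data, BEGIN, 0);
        if (start < 0) return null;
        int endMarker = indexOf(data, END, start + BEGIN.length);
        if (endMarker < 0) return null;
        int close = indexOf(data, PI_CLOSE, endMarker); // end of the trailing instruction
        if (close < 0) return null;
        // Assumption: the packet is UTF-8 encoded, which is the common case.
        return new String(data, start, close + PI_CLOSE.length - start, StandardCharsets.UTF_8);
    }

    /** Naive byte search; returns the index of needle at or after from, or -1. */
    private static int indexOf(byte[] haystack, byte[] needle, int from) {
        outer:
        for (int i = from; i <= haystack.length - needle.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (haystack[i + j] != needle[j]) continue outer;
            }
            return i;
        }
        return -1;
    }
}
```

Calling `findPacket` on the bytes of an image file would return the embedded packet as a string if one is present in that form. Packets split across several segments are not handled by this sketch.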

Nick

RE: XMP Metadata extraction

Posted by "Withanage, Dulip" <wi...@asia-europe.uni-heidelberg.de>.
Hi Nick,

1. I tried calling the --metadata option and it gives me the metadataname:value.
So this looks promising to me, if I could format the above output as XML.
What is your advice on the best way to do it?

2. I have seen the XMP class org.apache.tika.parser.image.xmp.XMPPacketScanner.
Is there any way to use this as the default parser for JPEG and TIFF? Can it be done by configuring it in tika-mimetypes.xml?

Thanks,

Dulip




---------------------------------------
Dulip Withanage
Senior Software developer.
Karl Jaspers Center
University of Heidelberg
Germany

RE: XMP Metadata extraction

Posted by Nick Burch <ni...@alfresco.com>.
On Mon, 28 Mar 2011, Withanage, Dulip wrote:
> Thank you for your prompt help, that looks promising.
>
> Here is my use case.
> 1. Generate the Tika app from source.
> 2. Integrate it into a third-party application.
> 3. Use the Tika extraction functionality for images.
> 4. I have attached the image, its raw metadata and the Tika results.

How are you calling Tika?

Also, I think that all the metadata you want is available in the Metadata 
object, so you can likely just read it out of there. Try having a play 
with the --metadata option to tika-app.jar (the demonstration command line 
tika program)

Nick

RE: XMP Metadata extraction

Posted by "Withanage, Dulip" <wi...@asia-europe.uni-heidelberg.de>.
Thank you for your prompt help, that looks promising.

Here is my use case.
1. Generate the Tika app from source.
2. Integrate it into a third-party application.
3. Use the Tika extraction functionality for images.
4. I have attached the image, its raw metadata and the Tika results.

Input: xmp-meadata-test.jpg
Tika output: xmp-meadata-test.jpg.xml
Intended output: xmp-meadata-test.jpg.xmp (retrieved using Adobe Bridge)

Best,
Dulip




----------------------------------------
Dulip Withanage
Senior software developer.
Karl Jaspers Center
University of Heidelberg
Germany

Re: XMP Metadata extraction

Posted by Nick Burch <ni...@alfresco.com>.
On Mon, 28 Mar 2011, Withanage, Dulip wrote:
> We are interested in extracting the raw metadata (not formatted in XHTML
> as SAX events) from the files using Tika.

Generally speaking, all of the metadata that is extracted is placed into 
the Metadata object you supply when parsing. The SAX events are generated 
for the textual content of the file.

I think you might find that Tika already does what you need, but if not
I'd suggest you come back with a concrete example of a file, its
metadata, the XML you get, and what you'd really hoped for!

Nick