Posted to user@tika.apache.org by "Withanage, Dulip" <wi...@asia-europe.uni-heidelberg.de> on 2011/03/28 14:58:51 UTC
XMP Metadata extraction
Dear developers and users,
We are interested in extracting the raw metadata (not formatted in XHTML as SAX events) from files using Tika. Is there any way we could bypass the formatting mechanism for specific file formats?
Best,
Dulip Withanage
Senior software developer
Karl Jaspers Center,
University of Heidelberg.
Germany
RE: XMP Metadata extraction
Posted by Nick Burch <ni...@alfresco.com>.
On Wed, 6 Apr 2011, Withanage, Dulip wrote:
> 1. I tried calling the --metadata option and it gives me
> metadataname:value pairs. So this looks promising to me, if I could format
> the above output as XML. What is your advice on the best way to do it?
You'll probably want to write some Java code at this point, rather than
just calling the Tika app on the command line. Grab the Metadata object
back, then loop over the entries and output them as XML in whatever format
you want them to be in.
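A minimal sketch of that suggestion (loop over the metadata entries, write them out as XML), assuming you are happy to pick your own XML layout. To keep it compilable with only the JDK, the hypothetical `toXml` helper takes a plain `Map<String, String[]>`; with Tika on the classpath you would fill that map from `Metadata.names()` and `Metadata.getValues(name)` after parsing, as the comment in `main` indicates. The class name `MetadataXml` and the `<metadata>/<entry>/<value>` element names are invented for illustration, not part of Tika.

```java
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class MetadataXml {

    // Writes name -> values pairs as
    // <metadata><entry name="..."><value>...</value></entry></metadata>
    // (a hypothetical layout; choose whatever format you need).
    public static String toXml(Map<String, String[]> entries) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            xml.writeStartDocument();
            xml.writeStartElement("metadata");
            for (Map.Entry<String, String[]> entry : entries.entrySet()) {
                xml.writeStartElement("entry");
                xml.writeAttribute("name", entry.getKey());
                // A metadata name can carry several values, so write one <value> per value.
                for (String value : entry.getValue()) {
                    xml.writeStartElement("value");
                    xml.writeCharacters(value); // the writer escapes '<', '&', etc.
                    xml.writeEndElement();
                }
                xml.writeEndElement();
            }
            xml.writeEndElement();
            xml.writeEndDocument();
            xml.flush();
            xml.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not write metadata XML", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String[]> entries = new LinkedHashMap<>();
        // With Tika on the classpath you would fill the map after parsing, e.g.:
        //   Metadata metadata = new Metadata();
        //   new AutoDetectParser().parse(stream, new BodyContentHandler(), metadata, new ParseContext());
        //   for (String name : metadata.names()) {
        //       entries.put(name, metadata.getValues(name));
        //   }
        // Here, hand-made sample values stand in for real parser output.
        entries.put("Content-Type", new String[] {"image/jpeg"});
        entries.put("dc:subject", new String[] {"example", "test"});
        System.out.println(toXml(entries));
    }
}
```

Using `XMLStreamWriter` rather than string concatenation means characters such as `<` and `&` inside metadata values are escaped for you.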
> 2. I have seen the XMP class
> org.apache.tika.parser.image.xmp.XMPPacketScanner. Is there any way to
> use this as the default parser for JPEG and TIFF? Can it be done by
> configuring tika-mimetypes.xml?
Nope, tika-mimetypes.xml is used for detection; parsers are configured
separately.
XMPPacketScanner isn't a parser, though. Instead, it is called by the
existing parsers when they detect XMP metadata within a file. The existing
JPEG and TIFF parsers already do this for you.
Nick
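To illustrate what scanning a file for an XMP packet involves, here is a simplified, JDK-only stand-in that searches a file's raw bytes for the packet wrapper defined by the XMP specification (`<?xpacket begin=...?>` ... `<?xpacket end=...?>`) and returns the whole packet as text. It is not Tika's `XMPPacketScanner` implementation; the class name `XmpPacketSketch` is hypothetical, and the sketch assumes a UTF-8 packet (XMP also permits UTF-16/32) and ignores the JPEG/TIFF container structure entirely.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Optional;

public class XmpPacketSketch {

    // Opening text of the packet header and trailer processing instructions.
    private static final String HEADER = "<?xpacket begin=";
    private static final String TRAILER = "<?xpacket end=";

    // Returns the first XMP packet found (wrapper included), assuming UTF-8 encoding.
    public static Optional<String> findPacket(byte[] data) {
        // ISO-8859-1 maps each byte to exactly one char, so string indices equal byte offsets.
        String bytesAsChars = new String(data, StandardCharsets.ISO_8859_1);
        int start = bytesAsChars.indexOf(HEADER);
        if (start < 0) {
            return Optional.empty();
        }
        int trailer = bytesAsChars.indexOf(TRAILER, start);
        if (trailer < 0) {
            return Optional.empty();
        }
        int trailerEnd = bytesAsChars.indexOf("?>", trailer);
        if (trailerEnd < 0) {
            return Optional.empty();
        }
        int end = trailerEnd + 2; // include the closing "?>" of the trailer
        return Optional.of(new String(data, start, end - start, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java XmpPacketSketch <image file>");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(findPacket(data).orElse("(no XMP packet found)"));
    }
}
```

In practice, as Nick says, the existing JPEG and TIFF parsers already run Tika's own scanner for you; the sketch only shows the idea.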
RE: XMP Metadata extraction
Posted by "Withanage, Dulip" <wi...@asia-europe.uni-heidelberg.de>.
Hi Nick,
1. I tried calling the --metadata option and it gives me metadataname:value pairs.
So this looks promising to me, if I could format the above output as XML.
What is your advice on the best way to do it?
2. I have seen the XMP class org.apache.tika.parser.image.xmp.XMPPacketScanner.
Is there any way to use this as the default parser for JPEG and TIFF? Can it be done by configuring tika-mimetypes.xml?
Thanks,
Dulip
---------------------------------------
Dulip Withanage
Senior Software developer.
Karl Jaspers Center
University of Heidelberg
Germany
________________________________________
From: Nick Burch [nick.burch@alfresco.com]
Sent: Monday, March 28, 2011 4:43 PM
To: user@tika.apache.org
Cc: oesterg@gmail.com
Subject: RE: XMP Metadata extraction
On Mon, 28 Mar 2011, Withanage, Dulip wrote:
> Thank you for your prompt help, that looks promising.
>
> Here is my use case:
> 1. Generate the Tika app from the source.
> 2. Integrate it into a third-party application.
> 3. Use the Tika extraction functionality for images.
> 4. I have attached the image, its raw metadata and the Tika results.
How are you calling Tika?
Also, I think that all the metadata you want is available in the Metadata
object, so you can likely just read it out of there. Try having a play
with the --metadata option to tika-app.jar (the demonstration command-line
Tika program).
Nick
RE: XMP Metadata extraction
Posted by "Withanage, Dulip" <wi...@asia-europe.uni-heidelberg.de>.
Thank you for your prompt help, that looks promising.
Here is my use case:
1. Generate the Tika app from the source.
2. Integrate it into a third-party application.
3. Use the Tika extraction functionality for images.
4. I have attached the image, its raw metadata and the Tika results.
Input: xmp-meadata-test.jpg
Tika output: xmp-meadata-test.jpg.xml
Intended output: xmp-meadata-test.jpg.xmp (retrieved using Adobe Bridge)
Best,
Dulip
----------------------------------------
Dulip Withanage
Senior software developer.
Karl Jaspers Center
University of Heidelberg
Germany
________________________________________
From: Nick Burch [nick.burch@alfresco.com]
Sent: Monday, March 28, 2011 3:46 PM
To: user@tika.apache.org
Subject: Re: XMP Metadata extraction
On Mon, 28 Mar 2011, Withanage, Dulip wrote:
> We are interested in extracting the raw metadata (not formatted in XHTML
> as SAX events) from the files using Tika.
Generally speaking, all of the metadata that is extracted is placed into
the Metadata object you supply when parsing. The SAX events are generated
for the textual content of the file.
I think you might find that Tika already does what you need, but if not
I'd suggest you come back with a concrete example of a file, its
metadata, the XML you get, and what you'd really hoped for!
Nick