Posted to user@tika.apache.org by Grant Ingersoll <gs...@apache.org> on 2010/09/29 16:37:41 UTC

Supported Metadata Tags

Hi,

I was wondering if Tika supports:
IPTC (image)
XMP (image/video)  -- yes, AFAICT
EXIF (image)   -- yes, AFAICT
DCMI (image)
PLUS (image)
RIFF (video)
ID3 (audio)   -- yes, AFAICT

Anyone else know about the other formats?

Thanks,
Grant

--------------------------
Grant Ingersoll
http://lucenerevolution.org Apache Lucene/Solr Conference, Boston Oct 7-8


Re: Supported Metadata Tags

Posted by Nick Burch <ni...@alfresco.com>.
On Wed, 29 Sep 2010, Grant Ingersoll wrote:
> IPTC (image)

There is support for this in Staffan Olsson's git fork (see TIKA-482). I'm 
hoping Staffan will submit an updated patch of this shortly which we'll 
then be able to apply :)

> XMP (image/video)  -- yes, AFAICT

We can generate XMP metadata, but not read it. If memory serves, we're 
waiting on some bug fixes in pdfbox before we can use some of their code 
for reading, but it's not my bug so I could be wrong!

> EXIF (image)   -- yes, AFAICT

Yes

> DCMI (image)

As in Dublin Core? If so, we do map some image metadata onto Dublin Core 
entries

> PLUS (image)

I've not come across this one before, so I suspect we don't support it

> RIFF (video)

No support yet, but it's one that there is some interest in. If you could 
upload some short video files and audio files to JIRA, along with the 
details of the metadata in them, it might be enough to prod someone into 
adding at least basic support :)

> ID3 (audio)   -- yes, AFAICT

ID3 v1, v2.2, v2.3 and v2.4 are all supported, yes

Nick
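
For readers unfamiliar with how the ID3 versions listed above differ on disk, here is a minimal sketch in Python. It is not Tika's implementation, and the function name is made up for illustration. It relies only on the published tag layouts: an ID3v2 tag starts the file with the bytes "ID3" followed by a major version byte (2, 3 or 4), while an ID3v1 tag is the last 128 bytes of the file and begins with "TAG".

```python
def detect_id3_versions(data: bytes) -> list:
    """Return the ID3 tag versions that appear to be present in raw audio bytes.

    Hypothetical helper for illustration only; it checks tag markers and does
    not validate the rest of the tag.
    """
    found = []
    # ID3v2: 10-byte header at the start of the file, "ID3" + major + revision.
    if len(data) >= 10 and data[:3] == b"ID3":
        found.append("v2.%d" % data[3])
    # ID3v1: fixed 128-byte block at the very end of the file, starting "TAG".
    if len(data) >= 128 and data[-128:-125] == b"TAG":
        found.append("v1")
    return found


if __name__ == "__main__":
    # Synthetic bytes: an ID3v2.3 header, some filler, then an ID3v1 block.
    fake = b"ID3\x03\x00\x00" + b"\x00" * 4 + b"\xff" * 50 + b"TAG" + b"\x00" * 125
    print(detect_id3_versions(fake))
```

A file can carry both kinds of tag at once, which is why the sketch returns a list rather than a single value.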

Re: Supported Metadata Tags

Posted by Grant Ingersoll <gs...@apache.org>.
Thanks for both of the replies.  Very helpful!

-Grant

On Sep 29, 2010, at 11:22 AM, Mattmann, Chris A (388J) wrote:

> Hey Grant,
> 
> In the latest version of Tika 0.8 trunk, I added a utility to print the
> supported metadata models and associated tags that are part of Tika.
> 
> If you've built a fresh copy of tika-app, you can do:
> 
> java -jar tika-app-0.8-SNAPSHOT.jar --list-met-models
> 
> That will print out a list of:
> 
> <model name>
> <met key 1>
> <met key 2>
> <model name 2>
> <met key 1>
> ...
> 
> On your list below, Tika supports DCMI if you are talking about the Dublin
> Core Metadata Initiative (DCMI). As for the others, others can speak up...
> 
> Cheers,
> Chris
> 
> 
> On 9/29/10 7:37 AM, "Grant Ingersoll" <gs...@apache.org> wrote:
> 
>> Hi,
>> 
>> I was wondering if Tika supports:
>> IPTC (image)
>> XMP (image/video)  -- yes, AFAICT
>> EXIF (image)   -- yes, AFAICT
>> DCMI (image)
>> PLUS (image)
>> RIFF (video)
>> ID3 (audio)   -- yes, AFAICT
>> 
>> Anyone else know about the other formats?
>> 
>> Thanks,
>> Grant
>> 


Re: Supported Metadata Tags

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
Hey Grant,

In the latest version of Tika 0.8 trunk, I added a utility to print the
supported metadata models and associated tags that are part of Tika.

If you've built a fresh copy of tika-app, you can do:

java -jar tika-app-0.8-SNAPSHOT.jar --list-met-models

That will print out a list of:

<model name>
 <met key 1>
 <met key 2>
<model name 2>
 <met key 1>
...
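
If you want to post-process that listing, a rough sketch in Python follows. It assumes the output looks exactly like the outline above (model names flush left, keys indented by whitespace), which may not match what the tool really prints. The function name and the sample model and key names are placeholders, not real Tika output.

```python
def parse_met_models(listing: str) -> dict:
    """Group metadata keys under the model name that precedes them.

    Assumption: unindented lines are model names and indented lines are keys,
    as in the outline from the message. Real output may differ.
    """
    models = {}
    current = None
    for line in listing.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line[0].isspace():
            # Indented line: a key belonging to the most recent model.
            if current is not None:
                models[current].append(line.strip())
        else:
            # Unindented line: start of a new model.
            current = line.strip()
            models.setdefault(current, [])
    return models


if __name__ == "__main__":
    # Placeholder text, not captured from a real run of tika-app.
    sample = "ModelA\n key1\n key2\nModelB\n key3\n"
    print(parse_met_models(sample))
```

The text to parse could be captured by redirecting the command's output to a file and reading it back in.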

On your list below, Tika supports DCMI if you are talking about the Dublin
Core Metadata Initiative (DCMI). As for the others, others can speak up...

Cheers,
Chris


On 9/29/10 7:37 AM, "Grant Ingersoll" <gs...@apache.org> wrote:

> Hi,
> 
> I was wondering if Tika supports:
> IPTC (image)
> XMP (image/video)  -- yes, AFAICT
> EXIF (image)   -- yes, AFAICT
> DCMI (image)
> PLUS (image)
> RIFF (video)
> ID3 (audio)   -- yes, AFAICT
> 
> Anyone else know about the other formats?
> 
> Thanks,
> Grant
> 
> --------------------------
> Grant Ingersoll
> http://lucenerevolution.org Apache Lucene/Solr Conference, Boston Oct 7-8
> 
> 


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: Chris.Mattmann@jpl.nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++