Posted to dev@tika.apache.org by Nick Burch <ap...@gagravarr.org> on 2013/03/05 22:33:33 UTC

Re: how to add more metadata to tika extraction?

On Wed, 27 Feb 2013, eShard wrote:
> I manually ran the tika-app --gui and I dropped the rss feed into it.
> Here's what the metadata output:
>
> Content-Length: 615913
> Content-Type: application/rss+xml
> dc:description: This is an IBM C3 Public Files feed generated by a Java
> application.
> dc:title: IBM - C3 Public Files RSS feed
> description: This is an IBM C3 Public Files feed generated by a Java
> application.
> title: IBM - C3 Public Files RSS feed

Looks like the information you want isn't being pulled out as metadata by
Tika.

> that's not what I was expecting. where are the items? the items are in 
> the xml but tika isn't showing them...

Metadata != content

I'd suspect that if you look at the content output (e.g. run tika-app with
the --xml flag rather than --gui), you'll see the items there. Do you?
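
To illustrate the distinction, here is a rough sketch that uses only the JDK's DOM parser. It is not Tika's code and says nothing about how Tika's feed parser is implemented; the class name RssSplit, its helper methods, and the sample feed are all made up for illustration. It treats the channel-level title as "metadata" and the per-item titles as "content", which is roughly the split seen in the output above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper class, for illustration only (not part of Tika).
class RssSplit {

    // Text of the first DIRECT child element with the given name, or null.
    static String childText(Element parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && n.getNodeName().equals(name)) {
                return n.getTextContent().trim();
            }
        }
        return null;
    }

    // Parse the feed and return its <channel> element.
    static Element channel(String rss) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(rss.getBytes(StandardCharsets.UTF_8)));
            return (Element) doc.getElementsByTagName("channel").item(0);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse feed", e);
        }
    }

    // Channel-level field: the kind of thing that shows up as metadata.
    static String channelTitle(String rss) {
        return childText(channel(rss), "title");
    }

    // Item-level fields: the kind of thing that shows up in the content.
    static List<String> itemTitles(String rss) {
        List<String> titles = new ArrayList<>();
        NodeList items = channel(rss).getElementsByTagName("item");
        for (int i = 0; i < items.getLength(); i++) {
            titles.add(childText((Element) items.item(i), "title"));
        }
        return titles;
    }

    public static void main(String[] args) {
        // Made-up sample feed; a real feed would be read from a file or URL.
        String rss = "<rss version=\"2.0\"><channel>"
                + "<title>Example feed</title>"
                + "<description>A made-up feed</description>"
                + "<item><title>first.txt</title></item>"
                + "<item><title>second.txt</title></item>"
                + "</channel></rss>";

        System.out.println("title: " + channelTitle(rss));
        for (String t : itemTitles(rss)) {
            System.out.println("item: " + t);
        }
    }
}
```

The point of the sketch is only that the two kinds of data live at different levels of the feed, so a tool can reasonably report them through different outputs.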

Nick