Posted to dev@tika.apache.org by Nick Burch <ap...@gagravarr.org> on 2013/03/05 22:33:33 UTC
Re: how to add more metadata to tika extraction?
On Wed, 27 Feb 2013, eShard wrote:
> I manually ran tika-app with --gui and dropped the RSS feed into it.
> Here's the metadata it output:
>
> Content-Length: 615913
> Content-Type: application/rss+xml
> dc:description: This is an IBM C3 Public Files feed generated by a Java
> application.
> dc:title: IBM - C3 Public Files RSS feed
> description: This is an IBM C3 Public Files feed generated by a Java
> application.
> title: IBM - C3 Public Files RSS feed
Looks like the data you want isn't being pulled out as metadata by
Tika.
> That's not what I was expecting. Where are the items? The items are in
> the XML but Tika isn't showing them...
Metadata != content
I'd suspect that if you look at the content output (e.g. run tika-app with
the --xml flag rather than --gui) you'll see the items there. Do you?
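
As a rough illustration of the metadata/content split (this is not Tika's
own code, and may not match how Tika's feed parser behaves in detail), the
sketch below uses only the JDK's DOM parser. The class name FeedSplit, its
method names, and the two sample items are made up for the example; only
the feed title and description come from the output quoted above. The
feed-level title and description are treated as metadata, while the
per-item entries are treated as content.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Illustration only: a hypothetical helper, NOT part of Tika.
class FeedSplit {

    // Tiny sample feed. The two <item> entries are invented for the example.
    static final String SAMPLE =
        "<rss version=\"2.0\"><channel>"
        + "<title>IBM - C3 Public Files RSS feed</title>"
        + "<description>This is an IBM C3 Public Files feed generated by a Java application.</description>"
        + "<item><title>first-file.pdf</title></item>"
        + "<item><title>second-file.doc</title></item>"
        + "</channel></rss>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // "Metadata": title/description elements that are direct children of <channel>.
    static Map<String, String> channelMetadata(Document doc) {
        Map<String, String> meta = new LinkedHashMap<>();
        Node channel = doc.getElementsByTagName("channel").item(0);
        if (channel == null) {
            return meta; // not an RSS 2.0 style feed
        }
        NodeList children = channel.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node n = children.item(i);
            if (n.getNodeType() != Node.ELEMENT_NODE) {
                continue;
            }
            String name = n.getNodeName();
            if (name.equals("title") || name.equals("description")) {
                meta.put(name, n.getTextContent().trim());
            }
        }
        return meta;
    }

    // "Content": the title of every <item> in the feed.
    static List<String> itemTitles(Document doc) {
        List<String> titles = new ArrayList<>();
        NodeList items = doc.getElementsByTagName("item");
        for (int i = 0; i < items.getLength(); i++) {
            NodeList t = ((Element) items.item(i)).getElementsByTagName("title");
            if (t.getLength() > 0) {
                titles.add(t.item(0).getTextContent().trim());
            }
        }
        return titles;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(SAMPLE);
        System.out.println("metadata: " + channelMetadata(doc));
        System.out.println("content:  " + itemTitles(doc));
    }
}
```

The point is only that the items live in the body of the document rather
than in the handful of feed-level fields, so a metadata-only view would
not be expected to list them.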
Nick