Posted to user@tika.apache.org by ola nowak <ol...@gmail.com> on 2012/02/10 15:48:41 UTC

multivalued fields

Hi list!
I'm using Tika to parse XML files and then index them with Solr. I wrote a
parser based on DcXMLParser to handle my special XML tags. It's working :)
Now I have a problem: sometimes my files contain multivalued fields (by
which I mean that there are e.g. 2 <dc:title> title </dc:title> tags in
the file). I want them to be indexed in Solr as a multivalued field, but in
the metadata I get from the Tika parser those titles are concatenated into
a single title field. What should I do to get what I want? Any help
appreciated :)
Regards,
Alex

Re: multivalued fields

Posted by Nick Burch <ni...@alfresco.com>.
On Fri, 10 Feb 2012, ola nowak wrote:
> I'm using Tika to parse XML files and then index them with Solr. I wrote a
> parser based on DcXMLParser to handle my special XML tags. It's working :)
> Now I have a problem: sometimes my files contain multivalued fields (by
> which I mean that there are e.g. 2 <dc:title> title </dc:title> tags in
> the file). I want them to be indexed in Solr as a multivalued field, but in
> the metadata I get from the Tika parser those titles are concatenated into
> a single title field. What should I do to get what I want?

Have you tried a recent nightly build of Tika? I seem to recall fixing a
bug with multi-valued XML metadata fairly recently, so it may already be
sorted.

Nick
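
For anyone reading this thread later, here is a minimal sketch of the behaviour asked for above: each occurrence of a repeated <dc:title> element is kept as its own value instead of being appended to one string. It uses only the SAX classes shipped with the JDK and is not Tika's DcXMLParser code. The class name MultiValueSketch, the parse helper and the sample XML are made up for illustration. In a real Tika parser the equivalent step would presumably be calling Metadata.add(name, value) once per element and reading the result back with Metadata.getValues(name), with the target Solr field declared multiValued="true".

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class; not part of Tika.
class MultiValueSketch {

    // Dublin Core elements namespace.
    static final String DC_NS = "http://purl.org/dc/elements/1.1/";

    // Collects the text of every Dublin Core element, one list entry per occurrence.
    static Map<String, List<String>> parse(String xml) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        DefaultHandler handler = new DefaultHandler() {
            // Non-null only while we are inside a dc: element.
            private StringBuilder text;

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                if (DC_NS.equals(uri)) {
                    // Fresh buffer per occurrence, so values are never concatenated.
                    text = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (text != null) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if (DC_NS.equals(uri) && text != null) {
                    // Append to the list for this field name instead of overwriting it.
                    fields.computeIfAbsent(local, k -> new ArrayList<>())
                          .add(text.toString().trim());
                    text = null;
                }
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // needed so "local" is "title", not "dc:title"
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return fields;
    }

    public static void main(String[] args) {
        String xml = "<record xmlns:dc=\"" + DC_NS + "\">"
                + "<dc:title> First title </dc:title>"
                + "<dc:title> Second title </dc:title>"
                + "</record>";
        // Prints the list of values collected for the "title" field.
        System.out.println(parse(xml).get("title"));
    }
}
```

The point of the sketch is the per-occurrence buffer plus the list per field name: if the handler kept one shared buffer for all <dc:title> elements, the values would run together, which matches the symptom described in the original question.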