Posted to solr-user@lucene.apache.org by alexander sulz <a....@digiconcept.net> on 2011/06/17 14:22:11 UTC

Re: Controlling Tika's metadata

I have the same problem with discarding the metadata title.
I thought the parameter "captureAttr" (which can be set in
solrconfig.xml or passed as a GET/POST parameter) was responsible for
that? I set it to false both in the XML and as a parameter, but I still
get "not multivalued field" errors because metadata and literals both
deliver content to a field that is not multivalued. ;(

I'm using Solr 3.1, though.

On 02.02.2011 17:13, Grant Ingersoll wrote:
> On Jan 28, 2011, at 5:38 PM, Andreas Kemkes wrote:
>
>> Just getting my feet wet with the text extraction using both schema and
>> solrconfig settings from the example directory in the 1.4 distribution, so I
>> might miss something obvious.
>>
>> Trying to provide my own title (and discarding the one received through Tika's
>> metadata) wasn't straightforward. I had to use the following:
>>
>> fmap.title=tika_title (to discard the Tika title)
>> literal.attr_title=New Title (to provide the correct one)
>> fmap.attr_title=title (to map it back to the field as I would like to use title
>> in searches)
>>
>> Is there anything easier than the above?
>>
>> How can this best be generalized to other metadata provided by Tika (which in
>> our use case will be mostly ignored, as it is provided separately)?
> You can provide your own ContentHandler (see the wiki docs).  I think it would be reasonable to patch the ExtractingRequestHandler to have a no metadata option and it wouldn't be that hard.
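
For reference, here is a rough sketch of how the three parameters from
Andreas's workaround could be combined into one extract request URL. It
only builds the URL; it does not send anything. The host, port, handler
path, the helper name build_extract_url, and the literal.id parameter
are my assumptions and depend on your solrconfig.xml and schema; only
the fmap.* and literal.attr_title parameters come from the thread.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Assumed location of the extracting handler; adjust to your setup.
SOLR_EXTRACT_URL = "http://localhost:8983/solr/update/extract"


def build_extract_url(doc_id, title, base_url=SOLR_EXTRACT_URL):
    """Build an extract URL that swaps Tika's title for a supplied one.

    Hypothetical helper for illustration, not part of Solr.
    """
    params = [
        # Assumes the schema's unique key field is called "id".
        ("literal.id", doc_id),
        # Move Tika's own title out of the way (from the thread).
        ("fmap.title", "tika_title"),
        # Supply the wanted title under a temporary field name.
        ("literal.attr_title", title),
        # Map the temporary field back onto "title".
        ("fmap.attr_title", "title"),
    ]
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_extract_url("doc1", "New Title"))
```

The document itself would still have to be posted as the request body,
and whether "tika_title" is accepted depends on the schema in use.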