Posted to solr-dev@lucene.apache.org by "Khalid Yagoubi (JIRA)" <ji...@apache.org> on 2009/12/11 22:46:19 UTC

[jira] Commented: (SOLR-1633) Solr Cell should be smarter about literal and multiValued="false"

    [ https://issues.apache.org/jira/browse/SOLR-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789552#action_12789552 ] 

Khalid Yagoubi commented on SOLR-1633:
--------------------------------------

I have written a patch for the Solr Tika extraction code that ignores Tika fields. It works, but I'm not sure my patch is the best approach.
It solved my problem by keeping Tika from extracting metadata that conflicts with my own literal, non-multivalued field.
Example: <meta name="id" content="10"/> is extracted as id, but I also supply my own id: literal.id=12
==> error, because id is a non-multivalued field

Here is what my patch does:
- I patched SolrContentHandler.java
- I added a parameter contentOnly=true|false
- I ignore metadata from Tika whose names match fields defined in the schema
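
As a rough illustration of the steps above (this is not the code from the patch), the filtering could look like the following sketch. The class and method names are hypothetical, plain Java maps and sets stand in for the Tika metadata and the Solr schema, and it assumes the metadata is dropped only when contentOnly is true.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; it does not use the real Solr or Tika classes.
class ContentOnlySketch {

    /**
     * Returns the Tika metadata that would still be indexed.
     * When contentOnly is true, metadata whose name matches a field
     * defined in the schema is dropped (assumed semantics of the flag).
     */
    static Map<String, String> filterMetadata(Map<String, String> tikaMetadata,
                                              Set<String> schemaFields,
                                              boolean contentOnly) {
        Map<String, String> kept = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : tikaMetadata.entrySet()) {
            if (contentOnly && schemaFields.contains(e.getKey())) {
                continue; // skip: the name collides with a schema field
            }
            kept.put(e.getKey(), e.getValue());
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, String> meta = new LinkedHashMap<>();
        meta.put("id", "10");               // from <meta name="id" content="10"/>
        meta.put("generator", "some-tool"); // made-up metadata with no schema field
        Set<String> schema = Set.of("id", "title");

        System.out.println(filterMetadata(meta, schema, true));
        System.out.println(filterMetadata(meta, schema, false));
    }
}
```

With contentOnly=true the extracted id never reaches the document, so literal.id=12 is the only value for the single-valued id field.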

Ideas for improvement:
- Ignore only metadata for which a literal.foo is given and whose field is not multivalued
- Prefix these fields
- Find a better name for the parameter: contentOnly or ign.meta.conflict
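
The first idea in the list could be sketched as follows. This is only an assumption about how such a rule might behave, not existing Solr Cell code: the class and method names are hypothetical, the request parameters and Tika metadata are modeled as plain maps, and the set of multivalued field names stands in for a schema lookup. Only the literal. parameter prefix comes from Solr Cell itself.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; it does not use the real Solr or Tika classes.
class LiteralWinsSketch {
    static final String LITERAL_PREFIX = "literal.";

    /**
     * Builds the field values for one document. A Tika value is skipped only
     * when a literal.<field> was supplied AND the field is single-valued.
     */
    static Map<String, List<String>> buildFields(Map<String, String> requestParams,
                                                 Map<String, String> tikaMetadata,
                                                 Set<String> multiValuedFields) {
        Map<String, List<String>> doc = new LinkedHashMap<>();

        // 1. Add every literal.<field>=value from the request.
        for (Map.Entry<String, String> p : requestParams.entrySet()) {
            if (p.getKey().startsWith(LITERAL_PREFIX)) {
                String field = p.getKey().substring(LITERAL_PREFIX.length());
                doc.computeIfAbsent(field, k -> new ArrayList<>()).add(p.getValue());
            }
        }
        Set<String> literalFields = new HashSet<>(doc.keySet());

        // 2. Add Tika metadata unless it would put a second value in a single-valued field.
        for (Map.Entry<String, String> m : tikaMetadata.entrySet()) {
            String field = m.getKey();
            if (literalFields.contains(field) && !multiValuedFields.contains(field)) {
                continue; // the explicit literal wins
            }
            doc.computeIfAbsent(field, k -> new ArrayList<>()).add(m.getValue());
        }
        return doc;
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("literal.id", "12");
        params.put("literal.keywords", "mine");

        Map<String, String> meta = new LinkedHashMap<>();
        meta.put("id", "10");
        meta.put("keywords", "extracted");

        System.out.println(buildFields(params, meta, Set.of("keywords")));
    }
}
```

Compared with dropping every metadata field that exists in the schema, this keeps Tika values for multivalued fields and for fields that were given no literal.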

I'll submit my patch tomorrow night.

Thanks for any suggestions.

> Solr Cell should be smarter about literal and multiValued="false"
> -----------------------------------------------------------------
>
>                 Key: SOLR-1633
>                 URL: https://issues.apache.org/jira/browse/SOLR-1633
>             Project: Solr
>          Issue Type: Improvement
>          Components: contrib - Solr Cell (Tika extraction)
>            Reporter: Hoss Man
>
> As noted on solr-user, SolrCell has less than ideal behavior when "foo" is a single value field, and literal.foo=bar is specified in the request, but Tika also produces a value for the "foo" field from the document.  It seems like a possible improvement here would be for SolrCell to ignore the value from Tika if it already has one that was explicitly provided (as opposed to the current behavior of letting the add fail because of multiple values in a single valued field).
> It seems pretty clear that in cases like this, the user's intention is to have their one literal field used as the value.
> http://old.nabble.com/Re%3A-WELCOME-to-solr-user%40lucene.apache.org-to26650071.html#a26650071

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.