Posted to solr-dev@lucene.apache.org by "Hoss Man (JIRA)" <ji...@apache.org> on 2009/09/02 02:17:33 UTC

[jira] Created: (SOLR-1401) solr should error on document add/update if uniqueKey field has multiple tokens.

solr should error on document add/update if uniqueKey field has multiple tokens.
--------------------------------------------------------------------------------

                 Key: SOLR-1401
                 URL: https://issues.apache.org/jira/browse/SOLR-1401
             Project: Solr
          Issue Type: Improvement
            Reporter: Hoss Man


Over the years, I have seen more than a few solr-user posts noticing odd behavior when using a uniqueKey field configured to use TextField with a non-trivial analyzer. We shouldn't error on TextField itself (KeywordTokenizer is perfectly legitimate), but we should error if that analyzer produces multiple tokens.

Likewise, we should verify that we give good error messages if the uniqueKey field is configured with multiValued=true.
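
As a rough illustration of the proposed check, here is a minimal sketch in plain Java. It does not use Solr or Lucene classes: `ToyAnalyzer`, `KEYWORD`, `WHITESPACE`, and `requireSingleToken` are hypothetical names invented for this example, and the real change would have to hook into Solr's own add/update path.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

public class UniqueKeyTokenCheck {

    /** Hypothetical stand-in for an analyzer: maps raw field text to its tokens. */
    public interface ToyAnalyzer extends Function<String, List<String>> {}

    /** Behaves like a keyword tokenizer: the whole value becomes one token. */
    public static final ToyAnalyzer KEYWORD =
            value -> Collections.singletonList(value);

    /** Splits on whitespace, so values containing spaces yield several tokens. */
    public static final ToyAnalyzer WHITESPACE =
            value -> Arrays.asList(value.trim().split("\\s+"));

    /**
     * Returns the single token produced for a uniqueKey value, or throws
     * if the analyzer produced anything other than exactly one token.
     */
    public static String requireSingleToken(String fieldName, String value,
                                            ToyAnalyzer analyzer) {
        List<String> tokens = analyzer.apply(value);
        if (tokens.size() != 1) {
            throw new IllegalArgumentException(
                    "uniqueKey field '" + fieldName + "' produced "
                    + tokens.size() + " tokens for value '" + value + "'");
        }
        return tokens.get(0);
    }

    public static void main(String[] args) {
        // Accepted: the keyword-style analyzer keeps the value whole.
        System.out.println(requireSingleToken("id", "SKU 123", KEYWORD));
        try {
            // Rejected: the whitespace analyzer splits the value in two.
            requireSingleToken("id", "SKU 123", WHITESPACE);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The point of the sketch is that the check depends on the analyzer's output for each value, not on the field type, which is why a TextField using a keyword-style tokenizer would still pass.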

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1401) solr should error on document add/update if uniqueKey field has multiple tokens.

Posted by "Igor Motov (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12751692#action_12751692 ] 

Igor Motov commented on SOLR-1401:
----------------------------------

It might be helpful to expand this to other non-trivial analyzers as well. Even if an analyzer produces a single token, duplicate removal and distributed search don't function properly for any ids that the analyzer modified. To see this, change the type of the id field to textTight and add a record with id "ID" twice. The textTight analyzer produces a single token for this value, and yet the record appears twice in the result list. At the same time, in distributed search (even with a single shard), these records disappear from the result list entirely.

This problem, combined with the recommendation in schema.xml to use textTight for SKUs, trips up some novice users. A SKU is frequently a natural id, so changing the id type from "string" to "textTight" is one of the first schema modifications some users make, and it then takes them days to figure out the problem:

http://www.nabble.com/uniqueKey-gives-duplicate-values-td15341288.html
http://www.nabble.com/Adding-new-docs%2C-but-duplicating-instead-of-updating-td25241444.html
http://www.nabble.com/Solr-Shard---Strange-results-td23561201.html
http://www.nabble.com/Shard-Query-Problem-td22110121.html
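
The duplicate-record symptom described above can be reproduced in a toy model. The sketch below is plain Java, not Solr code, and it rests on an assumption that this thread does not confirm: that the overwrite step looks up the raw id value while the index stores the analyzed token. `ToyUniqueKeyIndex` and its methods are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

public class ToyUniqueKeyIndex {

    // Hypothetical single-token analyzer applied to the id at index time.
    private final UnaryOperator<String> analyzer;

    // One entry per "document": the indexed form of its id.
    private final List<String> indexedTerms = new ArrayList<>();

    public ToyUniqueKeyIndex(UnaryOperator<String> analyzer) {
        this.analyzer = analyzer;
    }

    public void add(String rawId) {
        // Assumed overwrite step: delete documents indexed under the RAW id.
        indexedTerms.removeIf(term -> term.equals(rawId));
        // The index itself stores the ANALYZED form of the id.
        indexedTerms.add(analyzer.apply(rawId));
    }

    public int size() {
        return indexedTerms.size();
    }

    public static void main(String[] args) {
        // "string"-like id: the analyzer leaves the value untouched.
        ToyUniqueKeyIndex asString = new ToyUniqueKeyIndex(UnaryOperator.identity());
        asString.add("ID");
        asString.add("ID");

        // Analyzer that lowercases: still one token, but a modified one.
        ToyUniqueKeyIndex lowercased =
                new ToyUniqueKeyIndex(s -> s.toLowerCase(Locale.ROOT));
        lowercased.add("ID");
        lowercased.add("ID");

        System.out.println("unmodified id: " + asString.size() + " document(s)");
        System.out.println("lowercased id: " + lowercased.size() + " document(s)");
    }
}
```

Under that assumption, the second add of "ID" finds nothing to replace once the analyzer has turned the indexed term into "id", so both copies survive. The model does not attempt to cover the distributed-search case.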


