Posted to dev@lucene.apache.org by "Steven Bower (JIRA)" <ji...@apache.org> on 2013/04/11 17:51:17 UTC

[jira] [Updated] (SOLR-1782) stats.facet assumes FieldCache.StringIndex - fails horribly on multivalued fields

     [ https://issues.apache.org/jira/browse/SOLR-1782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steven Bower updated SOLR-1782:
-------------------------------

    Attachment: SOLR-1782.2013-04-10.patch

Attached is a patch that I believe fixes the issues found in the latest unified patch (2013.01.28). The previous patch was interchangeably using term numbers from an UninvertedField and from a FieldCache, which failed in cases where there were multiple values in the facet field.

I updated it to use uif.lookupTerm() and a TermsEnum generated from uif.getOrdTermsEnum().
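
To illustrate the kind of mix-up described above, here is a minimal toy sketch in plain Java. It does not use Lucene or Solr classes: `ToyOrds`, `termAt` and `ordOf` are hypothetical names, and the assumption that the two structures differ by a reserved slot 0 is made up purely for the example. The point is only that a term number is meaningful inside the structure that produced it, so it has to be resolved to the term itself before being used with another structure.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model only: hypothetical names, not the Lucene/Solr API. */
public class OrdSpaceSketch {

    /** A sorted term list in which a term's position is its number. */
    static final class ToyOrds {
        private final List<String> terms;

        ToyOrds(List<String> terms) {
            this.terms = new ArrayList<>(terms);
        }

        /** Resolves a term number to its term within this structure. */
        String termAt(int ord) {
            return terms.get(ord);
        }

        /** Returns the number this structure assigns to the term, or -1. */
        int ordOf(String term) {
            return terms.indexOf(term);
        }
    }

    public static void main(String[] args) {
        // Assumption for the example: the second structure reserves slot 0.
        ToyOrds first = new ToyOrds(Arrays.asList("blue", "green", "red"));
        ToyOrds second = new ToyOrds(Arrays.asList("", "blue", "green", "red"));

        int ordInFirst = first.ordOf("green");

        // Wrong: reusing the number directly names a different term.
        System.out.println(second.termAt(ordInFirst));

        // Safer: resolve the term in its own structure, then look it up.
        int ordInSecond = second.ordOf(first.termAt(ordInFirst));
        System.out.println(second.termAt(ordInSecond));
    }
}
```

Going through the term text costs an extra lookup per term, which is the kind of overhead the performance question in this comment is about.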

All the tests now pass, but I am uncertain about the performance and memory footprint of getOrdTermsEnum(), and about whether it would be possible to hold onto this enum for longer than I do now.
                
> stats.facet assumes FieldCache.StringIndex - fails horribly on multivalued fields
> ---------------------------------------------------------------------------------
>
>                 Key: SOLR-1782
>                 URL: https://issues.apache.org/jira/browse/SOLR-1782
>             Project: Solr
>          Issue Type: Bug
>          Components: search
>    Affects Versions: 1.4
>         Environment: reproduced on Win2k3 using 1.5.0-dev solr ($Id: CHANGES.txt 906924 2010-02-05 12:43:11Z noble $)
>            Reporter: Gerald DeConto
>            Assignee: Hoss Man
>         Attachments: index.rar, SOLR-1782.2013-01-07.patch, SOLR-1782.2013-04-10.patch, SOLR-1782.2.patch, SOLR-1782.patch, SOLR-1782.patch, SOLR-1782.patch, SOLR-1782.test.patch
>
>
> The StatsComponent assumes that any field specified in the stats.facet param can be faceted using FieldCache.DEFAULT.getStringIndex. This can cause problems with a variety of field types, but in the case of multivalued fields it can either produce erroneous stats when the number of distinct values is small, or throw an ArrayIndexOutOfBoundsException when the number of distinct values is greater than the number of documents.
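
The two failure modes in the description can be reproduced with a small toy model. The sketch below is plain Java and is not the Lucene implementation: `ToyStringIndex` and `build` are hypothetical, and the layout (one term slot per document plus a reserved slot 0) is an assumption about how a single-valued index of this kind might be sized. Under that assumption, a multivalued field either silently loses values or overflows the term array.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

/** Toy model only: not Lucene or Solr code. */
public class SingleValuedIndexSketch {

    /** One term slot per document, as a single-valued index assumes. */
    static final class ToyStringIndex {
        final String[] lookup; // term number -> term, sized from the document count
        final int[] order;     // docId -> term number, exactly one per document

        ToyStringIndex(String[] lookup, int[] order) {
            this.lookup = lookup;
            this.order = order;
        }
    }

    /** Builds the toy index; docs.get(i) holds every value of document i. */
    static ToyStringIndex build(List<List<String>> docs) {
        int numDocs = docs.size();
        String[] lookup = new String[numDocs + 1]; // slot 0 means "no value"
        int[] order = new int[numDocs];

        // Collect the distinct terms in sorted order.
        TreeSet<String> terms = new TreeSet<>();
        for (List<String> values : docs) {
            terms.addAll(values);
        }

        int ord = 0;
        for (String term : terms) {
            ord++;
            lookup[ord] = term; // throws once distinct terms exceed numDocs
            for (int doc = 0; doc < numDocs; doc++) {
                if (docs.get(doc).contains(term)) {
                    order[doc] = ord; // a later term overwrites an earlier one
                }
            }
        }
        return new ToyStringIndex(lookup, order);
    }

    public static void main(String[] args) {
        // Few distinct values: no exception, but each document keeps one value only.
        ToyStringIndex small = build(Arrays.asList(
                Arrays.asList("a", "b"),
                Arrays.asList("a")));
        System.out.println(Arrays.toString(small.order));

        // More distinct values than documents: the lookup array overflows.
        try {
            build(Arrays.asList(Arrays.asList("a", "b", "c")));
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("overflow: " + e.getMessage());
        }
    }
}
```

In the first case document 0 ends up associated only with "b", so any per-facet statistics computed from `order` would miss its "a" value.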
