Posted to solr-user@lucene.apache.org by Jamie Johnson <je...@gmail.com> on 2012/04/05 05:35:11 UTC

Duplicates in Facets

I am currently indexing some information and am wondering why I am
getting duplicates in facets. From what I can tell the values are the same,
but is there any case that could cause this that I may not be thinking
of? Could some non-printable character be making its way into
the index?


Sample output from Luke:

<lst name="fields">
  <lst name="organization_umvs">
    <str name="type">string</str>
    <str name="schema">I--M---OF----l</str>
    <str name="dynamicBase">*_umvs</str>
    <str name="index">(unstored field)</str>
    <int name="docs">332</int>
    <int name="distinct">-1</int>
    <lst name="topTerms">
      <int name="ORGANIZATION 1">328</int>
      <int name="ORGANIZATION 2">124</int>
      <int name="ORGANIZATION 2">36</int>
      <int name="ORGANIZATION 2">20</int>
      <int name="ORGANIZATION 3">4</int>
    </lst>
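To illustrate the non-printable-character hypothesis: assuming `string` is the usual `solr.StrField` type from the example schema, each value is indexed as one unanalyzed term, so values that differ only in invisible characters become separate terms and separate facet entries. The hypothetical XML update message below (the `id` field and all values are assumptions, not taken from the original post) would plausibly produce three look-alike "ORGANIZATION 2" terms like those in the output above.

```xml
<!-- Hypothetical update message: three documents whose organization
     values look identical when printed but are different strings. -->
<add>
  <doc>
    <field name="id">doc1</field>
    <!-- plain value -->
    <field name="organization_umvs">ORGANIZATION 2</field>
  </doc>
  <doc>
    <field name="id">doc2</field>
    <!-- trailing space after the 2 -->
    <field name="organization_umvs">ORGANIZATION 2 </field>
  </doc>
  <doc>
    <field name="id">doc3</field>
    <!-- non-breaking space (U+00A0) instead of an ordinary space -->
    <field name="organization_umvs">ORGANIZATION&#160;2</field>
  </doc>
</add>
```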

Re: Duplicates in Facets

Posted by Jamie Johnson <je...@gmail.com>.
Yes, thanks for the reply. It turns out there are whitespace differences
in the values of these fields. Thank you for the quick reply!
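Since the cause turned out to be whitespace, one possible fix is to normalize values at index time. Below is a minimal schema.xml sketch, assuming Solr 3.x analysis factories; the type name `string_normalized` is hypothetical. It keeps each value as a single token (as `string` does), collapses runs of whitespace, including non-breaking spaces, to one space, and trims the ends.

```xml
<!-- Hypothetical field type: one token per value, with whitespace normalized -->
<fieldType name="string_normalized" class="solr.TextField"
           sortMissingLast="true" omitNorms="true">
  <analyzer>
    <!-- Emit the whole field value as a single token -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- Java's \s covers only ASCII whitespace, so U+00A0 is added explicitly;
         every run of whitespace becomes one ordinary space -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="[\s\u00A0]+" replacement=" " replace="all"/>
    <!-- Strip leading and trailing whitespace -->
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Point the existing dynamic field at the normalized type -->
<dynamicField name="*_umvs" type="string_normalized"
              indexed="true" stored="false" multiValued="true"/>
```

Faceting then counts the normalized terms, so whitespace variants of the same value collapse into one entry. Existing documents must be reindexed for the change to take effect; alternatively, the same cleanup can be done in the client before values are sent to Solr.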

On Wed, Apr 4, 2012 at 11:45 PM, Darren Govoni <da...@ontrenet.com> wrote:
> Try using Luke to look at your index and see whether there are multiple
> similar TFVs. You can browse them easily in Luke.

Re: Duplicates in Facets

Posted by Darren Govoni <da...@ontrenet.com>.
Try using Luke to look at your index and see whether there are multiple
similar TFVs. You can browse them easily in Luke.
