Posted to solr-user@lucene.apache.org by Jamie Johnson <je...@gmail.com> on 2012/04/05 05:35:11 UTC
Duplicates in Facets
I am currently indexing some information and am wondering why I am
getting duplicates in facets. From what I can tell the values are the
same, but is there any case that could cause this that I may not be
thinking of? Could some non-printable character be making its way into
the index?
Sample output from Luke:
<lst name="fields">
<lst name="organization_umvs">
<str name="type">string</str>
<str name="schema">I--M---OF----l</str>
<str name="dynamicBase">*_umvs</str>
<str name="index">(unstored field)</str>
<int name="docs">332</int>
<int name="distinct">-1</int>
<lst name="topTerms">
<int name="ORGANIZATION 1">328</int>
<int name="ORGANIZATION 2">124</int>
<int name="ORGANIZATION 2">36</int>
<int name="ORGANIZATION 2">20</int>
<int name="ORGANIZATION 3">4</int>
</lst>
Re: Duplicates in Facets
Posted by Jamie Johnson <je...@gmail.com>.
Yes, thanks for the quick reply. It turns out there are whitespace
differences in these fields.
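For anyone hitting the same thing, here is a minimal sketch of one way to spot values that differ only by whitespace and to normalize them before indexing. It is not what was actually run against this index; the sample values and the helper names `normalize` and `group_variants` are made up for illustration, since the thread does not show where the extra whitespace was.

```python
import re
from collections import defaultdict


def normalize(value: str) -> str:
    """Collapse each run of whitespace to one space and trim both ends."""
    return re.sub(r"\s+", " ", value).strip()


def group_variants(terms):
    """Map each normalized value to the raw spellings that collapse to it."""
    groups = defaultdict(list)
    for term in terms:
        groups[normalize(term)].append(term)
    # Keep only values that appear under more than one raw spelling.
    return {key: raw for key, raw in groups.items() if len(raw) > 1}


# Hypothetical raw field values (assumption: a trailing space and a double space).
sample = [
    "ORGANIZATION 1",
    "ORGANIZATION 2",
    "ORGANIZATION 2 ",
    "ORGANIZATION  2",
    "ORGANIZATION 3",
]

for key, raw in group_variants(sample).items():
    # repr() puts quotes around each value so the stray spaces become visible.
    print(key, [repr(r) for r in raw])
```

Applying something like `normalize` to the value before it is sent to Solr should make the variants collapse into a single facet entry, since a string field indexes the value exactly as given.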
On Wed, Apr 4, 2012 at 11:45 PM, Darren Govoni <da...@ontrenet.com> wrote:
> Try using Luke to look at your index and see if there are multiple
> similar TFV's. You can browse them easily in Luke.
Re: Duplicates in Facets
Posted by Darren Govoni <da...@ontrenet.com>.
Try using Luke to look at your index and see if there are multiple
similar TFVs. You can browse them easily in Luke.
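If the terms look identical even in Luke, a small script can show what is hiding in them. The following is only a sketch, assuming the term strings have already been copied out of the index somehow; the helper name `invisible_chars` and the sample strings are hypothetical. It reports whitespace other than a plain space, plus control and format characters. Plain leading, trailing, or doubled spaces are easier to see with `repr()`.

```python
import unicodedata


def invisible_chars(term: str):
    """Return (position, code point, Unicode name) for every character that is
    whitespace other than a plain space, or a control/format character."""
    found = []
    for i, ch in enumerate(term):
        category = unicodedata.category(ch)
        odd_space = ch.isspace() and ch != " "
        if odd_space or category in ("Cc", "Cf"):
            # Control characters have no Unicode name, so supply a default.
            name = unicodedata.name(ch, "<unnamed>")
            found.append((i, f"U+{ord(ch):04X}", name))
    return found


# Hypothetical terms: one plain, one with a no-break space, one with a tab.
for term in ["ORGANIZATION 2", "ORGANIZATION\u00a02", "ORGANIZATION\t2"]:
    print(repr(term), invisible_chars(term))
```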