Posted to solr-user@lucene.apache.org by realw5 <dr...@improvementdirect.com> on 2007/04/27 18:37:38 UTC

Facet Results Strange - Help

Hello,
I'm running into some strange results for some of my facets. Below you'll
see the XML returned from Solr. I ran the query using the standard request
handler. Notice the duplicated values returned (american standard, delta,
etc.). There are actually quite a few of them. At first I thought it might be
because of case sensitivity, but I lowercase everything going into
Solr.

Hopefully someone can chime in with some tips, thanks!

Dan

<?xml version="1.0" encoding="UTF-8" ?> 
  <response>
  <lst name="responseHeader">
  <int name="status">0</int> 
  <int name="QTime">4</int> 
  </lst>
  <result name="response" numFound="2328" start="0" /> 
  <lst name="facet_counts">
  <lst name="facet_queries" /> 
  <lst name="facet_fields">
  <lst name="manufacturer_facet">
  <int name="kohler">1560</int> 
  <int name="american standard">197</int> 
  <int name="toto">181</int> 
  <int name="bemis">83</int> 
  <int name="porcher">56</int> 
  <int name="ginger">45</int> 
  <int name="elements of design">40</int> 
  <int name="brasstech">18</int> 
  <int name="st thomas">18</int> 
  <int name="hansgrohe">15</int> 
  <int name="sterling">14</int> 
  <int name="whitehaus">13</int> 
  <int name="delta">12</int> 
  <int name="jacuzzi">10</int> 
  <int name="cifial">8</int> 
  <int name="kwc">8</int> 
  <int name="herbeau">7</int> 
  <int name="jado">7</int> 
  <int name="elizabethan classics">6</int> 
  <int name="showhouse by moen">5</int> 
  <int name="grohe">4</int> 
  <int name="creative specialties">3</int> 
  <int name="latoscana">3</int> 
  <int name="american standard">2</int> 
  <int name="danze">2</int> 
  <int name="ronbow">2</int> 
  <int name="belle foret">1</int> 
  <int name="dornbracht">1</int> 
  <int name="kohler">1</int> 
  <int name="myson">1</int> 
  <int name="newport brass">1</int> 
  <int name="price pfister">1</int> 
  <int name="quayside publishing">1</int> 
  <int name="st. thomas">1</int> 
  <int name="adagio">0</int> 
  <int name="alno">0</int> 
  <int name="alsons">0</int> 
  <int name="bates and bates">0</int> 
  <int name="blanco">0</int> 
  <int name="cec">0</int> 
  <int name="cole and co">0</int> 
  <int name="competitive">0</int> 
  <int name="corstone">0</int> 
  <int name="creative specialties">0</int> 
  <int name="danze">0</int> 
  <int name="decolav">0</int> 
  <int name="dolan designs">0</int> 
  <int name="doralfe">0</int> 
  <int name="dornbracht">0</int> 
  <int name="dreamline">0</int> 
  <int name="elkay">0</int> 
  <int name="fontaine">0</int> 
  <int name="franke">0</int> 
  <int name="grohe">0</int> 
  <int name="hamat">0</int> 
  <int name="hydrosystems">0</int> 
  <int name="improvement direct">0</int> 
  <int name="insinkerator">0</int> 
  <int name="kenroy international">0</int> 
  <int name="kichler">0</int> 
  <int name="kindred">0</int> 
  <int name="maxim">0</int> 
  <int name="mico">0</int> 
  <int name="moen">0</int> 
  <int name="moen">0</int> 
  <int name="mr sauna">0</int> 
  <int name="mr steam">0</int> 
  <int name="neo elements">0</int> 
  <int name="newport brass">0</int> 
  <int name="ondine">0</int> 
  <int name="pegasus">0</int> 
  <int name="price pfister">0</int> 
  <int name="progress lighting">0</int> 
  <int name="pulse">0</int> 
  <int name="quoizel">0</int> 
  <int name="robern">0</int> 
  <int name="rohl">0</int> 
  <int name="sagehill designs">0</int> 
  <int name="sea gull lighting">0</int> 
  <int name="show house">0</int> 
  <int name="sloan">0</int> 
  <int name="st%2e thomas">0</int> 
  <int name="st%2e thomas creations">0</int> 
  <int name="steamist">0</int> 
  <int name="swanstone">0</int> 
  <int name="thomas lighting">0</int> 
  <int name="warmatowel">0</int> 
  <int name="waste king">0</int> 
  <int name="waterstone">0</int> 
  </lst>
  </lst>
  </lst>
  </response>
-- 
View this message in context: http://www.nabble.com/Facet-Results-Strange---Help-tf3658597.html#a10222284
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Facet Results Strange - Help

Posted by Chris Hostetter <ho...@fucit.org>.
: It's likely you have the facet category added more than once for one
: or more docs. Like this;
:
: <field name="manufacturer_facet">american standard</field>
: <field name="manufacturer_facet">american standard</field>
:
: Are you adding the facet values on-the-fly? This happened to me and I
: solved it by removing the duplicate facet fields.

that's really odd ... i can't think of any way that exactly duplicate
field values would be counted twice in the current facet.field code.

I just tested this using the exampledocs by adding "electronics" to the
cat field of some docs multiple times, and i couldn't reproduce this
behavior.

can you elaborate more on how to trigger it?


-Hoss


Re: Facet Results Strange - Help

Posted by Jennifer Seaman <je...@digitalartwork.org>.
>Hopefully someone can chime in with some tips, thanks!

It's likely you have the facet category added more than once for one 
or more docs. Like this;

<field name="manufacturer_facet">american standard</field>
<field name="manufacturer_facet">american standard</field>

Are you adding the facet values on-the-fly? This happened to me and I 
solved it by removing the duplicate facet fields.

Regards,
Jennifer Seaman 


Re: Facet Results Strange - Help

Posted by Yonik Seeley <yo...@apache.org>.
On 4/27/07, realw5 <dr...@improvementdirect.com> wrote:
> Ok, I just finished indexing about 20k documents. I took a look, and so far
> the problem has not appeared again. What I think caused it is that I was
> not setting overwritePending & overwriteCommitted in the add process. Therefore,
> over time, as data was being cleaned up, it was just appending to the
> existing data.

That is the default anyway.  Even if duplicate documents were somehow
added, that should not cause duplicates in facet results.  It should
be impossible to get duplicate values from facet.field, regardless of
what the index looks like.

> I did have one case of repeated values, but after looking at the python
> writer output, I noticed a space at the end. I can fix this issue by trimming
> all my values before sending them to Solr :-)

Hopefully you should have also seen the space in the XML response...
if it's not there, that would be a bug.

-Yonik

Re: Facet Results Strange - Help

Posted by Chris Hostetter <ho...@fucit.org>.
: writer, I noticed a space at the end. I can fix this issue by trimming all my
: values before sending them to Solr :-)

The built in Field Faceting works on the indexed values, so Solr can solve
this for you if you use something like this for your facet field type...

   <fieldType name="facetString" class="solr.TextField" omitNorms="true">
     <analyzer>
      <!-- KeywordTokenizer does no actual tokenizing, so the entire
           input string is preserved as a single token
        -->
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <!-- The LowerCase TokenFilter does what you expect, which can be
           useful when you want your sorting to be case insensitive
        -->
      <filter class="solr.LowerCaseFilterFactory" />
      <!-- The TrimFilter removes any leading or trailing whitespace -->
      <filter class="solr.TrimFilterFactory" />
     </analyzer>
   </fieldType>
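
To see why the field type matters here, the following is a rough Python sketch (not Solr code) of the idea behind that analyzer chain. It assumes a simplified model in which a facet count is just a tally of indexed values; the function names and sample values are made up for illustration. With a plain "string" type every raw value is its own bucket, while keyword + lowercase + trim collapses the look-alikes.

```python
from collections import Counter


def facet_counts(values, analyze=lambda v: v):
    """Tally values after passing each through an (assumed) analysis step."""
    return Counter(analyze(v) for v in values)


def keyword_lower_trim(value):
    # Approximates KeywordTokenizer (whole input kept as one token),
    # then LowerCaseFilter, then TrimFilter.
    return value.lower().strip()


# Hypothetical raw values as they might arrive from an indexing client
raw = ["kohler", "kohler ", "Kohler", "toto"]

# Untouched values: each distinct raw string gets its own bucket
print(facet_counts(raw))

# Analyzed values: the three kohler variants share one bucket
print(facet_counts(raw, keyword_lower_trim))
```

The real filters run inside Solr at index time, so this only illustrates the effect, not the implementation.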



-Hoss


Re: Facet Results Strange - Help

Posted by realw5 <dr...@improvementdirect.com>.
Ok, I just finished indexing about 20k documents. I took a look, and so far
the problem has not appeared again. What I think caused it is that I was
not setting overwritePending & overwriteCommitted in the add process. Therefore,
over time, as data was being cleaned up, it was just appending to the
existing data.

I did have one case of repeated values, but after looking at the python
writer output, I noticed a space at the end. I can fix this issue by trimming
all my values before sending them to Solr :-)
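
A minimal sketch of that client-side cleanup, assuming the facet values for one document are collected in a Python list before being posted. The helper name clean_facet_values is hypothetical and not part of any Solr client.

```python
def clean_facet_values(values):
    """Trim and lowercase each value, dropping empties and repeats.

    The first occurrence of each normalized value is kept, in order.
    """
    seen = set()
    cleaned = []
    for value in values:
        norm = value.strip().lower()
        if norm and norm not in seen:
            seen.add(norm)
            cleaned.append(norm)
    return cleaned


# Hypothetical values for one document's manufacturer_facet field
print(clean_facet_values(["American Standard ", "american standard", " Kohler", ""]))
```

Doing this before building the add request means a value with a stray trailing space never reaches the index as a separate term.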

I'm going to continue indexing, and if the problem pops up again once everything
is indexed I'll post back. Otherwise, thanks for the quick replies!

Dan


-- 
View this message in context: http://www.nabble.com/Facet-Results-Strange---Help-tf3658597.html#a10226731


Re: Facet Results Strange - Help

Posted by Yonik Seeley <yo...@apache.org>.
On 4/27/07, realw5 <dr...@improvementdirect.com> wrote:
> I have a dynamic field setup for facets. It looks like this:
>
> <dynamicField name="*_facet" type="string" indexed="true" stored="false"
> multiValued="true" />
>
> I do this, because we add facets quite often, so having to modify the schema
> every time would be unfeasible.
>
> I'm currently reindexing from scratch, so I cannot try wt=python for a little
> bit longer. Once it's done indexing I'll give that a go and see if I notice
> anything.

If it's really the same field value repeated, you've hit a bug.
If so, it would be helpful if you could open a JIRA bug, and anything
you can do to help us reproduce the problem would be appreciated.

-Yonik

Re: Facet Results Strange - Help

Posted by realw5 <dr...@improvementdirect.com>.
I have a dynamic field setup for facets. It looks like this:

<dynamicField name="*_facet" type="string" indexed="true" stored="false"
multiValued="true" /> 

I do this, because we add facets quite often, so having to modify the schema
every time would be unfeasible.

I'm currently reindexing from scratch, so I cannot try wt=python for a little
bit longer. Once it's done indexing I'll give that a go and see if I notice
anything.

Dan


-- 
View this message in context: http://www.nabble.com/Facet-Results-Strange---Help-tf3658597.html#a10226359


Re: Facet Results Strange - Help

Posted by Yonik Seeley <yo...@apache.org>.
On 4/27/07, realw5 <dr...@improvementdirect.com> wrote:
> Hello,
> I'm running into some strange results for some of my facets. Below you'll
> see the XML returned from Solr. I ran the query using the standard request
> handler. Notice the duplicated values returned (american standard, delta,
> etc.). There are actually quite a few of them. At first I thought it might be
> because of case sensitivity, but I lowercase everything going into
> Solr.
>
> Hopefully someone can chime in with some tips, thanks!

What's the field definition for manufacturer_facet in your schema?  Is
it multi-valued or not?

Also, can you try the python response format (wt=python) as it outputs
only ASCII and escapes everything else... there is an off chance the
strings look the same but aren't.
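
If you have the raw values on the client side, the same check can be sketched in a few lines of Python. This is only an illustration under assumptions: the helper name and the sample list (including the trailing space) are made up, and it uses Python's built-in ascii() to quote and escape each string so that stray whitespace or non-ASCII characters become visible.

```python
from collections import defaultdict


def group_lookalikes(values):
    """Group raw values that become identical after strip + lowercase."""
    groups = defaultdict(set)
    for value in values:
        groups[value.strip().lower()].add(value)
    # Keep only normalized keys backed by more than one distinct raw value
    return {key: sorted(raws) for key, raws in groups.items() if len(raws) > 1}


# Hypothetical raw facet values; the trailing space is an assumed example
raw = ["american standard", "american standard ", "kohler", "toto"]

for key, variants in group_lookalikes(raw).items():
    # ascii() adds quotes and escapes non-ASCII, exposing hidden differences
    print(key, "->", [ascii(v) for v in variants])
```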

-Yonik