Posted to solr-user@lucene.apache.org by Mark Bennett <mb...@ideaeng.com> on 2010/02/26 19:12:33 UTC

Question on Facets and Multiple values (confusion from the Wiki)

Searching for Solr and facets certainly turns up lots of matches.

Contrived example:
* Solr 1.4, etc.
* Yellow pages, business listings.
* Business listings have a zip code that I will use in Faceted search.
* Companies with multiple stores/outlets/offices still only have one record,
but all applicable zip codes are listed. (Yes, there are other ways to solve
this, I know.)
* I want each listing to show in all of its zip codes when zip-code facets
are presented
* I declare one or more fields in schema.xml per the Wiki, etc. So 3 fields,
etc.

The behavior I will actually get will be:
(a) As described, a business in 3 zipcodes will show up under all 3 facets
(b) Nope, only the first zip code will work correctly
(c) Yes and No. Only the first zip code will show up in the facets.  But I
could certainly still search on the other codes and find that listing.  If
other businesses are in the other facets, a click on those zip codes will
return my original business, but if it's the ONLY business in the zip code,
the facet will not get displayed

Normally I'd say (a).  Playing with facets and reading online, this should
be possible, though it may take 2 or 3 versions of the field.

But why then would I even ask about (b) or (c)?  Well, there's stuff in the
Wiki that makes me hesitate.  First, look at this page:

http://wiki.apache.org/solr/SolrFacetingOverview
First section, 3rd bullet point, where it says:
"For faceting: Primary author only, using a solr.StringField:
    * Schildt, Herbert"

Obviously if I were sorting on this field the first author would matter a
lot.  And it's a bit ambiguous which copy of the field I'm using, etc.

Other things that cause me to hesitate:

http://wiki.apache.org/solr/SchemaXml#Common_field_options
The multiValued=true|false option

And this:
http://wiki.apache.org/solr/FieldOptionsByUseCase
multiValued is left blank in many cases, and not filled in for facets.



--
Mark Bennett / New Idea Engineering, Inc. / mbennett@ideaeng.com
Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513

Re: Question on Facets and Multiple values (confusion from the Wiki)

Posted by Jan Høydahl / Cominvent <ja...@cominvent.com>.
Hi Mark,

If (a) is the wanted behaviour, i.e. a business shows up in the facets for all of its ZIPs, you should define a multi-valued ZIP field. Since a ZIP is a number, I don't see any reason to run any analysis on it; a String or a lightly normalized field type would do the job for both search and facets.
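
A minimal sketch of what that could look like, not a definitive setup. The field name "zip", the type name "string" (as in the example schema shipped with Solr) and the base URL http://localhost:8983/solr/select are assumptions for illustration; the request parameters (q, rows, facet, facet.field, facet.mincount, wt) are the standard faceting parameters.

```python
from urllib.parse import urlencode, parse_qs, urlsplit
import xml.etree.ElementTree as ET

# Hypothetical schema.xml field declaration: an unanalyzed string field
# that may hold several ZIP codes per listing (multiValued="true").
ZIP_FIELD_XML = (
    '<field name="zip" type="string" indexed="true" '
    'stored="true" multiValued="true"/>'
)

def facet_url(base_url, field, query="*:*"):
    """Build a select URL that asks for facet counts on one field."""
    params = [
        ("q", query),             # match all listings by default
        ("rows", "0"),            # only the facet counts are wanted here
        ("facet", "true"),        # switch faceting on
        ("facet.field", field),   # facet on the multi-valued field
        ("facet.mincount", "1"),  # hide ZIPs with no matching listing
        ("wt", "json"),
    ]
    return base_url + "?" + urlencode(params)

# Parse the declaration to confirm it is well-formed, then reuse its name.
field = ET.fromstring(ZIP_FIELD_XML)
url = facet_url("http://localhost:8983/solr/select", field.get("name"))
print(url)
```

The declaration goes inside the fields section of schema.xml; the URL can then be requested from a browser or any HTTP client once listings are indexed.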

What I think confuses you is the author example in SolrFacetingOverview, which chooses to use only the main (first) author for faceting. This is a business decision for that application and has nothing to do with faceting as such. The default would be to include all ZIPs. The example on that page should probably clarify this behaviour.
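
To make the difference concrete, here is a small simulation in plain Python. It is not Solr code; the listings and the helper facet_counts are made up for illustration. It counts documents per ZIP once using every value (what a multi-valued facet field gives you) and once using only the first value (what an application gets if it chooses to index only the primary value, as in the author example).

```python
from collections import Counter

# Hypothetical listings: one record per business, with all its ZIP codes.
listings = [
    {"name": "Acme Hardware", "zip": ["94085", "94086", "95014"]},
    {"name": "Bay Plumbing", "zip": ["94086"]},
]

def facet_counts(docs, field, first_only=False):
    """Count documents per distinct value of a field."""
    counts = Counter()
    for doc in docs:
        values = doc.get(field, [])
        if first_only:
            # Mimic an application that only indexes the primary value.
            values = values[:1]
        # set() so a repeated value counts the document only once.
        counts.update(set(values))
    return dict(counts)

all_zips = facet_counts(listings, "zip")
primary_only = facet_counts(listings, "zip", first_only=True)

print(all_zips)       # Acme Hardware is counted under each of its three ZIPs
print(primary_only)   # Acme Hardware is counted under 94085 only
```

With every value indexed, 95014 appears as a facet even though Acme Hardware is the only business there. It disappears only in the first-value-only variant, which corresponds to the situation described in (b) and (c).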

When it comes to the table in FieldOptionsByUseCase, I agree that for faceting it makes sense to recommend multiValued if you have multi-value content, but it is not required for faceting as such. I think this table was made to explain which params you MUST set to enable certain functionality on a field. I would put "true[6]" in the multiValued column, with a footnote saying that it must be set for multi-value faceting.

--
Jan Høydahl  - search architect
Cominvent AS - www.cominvent.com
