Posted to solr-user@lucene.apache.org by geeky2 <ge...@hotmail.com> on 2012/04/12 15:27:49 UTC

is there a downside to combining search fields with copyfield?

Hello everyone,

Can people give me their thoughts on this?

Currently, my schema has individual fields to search on.

Are there advantages or disadvantages to taking several of the individual
search fields and combining them into a single search field?

Would this affect search times, term tokenization, or possibly other things?

Example of individual fields:

brand
category
partno

Example of a single combined search field:

part_info (would combine brand, category and partno)

Thank you for any feedback,
Mark

--
View this message in context: http://lucene.472066.n3.nabble.com/is-there-a-downside-to-combining-search-fields-with-copyfield-tp3905349p3905349.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: is there a downside to combining search fields with copyfield?

Posted by Shawn Heisey <so...@elyograg.org>.
On 4/12/2012 1:37 PM, geeky2 wrote:
> can you elaborate on this and how EDisMax would preclude the need for
> copyfield?
>
> i am using extended dismax now in my response handlers.
>
> here is an example of one of my requestHandlers
>
>    <requestHandler name="partItemNoSearch" class="solr.SearchHandler"
> default="false">
>      <lst name="defaults">
>        <str name="defType">edismax</str>
>        <str name="echoParams">all</str>
>        <int name="rows">5</int>
>        <str name="qf">itemNo^1.0</str>
>        <str name="q.alt">*:*</str>
>      </lst>
>      <lst name="appends">
>        <str name="fq">itemType:1</str>
>        <str name="sort">rankNo asc, score desc</str>
>      </lst>
>      <lst name="invariants">
>        <str name="facet">false</str>
>      </lst>
>    </requestHandler>

I'm not sure whether or not you can use a multiValued field as the 
source for copyField.  This is the sort of thing that the devs tend to 
think of, so my initial thought would be that it should work, though I 
would definitely test it to be absolutely sure.

Your request handler above has qf set to include only the field called 
itemNo.  If you created another handler with the following in its 
defaults, you could use that handler instead of a copyField.  You would 
want to customize the field boosts:

<str name="qf">brand^2.0 category^3.0 partno</str>

To really leverage edismax, assuming that you are using a tokenizer that 
splits any of these fields into multiple tokens, and that you want to 
use relevancy ranking, you might want to consider defining pf as well.
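For example, the defaults of such a handler might pair qf with pf as in the sketch below.  The pf field list and its boosts are placeholders, not recommendations.  pf only adds an extra boost to documents where the query terms appear together as a phrase in those fields, so it matters most for tokenized fields.

```xml
<lst name="defaults">
  <str name="defType">edismax</str>
  <!-- Fields to match, with per-field boosts -->
  <str name="qf">brand^2.0 category^3.0 partno</str>
  <!-- Extra boost when the query matches as a phrase in these fields
       (field choice and boost values are placeholders) -->
  <str name="pf">brand^4.0 category^6.0</str>
</lst>
```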

Some observations about your handler above, which you are free to 
ignore: I believe you don't really need the ^1.0 in qf, because there's 
only one field and 1.0 is the default boost.  Also, from what I can 
tell, because you are only using one qf field and are not using any of 
the dismax-specific goodies like pf or mm, you don't really need edismax 
at all here.  If I'm right, to remove edismax, just specify itemNo as the 
value of the df parameter (default field) and remove defType.  The q.alt 
parameter might also need to come out.

Solr 3.6 (should be released soon) deprecates the defaultSearchField 
and defaultOperator settings in schema.xml; the df and q.op handler 
parameters are their replacements.  This will be enforced in Solr 4.0.
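In handler terms, the replacement looks roughly like the fragment below.  The field name text and the AND operator are example values chosen only for illustration.

```xml
<lst name="defaults">
  <!-- Replaces <defaultSearchField>text</defaultSearchField> in schema.xml -->
  <str name="df">text</str>
  <!-- Replaces <solrQueryParser defaultOperator="AND"/> in schema.xml -->
  <str name="q.op">AND</str>
</lst>
```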

http://wiki.apache.org/solr/SearchHandler#Query_Params

Thanks,
Shawn


Re: is there a downside to combining search fields with copyfield?

Posted by geeky2 <ge...@hotmail.com>.
>>
You end up with one multivalued field, which means that you can only
have one analyzer chain.
<<

Actually, two of the three fields being considered for combination into a
single field ARE multivalued fields.

Would this be an issue?

>>
  With separate fields, each field can be
analyzed differently.  Also, if you are indexing and/or storing the
individual fields, you may have data duplication in your index, making
it larger and increasing your disk/RAM requirements.
<<

This makes sense.


>>
  That field will
have a higher termcount than the individual fields, which means that
searches against it will naturally be just a little bit slower.
<<

OK.

>>
  Your
application will not have to do as much work to construct a query, though.
<<

Actually, this is the primary reason this came up.

>>
If you are already planning to use dismax/edismax, then you don't need
the overhead of a copyField.  You can simply provide access to (e)dismax
search with the qf (and possibly pf) parameters predefined, or your
application can provide these parameters.

http://wiki.apache.org/solr/ExtendedDisMax
<<

Can you elaborate on this, and on how EDisMax would preclude the need for
copyField?

I am using extended dismax now in my response handlers.

Here is an example of one of my requestHandlers:

  <requestHandler name="partItemNoSearch" class="solr.SearchHandler"
default="false">
    <lst name="defaults">
      <str name="defType">edismax</str>
      <str name="echoParams">all</str>
      <int name="rows">5</int>
      <str name="qf">itemNo^1.0</str>
      <str name="q.alt">*:*</str>
    </lst>
    <lst name="appends">
      <str name="fq">itemType:1</str>
      <str name="sort">rankNo asc, score desc</str>
    </lst>
    <lst name="invariants">
      <str name="facet">false</str>
    </lst>
  </requestHandler>

Re: is there a downside to combining search fields with copyfield?

Posted by Shawn Heisey <so...@elyograg.org>.
On 4/12/2012 7:27 AM, geeky2 wrote:
> currently, my schema has individual fields to search on.
>
> are there advantages or disadvantages to taking several of the individual
> search fields and combining them in to a single search field?
>
> would this affect search times, term tokenization or possibly other things.
>
> example of individual fields
>
> brand
> category
> partno
>
> example of a single combined search field
>
> part_info (would combine brand, category and partno)

You end up with one multivalued field, which means that you can only 
have one analyzer chain.  With separate fields, each field can be 
analyzed differently.  Also, if you are indexing and/or storing the 
individual fields, you may have data duplication in your index, making 
it larger and increasing your disk/RAM requirements.  That field will 
have a higher term count than the individual fields, which means that 
searches against it will naturally be just a little bit slower.  Your 
application will not have to do as much work to construct a query, though.
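A minimal schema.xml sketch of the combined-field approach follows.  It assumes brand, category and partno are already declared, and that a tokenized field type named text_general exists (as in the Solr 3.x example schema); adjust both to your own schema.  The destination is declared multiValued because it receives values from several sources, and it is left unstored since it is only used for searching.

```xml
<!-- Combined search-only field; "text_general" is an assumed type name -->
<field name="part_info" type="text_general" indexed="true" stored="false"
       multiValued="true"/>

<!-- Raw source values are copied before analysis, then analyzed
     once more by part_info's single analyzer chain -->
<copyField source="brand"    dest="part_info"/>
<copyField source="category" dest="part_info"/>
<copyField source="partno"   dest="part_info"/>
```

This makes the single-analyzer trade-off concrete: if partno is currently indexed as an untokenized string, its values would instead be tokenized by text_general inside part_info.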

If you are already planning to use dismax/edismax, then you don't need 
the overhead of a copyField.  You can simply provide access to (e)dismax 
search with the qf (and possibly pf) parameters predefined, or your 
application can provide these parameters.

http://wiki.apache.org/solr/ExtendedDisMax
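As a sketch of that alternative (the handler name partInfoSearch is hypothetical, and the fields are left at the default boost), a request handler can search the three original fields directly, so no combined field or copyField is needed:

```xml
<requestHandler name="partInfoSearch" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <!-- Search all three fields; each keeps its own analyzer chain -->
    <str name="qf">brand category partno</str>
  </lst>
</requestHandler>
```

Because qf sits in defaults rather than invariants, the application can still override it on any request by sending its own qf parameter.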

Thanks,
Shawn