Posted to solr-user@lucene.apache.org by Hasnain <ha...@hotmail.com> on 2010/10/01 09:15:54 UTC

Prioritizing adjectives in solr search

Hi,

My question is about making search results give less importance to
adjectives.

Here is my scenario: I'm using the dismax handler, and my understanding is
that when I query "Blue hammer", Solr brings me results for "blue hammer",
"blue" and "hammer", all in the same hierarchy, which is understandable. Is
there any way I can manage the "blue" keyword so that Solr searches for
"blue hammer" and "hammer" but does not return results for "blue" alone?

My handler is as follows...

 <requestHandler name="standard2" class="solr.SearchHandler">
    <!-- default values for query parameters -->
    <lst name="defaults">
      <str name="defType">dismax</str>
      <str name="echoParams">explicit</str>
      <str name="tie">0.6</str>
      <str name="pf">name^2.3 mat_nr^0.4</str>
      <str name="mm">0%</str>

Any suggestions on this?
-- 
View this message in context: http://lucene.472066.n3.nabble.com/Prioritizing-advectives-in-solr-search-tp1613029p1613029.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Prioritizing adjectives in solr search

Posted by Erick Erickson <er...@gmail.com>.
Spans do care about the order of words, so that might help....

Erick

On Tue, Oct 12, 2010 at 11:23 PM, Ron Mayer <rm...@0ape.com> wrote:

> Erick Erickson wrote:
> > You can do some interesting things with payloads. You could index a
> > particular value as the payload that identified the "kind" of word it
> > was, where "kind" is something you define. Then at query time, you could
> > boost depending on what kind of word you identified it as in both
> > the query and at indexing time.
> >
> > But I can't even imagine how one would go about supporting this in a
> > general search engine. This kind of thing seems far too domain
> > specific.....
>
> Well, the "pf2" and "pf3" parameters in edismax come pretty close.
>
> For example, for the search query "red baseball cap black leather jacket",
> a "pf2" with no "phrase slop", combined with a "pf2" with a "phrase slop
> of 3", will do a pretty good job at finding "red caps" and "black jackets"
> and "baseball caps" and "leather jackets" before it'll find
> "red baseball jackets" and "leather caps".
>
> All it depended on is the convention that in English someone'll probably
> put adjectives before nouns in both the query and the document's text.
>
> The one annoyance is that I think the phrase slop doesn't care much
> about the order of words......

Re: Prioritizing adjectives in solr search

Posted by Ron Mayer <rm...@0ape.com>.
Erick Erickson wrote:
> You can do some interesting things with payloads. You could index a
> particular value as the payload that identified the "kind" of word it was,
> where "kind" is something you define. Then at query time, you could
> boost depending on what kind of word you identified it as in both
> the query and at indexing time.
> 
> But I can't even imagine how one would go about supporting this in a
> general search engine. This kind of thing seems far too domain
> specific.....

Well, the "pf2" and "pf3" parameters in edismax come pretty close.

For example, for the search query "red baseball cap black leather jacket",
a "pf2" with no "phrase slop", combined with a "pf2" with a "phrase slop of 3"
will do a pretty good job at finding "red caps" and "black jackets"
and "baseball caps" and "leather jackets" before it'll find
"red baseball jackets" and "leather caps".
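
As a rough sketch of the setup described above, the handler below combines
a strict and a sloppy "pf2" boost in edismax. It assumes a Solr release
whose edismax parser accepts the per-field field~slop^boost syntax in
"pf2"; the handler name, the field name "name", and the boost values are
made up for illustration and would need tuning against real queries.

```xml
<!-- Hypothetical handler: name, field and boosts are illustrative only -->
<requestHandler name="/adjsearch" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <!-- every query word is matched individually against this field -->
    <str name="qf">name</str>
    <!-- word pairs taken from the query: the first entry uses the default
         phrase slop (0 unless ps/ps2 is set) and gets the larger boost,
         the second entry allows a slop of 3 and gets a smaller boost -->
    <str name="pf2">name^4.0 name~3^2.0</str>
  </lst>
</requestHandler>
```

With both entries in place, a document where the pair is adjacent should
collect both boosts, while a document where the two words are a few
positions apart should collect only the smaller one.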

All it depended on is the convention that in English someone'll probably
put adjectives before nouns in both the query and the document's text.

The one annoyance is that I think the phrase slop doesn't care much
about the order of words......





Re: Prioritizing adjectives in solr search

Posted by Erick Erickson <er...@gmail.com>.
You can do some interesting things with payloads. You could index a
particular value as the payload that identified the "kind" of word it was,
where "kind" is something you define. Then at query time, you could
boost depending on what kind of word you identified it as in both
the query and at indexing time.

But I can't even imagine how one would go about supporting this in a
general search engine. This kind of thing seems far too domain
specific.....

Best
Erick


On Sun, Oct 10, 2010 at 8:50 PM, Ron Mayer <rm...@0ape.com> wrote:

> Walter Underwood wrote:
> > I think this is a bad idea. The tf.idf algorithm will already put a
> > higher weight on "hammers" than on "blue", because "hammers" will be more
> > rare than "blue". Plus, you are making huge assumptions about the queries.
> > In a search for "Canon camera", "Canon" is an adjective, but it is the
> > important part of the query.
> >
> > Have you looked at your query logs and which queries are successful and
> > which are not?
> >
> > Don't make radical changes like this unless you can justify them from the
> > logs.
>
> The one radical change I'd like in the area of adjectives in noun clauses
> is if more weight were put when the adjectives apply to the appropriate
> noun.
>
> For example, a search for:
>   'red baseball cap black leather jacket'
> should find a doc with "the guy wore a red cap, blue jeans, and a leather
> jacket"
> before one that says "the guy wore a black cap, leather pants, and a red
> jacket".
>
> The closest I've come at doing this was to use a variety of "phrase slop"
> boosts simultaneously - so that "red [any_few_words] cap" "baseball cap"
> "leather jacket", "black [any_few_words] jacket" all add boosts to the
> score.

Re: Prioritizing adjectives in solr search

Posted by Ron Mayer <rm...@0ape.com>.
Walter Underwood wrote:
> I think this is a bad idea. The tf.idf algorithm will already put a higher weight on "hammers" than on "blue", because "hammers" will be more rare than "blue". Plus, you are making huge assumptions about the queries. In a search for "Canon camera", "Canon" is an adjective, but it is the important part of the query.
> 
> Have you looked at your query logs and which queries are successful and which are not?
> 
> Don't make radical changes like this unless you can justify them from the logs.

The one radical change I'd like in the area of adjectives in noun clauses is if
more weight were put when the adjectives apply to the appropriate noun.

For example, a search for:
   'red baseball cap black leather jacket'
should find a doc with "the guy wore a red cap, blue jeans, and a leather jacket"
before one that says "the guy wore a black cap, leather pants, and a red jacket".


The closest I've come at doing this was to use a variety of "phrase slop"
boosts simultaneously - so that "red [any_few_words] cap" "baseball cap"
"leather jacket", "black [any_few_words] jacket" all add boosts to the score.









Re: Prioritizing adjectives in solr search

Posted by Walter Underwood <wu...@wunderwood.org>.
I think this is a bad idea. The tf.idf algorithm will already put a higher weight on "hammers" than on "blue", because "hammers" will be more rare than "blue". Plus, you are making huge assumptions about the queries. In a search for "Canon camera", "Canon" is an adjective, but it is the important part of the query.
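
To make the idf part of this argument concrete, here is a small worked
example. It uses the idf formula from the classic TF-IDF similarity in
older Lucene releases; the document counts are invented for illustration
and do not come from any real index.

```latex
\begin{align*}
\mathrm{idf}(t) &= 1 + \ln\frac{N}{\mathrm{df}(t) + 1} \\
% assumed counts: N = 10000 docs, df(blue) = 2000, df(hammers) = 50
\mathrm{idf}(\text{blue}) &= 1 + \ln\frac{10000}{2001} \approx 2.61 \\
\mathrm{idf}(\text{hammers}) &= 1 + \ln\frac{10000}{51} \approx 6.28
\end{align*}
```

Under these assumed counts the rarer term "hammers" already carries more
than twice the idf weight of "blue", without any special handling of
adjectives.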

Have you looked at your query logs and which queries are successful and which are not?

Don't make radical changes like this unless you can justify them from the logs.

wunder

On Oct 4, 2010, at 8:38 PM, Otis Gospodnetic wrote:

> Hi,
> 
> If you want "blue" to be used in search, then you should not treat it as a 
> stopword.
> 
> Re payloads: http://search-lucene.com/?q=payload+score
> and http://search-lucene.com/?q=payload+score&fc_type=wiki (even better, look at 
> hit #1)
> 
> Otis
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/





Re: Prioritizing adjectives in solr search

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Hi,

If you want "blue" to be used in search, then you should not treat it as a 
stopword.

Re payloads: http://search-lucene.com/?q=payload+score
and http://search-lucene.com/?q=payload+score&fc_type=wiki (even better, look at 
hit #1)

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



----- Original Message ----
> From: Hasnain <ha...@hotmail.com>
> To: solr-user@lucene.apache.org
> Sent: Mon, October 4, 2010 9:50:46 AM
> Subject: Re: Prioritizing advectives in solr search
> 
> 
> Hi Otis,
> 
>          Thank you for replying,  unfortunately Im unable to fully grasp what
> you are trying to say, can you  please elaborate what is payload with
> adjective terms?
> 
> also Im using  stopwords.txt to stop adjectives, adverbs and verbs, now when
> I search for  "Blue hammers", solr searches for "blue hammers" and "hammers"
> but not  "blue", but the problem here is user can also search for just
> "Blue", then it  wont search for anything...
> 
> any suggestions on this?? 
> 
> -- 
> View  this message in context: 
>http://lucene.472066.n3.nabble.com/Prioritizing-adjectives-in-solr-search-tp1613029p1629725.html
>
> Sent  from the Solr - User mailing list archive at Nabble.com.
> 

Re: Prioritizing adjectives in solr search

Posted by Hasnain <ha...@hotmail.com>.
Hi Otis,

Thank you for replying. Unfortunately I'm unable to fully grasp what you
are trying to say. Can you please elaborate on what a payload with
adjective terms is?

Also, I'm using stopwords.txt to stop adjectives, adverbs and verbs. Now
when I search for "Blue hammers", Solr searches for "blue hammers" and
"hammers" but not "blue". The problem is that a user can also search for
just "Blue", and then it won't search for anything...

Any suggestions on this?

-- 
View this message in context: http://lucene.472066.n3.nabble.com/Prioritizing-adjectives-in-solr-search-tp1613029p1629725.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Prioritizing adjectives in solr search

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Hi Hasnain,

You'll need to apply POS (Part of Speech) tagging to the input at/before
indexing, then store a payload with your adjective terms, and finally use
those payload values to change the scoring at query time.
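
A minimal sketch of the indexing side, assuming a POS tagger run outside
Solr has already rewritten each token as word|weight (for example
"blue|0.3 hammer|1.0"). The field type name, field name, and weights are
hypothetical. DelimitedPayloadTokenFilterFactory is a stock Solr filter,
but how the payload is turned into a score at query time depends on the
Solr version (older releases needed a custom Similarity or query parser)
and is not shown here.

```xml
<!-- schema.xml fragment; type and field names are illustrative -->
<fieldType name="text_pos_payload" class="solr.TextField">
  <analyzer>
    <!-- split on whitespace only, so "blue|0.3" stays one token -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- strip "|0.3" from the token and store 0.3 as a float payload -->
    <filter class="solr.DelimitedPayloadTokenFilterFactory"
            delimiter="|" encoder="float"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="name_pos" type="text_pos_payload" indexed="true" stored="false"/>
```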

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



----- Original Message ----
> From: Hasnain <ha...@hotmail.com>
> To: solr-user@lucene.apache.org
> Sent: Fri, October 1, 2010 3:15:54 AM
> Subject: Prioritizing advectives in solr search
> 
> 
> Hi,
> 
>    My question is related to search results giving less  importance to
> adjectives, 
> 
> here is my scenario, im using dismax  handler and my understanding is when I
> query "Blue hammer", solr brings me  results for "blue hammer", "blue" and
> "hammer", and in the same hierarchy,  which is understandable, is there any
> way I can manage the "blue" keyword, so  that solr searches for "blue hammer"
> and "hammer" and not any results for  "blue".
> 
> my handler is as follows...
> 
>  <requestHandler  name="standard2" class="solr.SearchHandler">
>     <!-- default  values for query parameters -->
>      <lst  name="defaults">
>      <str  name="defType">dismax</str>
>        <str  name="echoParams">explicit</str>
>          <str name="tie">0.6</str>
>          <str name="pf">name^2.3  mat_nr^0.4</str>
>     <str name="mm">0%</str> 
> 
> any suggestion on this??
> -- 
> View this message in context: 
>http://lucene.472066.n3.nabble.com/Prioritizing-advectives-in-solr-search-tp1613029p1613029.html
>
> Sent  from the Solr - User mailing list archive at Nabble.com.
> 

Re: Prioritizing adjectives in solr search

Posted by Chris Hostetter <ho...@fucit.org>.
: here is my scenario, im using dismax handler and my understanding is when I
: query "Blue hammer", solr brings me results for "blue hammer", "blue" and
: "hammer", and in the same hierarchy, which is understandable, is there any
: way I can manage the "blue" keyword, so that solr searches for "blue hammer"
: and "hammer" and not any results for "blue".

at a very simple level, you can achieve something like this by using a 
"qf" that points at fields where adjectives have been removed (ie: using 
StopFilter) and using "pf" fields where the adjectives have been left 
alone -- thus a query for "blue hammer" will match any doc containing 
"hammer" but the "pf" clause will boost documents matching the phrase 
"blue hammer" (documents matching only "blue" will not match, and 
documents matching "blue" and "hammer" farther apart than the "ps" param 
will not get the phrase boost)

But please note Walter's comments and consider them carefully before 
treating this as a silver bullet.
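
A sketch of what this could look like in configuration, under several
assumptions: the field name "name_noadj", the type names "text" and
"text_noadj", and the word list "adjectives.txt" are all hypothetical, and
the boost and slop values are placeholders.

```xml
<!-- schema.xml fragment (all names are illustrative) -->
<fieldType name="text_noadj" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- adjectives.txt: one adjective per line, e.g. "blue" -->
    <filter class="solr.StopFilterFactory" words="adjectives.txt"
            ignoreCase="true"/>
  </analyzer>
</fieldType>

<!-- "text" is assumed to be an existing type that keeps adjectives -->
<field name="name" type="text" indexed="true" stored="true"/>
<field name="name_noadj" type="text_noadj" indexed="true" stored="false"/>
<copyField source="name" dest="name_noadj"/>

<!-- solrconfig.xml fragment: entries for the dismax handler's defaults -->
<str name="qf">name_noadj</str>  <!-- matching: adjectives removed -->
<str name="pf">name^2.3</str>    <!-- phrase boost: adjectives kept -->
<str name="ps">0</str>           <!-- max slop for the phrase boost -->
```

With this layout a query consisting only of "blue" would likely have no
terms left for the "qf" field, which is the same limitation raised earlier
in the thread for the stopwords.txt approach.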

: 
: my handler is as follows...
: 
:  <requestHandler name="standard2" class="solr.SearchHandler">
:     <!-- default values for query parameters -->
:      <lst name="defaults">
: 	 <str name="defType">dismax</str>
:        <str name="echoParams">explicit</str>
: 		<str name="tie">0.6</str>
: 		<str name="pf">name^2.3 mat_nr^0.4</str>
: 	<str name="mm">0%</str> 
: 
: any suggestion on this??
: -- 
: View this message in context: http://lucene.472066.n3.nabble.com/Prioritizing-advectives-in-solr-search-tp1613029p1613029.html
: Sent from the Solr - User mailing list archive at Nabble.com.
: 

-Hoss