Posted to solr-user@lucene.apache.org by Ron Mayer <rm...@0ape.com> on 2010/10/11 02:50:25 UTC

Re: Prioritizing adjectives in solr search

Walter Underwood wrote:
> I think this is a bad idea. The tf.idf algorithm will already put a higher weight on "hammers" than on "blue", because "hammers" will be more rare than "blue". Plus, you are making huge assumptions about the queries. In a search for "Canon camera", "Canon" is an adjective, but it is the important part of the query.
> 
> Have you looked at your query logs and which queries are successful and which are not?
> 
> Don't make radical changes like this unless you can justify them from the logs.
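
A minimal sketch of the idf argument above, assuming the classic Lucene-style formula idf = 1 + ln(numDocs / (docFreq + 1)). The document counts are invented purely for illustration; real values would come from your own index.

```python
import math


def classic_idf(doc_freq: int, num_docs: int) -> float:
    """Classic Lucene-style inverse document frequency."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical counts: a 10,000-document catalogue in which "blue" is
# common and "hammers" is rare. These numbers are assumptions.
NUM_DOCS = 10_000
DOC_FREQ = {"blue": 2_500, "hammers": 40}

idf = {term: classic_idf(df, NUM_DOCS) for term, df in DOC_FREQ.items()}

for term, value in idf.items():
    print(f"{term}: idf = {value:.2f}")
```

With counts like these the rarer term gets the larger idf, which is the effect described above, without treating any word as a stopword.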

The one radical change I'd like in the area of adjectives in noun phrases is for
more weight to be given when each adjective applies to the appropriate noun.

For example, a search for:
   'red baseball cap black leather jacket'
should find a doc with "the guy wore a red cap, blue jeans, and a leather jacket"
before one that says "the guy wore a black cap, leather pants, and a red jacket".


The closest I've come to doing this is to use a variety of "phrase slop"
boosts simultaneously, so that "red [any_few_words] cap", "baseball cap",
"leather jacket", and "black [any_few_words] jacket" all add boosts to the score.

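One way such a set of boosts might be written down, as a rough sketch only. The helper name `sloppy_phrase`, the slop of 3, and the boost values are all assumptions made for illustration; the clause strings use the Lucene query-parser form "phrase"~slop^boost.

```python
def sloppy_phrase(first: str, second: str, slop: int = 0, boost: float = 1.0) -> str:
    """Build a Lucene-syntax phrase clause such as "red cap"~3^2."""
    clause = f'"{first} {second}"'
    if slop:
        clause += f"~{slop}"
    if boost != 1.0:
        clause += f"^{boost:g}"
    return clause


# Adjective/noun pairs taken from the example query; slop and boost
# values are hypothetical and would need tuning against real queries.
PAIRS = [
    ("red", "cap", 3, 2),
    ("baseball", "cap", 0, 4),
    ("leather", "jacket", 0, 4),
    ("black", "jacket", 3, 2),
]

boost_clauses = [sloppy_phrase(a, n, slop, boost) for a, n, slop, boost in PAIRS]

for clause in boost_clauses:
    print(clause)
```

Each clause would be added as an optional boosting clause alongside the main query, so documents that pair the adjective with the right noun score higher without excluding the others.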






> 
> wunder
> 
> On Oct 4, 2010, at 8:38 PM, Otis Gospodnetic wrote:
> 
>> Hi,
>>
>> If you want "blue" to be used in search, then you should not treat it as a 
>> stopword.
>>
>> Re payloads: http://search-lucene.com/?q=payload+score
>> and http://search-lucene.com/?q=payload+score&fc_type=wiki (even better, look at 
>> hit #1)
>>
>> Otis
>> ----
>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
>> Lucene ecosystem search :: http://search-lucene.com/
>>
>>
>>
>> ----- Original Message ----
>>> From: Hasnain <ha...@hotmail.com>
>>> To: solr-user@lucene.apache.org
>>> Sent: Mon, October 4, 2010 9:50:46 AM
>>> Subject: Re: Prioritizing advectives in solr search
>>>
>>>
>>> Hi Otis,
>>>
>>>         Thank you for replying. Unfortunately I'm unable to fully grasp what
>>> you are trying to say. Can you please elaborate on what a payload with
>>> adjective terms is?
>>>
>>> Also, I'm using stopwords.txt to stop adjectives, adverbs and verbs. Now when
>>> I search for "Blue hammers", Solr searches for "blue hammers" and "hammers"
>>> but not "blue". The problem is that a user can also search for just
>>> "Blue", and then it won't search for anything...
>>>
>>> Any suggestions on this?
>>>
>>> -- 
>>> View  this message in context: 
>>> http://lucene.472066.n3.nabble.com/Prioritizing-adjectives-in-solr-search-tp1613029p1629725.html
>>>
>>> Sent  from the Solr - User mailing list archive at Nabble.com.
>>>
> 
> 
> 
> 


Re: Prioritizing adjectives in solr search

Posted by Erick Erickson <er...@gmail.com>.
Spans do care about the order of words, so that might help....

Erick
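
To illustrate the difference between ordered and unordered proximity matching, here is a toy model in Python. It is not Lucene's algorithm and does not reproduce how Lucene computes slop; the function names and the sample document are made up. In Lucene itself, SpanNearQuery takes an inOrder flag for this purpose.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def near(tokens: list[str], first: str, second: str, max_gap: int, ordered: bool) -> bool:
    """True if `first` and `second` occur with at most `max_gap` tokens
    between them. When `ordered` is set, `first` must come before `second`."""
    first_pos = [i for i, t in enumerate(tokens) if t == first]
    second_pos = [j for j, t in enumerate(tokens) if t == second]
    for i in first_pos:
        for j in second_pos:
            if i == j:
                continue
            gap = abs(j - i) - 1
            if gap <= max_gap and (j > i or not ordered):
                return True
    return False


# Made-up document in which "red" follows "cap" but precedes "jacket".
doc = tokenize("a leather cap, red jacket")

unordered_red_cap = near(doc, "red", "cap", max_gap=3, ordered=False)
ordered_red_cap = near(doc, "red", "cap", max_gap=3, ordered=True)
ordered_red_jacket = near(doc, "red", "jacket", max_gap=3, ordered=True)

print(unordered_red_cap, ordered_red_cap, ordered_red_jacket)
```

The unordered check accepts "red" near "cap" even though "red" really describes the jacket; the ordered check rejects that pairing and still accepts "red jacket".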

On Tue, Oct 12, 2010 at 11:23 PM, Ron Mayer <rm...@0ape.com> wrote:

> Erick Erickson wrote:
> > You can do some interesting things with payloads. You could index a
> > particular value as the payload that identified the "kind" of word it was,
> > where "kind" is something you define. Then at query time, you could
> > boost depending on what kind of word you identified it as in both
> > the query and at indexing time.
> >
> > But I can't even imagine how one would go about supporting this in a
> > general search engine. This kind of thing seems far too domain
> > specific.....
>
> Well, the "pf2" and "pf3" parameters in edismax come pretty close.
>
> For example, for the search query "red baseball cap black leather jacket",
> a "pf2" with no "phrase slop", combined with a "pf2" with a "phrase slop of 3"
> will do a pretty good job at finding "red caps" and "black jackets"
> and "baseball caps" and "leather jackets" before it'll find
> "red baseball jackets" and "leather caps".
>
> All it depends on is the convention that in English someone will probably
> put adjectives before nouns in both the query and the document's text.
>
> The one annoyance is that I think the phrase slop doesn't care much
> about the order of words......

Re: Prioritizing adjectives in solr search

Posted by Ron Mayer <rm...@0ape.com>.
Erick Erickson wrote:
> You can do some interesting things with payloads. You could index a
> particular value as the payload that identified the "kind" of word it was,
> where "kind" is something you define. Then at query time, you could
> boost depending on what kind of word you identified it as in both
> the query and at indexing time.
> 
> But I can't even imagine how one would go about supporting this in a
> general search engine. This kind of thing seems far too domain
> specific.....

Well, the "pf2" and "pf3" parameters in edismax come pretty close.

For example, for the search query "red baseball cap black leather jacket",
a "pf2" with no "phrase slop", combined with a "pf2" with a "phrase slop of 3"
will do a pretty good job at finding "red caps" and "black jackets"
and "baseball caps" and "leather jackets" before it'll find
"red baseball jackets" and "leather caps".

All it depends on is the convention that in English someone will probably
put adjectives before nouns in both the query and the document's text.

The one annoyance is that I think the phrase slop doesn't care much
about the order of words......
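
A sketch of what such a request might look like, built in Python. The field name "text" and the boost values are hypothetical. The per-field "field~slop^boost" form in pf2 is an assumption about the Solr version in use: it is available in later Solr releases, so check the eDisMax documentation for your version before relying on it.

```python
from urllib.parse import urlencode

# Request parameters for an edismax query. "text" is a hypothetical
# field name; the boosts (10 and 5) are illustrative, not tuned values.
params = [
    ("defType", "edismax"),
    ("q", "red baseball cap black leather jacket"),
    ("qf", "text"),
    ("pf2", "text^10"),    # word bigrams as exact phrases (no slop)
    ("pf2", "text~3^5"),   # the same bigrams with a phrase slop of 3
]

query_string = urlencode(params)
print(query_string)
```

The exact-phrase pf2 rewards adjacent pairs such as "leather jacket" most strongly, while the sloppy pf2 adds a smaller boost when the two words are merely close together.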





Re: Prioritizing adjectives in solr search

Posted by Erick Erickson <er...@gmail.com>.
You can do some interesting things with payloads. You could index a
particular value as the payload that identified the "kind" of word it was,
where "kind" is something you define. Then at query time, you could
boost depending on what kind of word you identified it as in both
the query and at indexing time.

But I can't even imagine how one would go about supporting this in a
general search engine. This kind of thing seems far too domain
specific.....

Best
Erick
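
The payload idea can be modelled in a few lines of Python. This is a toy, not Solr's payload API: the "term|kind" notation, the kind names, and every weight below are assumptions chosen for illustration.

```python
# Hypothetical weights per "kind" of word; a real application would
# define its own kinds and tune these values from query logs.
KIND_WEIGHT = {"noun": 2.0, "adj": 1.0}
MISMATCH_WEIGHT = 0.5  # term matches but was tagged as a different kind


def parse_tagged(text: str) -> list[tuple[str, str]]:
    """Split 'blue|adj hammers|noun' into [(term, kind), ...]."""
    pairs = []
    for token in text.split():
        term, _, kind = token.partition("|")
        pairs.append((term, kind or "other"))
    return pairs


def score(tagged_query: str, tagged_doc: str) -> float:
    """Sum a weight for every query term found in the document, using the
    kind recorded at index time and the kind identified in the query."""
    doc = parse_tagged(tagged_doc)
    total = 0.0
    for q_term, q_kind in parse_tagged(tagged_query):
        for d_term, d_kind in doc:
            if q_term != d_term:
                continue
            if q_kind == d_kind:
                total += KIND_WEIGHT.get(d_kind, MISMATCH_WEIGHT)
            else:
                total += MISMATCH_WEIGHT
    return total


matching = score("blue|adj hammers|noun", "blue|adj steel|adj hammers|noun")
mismatched = score("blue|adj", "blue|noun sky|noun")
print(matching, mismatched)
```

Deciding which kinds exist and how much each is worth is exactly the domain-specific part: a weight table that suits "blue hammers" would do the wrong thing for "Canon camera".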

