
Posted to solr-user@lucene.apache.org by Alex Shevchenko <ca...@gmail.com> on 2009/06/01 19:32:49 UTC

Re: Keyword Density

Hi all,

Is there a way to perform filtering based on keyword density?

Thanks

-- 
Alex Shevchenko

Re: Keyword Density

Posted by Chris Hostetter <ho...@fucit.org>.

: Date: Wed, 3 Jun 2009 10:19:06 -0700 (PDT)
: From: Otis Gospodnetic
: Subject: Re: Keyword Density

: > > But I don't need to sort using this value. I need to cut results, where
: > > this value (for particular term of query!) not in some range.

: I don't think this is possible without changing Solr. Or maybe it's 
: possible with a custom Search Component that looks at all hits and 
: checks the "tf" (term frequency) for a term in each document?
: Sounds like a very costly operation...

FWIW: The best place to tackle something like this would probably 
be a new subclass of FilteredTermDocs that only returns 
docs/frequencies where the freq is in the range you are interested 
in.  Then use your new FilteredTermDocs class in a subclass of TermQuery 
when constructing a TermScorer.  *Then* use your new TermQuery subclass in 
a custom Solr QParser.

It can be done efficiently, but it definitely requires making some low 
level changes to the code.
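
As a rough illustration of the idea above (plain Python, not actual Lucene code): a term's postings can be viewed as a stream of (doc, freq) pairs, and the filtering wrapper simply skips pairs whose freq falls outside the wanted range. The names `freq_filtered_postings`, `min_freq` and `max_freq` are made up for this sketch. It filters on the raw frequency only; a true density would also need the field length.

```python
from typing import Iterable, Iterator, Tuple

Posting = Tuple[int, int]  # (doc id, frequency of the term in that doc)


def freq_filtered_postings(
    postings: Iterable[Posting], min_freq: int, max_freq: int
) -> Iterator[Posting]:
    """Yield only the postings whose frequency lies in [min_freq, max_freq].

    Hypothetical stand-in for a term-docs wrapper: documents outside the
    range are skipped during iteration, so they never reach the scorer.
    """
    for doc_id, freq in postings:
        if min_freq <= freq <= max_freq:
            yield doc_id, freq


# Made-up postings for one term: (doc id, term frequency)
postings = [(0, 1), (3, 4), (7, 2), (9, 12)]
kept = list(freq_filtered_postings(postings, min_freq=2, max_freq=5))
print(kept)
```

Because the skipping happens while the postings are being read, no second pass over the result set is needed, which is presumably why this route can be efficient.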



-Hoss

Re: Keyword Density

Posted by Otis Gospodnetic <ot...@yahoo.com>.

I don't think this is possible without changing Solr.
Or maybe it's possible with a custom Search Component that looks at all hits and checks the "tf" (term frequency) for a term in each document?  Sounds like a very costly operation...
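
A minimal sketch of that post-filtering idea, in Python rather than as a real Solr SearchComponent: every hit is visited once and dropped unless the term's density is in range. The per-document term counts and lengths are assumed to be available already (for example from term vectors); the `Hit` type and `filter_hits_by_density` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Hit:
    doc_id: int
    term_counts: Dict[str, int]  # term -> occurrences in the field
    length: int                  # total number of terms in the field


def filter_hits_by_density(
    hits: List[Hit], term: str, low: float, high: float
) -> List[Hit]:
    """Keep hits where count(term) / length lies in [low, high]."""
    kept = []
    for hit in hits:  # O(number of hits): every match is inspected
        if hit.length == 0:
            continue  # an empty field has no meaningful density
        density = hit.term_counts.get(term, 0) / hit.length
        if low <= density <= high:
            kept.append(hit)
    return kept


# Made-up hits for the term "foo"
hits = [
    Hit(1, {"foo": 1}, 100),   # density 0.01
    Hit(2, {"foo": 5}, 50),    # density 0.10
    Hit(3, {"foo": 30}, 60),   # density 0.50
]
kept_ids = [h.doc_id for h in filter_hits_by_density(hits, "foo", 0.05, 0.25)]
print(kept_ids)
```

The loop touches every matching document, which is where the cost mentioned above comes from on large result sets.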

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



----- Original Message ----
> From: Alex Shevchenko <ca...@gmail.com>
> To: solr-user@lucene.apache.org
> Sent: Wednesday, June 3, 2009 9:57:29 AM
> Subject: Re: Keyword Density
> 
> So, is there an ability to perform filtering as I described?
> 
> On Mon, Jun 1, 2009 at 22:24, Alex Shevchenko wrote:
> 
> > But I don't need to sort using this value. I need to cut results, where
> > this value (for particular term of query!) not in some range.
> >
> >
> > On Mon, Jun 1, 2009 at 22:20, Walter Underwood wrote:
> >
> >> That is the normal relevance scoring formula in Solr and Lucene.
> >> It is a bit fancier than that, but you don't have to do anything
> >> special to get that behavior.
> >>
> >> Solr also uses the inverse document frequency (rarity) of each
> >> word for weighting.
> >>
> >> Look up "tf.idf" for more info.
> >>
> >> wunder
> >>
> >> On 6/1/09 11:46 AM, "Alex Shevchenko" wrote:
> >>
> >> > Something like that. Just not '> N times' but '<numbers of foo
> >> > appears>/<total number of words> > <some value>'
> >> >
> >> > On Mon, Jun 1, 2009 at 21:00, Otis Gospodnetic
> >> > wrote:
> >> >
> >> >>
> >> >> Hi Alex,
> >> >>
> >> >> Could you please provide an example of this?  Are you looking to do
> >> >> something like "find all docs that match name:foo and where foo appears > N
> >> >> times (in the name field) in the matching document"?
> >> >>
> >> >>  Otis
> >> >> --
> >> >> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> >> >>
> >> >>
> >> >>
> >> >> ----- Original Message ----
> >> >>> From: Alex Shevchenko 
> >> >>> To: solr-user@lucene.apache.org
> >> >>> Sent: Monday, June 1, 2009 1:32:49 PM
> >> >>> Subject: Re: Keyword Density
> >> >>>
> >> >>> HI All,
> >> >>>
> >> >>> Is there a way to perform filtering based on keyword density?
> >> >>>
> >> >>> Thanks
> >> >>>
> >> >>> --
> >> >>> Alex Shevchenko
> >>
> >>
> >>
> >
> >
> > --
> > Alex Shevchenko
> >
> 
> 
> 
> -- 
> Alex Shevchenko

Re: Keyword Density

Posted by Alex Shevchenko <ca...@gmail.com>.

So, is there a way to perform the filtering I described?

On Mon, Jun 1, 2009 at 22:24, Alex Shevchenko <ca...@gmail.com> wrote:

> But I don't need to sort using this value. I need to cut results, where
> this value (for particular term of query!) not in some range.
>
>
> On Mon, Jun 1, 2009 at 22:20, Walter Underwood <wu...@netflix.com> wrote:
>
>> That is the normal relevance scoring formula in Solr and Lucene.
>> It is a bit fancier than that, but you don't have to do anything
>> special to get that behavior.
>>
>> Solr also uses the inverse document frequency (rarity) of each
>> word for weighting.
>>
>> Look up "tf.idf" for more info.
>>
>> wunder
>>
>> On 6/1/09 11:46 AM, "Alex Shevchenko" <ca...@gmail.com> wrote:
>>
>> > Something like that. Just not '> N times' but '<numbers of foo
>> > appears>/<total number of words> > <some value>'
>> >
>> > On Mon, Jun 1, 2009 at 21:00, Otis Gospodnetic
>> > <ot...@yahoo.com> wrote:
>> >
>> >>
>> >> Hi Alex,
>> >>
>> >> Could you please provide an example of this?  Are you looking to do
>> >> something like "find all docs that match name:foo and where foo appears > N
>> >> times (in the name field) in the matching document"?
>> >>
>> >>  Otis
>> >> --
>> >> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>> >>
>> >>
>> >>
>> >> ----- Original Message ----
>> >>> From: Alex Shevchenko <ca...@gmail.com>
>> >>> To: solr-user@lucene.apache.org
>> >>> Sent: Monday, June 1, 2009 1:32:49 PM
>> >>> Subject: Re: Keyword Density
>> >>>
>> >>> HI All,
>> >>>
>> >>> Is there a way to perform filtering based on keyword density?
>> >>>
>> >>> Thanks
>> >>>
>> >>> --
>> >>> Alex Shevchenko
>>
>>
>>
>
>
> --
> Alex Shevchenko
>



-- 
Alex Shevchenko

Re: Keyword Density

Posted by Alex Shevchenko <ca...@gmail.com>.

But I don't need to sort by this value. I need to drop results where this
value (for a particular term of the query!) is not in some range.

On Mon, Jun 1, 2009 at 22:20, Walter Underwood <wu...@netflix.com> wrote:

> That is the normal relevance scoring formula in Solr and Lucene.
> It is a bit fancier than that, but you don't have to do anything
> special to get that behavior.
>
> Solr also uses the inverse document frequency (rarity) of each
> word for weighting.
>
> Look up "tf.idf" for more info.
>
> wunder
>
> On 6/1/09 11:46 AM, "Alex Shevchenko" <ca...@gmail.com> wrote:
>
> > Something like that. Just not '> N times' but '<numbers of foo
> > appears>/<total number of words> > <some value>'
> >
> > On Mon, Jun 1, 2009 at 21:00, Otis Gospodnetic
> > <ot...@yahoo.com> wrote:
> >
> >>
> >> Hi Alex,
> >>
> >> Could you please provide an example of this?  Are you looking to do
> >> something like "find all docs that match name:foo and where foo appears > N
> >> times (in the name field) in the matching document"?
> >>
> >>  Otis
> >> --
> >> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> >>
> >>
> >>
> >> ----- Original Message ----
> >>> From: Alex Shevchenko <ca...@gmail.com>
> >>> To: solr-user@lucene.apache.org
> >>> Sent: Monday, June 1, 2009 1:32:49 PM
> >>> Subject: Re: Keyword Density
> >>>
> >>> HI All,
> >>>
> >>> Is there a way to perform filtering based on keyword density?
> >>>
> >>> Thanks
> >>>
> >>> --
> >>> Alex Shevchenko
>
>
>


-- 
Alex Shevchenko

Re: Keyword Density

Posted by Walter Underwood <wu...@netflix.com>.

That is the normal relevance scoring formula in Solr and Lucene.
It is a bit fancier than that, but you don't have to do anything
special to get that behavior.

Solr also uses the inverse document frequency (rarity) of each
word for weighting.

Look up "tf.idf" for more info.
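
For reference, a rough sketch of the per-term factors as I understand Lucene's default similarity (DefaultSimilarity): tf is the square root of the in-document frequency, idf is 1 + ln(numDocs / (docFreq + 1)), and the length norm is 1 / sqrt(number of terms in the field). Boosts, coord and query normalization are left out, and the stored norm is lossy in practice, so treat the numbers as illustrative only. The function `term_weight` is a name invented for this sketch.

```python
import math


def tf(freq: int) -> float:
    """Term-frequency factor: grows with the square root of the count."""
    return math.sqrt(freq)


def idf(doc_freq: int, num_docs: int) -> float:
    """Inverse document frequency: rarer terms get a larger weight."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def length_norm(num_terms: int) -> float:
    """Length normalization: longer fields get a smaller weight."""
    return 1.0 / math.sqrt(num_terms)


def term_weight(freq: int, num_terms: int, doc_freq: int, num_docs: int) -> float:
    """Simplified per-term weight: tf * idf^2 * lengthNorm (boosts omitted)."""
    return tf(freq) * idf(doc_freq, num_docs) ** 2 * length_norm(num_terms)


# Same term count (4) in a short field (100 terms) and a long one (400 terms)
short_doc = term_weight(freq=4, num_terms=100, doc_freq=9, num_docs=1000)
long_doc = term_weight(freq=4, num_terms=400, doc_freq=9, num_docs=1000)
print(short_doc > long_doc)
```

With these formulas, tf * lengthNorm works out to sqrt(freq / num_terms), the square root of the density, so density already influences ranking even though it is not exposed as a filter.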

wunder

On 6/1/09 11:46 AM, "Alex Shevchenko" <ca...@gmail.com> wrote:

> Something like that. Just not '> N times' but '<numbers of foo
> appears>/<total number of words> > <some value>'
> 
> On Mon, Jun 1, 2009 at 21:00, Otis Gospodnetic
> <ot...@yahoo.com> wrote:
> 
>> 
>> Hi Alex,
>> 
>> Could you please provide an example of this?  Are you looking to do
>> something like "find all docs that match name:foo and where foo appears > N
>> times (in the name field) in the matching document"?
>> 
>>  Otis
>> --
>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>> 
>> 
>> 
>> ----- Original Message ----
>>> From: Alex Shevchenko <ca...@gmail.com>
>>> To: solr-user@lucene.apache.org
>>> Sent: Monday, June 1, 2009 1:32:49 PM
>>> Subject: Re: Keyword Density
>>> 
>>> HI All,
>>> 
>>> Is there a way to perform filtering based on keyword density?
>>> 
>>> Thanks
>>> 
>>> --
>>> Alex Shevchenko

Re: Keyword Density

Posted by Alex Shevchenko <ca...@gmail.com>.

Something like that. Just not '> N times' but '<number of times foo
appears>/<total number of words> > <some value>'
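
To make the condition concrete, a small sketch, assuming plain whitespace tokenization and lowercasing (a real Solr field would go through its own analyzer). `keyword_density` is just an illustrative name, and 0.25 stands in for '<some value>'.

```python
def keyword_density(text: str, term: str) -> float:
    """Occurrences of `term` divided by the total number of words."""
    words = text.lower().split()
    if not words:
        return 0.0  # avoid dividing by zero on empty text
    return words.count(term.lower()) / len(words)


doc = "foo bar foo baz qux foo bar baz"
density = keyword_density(doc, "foo")   # 3 occurrences / 8 words
passes = density > 0.25                 # the '<some value>' threshold
print(density, passes)
```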

On Mon, Jun 1, 2009 at 21:00, Otis Gospodnetic
<ot...@yahoo.com> wrote:

>
> Hi Alex,
>
> Could you please provide an example of this?  Are you looking to do
> something like "find all docs that match name:foo and where foo appears > N
> times (in the name field) in the matching document"?
>
>  Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
>
> ----- Original Message ----
> > From: Alex Shevchenko <ca...@gmail.com>
> > To: solr-user@lucene.apache.org
> > Sent: Monday, June 1, 2009 1:32:49 PM
> > Subject: Re: Keyword Density
> >
> > HI All,
> >
> > Is there a way to perform filtering based on keyword density?
> >
> > Thanks
> >
> > --
> > Alex Shevchenko
>
>


-- 
Alex Shevchenko

Re: Keyword Density

Posted by Otis Gospodnetic <ot...@yahoo.com>.

Hi Alex,

Could you please provide an example of this?  Are you looking to do something like "find all docs that match name:foo and where foo appears > N times (in the name field) in the matching document"?

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



----- Original Message ----
> From: Alex Shevchenko <ca...@gmail.com>
> To: solr-user@lucene.apache.org
> Sent: Monday, June 1, 2009 1:32:49 PM
> Subject: Re: Keyword Density
> 
> HI All,
> 
> Is there a way to perform filtering based on keyword density?
> 
> Thanks
> 
> -- 
> Alex Shevchenko