Posted to solr-user@lucene.apache.org by David Miller <da...@gmail.com> on 2014/02/07 00:58:21 UTC

Tf-Idf for a specific query

Hi guys,

I need to obtain TF-IDF scores from Solr for a certain set of documents.
The catch is that I need the IDF (or DF) to be calculated over the
documents returned by a specific query, not over the entire corpus.

Please give me a hint on whether Solr has this feature, or whether I can
use the Lucene API directly to achieve this.


Thanks in advance,
Dave

Re: Tf-Idf for a specific query

Posted by David Miller <da...@gmail.com>.
Hi Erick,

Slower queries for getting facets can be tolerated, as long as they don't
affect queries without facets. The requirement is for a separate query that
returns both term vectors and facet counts.

One issue I am facing is that, for such a query, I only want the term
vectors and facet counts, not the result documents. If I set rows=0,
the term vectors are not returned. Could you suggest a way to achieve
this?

It would also be helpful to have a way to get the aggregate TF of a term
(across all docs matching the query).
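
One client-side way to approximate an aggregate TF, sketched in Python:
page through the matching documents with term vectors enabled (tv.tf=true)
and sum the per-document counts yourself. The input below is a hypothetical,
simplified mapping of document id to term frequencies, not the raw
TermVectorComponent response layout, and the function name is made up for
illustration.

```python
from collections import Counter


def aggregate_tf(per_doc_tf):
    """Sum term frequencies over every document in the mapping.

    per_doc_tf: {doc_id: {term: tf_in_that_doc}} (hypothetical, simplified shape)
    """
    totals = Counter()
    for term_freqs in per_doc_tf.values():
        totals.update(term_freqs)  # Counter.update adds counts, it does not overwrite
    return totals


# Hand-written sample data, not real Solr output.
sample = {"doc1": {"solr": 2, "facet": 1}, "doc2": {"solr": 3}}
totals = aggregate_tf(sample)
```

For large result sets this means fetching term vectors for every matching
document, so it is likely to be slow; it is only meant to show the arithmetic.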

Regards,
David

On Sat, Feb 8, 2014 at 10:49 AM, Erick Erickson <er...@gmail.com> wrote:

> David:
>
> If you're, say, faceting on fields with lots of unique values, this
> will be quite expensive.
> No idea whether you can tolerate slower queries or not, just sayin'....
>
> Erick

Re: Tf-Idf for a specific query

Posted by Erick Erickson <er...@gmail.com>.
David:

If you're, say, faceting on fields with lots of unique values, this
will be quite expensive.
No idea whether you can tolerate slower queries or not, just sayin'....

Erick

On Fri, Feb 7, 2014 at 5:35 PM, David Miller <da...@gmail.com> wrote:
> Thanks Mikhail,
>
> It seems this is what I was looking for. Being new to this, I wasn't
> aware of such a use of facets.
>
> Now I can probably combine the term vectors and facets to fit my scenario.
>
> Regards,
> Dave

Re: Tf-Idf for a specific query

Posted by David Miller <da...@gmail.com>.
Thanks Mikhail,

It seems this is what I was looking for. Being new to this, I wasn't
aware of such a use of facets.

Now I can probably combine the term vectors and facets to fit my scenario.

Regards,
Dave


On Fri, Feb 7, 2014 at 2:43 PM, Mikhail Khludnev <mkhludnev@griddynamics.com> wrote:

> David,
>
> I would say that "DF for the result set" is exactly what facets give you:
> facet counts are computed only over the documents that match the query.

Re: Tf-Idf for a specific query

Posted by Mikhail Khludnev <mk...@griddynamics.com>.
David,

I would say that "DF for the result set" is exactly what facets give you:
facet counts are computed only over the documents that match the query.
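
To make that concrete, here is a small Python sketch of how facet counts can
stand in for DF over the result set. It assumes the query was sent with
facet=true&facet.field=text&rows=0&wt=json (the field name "text" is a
placeholder) and that the response uses Solr's default flat JSON layout, where
each facet field is a list alternating term and count. The IDF formula is the
plain ln(N / df) variant, chosen only for illustration, and the function name
is hypothetical.

```python
import math


def result_set_idf(response, field):
    """Return {term: idf} computed only over the documents matching the query."""
    num_found = response["response"]["numFound"]  # size of the result set
    flat = response["facet_counts"]["facet_fields"][field]
    # The flat list alternates term, count, term, count, ...
    doc_freqs = dict(zip(flat[0::2], flat[1::2]))
    return {
        term: math.log(num_found / df)
        for term, df in doc_freqs.items()
        if df > 0  # skip terms that do not occur in the result set
    }


# Hand-written sample shaped like a Solr JSON response (not real output).
sample = {
    "response": {"numFound": 50000, "docs": []},
    "facet_counts": {
        "facet_fields": {"text": ["solr", 50000, "lucene", 5000, "rare", 0]}
    },
}
idfs = result_set_idf(sample, "text")
```

Here the facet count plays the role of DF and numFound plays the role of the
collection size, so a term present in every matching document gets an IDF of 0.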


On Fri, Feb 7, 2014 at 11:26 PM, David Miller <da...@gmail.com> wrote:

> Hi Mikhail,
>
> The DF seems to be based on the entire document set. What I require is
> a DF based on the results of a single query.
>
> Suppose my Solr query returns a set of 50K documents from a superset of
> 10 million documents. I need to calculate the DF based on just those 50K
> documents, but currently it seems to be calculated over the entire doc set.
>
> So, is there any way to get the DF or IDF based only on the docs
> returned by the query?
>
> Regards,
> Dave



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
 <mk...@griddynamics.com>

Re: Tf-Idf for a specific query

Posted by David Miller <da...@gmail.com>.
Hi Mikhail,

The DF seems to be based on the entire document set. What I require is
a DF based on the results of a single query.

Suppose my Solr query returns a set of 50K documents from a superset of
10 million documents. I need to calculate the DF based on just those 50K
documents, but currently it seems to be calculated over the entire doc set.

So, is there any way to get the DF or IDF based only on the docs
returned by the query?

Regards,
Dave

On Fri, Feb 7, 2014 at 5:15 AM, Mikhail Khludnev <mkhludnev@griddynamics.com> wrote:

> Hello Dave,
>
> You can get DF from the TermsComponent (invert it yourself to get IDF):
> http://wiki.apache.org/solr/TermsComponent
> Then, for a given term, you can get the number of occurrences per
> document with the tf() function query:
> http://wiki.apache.org/solr/FunctionQuery#tf

Re: Tf-Idf for a specific query

Posted by Mikhail Khludnev <mk...@griddynamics.com>.
Hello Dave,

You can get DF from the TermsComponent (invert it yourself to get IDF):
http://wiki.apache.org/solr/TermsComponent
Then, for a given term, you can get the number of occurrences per
document with the tf() function query:
http://wiki.apache.org/solr/FunctionQuery#tf
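
A rough sketch of those two requests, written in Python. The base URL, core
name (collection1), field name, and term are placeholders, and the /terms
handler is assumed to be enabled in solrconfig.xml. The DF-to-IDF inversion
uses the formula from Lucene's DefaultSimilarity, 1 + ln(numDocs / (docFreq + 1));
substitute your own if you use a different similarity. The helper names are
made up for illustration.

```python
import math
from urllib.parse import urlencode

SOLR = "http://localhost:8983/solr/collection1"  # placeholder base URL


def terms_url(field, limit=10):
    """URL asking the TermsComponent for corpus-wide DF of terms in `field`."""
    params = {"terms": "true", "terms.fl": field, "terms.limit": limit, "wt": "json"}
    return SOLR + "/terms?" + urlencode(params)


def tf_url(query, field, term, rows=10):
    """URL returning, for each matching doc, the tf() function value for `term`."""
    params = {
        "q": query,
        "fl": "id,tf(%s,'%s')" % (field, term),  # function query as a pseudo-field
        "rows": rows,
        "wt": "json",
    }
    return SOLR + "/select?" + urlencode(params)


def idf(doc_freq, num_docs):
    """Invert a document frequency into an IDF (DefaultSimilarity-style)."""
    return 1.0 + math.log(num_docs / (doc_freq + 1.0))
```

Note that the DF returned by the TermsComponent is computed over the whole
index, so the resulting IDF is corpus-wide as well.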



On Fri, Feb 7, 2014 at 3:58 AM, David Miller <da...@gmail.com> wrote:

> Hi guys,
>
> I need to obtain TF-IDF scores from Solr for a certain set of documents.
> The catch is that I need the IDF (or DF) to be calculated over the
> documents returned by a specific query, not over the entire corpus.
>
> Please give me a hint on whether Solr has this feature, or whether I can
> use the Lucene API directly to achieve this.
>
>
> Thanks in advance,
> Dave
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
 <mk...@griddynamics.com>