Posted to solr-user@lucene.apache.org by sarfaraz masood <sa...@yahoo.com> on 2010/06/15 15:38:47 UTC

how to get tf-idf values in solr

I am Sarfaraz, and I am working on a search engine
project based on Nutch & Solr. I am trying to implement a
new search algorithm for this engine.

Our search engine crawls the web and stores each document as a large string in the database, indexed by its URL.

To implement my algorithm I need tf-idf values (in the range 0 to 1) for each
document returned by the crawler, but I am unable to find any method in
Solr or Lucene that serves this purpose.

For my algorithm I need to maintain a relevance matrix of the following type, for example:

        term1   term2   term3   term4   ...
url1    0.7     0.8     0.3     0.1
url2    0.4     0.1     0.4     0.5
url3    ...
...

For this purpose I need a core Java method/function in Solr that
returns the tf-idf values of every term in every document in the
available document list.
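A minimal sketch of the matrix described above, assuming each document has already been split into lowercase terms. It uses the classic weighting tf * log(N / df) and then divides each row by its largest weight so values fall in [0, 1]; that row-max normalization is an assumption, since the request only says the values should lie between 0 and 1. The class and method names are hypothetical and this is not a Solr or Lucene API.

```java
import java.util.*;

public class TfIdfMatrix {

    /**
     * Builds a url -> (term -> weight) matrix.
     * Weight = tf * log(N / df), then scaled so the largest weight in each row is 1.0
     * (assumed normalization; terms present in every document get weight 0).
     */
    public static Map<String, Map<String, Double>> build(Map<String, List<String>> docs) {
        int n = docs.size();

        // Document frequency: number of documents that contain each term at least once.
        Map<String, Integer> df = new HashMap<>();
        for (List<String> terms : docs.values()) {
            for (String t : new HashSet<>(terms)) {
                df.merge(t, 1, Integer::sum);
            }
        }

        Map<String, Map<String, Double>> matrix = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : docs.entrySet()) {
            // Raw term frequency within this document.
            Map<String, Integer> tf = new HashMap<>();
            for (String t : e.getValue()) {
                tf.merge(t, 1, Integer::sum);
            }

            Map<String, Double> row = new TreeMap<>();
            double max = 0.0;
            for (Map.Entry<String, Integer> t : tf.entrySet()) {
                double w = t.getValue() * Math.log((double) n / df.get(t.getKey()));
                row.put(t.getKey(), w);
                max = Math.max(max, w);
            }

            // Scale the row into [0, 1]; rows whose weights are all 0 are left unchanged.
            if (max > 0) {
                for (Map.Entry<String, Double> r : row.entrySet()) {
                    r.setValue(r.getValue() / max);
                }
            }
            matrix.put(e.getKey(), row);
        }
        return matrix;
    }

    public static void main(String[] args) {
        // Hypothetical sample data standing in for crawled pages.
        Map<String, List<String>> docs = new LinkedHashMap<>();
        docs.put("url1", Arrays.asList("solr", "lucene", "solr", "index"));
        docs.put("url2", Arrays.asList("nutch", "crawl", "index"));
        docs.put("url3", Arrays.asList("solr", "nutch"));
        build(docs).forEach((url, row) -> System.out.println(url + " " + row));
    }
}
```

In the sample data, "lucene" occurs only in url1, so it receives the highest weight in that row (1.0). For a real index the tf and df counts would come from Lucene/Solr rather than from in-memory lists.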

Please help.

I would be highly grateful to you all.

-Sarfaraz Masood


Re: how to get tf-idf values in solr

Posted by Erik Hatcher <er...@gmail.com>.
The TermVectorComponent can return tf/idf:

   <http://wiki.apache.org/solr/TermVectorComponent>
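A hedged sketch of how such a request might be built from Java. It assumes the field of interest (the hypothetical `content`) is declared with termVectors="true" in schema.xml, that a request handler including the TermVectorComponent is reachable at the URL passed in (the example solrconfig.xml registers one named tvrh; how it is addressed depends on the Solr version), and that the uniqueKey field is `id`. The tv.tf, tv.df and tv.tf_idf parameters ask for per-term frequency, document frequency and a combined weight; check the exact parameter set against the wiki page for your Solr version.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class TermVectorRequest {

    /**
     * Builds a GET URL asking the TermVectorComponent for per-term tf, df and tf-idf.
     * handlerUrl is hypothetical: the full URL of a handler that includes the component.
     */
    public static String build(String handlerUrl, String query, String field, int rows) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", query);
        params.put("fl", "id");          // assumes the uniqueKey (the page URL) is stored in "id"
        params.put("rows", Integer.toString(rows));
        params.put("tv", "true");        // enable the component for this request
        params.put("tv.fl", field);      // field whose term vectors are returned
        params.put("tv.tf", "true");     // term frequency within each document
        params.put("tv.df", "true");     // document frequency across the index
        params.put("tv.tf_idf", "true"); // combined tf-idf weight per term

        // URL-encode each value and join the pairs into a query string.
        StringJoiner qs = new StringJoiner("&");
        for (Map.Entry<String, String> p : params.entrySet()) {
            qs.add(p.getKey() + "=" + URLEncoder.encode(p.getValue(), StandardCharsets.UTF_8));
        }
        return handlerUrl + "?" + qs;
    }

    public static void main(String[] args) {
        // Hypothetical host, handler path and field name.
        System.out.println(build("http://localhost:8983/solr/tvrh", "*:*", "content", 10));
    }
}
```

The weights in the response are not guaranteed to lie between 0 and 1, so some per-document rescaling may still be needed to fill the relevance matrix.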


Re: how to get tf-idf values in solr

Posted by didier deshommes <df...@gmail.com>.
Have you taken a look at Solr's TermVector component? It's probably
what you want:

http://wiki.apache.org/solr/TermVectorComponent

didier
