
Posted to java-user@lucene.apache.org by Alex Steward <al...@yahoo.com> on 2009/05/19 16:12:27 UTC

lucene source code changes

Hello,

I need to implement a custom inverted index in Lucene. I have files like
the ones attached here. Each file contains words and the score assigned
to each word. There will be hundreds of such files, and each file will
have at least 50,000 such name-value pairs.

Note: The attached files show only tens of such name-value pairs, but
my real production data will have more than 50,000 name-value pairs per
file.

Currently I index the data using Lucene's inverted index. The query that
is executed against the index has 100 words. When the query is executed
against the index, the result is returned in about 100 milliseconds.

Problem: Once I have the results of the query, I have to go through
each file (for example, the first attached file). Then, for each word in
the user's input query, I have to look up the score in that file and add
it to the file's total score. Doing this against hundreds of files and
hundreds of keywords makes the score computation slow, i.e. about 3-5
seconds.

I need help resolving this problem so that the score computation takes
less than about 200 milliseconds.

One resolution I was thinking of is modifying the Lucene source code
that creates the inverted index, so that the score is stored in the
index itself. When the results of the query are returned, we would get
the scores along with the file names, thereby eliminating the need to
search each file for the keyword and its corresponding score. I would
still need to compute the total of all scores that belong to a single
file.
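
To make the idea concrete, here is a minimal plain-Java sketch of an
inverted index whose postings carry the score. It does not use or modify
Lucene. The class name ScoredInvertedIndex and its methods add and
totalScores are hypothetical names chosen only for illustration, and the
sketch assumes all word scores fit in memory.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: an inverted index whose postings store a score. */
class ScoredInvertedIndex {
    // word -> (file name -> score assigned to that word in that file)
    private final Map<String, Map<String, Double>> postings = new HashMap<>();

    /** Records the score that a file assigns to a word. */
    void add(String fileName, String word, double score) {
        postings.computeIfAbsent(word, w -> new HashMap<>()).put(fileName, score);
    }

    /**
     * Sums, per file, the scores of every query word found in that file.
     * A word repeated in queryWords is counted once per occurrence.
     */
    Map<String, Double> totalScores(List<String> queryWords) {
        Map<String, Double> totals = new HashMap<>();
        for (String word : queryWords) {
            Map<String, Double> perFile = postings.get(word);
            if (perFile == null) {
                continue; // no file contains this word
            }
            for (Map.Entry<String, Double> entry : perFile.entrySet()) {
                totals.merge(entry.getKey(), entry.getValue(), Double::sum);
            }
        }
        return totals;
    }
}
```

With this layout, the work per query is proportional to the number of
postings for the query words, rather than to the size of every file.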

I am also open to any other ideas that you may have. Any suggestions regarding this will be very helpful.

Thanks,
Abhilasha

Re: lucene source code changes

Posted by Grant Ingersoll <gs...@apache.org>.

You might have a look at the org.apache.lucene.search.function package  
(aka Function Queries) and what they have to offer.  Basically, they  
can be used to incorporate field values into the score.

-Grant

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/
