Posted to java-user@lucene.apache.org by Ivan Sekulovic <se...@net.yu> on 2004/08/26 13:03:45 UTC

Is Lucene right for me ?

Hello!

I am currently choosing technology for a web crawler and search engine 
that will index between 1 and 10 million documents (and store the 
documents as well). For some parts of the project I'll most likely choose 
existing software, for some I'll have to write new code, but in the end 
it should be a pure Java solution.

I am considering Lucene as the solution for text indexing and searching, 
and I have a few questions about Lucene that I was not able to find 
answered in the FAQs, articles, etc.

Is Lucene suitable for ~10 million documents?

Is it possible to have a boost factor per document? The thing is that I 
need something like a sort order of documents by relevance, but 
relevance cannot be calculated from the document alone, because there 
are some external factors as well (e.g. Google's PageRank algorithm). I 
think that I can combine all these factors into one factor that can be 
stored in the index, but can I use it to boost the relevance of some 
documents? I guess it is possible, but would it require some parts of 
Lucene to be rewritten? Or should I just fetch documents from 
Lucene and then sort them outside?




Best Regards,
Sekula


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re: Is Lucene right for me ?

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Ivane,

Yes, you can use Lucene for this.  10 million documents is not much if
you use adequate hardware.  You can boost individual documents
(check the javadocs for the Document and Field classes).
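
To make the two options from the question concrete, here is a minimal
sketch in plain Java of the "sort them outside" variant. It does not use
the Lucene API at all: the Hit class, its field names, and the choice to
multiply the text score by the external factor are assumptions made for
illustration only. An index-time document boost has a broadly similar
effect, since the boost acts as a multiplier on the score, but check the
javadocs for the exact behaviour in your Lucene version.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class ExternalRankSketch {

    /** Hypothetical search hit: not a Lucene class. */
    public static final class Hit {
        public final String docId;
        public final float textScore;      // relevance returned by the search engine
        public final float externalFactor; // precomputed, e.g. a link-based rank

        public Hit(String docId, float textScore, float externalFactor) {
            this.docId = docId;
            this.textScore = textScore;
            this.externalFactor = externalFactor;
        }

        /** Assumed combination rule: simple multiplication. */
        public float combined() {
            return textScore * externalFactor;
        }
    }

    /** Returns a new list ordered by combined score, highest first. */
    public static List<Hit> rerank(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble((Hit h) -> h.combined()).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
                new Hit("a", 0.9f, 1.0f),
                new Hit("b", 0.5f, 3.0f));
        for (Hit h : rerank(hits)) {
            System.out.println(h.docId + " " + h.combined());
        }
    }
}
```

Sorting outside like this only reorders the hits you have already
fetched, so it costs more as the result set grows. Storing the factor as
a document boost at index time avoids that, at the price of reindexing a
document whenever its factor changes.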

Are you aware of Nutch, though?  It sounds like you are not, and Nutch
is probably the best tool for the job (and it uses Lucene under the
hood). www.nutch.org

Otis


