You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Ivan Sekulovic <se...@net.yu> on 2004/08/26 13:03:45 UTC
Is Lucene right for me ?
Hello!
I am currently choosing technology for web crawler and search engine
that will index between 1 and 10 million of documents (with storing
documents). For some parts of the project I'll most likely choose
existing software, for some I'll have to right new code, but at the end
it should be pure java solution.
I am considering Lucene as solutions for text indexing and searching and
I have few questions about Lucene for which I was not able to find
answers in FAQs, Articles etc.
Is Lucene suitable for ~10 million documents?
Is it possible to have boosts factor per document ? The thing is that I
need to have something like sort order of documents in relevance, but
relevance cannot been calculated only from that document, because there
are some external factors as well (e.g. Google PageRank algorithm). I
think that I can calculate all this factors in one factor that can been
stored in index, but can I use it to boost relevance of some documents ?
I guess it is possible, but would it require for some parts of Lucene to
be rewritten to enable this ? Or should I just fetch documents from
Lucene and then sort them outside?
Best Regards,
Sekula
---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org
Re: Is Lucene right for me ?
Posted by Otis Gospodnetic <ot...@yahoo.com>.
Ivane,
Yes, you can use Lucene for this. 10 mil. documents is not much, if
you use adequate hardware. You can use boost individual documents
(check the javadocs for Document and Field classes).
Are you aware of Nutch, though? It sounds like you are not, and Nutch
is probably the best tool for the job (and it uses Lucene under the
hood). www.nutch.org
Otis
--- Ivan Sekulovic <se...@net.yu> wrote:
> Hello!
>
> I am currently choosing technology for web crawler and search engine
> that will index between 1 and 10 million of documents (with storing
> documents). For some parts of the project I'll most likely choose
> existing software, for some I'll have to right new code, but at the
> end
> it should be pure java solution.
>
> I am considering Lucene as solutions for text indexing and searching
> and
> I have few questions about Lucene for which I was not able to find
> answers in FAQs, Articles etc.
>
> Is Lucene suitable for ~10 million documents?
>
> Is it possible to have boosts factor per document ? The thing is that
> I
> need to have something like sort order of documents in relevance, but
>
> relevance cannot been calculated only from that document, because
> there
> are some external factors as well (e.g. Google PageRank algorithm). I
>
> think that I can calculate all this factors in one factor that can
> been
> stored in index, but can I use it to boost relevance of some
> documents ?
> I guess it is possible, but would it require for some parts of Lucene
> to
> be rewritten to enable this ? Or should I just fetch documents from
> Lucene and then sort them outside?
>
>
>
>
> Best Regards,
> Sekula
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>
>
---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org