Posted to java-user@lucene.apache.org by Luis Rodrigo Aguado <lr...@isoco.com> on 2006/09/16 00:50:41 UTC

Real world app advice

    Hi all,

    So far I have used Lucene only for toy examples and tutorials, but
now I am facing my first real-world, high-quality application.

    I need to manage around 50,000 docs, ranging from a few lines to a
couple of pages. I also need to handle lemmas and synonyms, and this is
where my main doubts arise. I have considered two options: adding the
synonyms and lemmas to the index and keeping the queries simple, or
expanding the queries with the lemmas and synonyms and keeping the
index simple. Is one of the two preferable over the other? What are
the benefits of each?

    Thanks in advance!


Re: Real world app advice

Posted by Erick Erickson <er...@gmail.com>.
Of course, the answer is "it depends" <G>..... This doesn't sound like a
very big index, so the first approach I'd try is to make the index
complicated and keep the queries as simple as possible. This assumes that
you don't really care about indexing speed or index size, and that search
response time is what you do care about. And with an index this size,
indexing speed won't be a problem IMO.
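The two options from the original question can be compared on a toy in-memory inverted index. The sketch below is plain Java, not Lucene's API; the class, its methods, and the synonym table are hypothetical and exist only for illustration. It shows that both strategies return the same hits for a simple term query, while index-time expansion stores more postings.

```java
import java.util.*;

/**
 * Toy comparison of index-time vs query-time synonym expansion.
 * Hypothetical code for illustration only; this is NOT Lucene's API.
 */
public class SynonymStrategies {

    // Hypothetical, symmetric synonym table.
    static final Map<String, List<String>> SYNONYMS = Map.of(
        "car", List.of("automobile"),
        "automobile", List.of("car"),
        "fast", List.of("quick"),
        "quick", List.of("fast"));

    // Returns the term itself plus any synonyms.
    static List<String> expand(String term) {
        List<String> out = new ArrayList<>();
        out.add(term);
        out.addAll(SYNONYMS.getOrDefault(term, List.of()));
        return out;
    }

    // Builds term -> doc ids; optionally adds every token's synonyms too.
    static Map<String, Set<Integer>> buildIndex(List<String> docs, boolean expandAtIndexTime) {
        Map<String, Set<Integer>> index = new HashMap<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String token : docs.get(id).toLowerCase().split("\\s+")) {
                for (String t : expandAtIndexTime ? expand(token) : List.of(token)) {
                    index.computeIfAbsent(t, k -> new TreeSet<>()).add(id);
                }
            }
        }
        return index;
    }

    // Single-term search; optionally ORs in the term's synonyms.
    static Set<Integer> search(Map<String, Set<Integer>> index, String term, boolean expandAtQueryTime) {
        Set<Integer> hits = new TreeSet<>();
        String t = term.toLowerCase();
        for (String q : expandAtQueryTime ? expand(t) : List.of(t)) {
            hits.addAll(index.getOrDefault(q, Set.of()));
        }
        return hits;
    }

    // Total number of (term, doc) postings, a rough proxy for index size.
    static int postings(Map<String, Set<Integer>> index) {
        int n = 0;
        for (Set<Integer> ids : index.values()) n += ids.size();
        return n;
    }

    public static void main(String[] args) {
        List<String> docs = List.of("the car was fast", "an automobile parked", "a quick bike");
        Map<String, Set<Integer>> fat = buildIndex(docs, true);   // complex index, simple query
        Map<String, Set<Integer>> lean = buildIndex(docs, false); // simple index, expanded query

        System.out.println(search(fat, "car", false));  // [0, 1]
        System.out.println(search(lean, "car", true));  // [0, 1]
        System.out.println(search(lean, "car", false)); // [0]
        System.out.println(postings(fat) + " vs " + postings(lean)); // 14 vs 10
    }
}
```

One trade-off this toy does not show: with index-time expansion, changing the synonym list means reindexing, whereas query-time expansion picks up a new list immediately at the cost of larger queries.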

Lucene in Action has an example of injecting synonyms into the token stream
at indexing time in a way that preserves proximity queries (SpanQueries). You
really want to look at it if you haven't already <G>....
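The usual way to keep injected synonyms compatible with phrase and proximity queries is to emit each synonym at the same position as the word it expands; in Lucene terms, the injected token gets a position increment of 0. The sketch below imitates that idea with a toy positional token list in plain Java (Java 16+ for the record). The class, method names, and synonym table are hypothetical and are not Lucene's API.

```java
import java.util.*;

/**
 * Toy positional token stream showing why injected synonyms should share
 * the original token's position. Hypothetical code; NOT Lucene's API.
 */
public class SamePositionSynonyms {

    static final Map<String, List<String>> SYNONYMS = Map.of(
        "car", List.of("automobile"),
        "fast", List.of("quick"));

    // A token and the absolute position it occupies in the stream.
    record Token(String term, int position) {}

    // If sharePosition is true, each synonym reuses the position of the word it
    // expands (increment 0); otherwise it is treated as the next word (increment 1).
    static List<Token> analyze(String text, boolean sharePosition) {
        List<Token> tokens = new ArrayList<>();
        int pos = -1;
        for (String word : text.toLowerCase().split("\\s+")) {
            tokens.add(new Token(word, ++pos));
            for (String syn : SYNONYMS.getOrDefault(word, List.of())) {
                tokens.add(new Token(syn, sharePosition ? pos : ++pos));
            }
        }
        return tokens;
    }

    // True if `second` occurs at the position right after some `first`.
    static boolean phraseMatches(List<Token> tokens, String first, String second) {
        Set<Integer> firstPositions = new HashSet<>();
        for (Token t : tokens) if (t.term().equals(first)) firstPositions.add(t.position());
        for (Token t : tokens) {
            if (t.term().equals(second) && firstPositions.contains(t.position() - 1)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        String doc = "the car was fast";
        List<Token> shared = analyze(doc, true);   // car@1 automobile@1 was@2
        List<Token> shifted = analyze(doc, false); // car@1 automobile@2 was@3

        System.out.println(phraseMatches(shared, "car", "was"));    // true
        System.out.println(phraseMatches(shifted, "car", "was"));   // false
        System.out.println(phraseMatches(shared, "was", "quick"));  // true
        System.out.println(phraseMatches(shifted, "was", "quick")); // false
    }
}
```

Appending synonyms as extra words shifts every later position, so the original phrase "car was" stops matching; sharing the position keeps both the original and the synonym phrases intact.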

Make sure your indexing analyzer and your search analyzer are consistent
with each other, and get a copy of Luke (google "luke lucene") so you can
examine your index and see how your queries behave. Again, if you haven't
already, I really, really recommend that you get a copy of Luke.
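A small illustration of what goes wrong when the two analyzers disagree: if the index stores lemmas but the query side skips lemmatization, an ordinary query misses. The sketch below is plain Java; the analyzers and the lemma table are hypothetical stand-ins, not Lucene's `Analyzer` API.

```java
import java.util.*;
import java.util.function.Function;

/**
 * Toy illustration of why indexing and search analysis must agree.
 * Hypothetical analyzers; NOT Lucene's Analyzer API.
 */
public class AnalyzerConsistency {

    // Hypothetical lemma table standing in for a real lemmatizer.
    static final Map<String, String> LEMMAS = Map.of("cars", "car", "ran", "run");

    // Indexing analyzer: lowercase, then reduce each word to its lemma.
    static List<String> indexAnalyzer(String text) {
        List<String> out = new ArrayList<>();
        for (String w : text.toLowerCase().split("\\s+")) out.add(LEMMAS.getOrDefault(w, w));
        return out;
    }

    // Mismatched search analyzer: lowercases but skips the lemma step.
    static List<String> plainAnalyzer(String text) {
        return Arrays.asList(text.toLowerCase().split("\\s+"));
    }

    // True if every analyzed query term occurs among the document's indexed terms.
    static boolean matches(String doc, String query, Function<String, List<String>> queryAnalyzer) {
        Set<String> indexed = new HashSet<>(indexAnalyzer(doc));
        return indexed.containsAll(queryAnalyzer.apply(query));
    }

    public static void main(String[] args) {
        String doc = "Two red cars";
        System.out.println(matches(doc, "Cars", AnalyzerConsistency::indexAnalyzer)); // true
        System.out.println(matches(doc, "Cars", AnalyzerConsistency::plainAnalyzer)); // false
    }
}
```

Inspecting the indexed terms (which is what Luke lets you do on a real index) is usually the quickest way to spot this kind of mismatch.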

I think making the index more complex is actually a lot less work, but I
don't have any real facts to back that up, FWIW.

Best
Erick
