Posted to java-user@lucene.apache.org by Mek <me...@gmail.com> on 2006/08/01 09:52:48 UTC

Does lucene performance suffer with a lot of empty fields ?

I have one generic index, but I am indexing a lot of different things, like
actors, politicians, scientists, and sportsmen.

As you can see, although there are some common fields, like name and
DOB, each of these types of people also has fields of its own.
E.g. actors will have "Movies", "TV shows", ...; politicians will have
"Political party", ...; scientists will have "Publications",
"Inventions", ...

Also, I do not want to create multiple indexes, as the number of such types,
and hence the number of indexes, can get out of hand; e.g. I could decide to
add "footballers" and "tennis players".

I am sure I am not the first to face this problem.

From what I gather, I can go ahead and create one index and, for each
Document, only add the relevant fields. Is this correct?
I should still be able to search with queries like "mel Movies:braveheart".
Right?

Would this impact search performance?
Any other words of caution for me?

Thanks,
 mek

Re: Does lucene performance suffer with a lot of empty fields ?

Posted by Erick Erickson <er...@gmail.com>.
I can't speak to performance, but there's no problem having different fields
for different documents. Stated differently, you don't need to have all
fields in all documents. It took me a while to get my head out of database
tables and accept this <G>....
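To make the point concrete, here is a toy in-memory model (plain Java, not Lucene's API) in which each document is just a map from field name to text, and different documents carry different fields. The field name "name", its use as the default search field, and the sample data are assumptions for illustration only; a real index would do this with Lucene Documents and its query parser.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy in-memory model, NOT Lucene's API: each document is a map from
// field name to text, and different documents may carry different fields.
public class SparseFields {

    // Builds the hypothetical sample documents used below.
    static List<Map<String, String>> buildDocs() {
        List<Map<String, String>> docs = new ArrayList<>();

        Map<String, String> actor = new HashMap<>();
        actor.put("name", "Mel Gibson");
        actor.put("Movies", "Braveheart Ransom");
        docs.add(actor);

        Map<String, String> politician = new HashMap<>();
        politician.put("name", "Mel Example"); // hypothetical person
        politician.put("party", "Example Party");
        docs.add(politician);
        return docs;
    }

    // Returns ids of docs whose `field` contains `term` (lowercased,
    // whitespace-split). A doc without the field is skipped, never an error.
    static List<Integer> search(List<Map<String, String>> docs, String field, String term) {
        List<Integer> hits = new ArrayList<>();
        for (int id = 0; id < docs.size(); id++) {
            String value = docs.get(id).get(field);
            if (value == null) {
                continue; // field absent: nothing stored, nothing to match
            }
            for (String token : value.toLowerCase().split("\\s+")) {
                if (token.equals(term.toLowerCase())) {
                    hits.add(id);
                    break;
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = buildDocs();
        // The two clauses of "mel Movies:braveheart", assuming "name" is
        // the default search field.
        System.out.println(search(docs, "name", "mel"));          // [0, 1]
        System.out.println(search(docs, "Movies", "braveheart")); // [0]
    }
}
```

Lucene's query parser combines the two clauses of "mel Movies:braveheart" with OR by default, so a document matching either one is a hit; the politician matches only on name, since it has no Movies field at all.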

I doubt there's a problem with speed, but, as always, measurements on
your particular data count most.

Erick

Re: Does lucene performance suffer with a lot of empty fields ?

Posted by Chris Hostetter <ho...@fucit.org>.
: From what I gather, I can go ahead & create an Index & for each Document &
: only add the relevant fields. Is this correct?
: I should still be able to search with queries like "mel Movies:braveheart".
: Right ?
:
: Would this impact the search performance ?
: Any other words of caution for me ?

It will absolutely work. The one performance issue you may want to
consider is that by default a "fieldNorm" is computed for every document
and every field, and these are kept in memory. There is a way to turn
them off on a per-field basis, but you have to turn them off for every
doc: if even one doc wants a norm for field X, then every doc gets a norm
for field X.
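A rough way to see the cost: the sketch below estimates norm memory assuming the classic layout of one norm byte per document for every field that any document indexes with norms. It is a plain-Java model, not a Lucene call, and the field names are hypothetical.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Toy estimate, NOT Lucene code. Assumption: one norm byte per document
// for every field that ANY document indexes with norms enabled.
public class NormsMemory {

    // docsFields.get(i) = the normed fields that document i actually has.
    static long estimateNormBytes(List<Set<String>> docsFields) {
        Set<String> normedFields = new TreeSet<>();
        for (Set<String> fields : docsFields) {
            normedFields.addAll(fields);
        }
        // maxDoc bytes per normed field, however few docs really use it
        return (long) docsFields.size() * normedFields.size();
    }

    public static void main(String[] args) {
        List<Set<String>> docs = List.of(
            Set.of("name", "dob", "movies", "tvshows"),            // actor
            Set.of("name", "dob", "party"),                        // politician
            Set.of("name", "dob", "publications", "inventions"));  // scientist
        // 7 distinct fields x 3 docs = 21 bytes, though only 11 field
        // instances exist across the three documents.
        System.out.println(estimateNormBytes(docs)); // 21
    }
}
```

At a hypothetical scale of 1,000,000 documents and 40 distinct type-specific fields, the same estimate gives 40,000,000 bytes, even if each document uses only a handful of those fields.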

How to "omitNorms" for a field, and what the pros (saved space) and cons
(no "lengthNorm" or "field boosts") are, has been discussed extensively in
the past. Search the archives for anything I've put in quotes and you'll
find lots of info on this.
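On the cons side, this sketch shows roughly what a field norm carries, following the shape of Lucene's classic DefaultSimilarity: a boost multiplied by a lengthNorm of 1/sqrt(number of terms). The norm() helper is hypothetical and ignores the lossy one-byte encoding applied before storage. With norms omitted the factor is effectively 1.0, so shorter fields no longer score higher and field boosts have no effect.

```java
// Hypothetical model of a field norm, NOT Lucene's API. It follows the
// classic DefaultSimilarity shape (boost * 1/sqrt(numTerms)) and ignores
// the lossy one-byte encoding Lucene applies before storing the value.
public class FieldNorm {

    static double norm(float boost, int numTerms, boolean omitNorms) {
        if (omitNorms) {
            return 1.0; // no length normalization, boost ignored
        }
        return boost * (1.0 / Math.sqrt(numTerms));
    }

    public static void main(String[] args) {
        // Same boost: a short field gets a larger norm than a long one
        System.out.println(norm(1.0f, 4, false));   // 0.5
        System.out.println(norm(1.0f, 100, false)); // 0.1
        // A 2x field boost survives only while norms are kept
        System.out.println(norm(2.0f, 100, false)); // 0.2
        System.out.println(norm(2.0f, 100, true));  // 1.0
    }
}
```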


-Hoss

