You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Nadav Har'El <ny...@math.technion.ac.il> on 2007/03/01 09:28:07 UTC

Re: indexing performance

On Tue, Feb 27, 2007, Saravana wrote about "indexing performance":
> Hi,
> 
> Is it possible to scale lucene indexing like 2000/3000 documents per
> second?

I don't know about the actual numbers, but one trick I've used in the past
to get really fast indexing was to create several independent indexes in
parallel. Simply, if you have, say, 4 CPUs and perhaps even several physical
disks, run 4 indexing processes each indexing a 1/4 of the files and creating
a separate index (on separate disks on separate IO channels, if possible).

At the end, you have 4 indexes which you can actually search together without
any real need to merge them, unless query performance is very important to
you as well.

> I need to index 10 fields each with 20 bytes long.  I should be
> able to search by just giving any of the field values as criteria. I need to
> get the count that has same field values.

You need just the counts? And you want to do just whole-field matching, not
word matching? In that case, Lucene might be an overkill for you. Or, if you
do use Lucene, make sure to use "keyword" (untokenized) fields, not
"tokenized" fields.

-- 
Nadav Har'El                        |      Thursday, Mar  1 2007, 11 Adar 5767
IBM Haifa Research Lab              |-----------------------------------------
                                    |Open your arms to change, but don't let
http://nadav.harel.org.il           |go of your values.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org