You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by "Sudarsan, Sithu D." <Si...@fda.hhs.gov> on 2008/11/14 17:57:35 UTC

RE: Multi -threaded indexing of large number of PDF documents

 
Hi All,

Based on your valuable inputs, we tried a few experiments with number of
threads. The observation is, if the number of threads are one less than
the number of cores (we have 'main' as a separate thread. Essentially,
including 'main' number of threads equal to number of cores), the
indexing performance reaches the optimum level, with maximum CPU
utilization. We have tried with Windows XP as well as CentOS. 

As regards to number of documents to be indexed and size of the
documents, there seems to be some correlation  between the two. But we
are yet to ascertain the same.

At this point, we are writing to a single IndexWriter. Have not tried
comparing using multiple writers and merge them to benchmark the
performance.

Sincerely,
Sithu D Sudarsan

sithu.sudarsan@fda.hhs.gov
sdsudarsan@ualr.edu

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org