
Posted to java-user@lucene.apache.org by Peter Carlson <ca...@bookandhammer.com> on 2002/05/03 15:50:57 UTC

Performance benchmarks

Some performance numbers

Java Version: 1.3_01
OS Version: Windows 2000
CPU (Type, Speed and Quantity): Pentium 4, 1.5 GHz, 1 CPU
RAM: 512 MB
Drive configuration (IDE, SCSI, RAID-1, RAID-5): IDE (single)
Number of source documents: 103009
Total filesize of source documents: 430MB
Average filesize of source documents (in KB/MB): 4.3KB
Source documents storage location (filesystem, DB, http,etc): Filesystem
File type of source documents: xml
Parser(s) used, if any: Standard Analyzer
Number of Fields per document: 8
Time taken (in ms/s as an average of at least 3 indexing runs): 8387 sec (139 min)
Time taken / 1000 docs indexed: 81 sec / 1000 docs
Notes (any special tuning/strategies):
I convert each document to a DOM and use XPath to get the fields.
I validate the data to make sure it meets certain criteria, such as a
total size > 150 characters, and check for duplicates using a HashMap.
Without these checks, indexing is faster (about 60 seconds/1000 docs).
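
For anyone trying to reproduce a similar pipeline, here is a rough sketch of
the pre-indexing steps described in the notes above: parse to a DOM, pull
fields with XPath, check the size threshold, and reject duplicates with a
HashMap. It is not the code used for the benchmark. The class name, the
element names (/doc/id, /doc/title, /doc/body), the choice of "id" as the
duplicate key, and the reading of "total size" as the summed length of the
extracted fields are all assumptions made for illustration. It also uses the
javax.xml.xpath API, which ships with current JDKs but was not part of Java
1.3, so the original run must have used a different XPath library. The step
of adding the fields to a Lucene index is left out.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Sketch only: DOM parse, XPath field extraction, validation, de-duplication. */
class XmlFieldExtractor {
    // Threshold from the post: total size must exceed 150 characters.
    static final int MIN_TOTAL_CHARS = 150;

    // Hypothetical field name -> XPath mapping; the real schema is not given in the post.
    private final Map<String, String> fieldPaths = new LinkedHashMap<>();
    // Keys already seen, used to reject duplicates.
    private final Map<String, Boolean> seen = new HashMap<>();
    private final XPath xpath = XPathFactory.newInstance().newXPath();

    XmlFieldExtractor() {
        fieldPaths.put("id", "/doc/id");
        fieldPaths.put("title", "/doc/title");
        fieldPaths.put("body", "/doc/body");
    }

    /** Returns the extracted fields, or null if the document fails validation. */
    Map<String, String> extract(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document dom = builder.parse(new InputSource(new StringReader(xml)));

            Map<String, String> fields = new LinkedHashMap<>();
            int totalChars = 0;
            for (Map.Entry<String, String> e : fieldPaths.entrySet()) {
                // evaluate() returns the node's string value, or "" if nothing matches.
                String value = xpath.evaluate(e.getValue(), dom).trim();
                fields.put(e.getKey(), value);
                totalChars += value.length();
            }

            if (totalChars <= MIN_TOTAL_CHARS) {
                return null; // too short
            }
            if (seen.put(fields.get("id"), Boolean.TRUE) != null) {
                return null; // duplicate id
            }
            return fields; // a real indexer would build a Lucene document here
        } catch (Exception ex) {
            throw new IllegalArgumentException("could not parse document", ex);
        }
    }

    public static void main(String[] args) {
        StringBuilder body = new StringBuilder();
        for (int i = 0; i < 200; i++) {
            body.append('x');
        }
        String xml = "<doc><id>42</id><title>Sample</title><body>"
                + body + "</body></doc>";

        XmlFieldExtractor extractor = new XmlFieldExtractor();
        System.out.println(extractor.extract(xml) != null); // first copy is accepted
        System.out.println(extractor.extract(xml) != null); // second copy is a duplicate
    }
}
```

Timing the indexing loop with and without the extract() checks would show
how much of the per-document cost comes from parsing and validation rather
than from indexing itself.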


I hope this is helpful.
--Peter


--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>

Re: Performance benchmarks

Posted by Kelvin Tan <ke...@relevanz.com>.

Great Peter. I've posted a new set of attributes based on your submission
and Otis' feedback. Let me think about the best way to consolidate these
numbers and stick them somewhere accessible for all.


