Posted to user@lucy.apache.org by Shahab Mohammed <sh...@gmail.com> on 2014/12/03 16:15:08 UTC

[lucy-user] Is there any benchmarking details about how fast is lucy indexing

Dear Lucy Users,

I would like to know if you can direct me to a page with Lucy indexing
benchmarks.
I understand that benchmark results depend on the hardware configuration
(CPU, RAM, etc.) as well as the number of fields being indexed.

I would like to know the indexing rate, i.e. how many MB/sec can be
indexed. If someone has done such benchmarking, please share the info with
me.

Thanks
Shahab

Re: [lucy-user] Is there any benchmarking details about how fast is lucy indexing

Posted by Shahab Mohammed <sh...@gmail.com>.
Dear Nick
Thank you so much for your reply. This helps me a lot.
Kind Regards
Shahab


On Thu, Dec 4, 2014 at 5:35 AM, Nick Wellnhofer <we...@aevum.de> wrote:

> On 03/12/2014 16:15, Shahab Mohammed wrote:
>
>> I would like to know the indexing rate, i.e. how many MB/sec can be
>> indexed. If someone has done such benchmarking, please share the info with
>> me.
>>
>
> This depends on a lot of factors like the schema and analysis chain you
> use, the total size of your index, and the hardware. But if you want a
> ballpark figure, I'd say about 1-2 MB/s.
>
> Here is some data for one of our production systems running on a typical
> VPS:
>
> Total fields: 3
> Full text fields: 2
> Highlightable fields: 2
> Documents: 20,000
> Raw input size: 35 MB
> Index size: 80 MB
> Analysis chain:
>   StandardTokenizer
>   Normalizer
>   SnowballStopFilter
>   SnowballStemmer
> Total time to reindex: 30s
>
> This includes the time to pull all of the data out of a PostgreSQL
> database, prepare it for indexing, and some other unrelated operations
> which shouldn't have a large impact.
>
> Nick
>
>

Re: [lucy-user] Is there any benchmarking details about how fast is lucy indexing

Posted by Nick Wellnhofer <we...@aevum.de>.
On 03/12/2014 16:15, Shahab Mohammed wrote:
> I would like to know the indexing rate, i.e. how many MB/sec can be
> indexed. If someone has done such benchmarking, please share the info with
> me.

This depends on a lot of factors like the schema and analysis chain you use, 
the total size of your index, and the hardware. But if you want a ballpark 
figure, I'd say about 1-2 MB/s.

Here is some data for one of our production systems running on a typical VPS:

Total fields: 3
Full text fields: 2
Highlightable fields: 2
Documents: 20,000
Raw input size: 35 MB
Index size: 80 MB
Analysis chain:
   StandardTokenizer
   Normalizer
   SnowballStopFilter
   SnowballStemmer
Total time to reindex: 30s

This includes the time to pull all of the data out of a PostgreSQL database, 
prepare it for indexing, and some other unrelated operations which shouldn't 
have a large impact.
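
For reference, the figures above can be turned into rates with some simple
arithmetic. The following is only a sketch in Python, not part of Lucy: the
function name summarize and the assumption that 1 MB = 1000 kB are
illustrative, and the result still includes the database and preparation
time mentioned above, so it understates pure indexing speed.

```python
def summarize(documents, raw_input_mb, index_size_mb, reindex_seconds):
    """Derive rough rates from the reindexing figures quoted above.

    Hypothetical helper for illustration only; it is not a Lucy API.
    """
    return {
        # Raw input processed per second of total reindex time.
        "mb_per_second": raw_input_mb / reindex_seconds,
        # Documents processed per second of total reindex time.
        "docs_per_second": documents / reindex_seconds,
        # Average raw document size, assuming 1 MB = 1000 kB.
        "avg_doc_kb": raw_input_mb * 1000 / documents,
        # How much larger the index is than the raw input.
        "index_to_input_ratio": index_size_mb / raw_input_mb,
    }


# Figures taken from the message above.
stats = summarize(
    documents=20_000,
    raw_input_mb=35,
    index_size_mb=80,
    reindex_seconds=30,
)

for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

This works out to roughly 1.2 MB/s, or about 670 documents per second, which
sits inside the 1-2 MB/s ballpark. The index comes out at roughly 2.3 times
the size of the raw input, which is plausible given that two of the three
fields are highlightable.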

Nick