Posted to dev@manifoldcf.apache.org by Vinodh Boopalan <vi...@contentdiv.com> on 2017/02/21 05:16:45 UTC

Performance benchmarking

Hi Karl/Team,

I am trying to benchmark ManifoldCF performance using the following configuration:

Host Specs

VM running in ESXi
Memory: 8GB
CPU: 2 with 2 cores
SSD

I am running MCF as a combined single process (1024MB JVM, 100 workers, 105 db connections). The PostgreSQL settings mostly follow this page:
https://manifoldcf.apache.org/release/release-2.5/en_US/performance-tuning.html

I have around 2300 docs, around 3.5 GB in total file size. With null output, I am getting a throughput of 13-14 docs/sec (18 docs/sec is the best I have seen).

Is that reasonable performance to expect?

Can I scale further by using the multiprocess deployment (multiple agents against the PostgreSQL database)? Or should I explore scaling PostgreSQL instead?

Thanks and best regards,
Vinodh
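
The figures in the message above can be turned into an average document size, a byte throughput, and a total run time. The following is only a rough sketch of that arithmetic: the function and constant names are made up for illustration, and treating 1 GB as 1024 MB is an assumption, since the message does not say which unit convention was used.

```python
# Figures quoted in the message above.
DOC_COUNT = 2300          # "around 2300 docs"
TOTAL_MB = 3.5 * 1024     # "around 3.5 GB"; 1 GB = 1024 MB is an assumption


def crawl_estimate(docs_per_sec: float) -> dict:
    """Derive average document size, MB/sec and run time from a document rate."""
    avg_doc_mb = TOTAL_MB / DOC_COUNT
    return {
        "avg_doc_mb": avg_doc_mb,
        "mb_per_sec": avg_doc_mb * docs_per_sec,
        "crawl_seconds": DOC_COUNT / docs_per_sec,
    }


if __name__ == "__main__":
    # 13-14 docs/sec was the typical rate, 18 docs/sec the best observed.
    for rate in (13, 14, 18):
        est = crawl_estimate(rate)
        print(
            f"{rate} docs/sec: {est['avg_doc_mb']:.2f} MB/doc, "
            f"{est['mb_per_sec']:.1f} MB/sec, "
            f"{est['crawl_seconds']:.0f} s for the full set"
        )
```

Under these assumptions the documents average roughly 1.5 MB each and the whole set is crawled in about three minutes, so the run is short and the docs/sec figure alone may not describe how much data is actually being moved.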

Re: Performance benchmarking

Posted by Karl Wright <da...@gmail.com>.
For a 100,000 local document crawl (small files, which is the worst for
database performance), the first 10,000 documents clock in at 92 documents
per second here.  That's with an out-of-the-box Windows postgresql 9.3 on a
quad core processor with SSD.  The performance drops off from that pace; at
26K documents we're down to 55 docs/second.  But that's where it bottoms
out; at 36K we see 57 docs/second, and at 51K we see 65 docs per second.

The slowdown is a result of the PostgreSQL tables getting larger and no
longer fitting in memory.  The speedup is due to no further document
discovery occurring and just plain crawling taking place.

Probably it is the 4-core architecture that makes the difference between
your results and mine.  My 2-core machine without SSDs is what I
benchmarked before, and that's the one that runs some 20 docs/second.  (I
misremembered about the SSDs, sorry.)

Thanks,
Karl


On Tue, Feb 21, 2017 at 1:20 AM, Karl Wright <da...@gmail.com> wrote:

> Hi Vinodh,
>
> ManifoldCF docs-per-second performance is limited mainly by database
> performance.  PostgreSQL 9.x on Windows seems a bit slower than PostgreSQL
> 8.x was; I'd typically get 25 documents per second on 8.x even without SSDs,
> but with PostgreSQL 9.x on Windows the numbers are a good deal poorer.
> Still, I haven't seen it quite that bad; 20 docs per second is what I
> recall seeing in informal testing here.
>
> I have never tried tuning database performance on Windows.  Linux
> PostgreSQL deployments do a lot better, in my experience, and are probably
> more responsive to tuning.
>
> I'll do some experiments, as time permits, and get back to you.
>
> Karl

Re: Performance benchmarking

Posted by Karl Wright <da...@gmail.com>.
Hi Vinodh,

ManifoldCF docs-per-second performance is limited mainly by database
performance.  PostgreSQL 9.x on Windows seems a bit slower than PostgreSQL
8.x was; I'd typically get 25 documents per second on 8.x even without SSDs,
but with PostgreSQL 9.x on Windows the numbers are a good deal poorer.
Still, I haven't seen it quite that bad; 20 docs per second is what I
recall seeing in informal testing here.

I have never tried tuning database performance on Windows.  Linux
PostgreSQL deployments do a lot better, in my experience, and are probably
more responsive to tuning.

I'll do some experiments, as time permits, and get back to you.

Karl


On Tue, Feb 21, 2017 at 12:18 AM, Vinodh Boopalan <
vinodh.boopalan@contentdiv.com> wrote:

> Forgot to mention, currently focusing on Windows Share/Local File system.

Re: Performance benchmarking

Posted by Vinodh Boopalan <vi...@contentdiv.com>.
I forgot to mention that I am currently focusing on Windows Share/local file system crawling.