Posted to issues@maven.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2016/10/25 11:08:58 UTC

[jira] [Commented] (MINDEXER-99) improve performance loss introduced in MINDEXER-77

    [ https://issues.apache.org/jira/browse/MINDEXER-99?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15604990#comment-15604990 ] 

ASF GitHub Bot commented on MINDEXER-99:
----------------------------------------

GitHub user tstupka opened a pull request:

    https://github.com/apache/maven-indexer/pull/12

    resolve performance loss due to lucene 4.8.1 - upgrade to lucene 553 and additional fixes

    since lucene was upgraded to 4.8.1, the indexer takes 2.5x longer than with lucene 3.6. This seems to be the cumulative effect of several smaller performance regressions introduced in individual lucene releases after 3.6 - see also https://issues.apache.org/jira/browse/MINDEXER-99
    
    see the individual commits in this request. Each one addresses a specific improvement. When all are applied, the resulting performance is comparable to the performance before the above-mentioned upgrade.
    
    #0bb9484 - upgrading lucene from 4.8.1 to 5.5.3
    performance was improved in lucene 5.x.
    with 5.5.3 the indexer works significantly faster than with 4.8.1
    
    #3cfa430 - avoid rebuilding groups after reading index
    after generating the index, it has to be re-read one more time to extract the distinct lists of allGroups and rootGroups, even though that info was already available earlier and was simply thrown away.
    
    #8b98a49 - improve reading from zip file
    
    #4062146 - do not unnecessarily force merge on index writer
    merge is very expensive - let's trust lucene to merge when it sees fit.
    the final index size without force merges was 910 MB, compared to 900 MB with force merges.
    the time improvement is approx. 30%
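
As a rough illustration of the idea behind #3cfa430 (collecting allGroups and rootGroups during the first pass instead of re-reading the index), here is a minimal Java sketch. It is not the code from the commit: the GroupCollector class and its methods are hypothetical, and it assumes that a root group is the part of a groupId before the first dot.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of maven-indexer: gathers group info
// while records stream past, so no second pass over the index is needed.
public class GroupCollector {
    private final Set<String> allGroups = new TreeSet<>();
    private final Set<String> rootGroups = new TreeSet<>();

    // Call once per record with that record's groupId.
    public void accept(String groupId) {
        if (groupId == null || groupId.isEmpty()) {
            return; // nothing to record
        }
        allGroups.add(groupId); // the set keeps the list distinct
        // Assumption: the root group is everything before the first '.'
        int dot = groupId.indexOf('.');
        rootGroups.add(dot < 0 ? groupId : groupId.substring(0, dot));
    }

    public Set<String> getAllGroups() {
        return allGroups;
    }

    public Set<String> getRootGroups() {
        return rootGroups;
    }

    public static void main(String[] args) {
        GroupCollector collector = new GroupCollector();
        for (String g : Arrays.asList(
                "org.apache.maven", "org.apache.lucene", "junit", "org.apache.maven")) {
            collector.accept(g);
        }
        System.out.println(collector.getAllGroups());
        System.out.println(collector.getRootGroups());
    }
}
```

Because both sets are filled as each record is read, they are ready as soon as the read finishes, at the cost of holding the distinct group names in memory.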

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tstupka/maven-indexer lucene_553

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/maven-indexer/pull/12.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #12
    
----
commit 0bb9484e6eaea3e7974d7e5c9b5ab3d6802780e9
Author: Tomas Stupka <to...@oracle.com>
Date:   2016-10-24T15:25:46Z

    upgrading lucene from 4.8.1 to 5.5.3

commit 3cfa430d71a8d58a0454966c0dd183e37f5fb067
Author: Tomas Stupka <to...@oracle.com>
Date:   2016-10-25T08:47:19Z

    avoid rebuilding groups after reading index

commit 8b98a495186cafe20ee6494719185e74813ea15e
Author: Tomas Stupka <to...@oracle.com>
Date:   2016-10-25T09:01:18Z

    improve reading from zip file

commit 40621465f3ebf14a89961d07ded0d17a4d2d61bc
Author: Tomas Stupka <to...@oracle.com>
Date:   2016-10-25T09:50:29Z

    do not unnecessarily force merge on index writer

----


> improve performance loss introduced in MINDEXER-77
> --------------------------------------------------
>
>                 Key: MINDEXER-99
>                 URL: https://issues.apache.org/jira/browse/MINDEXER-99
>             Project: Maven Indexer
>          Issue Type: Improvement
>    Affects Versions: 6.0
>            Reporter: Tomas Stupka
>             Fix For: 6.0
>
>
> even though the lucene upgrade from 3.6 to 4.8.1 significantly reduced disk space usage, the unfortunate tradeoff is that the indexer now takes approximately 2.5 times longer (300s vs 120s), measured using the indexer in the current NetBeans dev build.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)