Posted to issues@archiva.apache.org by "Brett Porter (JIRA)" <ji...@codehaus.org> on 2010/03/02 09:11:55 UTC
[jira] Commented: (MRM-589) investigate performance issues with initial scan on a large repository

    [ http://jira.codehaus.org/browse/MRM-589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=212109#action_212109 ] 

Brett Porter commented on MRM-589:
----------------------------------

Here's some analysis of the directory walker itself:

{code}
ant scanner
12534 ms
retained memory ~27200808
Files: 67630
plexus-utils walker
3759 ms
retained memory ~1376
Files: 67597
new walker
3188 ms
retained memory ~712
Files: 67597
commons-io walker
4160 ms
retained memory ~7056
Files: 67630
Current repository scanner
4382 ms
retained memory ~80064

.\ Scan of null \.__________________________________________
  Repository Dir    : /Users/brett/Library/Application Support/Archiva/data/repositories/internal
  Repository Name   : null
  Repository Layout : default
  Known Consumers   : <none>
  Invalid Consumers : <none>
  Duration          : 4 Seconds 232 Milliseconds
  When Gathered     : 3/2/10 6:44 PM
  Total File Count  : 67597
  Avg Time Per File : 
______________________________________________________________
{code}

So there is some time and memory to be gained by switching to the Ant 1.8 SelectorUtils implementation instead of the plexus-utils version. However, switching to commons-io's walker is a bit slower and consumes more memory (though I didn't analyze whether it could be improved by using, say, regex filters instead).

There are potential improvements in the two walkers by avoiding the re-tokenization of the exclude paths on every call, which I'll explore next.
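
A minimal sketch of what avoiding the re-tokenization could look like: the exclude patterns are split into path segments once, at construction, and only the candidate path is split on each call. The class PreTokenizedExcludes and its methods are hypothetical names for illustration, not the API of Archiva, plexus-utils or Ant, and the matcher only covers the basic "**", "*" and "?" wildcards.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of Archiva, plexus-utils or Ant.
final class PreTokenizedExcludes {
    // Exclude patterns, each split into path segments exactly once.
    private final List<String[]> tokenizedPatterns = new ArrayList<>();

    PreTokenizedExcludes(List<String> patterns) {
        for (String pattern : patterns) {
            tokenizedPatterns.add(tokenize(pattern));
        }
    }

    // Splits a '/'-separated path into its non-empty segments.
    static String[] tokenize(String path) {
        List<String> parts = new ArrayList<>();
        for (String part : path.split("/")) {
            if (!part.isEmpty()) {
                parts.add(part);
            }
        }
        return parts.toArray(new String[0]);
    }

    // Only the candidate path is tokenized per call; the patterns are reused.
    boolean isExcluded(String relativePath) {
        String[] pathTokens = tokenize(relativePath);
        for (String[] pattern : tokenizedPatterns) {
            if (matches(pattern, 0, pathTokens, 0)) {
                return true;
            }
        }
        return false;
    }

    // "**" matches zero or more whole segments; other tokens match one segment.
    private static boolean matches(String[] pat, int pi, String[] path, int si) {
        if (pi == pat.length) {
            return si == path.length;
        }
        if (pat[pi].equals("**")) {
            for (int next = si; next <= path.length; next++) {
                if (matches(pat, pi + 1, path, next)) {
                    return true;
                }
            }
            return false;
        }
        return si < path.length
                && segmentMatches(pat[pi], 0, path[si], 0)
                && matches(pat, pi + 1, path, si + 1);
    }

    // Within a segment, '*' matches any run of characters and '?' exactly one.
    private static boolean segmentMatches(String pat, int pi, String text, int ti) {
        if (pi == pat.length()) {
            return ti == text.length();
        }
        char c = pat.charAt(pi);
        if (c == '*') {
            for (int next = ti; next <= text.length(); next++) {
                if (segmentMatches(pat, pi + 1, text, next)) {
                    return true;
                }
            }
            return false;
        }
        return ti < text.length()
                && (c == '?' || c == text.charAt(ti))
                && segmentMatches(pat, pi + 1, text, ti + 1);
    }
}
```

With this shape, a walker would build one PreTokenizedExcludes per scan and call isExcluded for each visited path. Whether that actually shows up in the timings would still need to be measured.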

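As the figures above show, there is also quite some overhead in the RepositoryScanningInstance: the current repository scanner takes 4382 ms and retains ~80064, compared with 3759 ms and ~1376 for the plexus-utils walker.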

Beyond this, I haven't yet examined the overhead of each of the current consumers, both when they are operating and when they are not; that still needs to be looked at.
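
For anyone wanting to collect comparable walker timings, here is a minimal sketch of a timing harness. It is not the harness that produced the numbers above: it assumes a plain java.nio.file walk, the class name WalkTimer is hypothetical, and it records only the file count and elapsed time, not retained memory.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical harness for illustration; not the code used for the results above.
final class WalkTimer {
    // Holds the outcome of one timed walk.
    static final class Result {
        final long files;
        final long millis;

        Result(long files, long millis) {
            this.files = files;
            this.millis = millis;
        }
    }

    // Walks the tree under root, counting regular files and timing the walk.
    static Result timeWalk(Path root) throws IOException {
        final long[] count = {0};
        long start = System.nanoTime();
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                if (attrs.isRegularFile()) {
                    count[0]++;
                }
                return FileVisitResult.CONTINUE;
            }
        });
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        return new Result(count[0], elapsedMillis);
    }
}
```

A single run is sensitive to file system caching and JIT warm-up, so repeating the walk a few times and discarding the first result would give steadier figures.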

> investigate performance issues with initial scan on a large repository
> ----------------------------------------------------------------------
>
>                 Key: MRM-589
>                 URL: http://jira.codehaus.org/browse/MRM-589
>             Project: Archiva
>          Issue Type: Task
>    Affects Versions: 1.0-beta-4
>            Reporter: Brett Porter
>            Assignee: Brett Porter
>             Fix For: 1.4
>
>
> I've recently found scanning a copy of central to take several hours instead of the <0.5hr that I'd expect.
> Should be easy to pinpoint hotspots and make this more efficient - it may just be a memory consumption issue.
