Posted to issues@archiva.apache.org by "Brett Porter (JIRA)" <ji...@codehaus.org> on 2010/03/02 09:11:55 UTC
[jira] Commented: (MRM-589) investigate performance issues with
initial scan on a large repository
[ http://jira.codehaus.org/browse/MRM-589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=212109#action_212109 ]
Brett Porter commented on MRM-589:
----------------------------------
Here's some analysis of the directory walker itself:
{code}
ant scanner
12534 ms
retained memory ~27200808
Files: 67630
plexus-utils walker
3759 ms
retained memory ~1376
Files: 67597
new walker
3188 ms
retained memory ~712
Files: 67597
commons-io walker
4160 ms
retained memory ~7056
Files: 67630
Current repository scanner
4382 ms
retained memory ~80064
.\ Scan of null \.__________________________________________
Repository Dir : /Users/brett/Library/Application Support/Archiva/data/repositories/internal
Repository Name : null
Repository Layout : default
Known Consumers : <none>
Invalid Consumers : <none>
Duration : 4 Seconds 232 Milliseconds
When Gathered : 3/2/10 6:44 PM
Total File Count : 67597
Avg Time Per File :
______________________________________________________________
{code}
So there is some time and memory to be gained by switching to the Ant 1.8 SelectorUtils implementation instead of the plexus-utils version. However, switching to commons-io's walker is a bit slower and uses more memory (though I didn't analyze whether it could be improved by using, say, regex filters instead).
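For anyone wanting to repeat this kind of comparison, here is a minimal sketch of a timing harness. It is not the code used to produce the numbers above: WalkTimer and countFiles are hypothetical names, the walk is a plain java.io.File recursion with no include/exclude filtering, and it reports only elapsed time and file count (retained memory would need a profiler).

```java
import java.io.File;

/** Hypothetical harness: walks a directory tree, counting files and timing the walk. */
final class WalkTimer {

    /** Recursively counts regular files (not directories) under dir. */
    static long countFiles(File dir) {
        File[] children = dir.listFiles();
        if (children == null) {
            return 0; // not a directory, or it could not be read
        }
        long count = 0;
        for (File child : children) {
            if (child.isDirectory()) {
                count += countFiles(child);
            } else {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Repository root to scan; defaults to the current directory.
        File root = new File(args.length > 0 ? args[0] : ".");
        long start = System.nanoTime();
        long files = countFiles(root);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(elapsedMs + " ms");
        System.out.println("Files: " + files);
    }
}
```

Running each candidate walker several times against the same warm file system cache, and discarding the first run, should make the timings more comparable.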
There are potential improvements in the two walkers by avoiding the re-tokenization of the exclude paths on every call, which I'll explore next.
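A rough sketch of the re-tokenization idea, under stated assumptions: PreTokenizedExcludes is a hypothetical class, not the Ant or plexus-utils API, and the matcher is a simplified stand-in for Ant-style patterns ("**" spans directories, "*" and "?" match within one path segment). The point is only that each exclude pattern is split into segments once in the constructor rather than on every file visited.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: exclude patterns are tokenized once, not per visited file. */
final class PreTokenizedExcludes {
    private final List<String[]> patterns = new ArrayList<>();

    PreTokenizedExcludes(List<String> excludes) {
        for (String exclude : excludes) {
            patterns.add(tokenize(exclude)); // done once, up front
        }
    }

    /** Splits a '/'-separated path or pattern into its non-empty segments. */
    static String[] tokenize(String path) {
        List<String> out = new ArrayList<>();
        for (String part : path.split("/")) {
            if (!part.isEmpty()) {
                out.add(part);
            }
        }
        return out.toArray(new String[0]);
    }

    /** Only the candidate path is tokenized here; the patterns are reused as-is. */
    boolean isExcluded(String relativePath) {
        String[] segments = tokenize(relativePath);
        for (String[] pattern : patterns) {
            if (matchPath(pattern, 0, segments, 0)) {
                return true;
            }
        }
        return false;
    }

    // "**" matches zero or more whole segments; other tokens match exactly one segment.
    private static boolean matchPath(String[] pat, int p, String[] seg, int s) {
        if (p == pat.length) {
            return s == seg.length;
        }
        if (pat[p].equals("**")) {
            for (int k = s; k <= seg.length; k++) {
                if (matchPath(pat, p + 1, seg, k)) {
                    return true;
                }
            }
            return false;
        }
        return s < seg.length
                && matchSegment(pat[p], 0, seg[s], 0)
                && matchPath(pat, p + 1, seg, s + 1);
    }

    // '*' matches any run of characters, '?' matches one character, within a segment.
    private static boolean matchSegment(String pat, int p, String name, int n) {
        if (p == pat.length()) {
            return n == name.length();
        }
        char c = pat.charAt(p);
        if (c == '*') {
            for (int k = n; k <= name.length(); k++) {
                if (matchSegment(pat, p + 1, name, k)) {
                    return true;
                }
            }
            return false;
        }
        return n < name.length()
                && (c == '?' || c == name.charAt(n))
                && matchSegment(pat, p + 1, name, n + 1);
    }
}
```

A walker would build one PreTokenizedExcludes per scan and call isExcluded for each relative path it visits. Whether this actually helps in the two walkers measured above is exactly what remains to be tested.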
As you can see, there is quite some overhead in the RepositoryScanningInstance as well.
Beyond this, I haven't yet examined the overhead of each of the current consumers, both when they are and are not operating; that still needs to be looked at.
> investigate performance issues with initial scan on a large repository
> ----------------------------------------------------------------------
>
> Key: MRM-589
> URL: http://jira.codehaus.org/browse/MRM-589
> Project: Archiva
> Issue Type: Task
> Affects Versions: 1.0-beta-4
> Reporter: Brett Porter
> Assignee: Brett Porter
> Fix For: 1.4
>
>
> I've recently found scanning a copy of central to take several hours instead of <0.5hr that I'd expect.
> Should be easy to pinpoint hotspots and make this more efficient - it may just be a memory consumption issue.