Posted to dev@ant.apache.org by Xavier Hanin <Xa...@sas.com> on 2007/12/07 20:31:57 UTC

intelligent repository cleaning (IVY-658)

Hi,

I've started thinking about how to implement IVY-658, and it's not that easy to get the information necessary to clean a repository intelligently.

To do so, we need information about the dependers of a module (the modules that depend on it). While Ivy shines at finding dependees (the modules a given module depends on), it provides no way to access dependers, so this is something requiring some work.

To start with, I've developed a small prototype of a RepositoryManagementEngine, which actually loads all repository metadata in memory. At first I didn't even consider that idea, because it doesn't scale. But since it was the easiest thing to start with, I gave it a try, and I now have something able to load a whole repository in memory and to return information such as the list of all modules with no dependers (the information needed for IVY-658). It could also be extended to provide other information quickly: once everything is in memory, it's pretty easy.
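To illustrate the "modules with no dependers" query, here is a minimal sketch. It is not the prototype's code: the class DependerIndex, its methods, and the string ids are all hypothetical, and it uses current Java syntax rather than what runs on a 1.4.2 JVM. It only shows the idea that once every module's dependencies are in memory, the modules nobody depends on fall out of a single pass.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: not Ivy's RepositoryManagementEngine API.
class DependerIndex {
    // module revision id -> ids of the module revisions it depends on
    private final Map<String, Set<String>> dependencies = new LinkedHashMap<>();

    // Registers a loaded module revision together with its direct dependencies.
    void addModule(String id, String... deps) {
        Set<String> set = dependencies.computeIfAbsent(id, k -> new HashSet<>());
        Collections.addAll(set, deps);
    }

    // Returns the loaded module revisions that no loaded module depends on.
    List<String> modulesWithoutDependers() {
        Set<String> depended = new HashSet<>();
        for (Set<String> deps : dependencies.values()) {
            depended.addAll(deps);
        }
        List<String> result = new ArrayList<>();
        for (String id : dependencies.keySet()) {
            if (!depended.contains(id)) {
                result.add(id);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        DependerIndex index = new DependerIndex();
        index.addModule("org#app;1.0", "org#lib;1.0");
        index.addModule("org#lib;1.0");
        index.addModule("org#lib;0.9");
        // Prints the revisions nobody depends on: org#app;1.0 and org#lib;0.9
        System.out.println(index.modulesWithoutDependers());
    }
}
```

The sketch assumes dependencies are already resolved to exact revisions; dynamic revisions would need to be resolved before being indexed this way.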

I'm currently running my tests on a Linux box with the Sun JVM 1.4.2, accessing a NetApp filesystem repository (with very good performance). On this box, loading a repository of 1200 modules and 3000 module revisions takes 40s and 60MB (the memory figure is approximate; I've used a utility class based on [1]). If I extrapolate these results, here's what I get:
revs    time    memory
3k      40"     60MB
25k     6'      500MB
100k    22'     2GB
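The figures in the table above are consistent with simple proportional scaling from the 3k measurement. A small sketch of that arithmetic follows, assuming time and memory grow linearly with the number of revisions (an assumption; the class and method names are made up for illustration):

```java
// Hypothetical helper reproducing the proportional extrapolation above.
class LinearExtrapolation {
    // Scales a measured value proportionally to the number of revisions.
    static double scale(double measured, int measuredRevs, int targetRevs) {
        return measured * targetRevs / measuredRevs;
    }

    public static void main(String[] args) {
        int[] targets = {25_000, 100_000};
        for (int revs : targets) {
            double seconds = scale(40, 3_000, revs);   // measured: 40 s for 3k revs
            double megabytes = scale(60, 3_000, revs); // measured: 60 MB for 3k revs
            System.out.printf("%d revs: %.1f min, %.0f MB%n",
                    revs, seconds / 60, megabytes);
        }
    }
}
```

This gives about 5.6 minutes and 500 MB for 25k revisions, and about 22.2 minutes and 2000 MB for 100k, matching the rounded values in the table.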

I'm pretty happy with the time results (the environment is well suited for that, but since it's a repository maintenance task, I guess most people could run it very close to their repository data, at night or over a weekend).

As expected, memory usage can become an issue more quickly. So I've done some investigation on memory usage, and it appears that ModuleRevisionId instances have a significant impact on it. Indeed, these objects are used not only to identify the module revisions loaded, but also in each dependency descriptor to store the requested module revision.

I've found that in my use case Ivy was creating around 50k instances of ModuleRevisionId. Since these objects are immutable, I've tried a strategy similar to String#intern() to reuse the same instance whenever possible. This decreased the number to 6k instances, with the in-memory repository information using a total of 43MB (around 28% better).
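For readers unfamiliar with this kind of interning, here is a minimal sketch of the technique using a weak canonicalizing cache. It is not the actual change: RevisionId is a hypothetical stand-in for ModuleRevisionId, and the field names and the factory method of() are assumptions made for the example.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.Objects;
import java.util.WeakHashMap;

// Hypothetical stand-in for an immutable identifier; not Ivy's ModuleRevisionId.
final class RevisionId {
    // Keys are held weakly by WeakHashMap; values are wrapped in WeakReference
    // so that the cache itself never keeps an otherwise unused instance alive.
    private static final Map<RevisionId, WeakReference<RevisionId>> CACHE =
            new WeakHashMap<>();

    private final String organisation;
    private final String name;
    private final String revision;

    private RevisionId(String organisation, String name, String revision) {
        this.organisation = organisation;
        this.name = name;
        this.revision = revision;
    }

    // Returns the canonical instance for these coordinates, creating it if needed.
    static synchronized RevisionId of(String organisation, String name, String revision) {
        RevisionId candidate = new RevisionId(organisation, name, revision);
        WeakReference<RevisionId> ref = CACHE.get(candidate);
        RevisionId cached = (ref == null) ? null : ref.get();
        if (cached != null) {
            return cached; // reuse the existing equal instance
        }
        CACHE.put(candidate, new WeakReference<>(candidate));
        return candidate;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof RevisionId)) return false;
        RevisionId other = (RevisionId) o;
        return organisation.equals(other.organisation)
                && name.equals(other.name)
                && revision.equals(other.revision);
    }

    @Override
    public int hashCode() {
        return Objects.hash(organisation, name, revision);
    }
}
```

With this sketch, two calls such as RevisionId.of("org", "mod", "1.0") return the same object for as long as some caller still holds it, which is what lets many dependency descriptors share one identifier. The cost is one short-lived candidate object and one map lookup per creation.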

Then I thought another area of improvement might be the dependency descriptors themselves (around 46k instances in my test case). In DefaultDependencyDescriptor, we create the LinkedHashMap instances used to store information as soon as we create the object. The maps for exclude rules, include rules and dependency artifacts are very frequently not used at all (never in my test case). So I've changed DefaultDependencyDescriptor to initialize these attributes only when needed, and ended up with a 31MB footprint for the whole repository. So my new extrapolation is now:
revs    time    memory
3k      40"     31MB
25k     6'      260MB
100k    22'     1.1GB
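The lazy initialization of the descriptor maps mentioned above can be sketched as follows. This is a simplified illustration, not the real DefaultDependencyDescriptor: the class LazyDescriptor, its single map, and its method names are hypothetical.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical simplification; not the real DefaultDependencyDescriptor.
class LazyDescriptor {
    // Stays null until the first rule is added, so descriptors without
    // exclude rules never pay for an empty LinkedHashMap.
    private Map<String, String> excludeRules;

    void addExcludeRule(String conf, String rule) {
        if (excludeRules == null) {
            excludeRules = new LinkedHashMap<>();
        }
        excludeRules.put(conf, rule);
    }

    // Callers always get a map, never null, whether or not rules exist.
    Map<String, String> getExcludeRules() {
        if (excludeRules == null) {
            return Collections.emptyMap();
        }
        return Collections.unmodifiableMap(excludeRules);
    }

    // Exposed only so the example can show when allocation happens.
    boolean hasAllocatedExcludeRules() {
        return excludeRules != null;
    }
}
```

The extra null checks are the readability cost; in exchange, a descriptor that never receives a rule holds only a null field per map instead of an allocated map.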

So I plan to commit these changes to Ivy trunk. The changes to DefaultDependencyDescriptor just make the code slightly less readable, so I don't think it's an issue. For ModuleRevisionId, the change introduces a very simple cache of instances based on a WeakHashMap. It means we do a get in a Map whenever we create a new ModuleRevisionId. I don't think it will impact performance much, and it may even decrease the memory footprint for regular Ivy usage.

If you see any problem with that, feel free to let me know and we'll see how to address that differently.

BTW, the repository cleaning task is not done yet, just repository loading and basic analysis.

Xavier

 [1] http://java.sun.com/docs/books/performance/1st_edition/html/JPRAMFootprint.fm.html

