Posted to m2-dev@maven.apache.org by John Casey <jd...@commonjava.org> on 2005/03/18 01:13:00 UTC

new repository conversion and cleanup tool in sandbox: repoclean

M2 Devs,

Sorry for the long email, but please hear me out. I feel like I finally
have enough of an understanding of this problem that I can talk
intelligently with others. So, here is a summary of my first pass at
solving the problem of repository conversion...

I know that we already have two repository-massaging tools in
maven-components. I looked at both when trying to address
MNG-197. In the case of the pre-alpha-to-v4 converter, it didn't seem
expandable into the world of v3 poms without a major overhaul. Which
leaves the repository-tool subproject. There are two reasons why I
didn't just enhance repository-tool:

1. I didn't feel like I had enough time before this weekend's
mini-deadline to understand and complete the repository-tool. This would
have involved quite a lot of code-reading, and double-checking against
v3 documentation to ensure that all bases are covered.

2. It looks like this would be better suited as a plexus application.
Refactoring to a plexus-based app will allow us to introduce new
validators and/or patching components to improve the conversion process.

I don't know if these were reason enough to justify a whole new
application, and I'd be happy to merge the repoclean tool with
repository-tool in the future when time permits.

Carlos, especially for your sake, I'd like to outline the features and
approach of repoclean. Once you have an idea of what I've done, perhaps
you can suggest some things I missed, and we can start bringing these
tools toward a merger.

Essentially, I was trying to follow the contents of MNG-197 as closely
as possible with this tool. The resulting architecture is an
acknowledgement of the fact that we're really doing whole-repository
maintenance here, not merely translating individual poms. Here is a
brief rundown of the design:

1. Main class which processes command-line arguments, starts an Embedder
instance, looks up an instance of the RepositoryCleaner component, and
fires the cleanRepository() method.

2. RepositoryCleaner is the controller class for the application. Using
plexus' dependency injection, I have access to the components that
execute the various operations on the repository. Some of these operate
on the whole repository at a go, and others operate on a per-POM basis.

  A. Verify the validity of both the repository basedir and the reports
basedir. If either is invalid, error out.

  B. Scan the repository to create a list of pom files in the
repository. We'll use this list multiple times later, so this is an
optimization step.

  C. Scan the repository to create a list of artifact (non-pom, non-md5)
files in the repository. We'll use this list multiple times later, so
this is an optimization step.

  D. Set up the reporter for the repo-level operations.

  E. Call the ArtifactPomCorrelator which matches POMs to artifacts, and
spits out error messages to the reporter for any orphaned artifacts.
This is a repository-level operation.

  F. Call the ArtifactMd5Correlator which matches artifacts to MD5
digest files, and spits out error messages to the reporter for any
artifacts that are not accompanied by MD5 digests.

     If we're not executing in report-only mode, this component will
also create any missing md5 files.

     This is also a repository-level operation.

  G. Now, we move into the per-POM operations. For each POM, we first
set up a Reporter to record errors/warnings/etc. pertaining only to that POM.

  H. Read the v3 POM from file.

  I. Translate the v3 POM to a v4 POM using the PomV3ToV4Translator.
This will spit out warnings to the reporter for any elements that don't
translate (like aspectSourceDirectory), and errors where only partial
information is provided (as in distributionSite/distributionDirectory).
This is the only validation provided by the translator.

  J. Call the V4ModelIndependenceValidator to verify the ability of that
model to provide the minimum required information set to distinguish one
project from another, independent of any information in a parent model
(via the <extend/> element, which is ignored). On this pass, only report
failures as warnings to the reporter.

  K. If (J.) above fails, call the V4ModelPatcher to parse the path of
the POM in the repository in an effort to glean any information that may
be missing from the model. If the path is valid, fill in any missing
information in the model.

  L. If (J.) above fails, re-call the V4ModelIndependenceValidator, this
time in error-reporting mode. If the model is still missing required
information, this time the validator will report errors instead of warnings.

  M. If we're not executing in report-only mode, write the v4 POM to the
repository in place of the old v3 POM file.

  N. Flush all reporters.
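
The validate/patch/re-validate sequence in steps J through L can be
sketched roughly as follows. This is only an illustration, not
repoclean's actual code: the Model and Reporter classes, the method
names, and the choice of groupId/artifactId/version as the "minimum
required information set" are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

class ValidatePatchSketch {

    // Hypothetical stand-in for the v4 model; only the assumed identity fields.
    static class Model {
        String groupId;
        String artifactId;
        String version;
    }

    // Hypothetical per-POM reporter that just collects messages.
    static class Reporter {
        final List<String> warnings = new ArrayList<>();
        final List<String> errors = new ArrayList<>();
    }

    static boolean isBlank(String value) {
        return value == null || value.trim().isEmpty();
    }

    static boolean check(String field, String value, Reporter reporter, boolean asErrors) {
        if (!isBlank(value)) {
            return true;
        }
        (asErrors ? reporter.errors : reporter.warnings).add("Missing " + field);
        return false;
    }

    // Steps J and L: the same check, reporting warnings on the first pass
    // and errors on the second. The non-short-circuit '&' makes sure every
    // missing field is reported, not just the first one.
    static boolean validate(Model model, Reporter reporter, boolean asErrors) {
        return check("groupId", model.groupId, reporter, asErrors)
             & check("artifactId", model.artifactId, reporter, asErrors)
             & check("version", model.version, reporter, asErrors);
    }

    // Step K: fill in only what is missing, using values gleaned from the
    // POM's path (these may be null if the path could not be parsed).
    static void patch(Model model, String pathGroupId, String pathArtifactId, String pathVersion) {
        if (isBlank(model.groupId)) model.groupId = pathGroupId;
        if (isBlank(model.artifactId)) model.artifactId = pathArtifactId;
        if (isBlank(model.version)) model.version = pathVersion;
    }

    // J -> K -> L for a single POM; returns true if the model ends up usable.
    static boolean process(Model model, Reporter reporter,
                           String pathGroupId, String pathArtifactId, String pathVersion) {
        if (validate(model, reporter, false)) {
            return true;
        }
        patch(model, pathGroupId, pathArtifactId, pathVersion);
        return validate(model, reporter, true);
    }

    public static void main(String[] args) {
        Model model = new Model();
        model.artifactId = "maven-model";
        Reporter reporter = new Reporter();
        boolean usable = process(model, reporter, "maven", "maven-model", "3.0");
        System.out.println(usable + " warnings=" + reporter.warnings + " errors=" + reporter.errors);
    }
}
```

Values already present in the model win over the path here; whether the
real patcher should behave that way is an open design question.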

As you can see, this tool does not account for backup/restore operations
on the repository. It is assumed that measures will be taken outside the
scope of the tool to make a backup copy of the repository before execution.

If I'm missing anything in this, please let me know. I've included a
bash script to install the tool at a location of your choice using:

'sh ./install.sh /path/to/target/install/dir /path/to/local/repo'

and another bash script to execute the tool using:

'./repoclean.sh /path/to/repository /path/to/reports/directory'

I think repoclean is a reasonable first stab at this problem, but I know
it needs to be much better than that. Please don't hesitate to shoot
holes in this thing! :)

Also: I will be duplicating this doco in an APT file somewhere in
maven-components, so that we can start recording the design discussion.

Thanks,

john

Re: new repository conversion and cleanup tool in sandbox: repoclean

Posted by John Casey <jd...@commonjava.org>.

hi Carlos,

I had completely forgotten about verifying the validity of a checksum!
I'll add this now, as it's pretty easy to do.

I'm also going to be adding support for the new repository layout,
described in:

http://docs.codehaus.org/pages/viewpage.action?pageId=22230

Currently, I have the model source for v3 and v4 checked into the source
tree for the repoclean tool, mostly because I was going to have a
problem with specifying both <version>3.0</version> and
<version>4.0</version> for maven-model (or whatever the actual artifact
versions are) in the dependencies list.

While it is technically possible, provided their packaging structures
continue to be compatible, m2 will scream at having what it sees as two
conflicting artifact-versions in a single project. Not optimal, but
there you go.

I don't have any dependencies on wagon or artifact, since I'm dealing
with things on mostly a file level, and all self-contained within a
single repository. For the only higher-level processing I need to do, I
use the model-reader and -writer included in the above discussion. As
for MD5 generation/verification, this is actually quite simple, so I've
reused some code I had in another project to quickly generate/write MD5
files...I'll extend this for verification purposes.
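
As a rough idea of what MD5 generation plus verification can look like
using only the JDK, here is a minimal sketch. It is not the code
mentioned above; the class and method names are made up, and it assumes
the digest is stored as hex text in a sibling file named
"<artifact>.md5".

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class Md5Sketch {

    // Compute the lowercase hex MD5 digest of a file's contents.
    static String md5Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("MD5");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // Assumed convention: the digest sits next to the artifact as "<name>.md5".
    static Path md5FileFor(Path artifact) {
        return artifact.resolveSibling(artifact.getFileName() + ".md5");
    }

    // Generate (or overwrite) the digest file for an artifact.
    static void writeMd5(Path artifact) throws IOException, NoSuchAlgorithmException {
        Files.write(md5FileFor(artifact), md5Hex(artifact).getBytes(StandardCharsets.US_ASCII));
    }

    // True only if a digest file exists and its first whitespace-delimited
    // token matches the freshly computed digest.
    static boolean verifyMd5(Path artifact) throws IOException, NoSuchAlgorithmException {
        Path md5File = md5FileFor(artifact);
        if (!Files.isRegularFile(md5File)) {
            return false;
        }
        String recorded = new String(Files.readAllBytes(md5File), StandardCharsets.US_ASCII).trim();
        String firstToken = recorded.split("\\s+")[0];
        return firstToken.equalsIgnoreCase(md5Hex(artifact));
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("md5-sketch");
        Path artifact = dir.resolve("example-1.0.jar");
        Files.write(artifact, "hello".getBytes(StandardCharsets.US_ASCII));
        writeMd5(artifact);
        System.out.println(verifyMd5(artifact)); // true
        Files.write(artifact, "tampered".getBytes(StandardCharsets.US_ASCII));
        System.out.println(verifyMd5(artifact)); // false
    }
}
```

Comparing only the first token tolerates digest files that carry a
trailing file name after the hash.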

The only packages I'm really dependent upon are plexus-container-default
and plexus-utils, the former for starting and managing dependencies
within the application, and the latter basically for IOUtil and StringUtils.

I have a few JDK 1.4+ regular expressions that I'm using to parse info
from the POM path (groupId, artifactId, version, etc.), so I'll probably
just reuse some of this code when it comes time to adjust
'groupPart1.groupPart2.groupPart3' to 'groupPart1/groupPart2/groupPart3'
etc.
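
To illustrate the kind of path parsing described here, the sketch below
pulls groupId, artifactId and version out of a legacy-layout POM path
and turns a dotted groupId into a directory path. The regular expression
is an assumption, not the one used in repoclean: it expects paths of the
form "groupId/poms/artifactId-version.pom" and guesses that the version
starts at the first hyphen followed by a digit, which will misparse
unusual artifact names.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PomPathSketch {

    // Assumed legacy layout: <groupId>/poms/<artifactId>-<version>.pom
    // The artifactId is matched lazily so the version begins at the first
    // "-<digit>" boundary.
    static final Pattern LEGACY_POM_PATH =
            Pattern.compile("^([^/]+)/poms/(.+?)-(\\d[^/]*)\\.pom$");

    // Returns {groupId, artifactId, version}, or empty if the path does not match.
    static Optional<String[]> parse(String relativePath) {
        Matcher matcher = LEGACY_POM_PATH.matcher(relativePath);
        if (!matcher.matches()) {
            return Optional.empty();
        }
        return Optional.of(new String[] {matcher.group(1), matcher.group(2), matcher.group(3)});
    }

    // 'groupPart1.groupPart2.groupPart3' -> 'groupPart1/groupPart2/groupPart3'
    static String groupIdToPath(String groupId) {
        return groupId.replace('.', '/');
    }

    public static void main(String[] args) {
        String path = "org.apache.maven/poms/maven-model-3.0.pom";
        parse(path).ifPresent(parts ->
                System.out.println(String.join(" | ", parts)));
        System.out.println(groupIdToPath("org.apache.maven"));
    }
}
```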

If you have suggestions on any of this, or whatever else, please don't
hesitate to reply.

-j


Re: new repository conversion and cleanup tool in sandbox: repoclean

Posted by Carlos Sanchez <ca...@gmail.com>.
Hi John,

I'll try to explain a bit what I've done with the repo tools.

I've added some logic to repository-tools to get all the artifacts
from a file based repository. I needed to do this because wagon
doesn't provide a way to get all the artifacts to iterate over them.
It was just a quick hack.

Now I'm able to know whether each artifact has a checksum through
maven-artifact. To ease the use of the tool I've created a repository
mojo (which I'll move to the sandbox because it's a better place) that
prints all files in the local repo which don't have a checksum file.
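
For a file-based repository, a listing like this can be approximated
with a plain directory walk. The sketch below is not the mojo and does
not use maven-artifact or wagon; it is a filesystem-only stand-in that
assumes checksums are stored as sibling files named "<file>.md5".

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class MissingChecksumSketch {

    // List every regular file under repoRoot that is not itself an .md5
    // file and has no "<name>.md5" sibling.
    static List<Path> findMissingMd5(Path repoRoot) throws IOException {
        try (Stream<Path> files = Files.walk(repoRoot)) {
            return files
                    .filter(Files::isRegularFile)
                    .filter(p -> !p.getFileName().toString().endsWith(".md5"))
                    .filter(p -> !Files.exists(p.resolveSibling(p.getFileName() + ".md5")))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java MissingChecksumSketch.java /path/to/repository
        Path repoRoot = java.nio.file.Paths.get(args.length > 0 ? args[0] : ".");
        for (Path file : findMissingMd5(repoRoot)) {
            System.out.println(file);
        }
    }
}
```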

The last thing I tried is checking that the checksum corresponds to the
file. I was trying to reuse the checksum observer from wagon, so my
idea was to download the artifacts from the local repo to a temp dir
using wagon.

Another problem with the repository tools component is that I needed
to remove two files in order to get it compiled, because the Pomv3tov4
class tries to access maven model in a package v300 that doesn't
exist.

I'll try to elaborate on this and write some doc but in the meantime
feel free to ask any questions.

Regards

Carlos