Posted to dev@maven.apache.org by Stephen Connolly <st...@gmail.com> on 2009/09/28 12:51:47 UTC

Additional Metadata (was: Maven central repository cleanup)

Here, in my view, are some of the issues we face going forward:

   1. There is enough content deployed to central with "poor quality POMs",
   which do not meet our own criteria for artifact hosting on central, that
   central risks being considered "unreliable". Some of these artifacts arose
   from the original migration from m1; others just slipped in via rsync.
   2. There are artifacts which have been deployed with just plain wrong
   POMs, e.g. javax.xml.ws:jaxws-api:2.1. In this category I would also
   include things like log4j:log4j:1.2.15, which declares optional dependencies
   as non-optional. I recognise that the log4j case is not "technically" an
   incorrect POM, but a lot of people might consider it one.
   3. Relocations, as currently implemented, are sub-optimal in the
   resolution process, which has led to people not really using them
   much. Where is org.apache.logging:log4j? Nowhere, because so many projects
   depend on log4j:log4j and we have no effective way of saying that the two
   are one and the same thing. OK, relocations were supposed to be the way, but
   why is nobody using them?
   4. Version ranges... this is the big one. We have three competing ideas of
   what a version number is:
      - Maven 2.x's idea of a version number:
        \d+(\.\d+(\.\d+)?)?((-[1-9]\d*)|(-[^1-9].*))? where everything must fit
        maj.min.inc-build or else it's all a qualifier, and any version with a
        qualifier is < the same version without one.
      - Mercury's idea of a version number: an infinite chain of segments
        separated by - or ., which are compared as numbers if they are both
        numbers, compared as special-meaning segments if they are one of the
        special-meaning segments, or, failing all of that, compared as strings.
        Most versions with a qualifier are < the version without (some of the
        special qualifiers come after).
      - OSGi's idea of a version number: Major.Minor.Inc.qualifier. Any
        unspecified segments are assumed to be zero (nothing new there) and a
        missing qualifier is assumed to be the empty string. The three numeric
        segments are compared as numbers and the qualifier is compared as a
        string. All versions with a qualifier are > the version without.
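
To make the difference between the Maven 2.x and OSGi rules concrete, here is a minimal Java sketch of both comparisons as described above. It is a simplified illustration, not Maven's DefaultArtifactVersion or the OSGi framework's Version class: the class and method names are hypothetical, Mercury's scheme is left out, and qualifiers are compared with plain String.compareTo.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical, simplified sketches of the maven2 and osgi ordering rules. */
final class VersionSchemes {

    // maj[.min[.inc]][-build | -qualifier], as in the regex quoted above.
    private static final Pattern MAVEN2 = Pattern.compile(
            "(\\d+)(?:\\.(\\d+)(?:\\.(\\d+))?)?(?:-([1-9]\\d*)|-([^1-9].*))?");

    /** Maven 2 style: compare maj, min, inc, build, then the qualifier. */
    static int compareMaven2(String a, String b) {
        long[] na = new long[4], nb = new long[4];
        String qa = parseMaven2(a, na), qb = parseMaven2(b, nb);
        for (int i = 0; i < 4; i++) {
            int c = Long.compare(na[i], nb[i]);
            if (c != 0) return c;
        }
        if (qa == null) return qb == null ? 0 : 1;  // no qualifier sorts after a qualifier
        if (qb == null) return -1;
        return qa.compareTo(qb);
    }

    /** Fills nums with maj, min, inc, build; returns the qualifier or null. */
    private static String parseMaven2(String v, long[] nums) {
        Matcher m = MAVEN2.matcher(v);
        if (!m.matches()) return v;  // doesn't fit: numbers stay 0, whole string is a qualifier
        for (int i = 0; i < 4; i++) {
            String g = m.group(i + 1);
            nums[i] = g == null ? 0 : Long.parseLong(g);
        }
        return m.group(5);  // null when there is no qualifier
    }

    /** OSGi style: major.minor.micro.qualifier; missing parts are 0 and "". */
    static int compareOsgi(String a, String b) {
        String[] pa = a.split("\\.", 4), pb = b.split("\\.", 4);
        for (int i = 0; i < 3; i++) {
            // Non-numeric segments throw NumberFormatException (invalid OSGi versions).
            long x = i < pa.length ? Long.parseLong(pa[i]) : 0;
            long y = i < pb.length ? Long.parseLong(pb[i]) : 0;
            int c = Long.compare(x, y);
            if (c != 0) return c;
        }
        String qa = pa.length > 3 ? pa[3] : "";
        String qb = pb.length > 3 ? pb[3] : "";
        return qa.compareTo(qb);  // any non-empty qualifier sorts after ""
    }
}
```

With these rules, 1.2.17-alpha-1 sorts before 1.2.17 under maven2, while 1.2.17.ga sorts after 1.2.17 under osgi, and 1.2.3.4 sorts before 1.2.3 under maven2 because it does not fit the pattern and becomes all qualifier.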

Now, I think that #1 and #2 can be handled with some "deprecation" metadata
at the artifactId level, e.g. (I'm not saying this is the actual solution,
rather the information we'd require):
<metadata>
  <groupId>javax.xml.ws</groupId>
  <artifactId>jaxws-api</artifactId>
  <version>2.1-1</version>
  <versioning>
    <versions>
      <version>2.0</version>
      <version deprecatedBy="2.1-1">2.1</version>
      <version>2.1-1</version>
    </versions>
  </versioning>
</metadata>
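
As a rough sketch of how a client might consume the deprecatedBy attribute from the example above, the Java below reads the version list with the JDK's DOM parser and follows deprecatedBy links to a replacement. The class and method names are hypothetical, and following a chain of deprecations (rather than a single hop) is an assumption, not something the proposal specifies.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical reader for the proposed deprecatedBy attribute. */
final class DeprecationMetadata {

    /** Maps each listed version to its deprecatedBy value ("" when not deprecated). */
    static Map<String, String> read(String metadataXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(metadataXml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> result = new LinkedHashMap<>();
            NodeList versions = doc.getElementsByTagName("version");
            for (int i = 0; i < versions.getLength(); i++) {
                Element v = (Element) versions.item(i);
                // Skip the top-level <version>; only <versions>/<version> entries count.
                if (!"versions".equals(v.getParentNode().getNodeName())) continue;
                // getAttribute returns "" when the attribute is absent.
                result.put(v.getTextContent().trim(), v.getAttribute("deprecatedBy"));
            }
            return result;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("unreadable metadata", e);
        }
    }

    /** Follows deprecatedBy links until reaching a version that is not deprecated. */
    static String resolve(Map<String, String> deprecations, String version) {
        String current = version;
        // At most size() hops, so a cyclic deprecation chain cannot loop forever.
        for (int hops = 0; hops < deprecations.size(); hops++) {
            String next = deprecations.getOrDefault(current, "");
            if (next.isEmpty()) return current;
            current = next;
        }
        return current;
    }
}
```

For the jaxws-api example, resolving 2.1 yields 2.1-1, while 2.0 resolves to itself.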

Similarly, #3 could be handled with a "relocatedTo" attribute on the
groupId, the artifactId, or both, e.g.
<metadata>
  <groupId relocatedTo="org.apache.logging">log4j</groupId>
    ...
</metadata>
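
The point of a groupId-level relocatedTo is to let a resolver recognise that log4j:log4j and org.apache.logging:log4j are one and the same thing. A minimal sketch, assuming relocations are available as a simple groupId-to-groupId map (a hypothetical in-memory form of the relocatedTo attribute), is to compute a canonical groupId:artifactId key before doing conflict resolution:

```java
import java.util.Map;

/** Hypothetical helper: canonical identity after groupId relocations. */
final class Relocations {

    /**
     * Returns the groupId:artifactId key after following groupId relocations,
     * so dependencies on the old and new coordinates can be mediated as one.
     */
    static String identity(String groupId, String artifactId, Map<String, String> relocatedGroups) {
        String g = groupId;
        // Follow chained relocations; the hop bound stops a cyclic map from looping forever.
        for (int hops = 0; hops <= relocatedGroups.size() && relocatedGroups.containsKey(g); hops++) {
            g = relocatedGroups.get(g);
        }
        return g + ":" + artifactId;
    }
}
```

With log4j relocated to org.apache.logging, both coordinates map to the key org.apache.logging:log4j, while unrelocated artifacts keep their own key.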

In both of the above cases we would be changing the XML format of the
metadata... this could be bad. I have chosen attributes because, AFAIK,
Modello will ignore attributes that it does not understand. An alternative
would be to use XML processing instructions (PIs), e.g.

<?relocation groupId="___" artifactId="___"?>

which old clients would just ignore... of course, if an old client
rewrites the metadata, we'd lose the PI.
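
To illustrate why a PI is invisible to element-oriented readers but still recoverable by a new client, the sketch below uses the JDK's DOM parser to locate a relocation PI and split its pseudo-attributes. The PI target "relocation" comes from the example above; the class name and the regex-based pseudo-attribute parsing are assumptions for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Node;
import org.w3c.dom.ProcessingInstruction;
import org.xml.sax.SAXException;

/** Hypothetical reader for a <?relocation ...?> processing instruction. */
final class RelocationPi {
    // name="value" pairs inside the PI data (PIs have no real attributes).
    private static final Pattern PSEUDO_ATTR = Pattern.compile("(\\w+)=\"([^\"]*)\"");

    /** Returns the pseudo-attributes of the first relocation PI, or an empty map. */
    static Map<String, String> find(String metadataXml) {
        try {
            Node doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(metadataXml.getBytes(StandardCharsets.UTF_8)));
            ProcessingInstruction pi = search(doc);
            Map<String, String> attrs = new LinkedHashMap<>();
            if (pi != null) {
                Matcher m = PSEUDO_ATTR.matcher(pi.getData());
                while (m.find()) attrs.put(m.group(1), m.group(2));
            }
            return attrs;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("unreadable metadata", e);
        }
    }

    /** Depth-first search; a PI's node name is its target. */
    private static ProcessingInstruction search(Node node) {
        if (node.getNodeType() == Node.PROCESSING_INSTRUCTION_NODE
                && "relocation".equals(node.getNodeName())) {
            return (ProcessingInstruction) node;
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            ProcessingInstruction found = search(child);
            if (found != null) return found;
        }
        return null;
    }
}
```

A reader that only walks element nodes never sees the PI, which is also why a client that rebuilds the metadata from its element model silently drops it.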

The other solution is parallel metadata files, whereby we keep the
new-format metadata in a parallel file: old clients only read the old
metadata, while new clients try the new metadata and, if it is not there,
fall back to the old metadata. Of course, this puts a performance penalty on
the new clients :-(
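
A new client's lookup under the parallel-file approach might look like the following minimal sketch. The name maven-metadata-v2.xml for the new-format file is purely hypothetical, and the repository is abstracted as a function from path to optional content; the second request on a miss is the performance penalty mentioned above.

```java
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical new-client lookup for parallel metadata files. */
final class MetadataLookup {
    // Hypothetical name for the new-format file kept alongside the old one.
    static final String NEW_FILE = "maven-metadata-v2.xml";
    static final String OLD_FILE = "maven-metadata.xml";

    /**
     * Tries the parallel new-format file first and falls back to the old file.
     * Old clients simply never request NEW_FILE.
     */
    static Optional<String> fetch(String dir, Function<String, Optional<String>> repository) {
        Optional<String> newFormat = repository.apply(dir + "/" + NEW_FILE);
        if (newFormat.isPresent()) {
            return newFormat;
        }
        // A miss costs a second round-trip: the penalty paid by new clients.
        return repository.apply(dir + "/" + OLD_FILE);
    }
}
```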

I think that #4 requires some form of metadata to allow Maven to decide how
to handle version comparison. One issue I see is that if you change a
versioning scheme, I think you have to change the groupId or artifactId, and
if we add relocations into the mix, things get complex.

*Plan A*
The first way I see is to make the version list in the metadata ordered, so
that the metadata itself contains the "correct" sorting and we don't need to
do anything more. Any relocated versions would come before those of the
current groupId, so that org.apache.logging:log4j:1.2.7-1 would be >
log4j:log4j:1.2.15 but < org.apache.logging:log4j:1.2.8. Of course, this
approach is useless in practice: there are too many tools out there rewriting
and "fixing" the metadata, and since a lot of the metadata does not even give
the full list of available versions, we would invariably break everything
with such a scheme.
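
A sketch of what Plan A amounts to: ordering is simply position in the metadata's version list. The class name is hypothetical. It also shows the failure mode described above, since a version that the (possibly rewritten or incomplete) metadata does not list cannot be ordered at all.

```java
import java.util.Comparator;
import java.util.List;

/** Hypothetical Plan A comparator: rank = position in the metadata list. */
final class ListedOrder {

    static Comparator<String> of(List<String> listedInOrder) {
        return (a, b) -> Integer.compare(rank(listedInOrder, a), rank(listedInOrder, b));
    }

    private static int rank(List<String> listed, String coordinates) {
        int i = listed.indexOf(coordinates);
        if (i < 0) {
            // Incomplete or rewritten metadata leaves this version with no defined order.
            throw new IllegalArgumentException("not listed in metadata: " + coordinates);
        }
        return i;
    }
}
```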

*Plan B*
The next way I see is to add a versioning scheme to the metadata, e.g.
<metadata>
  <groupId>com.foo.manchu</groupId>
  <artifactId>bar-api</artifactId>
  ...
  <versioning rule="osgi">
    <versions>
     ...
    </versions>
  </versioning>
</metadata>
We would have to have a fixed set of allowed rules. I see three: maven2,
mercury and osgi. (OK, I personally see a fourth, the numeric scheme from
versions-maven-plugin, but in the interests of stopping the forever war
about version numbering schemes, I can live with mercury as a replacement.)
One alternative would be to provide the gId:aId:v coordinates of a
comparator... I see this as a non-runner: some non-Java projects use the
repository, and they would not be able to classload a Java artifact.
Another alternative would be to provide a version comparison grammar at a
gId:aId:v coordinate.
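
For Plan B, a client would need to map the rule attribute onto one of the fixed set of rules and reject anything else. The following is a minimal sketch using the JDK's DOM parser; the enum name is hypothetical, and treating a missing rule as maven2 is an assumption borrowed from the default proposed in Plan C.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.SAXException;

/** Hypothetical enum for the fixed set of allowed versioning rules. */
enum VersionRule {
    MAVEN2, MERCURY, OSGI;

    /** Reads <versioning rule="...">; a missing rule is assumed to mean maven2. */
    static VersionRule of(String metadataXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(metadataXml.getBytes(StandardCharsets.UTF_8)));
            Element versioning = (Element) doc.getElementsByTagName("versioning").item(0);
            String rule = versioning == null ? "" : versioning.getAttribute("rule");
            switch (rule) {
                case "":
                case "maven2":
                    return MAVEN2;
                case "mercury":
                    return MERCURY;
                case "osgi":
                    return OSGI;
                default:
                    // Only the agreed rules are allowed; anything else is an error.
                    throw new IllegalArgumentException("unknown versioning rule: " + rule);
            }
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("unreadable metadata", e);
        }
    }
}
```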

We would need to define how this would interact with relocations. Do all
relocated versions come first (so that log4j:log4j:___ is always <
org.apache.logging:log4j:___), or do we produce a composite list? What
if log4j:log4j uses a different scheme from org.apache.logging:log4j? We
could do a double comparison in such cases, i.e. merge the version lists
based on segments where both comparators agree, and in segments where they
disagree, put the relocated artifact's versions first in their comparator
order, followed by the current versions in their comparator order, e.g.

- log4j:log4j uses maven2 and has 1.2.15, 1.2.16, 1.2.17-alpha-1,
  1.2.17-alpha-2, 1.2.17 and 1.2.18.
- org.apache.logging:log4j uses osgi and has 1.2.16.sp1, 1.2.17.ga,
  1.2.17.sp1 and 1.2.19.

The merged list would then be:
  1.2.15, 1.2.16, 1.2.16.sp1, 1.2.17-alpha-1, 1.2.17-alpha-2, 1.2.17,
  1.2.17.ga, 1.2.17.sp1, 1.2.18, 1.2.19
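
One way to implement the double comparison, assuming each list is already sorted by its own scheme and that both schemes agree on the leading maj.min.inc numbers (true for maven2 and osgi in this example), is a two-way merge keyed on that numeric prefix, with ties going to the relocated artifact. The class and method names are hypothetical, and this is one possible reading of the rule rather than a specification; it reproduces the merged log4j list above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical merge of two version lists that use different schemes. */
final class RelocationMerge {
    // The maj.min.inc prefix that both maven2 and osgi compare numerically.
    private static final Pattern PREFIX = Pattern.compile("(\\d+)(?:\\.(\\d+))?(?:\\.(\\d+))?");

    private static long[] prefix(String version) {
        long[] p = new long[3];  // stays {0, 0, 0} if there is no numeric prefix
        Matcher m = PREFIX.matcher(version);
        if (m.lookingAt()) {
            for (int i = 0; i < 3; i++) {
                String g = m.group(i + 1);
                p[i] = g == null ? 0 : Long.parseLong(g);
            }
        }
        return p;
    }

    private static int comparePrefix(String a, String b) {
        long[] pa = prefix(a), pb = prefix(b);
        for (int i = 0; i < 3; i++) {
            int c = Long.compare(pa[i], pb[i]);
            if (c != 0) return c;
        }
        return 0;
    }

    /**
     * Merges two lists, each already sorted by its own scheme. Where the numeric
     * prefixes differ, both schemes agree and the smaller goes first; where they
     * tie, the relocated artifact's versions go first.
     */
    static List<String> merge(List<String> relocated, List<String> current) {
        List<String> out = new ArrayList<>(relocated.size() + current.size());
        int i = 0, j = 0;
        while (i < relocated.size() && j < current.size()) {
            if (comparePrefix(relocated.get(i), current.get(j)) <= 0) {
                out.add(relocated.get(i++));
            } else {
                out.add(current.get(j++));
            }
        }
        out.addAll(relocated.subList(i, relocated.size()));
        out.addAll(current.subList(j, current.size()));
        return out;
    }
}
```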

*Plan C*
Add an epoch to the metadata, and deploy non-zero epochs as parallel
metadata files
<metadata>
  <groupId>com.foo.manchu</groupId>
  <artifactId>bar-api</artifactId>
  <epoch>1</epoch>
  <rule>osgi</rule>
  <versioning>
    <versions>
      <version>...</version>
    </versions>
  </versioning>
</metadata>
This is not that dissimilar to Plan B, but it makes explicit how to handle
switch-overs and relocations.
By default, if the epoch and rule are not specified, we assume epoch 0 and
rule maven2.
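
Reading the Plan C header with the stated defaults could look like this minimal sketch using the JDK's DOM parser. The class name is hypothetical, and the sketch only reads the top-level <epoch> and <rule> elements; how a client discovers the parallel files for non-zero epochs is left open, as in the proposal.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;

/** Hypothetical holder for the Plan C epoch and rule header. */
final class EpochHeader {
    final int epoch;
    final String rule;

    private EpochHeader(int epoch, String rule) {
        this.epoch = epoch;
        this.rule = rule;
    }

    /** Reads <epoch> and <rule>, defaulting to epoch 0 and rule maven2. */
    static EpochHeader read(String metadataXml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(metadataXml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            String epoch = childText(root, "epoch");
            String rule = childText(root, "rule");
            return new EpochHeader(epoch == null ? 0 : Integer.parseInt(epoch),
                                   rule == null ? "maven2" : rule);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("unreadable metadata", e);
        }
    }

    /** Text of the first direct child element with the given name, or null. */
    private static String childText(Element parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && name.equals(n.getNodeName())) {
                return n.getTextContent().trim();
            }
        }
        return null;
    }
}
```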

We can specify in the maven-deploy-plugin that we are deploying epoch ___
and rule ___.

In addition, the current metadata remains as is: we don't list versions from
non-zero epochs in the current metadata files, so old clients will not see the
new versions in their range resolution. Additionally, the epoch and rule
elements do not appear in the metadata files that old clients read, so no
changes are required for old clients. Since we would be specifying the epoch
in the deploy plugin, we don't need to change the pom.xml format.
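
A sketch of the resulting ordering, assuming (as with Debian package epochs) that any version in a higher epoch sorts after every version in a lower one, with each epoch's own rule deciding within it. The names are hypothetical and the per-epoch comparator is supplied by the caller. An old client, reading only the unchanged metadata file, effectively sees just epoch 0.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.IntFunction;

/** Hypothetical epoch-major ordering for Plan C. */
final class EpochOrdering {

    /** A version tagged with the epoch of the metadata file that lists it. */
    static final class EpochVersion {
        final int epoch;
        final String version;

        EpochVersion(int epoch, String version) {
            this.epoch = epoch;
            this.version = version;
        }

        @Override
        public String toString() {
            return epoch + ":" + version;
        }
    }

    /** Epoch first; within a single epoch, that epoch's own rule decides. */
    static Comparator<EpochVersion> order(IntFunction<Comparator<String>> ruleForEpoch) {
        return (a, b) -> a.epoch != b.epoch
                ? Integer.compare(a.epoch, b.epoch)
                : ruleForEpoch.apply(a.epoch).compare(a.version, b.version);
    }

    /** What an old client sees: only epoch 0, i.e. the unchanged metadata file. */
    static List<EpochVersion> oldClientView(List<EpochVersion> all) {
        List<EpochVersion> view = new ArrayList<>();
        for (EpochVersion v : all) {
            if (v.epoch == 0) view.add(v);
        }
        return view;
    }
}
```

Here epoch 1 version 1.0 sorts after epoch 0 version 1.2.15, even though 1.0 would be smaller under either rule.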

I recognise that this is not a complete list of ways to solve these problems
(or of whether to solve them at all), but we might as well start the
discussion somewhere!

-Stephen