Posted to repository@apache.org by Brett Porter <br...@apache.org> on 2005/07/25 01:25:57 UTC

Maven repository policies

Hi,

First of all, please do not retain the CC's on any responses to this.
Folks interested in this topic, could you please subscribe and reply to
the repository@apache.org list - thanks.

Ok, what I wanted to talk about here was establishing a new Maven 2
repository for deployment of Apache redistributable artifacts, and any
policies that should be enforced on that. The Maven 2 repository layout
is currently described here:
http://docs.codehaus.org/pages/viewpage.action?pageId=22230.

I would like to have something workable for everyone going from the
outset. Here is what I am proposing - please let me know if you have any
feedback, additional requirements, objections, etc.

This repository, which would be in parallel to the existing
/dist/java-repository, would be the place for Apache projects built with
Maven 2 or Ant (using the Maven 2 deployment tasks) to drop individual
libraries to. Maven 1 built projects can continue to use the existing
java-repository - only one or the other should be used. Both will be
rsynced to ibiblio and available to both Maven 1 & 2 clients.

I would suggest /dist/maven-repository as the location for this new
repository (java is too specific - we hope to include more than java
artifacts).

Some specific policies:
1) Clients would continue to point at ibiblio or a mirror, not directly
at the Apache repository. In fact, I have no problem with this not being
available over http at all if that assists in this goal, though http
access does make it easier to browse.

2) no development builds in /dist/. Currently, there is
cvs.apache.org/repository for this in the Maven 1 style. A Maven 2
repository of cvs.apache.org/nightly/maven-repository or similar would
also need to be set up, though it would not be available to Maven 1
clients. Maven 2 has built-in controls to ensure development builds
arrive at the correct repository (if set up). An alternative to having
this repository on cvs.apache.org might be to have projects create their
own in their zone, using whatever they use for nightly builds or
continuous integration.

3) require long groupIds. Currently the top level directory is polluted
with a lot of different project names. I would like projects to use
org.apache.project.subproject as the groupId so that it uses the
directory structure /org/apache/project/subproject, keeping individual
directories with fewer files in them. The longest I can think of would be
org.apache.jakarta.commons.collections. This should also be done for any
future Maven 1 based releases (it works identically, but retains a
shallow structure in the Maven 1 layout).
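
As a rough illustration of point 3, here is a minimal Python sketch of how
one groupId could map to a directory in each layout. The function names,
the artifactId and the version are made up for the example, and the
mapping assumes the usual conventions (Maven 2 turns the dots of the
groupId into directories; Maven 1 keeps the groupId as one directory
followed by a "<type>s" directory).

```python
def maven2_dir(group_id: str, artifact_id: str, version: str) -> str:
    """Maven 2 style: each dot in the groupId becomes a directory level."""
    return "/".join(group_id.split(".") + [artifact_id, version])


def maven1_dir(group_id: str, artifact_type: str = "jar") -> str:
    """Maven 1 style: the groupId stays one directory, then '<type>s'."""
    return f"{group_id}/{artifact_type}s"


# Hypothetical coordinates, used only to show the two shapes.
group = "org.apache.jakarta.commons.collections"
print(maven2_dir(group, "commons-collections", "3.1"))
print(maven1_dir(group))
```

The long groupId gives a deep tree in the first case and a single,
shallow directory name in the second.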

4) set permissions to group writable with the appropriate unix group
governing modification of jakarta, for example. This is one area I'd
like to investigate more to ensure we have proper controls in place.

5) all files must have an .md5 and .sha1 checksum. Maven deploys these
automatically, and I believe this is already monitored by a script, so
we could crack down on violations in the new repo.
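
For anyone deploying by hand, the checksum files in point 5 could be
produced along these lines. This is only a sketch using Python's standard
hashlib; write_checksums is a hypothetical helper, not what Maven itself
runs, and the choice to write just the hex digest plus a newline is an
assumption.

```python
import hashlib
from pathlib import Path


def write_checksums(artifact: Path) -> dict:
    """Write <artifact>.md5 and <artifact>.sha1 next to the artifact."""
    data = artifact.read_bytes()
    digests = {
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
    }
    for ext, digest in digests.items():
        # Assumed file format: the hex digest followed by a newline.
        Path(f"{artifact}.{ext}").write_text(digest + "\n")
    return digests
```

Calling write_checksums(Path("foo-1.0.jar")) would leave foo-1.0.jar.md5
and foo-1.0.jar.sha1 beside the jar.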

6) all files in the /dist/ repository must have a .asc signature. We
will need to get this automated by the final release of Maven 2.
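
A monitoring script for points 5 and 6 might look something like the
sketch below. It is not the existing script mentioned above;
missing_companions is a hypothetical name, and it assumes that the
.md5, .sha1 and .asc files do not themselves need companions. It only
checks that the files are present and does not verify any checksum or
signature.

```python
from pathlib import Path

# Companion files every artifact is expected to have under the proposal.
COMPANION_SUFFIXES = (".md5", ".sha1", ".asc")


def missing_companions(repo_root: Path) -> dict:
    """Map each artifact (relative path) to the companion suffixes it lacks."""
    missing = {}
    for path in sorted(repo_root.rglob("*")):
        # Skip directories and the companion files themselves (assumption).
        if not path.is_file() or path.name.endswith(COMPANION_SUFFIXES):
            continue
        absent = [s for s in COMPANION_SUFFIXES
                  if not Path(f"{path}{s}").exists()]
        if absent:
            missing[path.relative_to(repo_root).as_posix()] = absent
    return missing
```

An empty result would mean every artifact under repo_root has all three
companion files.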

Some things I would like to get more information on would be:
- what is the best way to rsync this data from another machine?
- what should the deletion policy be? I would think that something
deleted from this repository should automatically be deleted from
mirrors, the ibiblio Maven repositories and its mirrors.
- what should be the archive policy? People often use older releases
from the repository, and require them for historical builds. If we want
to archive them from this repository to archive.apache.org, they still
need to be on ibiblio, so this needs to be managed with the deletion
policy above.

Looking forward to hearing your thoughts over at repository@apache.org.
Thanks for your time.

Cheers,
Brett



Re: Maven repository policies

Posted by Niclas Hedhman <ni...@hedhman.org>.
On Friday 29 July 2005 01:45, robert burrell donkin wrote:
> maybe i'm missing the reason why organisation structure needs to be
> embedded in the structure of the repository...

IIUIC, there are 3 issues on the table;

 1. Reduction of entries in any given directory at the resource host.

 2. Pattern to ensure conflicts will not occur over time, when groups are 
created outside the control of the Maven/Repository team itself.

 3. Pattern to ensure that if two developers independently artifactize the 
same resource, they will come up with the same group and name. (A bit 
unrealistic, but noble idea  :o)  )

Personally, I think package names would work well enough. If Castor is a 
hopeless example, wouldn't a simple Google search for
   castor site:www.ibiblio.org/maven
provide all the answers you are looking for?


Cheers
Niclas

Re: Maven repository policies

Posted by Niclas Hedhman <ni...@hedhman.org>.
On Friday 29 July 2005 03:22, Steve Loughran wrote:
> because flat naming schemes dont scale. http://ibiblio.org/maven/ is a
> case in point.

Btw, I find it remarkable that it should take any time at all to serve a page 
requested 1000 times per hour and barely ever changes. Can perhaps 
someone with mod_proxy experience help ibiblio.org to set that up ?


Cheers
Niclas

Re: Maven repository policies

Posted by robert burrell donkin <rd...@apache.org>.
On Thu, 2005-07-28 at 20:22 +0100, Steve Loughran wrote:
> On 7/28/05, robert burrell donkin <rd...@apache.org> wrote:

<snip>

> > for example, it's asking a lot for a user to know that for castor x.y.z,
> > i need to look under www.exolab, for castor a.b.c i need to look under
> > sourceforge.net and for castor n.m.l i should look under java.net. IMHO
> > not obvious. on the other hand, org.castor is really easy to find.
> 
> yes, org.castor makes the most sense. 

virtual organisation does make sense: the first two parts of the
package. 
 
> > > maybe it could be like package naming, with some rules but also
> > > per-org freedom. Here, using the MUST/MAY/SHOULD terminology of IETF
> > > specs:
> > >
> > > 0. packages MUST use the domain at the beginning (org.apache,com.sun)
> > 
> > therefore projects hosted at sourceforge would have to start with
> > sourceforge.net
> 
> unless they have their own domain/org. Certainly projects I work on by
> day are sforge hosted, but they have their own domains. even my laptop
> has its own domain these days.

yep. i suppose that the real point is exactly what the organisation
means.

i think that the first two parts of the package name would work nicely
for the virtual organisation (java organisational uri, perhaps?), a lot
better than the actual organisation that happens to be hosting the
project at this particular time.
 
> > > 1. it's left to every organisation to do layout under there -we just
> > > declare what they SHOULD do.
> > >
> > > 2. for apache, jakarta stuff MAY go toplevel, as would tomcat, ant,
> > > maven, xml, ws projects.
> > 
> > is there any particular reason why you have proposed jakarta as a
> > special case?
> 
> I worry about apache itself devolving to a flat directory structure
> too. I suppose the main projects could be standalone, but what about
> the commons things?

who knows the future? some say they'll end as the whole of jakarta. some
say they'll end as a TLP. maybe everyone will just move to pastures new.

one complexity is that there is a certain amount of reshuffling of
components between the various TLP's. i think that this may increase in
the future. (lots of other TLPs are creating their own commons and
existing components may elect to jump ship.)  

> > > 3. jakarta-commons MUST  be separate from org.apache.commons,
> > 
> > is there any particular reason why you have proposed jakarta-commons as
> > a special case?
> > 
> > > in case there is an xml-commons, ws-commons, etc.
> > 
> > (FWIW these already exist but coexist by choosing different names for
> > components)
> > 
> > maybe i'm missing the reason why organisation structure needs to be
> > embedded in the structure of the repository...
> > 
> 
> because flat naming schemes dont scale. http://ibiblio.org/maven/ is a
> case in point.

i agree that a hierarchy is needed but wonder whether a hierarchy of
organisations needs a little more thought. pure package name probably
can't be used (for example, jaxme distributes a clean room JAXB api
which is packaged javax but is created by apache) but neither can hosting
organisation. i think that virtual organisations (usually the start of
the java package name) would work a lot better than hosting
domain/organisation. 

i also suspect that general rules would work better in the medium term
than jakarta specific ones. this would allow other TLPs to fit more
easily into the framework...

- robert 

Re: Maven repository policies

Posted by Steve Loughran <st...@gmail.com>.
On 7/28/05, robert burrell donkin <rd...@apache.org> wrote:
> On Thu, 2005-07-28 at 10:22 +0100, Steve Loughran wrote:
> > On 7/27/05, robert burrell donkin <rd...@apache.org> wrote:
> > > On Tue, 2005-07-26 at 07:12 -0700, Phil Steitz wrote:
> > > > Brett Porter wrote:
> >
> > > this is a good point: organisations change but code continues.
> > >
> > > why not just use package names?
> > >
> >
> > because often package names themselves are historical accidents.
> > "org.apache.tools.ant", being a case in point, how microsoft used
> > com.ms for all their J++ stuff another (that is morgan stanley's
> > domain, see)
> 
> the organisation structures associated with open source projects are
> also often historical accidents. at least the package is something which
> is code related and embedded in the artifacts.

good point.

> 
> for example, it's asking a lot for a user to know that for castor x.y.z,
> i need to look under www.exolab, for castor a.b.c i need to look under
> sourceforge.net and for castor n.m.l i should look under java.net. IMHO
> not obvious. on the other hand, org.castor is really easy to find.

yes, org.castor makes the most sense. 

> 
> > maybe it could be like package naming, with some rules but also
> > per-org freedom. Here, using the MUST/MAY/SHOULD terminology of IETF
> > specs:
> >
> > 0. packages MUST use the domain at the beginning (org.apache,com.sun)
> 
> therefore projects hosted at sourceforge would have to start with
> sourceforge.net

unless they have their own domain/org. Certainly projects I work on by
day are sforge hosted, but they have their own domains. even my laptop
has its own domain these days.

> 
> > 1. it's left to every organisation to do layout under there -we just
> > declare what they SHOULD do.
> >
> > 2. for apache, jakarta stuff MAY go toplevel, as would tomcat, ant,
> > maven, xml, ws projects.
> 
> is there any particular reason why you have proposed jakarta as a
> special case?

I worry about apache itself devolving to a flat directory structure
too. I suppose the main projects could be standalone, but what about
the commons things?

> 
> > 3. jakarta-commons MUST  be separate from org.apache.commons,
> 
> is there any particular reason why you have proposed jakarta-commons as
> a special case?
> 
> > in case there is an xml-commons, ws-commons, etc.
> 
> (FWIW these already exist but coexist by choosing different names for
> components)
> 
> maybe i'm missing the reason why organisation structure needs to be
> embedded in the structure of the repository...
> 

because flat naming schemes don't scale. http://ibiblio.org/maven/ is a
case in point.

Re: Maven repository policies

Posted by robert burrell donkin <rd...@apache.org>.
On Thu, 2005-07-28 at 10:22 +0100, Steve Loughran wrote: 
> On 7/27/05, robert burrell donkin <rd...@apache.org> wrote:
> > On Tue, 2005-07-26 at 07:12 -0700, Phil Steitz wrote:
> > > Brett Porter wrote:
> 
> > this is a good point: organisations change but code continues.
> > 
> > why not just use package names?
> > 
> 
> because often package names themselves are historical accidents.
> "org.apache.tools.ant", being a case in point, how microsoft used
> com.ms for all their J++ stuff another (that is morgan stanley's
> domain, see)

the organisation structures associated with open source projects are
also often historical accidents. at least the package is something which
is code related and embedded in the artifacts. 

for example, it's asking a lot for a user to know that for castor x.y.z,
i need to look under www.exolab, for castor a.b.c i need to look under
sourceforge.net and for castor n.m.l i should look under java.net. IMHO
not obvious. on the other hand, org.castor is really easy to find.

> maybe it could be like package naming, with some rules but also
> per-org freedom. Here, using the MUST/MAY/SHOULD terminology of IETF
> specs:
> 
> 0. packages MUST use the domain at the beginning (org.apache,com.sun)

therefore projects hosted at sourceforge would have to start with
sourceforge.net

> 1. it's left to every organisation to do layout under there -we just
> declare what they SHOULD do.
> 
> 2. for apache, jakarta stuff MAY go toplevel, as would tomcat, ant,
> maven, xml, ws projects.

is there any particular reason why you have proposed jakarta as a
special case?

> 3. jakarta-commons MUST  be separate from org.apache.commons, 

is there any particular reason why you have proposed jakarta-commons as
a special case?

> in case there is an xml-commons, ws-commons, etc.

(FWIW these already exist but coexist by choosing different names for
components)

maybe i'm missing the reason why organisation structure needs to be
embedded in the structure of the repository...

- robert

Re: Maven repository policies

Posted by Steve Loughran <st...@gmail.com>.
On 7/27/05, robert burrell donkin <rd...@apache.org> wrote:
> On Tue, 2005-07-26 at 07:12 -0700, Phil Steitz wrote:
> > Brett Porter wrote:

> this is a good point: organisations change but code continues.
> 
> why not just use package names?
> 

because often package names themselves are historical accidents,
"org.apache.tools.ant" being a case in point, and how microsoft used
com.ms for all their J++ stuff being another (that is morgan stanley's
domain, see)

maybe it could be like package naming, with some rules but also
per-org freedom. Here, using the MUST/MAY/SHOULD terminology of IETF
specs:

0. packages MUST use the domain at the beginning (org.apache,com.sun)

1. it's left to every organisation to do layout under there -we just
declare what they SHOULD do.

2. for apache, jakarta stuff MAY go toplevel, as would tomcat, ant,
maven, xml, ws projects.

3. jakarta-commons MUST  be separate from org.apache.commons, in case
there is an xml-commons, ws-commons, etc.

Re: Maven repository policies

Posted by robert burrell donkin <rd...@apache.org>.
On Tue, 2005-07-26 at 07:12 -0700, Phil Steitz wrote:
> Brett Porter wrote:

<snip>

> > 3) require long groupId's. Currently the top level directory is polluted
> > with a lot of different project names. I would like projects to use
> > org.apache.project.subproject as the groupId so that it uses the
> > directory structure /org/apache/project/subproject keeping individual
> > directories with less files in them. The longest I can think of would be
> > org.apache.jakarta.commons.collections. This should also be done for any
> > future Maven 1 based releases (it works identically, but retains a
> > shallow structure in the Maven 1 layout).
> 
> I don't know enough about maven repository management to understand 
> fully the implications of this, but it seems to me that leaving 
> "jakarta" out of the name adds flexibility with no loss of specificity. 
>   So, if one day commons is a TLP, we do not have to rename.  It seems 
> more natural to me to mirror the package names, but again I don't claim 
> to understand all of the issues here.

this is a good point: organisations change but code continues. 

why not just use package names?

- robert

Re: Maven repository policies

Posted by robert burrell donkin <rd...@apache.org>.
On Sun, 2005-07-31 at 12:55 -0700, Phil Steitz wrote:
> robert burrell donkin wrote:
> > On Fri, 2005-07-29 at 12:34 +1000, Brett Porter wrote:
> > 
> >>On 7/27/05, Phil Steitz <ph...@steitz.com> wrote:
> > 
> > 
> > <snip>
> > 
> >>>>6) all files in the /dist/ repository must have a .asc signature. We
> >>>>will need to get this automated by the final release of Maven 2.
> >>
> >>>What about KEYS?
> >>
> >>Yes, standard distribution rules. I'm not sure if we need that in the
> >>repo or just a URL from /dist/ at large - will see what comes of
> >>commons-openpgp.
> > 
> > 
> > just FYI there was a feeling at apachecon from the infrastructure movers
> > and shakers that KEYS files were a transitional expediency and that
> > they would be removed at some point in the future. 
> 
> To be replaced by what?

AIUI when the apache web of trust is strong and deep enough, there will
be no need for apache to maintain this information. you should be able
to download the key from any public key server. once the certification
authority is up and running, infrastructure will be able to start
tightening things up. GPG keys are likely to become compulsory for
committers. there's quite a lot of documentation that's going to be
needed at the foundation level before this can happen, though.

- robert

Re: Maven repository policies

Posted by Phil Steitz <ph...@steitz.com>.
robert burrell donkin wrote:
> On Fri, 2005-07-29 at 12:34 +1000, Brett Porter wrote:
> 
>>On 7/27/05, Phil Steitz <ph...@steitz.com> wrote:
> 
> 
> <snip>
> 
>>>>6) all files in the /dist/ repository must have a .asc signature. We
>>>>will need to get this automated by the final release of Maven 2.
>>
>>>What about KEYS?
>>
>>Yes, standard distribution rules. I'm not sure if we need that in the
>>repo or just a URL from /dist/ at large - will see what comes of
>>commons-openpgp.
> 
> 
> just FYI there was a feeling at apachecon from the infrastructure movers
> and shakers that KEYS files were a transitional expediency and that
> they would be removed at some point in the future. 

To be replaced by what?
> 

Phil

> - robert


Re: Maven repository policies

Posted by Steve Loughran <st...@gmail.com>.
On 7/31/05, robert burrell donkin <rd...@apache.org> wrote:
> On Fri, 2005-07-29 at 12:34 +1000, Brett Porter wrote:
> > On 7/27/05, Phil Steitz <ph...@steitz.com> wrote:
> 
> <snip>
> 
> > > > 6) all files in the /dist/ repository must have a .asc signature. We
> > > > will need to get this automated by the final release of Maven 2.
> >
> > > What about KEYS?
> >
> > Yes, standard distribution rules. I'm not sure if we need that in the
> > repo or just a URL from /dist/ at large - will see what comes of
> > commons-openpgp.
> 
> just FYI there was a feeling at apachecon from the infrastructure movers
> and shakers that KEYS files were a transitional expediency and that
> they would be removed at some point in the future.
> 
> BTW one of the neatest ideas i heard was to use the md5 sum to find the
> right release.
> 

yeah, I've looked at doing that...it's effectively what .NET does with
the Global Assembly Cache and its 'strong names'. When you compile
something in .net it records the sha1 checksum of the lib you built
against, and that is what it tries to load later, by default looking
in the local path first (unlike JNI on windows, which uses
::LoadLibrary and not ::LoadLibraryEx for DLL lookup with a modified
path)

the GAC is an attempt to offer shared libraries without the
brittleness of the past, because you would always get the version you
built against; it is impossible to contaminate the GAC with a
conflicting version.


However, after spending some time with .net developers recently, I am
aware of two consequences of this design.

First, strong names suck during development. If you have a many-library
project, you need to rebuild all apps when you change a
library if the app is using the strong name to match it. It's just too
strict.

Second, the GAC itself is having fun moving into the longhorn
timeframe. I cannot confirm this as I still have 1h29 of my 22hour
longhorn download to go, but believe they had to bend the rules a bit.
The apparent problem is that the net 1.x binaries needed changing to
integrate with .NET2.0 and longhorn, which means the base .net
binaries you get are not the ones you ask for by presenting their SHA1
key to the GAC. So i guess they have added redirects to the GAC, thus
breaking the whole "you get the library you ask for by its checksum"
rule, because that rule no longer held.

In Java, we have two other problems

1. adding this stuff on as an afterthought is hard. You will end up
asking for checksums over the phone; it's a lot easier to tell from the
version numbers whether log4j-1.2.9 is newer than log4j-1.2.8.

2. the checksum of a JAR changes after it is signed. If you adopt
checksum-only binding, you need to sign before binding. Yet things
like webstart expect everything signed, usually a late-binding
operation just before you ship.

-steve
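
To make the "use the md5 sum to find the right release" idea concrete,
here is a small Python sketch. It is not an existing Maven or repository
feature; build_md5_index and identify are hypothetical names, and it
assumes the repository is a local directory tree of .jar files. As noted
above, a jar that was signed after deployment has a different checksum
and would simply not be found.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path) -> str:
    """Hex md5 digest of a file's contents."""
    return hashlib.md5(path.read_bytes()).hexdigest()


def build_md5_index(repo_root: Path) -> dict:
    """Map md5 digest -> repository-relative path for every jar found."""
    return {md5_of(p): p.relative_to(repo_root).as_posix()
            for p in sorted(repo_root.rglob("*.jar"))}


def identify(jar: Path, index: dict):
    """Return the repository path whose contents match this jar, or None."""
    return index.get(md5_of(jar))
```

Given an unlabelled jar picked up from somewhere else,
identify(jar, build_md5_index(repo_root)) would return the release path
it matches byte for byte, if any.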

Re: Maven repository policies

Posted by robert burrell donkin <rd...@apache.org>.
On Fri, 2005-07-29 at 12:34 +1000, Brett Porter wrote:
> On 7/27/05, Phil Steitz <ph...@steitz.com> wrote:

<snip>

> > > 6) all files in the /dist/ repository must have a .asc signature. We
> > > will need to get this automated by the final release of Maven 2.
> 
> > What about KEYS?
> 
> Yes, standard distribution rules. I'm not sure if we need that in the
> repo or just a URL from /dist/ at large - will see what comes of
> commons-openpgp.

just FYI there was a feeling at apachecon from the infrastructure movers
and shakers that KEYS files were a transitional expediency and that
they would be removed at some point in the future. 

BTW one of the neatest ideas i heard was to use the md5 sum to find the
right release. 

- robert

Re: Maven repository policies

Posted by Brett Porter <br...@gmail.com>.
On 7/27/05, Phil Steitz <ph...@steitz.com> wrote:
> > Some specific policies:
> I would strengthen this to say explicitly that only *released* artifacts
> can be in /dist/ and that we enforce the policy (ideally in an automated
> way) that what gets put there corresponds *exactly* to what was released.

+1

> I don't know enough about maven repository management to understand
> fully the implications of this, but it seems to me that leaving
> "jakarta" out of the name adds flexibility with no loss of specificity.
>   So, if one day commons is a TLP, we do not have to rename.  It seems
> more natural to me to mirror the package names, but again I don't claim
> to understand all of the issues here.

I was following the SVN structure, but I'm fine with it either way.
Releases stay where they are whether the project renames in later
versions or not (though you can move a release and leave a repository
pointer in the old position too).

Given that all the Jakarta projects are using org.apache.foo as their
Java package, and have eyes on being promoted to TLP one day, I'm
happy with matching the groupId to the package (it's shorter, too :)

Groups can get deeper too if necessary. Eg, we have o.a.m.plugins as a
group. If a particular commons project intends to produce multiple
jars (jelly), I think o.a.commons.jelly should be a group.

This can get confusing when browsing the structure, I guess, but I
think we have every intention of setting up a proper search and
browsing capability at some point.

> > 5) all files must have an .md5 and .sha1 checksum. Maven deploys these
> > automatically, and I believe this is already monitored by a script which
> > we could crack down on violations of in the new repo.

> Out of curiosity, why both?  Does maven now generate both?

Yes. I'm happy to just do one or the other - but I get the feeling
that sha1 was a better choice, but that some tools might only be using
the md5.

> > 6) all files in the /dist/ repository must have a .asc signature. We
> > will need to get this automated by the final release of Maven 2.

> What about KEYS?

Yes, standard distribution rules. I'm not sure if we need that in the
repo or just a URL from /dist/ at large - will see what comes of
commons-openpgp.

> Ambivalent on this one - I don't see a compelling reason to separate
> archived releases in the repository.  

Just to ease the mirroring burden and to make the latest releases repo
more browsable.

> Is there an official apache
> archive policy? Maybe I am missing something.

I think for /dist/ proper you only retain the last release or two, and
everything is automatically in archive.apache.org.

Cheers,
Brett

Re: Maven repository policies

Posted by "Henk P. Penning" <he...@cs.uu.nl>.
On Tue, 26 Jul 2005, Phil Steitz wrote:

> Date: Tue, 26 Jul 2005 07:12:20 -0700
> From: Phil Steitz <ph...@steitz.com>
> To: repository@apache.org
> Subject: Re: Maven repository policies

> Brett Porter wrote:

> > - what should be the archive policy? People often use older releases
> > from the repository, and require them for historical builds. If we want
> > to archive them from this repository to archive.apache.org, they still
> > need to be on ibiblio, so this needs to be managed with the deletion
> > policy above.
> Ambivalent on this one - I don't see a compelling reason to separate
> archived releases in the repository.  Is there an official apache
> archive policy? Maybe I am missing something.

  Everything in 'dist/' is rsynced to 100+ mirrors plus many other
  sites that (periodically) pick up everything. Keeping 'dist/' small
  saves a lot in bandwidth and machine load due to rsync filesystem
  scans. Also, mirrors should only have to carry 'current' stuff,
  not a big chunk of apache history.

  Everything made available in 'www.apache.org/dist/' is
  automatically rsynced to 'archive.apache.org' (no deletes).
  The 'old' stuff is always available there.

  The policy is to keep 'current' stuff in 'dist/' (a few recent
  versions) and delete 'old' stuff from 'dist/' as soon as possible.
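
As a sketch of what "a few recent versions" could mean in a script, the
Python below splits a list of versions into those kept in 'dist/' and
those left only on the archive. The names and the default of keeping two
versions are assumptions for illustration, and it only handles purely
numeric dotted versions such as 1.2.9.

```python
def version_key(version: str) -> tuple:
    """Sort key for purely numeric dotted versions, e.g. '1.2.10'."""
    return tuple(int(part) for part in version.split("."))


def split_current_and_old(versions, keep=2):
    """Return (current, old): the newest `keep` versions and the rest."""
    ordered = sorted(versions, key=version_key, reverse=True)
    return ordered[:keep], ordered[keep:]


current, old = split_current_and_old(["1.2.8", "1.2.10", "1.2.9", "1.1"])
print("keep in dist/:", current)
print("archive only:", old)
```

Comparing the parts as integers rather than strings is what puts 1.2.10
after 1.2.9.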

> Phil

  HPP

----------------------------------------------------------------   _
Henk P. Penning, Computer Systems Group       R Uithof CGN-A232  _/ \_
Dept of Computer Science, Utrecht University  T +31 30 253 4106 / \_/ \
Padualaan 14, 3584CH Utrecht, the Netherlands F +31 30 251 3791 \_/ \_/
http://www.cs.uu.nl/staff/henkp.html          M penning@cs.uu.nl  \_/


Re: Maven repository policies

Posted by Phil Steitz <ph...@steitz.com>.
Brett Porter wrote:
> Hi,
> 
> First of all, please do not retain the CC's on any responses to this.
> Folks interested in this topic, could you please subscribe and reply to
> the repository@apache.org list - thanks.
> 
> Ok, what I wanted to talk about here was establishing a new Maven 2
> repository for deployment of Apache redistributable artifacts, and any
> policies that should be enforced on that. The Maven 2 repository layout
> is currently described here:
> http://docs.codehaus.org/pages/viewpage.action?pageId=22230.
> 
> I would like to have something workable for everyone going from the
> outset. Here is what I am proposing - please let me know if you have any
> feedback, additional requirements, objections, etc.
> 
> This repository, which would be in parallel to the existing
> /dist/java-repository would be the place for Apache projects built with
> Maven 2 or Ant (using the Maven 2 deployment tasks) to drop individual
> libraries to. Maven 1 built projects can continue to use the existing
> java-repository - only one or the other should be used. Both will be
> rsynced to ibiblio and available to both Maven 1 & 2 clients.
> 
> I would suggest /dist/maven-repository as the location for this new
> repository (java is too specific - we hope to include more than java
> artifacts).
> 
> Some specific policies:
> 1) Clients would continue to point at ibiblio or a mirror, not directly
> at the Apache repository. In fact I have no problem with this either not
> being available over http if that assists in this goal, though it does
> make it easier to browse.
+1
> 
> 2) no development builds in /dist/. Currently, there is
> cvs.apache.org/repository for this in the Maven 1 style. A Maven 2
> repository of cvs.apache.org/nightly/maven-repository or similar would
> also need to be setup, though it would not be available to Maven 1
> clients. Maven 2 has built in controls to ensure development builds
> arrive at the correct repository (if setup). An alternative to having
> this repository on cvs.apache.org might be to have projects create their
> own in their zone, using whatever they use for nightly builds or
> continuous integration.

I would strengthen this to say explicitly that only *released* artifacts 
can be in /dist/ and that we enforce the policy (ideally in an automated 
way) that what gets put there corresponds *exactly* to what was released.
> 
> 3) require long groupId's. Currently the top level directory is polluted
> with a lot of different project names. I would like projects to use
> org.apache.project.subproject as the groupId so that it uses the
> directory structure /org/apache/project/subproject keeping individual
> directories with less files in them. The longest I can think of would be
> org.apache.jakarta.commons.collections. This should also be done for any
> future Maven 1 based releases (it works identically, but retains a
> shallow structure in the Maven 1 layout).

I don't know enough about maven repository management to understand 
fully the implications of this, but it seems to me that leaving 
"jakarta" out of the name adds flexibility with no loss of specificity. 
  So, if one day commons is a TLP, we do not have to rename.  It seems 
more natural to me to mirror the package names, but again I don't claim 
to understand all of the issues here.
> 
> 4) set permissions to group writable with the appropriate unix group
> governing modification of jakarta, for example. This is one area I'd
> like to investigate more to ensure we have proper controls on.
+1
> 
> 5) all files must have an .md5 and .sha1 checksum. Maven deploys these
> automatically, and I believe this is already monitored by a script which
> we could crack down on violations of in the new repo.
Out of curiosity, why both?  Does maven now generate both?
> 
> 6) all files in the /dist/ repository must have a .asc signature. We
> will need to get this automated by the final release of Maven 2.
What about KEYS?
> 
> Some things I would like to get more information on would be:
> - what is the best way to rsync this data from another machine?
> - what should the deletion policy be? I would think that something
> deleted from this repository should automatically be deleted from
> mirrors, the ibiblio Maven repositories and its mirrors.
+1
> - what should be the archive policy? People often use older releases
> from the repository, and require them for historical builds. If we want
> to archive them from this repository to archive.apache.org, they still
> need to be on ibiblio, so this needs to be managed with the deletion
> policy above.
Ambivalent on this one - I don't see a compelling reason to separate 
archived releases in the repository.  Is there an official apache 
archive policy? Maybe I am missing something.

Phil

Re: Maven repository policies

Posted by Niclas Hedhman <ni...@hedhman.org>.
On Monday 25 July 2005 07:25, Brett Porter wrote:
> This repository, which would be in parallel to the existing
> /dist/java-repository would be the place for Apache projects built with
> Maven 2 or Ant (using the Maven 2 deployment tasks) to drop individual
> libraries to. Maven 1 built projects can continue to use the existing
> java-repository - only one or the other should be used. Both will be
> rsynced to ibiblio and available to both Maven 1 & 2 clients.

So Maven 1 will understand the new layout as well, and when asked for an
artifact it will check both locations on each server?

Why use dots in the group name? Forward slashes have worked, at least in
principle, in other Maven-repo-capable applications and, iirc, even in
Maven itself; the current flat structure seems to be only a policy of the
Maven team.
(see http://www.ibiblio.org/maven/avalon/ for examples)


Cheers
Niclas

Re: Maven repository policies

Posted by Steve Loughran <st...@gmail.com>.
On 7/26/05, Brett Porter <br...@gmail.com> wrote:
> On 7/25/05, Steve Loughran <st...@gmail.com> wrote:
> > +1
> >
> > if there is one problem here, it is finding stuff if you don't know its
> > origin. For example, if I am browsing for commons-lang, should I have
> > to know it is a jakarta-commons project?
> 
> I'm not sure where you started from here? Someone that has a
> commons-lang JAR file from somewhere else might have this problem.
> Someone with a project with that dependency can see where it came
> from, as can someone finding commons-lang through Jakarta (hopefully).

It's more a matter of having to know the full path to a project as part
of the browse-for-a-version preamble. I like the idea of a search form
better.


> > > 6) all files in the /dist/ repository must have a .asc signature. We
> > > will need to get this automated by the final release of Maven 2.
> >
> > yes, we have a big security issue here waiting to be found and abused.
> 
> It's not quite that bad - for ASF artifacts it would still rely on the
> ASF box getting compromised. Though we're pretty good at doing that
> ourselves (refer to commons-cli). Lots of tools needed in this area.

Xerces is the juicy target. Subvert that, insert a new PI (processing
instruction), and then every SOAP or REST endpoint that takes XML is
owned by the backdoor:

<? owned exec="rm -rf /" ?>

If I had admin rights to the project, I'd just hide the attack in the
source of a 100+KB commit, so that the commit email gets skipped. A
binary attack downstream is less ubiquitous, but potentially easier.
What is worse, any attack will destroy trust in the repository system.
(A source attack would damage Apache as a whole if it wasn't caught fast.)

> 
> > 7. All distributable files must have a .pom, even if it is a minimal
> > (zero dependency) one. Without this the maven tasks break.
> 
> Not any more :)

aha.

> 
> > Some script also needs to run through all files, get their poms and
> > verify that the dependencies can all be satisfied by the repositories,
> 
> http://jira.codehaus.org/browse/MRM-2

yes, a graph walking tool.

> 
> > or that somehow the artifacts are marked as non-uploadable (e.g. Sun
> > jars with limited distribution)
> 
> Sun artifacts now have POMs so that they can be identified (though
> they just contain a download link).
> 
> > I also worry about pom files with excessively tight dependencies to
> > things like xml parser implementations (e.g. jdom demands dom4j that
> > isn't found).
> 
> Yes, though I think the metadata is improving through increased use.
> Spec. dependencies (which have been left for a bit later) will help
> with the xml/j2ee/etc dependencies.
> 
> Nice job on the presentation, btw. - caught it on your blog.

You might also be interested to know that smartfrog-xml is the first
project using the Maven tasks in Gump. Gump is doing the fetch but
sticking to its own classpath.

-steve

Re: Maven repository policies

Posted by Brett Porter <br...@gmail.com>.
On 7/25/05, Steve Loughran <st...@gmail.com> wrote:
> +1
> 
> if there is one problem here, it is finding stuff if you don't know its
> origin. For example, if I am browsing for commons-lang, should I have
> to know it is a jakarta-commons project? 

I'm not sure where you started from here? Someone that has a
commons-lang JAR file from somewhere else might have this problem.
Someone with a project with that dependency can see where it came
from, as can someone finding commons-lang through Jakarta (hopefully).

> As browsing is usually a
> preamble to determining which versions are available, we may need a
> human usable search interface to go with the nested layout.

Absolutely. There is actually a M1 search hosted by someone else, and
we hope to develop a more comprehensive one for m2.

> > 6) all files in the /dist/ repository must have a .asc signature. We
> > will need to get this automated by the final release of Maven 2.
> 
> yes, we have a big security issue here waiting to be found and abused.

It's not quite that bad - for ASF artifacts it would still rely on the
ASF box getting compromised. Though we're pretty good at doing that
ourselves (refer to commons-cli). Lots of tools needed in this area.

> 7. All distributable files must have a .pom, even if it is a minimal
> (zero dependency) one. Without this the maven tasks break.

Not any more :)

> Some script also needs to run through all files, get their poms and
> verify that the dependencies can all be satisfied by the repositories,

http://jira.codehaus.org/browse/MRM-2

> or that somehow the artifacts are marked as non-uploadable (e.g. Sun
> jars with limited distribution)

Sun artifacts now have POMs so that they can be identified (though
they just contain a download link).

> I also worry about pom files with excessively tight dependencies to
> things like xml parser implementations (e.g. jdom demands dom4j that
> isn't found).

Yes, though I think the metadata is improving through increased use.
Spec. dependencies (which have been left for a bit later) will help
with the xml/j2ee/etc dependencies.

Nice job on the presentation, btw. - caught it on your blog.

Thanks,
Brett

Re: Maven repository policies

Posted by Steve Loughran <st...@gmail.com>.
On 7/25/05, Brett Porter <br...@apache.org> wrote:

> 
> 3) require long groupId's. Currently the top level directory is polluted
> with a lot of different project names. I would like projects to use
> org.apache.project.subproject as the groupId so that it uses the
> directory structure /org/apache/project/subproject keeping individual
> directories with less files in them. The longest I can think of would be
> org.apache.jakarta.commons.collections. This should also be done for any
> future Maven 1 based releases (it works identically, but retains a
> shallow structure in the Maven 1 layout).

+1 

If there is one problem here, it is finding stuff if you don't know its
origin. For example, if I am browsing for commons-lang, should I have
to know it is a jakarta-commons project? As browsing is usually a
preamble to determining which versions are available, we may need a
human-usable search interface to go with the nested layout.
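
For reference, the groupId-to-directory mapping quoted in point 3 can be
sketched roughly as below. This is only an illustration in Python: the
function names are made up here, and the artifactId and version in the
usage note are placeholders rather than anything proposed in this thread.

```python
def m2_directory(group_id: str, artifact_id: str, version: str) -> str:
    """Maven 2 style: each dot in the groupId becomes a directory level."""
    return "/".join(group_id.split(".") + [artifact_id, version])


def m1_directory(group_id: str, artifact_type: str = "jar") -> str:
    """Maven 1 style: the whole groupId stays one (shallow) directory,
    followed by a per-type directory such as 'jars'."""
    return f"{group_id}/{artifact_type}s"
```

With the longest groupId mentioned above,
m2_directory("org.apache.jakarta.commons.collections", "commons-collections", "1.0")
gives a path starting with org/apache/jakarta/commons/collections/, which
is exactly the part a person browsing would have to know in advance.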


> 
> 4) set permissions to group writable with the appropriate unix group
> governing modification of jakarta, for example. This is one area I'd
> like to investigate more to ensure we have proper controls on.
> 
> 5) all files must have an .md5 and .sha1 checksum. Maven deploys these
> automatically, and I believe this is already monitored by a script which
> we could crack down on violations of in the new repo.
> 
> 6) all files in the /dist/ repository must have a .asc signature. We
> will need to get this automated by the final release of Maven 2.

yes, we have a big security issue here waiting to be found and abused. 
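
A monitoring script for points 5 and 6 could look something like the
sketch below. It is an assumption-laden illustration, not the existing
script mentioned above: the function name is invented, and it only checks
that the .md5, .sha1 and .asc files are present, not that the checksums
or signatures are actually valid.

```python
from pathlib import Path

# Companion files that points 5 and 6 require next to every artifact.
REQUIRED_SUFFIXES = (".md5", ".sha1", ".asc")


def missing_companions(repo_root):
    """Return {artifact path: [missing suffixes]} for files under repo_root.

    Presence check only; verifying the checksum values and the PGP
    signatures would be a separate step.
    """
    root = Path(repo_root)
    report = {}
    for path in sorted(root.rglob("*")):
        # Skip directories and the companion files themselves.
        if not path.is_file() or path.suffix in REQUIRED_SUFFIXES:
            continue
        missing = [suffix for suffix in REQUIRED_SUFFIXES
                   if not path.with_name(path.name + suffix).exists()]
        if missing:
            report[path.relative_to(root).as_posix()] = missing
    return report
```

Run daily against the repository root, the returned dictionary is the
list of violations to chase up.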

7. All distributable files must have a .pom, even if it is a minimal
(zero dependency) one. Without this the maven tasks break.

Some script also needs to run through all files, get their poms and
verify that the dependencies can all be satisfied by the repositories,
or that somehow the artifacts are marked as non-uploadable (e.g. Sun
jars with limited distribution)
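
The core of such a check might be sketched as follows, assuming the
dependencies have already been extracted from the poms into plain
dictionaries. The function name, the "group:artifact:version" strings
and the separate set of non-redistributable artifacts are all
illustrative assumptions, not an existing tool.

```python
def unsatisfied_dependencies(declared, available, non_redistributable=frozenset()):
    """Report dependencies that no repository can satisfy.

    declared:            {artifact: [dependency, ...]} as read from the poms
    available:           set of artifacts present in the repositories
    non_redistributable: artifacts known to be non-uploadable, which are
                         excused from the check
    """
    problems = {}
    for artifact, dependencies in declared.items():
        missing = sorted(dep for dep in dependencies
                         if dep not in available
                         and dep not in non_redistributable)
        if missing:
            problems[artifact] = missing
    return problems
```

Because every pom in the repository is visited, checking only the direct
dependencies of each one still covers the whole graph.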

I also worry about pom files with excessively tight dependencies to
things like xml parser implementations (e.g. jdom demands dom4j that
isn't found).

> 
> Some things I would like to get more information on would be:
> - what is the best way to rsync this data from another machine?
> - what should the deletion policy be? I would think that something
> deleted from this repository should automatically be deleted from
> mirrors, the ibiblio Maven repositories and its mirrors.
> - what should be the archive policy? People often use older releases
> from the repository, and require them for historical builds. If we want
> to archive them from this repository to archive.apache.org, they still
> need to be on ibiblio, so this needs to be managed with the deletion
> policy above.
> 

Good point. Maybe we need a server of very old releases (5+ years) that
acts as the last resort for finding them.

Re: Maven repository policies

Posted by Brett Porter <br...@gmail.com>.
On 12/27/05, Henk P. Penning <he...@apache.org> wrote:
>   I noticed that the first artifacts are appearing in 'maven-repository'.
>   The stuff isn't signed, and I don't see a 'deployment/cleanup hook' ;
>   Am I missing something ?

I will chase them up. I told them they had to sign them manually, and
it hasn't happened.

I think we might need to rethink whether this belongs in /dist/ if it
needs to be cleaned up. The repository is meant to house historical
versions so that builds remain reproducible. Maybe the better
alternative is to move them directly to archive.apache.org, or move
them to a completely separate web server so that there is only one
copy, and the mirroring is all done through Maven's repository system
rather than the Apache mirror system. Thoughts?

Hope this helps.

Cheers,
Brett

Re: Maven repository policies

Posted by "Henk P. Penning" <he...@apache.org>.
On Fri, 29 Jul 2005, Brett Porter wrote:

> Date: Fri, 29 Jul 2005 12:47:59 +1000
> From: Brett Porter <br...@gmail.com>
> To: repository@apache.org
> Cc: henkp@apache.org
> Subject: Re: Maven repository policies
>
> Thanks for this Henk. Questions:
>
> On 7/27/05, Henk P. Penning <he...@apache.org> wrote:
> >   My point of departure (assumptions) :
> >
> >   -- The cleanup problems in 'java-repository' aren't solved
> >   -- If putting stuff in a repository is automated, then repository
> >      cleanup should be automated at least as well
>
> By this, do you mean the problem you refer to below or are there other
> cleanup issues?
>
> >   -- Every object X in the repository is derived from an object Y
> >      in 'dist/' (not 'dist/{java,maven}-repository/').
> >      If Y disappears from '/dist', X should disappear also
> >
> >   To keep things simple, I propose that for every artifact X in
> >   the repository there should be a file X.par, containing a pointer
> >   to the parent in 'dist/'.
>
> I'm fine with doing this. We can add a deployment hook on ASF
> artifacts to push that file. By pointer, I assume it would be a URL to
> the original, which a script would replace http:// with /www/ to find
> the file location.

Hi,

  I noticed that the first artifacts are appearing in 'maven-repository'.
  The stuff isn't signed, and I don't see a 'deployment/cleanup hook' ;
  Am I missing something ?

  The past weeks I spent a lot of time nagging people to sign stuff
  and remove obsolete versions ; some 450 out of 1600 artifacts are
  now signed. The repo is still riddled with obsolete stuff ; even
  with automatic archiving on archive.apache.org, cleaning up the
  repo (and keeping it clean) is much too hard, even after being
  on the agenda for over a year...

  I would like to see these issues (pgp signing, cleanup hooks and
  procedures) resolved before the maven-repository goes in production.

> Brett

  Regards,

  Henk Penning

----------------------------------------------------------------   _
Henk P. Penning, Computer Systems Group       R Uithof CGN-A232  _/ \_
Dept of Computer Science, Utrecht University  T +31 30 253 4106 / \_/ \
Padualaan 14, 3584CH Utrecht, the Netherlands F +31 30 251 3791 \_/ \_/
http://www.cs.uu.nl/staff/henkp.html          M penning@cs.uu.nl  \_/


Re: Maven repository policies

Posted by Brett Porter <br...@gmail.com>.
Thanks for this Henk. Questions:

On 7/27/05, Henk P. Penning <he...@apache.org> wrote:
>   My point of departure (assumptions) :
> 
>   -- The cleanup problems in 'java-repository' aren't solved
>   -- If putting stuff in a repository is automated, then repository
>      cleanup should be automated at least as well

By this, do you mean the problem you refer to below or are there other
cleanup issues?

>   -- Every object X in the repository is derived from an object Y
>      in 'dist/' (not 'dist/{java,maven}-repository/').
>      If Y disappears from '/dist', X should disappear also
> 
>   To keep things simple, I propose that for every artifact X in
>   the repository there should be a file X.par, containing a pointer
>   to the parent in 'dist/'.

I'm fine with doing this. We can add a deployment hook on ASF
artifacts to push that file. By pointer, I assume it would be a URL to
the original, which a script would replace http:// with /www/ to find
the file location.
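
A rough sketch of that URL-to-path step, in Python. The function name,
the rejection of ".." segments and the example URL in the usage note are
assumptions added for illustration; only the "replace http:// with
/www/" rule comes from the paragraph above.

```python
from urllib.parse import urlsplit


def pointer_to_path(pointer_url, web_root="/www"):
    """Map an http:// pointer onto the local file system by replacing
    the 'http://' prefix with the web root."""
    parts = urlsplit(pointer_url)
    if parts.scheme != "http" or not parts.netloc:
        raise ValueError(f"not an http:// URL: {pointer_url!r}")
    # Refuse pointers that try to climb out of the web root.
    if ".." in parts.path.split("/"):
        raise ValueError(f"suspicious path in pointer: {pointer_url!r}")
    return f"{web_root}/{parts.netloc}{parts.path}"
```

For example, a pointer of http://www.apache.org/dist/example/example-1.0.tar.gz
would map to /www/www.apache.org/dist/example/example-1.0.tar.gz.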

Just for completeness, what about the reverse relationship?
- distributions are deployed alongside the jars in the repository
- elements in /dist/ are symlinks to the elements in the repository
- to pull a release, remove it from the repository
- scripts remove dead symlinks
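
The last step could be as small as the sketch below. The function name
is made up, and it deliberately only reports the dead links so that
whoever runs it decides whether to delete them.

```python
import os


def dead_symlinks(root):
    """Return the symlinks under root whose targets no longer exist."""
    dead = []
    # os.walk does not follow symlinked directories by default.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # islink() looks at the link itself; exists() follows it.
            if os.path.islink(path) and not os.path.exists(path):
                dead.append(path)
    return sorted(dead)
```

A cleanup job would then call os.remove() on each returned path.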

Advantages: it's easier for Maven built projects, and it would
encourage other projects to ensure their distributions are in the
Maven repository rather than chasing them later. It's a bit cleaner
than adding .par files to everything.

Disadvantages: it's more Maven-centric, so it is inconvenient or not
relevant to some projects, and that means different projects would end
up deploying to inconsistent places.

I expect as we move shell accounts off minotaur we will need some way
to publish releases, and that could take care of establishing the
layout without inconveniencing anyone doing the deployment.

What are your thoughts?

>   I think that would be a nice policy for stuff that is rsynced
>   to 100+ mirrors.

Absolutely. Thanks!

Cheers,
Brett

Re: Maven repository policies

Posted by "Henk P. Penning" <he...@apache.org>.
On Mon, 25 Jul 2005, Brett Porter wrote:

> Date: Mon, 25 Jul 2005 09:25:57 +1000
> From: Brett Porter <br...@apache.org>
> To: repository@apache.org
> Cc: Apache Infrastructure <in...@apache.org>,
>      Maven Project Management Committee List <pm...@maven.apache.org>
> Subject: Maven repository policies
>
> Hi,
>
> First of all, please do not retain the CC's on any responses to this.
> Folks interested in this topic, could you please subscribe and reply to
> the repository@apache.org list - thanks.
>
> Ok, what I wanted to talk about here was establishing a new Maven 2
> repository for deployment of Apache redistributable artifacts, and any
> policies that should be enforced on that. The Maven 2 repository layout
> is currently described here:
> http://docs.codehaus.org/pages/viewpage.action?pageId=22230.

  It says:

    For each file that is the repository there must be a file containing
    the checksum of the file, typically md5 or sha1. There may also be a
    digital signature (eg .asc an ascii armoured openpgp signature)

> I would like to have something workable for everyone going from the
> outset. Here is what I am proposing - please let me know if you have any
> feedback, additional requirements, objections, etc.

  My point of departure (assumptions) :

  -- The cleanup problems in 'java-repository' aren't solved
  -- If putting stuff in a repository is automated, then repository
     cleanup should be automated at least as well
  -- Every object X in the repository is derived from an object Y
     in 'dist/' (not 'dist/{java,maven}-repository/').
     If Y disappears from '/dist', X should disappear also

  To keep things simple, I propose that for every artifact X in
  the repository there should be a file X.par, containing a pointer
  to the parent in 'dist/'.

  -- artifacts without a .par file are removed
  -- artifacts without a parent in 'dist/' are removed

  This maintenance procedure must be operational before the
  repository is established, running daily.

  The result is that :

  -- anyone can put artifacts, derived from stuff in 'dist/',
     in the repository.
  -- if the pmc of a project decides to withdraw a distribution
     from 'dist/', the artifacts derived from that distribution
     disappear from the repository.

  I think that would be a nice policy for stuff that is rsynced
  to 100+ mirrors.
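
  The two removal rules above could be sketched like this in Python.
  The format of a .par file is not defined in this thread; the sketch
  assumes it holds a single path relative to 'dist/', and the function
  name and reason strings are likewise invented for illustration.

```python
from pathlib import Path

# Files that accompany an artifact and are not artifacts themselves.
COMPANION_SUFFIXES = (".par", ".md5", ".sha1", ".asc")


def artifacts_to_remove(repo_root, dist_root):
    """List (artifact, reason) pairs that the daily cleanup would remove.

    Assumes X.par contains one path, relative to dist_root, naming the
    parent distribution of artifact X.
    """
    repo, dist = Path(repo_root), Path(dist_root)
    doomed = []
    for artifact in sorted(repo.rglob("*")):
        if not artifact.is_file() or artifact.suffix in COMPANION_SUFFIXES:
            continue
        name = artifact.relative_to(repo).as_posix()
        par = artifact.with_name(artifact.name + ".par")
        if not par.is_file():
            doomed.append((name, "no .par file"))
            continue
        parent = dist / par.read_text().strip()
        if not parent.is_file():
            doomed.append((name, "parent missing from dist/"))
    return doomed
```

  The function only reports; the actual removal (of the artifact and
  its companion files) would be a separate, logged step.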

> Brett

  Just my two cents ; regards,

  HPP

----------------------------------------------------------------   _
Henk P. Penning, Computer Systems Group       R Uithof CGN-A232  _/ \_
Dept of Computer Science, Utrecht University  T +31 30 253 4106 / \_/ \
Padualaan 14, 3584CH Utrecht, the Netherlands F +31 30 251 3791 \_/ \_/
http://www.cs.uu.nl/staff/henkp.html          M penning@cs.uu.nl  \_/