Posted to dev@archiva.apache.org by Joakim Erdfelt <jo...@erdfelt.com> on 2007/08/15 02:14:23 UTC

MRM-463 - metadata handling / merging

MRM-463 has a bunch of unanswered questions for me.

[ link for the lazy to use http://jira.codehaus.org/browse/MRM-463 ]

.\ Synopsis \.

We have to maintain a sane metadata.xml for the m2 clients.


.\ Details \.

There are 2 major kinds of metadata.xml from what I can see.


.\ Metadata Type 1: [ groupId:artifactId ] \.

One is obtained at the groupId:artifactId level and contains the set
of available versions for a specific artifactId, with a link to the
'current' or 'latest' version.


Example 1: 
http://repo1.maven.org/maven2/commons-beanutils/commons-beanutils/maven-metadata.xml

<metadata>
  <groupId>commons-beanutils</groupId>
  <artifactId>commons-beanutils</artifactId>
  <version>1.0</version>
  <versioning>
    <versions>
      <version>1.0</version>
      <version>1.2</version>
      <version>1.3</version>
      <version>1.4</version>
      <version>1.4-dev</version>
      <version>1.4.1</version>
      <version>1.5</version>
      <version>1.6</version>
      <version>1.6.1</version>
      <version>1.7-dev</version>
      <version>1.7.0</version>
      <version>20020520</version>
      <version>20021128.082114</version>
      <version>20030211.134440</version>
      <version>dev</version>
    </versions>
  </versioning>
</metadata>

This example is actually bad IMO, as the top level version 
/metadata/version element isn't the latest version, or the current 
version, or even the last uploaded version.
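
To make that mismatch concrete, here is a minimal Python sketch (standard
library only, not Archiva code) that reads a Type 1 document and compares
the top-level /metadata/version with the listed versions.  The XML is an
abbreviated copy of Example 1, and read_versions is a hypothetical helper
name used only for illustration.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of Example 1 (only a few <version> entries are kept).
METADATA = """
<metadata>
  <groupId>commons-beanutils</groupId>
  <artifactId>commons-beanutils</artifactId>
  <version>1.0</version>
  <versioning>
    <versions>
      <version>1.0</version>
      <version>1.6.1</version>
      <version>1.7.0</version>
      <version>dev</version>
    </versions>
  </versioning>
</metadata>
"""


def read_versions(xml_text):
    """Return (top_level_version, listed_versions) for a Type 1 document."""
    root = ET.fromstring(xml_text.strip())
    top_level = root.findtext("version")
    listed = [v.text for v in root.findall("versioning/versions/version")]
    return top_level, listed


top_level, listed = read_versions(METADATA)
print("top-level version:", top_level)
print("listed versions:", listed)
```

The top-level value is simply the first entry here, which is why it cannot
be trusted as 'latest' or 'current'.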


.\ Metadata Type 2: [ groupId:artifactId:version ] \.

This type is version specific.

So far, most released artifacts have this in their directory, but it is 
not terribly useful IMO.

Example 2: 
http://repo1.maven.org/maven2/commons-beanutils/commons-beanutils/1.6.1/maven-metadata.xml

<metadata>
  <groupId>commons-beanutils</groupId>
  <artifactId>commons-beanutils</artifactId>
  <version>1.6.1</version>
</metadata>

Not very exciting, quite easy actually.

But when we deal with snapshots, it becomes critical.

First we'll look at an artifact with Timestamped snapshots.
Example 3: 
http://snapshots.repository.codehaus.org/org/codehaus/xfire/xfire-core/1.2-SNAPSHOT/maven-metadata.xml

<?xml version="1.0" encoding="UTF-8"?>
<metadata>
  <groupId>org.codehaus.xfire</groupId>
  <artifactId>xfire-core</artifactId>
  <version>1.2-SNAPSHOT</version>
  <versioning>
    <snapshot>
      <timestamp>20070612.101111</timestamp>
      <buildNumber>63</buildNumber>
    </snapshot>
    <lastUpdated>20070612101133</lastUpdated>
  </versioning>
</metadata>

Next, here is an example without Timestamped snapshots.
Example 4: 
http://snapshots.repository.codehaus.org/org/codehaus/groovy/groovy/1.1-beta-2-SNAPSHOT/maven-metadata.xml

<?xml version="1.0" encoding="UTF-8"?>
<metadata>
  <groupId>org.codehaus.groovy</groupId>
  <artifactId>groovy</artifactId>
  <version>1.1-beta-2-SNAPSHOT</version>
  <versioning>
    <snapshot>
      <buildNumber>2</buildNumber>
    </snapshot>
    <lastUpdated>20070616042726</lastUpdated>
  </versioning>
</metadata>
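
Examples 3 and 4 differ only in whether <timestamp> is present, and that
decides which file version a client ends up asking for.  A minimal Python
sketch of that decision follows (not Archiva code).  It assumes the usual
Maven 2 convention that "SNAPSHOT" is replaced by "timestamp-buildNumber"
in the file name; snapshot_file_version is a hypothetical name, and the
XML snippets are trimmed copies of Examples 3 and 4.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of Example 3 (timestamped snapshot).
TIMESTAMPED = """
<metadata>
  <version>1.2-SNAPSHOT</version>
  <versioning>
    <snapshot>
      <timestamp>20070612.101111</timestamp>
      <buildNumber>63</buildNumber>
    </snapshot>
  </versioning>
</metadata>
"""

# Trimmed copy of Example 4 (no <timestamp> element).
NOT_TIMESTAMPED = """
<metadata>
  <version>1.1-beta-2-SNAPSHOT</version>
  <versioning>
    <snapshot>
      <buildNumber>2</buildNumber>
    </snapshot>
  </versioning>
</metadata>
"""


def snapshot_file_version(xml_text):
    """Return the version string expected in the snapshot's file names.

    With both <timestamp> and <buildNumber> present, "SNAPSHOT" is replaced
    by "<timestamp>-<buildNumber>"; otherwise the version is left as-is.
    """
    root = ET.fromstring(xml_text.strip())
    version = root.findtext("version")
    timestamp = root.findtext("versioning/snapshot/timestamp")
    build = root.findtext("versioning/snapshot/buildNumber")
    if timestamp and build and version.endswith("-SNAPSHOT"):
        return version[: -len("SNAPSHOT")] + timestamp + "-" + build
    return version


print(snapshot_file_version(TIMESTAMPED))
print(snapshot_file_version(NOT_TIMESTAMPED))
```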

Next, here is an example of an artifact with both Timestamped and non
Timestamped artifacts.  To see this you'll need to browse the directory:
http://people.apache.org/repo/m2-snapshot-repository/org/apache/cocoon/cocoon-ajax/1-SNAPSHOT/
(of course, this appears to be a case of someone uploading their local
repository)

Example 5: 
http://people.apache.org/repo/m2-snapshot-repository/org/apache/cocoon/cocoon-ajax/1-SNAPSHOT/maven-metadata.xml

<?xml version="1.0" encoding="UTF-8"?>
<metadata>
  <groupId>org.apache.cocoon</groupId>
  <artifactId>cocoon-ajax</artifactId>
  <version>1-SNAPSHOT</version>
  <versioning>
    <snapshot>
      <timestamp>20060728.031822</timestamp>
      <buildNumber>10</buildNumber>
    </snapshot>
    <lastUpdated>20060728031823</lastUpdated>
  </versioning>
</metadata>


.\ Areas of Concern \.

Alright, I think we have a few areas of focus around this.

1) The repository consumer run to ensure that a metadata.xml file exists.
2) The database consumer run to ensure that the contents of the metadata.xml
   are sane based on the list of available versions in the database.
3) When proxying content from a remote repository for a released artifact,
   the type 1 groupId:artifactId needs to be updated to reflect the new
   artifactId:version that was just downloaded.
4) When proxying content from a remote repository for a snapshot artifact,
   the proxy mechanism needs to pull the remote metadata.xml to determine
   what actual artifactId:version to pull.  Is it timestamped or not?
5) When proxying content from a remote repository for a snapshot artifact,
   the managed repository needs to have the current metadata.xml for
   the remote repository?
6) When presenting to the user browsing the repository, do we show what is
   in the managed repository, or the full list of potential versions from
   all downstream remote repositories too?


.\ Ideas \.

I think it would be a good idea to adopt what happens in the local repository
now, and have maven-metadata-${remote_repo_id}.xml files in the managed
repository that the proxy mechanism keeps up to date, and that a separate merge
mechanism utilizes to keep the managed repository metadata.xml as accurate
as possible with all potential versions available.
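
As a rough illustration of the merge step, here is a minimal Python sketch
(not Archiva code) that unions the version lists from the managed
repository's own metadata and from one metadata document per remote
repository.  The repository ids "central" and "codehaus", the version
numbers, and all function names are made up for the example; a real merge
would also have to deal with <latest>, <release> and <lastUpdated>.

```python
import xml.etree.ElementTree as ET


def listed_versions(xml_text):
    """Return the <version> entries of a Type 1 metadata document."""
    root = ET.fromstring(xml_text.strip())
    return [v.text for v in root.findall("versioning/versions/version")]


def merge_versions(local_xml, remote_xml_by_repo):
    """Union of all versions, first-seen order, local entries first."""
    merged = []
    seen = set()
    for xml_text in [local_xml, *remote_xml_by_repo.values()]:
        for version in listed_versions(xml_text):
            if version not in seen:
                seen.add(version)
                merged.append(version)
    return merged


def metadata(*versions):
    """Build a tiny Type 1 document for the demo below."""
    items = "".join(f"<version>{v}</version>" for v in versions)
    return (
        "<metadata><versioning><versions>"
        + items
        + "</versions></versioning></metadata>"
    )


# Stand-ins for maven-metadata.xml and maven-metadata-${remote_repo_id}.xml.
local = metadata("1.6", "1.6.1")
remotes = {
    "central": metadata("1.6", "1.6.1", "1.7.0"),
    "codehaus": metadata("1.7.0", "1.7-dev"),
}

merged = merge_versions(local, remotes)
print(merged)
```

Keeping the per-repository files separate means the merged list can be
rebuilt at any time without another network round trip.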

WDYT?

--
- Joakim Erdfelt
  joakime@apache.org
  joakim@erdfelt.com

Re: MRM-463 - metadata handling / merging

Posted by Maria Odea Ching <oc...@exist.com>.
Joakim Erdfelt wrote:
> MRM-463 has a bunch of unanswered questions for me.
>
> [ link for the lazy to use http://jira.codehaus.org/browse/MRM-463 ]
>
> .\ Synopsis \.
>
> We have to maintain a sane metadata.xml for the m2 clients.
>
>
> .\ Details \.
>
> There are 2 major kinds of metadata.xml from what I can see.
>
>
> .\ Metadata Type 1: [ groupId:artifactId ] \.
>
> One that is obtained at the groupId:artifactId level, and contains a 
> set of available versions for a specific artifactId.  With a link to 
> the 'current' or 'latest' version.
>
>
> Example 1: 
> http://repo1.maven.org/maven2/commons-beanutils/commons-beanutils/maven-metadata.xml 
>
>
> <metadata>
>  <groupId>commons-beanutils</groupId>
>  <artifactId>commons-beanutils</artifactId>
>  <version>1.0</version>
>  <versioning>
>    <versions>
>      <version>1.0</version>
>      <version>1.2</version>
>      <version>1.3</version>
>      <version>1.4</version>
>      <version>1.4-dev</version>
>      <version>1.4.1</version>
>      <version>1.5</version>
>      <version>1.6</version>
>      <version>1.6.1</version>
>      <version>1.7-dev</version>
>      <version>1.7.0</version>
>      <version>20020520</version>
>      <version>20021128.082114</version>
>      <version>20030211.134440</version>
>      <version>dev</version>
>    </versions>
>  </versioning>
> </metadata>
>
> This example is actually bad IMO, as the top level version 
> /metadata/version element isn't the latest version, or the current 
> version, or even the last uploaded version.
>
>
> .\ Metadata Type 2: [ groupId:artifactId:version ] \.
>
> This type is version specific.
>
> So far, most released artifacts have this in their directory, but it 
> is not terribly useful IMO.
>
> Example 2: 
> http://repo1.maven.org/maven2/commons-beanutils/commons-beanutils/1.6.1/maven-metadata.xml 
>
>
> <metadata>
>  <groupId>commons-beanutils</groupId>
>  <artifactId>commons-beanutils</artifactId>
>  <version>1.6.1</version>
> </metadata>
>
> Not very exciting, quite easy actually.
>
> But when we deal with snapshots, it becomes critical.
>
> First we'll look at an artifact with Timestamped snapshots.
> Example 3: 
> http://snapshots.repository.codehaus.org/org/codehaus/xfire/xfire-core/1.2-SNAPSHOT/maven-metadata.xml 
>
>
> <?xml version="1.0" encoding="UTF-8"?>
> <metadata>
>  <groupId>org.codehaus.xfire</groupId>
>  <artifactId>xfire-core</artifactId>
>  <version>1.2-SNAPSHOT</version>
>  <versioning>
>    <snapshot>
>      <timestamp>20070612.101111</timestamp>
>      <buildNumber>63</buildNumber>
>    </snapshot>
>    <lastUpdated>20070612101133</lastUpdated>
>  </versioning>
> </metadata>
>
> Next, here is an example without Timestamped snapshots.
> Example 4: 
> http://snapshots.repository.codehaus.org/org/codehaus/groovy/groovy/1.1-beta-2-SNAPSHOT/maven-metadata.xml 
>
>
> <?xml version="1.0" encoding="UTF-8"?>
> <metadata>
>  <groupId>org.codehaus.groovy</groupId>
>  <artifactId>groovy</artifactId>
>  <version>1.1-beta-2-SNAPSHOT</version>
>  <versioning>
>    <snapshot>
>      <buildNumber>2</buildNumber>
>    </snapshot>
>    <lastUpdated>20070616042726</lastUpdated>
>  </versioning>
> </metadata>
>
> Next, here is an example of an artifact with Timestamped and non 
> Timestamped artifacts.  To see this you'll need to browse the 
> directory: 
> http://people.apache.org/repo/m2-snapshot-repository/org/apache/cocoon/cocoon-ajax/1-SNAPSHOT/ 
> (course, this is appears to be a case of someone uploading their local 
> repository)
>
> Example 5: 
> http://people.apache.org/repo/m2-snapshot-repository/org/apache/cocoon/cocoon-ajax/1-SNAPSHOT/maven-metadata.xml 
>
>
> <?xml version="1.0" encoding="UTF-8"?>
> <metadata>
>  <groupId>org.apache.cocoon</groupId>
>  <artifactId>cocoon-ajax</artifactId>
>  <version>1-SNAPSHOT</version>
>  <versioning>
>    <snapshot>
>      <timestamp>20060728.031822</timestamp>
>      <buildNumber>10</buildNumber>
>    </snapshot>
>    <lastUpdated>20060728031823</lastUpdated>
>  </versioning>
> </metadata>

I think I missed updating the metadata file at the artifactId/version
level for the repository purge.
I'll have to create a new jira for it then.

Thanks for putting up these examples :)

>
>
> .\ Areas of Concern \.
>
> Alright, I think we have few areas of focus around this.
>
> 1) The repository consumer run to ensure that a metadata.xml file exists.
> 2) The database consumer run to ensure that the contents of the 
> metadata.xml
>   are sane based on the list of available versions in the database.
> 3) When proxying content from a remote repository for a released 
> artifact,
>   the type 1 groupId:artifactId needs to be updated to reflect the new
>   artifactId:version that was just downloaded.
> 4) When proxying content from a remote repository for a snapshot 
> artifact,
>   the proxy mechanism needs to pull the remote metadata.xml to determine
>   what actual artifactId:version to pull.  Is it timestamped or not?
> 5) When proxying content from a remote repository for a snapshot 
> artifact,
>   the managed repository needs to have the current metadata.xml for
>   remote repository?
> 6) When presenting to the user browsing the repository, do we show 
> what is
>   in the managed repository, or the full list of potential versions 
> from all
>   downstream remote repositories too?
>
>
> .\ Ideas \.
>
> I think it would be a good idea to adopt what happens in the local 
> repository
> now, and have maven-metadata-${remote_repo_id}.xml files in the managed
> repository that the proxy mechanism keeps up to date, and the seperate 
> merge
> mechanism utilizes to keep the managed repository metadata.xml as 
> accurate
> as possible with all potential versions available.
>
> WDYT?

+1 to this. I've just read the discussions between you and Brett, and
everything seems to have been ironed out :)

>
> -- 
> - Joakim Erdfelt
>  joakime@apache.org
>  joakim@erdfelt.com
>

Thanks,
Deng



Re: MRM-463 - metadata handling / merging

Posted by Brett Porter <br...@apache.org>.
On 15/08/2007, at 12:34 PM, Joakim Erdfelt wrote:

> Ok. I'll chalk this up as a maven 2 anomaly, we'll likely need to  
> fill that element out just to avoid a parsing error on the maven 2  
> side no?

I don't think so, but worth checking.

>>
>> No idea, but if you look at a recent release, it's not generated:
>> http://people.apache.org/~oching/stage-repo/org/apache/maven/archiva/archiva-lucene-consumers/1.0-beta-1/
>>
>> Maybe it depends on your version of Maven/deploy plugin.
>
> Curses.  Using an Archiva release against me. ;-)
>
> So noted.  Chalked up as not-important.
> Should we consider a file like this as "out of place" or "bad" for  
> the repository problem reports then?

nah... as long as it is valid.

>
>>
>>>>> 1) The repository consumer run to ensure that a metadata.xml  
>>>>> file exists.
>>>>
>>>> if it's required.
>>>
>>> Can you expand on what you mean by 'required' ?
>>
>> Same as the point above - the metadata without snapshots is kind  
>> of useless, so we may not require it in the version directory.
>
> Again, useless metadata.xml files encountered.  Do we make a
> repository problem report for this too?

I don't think so, just a problem if it isn't there when it has to be.

> But you understand what I'm getting at though?

Kind of - just short on bandwidth right now. We can do the wiki  
proposal thing to get through it more completely.

>
> One last thing I thought about.
>
> If we have a set of maven-metadata-${remote_repo_id}.xml files,  
> should we make sure those files are updated on a schedule (like a  
> database consumer) or just in time, when the proxy request occurs?

just in time since it involves network behaviour (as we go beyond 1.0  
and look at pre-emptive syncing that might change, and I suspect that  
ties in to the previous point).

Cheers,
Brett

Re: MRM-463 - metadata handling / merging

Posted by Joakim Erdfelt <jo...@erdfelt.com>.
Brett Porter wrote:
>
> On 15/08/2007, at 12:19 PM, Joakim Erdfelt wrote:
>
>> Brett Porter wrote:
>>>
>>> On 15/08/2007, at 10:14 AM, Joakim Erdfelt wrote:
>>>
>>>>
>>>> This example is actually bad IMO, as the top level version 
>>>> /metadata/version element isn't the latest version, or the current 
>>>> version, or even the last uploaded version.
>>>
>>> Maven actually ignores it - it's a deployment bug. You can safely 
>>> omit it.
>>
>> Would it be wise to be a good repository citizen and keep that field 
>> updated?
>
> No - it doesn't make any sense (it's not meant to reside in that 
> directory). The <release> tag is probably the one to update.

Ok. I'll chalk this up as a maven 2 anomaly, we'll likely need to fill 
that element out just to avoid a parsing error on the maven 2 side no?

>
>>
>>>
>>>> Example 2: 
>>>> http://repo1.maven.org/maven2/commons-beanutils/commons-beanutils/1.6.1/maven-metadata.xml 
>>>>
>>>>
>>>> <metadata>
>>>>  <groupId>commons-beanutils</groupId>
>>>>  <artifactId>commons-beanutils</artifactId>
>>>>  <version>1.6.1</version>
>>>> </metadata>
>>>>
>>>> Not very exciting, quite easy actually.
>>>
>>> Maven deployment doesn't generate this, so whether we do or not 
>>> doesn't really matter
>>
>> Interesting.  Who generates this then?  As it's in every versioned 
>> artifact I look at.
>
> No idea, but if you look at a recent release, it's not generated: 
> http://people.apache.org/~oching/stage-repo/org/apache/maven/archiva/archiva-lucene-consumers/1.0-beta-1/ 
>
>
> Maybe it depends on your version of Maven/deploy plugin.

Curses.  Using an Archiva release against me. ;-)

So noted.  Chalked up as not-important.
Should we consider a file like this as "out of place" or "bad" for the 
repository problem reports then?

>
>>>> 1) The repository consumer run to ensure that a metadata.xml file 
>>>> exists.
>>>
>>> if it's required.
>>
>> Can you expand on what you mean by 'required' ?
>
> Same as the point above - the metadata without snapshots is kind of 
> useless, so we may not require it in the version directory.

Again, useless metadata.xml files encountered.  Do we make a repository
problem report for this too?

>
>>
>>>
>>>> 2) The database consumer run to ensure that the contents of the 
>>>> metadata.xml
>>>>   are sane based on the list of available versions in the database.
>>>
>>> if it exists
>>
>> You mean, if the maven-metadata.xml exists, then update it.  But if 
>> it doesn't exist, don't update it?
>> I would think you would always want a maven-metadata.xml file.  Am I 
>> wrong in that assumption?
>
> Just the same as above. Certainly if it should exist, create it.
>
>>>
>>>> 6) When presenting to the user browsing the repository, do we show 
>>>> what is
>>>>   in the managed repository, or the full list of potential versions 
>>>> from all
>>>>   downstream remote repositories too?
>>>
>>> in 1.0, what is in the repository. Beyond, would be nice to have 
>>> what is remote (though it should be marked up). That gets more 
>>> complicated though, for cases where you haven't retrieved any 
>>> artifacts yet - you basically need a whole repository index update 
>>> from remote.
>>
>> I thought about the whole repository index question, I don't think 
>> that's necessary.  Lemme explain ...
>> I think that if you have a project that utilizes a specific 
>> groupId:artifactId, then you have shown an interest in that specific 
>> groupId:artifactId, and knowing about new releases, etc.. would be a 
>> good idea.  Could prove to be a useful avenue for a future RSS feed, 
>> or maven-am-i-up-to-date-plugin, or even some IDE integration angle. no?
>
> would need to think about this some more - all I think right now is 
> post-1.0 :)
Yep. those 3 ideas are all post-1.0 (Future) concepts.
But you understand what I'm getting at though?

One last thing I thought about.

If we have a set of maven-metadata-${remote_repo_id}.xml files, should 
we make sure those files are updated on a schedule (like a database 
consumer) or just in time, when the proxy request occurs?

- Joakim

Re: MRM-463 - metadata handling / merging

Posted by Brett Porter <br...@apache.org>.
On 15/08/2007, at 12:19 PM, Joakim Erdfelt wrote:

> Brett Porter wrote:
>>
>> On 15/08/2007, at 10:14 AM, Joakim Erdfelt wrote:
>>
>>>
>>> This example is actually bad IMO, as the top level version
>>> /metadata/version element isn't the latest version, or the current
>>> version, or even the last uploaded version.
>>
>> Maven actually ignores it - it's a deployment bug. You can safely  
>> omit it.
>
> Would it be wise to be a good repository citizen and keep that  
> field updated?

No - it doesn't make any sense (it's not meant to reside in that  
directory). The <release> tag is probably the one to update.

>
>>
>>> Example 2:
>>> http://repo1.maven.org/maven2/commons-beanutils/commons-beanutils/1.6.1/maven-metadata.xml
>>>
>>> <metadata>
>>>  <groupId>commons-beanutils</groupId>
>>>  <artifactId>commons-beanutils</artifactId>
>>>  <version>1.6.1</version>
>>> </metadata>
>>>
>>> Not very exciting, quite easy actually.
>>
>> Maven deployment doesn't generate this, so whether we do or not  
>> doesn't really matter
>
> Interesting.  Who generates this then?  As it's in every versioned  
> artifact I look at.

No idea, but if you look at a recent release, it's not generated:
http://people.apache.org/~oching/stage-repo/org/apache/maven/archiva/archiva-lucene-consumers/1.0-beta-1/

Maybe it depends on your version of Maven/deploy plugin.

>>> 1) The repository consumer run to ensure that a metadata.xml file  
>>> exists.
>>
>> if it's required.
>
> Can you expand on what you mean by 'required' ?

Same as the point above - the metadata without snapshots is kind of  
useless, so we may not require it in the version directory.

>
>>
>>> 2) The database consumer run to ensure that the contents of the  
>>> metadata.xml
>>>   are sane based on the list of available versions in the database.
>>
>> if it exists
>
> You mean, if the maven-metadata.xml exists, then update it.  But if  
> it doesn't exist, don't update it?
> I would think you would always want a maven-metadata.xml file.  Am  
> I wrong in that assumption?

Just the same as above. Certainly if it should exist, create it.

>
>>
>>> 3) When proxying content from a remote repository for a released  
>>> artifact,
>>>   the type 1 groupId:artifactId needs to be updated to reflect  
>>> the new
>>>   artifactId:version that was just downloaded.
>>
>> I don't understand this?
>
> This is the Type 1, Example 1 reference.
> If, due to a proxy request, we download version 2.1 of commons-lang but
> our managed repository's local maven-metadata.xml file doesn't have
> it listed, then add that version to the maven-metadata.xml file.

+1

>>
>>> 6) When presenting to the user browsing the repository, do we  
>>> show what is
>>>   in the managed repository, or the full list of potential  
>>> versions from all
>>>   downstream remote repositories too?
>>
>> in 1.0, what is in the repository. Beyond, would be nice to have  
>> what is remote (though it should be marked up). That gets more  
>> complicated though, for cases where you haven't retrieved any  
>> artifacts yet - you basically need a whole repository index update  
>> from remote.
>
> I thought about the whole repository index question, I don't think  
> that's necessary.  Lemme explain ...
> I think that if you have a project that utilizes a specific  
> groupId:artifactId, then you have shown an interest in that  
> specific groupId:artifactId, and knowing about new releases, etc..  
> would be a good idea.  Could prove to be a useful avenue for a  
> future RSS feed, or maven-am-i-up-to-date-plugin, or even some IDE  
> integration angle. no?

would need to think about this some more - all I think right now is  
post-1.0 :)

Cheers,
Brett

Re: MRM-463 - metadata handling / merging

Posted by Joakim Erdfelt <jo...@erdfelt.com>.
Brett Porter wrote:
>
> On 15/08/2007, at 10:14 AM, Joakim Erdfelt wrote:
>
>>
>> This example is actually bad IMO, as the top level version 
>> /metadata/version element isn't the latest version, or the current 
>> version, or even the last uploaded version.
>
> Maven actually ignores it - it's a deployment bug. You can safely omit 
> it.

Would it be wise to be a good repository citizen and keep that field 
updated?

>
>> Example 2: 
>> http://repo1.maven.org/maven2/commons-beanutils/commons-beanutils/1.6.1/maven-metadata.xml 
>>
>>
>> <metadata>
>>  <groupId>commons-beanutils</groupId>
>>  <artifactId>commons-beanutils</artifactId>
>>  <version>1.6.1</version>
>> </metadata>
>>
>> Not very exciting, quite easy actually.
>
> Maven deployment doesn't generate this, so whether we do or not 
> doesn't really matter

Interesting.  Who generates this then?  As it's in every versioned 
artifact I look at.

>
>>
>> But when we deal with snapshots, it becomes critical.
>
> Correct.
>
> I would ensure that we generate it properly for timestamped artifacts. 
> I'm not exactly sure how maven deals with non-timestamped artifacts 
> when the metadata exists (the two other cases you listed where it was 
> mixed or always not timestamped but had metadata anyway) - probably 
> needs some investigation as to the best to generate here, but leaving 
> it out is probably the best option (in the mixed case, we need to 
> figure out how to decide when to leave it out - I'd say if the last 
> mod time of the -SNAPSHOT file is > all other timestamps in the 
> directory, or otherwise just make that a repository warning and ignore 
> one or the other).
>
>> .\ Areas of Concern \.
>>
>> Alright, I think we have few areas of focus around this.
>>
>> 1) The repository consumer run to ensure that a metadata.xml file 
>> exists.
>
> if it's required.

Can you expand on what you mean by 'required' ?

>
>> 2) The database consumer run to ensure that the contents of the 
>> metadata.xml
>>   are sane based on the list of available versions in the database.
>
> if it exists

You mean, if the maven-metadata.xml exists, then update it.  But if it 
doesn't exist, don't update it?
I would think you would always want a maven-metadata.xml file.  Am I 
wrong in that assumption?

>
>> 3) When proxying content from a remote repository for a released 
>> artifact,
>>   the type 1 groupId:artifactId needs to be updated to reflect the new
>>   artifactId:version that was just downloaded.
>
> I don't understand this?

This is the Type 1, Example 1 reference.
If, due to a proxy request, we download version 2.1 of commons-lang but our
managed repository's local maven-metadata.xml file doesn't have it listed,
then add that version to the maven-metadata.xml file.
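
Roughly, in Python (a sketch only, not Archiva code; add_version is a
hypothetical helper, and the already-listed version 2.0 is made up for the
demo):

```python
import xml.etree.ElementTree as ET

# Stand-in for the managed repository's groupId:artifactId metadata.
LOCAL = """
<metadata>
  <groupId>commons-lang</groupId>
  <artifactId>commons-lang</artifactId>
  <versioning>
    <versions>
      <version>2.0</version>
    </versions>
  </versioning>
</metadata>
"""


def add_version(xml_text, new_version):
    """Return metadata XML that lists new_version, adding it only if missing."""
    root = ET.fromstring(xml_text.strip())
    versioning = root.find("versioning")
    if versioning is None:
        versioning = ET.SubElement(root, "versioning")
    versions = versioning.find("versions")
    if versions is None:
        versions = ET.SubElement(versioning, "versions")
    if new_version not in [v.text for v in versions.findall("version")]:
        ET.SubElement(versions, "version").text = new_version
    return ET.tostring(root, encoding="unicode")


updated = add_version(LOCAL, "2.1")
print(updated)
```

Calling it a second time with the same version leaves the document
unchanged, so the proxy could run it on every successful download.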

>
>> 4) When proxying content from a remote repository for a snapshot 
>> artifact,
>>   the proxy mechanism needs to pull the remote metadata.xml to determine
>>   what actual artifactId:version to pull.
>
> right
>
>> Is it timestamped or not?
>
> not sure if you are asking a question here or if it's a piece of logic 
> you are describing.

More logic than question: the proxy request to the remote repository
needs to know what artifact / pom to actually pull, based on the
contents of the remote maven-metadata.xml file.

>
>> 5) When proxying content from a remote repository for a snapshot 
>> artifact,
>>   the managed repository needs to have the current metadata.xml for
>>   remote repository?
>
> yes.
>
>> 6) When presenting to the user browsing the repository, do we show 
>> what is
>>   in the managed repository, or the full list of potential versions 
>> from all
>>   downstream remote repositories too?
>
> in 1.0, what is in the repository. Beyond, would be nice to have what 
> is remote (though it should be marked up). That gets more complicated 
> though, for cases where you haven't retrieved any artifacts yet - you 
> basically need a whole repository index update from remote.

I thought about the whole repository index question, I don't think 
that's necessary.  Lemme explain ...
I think that if you have a project that utilizes a specific 
groupId:artifactId, then you have shown an interest in that specific 
groupId:artifactId, and knowing about new releases, etc.. would be a 
good idea.  Could prove to be a useful avenue for a future RSS feed, or 
maven-am-i-up-to-date-plugin, or even some IDE integration angle. no?

>
>>
>>
>> .\ Ideas \.
>>
>> I think it would be a good idea to adopt what happens in the local 
>> repository
>> now, and have maven-metadata-${remote_repo_id}.xml files in the managed
>> repository that the proxy mechanism keeps up to date, and the 
>> seperate merge
>> mechanism utilizes to keep the managed repository metadata.xml as 
>> accurate
>> as possible with all potential versions available.
>
> +1
>
>>
>> WDYT?
>
> Sounds fine.
>
> Cheers,
> Brett
>


-- 
- Joakim Erdfelt
  joakim@erdfelt.com
  Open Source Software (OSS) Developer


Re: MRM-463 - metadata handling / merging

Posted by Brett Porter <br...@apache.org>.
On 15/08/2007, at 10:14 AM, Joakim Erdfelt wrote:

>
> This example is actually bad IMO, as the top level version
> /metadata/version element isn't the latest version, or the current
> version, or even the last uploaded version.

Maven actually ignores it - it's a deployment bug. You can safely  
omit it.

> Example 2:
> http://repo1.maven.org/maven2/commons-beanutils/commons-beanutils/1.6.1/maven-metadata.xml
>
> <metadata>
>  <groupId>commons-beanutils</groupId>
>  <artifactId>commons-beanutils</artifactId>
>  <version>1.6.1</version>
> </metadata>
>
> Not very exciting, quite easy actually.

Maven deployment doesn't generate this, so whether we do or not  
doesn't really matter

>
> But when we deal with snapshots, it becomes critical.

Correct.

I would ensure that we generate it properly for timestamped
artifacts. I'm not exactly sure how maven deals with non-timestamped
artifacts when the metadata exists (the two other cases you listed,
where it was mixed or always non-timestamped but had metadata anyway).
It probably needs some investigation as to the best thing to generate
here, but leaving it out is probably the best option. In the mixed
case, we need to figure out how to decide when to leave it out: I'd
say leave it out if the last mod time of the -SNAPSHOT file is > all
other timestamps in the directory, or otherwise just make that a
repository warning and ignore one or the other.
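
One possible reading of that mixed-case rule, as a rough Python sketch
(not Archiva code).  It assumes "all other timestamps" means the
yyyyMMdd.HHmmss stamps embedded in the timestamped file names and treats
them as UTC; the function names, the file list and the modification times
in the demo are made up for illustration.

```python
import re
from datetime import datetime, timezone

# Matches the "-yyyyMMdd.HHmmss-buildNumber" part of a timestamped file name.
TIMESTAMP_RE = re.compile(r"-(\d{8}\.\d{6})-(\d+)")


def newest_embedded_timestamp(file_names):
    """Return the latest timestamp found in the file names, or None."""
    stamps = []
    for name in file_names:
        match = TIMESTAMP_RE.search(name)
        if match:
            parsed = datetime.strptime(match.group(1), "%Y%m%d.%H%M%S")
            stamps.append(parsed.replace(tzinfo=timezone.utc))
    return max(stamps) if stamps else None


def plain_snapshot_wins(plain_snapshot_mtime, file_names):
    """True if the -SNAPSHOT file is newer than every timestamped build."""
    newest = newest_embedded_timestamp(file_names)
    return newest is None or plain_snapshot_mtime > newest


# Illustrative directory listing for a mixed 1-SNAPSHOT directory.
FILES = [
    "cocoon-ajax-1-20060728.031822-10.jar",
    "cocoon-ajax-1-SNAPSHOT.jar",
]

newer = datetime(2006, 8, 1, tzinfo=timezone.utc)
older = datetime(2006, 7, 1, tzinfo=timezone.utc)
print(plain_snapshot_wins(newer, FILES))
print(plain_snapshot_wins(older, FILES))
```

When the function returns False, the alternative is the repository warning
mentioned above.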

> .\ Areas of Concern \.
>
> Alright, I think we have few areas of focus around this.
>
> 1) The repository consumer run to ensure that a metadata.xml file  
> exists.

if it's required.

> 2) The database consumer run to ensure that the contents of the  
> metadata.xml
>   are sane based on the list of available versions in the database.

if it exists

> 3) When proxying content from a remote repository for a released  
> artifact,
>   the type 1 groupId:artifactId needs to be updated to reflect the new
>   artifactId:version that was just downloaded.

I don't understand this?

> 4) When proxying content from a remote repository for a snapshot  
> artifact,
>   the proxy mechanism needs to pull the remote metadata.xml to  
> determine
>   what actual artifactId:version to pull.

right

> Is it timestamped or not?

not sure if you are asking a question here or if it's a piece of  
logic you are describing.

> 5) When proxying content from a remote repository for a snapshot  
> artifact,
>   the managed repository needs to have the current metadata.xml for
>   remote repository?

yes.

> 6) When presenting to the user browsing the repository, do we show  
> what is
>   in the managed repository, or the full list of potential versions  
> from all
>   downstream remote repositories too?

in 1.0, what is in the repository. Beyond, would be nice to have what  
is remote (though it should be marked up). That gets more complicated  
though, for cases where you haven't retrieved any artifacts yet - you  
basically need a whole repository index update from remote.

>
>
> .\ Ideas \.
>
> I think it would be a good idea to adopt what happens in the local  
> repository
> now, and have maven-metadata-${remote_repo_id}.xml files in the  
> managed
> repository that the proxy mechanism keeps up to date, and the  
> seperate merge
> mechanism utilizes to keep the managed repository metadata.xml as  
> accurate
> as possible with all potential versions available.

+1

>
> WDYT?

Sounds fine.

Cheers,
Brett


Re: MRM-463 - metadata handling / merging

Posted by Maria Odea Ching <oc...@devzuz.com>.
Joakim Erdfelt wrote:
> MRM-463 has a bunch of unanswered questions for me.
>
> [ link for the lazy to use http://jira.codehaus.org/browse/MRM-463 ]
>
> .\ Synopsis \.
>
> We have to maintain a sane metadata.xml for the m2 clients.
>
>
> .\ Details \.
>
> There are 2 major kinds of metadata.xml from what I can see.
>
>
> .\ Metadata Type 1: [ groupId:artifactId ] \.
>
> One that is obtained at the groupId:artifactId level, and contains a 
> set of available versions for a specific artifactId.  With a link to 
> the 'current' or 'latest' version.
>
>
> Example 1: 
> http://repo1.maven.org/maven2/commons-beanutils/commons-beanutils/maven-metadata.xml 
>
>
> <metadata>
>  <groupId>commons-beanutils</groupId>
>  <artifactId>commons-beanutils</artifactId>
>  <version>1.0</version>
>  <versioning>
>    <versions>
>      <version>1.0</version>
>      <version>1.2</version>
>      <version>1.3</version>
>      <version>1.4</version>
>      <version>1.4-dev</version>
>      <version>1.4.1</version>
>      <version>1.5</version>
>      <version>1.6</version>
>      <version>1.6.1</version>
>      <version>1.7-dev</version>
>      <version>1.7.0</version>
>      <version>20020520</version>
>      <version>20021128.082114</version>
>      <version>20030211.134440</version>
>      <version>dev</version>
>    </versions>
>  </versioning>
> </metadata>
>
> This example is actually bad IMO, as the top level version 
> /metadata/version element isn't the latest version, or the current 
> version, or even the last uploaded version.
>
>
> .\ Metadata Type 2: [ groupId:artifactId:version ] \.
>
> This type is version specific.
>
> So far, most released artifacts have this in their directory, but it 
> is not terribly useful IMO.
>
> Example 2: 
> http://repo1.maven.org/maven2/commons-beanutils/commons-beanutils/1.6.1/maven-metadata.xml 
>
>
> <metadata>
>  <groupId>commons-beanutils</groupId>
>  <artifactId>commons-beanutils</artifactId>
>  <version>1.6.1</version>
> </metadata>
>
> Not very exciting, quite easy actually.
>
> But when we deal with snapshots, it becomes critical.
>
> First we'll look at an artifact with Timestamped snapshots.
> Example 3: 
> http://snapshots.repository.codehaus.org/org/codehaus/xfire/xfire-core/1.2-SNAPSHOT/maven-metadata.xml 
>
>
> <?xml version="1.0" encoding="UTF-8"?>
> <metadata>
>  <groupId>org.codehaus.xfire</groupId>
>  <artifactId>xfire-core</artifactId>
>  <version>1.2-SNAPSHOT</version>
>  <versioning>
>    <snapshot>
>      <timestamp>20070612.101111</timestamp>
>      <buildNumber>63</buildNumber>
>    </snapshot>
>    <lastUpdated>20070612101133</lastUpdated>
>  </versioning>
> </metadata>
>
> Next, here is an example without Timestamped snapshots.
> Example 4: 
> http://snapshots.repository.codehaus.org/org/codehaus/groovy/groovy/1.1-beta-2-SNAPSHOT/maven-metadata.xml 
>
>
> <?xml version="1.0" encoding="UTF-8"?>
> <metadata>
>  <groupId>org.codehaus.groovy</groupId>
>  <artifactId>groovy</artifactId>
>  <version>1.1-beta-2-SNAPSHOT</version>
>  <versioning>
>    <snapshot>
>      <buildNumber>2</buildNumber>
>    </snapshot>
>    <lastUpdated>20070616042726</lastUpdated>
>  </versioning>
> </metadata>
>
> Next, here is an example of an artifact with Timestamped and non 
> Timestamped artifacts.  To see this you'll need to browse the 
> directory: 
> http://people.apache.org/repo/m2-snapshot-repository/org/apache/cocoon/cocoon-ajax/1-SNAPSHOT/ 
> (of course, this appears to be a case of someone uploading their local 
> repository)
>
> Example 5: 
> http://people.apache.org/repo/m2-snapshot-repository/org/apache/cocoon/cocoon-ajax/1-SNAPSHOT/maven-metadata.xml 
>
>
> <?xml version="1.0" encoding="UTF-8"?>
> <metadata>
>  <groupId>org.apache.cocoon</groupId>
>  <artifactId>cocoon-ajax</artifactId>
>  <version>1-SNAPSHOT</version>
>  <versioning>
>    <snapshot>
>      <timestamp>20060728.031822</timestamp>
>      <buildNumber>10</buildNumber>
>    </snapshot>
>    <lastUpdated>20060728031823</lastUpdated>
>  </versioning>
> </metadata>

I think I missed updating the metadata file at the artifactId/version 
level for the repository purge.
I'll have to create a new JIRA issue for it.

Thanks for putting up these examples :)
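
For the purge case, here is a minimal sketch (not Archiva code) of how the
<snapshot> values could be recomputed from whatever timestamped files remain
in a version directory. It assumes the usual
artifactId-version-timestamp-buildNumber.ext file naming; the function name
latest_snapshot and the regular expression are hypothetical.

```python
import re

# Matches the "-yyyyMMdd.HHmmss-buildNumber" part of a timestamped
# snapshot file name (assumed naming convention, see lead-in).
SNAPSHOT_PART = re.compile(r"-(\d{8}\.\d{6})-(\d+)(?=[.-])")


def latest_snapshot(filenames):
    """Return (timestamp, build_number) of the newest timestamped
    snapshot among the given file names, or None if there is none."""
    found = []
    for name in filenames:
        match = SNAPSHOT_PART.search(name)
        if match:
            # Compare on the build number first, then on the timestamp.
            found.append((int(match.group(2)), match.group(1)))
    if not found:
        return None
    build_number, timestamp = max(found)
    return timestamp, build_number
```

A directory holding only non-timestamped files (as in Example 4) yields None,
so the caller would have to decide separately what to write in that case.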

>
>
> .\ Areas of Concern \.
>
> Alright, I think we have a few areas of focus around this.
>
> 1) The repository consumer run to ensure that a metadata.xml file exists.
> 2) The database consumer run to ensure that the contents of the
>   metadata.xml are sane based on the list of available versions in the
>   database.
> 3) When proxying content from a remote repository for a released
>   artifact, the type 1 groupId:artifactId needs to be updated to reflect
>   the new artifactId:version that was just downloaded.
> 4) When proxying content from a remote repository for a snapshot
>   artifact, the proxy mechanism needs to pull the remote metadata.xml to
>   determine what actual artifactId:version to pull.  Is it timestamped
>   or not?
> 5) When proxying content from a remote repository for a snapshot
>   artifact, the managed repository needs to have the current
>   metadata.xml for remote repository?
> 6) When presenting to the user browsing the repository, do we show what
>   is in the managed repository, or the full list of potential versions
>   from all downstream remote repositories too?
>
>
> .\ Ideas \.
>
> I think it would be a good idea to adopt what happens in the local
> repository now, and have maven-metadata-${remote_repo_id}.xml files in
> the managed repository that the proxy mechanism keeps up to date, and
> the separate merge mechanism utilizes to keep the managed repository
> metadata.xml as accurate as possible with all potential versions
> available.
>
> WDYT?
>

+1 to this. I've just read the discussion between you and Brett, and 
everything seems to have been ironed out :)
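
As a rough illustration of the merge step, the sketch below takes the
contents of several maven-metadata-${remote_repo_id}.xml files and returns
the union of their <version> entries. This is only an assumption about how
such a merge might look, not the actual Archiva implementation; the function
name merge_versions is hypothetical, and the result keeps first-seen order
rather than sorting by version.

```python
import xml.etree.ElementTree as ET


def merge_versions(metadata_documents):
    """Union the <version> entries found under
    /metadata/versioning/versions in each document,
    keeping the order in which versions are first seen."""
    merged = []
    for document in metadata_documents:
        root = ET.fromstring(document)
        for element in root.findall("./versioning/versions/version"):
            version = (element.text or "").strip()
            if version and version not in merged:
                merged.append(version)
    return merged
```

Documents without a <versions> list, such as the snapshot metadata in
Examples 3 to 5, simply contribute nothing to the result.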

> -- 
> - Joakim Erdfelt
>  joakime@apache.org
>  joakim@erdfelt.com
>

Thanks,
Deng