Posted to dev@archiva.apache.org by Brett Porter <br...@apache.org> on 2010/02/16 15:08:54 UTC

work left to do on MRM-1025 branch

Hi,

It's been a long time in the works, but this branch is now stable enough that I'd like to merge it to trunk before it diverges any further (I'll start a separate vote thread given the size). It is big enough already and I'd like to avoid a 'code bomb' that nobody can understand without re-learning the whole codebase.

I believe with some work we can incorporate it into 1.4. The main goal has been completed, and then some - not just making the database optional, but removing it and JPOX altogether without loss of functionality (at least for Archiva itself - the users database is unchanged). In addition, I have started to move towards the target architecture we talked about, splitting some functionality out (maven2 specifics support, metadata storage providers) into plugins, and introducing the metadata model and repository API that can be used to tie everything together. All going well, things should look much the same as they did, but quite a bit faster and more predictable.

The documentation to review is here: http://archiva.apache.org/ref/1.4-SNAPSHOT/ (just 5 pages for now). We can continue the other thread to decide if this should be in SVN or the wiki.

As I said, there is still work to do, which I'll list briefly. If we go ahead, I will push these into JIRA.

* The biggest thing to look at is metadata-repository-file. I threw this together with property files quickly and there's no optimisation or even exception handling. We need to look at the right way to approach this - a more robust implementation of a file system store (properties or xml) is definitely workable, but would need to be combined with something like a Lucene index (as in Archiva 0.9) to make some of the operations fast enough. What I would like to look at instead is using JCR (with file system persistence - not a database!) to see how well it reacts to a lot of operations. As you can tell from the docs, the storage is tailored to living in a hierarchical content repository in whatever form that takes, and the storage is isolated behind an API.

* Along those lines, the content model needs some revision, as there are still Maven specifics baked into some areas.

* I didn't spend a lot of time on Javadoc because the APIs were changing. I'd like to go back and ensure that it is extremely high quality on the new code. There are some more things that can be done to refine the API as well once a couple more systems are using it and we have the metadata storage finalised.

* Deletion detection in scanning is still missing. It would now be quite easy to add back; I just didn't get around to it.

* There are still a number of other "TODO" markers in the code. As I was focusing on making the important bits work, I left some to come back to. These aren't missing features, but small improvements we should either make or decide not to do after all.

* While it should be very straightforward, we need to test some scenarios of upgrading from Archiva 1.3.

* Finally, we need to go through JIRA and close out all the database related issues and re-test things that this may have corrected. I managed to fix several bugs in the process already.
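
On the metadata-repository-file point above, a minimal sketch of what a slightly more defensive properties-file store hidden behind a small API might look like. This is illustrative only: `MetadataStore`, `PropertiesMetadataStore` and the `metadata.properties` file name are hypothetical, not the branch's actual classes, and the code assumes a current Java runtime (`java.nio.file`, `Optional`) rather than the one available at the time.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Optional;
import java.util.Properties;

/** Hypothetical storage API; callers never see how or where values are kept. */
interface MetadataStore {
    void put(String namespace, String key, String value);

    Optional<String> get(String namespace, String key);
}

/** Hypothetical file-system implementation: one properties file per namespace directory. */
final class PropertiesMetadataStore implements MetadataStore {
    private final Path root;

    PropertiesMetadataStore(Path root) {
        this.root = root;
    }

    private Path fileFor(String namespace) {
        // "org.example" maps to <root>/org/example/metadata.properties
        return root.resolve(namespace.replace('.', '/')).resolve("metadata.properties");
    }

    private Properties load(Path file) {
        Properties props = new Properties();
        if (Files.exists(file)) {
            try (InputStream in = Files.newInputStream(file)) {
                props.load(in);
            } catch (IOException e) {
                // Report the failing file instead of silently returning partial data.
                throw new UncheckedIOException("Unable to read " + file, e);
            }
        }
        return props;
    }

    @Override
    public synchronized void put(String namespace, String key, String value) {
        Path file = fileFor(namespace);
        Properties props = load(file);
        props.setProperty(key, value);
        try {
            Files.createDirectories(file.getParent());
            // Write to a temporary file first so a failed write cannot truncate existing data.
            Path tmp = Files.createTempFile(file.getParent(), "metadata", ".tmp");
            try (OutputStream out = Files.newOutputStream(tmp)) {
                props.store(out, null);
            }
            Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException("Unable to write " + file, e);
        }
    }

    @Override
    public synchronized Optional<String> get(String namespace, String key) {
        return Optional.ofNullable(load(fileFor(namespace)).getProperty(key));
    }
}
```

Because callers only see the interface, a JCR-backed or indexed implementation could be swapped in later without touching them. The sketch re-reads the file on every call, so it does nothing for the speed concern raised above.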

Beyond a 1.4 release, the types of things we might do with it are:
- Refactoring to remove repository-layer and archiva-model. While the new design is similar in intent, it has significantly less of the coupling that grew over time, less reliance on scanning, and a model that is meant to be extensible. They co-operate just fine right now, but it would benefit us to move it over quickly so that the code is more approachable.
- In particular, moving the proxy module and webdav modules (including groups) over to the API will help ensure it is mature enough for such use cases.
- Changing redback to have a simple, non-database storage as well (which is all 90% of users will need)

Any comments or questions on the above?

Thanks,
Brett

--
Brett Porter
brett@apache.org
http://brettporter.wordpress.com/

Re: work left to do on MRM-1025 branch

Posted by Brett Porter <br...@apache.org>.

On 19/02/2010, at 2:45 AM, Deng Ching wrote:

> Does the artifact info in the content model get updated when it is changed?
> I tried deploying an artifact in the pre-configured 'internal' repo and got
> this warning when I browsed to the artifact info page:

At the moment there isn't any timestamp checking: the artifact info is only updated from the storage if it is known to be incomplete or missing. The message is actually misleading; the 'consumer' version you are using doesn't file a problem report. Basically, that message means there wasn't a valid POM. You can check your logs to see the problem.

> 
> "The model may be incomplete due to a previous error in resolving
> information.."
> 
> From the repo problem report, the cause was that the path to the artifact
> cannot be resolved because of the repo's path set which is
> './data/repositories/internal'.

This sounds like an additional bug. I've noted them both for now.

> After changing it to an absolute path, I
> re-deployed the artifact again then ran the scanner (with Process All
> Artifacts set to true) but after browsing to the artifact info page, I still
> get the same warning and an empty groupId, artifactId and packaging so I'm
> wondering if the content model gets updated and how is it done?
> 
> Thanks,
> Deng

--
Brett Porter
brett@apache.org
http://brettporter.wordpress.com/

Re: work left to do on MRM-1025 branch

Posted by Deng Ching <oc...@apache.org>.

Follow up question :)

On Tue, Feb 16, 2010 at 10:08 PM, Brett Porter <br...@apache.org> wrote:

> Hi,
>
> It's been a long time in the works, but this branch is now at the point
> that it is stable enough that I'd like to merge it to trunk before it
> further diverges from the current trunk (I'll start a separate vote thread
> given the size). It is big enough already and I'd like to avoid a 'code
> bomb' that nobody can understand without re-learning the whole codebase.
>
> I believe with some work we can incorporate it into 1.4. The main goal has
> been completed, and then some - not just making the database optional, but
> removing it and JPOX altogether without loss of functionality (at least for
> Archiva itself - the users database is unchanged). In addition, I have
> started to move towards the target architecture we talked about, splitting
> some functionality out (maven2 specifics support, metadata storage
> providers) into plugins, and introducing the metadata model and repository
> API that can be used to tie everything together. All going well, things
> should look much the same as they did, but quite a bit faster and more
> predictable.
>
> The documentation to review is here:
> http://archiva.apache.org/ref/1.4-SNAPSHOT/ (just 5 pages for now). We can
> continue the other thread to decide if this should be in SVN or the wiki.
>

Does the artifact info in the content model get updated when it is changed?
I tried deploying an artifact in the pre-configured 'internal' repo and got
this warning when I browsed to the artifact info page:

"The model may be incomplete due to a previous error in resolving
information.."

From the repo problem report, the cause was that the path to the artifact
cannot be resolved because of the repo's path set which is
'./data/repositories/internal'. After changing it to an absolute path, I
re-deployed the artifact again then ran the scanner (with Process All
Artifacts set to true) but after browsing to the artifact info page, I still
get the same warning and an empty groupId, artifactId and packaging so I'm
wondering if the content model gets updated and how is it done?
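
One plausible way a relative location such as './data/repositories/internal' can misbehave (not confirmed as the cause in this report) is that it gets resolved against whatever directory the JVM was started from. The sketch below contrasts that with resolving against an explicit base directory. `RepoPaths`, `resolveLocation` and the `/opt/archiva` base are hypothetical names used only for illustration.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class RepoPaths {
    private RepoPaths() {
    }

    /**
     * Resolves a configured repository location against an explicit base directory.
     * An absolute location is returned as-is (normalized); a relative one is
     * anchored at baseDir instead of the JVM's working directory.
     */
    static Path resolveLocation(Path baseDir, String configured) {
        return baseDir.resolve(configured).normalize();
    }

    public static void main(String[] args) {
        String configured = "./data/repositories/internal";
        // Result depends on where the JVM happened to be started:
        System.out.println(Paths.get(configured).toAbsolutePath().normalize());
        // Result is the same regardless of the working directory:
        System.out.println(resolveLocation(Paths.get("/opt/archiva"), configured));
    }
}
```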

Thanks,
Deng

Re: work left to do on MRM-1025 branch

Posted by Brett Porter <br...@apache.org>.


On 19/02/2010, at 1:59 AM, Deng Ching wrote:

> Hi Brett,
> 
> I'm getting test failures in MRM-1025 branch in
> archiva-modules/plugins/maven2-repository module. From the surefire-reports
> of Maven2RepositoryMetadataResolverTest, it looks like an ordering problem
> in the results being compared in one of the assertions..


Should be fixed in r911453

- Brett

--
Brett Porter
brett@apache.org
http://brettporter.wordpress.com/

Re: work left to do on MRM-1025 branch

Posted by Deng Ching <oc...@apache.org>.

Hi Brett,

I'm getting test failures in MRM-1025 branch in
archiva-modules/plugins/maven2-repository module. From the surefire-reports
of Maven2RepositoryMetadataResolverTest, it looks like an ordering problem
in the results being compared in one of the assertions..
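
For readers unfamiliar with this kind of failure: when results come back in an unspecified order, a common remedy is to compare them without regard to order. The sketch below shows one such comparison. It is not a description of how the test was actually fixed, and `OrderInsensitive` and `sameElements` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

final class OrderInsensitive {
    private OrderInsensitive() {
    }

    /** True when both collections hold the same elements with the same multiplicity, in any order. */
    static <T extends Comparable<T>> boolean sameElements(Collection<T> expected, Collection<T> actual) {
        // Sort copies so the callers' collections are left untouched.
        List<T> left = new ArrayList<>(expected);
        List<T> right = new ArrayList<>(actual);
        Collections.sort(left);
        Collections.sort(right);
        return left.equals(right);
    }
}
```

A test could then assert `sameElements(expected, actual)` rather than comparing two lists directly.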

-Deng

On Thu, Feb 18, 2010 at 10:47 PM, Deng Ching <oc...@apache.org> wrote:

> On Wed, Feb 17, 2010 at 6:44 AM, Brett Porter <br...@apache.org> wrote:
>
>>
>> On 17/02/2010, at 3:28 AM, Deng Ching wrote:
>>
>> >
>> > In the Configuration section in
>> > http://archiva.apache.org/ref/1.4-SNAPSHOT/metadata-content-model.html, it
>> > was mentioned that the config should be shadowed to a file in the file
>> > system. Is this a one-to-one relationship (e.g. one repository == one
>> > config file)?
>>
>> Not necessarily. I expect this would continue to use the same file as now,
>> but we can probably simplify the handling quite a bit.
>>
>>
> Ah, ok :) I thought it's going to be a different file from the archiva
> config.
>
>
>> >>
>> >> * Deletion detection in scanning is still missing, though it would now
>> >> be quite easy to add back, I didn't get around to it.
>> >>
>> >
>> > In trunk now and in the previous releases, new artifacts are detected
>> > based on its modified date and the last repo scan.. I assume it's still
>> > the same?
>>
>> Yes, no change to the scanning at present, the last repo scans are
>> recorded in the content repository. However, the deletion detection was in
>> the "database scanning" that is no longer present. While everything else in
>> there was migrated to a different solution, this needs to be added back to
>> go through the content repository and find stale nodes. The reason I didn't
>> do it yet was that I was considering whether to introduce this as a properly
>> scheduled, separate, service, or to just take the easier option for now and
>> tack it on to the end of the repository scan.
>>
>
> Makes sense..
>
> Thanks,
> Deng
>
>
>> >>
>> >> * While it should be very straightforward, we need to test some
>> >> scenarios of upgrading from Archiva 1.3.
>> >>
>> >
>> > ..and also upgrading from the lower versions :)
>>
>> Good point!
>>
>> > I haven't finished reviewing everything and I'm pretty tired (brain
>> > isn't working properly now) so I'll just continue looking over this
>> > tomorrow :)
>>
>> Thanks!
>>
>> - Brett
>>
>> --
>> Brett Porter
>> brett@apache.org
>> http://brettporter.wordpress.com/
>>
>>
>>
>>
>>
>

Re: work left to do on MRM-1025 branch

Posted by Deng Ching <oc...@apache.org>.

On Wed, Feb 17, 2010 at 6:44 AM, Brett Porter <br...@apache.org> wrote:

>
> On 17/02/2010, at 3:28 AM, Deng Ching wrote:
>
> >
> > In the Configuration section in
> > http://archiva.apache.org/ref/1.4-SNAPSHOT/metadata-content-model.html ,
> > it was mentioned that the config should be shadowed to a file in the
> > file system. Is this a one-to-one relationship (e.g. one repository ==
> > one config file)?
>
> Not necessarily. I expect this would continue to use the same file as now,
> but we can probably simplify the handling quite a bit.
>
>
Ah, ok :) I thought it's going to be a different file from the archiva
config.


> >>
> >> * Deletion detection in scanning is still missing, though it would now
> >> be quite easy to add back, I didn't get around to it.
> >>
> >
> > In trunk now and in the previous releases, new artifacts are detected
> > based on its modified date and the last repo scan.. I assume it's still
> > the same?
>
> Yes, no change to the scanning at present, the last repo scans are recorded
> in the content repository. However, the deletion detection was in the
> "database scanning" that is no longer present. While everything else in
> there was migrated to a different solution, this needs to be added back to
> go through the content repository and find stale nodes. The reason I didn't
> do it yet was that I was considering whether to introduce this as a properly
> scheduled, separate, service, or to just take the easier option for now and
> tack it on to the end of the repository scan.
>

Makes sense..

Thanks,
Deng


> >>
> >> * While it should be very straightforward, we need to test some
> >> scenarios of upgrading from Archiva 1.3.
> >>
> >
> > ..and also upgrading from the lower versions :)
>
> Good point!
>
> > I haven't finished reviewing everything and I'm pretty tired (brain isn't
> > working properly now) so I'll just continue looking over this tomorrow :)
>
> Thanks!
>
> - Brett
>
> --
> Brett Porter
> brett@apache.org
> http://brettporter.wordpress.com/
>
>
>
>
>

Re: work left to do on MRM-1025 branch

Posted by Brett Porter <br...@apache.org>.

On 17/02/2010, at 3:28 AM, Deng Ching wrote:

> 
> In the Configuration section in
> http://archiva.apache.org/ref/1.4-SNAPSHOT/metadata-content-model.html , it
> was mentioned that the config should be shadowed to a file in the file
> system. Is this a one-to-one relationship (e.g. one repository == one config
> file)?

Not necessarily. I expect this would continue to use the same file as now, but we can probably simplify the handling quite a bit.

>> 
>> * Deletion detection in scanning is still missing, though it would now be
>> quite easy to add back, I didn't get around to it.
>> 
> 
> In trunk now and in the previous releases, new artifacts are detected based
> on its modified date and the last repo scan.. I assume it's still the same?

Yes, there is no change to the scanning at present; the last repo scans are recorded in the content repository. However, the deletion detection was in the "database scanning" that is no longer present. While everything else in there was migrated to a different solution, this needs to be added back to go through the content repository and find stale nodes. The reason I haven't done it yet is that I was considering whether to introduce this as a properly scheduled, separate service, or to just take the easier option for now and tack it on to the end of the repository scan.
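
The "find stale nodes" pass described here could be sketched roughly as follows, under the assumption that the content repository can list the artifact paths it has recorded for a managed repository. `StaleEntryFinder` and `findStale` are hypothetical names, not part of the branch, and the sketch only reports stale entries rather than deleting them.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

final class StaleEntryFinder {
    private StaleEntryFinder() {
    }

    /**
     * Returns the recorded paths (relative to the repository root) whose files
     * no longer exist on disk. The caller decides what to do with the entries.
     */
    static List<String> findStale(Path repositoryRoot, Collection<String> recordedPaths) {
        List<String> stale = new ArrayList<>();
        for (String relative : recordedPaths) {
            if (!Files.exists(repositoryRoot.resolve(relative))) {
                stale.add(relative);
            }
        }
        return stale;
    }
}
```

A method like this could be called either from a separately scheduled task or at the end of a repository scan, which are the two options weighed above.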

>> 
>> * While it should be very straightforward, we need to test some scenarios
>> of upgrading from Archiva 1.3.
>> 
> 
> ..and also upgrading from the lower versions :)

Good point!

> I haven't finished reviewing everything and I'm pretty tired (brain isn't
> working properly now) so I'll just continue looking over this tomorrow :)

Thanks!

- Brett

--
Brett Porter
brett@apache.org
http://brettporter.wordpress.com/

Re: work left to do on MRM-1025 branch

Posted by Deng Ching <oc...@apache.org>.

On Tue, Feb 16, 2010 at 10:08 PM, Brett Porter <br...@apache.org> wrote:

> Hi,
>
> It's been a long time in the works, but this branch is now at the point
> that it is stable enough that I'd like to merge it to trunk before it
> further diverges from the current trunk (I'll start a separate vote thread
> given the size). It is big enough already and I'd like to avoid a 'code
> bomb' that nobody can understand without re-learning the whole codebase.
>
> I believe with some work we can incorporate it into 1.4. The main goal has
> been completed, and then some - not just making the database optional, but
> removing it and JPOX altogether without loss of functionality (at least for
> Archiva itself - the users database is unchanged). In addition, I have
> started to move towards the target architecture we talked about, splitting
> some functionality out (maven2 specifics support, metadata storage
> providers) into plugins, and introducing the metadata model and repository
> API that can be used to tie everything together. All going well, things
> should look much the same as they did, but quite a bit faster and more
> predictable.
>

\o/


>
> The documentation to review is here:
> http://archiva.apache.org/ref/1.4-SNAPSHOT/ (just 5 pages for now). We can
> continue the other thread to decide if this should be in SVN or the wiki.
>

In the Configuration section in
http://archiva.apache.org/ref/1.4-SNAPSHOT/metadata-content-model.html , it
was mentioned that the config should be shadowed to a file in the file
system. Is this a one-to-one relationship (e.g. one repository == one config
file)?


>
> As I said, there is still work to do, which I'll list briefly. If we go
> ahead, I will push these into JIRA.
>
> * The biggest thing to look at is metadata-repository-file. I threw this
> together with property files quickly and there's no optimisation or even
> exception handling. We need to look at the right way to approach this - a
> more robust implementation of a file system store (properties or xml) is
> definitely workable, but would need to be combined with something like a
> Lucene index (as in Archiva 0.9) to make some of the operations fast enough.
> What I would like to look at instead is using JCR (with file system
> persistence - not a database!) to see how well it reacts to a lot of
> operations. As you can tell from the docs, the storage is tailored to living
> in a hierarchical content repository in whatever form that takes, and the
> storage is isolated behind an API.
>
> * Along those lines, the content model needs some revision, as there are
> still Maven specifics baked in to some areas.
>
> * I didn't spend a lot of time on Javadoc because the APIs were changing.
> I'd like to go back and ensure that it is extremely high quality on the new
> code. There are some more things that can be done to refine the API as well
> once a couple more systems are using it and we have the metadata storage
> finalised.
>
> * Deletion detection in scanning is still missing, though it would now be
> quite easy to add back, I didn't get around to it.
>

In trunk now and in the previous releases, new artifacts are detected based
on its modified date and the last repo scan.. I assume it's still the same?


>
> * There are a number of other "TODO" markers in the code still - as I was
> focusing on making the important bits work, I left some to come back to.
> It's not missing things, but small improvements we should either make or
> decide not to do after all.
>
> * While it should be very straightforward, we need to test some scenarios
> of upgrading from Archiva 1.3.
>

..and also upgrading from the lower versions :)


>
> * Finally, we need to go through JIRA and close out all the database
> related issues and re-test things that this may have corrected. I managed to
> fix several bugs in the process already.
>

> Beyond a 1.4 release, the types of things we might do with it are:
> - refactoring to remove repository-layer and archiva-model. While the new
> design is similar in intent, there is significantly less coupling from what
> grew over time, less reliance on scanning, and the model is meant to be
> extensible. They co-operate just fine right now, but it would benefit us to
> move it over quickly so that the code is more approachable.
> - In particular, moving the proxy module and webdav modules (including
> groups) over to the API will help ensure it is mature enough for such use
> cases.
> - Changing redback to have a simple, non-database storage as well (which is
> all 90% of users will need)


> Any comments or questions on the above?
>
> Thanks,
> Brett
>
> --
> Brett Porter
> brett@apache.org
> http://brettporter.wordpress.com/
>
>
I haven't finished reviewing everything and I'm pretty tired (brain isn't
working properly now) so I'll just continue looking over this tomorrow :)

Thanks for all the efforts in this Brett!

-Deng