Posted to dev@archiva.apache.org by Brett Porter <br...@apache.org> on 2008/12/01 04:13:59 UTC
progress on database decoupling
Hi,
Just a short note - in line with the previous discussion we've had
about decoupling the database such that Archiva will run without it
(but can use it for additional stats, etc through a plugin), and
setting up an extensible metadata format, I've continued the work
under MRM-1025.
See: http://svn.apache.org/viewvc/archiva/branches/MRM-1025/
and: http://cwiki.apache.org/confluence/display/ARCHIVA/Metadata+storage
Any comments, questions, volunteers? :)
- Brett
--
Brett Porter
brett@apache.org
http://blogs.exist.com/bporter/
Re: progress on database decoupling
Posted by Brett Porter <br...@apache.org>.
On 01/12/2008, at 4:59 PM, Rahul Thakur wrote:
> Hi Brett,
>
> Just had a quick look.
>
> What is the minimum JDK requirement for this - JDK 5.0?
Yep, we're using that actively now, though some of the code is
catching up.
- Brett
>
>
> I noticed ProjectModelDAO#queryProjectModels(... while similar methods
> ArtifactDAO#queryArtifacts(..)
> RepositoryProblemDAO#queryRepositoryProblems(..)
>
> do not.
>
> Cheers,
> Rahul
>
--
Brett Porter
brett@apache.org
http://blogs.exist.com/bporter/
Re: progress on database decoupling
Posted by Rahul Thakur <ra...@gmail.com>.
Hi Brett,
Just had a quick look.
What is the minimum JDK requirement for this - JDK 5.0?
I noticed ProjectModelDAO#queryProjectModels(... while similar methods
ArtifactDAO#queryArtifacts(..)
RepositoryProblemDAO#queryRepositoryProblems(..)
do not.
Cheers,
Rahul
Re: progress on database decoupling
Posted by Joakim Erdfelt <jo...@gmail.com>.
One thing I think Brett failed to mention is that this decoupling is
just a step towards having the database as an optional component via
the plugin system being worked on by James.
The database is just moving from being a core component to being an
optional component.
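As a rough illustration of that core-versus-optional split (a sketch only: the names Repository, StatsPlugin and artifact_added are hypothetical and are not taken from Archiva or from the plugin system James is working on), the core keeps working with no plugin attached, and a stats component only listens when it is registered:

```python
import os


class Repository:
    """Hypothetical core component: usable with no plugins registered."""

    def __init__(self):
        self._artifacts = []
        self._plugins = []  # optional components, empty by default

    def register(self, plugin):
        """Attach an optional component that wants to hear about new artifacts."""
        self._plugins.append(plugin)

    def add_artifact(self, path):
        """Record the artifact, then notify whichever plugins are present."""
        self._artifacts.append(path)
        for plugin in self._plugins:
            plugin.artifact_added(path)

    def list_artifacts(self):
        return list(self._artifacts)


class StatsPlugin:
    """Hypothetical optional component: counts artifacts by file extension."""

    def __init__(self):
        self.counts = {}

    def artifact_added(self, path):
        extension = os.path.splitext(path)[1].lstrip(".")
        self.counts[extension] = self.counts.get(extension, 0) + 1
```

The point of the shape is that nothing in Repository depends on StatsPlugin existing; removing the plugin removes the extra stats and nothing else.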
- Joakim
On Mon, Dec 1, 2008 at 7:25 AM, Brett Porter <br...@apache.org> wrote:
>
> On 01/12/2008, at 7:17 PM, Brett Porter wrote:
>
>> There is one particular reference to a thread at the bottom of the wiki
>> page linked below, but the main reference thread would be the target
>> architecture one [1] (I'm not sure why Markmail has stopped detecting
>> threads though...).
>>
>> It is not so much to remove, but decouple so that it will run with basic
>> functionality without the database.
>>
>> That theme is probably scattered, so I can summarise:
>> - Derby takes quite a lot of memory which is a potential hindrance to
>> running your own instance
>> - the performance of populating the database has been poor on a large
>> repository
>
> just to attempt to quantify this, the preliminary results are (37938
> artifacts):
> - current scan: 10 Minutes 54 Seconds (update database, including generating
> checksums)
> - alternate scan: 35 seconds (not generating checksums), 2 Minutes 55
> Seconds (generating checksums)
>
> Not highly scientific - and once fleshed out the metadata writing might
> increase marginally - but I think the magnitude of difference is clear :)
>
> We can also get a decent percentage win just by deferring all of the bits
> that need to read the entire file contents (checksums, jarinfo) to a later
> time, and generate it all at once if possible.
>
>>
>> - harder to diagnose problems when the database is not in a consistent
>> state
>> - we don't particularly take advantage of the "robustness, reliability and
>> scalability" of the database as it effectively acts as a cache for the local
>> storage, doesn't handle concurrent servers, etc.
>>
>> More importantly, there are a number of things about the current design
>> (not necessarily the database) that are a barrier to contribution IMO. Some
>> parts are quite tightly coupled, and the database code is mixed in to the
>> model. There is a mix of using paths and artifact references which causes a
>> lot of back and forward conversions, and some Maven concepts are baked in
>> that don't make sense for other repository types. The over-reliance on
>> scanning which is a hangover from the very first code I checked in is
>> biting us worst of all I think.
>>
>> I hope this all makes sense :)
>>
>> Cheers,
>> Brett
>>
>> [1] http://markmail.org/message/6o6byzjsccgzgkmr
>>
>>
>> On 01/12/2008, at 2:24 PM, Martin Cooper wrote:
>>
>>> Hey Brett,
>>>
>>> Do you have a handy link to the previous discussions you mention? I'm
>>> curious as to why someone would elect to give up the robustness,
>>> reliability and scalability of a database, since I would have counted
>>> those as assets rather than something to work to remove.
>>>
>>> Thanks!
>>>
>>> --
>>> Martin Cooper
>>>
>
> --
> Brett Porter
> brett@apache.org
> http://blogs.exist.com/bporter/
>
>
Re: progress on database decoupling
Posted by Brett Porter <br...@apache.org>.
On 01/12/2008, at 7:17 PM, Brett Porter wrote:
> There is one particular reference to a thread at the bottom of the
> wiki page linked below, but the main reference thread would be the
> target architecture one [1] (I'm not sure why Markmail has stopped
> detecting threads though...).
>
> It is not so much to remove, but decouple so that it will run with
> basic functionality without the database.
>
> That theme is probably scattered, so I can summarise:
> - Derby takes quite a lot of memory which is a potential hindrance
> to running your own instance
> - the performance of populating the database has been poor on a
> large repository
Just to attempt to quantify this, the preliminary results (37938
artifacts) are:
- current scan: 10 minutes 54 seconds (update database, including
generating checksums)
- alternate scan: 35 seconds (not generating checksums), 2 minutes 55
seconds (generating checksums)
Not highly scientific - and once fleshed out, the metadata writing
might increase the time marginally - but I think the magnitude of the
difference is clear :)
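To put rough numbers on that gap, here is a quick sketch of the arithmetic. Only the artifact count and the three timings come from the figures above; the helper names are made up for illustration, and the ratios are no more scientific than the measurements behind them.

```python
ARTIFACTS = 37938  # artifact count from the preliminary results


def seconds(minutes, secs):
    """Convert a minutes/seconds timing into whole seconds."""
    return minutes * 60 + secs


def speedup(baseline, candidate):
    """How many times faster the candidate is than the baseline."""
    return baseline / candidate


def throughput(elapsed):
    """Artifacts processed per second for a given elapsed time."""
    return ARTIFACTS / elapsed


timings = {
    "current scan (database, checksums)": seconds(10, 54),
    "alternate scan (checksums)": seconds(2, 55),
    "alternate scan (no checksums)": seconds(0, 35),
}

baseline = timings["current scan (database, checksums)"]
for label, elapsed in timings.items():
    print(
        f"{label}: {elapsed}s, "
        f"{throughput(elapsed):.0f} artifacts/s, "
        f"{speedup(baseline, elapsed):.1f}x vs current"
    )
```

On these figures the alternate scan is roughly 3.7 times faster with checksums and roughly 18.7 times faster without them.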
We can also get a decent percentage win just by deferring all of the
bits that need to read the entire file contents (checksums, jarinfo)
to a later time, and generating them all at once if possible.
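A minimal sketch of the "all at once" idea, assuming the content-derived data is just MD5 and SHA-1 checksums: each file is read a single time and every digest is fed from the same chunks, and the work is run as a later batch rather than during the scan. The function names and the choice of algorithms are assumptions for illustration, not what the branch actually does.

```python
import hashlib


def checksums_in_one_pass(path, algorithms=("md5", "sha1"), chunk_size=64 * 1024):
    """Read the file once and update every requested digest from each chunk."""
    digests = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            for digest in digests.values():
                digest.update(chunk)
    return {name: digest.hexdigest() for name, digest in digests.items()}


def process_deferred(paths):
    """Run the content-reading work for every queued path in one later batch."""
    return {path: checksums_in_one_pass(path) for path in paths}
```

The scan itself would then only need to queue paths, and anything else that needs the full file contents could hang off the same read loop.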
>
> - harder to diagnose problems when the database is not in a
> consistent state
> - we don't particularly take advantage of the "robustness,
> reliability and scalability" of the database as it effectively acts
> as a cache for the local storage, doesn't handle concurrent servers,
> etc.
>
> More importantly, there are a number of things about the current
> design (not necessarily the database) that are a barrier to
> contribution IMO. Some parts are quite tightly coupled, and the
> database code is mixed in to the model. There is a mix of using
> paths and artifact references which causes a lot of back and forward
> conversions, and some Maven concepts are baked in that don't make
> sense for other repository types. The over-reliance on scanning
> which is a hangover from the very first code I checked in is biting
> us worst of all I think.
>
> I hope this all makes sense :)
>
> Cheers,
> Brett
>
> [1] http://markmail.org/message/6o6byzjsccgzgkmr
>
>
> On 01/12/2008, at 2:24 PM, Martin Cooper wrote:
>
>> Hey Brett,
>>
>> Do you have a handy link to the previous discussions you mention? I'm
>> curious as to why someone would elect to give up the robustness,
>> reliability and scalability of a database, since I would have counted
>> those as assets rather than something to work to remove.
>>
>> Thanks!
>>
>> --
>> Martin Cooper
>>
--
Brett Porter
brett@apache.org
http://blogs.exist.com/bporter/
Re: progress on database decoupling
Posted by Brett Porter <br...@apache.org>.
There is one particular reference to a thread at the bottom of the
wiki page linked below, but the main reference thread would be the
target architecture one [1] (I'm not sure why Markmail has stopped
detecting threads though...).
The aim is not so much to remove the database as to decouple it, so
that Archiva will run with basic functionality without it.
That theme is probably scattered, so I can summarise:
- Derby takes quite a lot of memory, which is a potential hindrance to
running your own instance
- the performance of populating the database has been poor on a large
repository
- harder to diagnose problems when the database is not in a consistent
state
- we don't particularly take advantage of the "robustness, reliability
and scalability" of the database as it effectively acts as a cache for
the local storage, doesn't handle concurrent servers, etc.
More importantly, there are a number of things about the current
design (not necessarily the database) that are a barrier to
contribution IMO. Some parts are quite tightly coupled, and the
database code is mixed into the model. There is a mix of using paths
and artifact references, which causes a lot of back-and-forth
conversions, and some Maven concepts are baked in that don't make
sense for other repository types. The over-reliance on scanning, which
is a hangover from the very first code I checked in, is biting us
worst of all, I think.
I hope this all makes sense :)
Cheers,
Brett
[1] http://markmail.org/message/6o6byzjsccgzgkmr
On 01/12/2008, at 2:24 PM, Martin Cooper wrote:
> Hey Brett,
>
> Do you have a handy link to the previous discussions you mention? I'm
> curious as to why someone would elect to give up the robustness,
> reliability and scalability of a database, since I would have counted
> those as assets rather than something to work to remove.
>
> Thanks!
>
> --
> Martin Cooper
>
--
Brett Porter
brett@apache.org
http://blogs.exist.com/bporter/
Re: progress on database decoupling
Posted by Martin Cooper <ma...@apache.org>.
Hey Brett,
Do you have a handy link to the previous discussions you mention? I'm
curious as to why someone would elect to give up the robustness, reliability
and scalability of a database, since I would have counted those as assets
rather than something to work to remove.
Thanks!
--
Martin Cooper