Posted to dev@archiva.apache.org by kishore g <g....@gmail.com> on 2012/12/12 01:10:20 UTC

Replication and Fault tolerance using Helix

Hi,

I am writing this email to propose a solution for adding replication and fault
tolerance to Archiva. To be honest, my knowledge and understanding of
Archiva is superficial; whatever I have understood about it comes from
reading the docs and my interactions with Olivier.

As of today, Archiva supports neither replication nor automatic failover.
Archiva has two main storage types: the files uploaded to the repository,
which are stored on the file system, and the metadata, which is stored in
Apache Jackrabbit. Archiva also provides a notification mechanism through
which a consumer can be notified of changes in the repository.

In order to have fault tolerance and replication, we can run multiple
Archiva instances for redundancy. One of them is elected as the master and
accepts both writes and reads, while the remaining instances are slaves and
serve only reads. The slaves are notified by the master of every change and
apply those changes locally. When the master dies, one of the slaves becomes
the new master and starts serving writes.

Apache Helix is a newly incubated project that provides the basic building
blocks to add partition management, recovery from failure, and cluster
expansion with ease. I have built a sample prototype showing how one can
build such a replicated file store using Helix. More information can be
found here:
http://helix.incubator.apache.org/recipes/rsync_replicated_file_store.html
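
To give an idea of what the Helix side could look like, here is a minimal
sketch of a participant state model along the lines of the recipe, assuming
the MASTER/SLAVE state model and the Helix participant Java API used there;
the callback bodies are placeholders for Archiva-specific logic, not
existing Archiva code.

import org.apache.helix.NotificationContext;
import org.apache.helix.model.Message;
import org.apache.helix.participant.statemachine.StateModel;
import org.apache.helix.participant.statemachine.StateModelInfo;
import org.apache.helix.participant.statemachine.Transition;

// Helix invokes these callbacks when it elects a new master or demotes an
// instance; the println calls stand in for Archiva-specific actions.
@StateModelInfo(initialState = "OFFLINE", states = "{'MASTER','SLAVE','OFFLINE'}")
public class ArchivaReplicationStateModel extends StateModel {

  @Transition(from = "OFFLINE", to = "SLAVE")
  public void onBecomeSlaveFromOffline(Message message, NotificationContext context) {
    // Placeholder: start consuming change notifications from the current master.
    System.out.println("Became SLAVE: start applying changes from the master");
  }

  @Transition(from = "SLAVE", to = "MASTER")
  public void onBecomeMasterFromSlave(Message message, NotificationContext context) {
    // Placeholder: start accepting writes and publishing change notifications.
    System.out.println("Became MASTER: start serving writes");
  }

  @Transition(from = "MASTER", to = "SLAVE")
  public void onBecomeSlaveFromMaster(Message message, NotificationContext context) {
    // Placeholder: stop serving writes, go back to pulling changes.
    System.out.println("Demoted to SLAVE: stop serving writes");
  }
}

Each Archiva instance would register a factory producing this model with its
HelixManager and connect; Helix then drives master election and promotes one
of the slaves when the master dies, as described above.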

I used rsync for replication and the Apache Commons JCI module to detect
file-system changes, in order to showcase the recipe as a generic use case.
In the case of Archiva, however, one can use the notification mechanism
provided by Archiva consumers to detect changes and the Archiva APIs to
fetch the changed files.

There are many other benefits that come from integrating with Helix. For
example, it allows rolling upgrades without impacting clients, lets you
change the topology dynamically, and supports cluster-wide scheduling and
monitoring of various tasks.
More info on Helix: http://helix.incubator.apache.org/index.html
User list: user-subscribe@helix.incubator.apache.org

I have been interacting with Olivier while building this prototype and he
has provided valuable suggestions. I would be glad to get feedback on this
approach, on whether it makes sense to try an integration with Helix, and on
whether this is something of value to Archiva.

thanks,
Kishore G

Re: Replication and Fault tolerance using Helix

Posted by Olivier Lamy <ol...@apache.org>.
2012/12/21 kishore g <g....@gmail.com>:
> Thanks. A few things to add.
>
> * Replication in the recipe was done using rsync. I am not sure whether
> rsync is OK for Archiva, and it is possible for Archiva not to use rsync.
> As I mentioned in my previous email, Archiva supports a notification
> mechanism in the form of consumers. This means the slave instances can act
> as consumers and the master can push notifications to them. Each
> notification will contain the list of files changed. Once a slave gets the
> list of changes, it can request those files from the master and save them
> locally. The slave will need ACLs that allow it to fetch all the files.
> This would make the setup independent of rsync.
> * I have not considered the user information that is stored in the
> database/LDAP. Is it possible to enhance the notification mechanism to
> notify consumers of changes in user information?
>
> Thoughts?

IMHO we must be able to configure this sync mechanism per repository.
For users, syncing the JDBC/LDAP entries could be complicated (maybe that
can be done later?).
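
Just to illustrate what I have in mind (purely hypothetical, nothing like
this exists in Archiva's configuration model today), a per-repository switch
along these lines:

// Hypothetical per-repository replication settings (illustration only;
// these fields do not exist in Archiva's configuration today).
public class RepositoryReplicationConfig {
  private boolean replicationEnabled = true; // turn the sync on/off per repository
  private String mechanism = "consumer";     // e.g. "consumer" or "rsync"

  // getters/setters omitted
}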

>
> What's the best way to move forward?

We have a sandbox svn path open to all asf committers:
http://svn.apache.org/repos/asf/archiva/sandbox/trunk/
So feel free to hack here :-)

>
> thanks,
> Kishore G



--
Olivier Lamy
Talend: http://coders.talend.com
http://twitter.com/olamy | http://linkedin.com/in/olamy

Re: Replication and Fault tolerance using Helix

Posted by kishore g <g....@gmail.com>.
Thanks. A few things to add.

* Replication in the recipe was done using rsync. I am not sure whether
rsync is OK for Archiva, and it is possible for Archiva not to use rsync. As
I mentioned in my previous email, Archiva supports a notification mechanism
in the form of consumers. This means the slave instances can act as
consumers and the master can push notifications to them. Each notification
will contain the list of files changed. Once a slave gets the list of
changes, it can request those files from the master and save them locally
(see the sketch after this list). The slave will need ACLs that allow it to
fetch all the files. This would make the setup independent of rsync.
* I have not considered the user information that is stored in the
database/LDAP. Is it possible to enhance the notification mechanism to
notify consumers of changes in user information?
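
For illustration, here is a rough sketch of the slave-side pull mentioned in
the first point. SlaveChangeApplier and the flat master URL layout are
made-up names for this example, not existing Archiva APIs; authentication,
deletions and error handling are left out.

import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

// Hypothetical slave-side handler: given the list of changed files from a
// master notification, fetch each file over HTTP and store it locally.
public class SlaveChangeApplier {

  private final String masterRepoUrl; // e.g. "http://master:8080/repository/internal/"
  private final Path localRepoRoot;   // root of the local managed repository

  public SlaveChangeApplier(String masterRepoUrl, Path localRepoRoot) {
    this.masterRepoUrl = masterRepoUrl;
    this.localRepoRoot = localRepoRoot;
  }

  public void applyChanges(List<String> changedPaths) throws Exception {
    for (String relativePath : changedPaths) {
      Path target = localRepoRoot.resolve(relativePath);
      Files.createDirectories(target.getParent());
      // The slave needs ACLs/credentials that allow it to read every file on the master.
      try (InputStream in = new URL(masterRepoUrl + relativePath).openStream()) {
        Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
      }
    }
  }
}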

Thoughts?

What's the best way to move forward?

thanks,
Kishore G




Re: Replication and Fault tolerance using Helix

Posted by Adrien Lecharpentier <ad...@gmail.com>.
IMO, it is a very good idea. I have already seen some configurations like
this, but using a proxy connector to "replicate" artifacts to Archiva
instances close to the users (specific teams/departments). Using Helix could
be a time-saving and great performance-enhancing solution.

-- Adrien


-- 
Adrien Lecharpentier

Re: Replication and Fault tolerance using Helix

Posted by Olivier Lamy <ol...@apache.org>.
Hi Kishore,
Thanks for your email and explanations!

@Others
As I'm involved in the Helix incubator, my idea was to use it for a sync
mechanism for Archiva artifacts: basically a primary/master instance where
users deploy their artifacts, and then n slave instances serving reads of
those artifacts.

Does it make sense to you?




-- 
Olivier Lamy
Talend: http://coders.talend.com
http://twitter.com/olamy | http://linkedin.com/in/olamy