Posted to dev@ace.apache.org by Wayne Nelson <wn...@proofpoint.com> on 2014/05/30 00:12:33 UTC

Scalable and highly available data store

The org.apache.ace.repository.Repository interface is an abstraction over the persistent data store. However, it doesn't appear to support arbitrary queries. Instead, you must call checkout() and unmarshal an XML document from the returned InputStream. This means that the entire data store must be able to reside in memory.

This doesn't seem like it will scale well to hundreds of thousands of targets in a single data store.


The commit() method uses optimistic locking, which means that only one thread can update the database at a time.

This seems like we are limited to a single point of failure with a single server serving up all requests from all agents.


Are these conclusions correct, or did I miss something?




Thanks,

Wayne Nelson l Senior Director Engineering
Proofpoint, Inc.
M: 801-633-0587
E: wnelson@proofpoint.com
<http://www.proofpoint.com/>
threat protection l compliance l archiving & governance l secure communication


Re: Scalable and highly available data store

Posted by Marcel Offermans <ma...@luminis.eu>.
Hello Wayne,

On 30 May 2014, at 0:12 am, Wayne Nelson <wn...@proofpoint.com> wrote:

> The org.apache.ace.repository.Repository interface is an abstraction over the persistent data store. However, it doesn't appear to support arbitrary queries. Instead, you must call checkout() and unmarshal an XML document from the returned InputStream. This means that the entire data store must be able to reside in memory.

Yes and no.

ACE consists of a "server" and a "client" part (which can be unified into one, but don't have to be).

The ACE server is where the targets connect. When a target checks for updates, the server uses a SAX parser to extract only the parts of the repository it needs for that request, so it never has to hold the whole document in memory.
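A minimal illustration of that streaming approach (the XML shape and element names here are invented for the example, not the actual ACE repository format):

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SaxTargetScan {
    // Count <target> elements in a repository-style XML stream.
    // The SAX handler sees one event at a time, so memory use stays
    // constant regardless of how many targets the stream contains.
    static int countTargets(byte[] xml) throws Exception {
        final int[] count = {0};
        SAXParserFactory.newInstance().newSAXParser().parse(
            new ByteArrayInputStream(xml),
            new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                        String qName, Attributes attrs) {
                    if ("target".equals(qName)) {
                        count[0]++;
                    }
                }
            });
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        String xml = "<repository>"
                + "<target id=\"t1\"/><target id=\"t2\"/>"
                + "</repository>";
        System.out.println(countTargets(xml.getBytes(StandardCharsets.UTF_8)));
        // prints "2"
    }
}
```

The same pattern applies to picking out just the entries relevant to one target's update request instead of counting them.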

The ACE client is where you manipulate the configuration. There the whole repository is indeed unmarshalled into memory, so the number of targets is limited by available memory on that side.

> This doesn't seem like it will scale well to hundreds of thousands of targets in a single data store.

Probably not, no, but you can partition the targets into separate stores (we have a mechanism that is similar to multitenancy for that, you can create different instances of the stores for specific "customers").

> The commit() method uses optimistic locking, which means that only one thread can update the database at a time.

That is correct. The current client has no way to "merge" changes that were committed by a different client in the meantime, so if a commit is rejected, all a client can do is perform a fresh checkout and then reapply the changes it wanted to make.
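That checkout-and-retry pattern can be sketched as follows. This is a simplified stand-in, not the actual ACE API: the real Repository works on InputStreams of XML, while this hypothetical Repo just stores a string next to a version counter.

```java
public class OptimisticCommit {
    // Hypothetical versioned store simulating optimistic locking:
    // a commit succeeds only if nobody else committed since the
    // version the caller checked out.
    static class Repo {
        private String data = "";
        private long version = 0;

        synchronized long version() { return version; }
        synchronized String checkout() { return data; }

        synchronized boolean commit(String newData, long fromVersion) {
            if (fromVersion != version) {
                return false; // a concurrent commit won; caller must retry
            }
            data = newData;
            version++;
            return true;
        }
    }

    // Checkout, apply the change, and retry from a fresh checkout
    // whenever another client committed in between.
    static void update(Repo repo, String change) {
        while (true) {
            long v = repo.version();
            String base = repo.checkout();
            if (repo.commit(base + change, v)) {
                return;
            }
        }
    }

    public static void main(String[] args) {
        Repo repo = new Repo();
        update(repo, "bundle-1;");
        update(repo, "bundle-2;");
        System.out.println(repo.version() + " " + repo.checkout());
        // prints "2 bundle-1;bundle-2;"
    }
}
```

Note that the retry loop reapplies the change against the freshly checked-out state, which is exactly the manual "new checkout, then apply again" step described above.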

> This seems like we are limited to a single point of failure with a single server serving up all requests from all agents.

When agents talk to the server, there is no locking at all. Locking is only necessary when updating the configuration, so for example when you upload new bundles, or add new targets/agents. Also, the repositories can be replicated, so agents can connect to any server.

Greetings, Marcel