Posted to dev@directory.apache.org by Emmanuel Lecharny <el...@gmail.com> on 2011/08/10 17:42:41 UTC

Replication & activeMQ

Hi guys,

currently, we are using ActiveMQ to store the modifications we send to 
the client. This leads to an issue caused by the way we have configured 
it, simply because all the mods are stored in memory, and never removed.

Obviously, this is bad.

Thinking about it, my opinion is that it may be a bit of overkill
considering our needs:
- when a mod is made on the provider, it has to be sent to the consumer
- in any case, we store the mod in a file associated with the consumer
- we send the mod to the consumer unless we *know* that the consumer is 
offline
- we will have no way to be sure that the consumer has correctly 
received the mod
- when a consumer gets online again, it will send a cookie with the last 
CSN it received
- in this case, we have to get all the mods from the file, and send the 
mods to the consumer.

Now, we are already saving the mods in a file, in the
JournalInterceptor. We just have to implement the recovery system (i.e.,
finding the last entry sent from the file) and send all the following
entries. Then we can delete all the entries older than the requested one.
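
To make the flow above concrete, here is a minimal sketch in Java of a per-consumer journal with the recovery step. It is only an illustration, not the JournalInterceptor code: ConsumerJournal and JournalEntry are hypothetical names, the journal is kept in a list instead of a file, and CSNs are assumed to be plain strings that sort in change order.

```java
import java.util.ArrayList;
import java.util.List;

/** One journal entry: the CSN of a modification plus its serialized form. */
final class JournalEntry {
    final String csn;      // assumption: sorts lexicographically in change order
    final String payload;  // stand-in for the serialized modification

    JournalEntry(String csn, String payload) {
        this.csn = csn;
        this.payload = payload;
    }
}

/** In-memory stand-in for the file associated with one consumer. */
final class ConsumerJournal {
    private final List<JournalEntry> entries = new ArrayList<>();

    /** Always record the mod, whether or not the consumer is online. */
    void append(String csn, String payload) {
        entries.add(new JournalEntry(csn, payload));
    }

    /**
     * Recovery: scan the journal and return every entry newer than the CSN
     * carried by the consumer's cookie, in the order it was written.
     */
    List<JournalEntry> entriesAfter(String lastReceivedCsn) {
        List<JournalEntry> pending = new ArrayList<>();
        for (JournalEntry e : entries) {
            if (e.csn.compareTo(lastReceivedCsn) > 0) {
                pending.add(e);
            }
        }
        return pending;
    }

    /** Delete the entries strictly older than the requested CSN. */
    void purgeOlderThan(String requestedCsn) {
        entries.removeIf(e -> e.csn.compareTo(requestedCsn) < 0);
    }

    int size() {
        return entries.size();
    }
}
```

When a consumer comes back online, the provider would call entriesAfter() with the CSN from the cookie, send the result, and then call purgeOlderThan() with the same CSN. Note that finding the starting point here is a linear scan of the whole journal.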

I don't think that using our own implementation would be an issue here.
I mean, ActiveMQ is a great piece of software, but having to go through
tens of options (hundreds?), most of which are totally useless in our
case, is a bit of overkill for this version.

thoughts ?

-- 
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com

Re: Replication & activeMQ, update

Posted by Emmanuel Lecharny <el...@gmail.com>.

Hi,

so we had an offline convo with Kiran early this morning, while most
of you guys were sleeping :) Let me write down here what we discussed.

We agreed that using ActiveMQ is total overkill. It's like using a
hammer to kill a fly. We need something lighter, and I think the previous
mails already stated that. What we discussed this morning was the
alternative we could use.
Yesterday, I felt a bit depressed about the existing alternative, and as
usual, my first thought was "let's rewrite this damn journal mechanism
from scratch". A typical NIH syndrome, something that is often cured by a
few desperate hours of useless coding, and a good night of sleep.

So this morning, we had a look at KahaDB, the internal journal system
used by ActiveMQ. At first, it sounded interesting, as it's lightweight,
and offers what we need: a journal, file backed, with rotation. The
problem is that we can't easily build a cursor on top of it, as finding
a CSN in this log requires a full scan of the files. Plus the doc is
really light, not to mention the total absence of Javadoc and samples...
(sometimes, I feel like our code is one of the most Javadoc'ed codebases,
ever, in the OSS world :/ )

Then we agreed that what is needed is a CSN index on top of the
journal, as we want to be able to directly find the position of a
specific CSN, and to read all the entries from this CSN onward in
chronological order.
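
A rough sketch, in Java, of what such a CSN index buys us. This is not the server's Index API: IndexedJournal is a hypothetical class, a TreeMap stands in for the index, a list stands in for the journal file (the list position plays the role of a file offset), and CSNs are assumed to be strings that sort chronologically.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Journal with a sorted CSN -> position index on top of it. */
final class IndexedJournal {
    // Stand-in for the journal file: one record per modification.
    private final List<String> records = new ArrayList<>();

    // Sorted index: CSN -> position of the record in the journal.
    private final NavigableMap<String, Integer> csnIndex = new TreeMap<>();

    void append(String csn, String payload) {
        csnIndex.put(csn, records.size());
        records.add(payload);
    }

    /** Direct lookup of a CSN's position, without scanning; -1 if unknown. */
    int positionOf(String csn) {
        Integer pos = csnIndex.get(csn);
        return pos == null ? -1 : pos;
    }

    /**
     * Read the records starting at the given CSN, in CSN order.
     * With inclusive == false the record for the CSN itself is skipped,
     * which is what a consumer that already received it would want.
     */
    List<String> readFrom(String csn, boolean inclusive) {
        List<String> out = new ArrayList<>();
        for (int pos : csnIndex.tailMap(csn, inclusive).values()) {
            out.add(records.get(pos));
        }
        return out;
    }
}
```

The sorted map gives both operations we care about: get() for the direct lookup, and tailMap() as the cursor that walks forward from a given CSN.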

That leads us to the idea of using an Index (the one we define in the
server) to store the journal. The biggest advantage of using this data 
structure is that it already works, we know how to use it, it's tested, 
it's quite efficient, and it can use any kind of underlying storage 
(JDBM, HBase, LDIF files).

We will test this approach today.

thoughts ?

-- 
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com

Re: Replication & activeMQ

Posted by Emmanuel Lecharny <el...@gmail.com>.

On 8/10/11 6:16 PM, Kiran Ayyagari wrote:
>
>> Now, we already are saving the mods in a file, in the JournalInterceptor. We
>> just have to implement the recovery system (ie, finding the last entry sent
>> from the file) and send all the following entries. Then we can delete all
>> the entries older than the requested one.
>>
>> I don't think that using our own implementation would be an issue here. I
>> mean, ActiveMQ is a great piece of software, but having to go through tens
>> of options (hundreds?), most of which are totally useless in our case, is a
>> bit of overkill for this version.
>>
>> thoughts ?
>>
> the syncrepl protocol provides a high level of granularity about the
> kind of data that can be
> replicated, so not each mod in this journal is necessarily replicated
> to a client (it can even be
> a serious issue to send to that client based on the sensitivity of data).
>
> This brings the issue of filtering the data that needs to be sent to
> the client, this requires significant
> time for scanning, processing and maintaining the position pointers in
> the monolithic journal.
Don't get me wrong : we will have one journal per replication consumer. 
The filtering will then be done once, on the provider side.
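
To illustrate the one-journal-per-consumer idea, here is a small Java sketch. Everything in it is hypothetical: Mod and FilteringProvider are made-up names, the journals are in-memory lists, and a simple Predicate stands in for whatever a consumer's syncrepl configuration would actually select.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Minimal stand-in for a modification: its CSN and the DN it touches. */
final class Mod {
    final String csn;
    final String dn;

    Mod(String csn, String dn) {
        this.csn = csn;
        this.dn = dn;
    }
}

/** Provider side: filters each mod once and writes it to the matching journals. */
final class FilteringProvider {
    private final Map<String, Predicate<Mod>> filters = new LinkedHashMap<>();
    private final Map<String, List<Mod>> journals = new LinkedHashMap<>();

    /** Register a consumer with the filter describing what it may receive. */
    void register(String consumerId, Predicate<Mod> filter) {
        filters.put(consumerId, filter);
        journals.put(consumerId, new ArrayList<>());
    }

    /** Called once per modification; only matching consumers get it journaled. */
    void onModification(Mod mod) {
        for (Map.Entry<String, Predicate<Mod>> e : filters.entrySet()) {
            if (e.getValue().test(mod)) {
                journals.get(e.getKey()).add(mod);
            }
        }
    }

    List<Mod> journalOf(String consumerId) {
        return journals.get(consumerId);
    }
}
```

Because the filter runs when the mod is written, replaying a journal after a reconnect needs no further filtering, and a mod a consumer must not see never lands in its journal.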
>
> ActiveMQ's core offers all these features so that was a preferred
> choice instead of writing a journal with all the above
> mentioned features, cause implementing such a journal is quite a
> handful of work and our main problem to be solved
> is replication.
>
> Having said that, ActiveMQ is definitely overkill to be used as a
> journal, the main part that we actually need is
> its journal/store implementation called 'kahadb' but it is easy to
> use through high level message queue interfaces.
>
> I would like to spend some time on 'kahadb' source to see if that can
> be easily embeddable and serves our purpose.
Yes, this is exactly what I was looking at when I saw your mail. That 
may be a very good balance between complexity and ease of use.


-- 
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com

Re: Replication & activeMQ

Posted by Kiran Ayyagari <ka...@apache.org>.

On Wed, Aug 10, 2011 at 9:12 PM, Emmanuel Lecharny <el...@gmail.com> wrote:
> Hi guys,
>
> currently, we are using ActiveMQ to store the modifications we send to the
> client. This leads to an issue caused by the way we have configured it,
> simply because all the mods are stored in memory, and never removed.
>
> Obviously, this is bad.
>
> Thinking about it, my opinion is that it may be a bit of overkill
yes, indeed, please see further below for the rationale behind this decision
> considering our need :
> - when a mod is made on the provider, it has to be sent to the consumer
> - in any case, we store the mod in a file associated with the consumer
> - we send the mod to the consumer unless we *know* that the consumer is
> offline
> - we will have no way to be sure that the consumer has correctly received
> the mod
> - when a consumer gets online again, it will send a cookie with the last CSN
> it received
agree
> - in this case, we have to get all the mods from the file, and send the mods
> to the consumer.
>
and this is where things turn out to be complex (definitely not impossible)
> Now, we already are saving the mods in a file, in the JournalInterceptor. We
> just have to implement the recovery system (ie, finding the last entry sent
> from the file) and send all the following entries. Then we can delete all
> the entries older than the requested one.
>
> I don't think that using our own implementation would be an issue here. I
> mean, ActiveMQ is a great piece of software, but having to go through tens
> of options (hundreds?), most of which are totally useless in our case, is a
> bit of overkill for this version.
>
> thoughts ?
>
the syncrepl protocol provides a high level of granularity about the
kind of data that can be
replicated, so not every mod in this journal is necessarily replicated
to a client (sending some mods to a given client can even be
a serious issue, depending on the sensitivity of the data).

This brings up the issue of filtering the data that needs to be sent to
the client, which requires significant
time for scanning, processing and maintaining the position pointers in
the monolithic journal.

ActiveMQ's core offers all these features, so it was the preferred
choice over writing a journal with all the above
mentioned features, because implementing such a journal is quite a
lot of work and the main problem we need to solve
is replication.

Having said that, ActiveMQ is definitely overkill to be used as a
journal, the main part that we actually need is
its journal/store implementation called 'kahadb' but it is easy to
use through high level message queue interfaces.

I would like to spend some time on the 'kahadb' source to see if it can
be easily embedded and serves our purpose.

By no means am I against a new implementation, but I prefer to use a
battle-tested component (for which kahadb perfectly
qualifies) if available.

-- 
Kiran Ayyagari