Posted to dev@asterixdb.apache.org by Ildar Absalyamov <il...@gmail.com> on 2017/01/19 03:02:58 UTC

Force LSM component flush & NC-CC messaging ACK

Hi devs,

Since I was out for quite a while and a lot has happened in the codebase in the meantime, I wanted to clarify a couple of things.

I was wondering if there is any legitimate way to force the data of in-memory components to be flushed, other than stopping the whole instance?
It used to be that choosing a different default dataverse with a “use” statement did the trick, but that is no longer the case.

Another question concerns CC<->NC and NC<->NC messaging. Does the sender get some kind of ACK that the message was received by the addressee? Say I send a message just before the instance shuts down: will the shutdown hook wait until the message is delivered and processed?

Best regards,
Ildar


Re: Force LSM component flush & NC-CC messaging ACK

Posted by Ildar Absalyamov <il...@gmail.com>.
As Mike mentioned, I need this forced flush to trigger stats collection during my experiments. I brought up messaging only because I noticed that if I use shutdown to force the flush, some messages are lost because the CC has already shut down by the time they arrive.

Anyway, the ConnectorAPI did exactly what I wanted. Thanks, Wail!
Given that we have an API for forcing the flush, I am not sure that a language-level construct is needed.
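
For readers who want to script the forced flush mentioned above, here is a minimal sketch of calling the connector servlet over HTTP. It is not taken from the thread: the port (19002), the path (/connector), and the parameter names (dataverseName, datasetName) are assumptions and should be checked against ConnectorAPIServlet before use. The helper names connector_url and force_flush are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def connector_url(dataverse, dataset, host="localhost", port=19002):
    """Build the connector request URL.

    ASSUMPTION: port, path, and parameter names are illustrative guesses,
    not confirmed by the thread; verify them against ConnectorAPIServlet.
    """
    query = urlencode({"dataverseName": dataverse, "datasetName": dataset})
    return f"http://{host}:{port}/connector?{query}"


def force_flush(dataverse, dataset, **kwargs):
    """Send the GET request; per the thread, serving it flushes the dataset."""
    with urlopen(connector_url(dataverse, dataset, **kwargs), timeout=60) as resp:
        return resp.status, resp.read().decode("utf-8")
```

Calling force_flush("tpch", "LineItem") against a running instance would then be the last step of an experiment script; the dataverse and dataset names here are placeholders.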

> On Jan 21, 2017, at 08:51, Wail Alkowaileet <wa...@gmail.com> wrote:
> 
> I remember that one reason to force a flush is the Pregelix connector [1][2][3].
> 
> For the messaging framework, I believe that you probably have the same
> issue I had. I did what Till suggested, since that relies on the
> robustness of AsterixDB rather than on the user, who might kill the process anyway.
> 
> [1]
> https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-app/src/main/java/org/apache/asterix/api/http/servlet/ConnectorAPIServlet.java
> [2]
> https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-app/src/main/java/org/apache/asterix/util/FlushDatasetUtils.java
> [3]
> https://github.com/apache/asterixdb/blob/2f9d4c3ab4d55598fe9a14fbf28faef12bed208b/asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/operators/std/FlushDatasetOperatorDescriptor.java
> 

Best regards,
Ildar


Re: Force LSM component flush & NC-CC messaging ACK

Posted by Wail Alkowaileet <wa...@gmail.com>.
I remember that one reason to force a flush is the Pregelix connector [1][2][3].

For the messaging framework, I believe that you probably have the same
issue I had. I did what Till suggested, since that relies on the
robustness of AsterixDB rather than on the user, who might kill the process anyway.

[1]
https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-app/src/main/java/org/apache/asterix/api/http/servlet/ConnectorAPIServlet.java
[2]
https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-app/src/main/java/org/apache/asterix/util/FlushDatasetUtils.java
[3]
https://github.com/apache/asterixdb/blob/2f9d4c3ab4d55598fe9a14fbf28faef12bed208b/asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/operators/std/FlushDatasetOperatorDescriptor.java

On Sat, Jan 21, 2017 at 7:17 PM, Mike Carey <dt...@gmail.com> wrote:

> I believe Ildar is just looking for a way to ensure, in doing experiments,
> that things are all in disk components.  His stats-gathering extensions
> camp on the LSM lifecycle - flushes in particular - and he wants to finish
> that process in his testing and experiments.  Wail's schema inference stuff
> has a similar flavor.  So the goal is to flush any lingering memory
> components to disk for a given dataset at the end of the "experiment
> lifecycle".
>
> We have DDL to compact a dataset - which flushes AND compacts - it might
> also be useful to have DDL to flush a dataset without also forcing
> compaction - as a way for an administrator to release that dataset's
> in-memory component related resources.  (Not that it's "necessary" for any
> correctness reason - just might be nice to be able to do that.  That could
> also be useful in scripting more user-level-oriented recovery tests.)
>
> Thus, I'd likely vote for adding a harmless new DDL statement - another
> arm of the one that supports compaction - for this.
>
> Cheers,
>
> Mike


-- 

*Regards,*
Wail Alkowaileet

Re: Force LSM component flush & NC-CC messaging ACK

Posted by Mike Carey <dt...@gmail.com>.
I believe Ildar is just looking for a way to ensure, in doing 
experiments, that things are all in disk components.  His 
stats-gathering extensions camp on the LSM lifecycle - flushes in 
particular - and he wants to finish that process in his testing and 
experiments.  Wail's schema inference stuff has a similar flavor.  So 
the goal is to flush any lingering memory components to disk for a given 
dataset at the end of the "experiment lifecycle".

We have DDL to compact a dataset - which flushes AND compacts - it might 
also be useful to have DDL to flush a dataset without also forcing 
compaction - as a way for an administrator to release that dataset's 
in-memory component related resources.  (Not that it's "necessary" for 
any correctness reason - just might be nice to be able to do that.  That 
could also be useful in scripting more user-level-oriented recovery tests.)

Thus, I'd likely vote for adding a harmless new DDL statement - another 
arm of the one that supports compaction - for this.

Cheers,

Mike


On 1/21/17 6:21 AM, Till Westmann wrote:
> Hi Ildar,
>
> On 19 Jan 2017, at 4:02, Ildar Absalyamov wrote:
>
>> Since I was out for quite a while and a lot has happened in the
>> codebase in the meantime, I wanted to clarify a couple of things.
>>
>> I was wondering if there is any legitimate way to force the data of
>> in-memory components to be flushed, other than stopping the whole instance?
>> It used to be that choosing a different default dataverse with a “use”
>> statement did the trick, but that is no longer the case.
>
> Just wondering, why do you want to flush the in-memory components to 
> disk?
>
>> Another question concerns CC<->NC and NC<->NC messaging. Does the
>> sender get some kind of ACK that the message was received by the
>> addressee? Say I send a message just before the instance shuts down:
>> will the shutdown hook wait until the message is delivered and
>> processed?
>
> I agree with Murtadha that it can certainly be done. However, we also
> need to assume that some shutdowns won’t be clean, so the messages
> might not be received. So it might be easier to just be able to
> recover from missing messages than to be able to recover *and* to
> synchronize on shutdown. Just a thought - maybe that’s not even an
> issue for your use case.
>
> Cheers,
> Till


Re: Force LSM component flush & NC-CC messaging ACK

Posted by Till Westmann <ti...@apache.org>.
Hi Ildar,

On 19 Jan 2017, at 4:02, Ildar Absalyamov wrote:

> Since I was out for quite a while and a lot has happened in the
> codebase in the meantime, I wanted to clarify a couple of things.
>
> I was wondering if there is any legitimate way to force the data of
> in-memory components to be flushed, other than stopping the whole
> instance?
> It used to be that choosing a different default dataverse with a
> “use” statement did the trick, but that is no longer the case.

Just wondering, why do you want to flush the in-memory components to 
disk?

> Another question concerns CC<->NC and NC<->NC messaging. Does the
> sender get some kind of ACK that the message was received by the
> addressee? Say I send a message just before the instance shuts down:
> will the shutdown hook wait until the message is delivered and
> processed?

I agree with Murtadha that it can certainly be done. However, we also
need to assume that some shutdowns won’t be clean, so the messages
might not be received. So it might be easier to just be able to recover
from missing messages than to be able to recover *and* to synchronize on
shutdown. Just a thought - maybe that’s not even an issue for your
use case.
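
The "recover from missing messages" option can be sketched roughly as follows. This is an illustration only, not AsterixDB code: the Sender and Receiver classes, the in-memory outbox, and the boolean ACK returned by deliver are all hypothetical. The sender keeps each message until it is acknowledged and resends whatever is still pending after a restart; the receiver ignores message ids it has already processed, so resends are harmless.

```python
class Sender:
    """Hypothetical sender that keeps every message until it is acknowledged."""

    def __init__(self):
        self._next_id = 0
        # message id -> payload; a real system would persist this across restarts
        self.outbox = {}

    def send(self, payload, deliver):
        msg_id = self._next_id
        self._next_id += 1
        self.outbox[msg_id] = payload
        self._try_deliver(msg_id, deliver)
        return msg_id

    def resend_pending(self, deliver):
        """Retry everything that was never acknowledged, e.g. after a restart."""
        for msg_id in list(self.outbox):
            self._try_deliver(msg_id, deliver)

    def _try_deliver(self, msg_id, deliver):
        # deliver() returns True only when the receiver acknowledged the message
        if deliver(msg_id, self.outbox[msg_id]):
            del self.outbox[msg_id]


class Receiver:
    """Hypothetical receiver that processes each message id at most once."""

    def __init__(self):
        self.up = True
        self.processed = {}

    def deliver(self, msg_id, payload):
        if not self.up:
            return False  # receiver is down: message lost, no ACK
        self.processed.setdefault(msg_id, payload)  # duplicates are ignored
        return True
```

With this shape, an unclean shutdown only delays a message until the next call to resend_pending, and no coordination is needed in the shutdown hook.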

Cheers,
Till

Re: Force LSM component flush & NC-CC messaging ACK

Posted by Murtadha Hubail <hu...@gmail.com>.
Hi Ildar,

I remember there was a compact dataset command that forces the flush, but I have never used it. Alternatively, you can simply add an HTTP cluster API for use during development that, upon receiving a request, sends a message to all NCs to flush all datasets.

As for the message ACK during instance shutdown, it is not there by default, but you can implement it yourself. For example, as part of stopping a life cycle component (e.g. StatisticsManager), you can force an NC to wait until it receives a message from the other NCs or the CC indicating that all messages have been processed.
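
A minimal sketch of the wait-on-stop idea, written in Python for brevity rather than as AsterixDB code. The class AckAwaitingComponent, its on_ack callback, and the timeout are hypothetical; the timeout is an added assumption so that a peer that never answers cannot block shutdown forever.

```python
import threading


class AckAwaitingComponent:
    """Hypothetical life cycle component whose stop() waits for peer ACKs."""

    def __init__(self, expected_acks, timeout_s=5.0):
        self._remaining = expected_acks
        self._lock = threading.Lock()
        self._all_acked = threading.Event()
        self._timeout_s = timeout_s
        if expected_acks == 0:
            self._all_acked.set()

    def on_ack(self):
        """Called by the (hypothetical) message handler for each ACK received."""
        with self._lock:
            self._remaining -= 1
            if self._remaining <= 0:
                self._all_acked.set()

    def stop(self):
        """Block until all ACKs arrive or the timeout expires.

        Returns True if every expected ACK arrived, False on timeout.
        """
        return self._all_acked.wait(self._timeout_s)
```

A False return from stop() means some messages may not have been processed, which is the case an unclean shutdown would produce anyway.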

Cheers,
Murtadha

> On Jan 19, 2017, at 6:02 AM, Ildar Absalyamov <il...@gmail.com> wrote:
> 
> Hi devs,
> 
> Since I was out for quite a while and a lot has happened in the codebase in the meantime, I wanted to clarify a couple of things.
> 
> I was wondering if there is any legitimate way to force the data of in-memory components to be flushed, other than stopping the whole instance?
> It used to be that choosing a different default dataverse with a “use” statement did the trick, but that is no longer the case.
> 
> Another question concerns CC<->NC and NC<->NC messaging. Does the sender get some kind of ACK that the message was received by the addressee? Say I send a message just before the instance shuts down: will the shutdown hook wait until the message is delivered and processed?
> 
> Best regards,
> Ildar
>