Posted to dev@servicecomb.apache.org by Zheng Feng <zh...@gmail.com> on 2018/01/27 15:53:44 UTC

Re: [Discussion] How to make sure events are handled only once among different stateless Saga pack alphas

I assume that each alpha server has a different name, so we can insert the
name of the alpha server into the TxEvent record.
When an alpha server scans the TxEvent records for timeout handling, it
only selects the ones that match its own alpha name.
It looks like we don't need a lock here, but we have to make sure the
alpha server name is unique.
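
To make the idea concrete, here is a rough sketch of the scan query, assuming
a plain JDBC access layer and hypothetical table/column names (tx_event,
alpha_name, expire_time) rather than the actual Saga schema:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class TimeoutScanner {
      // Name of this alpha instance; assumed to be unique across the cluster.
      private final String alphaName;
      private final Connection connection;

      TimeoutScanner(String alphaName, Connection connection) {
        this.alphaName = alphaName;
        this.connection = connection;
      }

      // Select only the timed-out events recorded with this alpha's name,
      // so different alphas never pick up each other's events.
      void scanTimeouts() throws Exception {
        String sql = "SELECT surrogate_id FROM tx_event "
                   + "WHERE alpha_name = ? AND expire_time < CURRENT_TIMESTAMP";
        try (PreparedStatement ps = connection.prepareStatement(sql)) {
          ps.setString(1, alphaName);
          try (ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
              handleTimeout(rs.getLong(1));
            }
          }
        }
      }

      private void handleTimeout(long eventId) { /* compensation / abort logic */ }
    }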

2018-01-27 14:36 GMT+08:00 郑扬勇 <ya...@qq.com>:

> It seems all solutions need to introduce a "lock".
>  If one event can only be handled by one alpha at a time, do we need an
> election mechanism?
>
>   ------------------ Original Message ------------------
>   From: "Eric Lee";<er...@gmail.com>;
>  Sent: Friday, January 26, 2018, 10:58 AM
>  To: "dev"<de...@servicecomb.incubator.apache.org>;
>
>  Subject: [Discussion] How to make sure events are handled only once
> among different stateless Saga pack alphas
>
>
>
> Background
> Currently, the transaction timeout is controlled by omega, which makes omega
> stateful. Being stateful makes omega's recovery rely heavily on its
> previous states. Hence, we need to move the timeout management from omega
> to alpha to simplify the implementation of omega. After that, omega will be a
> stateless agent.
>
> Difficulty
> How do we make sure each timeout record is handled only once globally by
> multiple alpha servers? Each alpha server is also stateless. All states are
> stored in the database. Alpha periodically scans the timeout events and
> handles them one by one. Different alphas may process the same event at the
> same time, which should be avoided because each event should be handled only
> once.
>
> Possible Solutions:
> 1. Add an expireTime column to the TxEvent entity. Then lock the access to the
> timeout event to avoid concurrent access to the same event. Since TxEvent
> may involve many operations, adding the lock may introduce latency in
> other transactions.
> 2. Create a new entity like the Command entity. Then lock the access to
> this entity and update its status asynchronously when it is done.
> 3. Register timeout settings with alpha whenever omega starts. Then query the
> TxEvent and ServiceConfig tables to find the timeout events. This approach
> still cannot make sure each event is handled only once, as the scope of the
> lock is too wide to target a specific event.
>
> However, the above solutions are still not perfect, because the lock becomes
> invalid as soon as the query is done, so another alpha may query the database
> and process the same event before the timeout event has been processed by the
> previous alpha.
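
For reference, a rough sketch of the row-level locking that solution 1 implies
(hypothetical schema, for illustration only); note that the lock only holds
until the transaction commits, which is exactly the limitation described above:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    class LockedScan {
      // Locks timed-out rows while they are handled; the lock is released at
      // commit, after which another alpha could select the same rows again.
      static void scan(Connection connection) throws Exception {
        connection.setAutoCommit(false);
        String sql = "SELECT surrogate_id FROM tx_event "
                   + "WHERE expire_time < CURRENT_TIMESTAMP FOR UPDATE";
        try (PreparedStatement ps = connection.prepareStatement(sql);
             ResultSet rs = ps.executeQuery()) {
          while (rs.next()) {
            handleTimeout(rs.getLong(1));
          }
        }
        connection.commit();
      }

      static void handleTimeout(long eventId) { /* compensation / abort logic */ }
    }
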
>
> Current implementation details can be found at
> https://github.com/apache/incubator-servicecomb-saga/pull/122 .
>
> Any suggestions are welcome.
>
>
> Best Regards!
> Eric Lee
>

Re: [Discussion] How to make sure events are handled only once among different stateless Saga pack alphas

Posted by Yang Bo <oa...@gmail.com>.
Hi,

Using the database as the locking mechanism may have performance issues: we
would need to lock the whole table just to pick a new task.

We need a way to synchronize the jobs' status:
1. Use a master-worker model. The master communicates with the database and
dispatches jobs to the workers. This model is simple to understand and
implement, but it may face performance issues in the master node. There is a
lot of communication between the master and the workers, and when the number
of tasks is large, the synchronization between the workers and the master will
block and affect the whole system's performance.

2. Use a third party for distributed locking, such as etcd or Redis. As we
already use a SQL database for storing data, Redis may work better for us.
Or we could just use etcd to replace the SQL database.

3. Implement a distributed lock ourselves. That seems like overkill.
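
As a sketch of option 2, a simple Redis-based lock could look roughly like the
following, assuming the Jedis 3.x client and made-up key names (this is just
an illustration, not part of the Saga codebase):

    import redis.clients.jedis.Jedis;
    import redis.clients.jedis.params.SetParams;

    class TimeoutLock {
      // Try to claim the timeout event in Redis before touching the database.
      // SET with NX/PX creates the key only if it does not already exist and
      // lets it expire, so a crashed alpha cannot hold the lock forever.
      static boolean tryLock(Jedis jedis, String eventId, String alphaName, long ttlMillis) {
        String reply = jedis.set("saga:timeout:" + eventId, alphaName,
            SetParams.setParams().nx().px(ttlMillis));
        return "OK".equals(reply);
      }

      // Best-effort unlock: only delete the key if this alpha still owns it.
      static void unlock(Jedis jedis, String eventId, String alphaName) {
        String key = "saga:timeout:" + eventId;
        if (alphaName.equals(jedis.get(key))) {
          jedis.del(key);   // in production the check-and-delete should be atomic (Lua script)
        }
      }
    }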


On Sun, Jan 28, 2018 at 10:52 AM, Eric Lee <er...@gmail.com> wrote:

> I guess we can add a status column to the timeout table. It has three
> values: NEW, PENDING, DONE. When an event starts, the status is NEW. When
> the EventScanner detects the timeout event, it sets the status to PENDING.
> When another EventScanner scans the same timeout event and cannot update
> its status to PENDING, it skips the event.
>



-- 
Yang,
Best Regards

Re: [Discussion] How to make sure events are handled only once among different stateless Saga pack alphas

Posted by Eric Lee <er...@gmail.com>.
I guess we can add a status column to the timeout table. It has three
values: NEW, PENDING, DONE. When an event starts, the status is NEW. When
the EventScanner detects the timeout event, it sets the status to PENDING.
When another EventScanner scans the same timeout event and cannot update
its status to PENDING, it skips the event.
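
A minimal sketch of that conditional update, with hypothetical table and
column names (not the actual Saga schema):

    import java.sql.Connection;
    import java.sql.PreparedStatement;

    class TimeoutClaimer {
      // Atomically move the record from NEW to PENDING; only one alpha can win,
      // because the UPDATE matches zero rows for everyone else.
      static boolean tryClaim(Connection connection, long timeoutId, String alphaName)
          throws Exception {
        String sql = "UPDATE tx_timeout SET status = 'PENDING', claimed_by = ? "
                   + "WHERE id = ? AND status = 'NEW'";
        try (PreparedStatement ps = connection.prepareStatement(sql)) {
          ps.setString(1, alphaName);
          ps.setLong(2, timeoutId);
          return ps.executeUpdate() == 1;   // 0 means another EventScanner claimed it first
        }
      }
    }

The scanner that wins the update handles the event and later marks it DONE;
the others simply skip it.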

2018-01-28 9:51 GMT+08:00 Willem Jiang <wi...@gmail.com>:

> Yeah, this could help us resolve the issue of different alpha servers
> checking the same TxEvent.
> But what if one of the alphas is offline? Then the timeout message cannot be
> handled any more.
> Maybe we can elect a leader from the alpha servers to do the job assignment
> and supervise the timeout events.
>
>
> Willem Jiang
>
> Blog: http://willemjiang.blogspot.com (English)
>           http://jnn.iteye.com  (Chinese)
> Twitter: willemjiang
> Weibo: 姜宁willem
>

Re: [Discussion] How to make sure events are handled only once among different stateless Saga pack alphas

Posted by Zheng Feng <zh...@gmail.com>.
Hi Willem,

2018-01-28 9:51 GMT+08:00 Willem Jiang <wi...@gmail.com>:

> Yeah, this could help us resolve the issue of different alpha servers
> checking the same TxEvent.
> But what if one of the alphas is offline? Then the timeout message cannot be
> handled any more.
>
 OK, we could restart the alpha with the same server name, as I think that
might be part of the recovery processing. Otherwise, there would need to be a
migration process to reassign the server name to one of the available alphas
if we cannot restart the dead one.

> Maybe we can elect a leader from the alpha servers to do the job assignment
> and supervise the timeout events.
>
Yeah, I think there is a similar leader election mechanism used in the Camel
cluster [1].

[1]
https://github.com/nicolaferraro/spring-boot-camel-narayana-scalable/blob/master/src/main/java/com/example/CustomNarayanaRecoveryManagerBean.java


Re: [Discussion] How to make sure events are handled only once among different stateless Saga pack alphas

Posted by Willem Jiang <wi...@gmail.com>.
Yeah, this could help us resolve the issue of different alpha servers
checking the same TxEvent.
But what if one of the alphas is offline? Then the timeout message cannot be
handled any more.
Maybe we can elect a leader from the alpha servers to do the job assignment
and supervise the timeout events.
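
One rough way to pick such a leader without extra infrastructure is a lease
row in the shared database; below is a sketch with made-up table and column
names (an illustration only, not an actual Saga design):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.Timestamp;

    class LeaderLease {
      // Try to take (or renew) the single lease row. The alpha that succeeds
      // acts as the leader, assigning timeout work and supervising it until
      // the lease expires.
      static boolean tryAcquire(Connection connection, String alphaName, long leaseMillis)
          throws Exception {
        String sql = "UPDATE alpha_leader SET owner = ?, lease_until = ? "
                   + "WHERE id = 1 AND (owner = ? OR lease_until < CURRENT_TIMESTAMP)";
        try (PreparedStatement ps = connection.prepareStatement(sql)) {
          ps.setString(1, alphaName);
          ps.setTimestamp(2, new Timestamp(System.currentTimeMillis() + leaseMillis));
          ps.setString(3, alphaName);
          return ps.executeUpdate() == 1;   // someone else holds a valid lease otherwise
        }
      }
    }

If the leader goes offline its lease simply expires and another alpha takes
over, which would also cover the "alpha is offline" concern above.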


Willem Jiang

Blog: http://willemjiang.blogspot.com (English)
          http://jnn.iteye.com  (Chinese)
Twitter: willemjiang
Weibo: 姜宁willem

On Sat, Jan 27, 2018 at 11:53 PM, Zheng Feng <zh...@gmail.com> wrote:

> I assume that each alpha server has a different name, so we can insert the
> name of the alpha server into the TxEvent record.
> When an alpha server scans the TxEvent records for timeout handling, it
> only selects the ones that match its own alpha name.
> It looks like we don't need a lock here, but we have to make sure the
> alpha server name is unique.
>