Posted to dev@flume.apache.org by "Wang, Yongkun | Yongkun | BDD" <yo...@mail.rakuten.com> on 2012/07/25 12:32:31 UTC

Transactional Multiplex (fan out) Sink

Hi,

In our system, we need to fan out the aggregated flow to several destinations. Usually the flow to each destination is identical.

Flume NG has a nice feature, the "multiplexing flow", which can satisfy our requirements. It is implemented with separate channels, which makes transaction control easy.

But in our case, the fan out is replicating in most cases. With the current "Replicating to Channels" configuration, we get several identical channels on the same host, which may consume a large amount of resources (memory, disk, etc.). Performance may drop, and the events delivered to each destination may not be synchronized.

I read the NG source, and I think I could move the multiplexing from the Channel to the Sink: use a single Channel and fan out to different Sinks. This may solve the problems of multiple Channels (resource usage, performance, event synchronization).

I see that there are "LoadBalancingSinkProcessor" and "SinkSelector" classes, but they cannot be used to replicate events from one Channel to different Sinks.

The following is one possible implementation of the Transactional Multiplex (or fan out) Sink:

  1.  Add a Transactional Multiplex Sink Processor, which groups the operations of all fan out Sinks into one transaction and uses a certain policy to commit that transaction.
  2.  Add MultiplexSink, which simply processes the Events and reports status, with no transaction of its own.
  3.  Add "peek()" and "remove()" to Channel and Transaction.
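
To make the three steps concrete, here is a minimal sketch of how they might fit together. It is plain Java with no Flume dependency: FanOutSketch, PeekableChannel and processOne are hypothetical names invented for this illustration, each sink is modelled as a Predicate that returns true on success, and the commit rule used here is simply "all sinks succeed". It is not the actual patch and not Flume's API.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical sketch only; none of these types exist in Flume. */
final class FanOutSketch {

    /** A channel with the proposed peek()/remove() pair, backed by an in-memory deque. */
    static final class PeekableChannel<E> {
        private final Deque<E> queue = new ArrayDeque<>();

        void put(E event) { queue.addLast(event); }

        /** Returns the head event without removing it, or null if the channel is empty. */
        E peek() { return queue.peekFirst(); }

        /** Drops the head event; called only once delivery is considered successful. */
        void remove() { queue.removeFirst(); }

        int size() { return queue.size(); }
    }

    /**
     * Delivers the head event to every sink. The sinks hold no transaction
     * themselves; they only report success or failure. The event is removed
     * only if all sinks succeed, otherwise it stays queued for a later retry.
     */
    static <E> boolean processOne(PeekableChannel<E> channel, List<Predicate<E>> sinks) {
        E event = channel.peek();
        if (event == null) {
            return false; // nothing to deliver
        }
        boolean allSucceeded = true;
        for (Predicate<E> sink : sinks) {
            // Non-short-circuit &= so every sink sees the event even after a failure.
            allSucceeded &= sink.test(event);
        }
        if (allSucceeded) {
            channel.remove(); // "commit": the event leaves the channel
        }
        return allSucceeded;  // false acts as "rollback": the event is still queued
    }
}
```

One consequence of this shape: when a retry happens, sinks that already succeeded receive the same event again, so the sketch gives at-least-once delivery per sink rather than exactly-once.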

The policy for committing a transaction can be defined as follows (suppose we have N Sinks):

  1.  When any M (0 <= M <= N) Sinks succeed;
  2.  When M specified Sinks (0 < M <= N), the important sinks, succeed.
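
The two policies could be expressed roughly as below. This is only a sketch under assumptions: CommitPolicySketch, atLeast and allRequired are hypothetical names, and the per-sink results are assumed to arrive as a map from sink name to a success flag.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the two commit policies; names are illustrative, not Flume API. */
final class CommitPolicySketch {

    /** Policy 1: commit once at least m of the sinks report success. */
    static boolean atLeast(int m, Map<String, Boolean> results) {
        long succeeded = results.values().stream()
                .filter(Boolean::booleanValue)
                .count();
        return succeeded >= m;
    }

    /** Policy 2: commit once every sink named in 'required' reports success. */
    static boolean allRequired(Set<String> required, Map<String, Boolean> results) {
        for (String sink : required) {
            // A sink with no recorded result counts as a failure.
            if (!results.getOrDefault(sink, Boolean.FALSE)) {
                return false;
            }
        }
        return true;
    }
}
```

Note that with M = 0 the first policy always commits, so events are dropped from the channel even if every sink fails.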

Optionally, the Transactional Multiplex Sink Processor can also use a selector to filter the events for some Sinks.

This can be combined with the existing multiplexing Channel flow: multiplex events into different Channels, and let each Channel replicate to different Sinks.

I would like to hear your suggestions first.
If the idea is reasonable, I will create a ticket in JIRA and provide a patch for review.

Cheers,
Yongkun Wang (Kun)

Re: Transactional Multiplex (fan out) Sink

Posted by "Wang, Yongkun | Yongkun | BDD" <yo...@mail.rakuten.com>.
Hi Brock, 

Thanks for commenting on this post.

If the contents of each channel are different, using multiple channels makes
sense.
But if the source simply replicates the event to several channels,
maintaining a bunch of identical channels on the same host may not be an
optimal solution. So I want to try the proposed fan out sink.

Yongkun

On 12/07/31 21:43, "Brock Noland" <br...@cloudera.com> wrote:

>I would create a JIRA for this. FWIW, many people run multiple
>channels per host so I don't necessarily agree with this:
>
>"But in our case, the fan out is replicating in most cases. If using
>the current "Replicating to Channels" configuration, we will get
>several identical channels on the same host, which may consume a large
>amount resources (memory, disk, etc.). The performance may possibly
>drop."
>
>Brock
>
>On Tue, Jul 31, 2012 at 5:09 AM, Wang, Yongkun | Yongkun | BDD
><yo...@mail.rakuten.com> wrote:
>> Any suggestions on this proposal? It won't break current design.
>>
>> Regards,
>> Yongkun Wang
>>
>> From: <Wang>, "Wang, Yongkun | Yongkun | BDD"
>><yo...@mail.rakuten.com>>
>> Reply-To: "user@flume.apache.org<ma...@flume.apache.org>"
>><us...@flume.apache.org>>
>> Date: Wednesday, July 25, 2012 19:32
>> To: "dev@flume.apache.org<ma...@flume.apache.org>"
>><de...@flume.apache.org>>
>> Cc: "user@flume.apache.org<ma...@flume.apache.org>"
>><us...@flume.apache.org>>
>> Subject: Transactional Multiplex (fan out) Sink
>>
>> Hi,
>>
>> In our system, we need to fan out the aggregated flow to several
>>destinations. Usually the flow to each destination is identical.
>>
>> There is a nice feature of NG, the "multiplexing flow", which can
>>satisfy our requirements. It is implemented by using separated channels,
>>which is easy to do transaction control.
>>
>> But in our case, the fan out is replicating in most cases. If using the
>>current "Replicating to Channels" configuration, we will get several
>>identical channels on the same host, which may consume a large amount
>>resources (memory, disk, etc.). The performance may possibly drop. And
>>the events to each destination may not be synchronized.
>>
>> I read NG source, I think I could move the multiplex from Channel to
>>Sink, that is, using single Channel, fan out to different Sinks, which
>>may solve the problems (resource usage, performance, event
>>synchronization) of multiple Channels.
>>
>> I see that there are "LoadBalancingSinkProcessor" and "SinkSelector"
>>classes, but they cannot be used to achieve the target of replicating
>>events from one Channel to different Sinks.
>>
>> The following is an optional implementation of the Transactional
>>Multiplex (or fan out) Sink:
>>
>>   1.  Add a Transactional Multiplex Sink Processor, which will group
>>the operations of all fan out Sinks into one transaction, and use a
>>certain policy to commit the transaction.
>>   2.  Add MultiplexSink, which simply processes the Events and report
>>status, no transaction.
>>   3.  Add "peek()" and "remove()" to Channel and Transaction.
>>
>> The policy of committing a transaction can be defined as follow
>>(suppose we have N Sinks) :
>>
>>   1.  When M(0=<M<=N) Sinks succeed;
>>   2.  When specified M(0<M<=N) Sinks (important sinks) succeed.
>>
>> A selector can also be used by Transactional Multiplex Sink Processor
>>to filter the events for some Sinks (Optional).
>>
>> And this can be combined with the existing Multiplex Channel Flow:
>>Multiplex events into different Channels, each Channel can replicate to
>>different Sinks.
>>
>> Would like to hear your suggestions firstly.
>> If it is reasonable, I will create a ticket in JIRA and provide the
>>patch for review.
>>
>> Cheers,
>> Yongkun Wang (Kun)
>
>
>
>-- 
>Apache MRUnit - Unit testing MapReduce -
>http://incubator.apache.org/mrunit/



Re: Transactional Multiplex (fan out) Sink

Posted by Brock Noland <br...@cloudera.com>.
I would create a JIRA for this. FWIW, many people run multiple
channels per host so I don't necessarily agree with this:

"But in our case, the fan out is replicating in most cases. If using
the current "Replicating to Channels" configuration, we will get
several identical channels on the same host, which may consume a large
amount resources (memory, disk, etc.). The performance may possibly
drop."

Brock

On Tue, Jul 31, 2012 at 5:09 AM, Wang, Yongkun | Yongkun | BDD
<yo...@mail.rakuten.com> wrote:
> Any suggestions on this proposal? It won't break current design.
>
> Regards,
> Yongkun Wang
>
> From: <Wang>, "Wang, Yongkun | Yongkun | BDD" <yo...@mail.rakuten.com>>
> Reply-To: "user@flume.apache.org<ma...@flume.apache.org>" <us...@flume.apache.org>>
> Date: Wednesday, July 25, 2012 19:32
> To: "dev@flume.apache.org<ma...@flume.apache.org>" <de...@flume.apache.org>>
> Cc: "user@flume.apache.org<ma...@flume.apache.org>" <us...@flume.apache.org>>
> Subject: Transactional Multiplex (fan out) Sink
>
> Hi,
>
> In our system, we need to fan out the aggregated flow to several destinations. Usually the flow to each destination is identical.
>
> There is a nice feature of NG, the "multiplexing flow", which can satisfy our requirements. It is implemented by using separated channels, which is easy to do transaction control.
>
> But in our case, the fan out is replicating in most cases. If using the current "Replicating to Channels" configuration, we will get several identical channels on the same host, which may consume a large amount resources (memory, disk, etc.). The performance may possibly drop. And the events to each destination may not be synchronized.
>
> I read NG source, I think I could move the multiplex from Channel to Sink, that is, using single Channel, fan out to different Sinks, which may solve the problems (resource usage, performance, event synchronization) of multiple Channels.
>
> I see that there are "LoadBalancingSinkProcessor" and "SinkSelector" classes, but they cannot be used to achieve the target of replicating events from one Channel to different Sinks.
>
> The following is an optional implementation of the Transactional Multiplex (or fan out) Sink:
>
>   1.  Add a Transactional Multiplex Sink Processor, which will group the operations of all fan out Sinks into one transaction, and use a certain policy to commit the transaction.
>   2.  Add MultiplexSink, which simply processes the Events and report status, no transaction.
>   3.  Add "peek()" and "remove()" to Channel and Transaction.
>
> The policy of committing a transaction can be defined as follow (suppose we have N Sinks) :
>
>   1.  When M(0=<M<=N) Sinks succeed;
>   2.  When specified M(0<M<=N) Sinks (important sinks) succeed.
>
> A selector can also be used by Transactional Multiplex Sink Processor to filter the events for some Sinks (Optional).
>
> And this can be combined with the existing Multiplex Channel Flow: Multiplex events into different Channels, each Channel can replicate to different Sinks.
>
> Would like to hear your suggestions firstly.
> If it is reasonable, I will create a ticket in JIRA and provide the patch for review.
>
> Cheers,
> Yongkun Wang (Kun)



-- 
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/

Re: Transactional Multiplex (fan out) Sink

Posted by "Wang, Yongkun | Yongkun | BDD" <yo...@mail.rakuten.com>.
Any suggestions on this proposal? It won't break the current design.

Regards,
Yongkun Wang

From: <Wang>, "Wang, Yongkun | Yongkun | BDD" <yo...@mail.rakuten.com>>
Reply-To: "user@flume.apache.org<ma...@flume.apache.org>" <us...@flume.apache.org>>
Date: Wednesday, July 25, 2012 19:32
To: "dev@flume.apache.org<ma...@flume.apache.org>" <de...@flume.apache.org>>
Cc: "user@flume.apache.org<ma...@flume.apache.org>" <us...@flume.apache.org>>
Subject: Transactional Multiplex (fan out) Sink

Hi,

In our system, we need to fan out the aggregated flow to several destinations. Usually the flow to each destination is identical.

There is a nice feature of NG, the "multiplexing flow", which can satisfy our requirements. It is implemented by using separated channels, which is easy to do transaction control.

But in our case, the fan out is replicating in most cases. If using the current "Replicating to Channels" configuration, we will get several identical channels on the same host, which may consume a large amount resources (memory, disk, etc.). The performance may possibly drop. And the events to each destination may not be synchronized.

I read NG source, I think I could move the multiplex from Channel to Sink, that is, using single Channel, fan out to different Sinks, which may solve the problems (resource usage, performance, event synchronization) of multiple Channels.

I see that there are "LoadBalancingSinkProcessor" and "SinkSelector" classes, but they cannot be used to achieve the target of replicating events from one Channel to different Sinks.

The following is an optional implementation of the Transactional Multiplex (or fan out) Sink:

  1.  Add a Transactional Multiplex Sink Processor, which will group the operations of all fan out Sinks into one transaction, and use a certain policy to commit the transaction.
  2.  Add MultiplexSink, which simply processes the Events and report status, no transaction.
  3.  Add "peek()" and "remove()" to Channel and Transaction.

The policy of committing a transaction can be defined as follow (suppose we have N Sinks) :

  1.  When M(0=<M<=N) Sinks succeed;
  2.  When specified M(0<M<=N) Sinks (important sinks) succeed.

A selector can also be used by Transactional Multiplex Sink Processor to filter the events for some Sinks (Optional).

And this can be combined with the existing Multiplex Channel Flow: Multiplex events into different Channels, each Channel can replicate to different Sinks.

Would like to hear your suggestions firstly.
If it is reasonable, I will create a ticket in JIRA and provide the patch for review.

Cheers,
Yongkun Wang (Kun)
