
Posted to user@flume.apache.org by Senthilvel Rangaswamy <se...@gmail.com> on 2012/07/09 08:18:36 UTC

Restarts without data loss

We are using Flume 1.2.0 with a memory channel. When we roll out new
configs/decorators, we may need to restart Flume, at which point any events
in the memory channel are lost. Is there any way to avoid this?

Thanks,
-- 
..Senthil

"If there's anything more important than my ego around, I want it
 caught and shot now."
                                                    - Douglas Adams.

Re: Restarts without data loss

Posted by Senthilvel Rangaswamy <se...@gmail.com>.

Inder,

No worries. We added a FileChannel as another sink with lower priority, so
when the actual sink fails, the events start getting recorded in files.
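The arrangement described above (events go to the primary sink and spill to a lower-priority, file-backed path only when it fails) is a priority failover. Flume's failover sink processor is built around the same priority idea, but the sketch below is only a simplified, hypothetical model of it: `Sink`, `ListSink` and `FailoverSketch` are illustrative names, not Flume's API, and the sketch omits any backoff for failed sinks.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class FailoverSketch {
    // Hypothetical sink abstraction: returns normally on success, throws on failure.
    interface Sink {
        String name();
        void deliver(String event) throws Exception;
    }

    // Illustrative sink that keeps events in memory and can be switched "down".
    static class ListSink implements Sink {
        private final String name;
        final List<String> received = new ArrayList<>();
        volatile boolean up = true;

        ListSink(String name) { this.name = name; }

        public String name() { return name; }

        public void deliver(String event) throws Exception {
            if (!up) throw new Exception(name + " is unavailable");
            received.add(event);
        }
    }

    private static class Prioritized {
        final int priority;
        final Sink sink;
        Prioritized(int priority, Sink sink) { this.priority = priority; this.sink = sink; }
    }

    private final List<Prioritized> sinks = new ArrayList<>();

    // Registers a sink; higher numbers are tried first.
    void add(int priority, Sink sink) {
        sinks.add(new Prioritized(priority, sink));
        sinks.sort(Comparator.comparingInt((Prioritized p) -> p.priority).reversed());
    }

    // Offers the event to each sink from highest to lowest priority and
    // returns the name of the first sink that accepted it.
    String process(String event) {
        for (Prioritized p : sinks) {
            try {
                p.sink.deliver(event);
                return p.sink.name();
            } catch (Exception e) {
                // This sink failed; fall through to the next, lower-priority one.
            }
        }
        throw new IllegalStateException("all sinks failed for event: " + event);
    }

    public static void main(String[] args) {
        FailoverSketch processor = new FailoverSketch();
        ListSink primary = new ListSink("avro");     // normal path
        ListSink spill = new ListSink("file-spill"); // lower-priority, durable path
        processor.add(10, primary);
        processor.add(5, spill);

        System.out.println(processor.process("e1")); // avro
        primary.up = false;
        System.out.println(processor.process("e2")); // file-spill
    }
}
```

In this model nothing reaches the file-backed path until the primary actually throws, which matches the behaviour described in the message. As far as the model goes, it protects against a failing sink, not against events still buffered in a memory channel when the agent restarts.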

--
..Senthil

On Mon, Jul 9, 2012 at 12:04 AM, Inder Pall <in...@gmail.com> wrote:

> Senthil,
>
> Sorry to hijack this thread, but this caught my attention.
> If you do switch channels dynamically (i.e. from memory to file), how do you do it?
>
> - inder

Re: Restarts without data loss

Posted by Inder Pall <in...@gmail.com>.

Senthil,

Sorry to hijack this thread, but this caught my attention.
If you do switch channels dynamically (i.e. from memory to file), how do you do it?

- inder

On Mon, Jul 9, 2012 at 12:31 PM, Senthilvel Rangaswamy <senthilvel@gmail.com> wrote:

> We do use a persistent channel when there is overflow. Using FileChannel
> for regular operations is too slow for us.


-- 
Thanks,
- Inder
  Tech Platforms @Inmobi
  Linkedin - http://goo.gl/eR4Ub

Re: Restarts without data loss

Posted by Mubarak Seyed <se...@apple.com>.

Out of interest, there is a JIRA for graceful shutdown: FLUME-1318. Please add your design thoughts to the JIRA.
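Graceful shutdown matters for the original question because a memory channel only loses the events still buffered when the process exits. One common ordering, sketched here under assumptions (this is not the FLUME-1318 design, which the thread does not describe), is: stop accepting input at the source, let the sink drain what is already buffered, then exit. The class and method names are hypothetical, and a `LinkedBlockingQueue` stands in for the memory channel.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class GracefulShutdownSketch {
    // Stands in for a memory channel.
    final LinkedBlockingQueue<String> channel = new LinkedBlockingQueue<>();
    // Stands in for whatever the sink delivers to (e.g. the next hop).
    final List<String> delivered = new ArrayList<>();
    private boolean acceptingInput = true; // guarded by `this`

    // Source side: refuses new events once shutdown has begun.
    synchronized boolean put(String event) {
        return acceptingInput && channel.offer(event);
    }

    // Ordered shutdown: stop the source first, then let the sink drain what is
    // still buffered, giving up after timeoutMillis. Returns how many events
    // were left behind (and would be lost with an in-memory channel).
    int shutdown(long timeoutMillis) {
        synchronized (this) {
            acceptingInput = false; // no put() can add anything after this point
        }
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (System.nanoTime() < deadline) {
            String event = channel.poll();
            if (event == null) break; // channel fully drained
            delivered.add(event);     // a real sink would send and commit here
        }
        return channel.size();
    }

    public static void main(String[] args) {
        GracefulShutdownSketch agent = new GracefulShutdownSketch();
        agent.put("e1");
        agent.put("e2");
        int lost = agent.shutdown(5_000);
        System.out.println("delivered=" + agent.delivered.size() + " lost=" + lost); // delivered=2 lost=0
        System.out.println("accepted after shutdown? " + agent.put("e3"));          // false
    }
}
```

In a real process this would typically be triggered from a JVM shutdown hook (`Runtime.getRuntime().addShutdownHook(...)`), so that a normal stop or SIGTERM drains the channel. A hard kill (SIGKILL) or crash skips shutdown hooks, so only a persistent channel such as FileChannel covers that case.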


--Mubarak

On Jul 9, 2012, at 10:36 AM, Brock Noland wrote:

> If you ran the workload with the file channel and then took 10 thread
> dumps, I think we'd have enough to understand what is going on.
> 
> Brock

Re: Restarts without data loss

Posted by Brock Noland <br...@cloudera.com>.

Yeah, good point. The ExecSource does no batching and as such will be quite
slow when interacting with any channel that guarantees no data loss on a
commit.

On Tue, Jul 10, 2012 at 8:54 AM, Juhani Connolly <juhani_connolly@cyberagent.co.jp> wrote:

> A further observation:
>
> When running our collector node with an Avro source and HDFS sink, I
> observed it keeping up with over 1400 events per second. Upon looking at
> the exec source, I noticed it sends every item as a separate event to the
> processor. So I think I may have misunderstood how frequently fsync is
> happening, and that the main issue is any sink/source that works with the
> channel in tiny amounts (resulting in frequent disk flushes and strangled
> throughput).
>
> While improvements to the channel would be very welcome, it may be more
> productive to document this behavior and introduce batching modes to
> those sources/sinks that do not currently have one.


-- 
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/

Re: Restarts without data loss

Posted by Juhani Connolly <ju...@cyberagent.co.jp>.

A further observation:

When running our collector node with an Avro source and HDFS sink, I
observed it keeping up with over 1400 events per second. Upon looking
at the exec source, I noticed it sends every item as a separate event to
the processor. So I think I may have misunderstood how frequently fsync
is happening, and that the main issue is any sink/source that works with
the channel in tiny amounts (resulting in frequent disk flushes and
strangled throughput).

While improvements to the channel would be very welcome, it may be more
productive to document this behavior and introduce batching modes to
those sources/sinks that do not currently have one.
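Batching in a tail-style source has one wrinkle: lines may trickle in slowly, so a batch should be handed over either when it is full or when its oldest line has waited long enough. The sketch below shows that size-or-age rule under stated assumptions: `LineBatcher` and its parameters are hypothetical, each delivered batch is meant to correspond to one channel transaction (one commit), and the age check only runs when a new line arrives, so a real source would also need a timer to flush an idle partial batch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class LineBatcher {
    private final int maxBatch;
    private final long maxWaitMillis;
    private final Consumer<List<String>> deliver; // e.g. one channel transaction per batch
    private final List<String> buffer = new ArrayList<>();
    private long oldestMillis; // arrival time of the first line in the current batch

    LineBatcher(int maxBatch, long maxWaitMillis, Consumer<List<String>> deliver) {
        this.maxBatch = maxBatch;
        this.maxWaitMillis = maxWaitMillis;
        this.deliver = deliver;
    }

    // Called for each line read (e.g. from a tailed log). The clock is passed in
    // so the timing rule is easy to test; a real source would read the system clock.
    void offer(String line, long nowMillis) {
        if (buffer.isEmpty()) oldestMillis = nowMillis;
        buffer.add(line);
        if (buffer.size() >= maxBatch || nowMillis - oldestMillis >= maxWaitMillis) {
            flush();
        }
    }

    // Hands the buffered lines over as one batch (no-op when empty).
    void flush() {
        if (buffer.isEmpty()) return;
        deliver.accept(new ArrayList<>(buffer));
        buffer.clear();
    }

    public static void main(String[] args) {
        List<List<String>> batches = new ArrayList<>();
        LineBatcher batcher = new LineBatcher(100, 1_000, batches::add);
        for (int i = 0; i < 250; i++) batcher.offer("line " + i, 0);
        batcher.flush(); // e.g. on shutdown or from a periodic timer
        System.out.println(batches.size() + " commits for 250 lines"); // 3 commits for 250 lines
    }
}
```

Compared with one event per commit, 250 lines cost 3 commits here instead of 250; the age bound keeps latency predictable when traffic is light.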

On 07/10/2012 11:14 AM, Juhani Connolly wrote:
> On 07/10/2012 02:36 AM, Brock Noland wrote:
>> If you ran the workload with the file channel and then took 10 thread
>> dumps, I think we'd have enough to understand what is going on.
>>
>> Brock
> I've taken some dumps; you can find them here:
> http://people.apache.org/~juhanic/ca-flume-fc-dumps.tar.gz
>
> I also included a PNG of VisualVM's thread visualization, where you
> can confirm that the source is constantly busy (trying to get events
> into the file channel) while the 5 sinks are mostly idle. Let me know
> if there's anything else I can provide.

Re: Restarts without data loss

Posted by Juhani Connolly <ju...@cyberagent.co.jp>.

On 07/10/2012 02:36 AM, Brock Noland wrote:
> If you ran the workload with the file channel and then took 10 thread
> dumps, I think we'd have enough to understand what is going on.
>
> Brock
I've taken some dumps; you can find them here:
http://people.apache.org/~juhanic/ca-flume-fc-dumps.tar.gz

I also included a PNG of VisualVM's thread visualization, where you can
confirm that the source is constantly busy (trying to get events into the
file channel) while the 5 sinks are mostly idle. Let me know if there's
anything else I can provide.


Re: Restarts without data loss

Posted by Brock Noland <br...@cloudera.com>.

If you ran the workload with the file channel and then took 10 thread
dumps, I think we'd have enough to understand what is going on.

Brock

On Mon, Jul 9, 2012 at 11:49 AM, Juhani Connolly <ju...@cyberagent.co.jp> wrote:
> It is currently pushing only about 10 events per second (roughly 250
> bytes per event). This is with the data dir and checkpoint in the same
> directory. Of course, the fact that a tail process is running and that
> Tomcat is also writing out logs is without a doubt compounding the
> problem somewhat.
>
> I haven't taken a serious look at thread dumps of the file channel, since
> I don't have a thorough understanding of it. However, our analysis has
> involved trying varying numbers of sinks (no throughput difference) and
> replacing it with a memory channel (which easily handles the ~650
> requests per second we have per server for this particular API, with no
> problems even with a single sink).
>
> Given the heavy fsyncing you mention, and since on 7200 rpm disks each
> seek has an average rotational latency of about 4.17 ms (half a
> revolution), alternating seeks between the checkpoint and the data dir,
> if each of those writes happens in order, already limit you to a best
> case of barely more than 100 events per second (2 x 4.17 ms is about
> 8.3 ms per event, or roughly 120 events per second). Our experience so
> far has been significantly worse than that.
>
> I do believe that batching a number of puts or takes under a single
> commit, as two seeks followed by writes (or one, if we could use a single
> storage file), could give significant returns when paired with a batching
> sink/source (which many already are, requesting multiple items at a time).
>
> If there is any specific data you would like, I would be happy to try to
> provide it.



-- 
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/

Re: Restarts without data loss

Posted by Juhani Connolly <ju...@cyberagent.co.jp>.

It is currently pushing only about 10 events per second (roughly 250
bytes per event). This is with the data dir and checkpoint in the same
directory. Of course, the fact that a tail process is running and that
Tomcat is also writing out logs is without a doubt compounding the
problem somewhat.

I haven't taken a serious look at thread dumps of the file channel, since
I don't have a thorough understanding of it. However, our analysis has
involved trying varying numbers of sinks (no throughput difference) and
replacing it with a memory channel (which easily handles the ~650
requests per second we have per server for this particular API, with no
problems even with a single sink).

Given the heavy fsyncing you mention, and since on 7200 rpm disks each
seek has an average rotational latency of about 4.17 ms (half a
revolution), alternating seeks between the checkpoint and the data dir,
if each of those writes happens in order, already limit you to a best
case of barely more than 100 events per second (2 x 4.17 ms is about
8.3 ms per event, or roughly 120 events per second). Our experience so
far has been significantly worse than that.
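The back-of-the-envelope bound above can be reproduced in a few lines. This is a sketch under the same simplifying assumption as the message: each seek costs only the average rotational latency (half a revolution), with arm seek time and transfer time ignored, so real disks will do worse. The class and method names are illustrative.

```java
public class DiskCommitBound {
    // Average rotational latency: half a revolution, in milliseconds.
    static double avgRotationalLatencyMs(int rpm) {
        double msPerRevolution = 60_000.0 / rpm;
        return msPerRevolution / 2.0;
    }

    // Upper bound on commits per second if every commit needs `seeksPerCommit`
    // head repositionings, each costing only the average rotational latency.
    static double maxCommitsPerSecond(int rpm, int seeksPerCommit) {
        return 1000.0 / (seeksPerCommit * avgRotationalLatencyMs(rpm));
    }

    public static void main(String[] args) {
        // Two seeks per commit: one for the data dir, one for the checkpoint.
        System.out.printf("latency = %.2f ms%n", avgRotationalLatencyMs(7200));     // latency = 4.17 ms
        System.out.printf("bound   = %.0f commits/s%n", maxCommitsPerSecond(7200, 2)); // bound   = 120 commits/s
    }
}
```

With one event per commit, as with a source that puts events one at a time, the ~120 commits per second is also the ceiling on events per second.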

I do believe that batching a number of puts or takes under a single
commit, as two seeks followed by writes (or one, if we could use a single
storage file), could give significant returns when paired with a batching
sink/source (which many already are, requesting multiple items at a
time).
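The effect of batching on the number of commits can be shown with a toy model. This is a simulation, not Flume's Channel/Transaction API: `DurableLog` and `ingest` are hypothetical names, and each `commit()` simply stands for one fsync-backed commit. With one put per commit, fsyncs grow with the number of events; with batches, they grow with the number of batches.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchedCommitSketch {
    // Hypothetical durable log: each commit() stands for one fsync of the
    // data file (plus a checkpoint update in a real channel).
    static class DurableLog {
        private final List<String> pending = new ArrayList<>();
        private final List<String> committed = new ArrayList<>();
        int fsyncs = 0;

        void put(String event) { pending.add(event); }

        void commit() {
            committed.addAll(pending); // a real store would write, then force to disk
            pending.clear();
            fsyncs++;
        }

        int size() { return committed.size(); }
    }

    // Puts `events` into the log, committing after every `batchSize` puts
    // and once more for any remainder. Returns the number of fsyncs used.
    static int ingest(DurableLog log, List<String> events, int batchSize) {
        int inBatch = 0;
        for (String e : events) {
            log.put(e);
            if (++inBatch == batchSize) {
                log.commit();
                inBatch = 0;
            }
        }
        if (inBatch > 0) log.commit();
        return log.fsyncs;
    }

    public static void main(String[] args) {
        List<String> events = new ArrayList<>();
        for (int i = 0; i < 1000; i++) events.add("event-" + i);

        System.out.println("batch=1   fsyncs=" + ingest(new DurableLog(), events, 1));   // 1000
        System.out.println("batch=100 fsyncs=" + ingest(new DurableLog(), events, 100)); // 10
    }
}
```

If the disk can sustain a roughly fixed number of fsync-backed commits per second, events per second then scale roughly with the batch size, until write bandwidth or other costs take over.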

If there is any specific data you would like, I would be happy to try to
provide it.

On 07/09/2012 05:22 PM, Brock Noland wrote:
> On Mon, Jul 9, 2012 at 8:51 AM, Juhani Connolly
> <juhani_connolly@cyberagent.co.jp> wrote:
>
>      - Intended setup with Flume was a file channel connected to an
>     Avro sink. With only a single disk available, it is extremely
>     slow. JDBC channel is also extremely slow, and MemoryChannel will
>     fill up and start refusing puts as soon as a network issue comes up.
>
>
> Have you taken a few thread dumps or done other analysis? When you say
> "extremely slow", what do you mean? Configured for no data loss,
> FileChannel is going to be doing a lot of fsyncing, so I am not
> surprised it's slow. That is a property of disks, not FileChannel. I
> think we should use group commit, but that shouldn't make it 10x faster.
>
> Brock

Re: Restarts without data loss

Posted by Brock Noland <br...@cloudera.com>.

On Mon, Jul 9, 2012 at 8:51 AM, Juhani Connolly <juhani_connolly@cyberagent.co.jp> wrote:

>   - Intended setup with Flume was a file channel connected to an Avro
> sink. With only a single disk available, it is extremely slow. JDBC channel
> is also extremely slow, and MemoryChannel will fill up and start refusing
> puts as soon as a network issue comes up.
>

Have you taken a few thread dumps or done other analysis? When you say
"extremely slow", what do you mean? Configured for no data loss, FileChannel
is going to be doing a lot of fsyncing, so I am not surprised it's slow.
That is a property of disks, not FileChannel. I think we should use group
commit, but that shouldn't make it 10x faster.
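Group commit, as suggested here, lets transactions that commit at about the same time share one fsync instead of paying for one each. Below is a minimal leader/follower sketch under stated assumptions: `GroupCommitSketch` and its methods are hypothetical (not FileChannel's code), records are only counted rather than written, and the fsync is simulated with a short sleep.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class GroupCommitSketch {
    private final Object lock = new Object();
    private long appended = 0;        // sequence number of the last appended record
    private long durable = 0;         // highest sequence number covered by a finished fsync
    private boolean flushing = false; // true while some thread is running the fsync
    final AtomicInteger fsyncs = new AtomicInteger();

    // Appends a record to the (simulated) log buffer; returns its sequence number.
    long append() {
        synchronized (lock) {
            return ++appended;
        }
    }

    // Returns once record `seq` is durable. A waiter that finds no fsync in
    // progress becomes the leader and runs one fsync covering everything
    // appended so far; other waiters ride on that fsync or lead the next one.
    void commit(long seq) throws InterruptedException {
        long target;
        synchronized (lock) {
            while (true) {
                if (durable >= seq) return;
                if (!flushing) {
                    flushing = true;
                    target = appended;
                    break;
                }
                lock.wait();
            }
        }
        boolean ok = false;
        try {
            simulatedFsync(); // outside the lock, so others can append meanwhile
            ok = true;
        } finally {
            synchronized (lock) {
                if (ok) durable = Math.max(durable, target);
                flushing = false;
                lock.notifyAll();
            }
        }
    }

    private void simulatedFsync() throws InterruptedException {
        fsyncs.incrementAndGet();
        Thread.sleep(5); // pretend the disk needs a few milliseconds
    }

    public static void main(String[] args) throws InterruptedException {
        GroupCommitSketch log = new GroupCommitSketch();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < 50; i++) {
            Thread t = new Thread(() -> {
                try {
                    log.commit(log.append());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) t.join();
        // Usually far fewer than 50, because concurrent commits share fsyncs.
        System.out.println("50 commits used " + log.fsyncs.get() + " fsyncs");
    }
}
```

The saving only appears when several transactions are waiting at once. A single thread that commits one event at a time still gets one fsync per commit, which is consistent with the expectation that group commit alone would not make such a workload dramatically faster.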

Brock

Re: Restarts without data loss

Posted by Juhani Connolly <ju...@cyberagent.co.jp>.

Hari: you mean multiple disks, not multiple folders? Running off a
single disk, the performance is unfortunately not "reasonably good".

The reality for most companies hoping to aggregate logs is that a lot of
the machines generating the logs have a single set of RAIDed disks, and
using multiple disks is not an option. Please do keep this in mind when
running tests, rather than testing only the "best case scenario". After
all, Flume is going to be cohabiting servers that were built for their
primary purposes, not for Flume.

In our case, this is what we had hoped to do on our log sources, and what
we are currently doing with scribed (which has its own issues, hence
wanting to move):

- Run agents on all our log-generating servers, using a channel that can
retain data in case of network issues communicating with the collector
layer.
  - Current setup is a scribed buffer store with a network store as
primary and a file store as secondary.
  - Intended setup with Flume was a file channel connected to an Avro
sink. With only a single disk available, it is extremely slow. JDBC
channel is also extremely slow, and MemoryChannel will fill up and start
refusing puts as soon as a network issue comes up.

I think this is a very common use case, and one that is likely holding up
adoption until we solve it (at least it is for us).

On 07/09/2012 04:07 PM, Hari Shreedharan wrote:
> Senthil,
>
> Have you tried using it recently, with multiple data folders, etc.? In
> recent tests, we have seen reasonably good performance. Of course, the
> performance of MemoryChannel would be much better, since it is
> in-memory :-). You should try to use the FileChannel as much as you
> can; otherwise there is a risk of losing data.
>
> Thanks
> Hari
>
> -- 
> Hari Shreedharan

Re: Restarts without data loss

Posted by Hari Shreedharan <hs...@cloudera.com>.

Unfortunately, I don't have any specific numbers that I can share. It has been tested for extended periods of time and has proven durable and stable. Hopefully, someone will be able to run perf tests on the channel soon.

Thanks
Hari

-- 
Hari Shreedharan


On Monday, July 9, 2012 at 12:12 AM, Senthilvel Rangaswamy wrote:

> Hari,
> 
> Would you have some numbers on the performance of FileChannel? For example, how many events you
> were able to process, etc.
> 
> --
> ..Senthil

Re: Restarts without data loss

Posted by Senthilvel Rangaswamy <se...@gmail.com>.

Hari,

Would you have some numbers on the performance of FileChannel? For
example, how many events you were able to process, etc.

--
..Senthil

On Mon, Jul 9, 2012 at 12:07 AM, Hari Shreedharan <hshreedharan@cloudera.com> wrote:

> Senthil,
>
> Have you tried using it recently, with multiple data folders, etc.? In
> recent tests, we have seen reasonably good performance. Of course, the
> performance of MemoryChannel would be much better, since it is in-memory
> :-). You should try to use the FileChannel as much as you can; otherwise
> there is a risk of losing data.
>
> Thanks
> Hari
>
> --
> Hari Shreedharan

Re: Restarts without data loss

Posted by Hari Shreedharan <hs...@cloudera.com>.

Senthil, 

Have you tried using it recently, with multiple data folders, etc.? In recent tests, we have seen reasonably good performance. Of course, the performance of MemoryChannel would be much better, since it is in-memory :-). You should try to use the FileChannel as much as you can; otherwise there is a risk of losing data.

Thanks
Hari

-- 
Hari Shreedharan

Re: Restarts without data loss

Posted by Senthilvel Rangaswamy <se...@gmail.com>.

We do use a persistent channel when there is overflow. Using FileChannel for
regular operations is too slow for us.
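A toy illustration of the "memory first, disk on overflow" idea, not Flume code: the SpillingBuffer class below is hypothetical and combines a bounded deque with an append-only spill file holding one event per line. It keeps FIFO order by spilling every new event once anything is on disk. Events still held in memory are lost on a restart, which is exactly the trade-off this thread is about. Later Flume releases added a spillable memory channel built around a similar idea.

```python
import collections
import os


class SpillingBuffer:
    """Hypothetical memory-first buffer that overflows to an append-only file."""

    def __init__(self, capacity: int, spill_path: str):
        self.capacity = capacity
        self.spill_path = spill_path
        self.memory = collections.deque()
        self.spilled = 0  # events written to the spill file but not yet taken
        self.offset = 0   # byte position of the next unread event in the file

    def put(self, event: str) -> None:
        if "\n" in event:
            raise ValueError("this sketch stores one event per line")
        # Once anything has spilled, keep spilling until the file is consumed;
        # otherwise newer events in memory would overtake older ones on disk.
        if self.spilled or len(self.memory) >= self.capacity:
            with open(self.spill_path, "ab") as f:
                f.write(event.encode("utf-8") + b"\n")
                f.flush()
                os.fsync(f.fileno())
            self.spilled += 1
        else:
            self.memory.append(event)

    def take(self):
        """Return the oldest event, or None if the buffer is empty."""
        if self.memory:
            return self.memory.popleft()
        if not self.spilled:
            return None
        with open(self.spill_path, "rb") as f:
            f.seek(self.offset)
            line = f.readline()
            self.offset = f.tell()
        self.spilled -= 1
        if not self.spilled:
            # Everything on disk has been consumed; start a fresh file next time.
            os.remove(self.spill_path)
            self.offset = 0
        return line[:-1].decode("utf-8")
```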

-- 
..Senthil

Re: Restarts without data loss

Posted by Brock Noland <br...@cloudera.com>.

I am guessing you are aware, but you could use a persistent channel such as the file channel.

-- 
Brock Noland

Re: Restarts without data loss

Posted by Tejinder Aulakh <te...@sharethis.com>.

I agree with Inder. We need a graceful shutdown that stops the agent only
after the channel has been fully drained.

Tejinder

-- 
Tejinder Aulakh
Senior Software Engineer, ShareThis
e: tejinder@sharethis.com
m: 510.708.2499

Re: Restarts without data loss

Posted by Inder Pall <in...@gmail.com>.

Thanks Arvind.

- Inder

-- 
Thanks,
- Inder
  Tech Platforms @Inmobi
  Linkedin - http://goo.gl/eR4Ub

Re: Restarts without data loss

Posted by Arvind Prabhakar <ar...@apache.org>.

One clarification: as Mubarak mentioned, there is already a Jira for this,
FLUME-1318 <https://issues.apache.org/jira/browse/FLUME-1318>. So instead
of filing a new issue, you can add your details and thoughts there.

Regards,
Arvind Prabhakar

Re: Restarts without data loss

Posted by Arvind Prabhakar <ar...@apache.org>.

Hi Inder,
On Mon, Jul 9, 2012 at 12:02 AM, Inder Pall <in...@gmail.com> wrote:

> Arvind,
>
> To me this is an important use case for frequent production rollouts. How
> about working toward graceful shutdown support for agents?
>

I believe the agent does shut down gracefully on interrupt. Specifically,
the components are started in a specific order
(FLUME-1236 <https://issues.apache.org/jira/browse/FLUME-1236>)
and then shut down in the reverse order
(FLUME-1325 <https://issues.apache.org/jira/browse/FLUME-1325>).
If you find that is not the case, please file a Jira with the appropriate
details.
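A minimal sketch of what "start in order, stop in reverse" means, assuming the ordering channels, then sinks, then sources (the exact ordering in the patches referenced above may differ). The function names are hypothetical. Stopping sources first keeps new events out, but nothing here waits for the sinks to empty the channels, which is why reverse-order shutdown alone does not save events sitting in a memory channel.

```python
from typing import Callable, List, Tuple

Component = Tuple[str, str]  # (kind, name), e.g. ("channel", "c1")

# Assumed start order; see the lead-in.
START_ORDER = ("channel", "sink", "source")


def start_all(components: List[Component],
              start: Callable[[Component], None]) -> List[Component]:
    """Start components grouped by kind in START_ORDER; return the order used."""
    ordered = sorted(components, key=lambda c: START_ORDER.index(c[0]))
    for component in ordered:
        start(component)
    return ordered


def stop_all(started: List[Component], stop: Callable[[Component], None]) -> None:
    """Stop in exactly the reverse of the start order: sources first, channels last."""
    for component in reversed(started):
        stop(component)
```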

> I can't think of an elegant solution at the moment that addresses all
> cases, but what are your thoughts on something along these lines:
>
> 1. The agent receives a shutdown signal.
> 2. It puts all its channels in isolation mode (wherein no sources can put
> events into them).
> 3. When the sinks attached to these channels have drained them, we do the
> real shutdown.
>
> I know we can find issues with this algorithm; however, I want to highlight
> the importance of supporting graceful shutdown as a first-class use case
> here.
>

I guess what you are asking for is drain-and-shutdown semantics. I think
it is a perfectly reasonable request and something we should consider
carefully, as it will likely be used in production environments. To
implement it, we would first need a mechanism for sending soft interrupts,
such as commands over a socket, and then an implementation that provides
the semantics you describe above alongside the regular shutdown semantics.
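A rough sketch of the soft-interrupt mechanism described above, assuming a hypothetical plain-text control port where each connection sends one command such as "drain" and gets back one line. Nothing like this existed in Flume 1.2.0; ControlServer, serve_in_background and the command names are illustrative only. A real version should listen only on localhost (the default here) or require authentication.

```python
import socketserver
import threading


class ControlHandler(socketserver.StreamRequestHandler):
    """Reads one text command per connection and replies with one line."""

    def handle(self):
        command = self.rfile.readline().decode("utf-8").strip().lower()
        action = self.server.actions.get(command)
        if action is None:
            self.wfile.write(b"unknown command\n")
            return
        action()  # e.g. isolate channels and start draining
        self.wfile.write(b"ok\n")


class ControlServer(socketserver.ThreadingTCPServer):
    allow_reuse_address = True
    daemon_threads = True

    def __init__(self, address, actions):
        self.actions = actions  # command name -> zero-argument callable
        super().__init__(address, ControlHandler)


def serve_in_background(actions, host="127.0.0.1", port=0):
    """Start the control server on a daemon thread; port 0 picks a free port."""
    server = ControlServer((host, port), actions)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```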

The best way to go about this would be to start by filing a Jira and adding
as many details as you can to specify it clearly, and perhaps even to take a
crack at a patch for it!

Regards,
Arvind Prabhakar

Re: Restarts without data loss

Posted by Inder Pall <in...@gmail.com>.

Arvind,

To me this is an important use case for frequent production rollouts. How
about working toward graceful shutdown support for agents?

I can't think of an elegant solution at the moment that addresses all
cases, but what are your thoughts on something along these lines:

1. The agent receives a shutdown signal.
2. It puts all its channels in isolation mode (wherein no sources can put
events into them).
3. When the sinks attached to these channels have drained them, we do the
real shutdown.

I know we can find issues with this algorithm; however, I want to highlight
the importance of supporting graceful shutdown as a first-class use case
here.
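The three steps above can be sketched as follows, assuming a hypothetical in-memory channel that can be switched into an isolated state. DrainableChannel, ChannelClosedError and graceful_shutdown are illustrative names, not Flume APIs. The timeout addresses one obvious issue with the algorithm: if a sink can never deliver (for example, because its downstream is down), the channel never drains, so the agent stops anyway and the function reports False.

```python
import collections
import threading


class ChannelClosedError(Exception):
    """Raised when a source tries to put into an isolated channel."""


class DrainableChannel:
    """Hypothetical in-memory channel supporting the isolate-then-drain steps."""

    def __init__(self):
        self._events = collections.deque()
        self._isolated = False
        self._cond = threading.Condition()

    def put(self, event):
        with self._cond:
            if self._isolated:
                raise ChannelClosedError("channel is isolated for shutdown")
            self._events.append(event)

    def take(self):
        """Called by sinks; returns None when the channel is empty."""
        with self._cond:
            if not self._events:
                return None
            event = self._events.popleft()
            if not self._events:
                self._cond.notify_all()  # wake anyone waiting for the drain
            return event

    def isolate(self):
        with self._cond:
            self._isolated = True

    def wait_until_empty(self, timeout):
        with self._cond:
            return self._cond.wait_for(lambda: not self._events, timeout)


def graceful_shutdown(channels, stop_agent, timeout=30.0):
    """Step 1 is the caller receiving the signal; steps 2 and 3 happen here."""
    for ch in channels:  # step 2: sources may no longer put events
        ch.isolate()
    drained = all(ch.wait_until_empty(timeout) for ch in channels)  # step 3
    stop_agent()
    return drained
```

Each channel gets its own timeout here, so the worst-case wait grows with the number of channels; a shared deadline would be a reasonable refinement.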

- Inder

Re: Restarts without data loss

Posted by Senthilvel Rangaswamy <se...@gmail.com>.

The source is tail -F, so I am not sure this solution will work.
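Why restarts hurt with a tail -F source: a restarted tail process has no record of how far the previous one got. A minimal sketch of the usual fix, assuming a hypothetical tailer that persists a byte offset in a small JSON state file and only advances it after the events have been handed to the channel. The function names and the state-file layout are assumptions; later Flume releases added a TAILDIR source that keeps a position file in a similar spirit. This sketch does not handle log rotation by rename.

```python
import json
import os


def load_offset(state_path):
    """Return the last committed byte offset, or 0 on first start."""
    try:
        with open(state_path, encoding="utf-8") as f:
            return json.load(f)["offset"]
    except FileNotFoundError:
        return 0


def save_offset(state_path, offset):
    """Persist the offset atomically so a crash never leaves a half-written file."""
    tmp = state_path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump({"offset": offset}, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, state_path)


def read_new_lines(log_path, offset):
    """Return (complete lines after offset, new offset); a partial last line waits."""
    if os.path.getsize(log_path) < offset:
        offset = 0  # file was truncated in place: start from the beginning
    with open(log_path, "rb") as f:
        f.seek(offset)
        data = f.read()
    end = data.rfind(b"\n") + 1  # 0 if there is no complete line yet
    lines = data[:end].decode("utf-8").splitlines()
    return lines, offset + end
```

Call save_offset only after the returned lines have been committed to the channel; saving first would reintroduce the loss window.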

-- 
..Senthil

Re: Restarts without data loss

Posted by alo alt <wg...@gmail.com>.

Simple solution:

Run two configs on different ports and use iptables to forward transparently to both ports. Block the first one, and all events will be redirected to the other port. Wait 5 minutes; the memory channel should be clear by then. Make your changes, start the new config, redirect the traffic to its port, and then change the other config.
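Instead of waiting a fixed 5 minutes, one could poll the agent until its channels are actually empty. A sketch assuming a Flume version that serves JSON metrics over HTTP (later releases support starting the agent with -Dflume.monitoring.type=http -Dflume.monitoring.port=<port> and reading /metrics; whether 1.2.0 exposes the same counters is an assumption). It relies on the per-channel ChannelSize counter reported under keys such as CHANNEL.<name>; the URL in the usage line is hypothetical.

```python
import json
import time
import urllib.request


def channels_drained(metrics):
    """True if every CHANNEL.* entry in a JSON metrics dump reports ChannelSize 0."""
    sizes = [int(v["ChannelSize"]) for k, v in metrics.items()
             if k.startswith("CHANNEL.")]
    # No channel entries at all means we cannot confirm anything was drained.
    return bool(sizes) and all(size == 0 for size in sizes)


def wait_for_drain(url, timeout=300.0, interval=5.0):
    """Poll the metrics URL until all channels are empty or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        with urllib.request.urlopen(url, timeout=interval) as resp:
            metrics = json.load(resp)
        if channels_drained(metrics):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Usage: wait_for_drain("http://localhost:34545/metrics") before stopping the blocked instance.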

cheers,
 alex 


--
Alexander Alten-Lorenz
http://mapredit.blogspot.com
German Hadoop LinkedIn Group: http://goo.gl/N8pCF

Re: Restarts without data loss

Posted by Arvind Prabhakar <ar...@apache.org>.

Hi,

On Sun, Jul 8, 2012 at 11:18 PM, Senthilvel Rangaswamy <senthilvel@gmail.com> wrote:

> We are using Flume 1.2.0 with memory channel. When we rollout new
> configs/decorators we may need to restart flume at which point any events
> in memory channel is gone. Any ways to avoid this ?
>

One way to address this would be to make sure that the upstream sink or
client can be routed to a different agent when necessary. That way, when you
do want to restart the agent, you would first route all the traffic
elsewhere, drain the channel, and then shut down as necessary. Once the
system is back up, you could route the traffic back to this agent.
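The routing procedure above, as a minimal simulation: a hypothetical upstream Router sends each event to the first agent in its list that is in rotation, and restart_without_loss takes one agent out, waits for its channel to drain, restarts it and puts it back. All names are illustrative; channel_size, restart and wait are callables you would wire to real metrics, process control and a sleep. The max_waits bound keeps a stuck sink from hanging the rollout.

```python
from typing import Callable, List


class Router:
    """Hypothetical upstream client: picks the first agent in `order` in rotation."""

    def __init__(self, order: List[str]):
        self.order = list(order)
        self.out_of_rotation = set()

    def target(self) -> str:
        for name in self.order:
            if name not in self.out_of_rotation:
                return name
        raise RuntimeError("no agent is in rotation")


def restart_without_loss(router: Router, name: str,
                         channel_size: Callable[[str], int],
                         restart: Callable[[str], None],
                         wait: Callable[[], None],
                         max_waits: int = 1000) -> None:
    router.out_of_rotation.add(name)           # 1. route new traffic elsewhere
    waits = 0
    while channel_size(name) > 0:              # 2. let the agent's sinks drain it
        if waits >= max_waits:
            router.out_of_rotation.discard(name)  # give up; keep it serving
            raise TimeoutError(f"{name} did not drain")
        wait()
        waits += 1
    restart(name)                              # 3. restart with the new config
    router.out_of_rotation.discard(name)       # 4. route traffic back
```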

I am sure that there are multiple other ways of doing this.

Regards,
Arvind Prabhakar
