Posted to user@flume.apache.org by Robert B Hamilton <ro...@gm.com> on 2015/06/09 04:30:47 UTC

log dump tool?

Is there anything like a logdump tool for the Flume file channel?
Specifically, I’m looking for some way to extract, say, the event data for the last N puts.
Alternatively, can the logs be modified so that the last N (sink) commits will be ignored on restart?

The scenario that I’m concerned about is this:

1. The server crashes, and Flume is restarted once the server is brought back.
2. An end user sees something odd in his HiveQL and speculates that data was lost.
3. We peek into the WAL files as they existed just before the restart (we saved off a copy) and either
   a. find an event corresponding to the missing data and use that to fix the data in the destination, or
   b. prove that the event corresponding to the missing data was not present, at least as far back as the logs go.

I’m just wondering if there is a tool which makes number 3 possible.



Nothing in this message is intended to constitute an electronic signature unless a specific statement to the contrary is included in this message.

Confidentiality Note: This message is intended only for the person or entity to which it is addressed. It may contain confidential and/or privileged material. Any review, transmission, dissemination or other use, or taking of any action in reliance upon this message by persons or entities other than the intended recipient is prohibited and may be unlawful. If you received this message in error, please contact the sender and delete it from your computer.

Re: log dump tool?

Posted by Gwen Shapira <gs...@cloudera.com>.
Not directly answering your question, but note that the KafkaChannel
makes #3 very easy: simply use a console consumer to read events from
the topic used by Flume to see what was written there. Your end users
may even be able to do this for themselves.


RE: log dump tool?

Posted by Robert B Hamilton <ro...@gm.com>.
Resolved, thanks!
Based on this example I wrote a little tool that lets me browse "put" events in the file channel logs, optionally displaying both the headers and the event bodies.
More importantly for my present purpose, I added a method to replace the last N takes with no-ops, which effectively forces the takes to be replayed when the agent is restarted (provided the checkpoint is removed).

It gives me an emergency mechanism to recover any events that had been written to HDFS and committed on the channel, but not yet flushed, at the moment a Flume server crashes.
I wonder if this may be of use to the Flume community...
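
The take-rewind idea can be illustrated on a toy model. The sketch below is not the tool described in this message and does not read Flume's on-disk format; the class name, record types, and event ids are all hypothetical. It only shows why turning the last N takes into no-ops makes those events reappear when the log is replayed from the beginning (the case where no checkpoint is present).

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class TakeRewindSketch {

    // Simplified record kinds; a real write-ahead log also carries
    // transaction ids, commits, and checksums, which are omitted here.
    enum Type { PUT, TAKE, NOOP }

    static final class Record {
        final Type type;
        final String eventId;

        Record(Type type, String eventId) {
            this.type = type;
            this.eventId = eventId;
        }
    }

    // Replays the whole log in order and returns the events still queued.
    static Set<String> replay(List<Record> log) {
        Set<String> queued = new LinkedHashSet<>();
        for (Record r : log) {
            switch (r.type) {
                case PUT:
                    queued.add(r.eventId);
                    break;
                case TAKE:
                    queued.remove(r.eventId);
                    break;
                default:
                    break; // NOOP records change nothing
            }
        }
        return queued;
    }

    // Returns a copy of the log in which the last n TAKE records are NOOPs.
    // The input list is left untouched.
    static List<Record> neutralizeLastTakes(List<Record> log, int n) {
        List<Record> copy = new ArrayList<>(log);
        int remaining = n;
        for (int i = copy.size() - 1; i >= 0 && remaining > 0; i--) {
            if (copy.get(i).type == Type.TAKE) {
                copy.set(i, new Record(Type.NOOP, copy.get(i).eventId));
                remaining--;
            }
        }
        return copy;
    }

    public static void main(String[] args) {
        List<Record> log = new ArrayList<>();
        log.add(new Record(Type.PUT, "e1"));
        log.add(new Record(Type.PUT, "e2"));
        log.add(new Record(Type.PUT, "e3"));
        log.add(new Record(Type.TAKE, "e1"));
        log.add(new Record(Type.TAKE, "e2"));

        System.out.println(replay(log));                         // [e3]
        System.out.println(replay(neutralizeLastTakes(log, 1))); // [e2, e3]
    }
}
```

In this toy log, e1 and e2 were taken. After the last take is neutralized, a full replay leaves e2 and e3 queued, so a sink would deliver e2 a second time. Anything downstream therefore has to tolerate duplicates.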


Re: log dump tool?

Posted by Ashish <pa...@gmail.com>.
That's true. The simplest way to check is to modify the test case and
in the validator use System.out.println on the event body (which you
would know how to decode). It should be up and running in no time.
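
As a rough illustration of the suggestion above, here is a minimal sketch of a validator that prints each event it is handed. The two interfaces are local stand-ins declared only so the sketch compiles without Flume on the classpath; the real types live in Flume and flume-tools, and their exact shape should be checked against the test case named in this thread and your Flume version. The class name, the helper, and the UTF-8 assumption about event bodies are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Map;

class DumpValidatorSketch {

    // Stand-in for Flume's Event type (assumption: headers plus a byte[] body).
    interface Event {
        Map<String, String> getHeaders();
        byte[] getBody();
    }

    // Stand-in for the validator callback; the real interface may differ.
    interface EventValidator {
        boolean validateEvent(Event event);
    }

    // Renders one event as a line of text, assuming the body is UTF-8.
    static String describe(Event event) {
        String body = new String(event.getBody(), StandardCharsets.UTF_8);
        return "headers=" + event.getHeaders() + " body=" + body;
    }

    // Prints every event it is shown and always returns true,
    // i.e. it treats every event as valid.
    static final EventValidator DUMPING_VALIDATOR = event -> {
        System.out.println(describe(event));
        return true;
    };

    // Helper that builds an in-memory event for the demo below.
    static Event sampleEvent(Map<String, String> headers, String body) {
        byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
        return new Event() {
            @Override public Map<String, String> getHeaders() { return headers; }
            @Override public byte[] getBody() { return bytes; }
        };
    }

    public static void main(String[] args) {
        Event e = sampleEvent(Collections.singletonMap("host", "web01"), "hello");
        DUMPING_VALIDATOR.validateEvent(e); // prints: headers={host=web01} body=hello
    }
}
```

Bodies that are not text (Avro, protobuf, compressed data) would need their own decoding in place of the UTF-8 conversion.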

thanks
Ashish




-- 
thanks
ashish

Blog: http://www.ashishpaliwal.com/blog
My Photo Galleries: http://www.pbase.com/ashishpaliwal

RE: log dump tool?

Posted by Robert B Hamilton <ro...@gm.com>.
Thanks Ashish and Hari and Gwen. Very nice example.

It looks like it would not take much at all to make a variation that dumps Events from a given datadir. Thanks!
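
For the "last N puts" part of the original question, one possible shape for such a variation is a small bounded buffer that is fed every event body and keeps only the most recent N. This is a sketch under assumptions: the class and method names are hypothetical, bodies are treated as UTF-8 text, and it presumes events are handed over in log order, which should be verified for whatever code does the feeding.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class LastNPutsSketch {

    private final int capacity;
    private final Deque<String> recent = new ArrayDeque<>();

    LastNPutsSketch(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    // Called once per event body; evicts the oldest entry when the buffer is full.
    void accept(byte[] body) {
        if (recent.size() == capacity) {
            recent.removeFirst();
        }
        recent.addLast(new String(body, StandardCharsets.UTF_8));
    }

    // Returns the retained bodies, oldest first.
    List<String> lastN() {
        return new ArrayList<>(recent);
    }

    public static void main(String[] args) {
        LastNPutsSketch tail = new LastNPutsSketch(2);
        for (String s : new String[] {"a", "b", "c"}) {
            tail.accept(s.getBytes(StandardCharsets.UTF_8));
        }
        System.out.println(tail.lastN()); // [b, c]
    }
}
```

Memory use stays proportional to N rather than to the size of the data directory, which matters when the saved logs are large.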



Re: log dump tool?

Posted by Ashish <pa...@gmail.com>.
Somehow the documentation has not been updated. Let me check the JIRA.

Look into flume-tools/src/test/java/org/apache/flume/tools/TestFileChannelIntegrityTool.java.
It has an example; you can use a simple logger or a simple file
writing utility to dump the contents.

I shall be writing a blog post on the same topic soon. Been busy lately.

thanks
Ashish





Re: log dump tool?

Posted by Hari Shreedharan <hs...@cloudera.com>.
Flume has a tool that will allow you to run all events in the file channel through a piece of custom code you’d supply:

bin/flume-ng tool FCINTEGRITYTOOL

You can see the arguments you’d need to supply when you execute this command.

Thanks,
Hari Shreedharan



