Posted to user@flume.apache.org by SaravanaKumar TR <sa...@gmail.com> on 2014/10/27 10:21:05 UTC

Need suggestion on reliable source for log processing

Hi,

I am using Apache Flume 1.5.0. Here is a quick explanation of the setup.

Source: exec, running tail -F on a log file.

Channel: file channel

Sink: HDFS

Use case: to move real-time data from the log file to HDFS.


It appears that exec is not a reliable source, as we may lose data if the
channel/source is down.


So I tried the other option, the "spooling directory source", which is described
as a reliable source. But here I have a single log file that data keeps getting
appended to, so I don't see a way to move the file into the spool directory.


Can anyone suggest another reliable source option for the case where the
log file is continuously appended to and log rotation happens only at the
end of the day?


Thanks,

Saravana

Re: Need suggestion on reliable source for log processing

Posted by SaravanaKumar TR <sa...@gmail.com>.
Ahmed,

Can you please let me know how to configure logrotate.conf to move logs into
the Flume spool directory? Rotating directly into the Flume spool directory
ends up with the error I mentioned.
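One plausible cause of the "File has changed size since being read" error, offered as an assumption rather than a diagnosis of this particular setup: on POSIX systems, renaming a file does not close it, so if logrotate renames the live log straight into the spool directory before the application reopens its log, the application keeps appending to the file Flume is already reading. The Python sketch below (file names are hypothetical; POSIX only) demonstrates just that effect. If it is the cause, a safer order is to rotate in place, let the application reopen its log (for example from a postrotate script), and only then move the closed file into the spool directory.

```python
import os
import tempfile


def demo_rename_keeps_writing() -> tuple[int, int]:
    """Show that a writer keeps appending to a file after it is renamed.

    Returns (size right after the rename, size after the writer's next write).
    """
    with tempfile.TemporaryDirectory() as root:
        live = os.path.join(root, "app.log")      # hypothetical live log
        spool = os.path.join(root, "spool")       # hypothetical spool directory
        os.mkdir(spool)

        writer = open(live, "a")                  # the application's open handle
        writer.write("line 1\n")
        writer.flush()

        # Rotating straight into the spool directory amounts to a rename.
        spooled = os.path.join(spool, "app.log.1")
        os.rename(live, spooled)
        size_after_rename = os.path.getsize(spooled)

        # The application has not reopened its log, so this write lands in
        # the file that is already sitting in the spool directory.
        writer.write("line 2\n")
        writer.flush()
        size_after_write = os.path.getsize(spooled)
        writer.close()
        return size_after_rename, size_after_write


if __name__ == "__main__":
    before, after = demo_rename_keeps_writing()
    print(f"spooled file grew from {before} to {after} bytes")
```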

Thanks,

On Fri, Nov 14, 2014 at 12:54 PM, SaravanaKumar TR <sa...@gmail.com>
wrote:

> Yes, got it. I think there is no option to do it without a suffix.
>
> Sometimes Flume throws the error "java.lang.IllegalStateException: File has
> changed size since being read", but I don't see any reason for a process to
> modify the file after it has been moved to the spool directory, because it is
> moved there via logrotate.
>
> Does Flume have an option to notify us of the name/PID of the process that
> modifies the file?
>
> On Thu, Nov 13, 2014 at 11:26 PM, Ahmed Vila <av...@devlogic.eu> wrote:
>
>> Hi Saravana,
>>
>> I think there is no override for the .completed suffix.
>> Also, I think there is no way for Flume to distinguish which files it has
>> already processed and which it has not.
>>
>> On Thu, Nov 13, 2014 at 4:54 AM, SaravanaKumar TR <saran0081986@gmail.com>
>> wrote:
>>
>>> Hi Ahmed,
>>>
>>> I have a query about the Flume spooling directory source.
>>>
>>> Is it possible to skip the fileSuffix option in the spool dir source? It
>>> seems that by default it appends a .COMPLETED suffix. I don't want any
>>> suffix appended to the ingested file.
>>>
>>> Please let me know if that's possible.
>>>
>>> Thanks,
>>> Saravana
>>>
>>> On Mon, Oct 27, 2014 at 7:25 PM, Ahmed Vila <av...@devlogic.eu> wrote:
>>>
>>>> You're welcome.
>>>>
>>>> Well... there will be at least "failed due to burned down hardware" :)
>>>>
>>>> Joke aside, there will be no solution with 100% certainty for a long
>>>> time to come.
>>>> As I see it, that is simply because of the maturity gap between pieces of
>>>> software, so you have to use some mumbo-jumbo techniques to make them work
>>>> together without modifications.
>>>> I consider tail -f a mumbo-jumbo technique, but the Flume community has
>>>> been nice enough to support a level that low.
>>>>
>>>> If you care, you can implement full object-level logging in your
>>>> application via Avro and utilize Flume to its full potential... as well as
>>>> handle back-offs as you find appropriate.
>>>> But for that purpose there is also Flume's implementation of a log4j
>>>> appender, so you basically send all logs directly to Flume.
>>>> I'm not sure how back-offs are handled there, but that's the level at
>>>> which applications should communicate.
>>>>
>>>> On the other hand, the spooling directory source is mature down to its
>>>> finest details, supported by any application, and easily altered... so
>>>> that's why I have used it.
>>>>
>>>>
>>>> On Mon, Oct 27, 2014 at 2:39 PM, SaravanaKumar TR <
>>>> saran0081986@gmail.com> wrote:
>>>>
>>>>> Ahmed,
>>>>>
>>>>> Thanks for your detailed comments.
>>>>>
>>>>> A final point: in which cases would these logging solutions be considered
>>>>> a perfect system without any tradeoffs?
>>>>>
>>>>> On Mon, Oct 27, 2014 at 6:47 PM, Ahmed Vila <av...@devlogic.eu> wrote:
>>>>>
>>>>>> Exactly to the point.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Oct 27, 2014 at 1:57 PM, SaravanaKumar TR <
>>>>>> saran0081986@gmail.com> wrote:
>>>>>>
>>>>>>> That was a good point.
>>>>>>>
>>>>>>> So if a solution claims guaranteed data delivery, that guarantee applies
>>>>>>> only once the application has successfully handed the event to the
>>>>>>> source/producer; from that point on, the system guarantees delivery of
>>>>>>> the event to the sink/consumer at the other end.
>>>>>>>
>>>>>>> It has no control over whether events reach the source/producer properly
>>>>>>> (e.g. data lost before that point).
>>>>>>>
>>>>>>> So there will always be a chance of data loss when the system goes down,
>>>>>>> and certain tradeoffs have to be accepted.
>>>>>>>
>>>>>>> On Mon, Oct 27, 2014 at 6:06 PM, Ahmed Vila <av...@devlogic.eu>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Flume, Kafka, or any other system can only be responsible for its
>>>>>>>> own actions. From the perspective of the exec source in Flume, it asks
>>>>>>>> bash for whatever bash writes to its stdout. It cannot control what
>>>>>>>> bash returns.
>>>>>>>> Thus, to the source it's not a file, just a stream of text.
>>>>>>>>
>>>>>>>> The spooling directory source, on the other hand, will resume from the
>>>>>>>> file it failed on.
>>>>>>>> That reveals two approaches to event consumption: push and pull.
>>>>>>>>
>>>>>>>> When the push approach is used, the source cannot know what comes next
>>>>>>>> or what came before it started listening.
>>>>>>>>
>>>>>>>> Even so, some sources/producers, even if they use the pull approach,
>>>>>>>> don't necessarily know how to return to the last read event. It's up
>>>>>>>> to the implementation.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Ahmed
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Oct 27, 2014 at 12:48 PM, SaravanaKumar TR <
>>>>>>>> saran0081986@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Yes, I agree.
>>>>>>>>>
>>>>>>>>> I think no logging solution (a source in Flume, a producer in Kafka)
>>>>>>>>> has a marker feature that records the exact point up to which it has
>>>>>>>>> consumed the log file, so that after a failure it can start reading
>>>>>>>>> again from that same point of the log file (before the failure).
>>>>>>>>>
>>>>>>>>> This is the major reason failures are hard to ignore. Am I right?
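The "marker" asked about above is essentially a persisted read offset, which is what makes a pull-style reader resumable. A minimal sketch, assuming a single plain log file; `load_offset`, `read_from`, `commit_offset` and the JSON state file are hypothetical names, not a Flume API, and rotation is handled only crudely (a file that shrank is re-read from the start). Committing the offset only after the lines have been delivered gives at-least-once behavior: a crash between delivery and commit re-delivers lines rather than losing them.

```python
import json
import os
import tempfile


def load_offset(state_path: str) -> int:
    """Return the last committed byte offset, or 0 if none was saved."""
    if not os.path.exists(state_path):
        return 0
    with open(state_path) as f:
        return json.load(f)["offset"]


def commit_offset(state_path: str, offset: int) -> None:
    """Persist the offset atomically: write a temp file, then rename it."""
    tmp = state_path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"offset": offset}, f)
    os.replace(tmp, state_path)


def read_from(log_path: str, offset: int) -> tuple[list[str], int]:
    """Read complete lines starting at `offset`; return them and the new offset.

    A trailing line without a newline is left for the next call, since the
    writer may still be in the middle of it.
    """
    if os.path.getsize(log_path) < offset:
        offset = 0  # file shrank (truncated/replaced): start from the beginning
    lines = []
    with open(log_path, "rb") as f:
        f.seek(offset)
        for raw in f:
            if not raw.endswith(b"\n"):
                break
            lines.append(raw[:-1].decode("utf-8"))
            offset += len(raw)
    return lines, offset


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        log, state = os.path.join(d, "app.log"), os.path.join(d, "offset.json")
        with open(log, "w") as f:
            f.write("a\nb\n")
        lines, new_offset = read_from(log, load_offset(state))
        print(lines)                      # deliver these to the channel first...
        commit_offset(state, new_offset)  # ...then commit the offset
```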
>>>>>>>>>
>>>>>>>>> On Mon, Oct 27, 2014 at 4:51 PM, Ahmed Vila <av...@devlogic.eu>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> You can use the spillable memory channel, which stores events in
>>>>>>>>>> memory and, once memory fills up, spills them to disk.
>>>>>>>>>> You can also use the file channel, but it's only as fast as your disk,
>>>>>>>>>> and it's suggested to put it on a separate disk due to its high IO,
>>>>>>>>>> preferably an SSD.
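A toy model of the spill-to-disk idea described above, for illustration only: it is not how Flume's spillable memory channel is implemented (that channel is transactional and far more involved), and `SpillableBuffer` is a hypothetical name. The one design point worth noticing is ordering: once any event has spilled, new events also go to disk so they cannot overtake older ones.

```python
import json
import os
import tempfile
from collections import deque
from typing import Optional


class SpillableBuffer:
    """Toy FIFO buffer: up to `capacity` events live in memory; the rest are
    appended to an overflow file on disk. Not transactional, and it never
    compacts the overflow file; it only illustrates memory-then-disk."""

    def __init__(self, capacity: int, overflow_path: str):
        self.capacity = capacity
        self.memory: deque = deque()
        self.overflow_path = overflow_path
        open(overflow_path, "wb").close()  # start with an empty overflow file
        self.read_pos = 0   # byte offset of the oldest not-yet-reloaded event
        self.spilled = 0    # number of events currently waiting on disk

    def put(self, event: str) -> None:
        # Once anything is on disk, new events must go to disk too;
        # otherwise they would overtake older spilled events.
        if self.spilled == 0 and len(self.memory) < self.capacity:
            self.memory.append(event)
        else:
            with open(self.overflow_path, "ab") as f:
                f.write(json.dumps(event).encode("utf-8") + b"\n")
            self.spilled += 1

    def take(self) -> Optional[str]:
        if not self.memory and self.spilled:
            self._reload()
        return self.memory.popleft() if self.memory else None

    def _reload(self) -> None:
        # Move up to `capacity` spilled events back into memory, oldest first.
        with open(self.overflow_path, "rb") as f:
            f.seek(self.read_pos)
            while self.spilled and len(self.memory) < self.capacity:
                self.memory.append(json.loads(f.readline()))
                self.spilled -= 1
            self.read_pos = f.tell()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        buf = SpillableBuffer(2, os.path.join(d, "overflow.jsonl"))
        for event in ["e1", "e2", "e3", "e4"]:
            buf.put(event)
        print([buf.take() for _ in range(4)])
```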
>>>>>>>>>>
>>>>>>>>>> But that will not solve the issue you might run into: if Flume fails
>>>>>>>>>> for whatever reason, you'll never be able to continue from the exact
>>>>>>>>>> point where it failed.
>>>>>>>>>> Yes, the file channel preserves its state, so it will continue with
>>>>>>>>>> whatever it already received, but what about the time while it was down?
>>>>>>>>>>
>>>>>>>>>> If you cannot change anything about the application that produces the
>>>>>>>>>> logs, then such circumstances have to be accepted as a tradeoff.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Oct 27, 2014 at 12:09 PM, SaravanaKumar TR <
>>>>>>>>>> saran0081986@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Yes, I understand the concerns with this use case.
>>>>>>>>>>>
>>>>>>>>>>> If so, we need to configure failover for this scenario. Can we have
>>>>>>>>>>> it at the channel level or the sink level?
>>>>>>>>>>>
>>>>>>>>>>> Does Flume support configuring failover in case the channel fills up?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Oct 27, 2014 at 3:54 PM, Ahmed Vila <av...@devlogic.eu>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> In fact, this is not a problem with Flume.
>>>>>>>>>>>>
>>>>>>>>>>>> No solution will work reliably for your use case, simply because all
>>>>>>>>>>>> of them will have to do some sort of tail -f or streaming on a file,
>>>>>>>>>>>> and if they can't keep up with it (they mostly don't at high-speed
>>>>>>>>>>>> entry points), they will drop some entries.
>>>>>>>>>>>> Please be kind to yourself and plan for failures: if you need to
>>>>>>>>>>>> restart Flume or any other solution, you'll face dropped entries that
>>>>>>>>>>>> you won't be able to re-ingest easily, as in most cases you won't know
>>>>>>>>>>>> which ones you've dropped.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> Ahmed
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Oct 27, 2014 at 11:13 AM, SaravanaKumar TR <
>>>>>>>>>>>> saran0081986@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks for the comments, Ahmed.
>>>>>>>>>>>>>
>>>>>>>>>>>>> So from your comments, I take it that Flume doesn't have any
>>>>>>>>>>>>> reliable source option for the use case I described.
>>>>>>>>>>>>>
>>>>>>>>>>>>> If Flume can't provide it, can you suggest any other log collector
>>>>>>>>>>>>> solutions I could consider for moving real-time data to HDFS?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Oct 27, 2014 at 3:37 PM, Ahmed Vila <avila@devlogic.eu>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Then you're out of luck, in my opinion, as there is no way other
>>>>>>>>>>>>>> than tail -f.
>>>>>>>>>>>>>> The problem with tail -f is that tail will not wait for the
>>>>>>>>>>>>>> source/channel to keep up with it. If the channel is full, it will
>>>>>>>>>>>>>> back off to the source, and then the source will just stop ingesting.
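A toy simulation of the failure mode described above, assuming a producer that, like tail -f, emits lines at its own pace and keeps no copy of them: whatever does not fit into a full, bounded channel is lost. This models the effect, not Flume's actual exec source code; `simulate_push` and its rates are made up for illustration.

```python
from collections import deque


def simulate_push(ticks: int, produce_per_tick: int,
                  consume_per_tick: int, capacity: int) -> dict:
    """Toy model: a producer that never waits feeds a bounded channel.

    Each tick the producer emits `produce_per_tick` lines regardless of
    channel state; a line that arrives while the channel is full is counted
    as dropped, because the producer keeps no copy to retry from. Then the
    sink drains up to `consume_per_tick` lines.
    """
    channel = deque()
    produced = delivered = dropped = 0
    for _tick in range(ticks):
        for _ in range(produce_per_tick):
            produced += 1
            if len(channel) < capacity:
                channel.append(produced)   # store the line's sequence number
            else:
                dropped += 1
        for _ in range(min(consume_per_tick, len(channel))):
            channel.popleft()
            delivered += 1
    return {"produced": produced, "delivered": delivered,
            "dropped": dropped, "in_channel": len(channel)}


if __name__ == "__main__":
    print(simulate_push(ticks=10, produce_per_tick=3, consume_per_tick=2, capacity=5))
    print(simulate_push(ticks=10, produce_per_tick=2, consume_per_tick=2, capacity=5))
```

With 10 ticks, 3 lines produced and 2 consumed per tick, and a capacity of 5, the channel fills during the fourth tick and one line per tick is dropped from then on (7 in total); with equal rates nothing is dropped.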
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It would be possible to hack tail -f into writing to another file
>>>>>>>>>>>>>> and then custom-rotate that duplicate file.
>>>>>>>>>>>>>> But I wouldn't recommend that.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Just a side note: if you're operating a Java application (Tomcat or
>>>>>>>>>>>>>> similar), you can create multiple output files via log4j.properties
>>>>>>>>>>>>>> configuration without the application itself knowing anything about
>>>>>>>>>>>>>> it.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>> Ahmed
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Oct 27, 2014 at 10:56 AM, SaravanaKumar TR <
>>>>>>>>>>>>>> saran0081986@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Ahmed,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Here in my case, the application renames the existing file to
>>>>>>>>>>>>>>> <logfile>.yesterdaydate and creates a new file named <logfile> at
>>>>>>>>>>>>>>> 00:00.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I can't change the application's log rotation policy for now, so I
>>>>>>>>>>>>>>> guess I should rule out the spooling directory source in my case.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Can you suggest any options other than the spooling dir source?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Mon, Oct 27, 2014 at 3:10 PM, Ahmed Vila <
>>>>>>>>>>>>>>> avila@devlogic.eu> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> It all depends on how log rotation is done and how the
>>>>>>>>>>>>>>>> application producing the log file handles log rotation.
>>>>>>>>>>>>>>>> Most applications just reopen the log file when they receive a
>>>>>>>>>>>>>>>> signal sent with kill. For example, nginx reopens its log files
>>>>>>>>>>>>>>>> when it receives the USR1 signal, but it doesn't stop the process.
>>>>>>>>>>>>>>>> Some applications might restart as a result.
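A minimal sketch of an application that behaves the way nginx does, reopening its log file when it receives SIGUSR1, so that an external job can rename the file away at any time. `ReopeningLog` is a hypothetical class, not part of any library; this is POSIX-only, and Python only allows installing signal handlers from the main thread.

```python
import os
import signal
import tempfile


class ReopeningLog:
    """Hypothetical application logger that reopens its file on SIGUSR1.

    After a rotation job renames the file away, reopen() creates a fresh
    file at the original path; earlier lines stay in the renamed file.
    """

    def __init__(self, path: str):
        self.path = path
        self.handle = open(path, "a")

    def write(self, line: str) -> None:
        self.handle.write(line + "\n")
        self.handle.flush()

    def reopen(self) -> None:
        self.handle.close()
        self.handle = open(self.path, "a")  # creates a new file if renamed away

    def install_signal_handler(self) -> None:
        # Main thread only; SIGUSR1 does not exist on Windows.
        signal.signal(signal.SIGUSR1, lambda signum, frame: self.reopen())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        log = ReopeningLog(os.path.join(d, "app.log"))
        log.install_signal_handler()
        log.write("before rotation")
        os.rename(log.path, log.path + ".1")   # what the rotation job does...
        os.kill(os.getpid(), signal.SIGUSR1)   # ...followed by "please reopen"
        log.write("after rotation")
        log.handle.close()
        print(sorted(os.listdir(d)))
```

A per-minute cron job would rename app.log away and then send the signal (e.g. kill -USR1 <pid>); lines written after the reopen land in the fresh app.log.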
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> If the application just reopens the log file, then you can
>>>>>>>>>>>>>>>> change your log rotation policy to be per minute.
>>>>>>>>>>>>>>>> logrotate won't handle an interval that short, so you'll have to
>>>>>>>>>>>>>>>> make a cron job to do it.
>>>>>>>>>>>>>>>> In that case, you would separate the finished logs location from
>>>>>>>>>>>>>>>> the live log location, so the spooling directory source doesn't
>>>>>>>>>>>>>>>> freak out about the active log file being appended to.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Anyway, the spooling directory source is the way to go, as it
>>>>>>>>>>>>>>>> will leave log files in place, just renamed.
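The last step of such a cron job is handing each finished file to the spooling directory source. Flume's documentation asks that files dropped into the spooling directory be immutable and uniquely named, so one way to do the hand-off, sketched here under stated assumptions, is to copy the finished file into a staging directory on the same filesystem as the spool directory and then publish it with a single atomic rename. `hand_off_to_spool` and the directory layout are hypothetical.

```python
import os
import shutil
import tempfile
import time
import uuid


def hand_off_to_spool(finished_path: str, staging_dir: str, spool_dir: str) -> str:
    """Publish a finished (closed) log file into the spooling directory.

    Assumptions: nothing writes to `finished_path` any more (the application
    has already reopened its log), and `staging_dir` is on the same
    filesystem as `spool_dir`, so the final os.rename() is atomic.
    """
    base = os.path.basename(finished_path)
    # A unique name per hand-off, since a reused file name makes the
    # spooling directory source stop with an error.
    name = f"{base}.{time.strftime('%Y%m%d-%H%M%S')}.{uuid.uuid4().hex[:8]}"
    staged = os.path.join(staging_dir, name)
    target = os.path.join(spool_dir, name)

    # 1. Copy into staging (this step may cross filesystems) and flush it.
    shutil.copy2(finished_path, staged)
    with open(staged, "rb") as f:
        os.fsync(f.fileno())
    # 2. Publish with one rename: the spool directory never sees a partial file.
    os.rename(staged, target)
    # 3. Only now drop the original.
    os.remove(finished_path)
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        staging, spool = os.path.join(root, "staging"), os.path.join(root, "spool")
        os.mkdir(staging)
        os.mkdir(spool)
        finished = os.path.join(root, "app.log.1")
        with open(finished, "w") as f:
            f.write("one finished minute of log lines\n")
        print(hand_off_to_spool(finished, staging, spool))
```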
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>> Ahmed
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> ------------------------------
>>>>>>>>>>>>>>>> ---------------------------------------
>>>>>>>>>>>>>>>> This e-mail and any attachment is for authorised use by the
>>>>>>>>>>>>>>>> intended recipient(s) only. This email contains confidential information.
>>>>>>>>>>>>>>>> It should not be copied, disclosed to, retained or used by, any party other
>>>>>>>>>>>>>>>> than the intended recipient. Any unauthorised distribution, dissemination
>>>>>>>>>>>>>>>> or copying of this E-mail or its attachments, and/or any use of any
>>>>>>>>>>>>>>>> information contained in them, is strictly prohibited and may be illegal.
>>>>>>>>>>>>>>>> If you are not an intended recipient then please promptly delete this
>>>>>>>>>>>>>>>> e-mail and any attachment and all copies and inform the sender directly via
>>>>>>>>>>>>>>>> email. Any emails that you send to us may be monitored by systems or
>>>>>>>>>>>>>>>> persons other than the named communicant for the purposes of ascertaining
>>>>>>>>>>>>>>>> whether the communication complies with the law and company policies.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>>
>>>>>>>>>> Best regards,
>>>>>>>>>> Ahmed Vila | Senior software developer
>>>>>>>>>> DevLogic | Sarajevo | Bosnia and Herzegovina
>>>>>>>>>>
>>>>>>>>>> Office : +387 33 942 123
>>>>>>>>>> Mobile: +387 62 139 348
>>>>>>>>>>
>>>>>>>>>> Website: www.devlogic.eu
>>>>>>>>>> E-mail   : avila@devlogic.eu
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>>
>
>

Re: Need suggestion on reliable source for log processing

Posted by SaravanaKumar TR <sa...@gmail.com>.
Yes, got it. I think there is no option to do it without a suffix.

Sometimes Flume throws the error "java.lang.IllegalStateException: File has
changed size since being read", but I don't see any reason for a process to
modify the file after it has been moved to the spool directory, because it is
moved there via logrotate.

Does Flume have an option to notify us of the name/PID of the process that
modifies the file?
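As far as I know, Flume does not report which process modified a spooled file; it only detects that the size changed. On Linux, one rough way to find candidates is to list the processes that currently hold the file open by scanning /proc/<pid>/fd, which is roughly what lsof and fuser do. The hypothetical helper below sees only processes you are allowed to inspect (run it as root to see all of them) and only writers that still have the file open at the moment of the scan, so a short-lived writer can slip through.

```python
import os
import tempfile


def pids_with_file_open(path: str) -> list[int]:
    """Linux-only: list PIDs that currently hold `path` open.

    Scans the symlinks in /proc/<pid>/fd and compares their targets with
    the resolved path of the file.
    """
    target = os.path.realpath(path)
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/self or /proc/net
        fd_dir = f"/proc/{entry}/fd"
        try:
            fds = os.listdir(fd_dir)
        except OSError:  # process exited, or permission denied
            continue
        for fd in fds:
            try:
                if os.readlink(os.path.join(fd_dir, fd)) == target:
                    pids.append(int(entry))
                    break
            except OSError:  # fd closed between listdir() and readlink()
                continue
    return pids


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile("w") as f:
        print(pids_with_file_open(f.name))
```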

On Thu, Nov 13, 2014 at 11:26 PM, Ahmed Vila <av...@devlogic.eu> wrote:

> Hi Saravana,
>
> I think there is no override for the .completed suffix.
> Also, I think there is no way for Flume to distinguish which file it
> already processed and which not.
>
> On Thu, Nov 13, 2014 at 4:54 AM, SaravanaKumar TR <sa...@gmail.com>
> wrote:
>
>> Hi Ahmed,
>>
>> I have a  query with flume spool directory option.
>>
>> Is that possible to ignore fileSuffix option in spool dir source.It seems
>> by default it will append .COMPLETED suffix.I don't want to append any
>> suffix to the ingested file.
>>
>> Please let me  know if its possible.
>>
>> Thanks,
>> Saravana
>>
>> On Mon, Oct 27, 2014 at 7:25 PM, Ahmed Vila <av...@devlogic.eu> wrote:
>>
>>> You're welcome.
>>>
>>> Well... there will be at least "failed due to burned down hardware" :)
>>>
>>> Joke aside, there will be no solution with 100% certainty for a long
>>> time to come.
>>> As I see it, that is simply because maturity difference between
>>> software, so you have to use some mumbo-jumbo techniques in order to make
>>> them to work together without modifications.
>>> I consider tail-f a mumbo-jumbo technique, but Flume community has been
>>> nice enough to support level that low.
>>>
>>> If you care, you can implement full object-level logging in your
>>> application via Avro and utilize Flume up to his potential... as well as
>>> handling back-offs as you find appropriate.
>>> But for such purpose there is also Flume's implementation of the log4j
>>> appender, so you basically send all logs directly to the flume.
>>> Not sure how back-offs are handled, but that's the level at which
>>> applications should communicate.
>>>
>>> On the other hand, directory spool is mature to it's finest details,
>>> supported by any application, altered easily... so that's why I have used
>>> it.
>>>
>>>
>>> On Mon, Oct 27, 2014 at 2:39 PM, SaravanaKumar TR <
>>> saran0081986@gmail.com> wrote:
>>>
>>>> Ahmed,
>>>>
>>>> Thanks for your details comments.
>>>>
>>>> Final point, in which cases these logging solution will be considered
>>>> as a perfect system without  any tradeoffs,
>>>>
>>>> On Mon, Oct 27, 2014 at 6:47 PM, Ahmed Vila <av...@devlogic.eu> wrote:
>>>>
>>>>> Exactly up to the point.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Oct 27, 2014 at 1:57 PM, SaravanaKumar TR <
>>>>> saran0081986@gmail.com> wrote:
>>>>>
>>>>>> That was a good point.
>>>>>>
>>>>>> So if a solution mention as guarantee data delivery , it specifies
>>>>>> that  only in the case when the event flows into the source/producers
>>>>>> successfully by application and then from that point the system guarantee
>>>>>> the event delivery till other end sink/consumer.
>>>>>>
>>>>>> It has no control over the proper flow of event reaching the
>>>>>> source/producer.(like data loss)
>>>>>>
>>>>>> So there always be chances of data loss when the system goes down ,
>>>>>> where certain tradeoff measures to be taken.
>>>>>>
>>>>>> On Mon, Oct 27, 2014 at 6:06 PM, Ahmed Vila <av...@devlogic.eu>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Flume, Kafka, or any other system can only be responsible for it's
>>>>>>> own actions. Looking from the perspective of the exec source in Flume - it
>>>>>>> requests from the bash to give him an output from his stout. It cannot
>>>>>>> control what bash will return.
>>>>>>> Thus, it's not a file to him, but just a stream of text.
>>>>>>>
>>>>>>> When spooling directory source is in question, it will resume from
>>>>>>> the file it failed with.
>>>>>>> That reveals two approaches to event consumption: push and pull.
>>>>>>>
>>>>>>> When push approach is used then it cannot be aware of what comes
>>>>>>> next and what was before it started to listen.
>>>>>>>
>>>>>>> Even so, some sources/producers, even they use pull approach,
>>>>>>> doesn't have to know how to return to the last read event. It's up to
>>>>>>> implementation.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Ahmed
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Oct 27, 2014 at 12:48 PM, SaravanaKumar TR <
>>>>>>> saran0081986@gmail.com> wrote:
>>>>>>>
>>>>>>>> yes , I agree .
>>>>>>>>
>>>>>>>> I think no logging solution like source in flume/producer in kafka
>>>>>>>>  have  any marking feature like exact point till it consumed from logfile ,
>>>>>>>> to recover  incase of its failure to again start reading from the same
>>>>>>>> point of the logfile.(before failure)
>>>>>>>>
>>>>>>>> This is the major point where failures were difficult to ignore.Am
>>>>>>>> I right?
>>>>>>>>
>>>>>>>> On Mon, Oct 27, 2014 at 4:51 PM, Ahmed Vila <av...@devlogic.eu>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> You can use spillable channel that will store events in memory and
>>>>>>>>> once it fills it, it will spill to the disk.
>>>>>>>>> Also, you can use file channel, but it's as fast as your disk is
>>>>>>>>> and it's suggested to use a separate disk for it due to high IO with it,
>>>>>>>>> preferably an SSD.
>>>>>>>>>
>>>>>>>>> But, that will not solve the issue you might run into - if the
>>>>>>>>> flume fails for whatever the reason, you'll never be able to continue from
>>>>>>>>> the exact point where it failed.
>>>>>>>>> Yes, File channel preserves the state, so it will continue with
>>>>>>>>> whatever he already received, but what about the time while it was down ?
>>>>>>>>>
>>>>>>>>> If you cannot change anything regarding the application that
>>>>>>>>> produces the logs, then such circumstance has to be taken as a trade off.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Oct 27, 2014 at 12:09 PM, SaravanaKumar TR <
>>>>>>>>> saran0081986@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Yes, I understand the concerns with this use case.
>>>>>>>>>>
>>>>>>>>>> If so, we need to configure failover for this scenario. Can we
>>>>>>>>>> have it at the channel level or the sink level?
>>>>>>>>>>
>>>>>>>>>> Does Flume support configuring failover in case the channel fills up?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Oct 27, 2014 at 3:54 PM, Ahmed Vila <av...@devlogic.eu>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> In fact, this is not a problem with Flume.
>>>>>>>>>>>
>>>>>>>>>>> No solution will function reliably for your use case, simply
>>>>>>>>>>> because all of them have to do some sort of tail -f or streaming of a
>>>>>>>>>>> file, and if they can't keep up with it (they mostly don't at high-speed
>>>>>>>>>>> entry points), they will drop some entries.
>>>>>>>>>>> Please be kind to yourself and plan for failures: if you need
>>>>>>>>>>> to restart Flume or any other solution, you'll face dropped entries
>>>>>>>>>>> that you won't be able to re-ingest easily, as in most cases you won't
>>>>>>>>>>> know which ones you've dropped.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Ahmed
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Oct 27, 2014 at 11:13 AM, SaravanaKumar TR <
>>>>>>>>>>> saran0081986@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Thanks for the comments, Ahmed.
>>>>>>>>>>>>
>>>>>>>>>>>> So from your comments, I take it that Flume doesn't have any
>>>>>>>>>>>> reliable source option for the use case I described.
>>>>>>>>>>>>
>>>>>>>>>>>> If Flume can't provide it, can you point me to any other log
>>>>>>>>>>>> collector solutions I could consider here to move real-time data to
>>>>>>>>>>>> HDFS?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Oct 27, 2014 at 3:37 PM, Ahmed Vila <av...@devlogic.eu>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Then you're out of luck in my opinion, as there is no way
>>>>>>>>>>>>> other than tail -f.
>>>>>>>>>>>>> The problem with tail -f is that tail will not wait for the
>>>>>>>>>>>>> source/channel to keep up with it. If the channel is full, it will back
>>>>>>>>>>>>> off to the source, and the source will just stop ingesting.
>>>>>>>>>>>>>
>>>>>>>>>>>>> It is possible to hack tail -f into writing to another
>>>>>>>>>>>>> file and then custom-rotate that duplicate file.
>>>>>>>>>>>>> But I wouldn't recommend it.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Just a side note: if you're operating a Java application
>>>>>>>>>>>>> (Tomcat or similar), you can create multiple output files via
>>>>>>>>>>>>> log4j.properties configuration without the application itself knowing
>>>>>>>>>>>>> anything about it.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>> Ahmed
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Oct 27, 2014 at 10:56 AM, SaravanaKumar TR <
>>>>>>>>>>>>> saran0081986@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Ahmed,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In my case, the application renames the existing
>>>>>>>>>>>>>> file to <logfile>.yesterdaydate and creates a new <logfile> at
>>>>>>>>>>>>>> 00:00.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I can't change the application's log rotation policy for
>>>>>>>>>>>>>> now, so I guess I should rule out the spooling directory
>>>>>>>>>>>>>> source in my case.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Can you suggest any options other than the spooling
>>>>>>>>>>>>>> directory source?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Oct 27, 2014 at 3:10 PM, Ahmed Vila <
>>>>>>>>>>>>>> avila@devlogic.eu> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It all depends on how log rotation is done and how the
>>>>>>>>>>>>>>> application producing the log file handles it.
>>>>>>>>>>>>>>> Most applications just reopen the log file when they
>>>>>>>>>>>>>>> receive a signal via kill. For example, nginx reopens its log files
>>>>>>>>>>>>>>> when it receives the USR1 signal, without stopping the process. Some
>>>>>>>>>>>>>>> applications might restart as a result.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> If the application just reopens the log file, then you can
>>>>>>>>>>>>>>> change your log rotation policy to rotate every minute.
>>>>>>>>>>>>>>> logrotate won't handle that on its own, so
>>>>>>>>>>>>>>> you'll have to set up a cron job to do it.
>>>>>>>>>>>>>>> In that case, you would keep finished logs and the live log in
>>>>>>>>>>>>>>> separate locations so the spooling directory source doesn't choke on
>>>>>>>>>>>>>>> the active log file being appended to.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Anyway, the spooling directory source is the way to go, as it will
>>>>>>>>>>>>>>> leave log files in place, just renamed.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>> Ahmed
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Mon, Oct 27, 2014 at 10:21 AM, SaravanaKumar TR <
>>>>>>>>>>>>>>> saran0081986@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I am using Apache Flume 1.5.0. Quick setup explanation here.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Source: exec, tail -F command for a logfile.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Channel: file channel
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Sink: HDFS
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Use case: to move real-time data from a logfile to HDFS.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> It appears that exec is not a reliable source, as we may
>>>>>>>>>>>>>>>> lose data if the channel/source is down.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> So I tried the other option, the "spooling directory source",
>>>>>>>>>>>>>>>> which is described as a reliable source. But here I have a single
>>>>>>>>>>>>>>>> logfile that data gets appended to, so I don't see a way of moving
>>>>>>>>>>>>>>>> the file to the spool directory.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Can anyone suggest any other reliable source option for the case
>>>>>>>>>>>>>>>> where data gets appended to the logfile and logfile rotation
>>>>>>>>>>>>>>>> happens only at the end of the day?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Saravana
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>>
>>>>>>>>> Best regards,
>>>>>>>>> Ahmed Vila | Senior software developer
>>>>>>>>> DevLogic | Sarajevo | Bosnia and Herzegovina
>>>>>>>>>
>>>>>>>>> Office : +387 33 942 123
>>>>>>>>> Mobile: +387 62 139 348
>>>>>>>>>
>>>>>>>>> Website: www.devlogic.eu
>>>>>>>>> E-mail   : avila@devlogic.eu
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>>
>>> Best regards,
>>> Ahmed Vila | Senior software developer
>>> DevLogic | Sarajevo | Bosnia and Herzegovina
>>>
>>> Office : +387 33 942 123
>>> Mobile: +387 62 139 348
>>>
>>> Website: www.devlogic.eu
>>> E-mail   : avila@devlogic.eu
>>>
>>
>>
>
>
> --
>
> Best regards,
> Ahmed Vila | Senior software developer
> DevLogic | Sarajevo | Bosnia and Herzegovina
>
> Office : +387 33 942 123
> Mobile: +387 62 139 348
>
> Website: www.devlogic.eu
> E-mail   : avila@devlogic.eu
>
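The "marking feature" discussed in the quoted messages (remembering the exact point up to which a logfile was consumed, so reading can resume there after a failure) can be sketched as a byte-offset checkpoint. This is a hypothetical illustration, not how Flume's exec or spooling directory sources work: the checkpoint file and its JSON layout are assumptions, only newline-terminated lines are consumed, and the offset is committed only after the caller has handed the lines downstream, which gives at-least-once (possibly duplicated) rather than exactly-once delivery. It also does not recover lines appended to the old file between the last read and a rotation.

```python
import json
import os
from pathlib import Path


def load_checkpoint(checkpoint_path):
    """Return the saved position, or a fresh one if no checkpoint exists yet."""
    path = Path(checkpoint_path)
    if path.exists():
        return json.loads(path.read_text())
    return {"inode": None, "offset": 0}


def read_new_lines(log_path, state):
    """Read complete lines appended after state["offset"].

    Returns (lines, new_state). The caller should commit new_state only after
    the lines have been accepted downstream (at-least-once delivery).
    """
    st = os.stat(log_path)
    # A different inode means the file was replaced (e.g. rotated); a smaller
    # size means it was truncated. In both cases start from the beginning.
    if state["inode"] != st.st_ino or st.st_size < state["offset"]:
        state = {"inode": st.st_ino, "offset": 0}
    with open(log_path, "rb") as f:
        f.seek(state["offset"])
        data = f.read()
    # Stop at the last newline so a half-written line is re-read next time.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8").split("\n")[:-1]
    return lines, {"inode": st.st_ino, "offset": state["offset"] + end}


def commit_checkpoint(checkpoint_path, state):
    """Persist the position atomically (write a temp file, then rename)."""
    path = Path(checkpoint_path)
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(state))
    os.replace(tmp, path)
```

A typical loop would call load_checkpoint, then read_new_lines, pass the lines to the downstream system, and only then call commit_checkpoint; a crash before the commit means the same lines are read again on restart.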

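On the question of getting rotated logs into the spool directory: the Flume user guide states that files placed in the spooling directory must be immutable and uniquely named, and that Flume logs an error and stops processing if a file is written to after being placed there. Rotating or copying directly into the spool directory can therefore trip the "File has changed size since being read" error quoted earlier in this thread. Below is a minimal sketch of a cron-driven mover that avoids this, assuming a hypothetical layout where the application itself renames <logfile> to <logfile>.<date> at midnight and the staging directory sits on the same filesystem as the spool directory (the function name and directory layout are illustrative, not part of Flume):

```python
import os
import shutil
from pathlib import Path


def move_rotated_logs(log_dir, live_name, spool_dir, staging_dir):
    """Hand rotated copies of live_name over to a spooling directory.

    Hypothetical helper. Assumes the application has finished writing a file
    once it renames it to <live_name>.<date>, and that staging_dir is on the
    same filesystem as spool_dir, so os.replace() is an atomic rename.
    Returns the names of the files delivered to spool_dir.
    """
    log_dir, spool_dir, staging_dir = Path(log_dir), Path(spool_dir), Path(staging_dir)
    delivered = []
    for src in sorted(log_dir.iterdir()):
        # Only rotated copies match; the live file (still being appended to)
        # and unrelated files are left alone.
        if not src.is_file() or not src.name.startswith(live_name + "."):
            continue
        staged = staging_dir / src.name
        # The copy may be slow or cross filesystems, so it happens outside
        # the spool directory where Flume cannot see a half-written file.
        shutil.copy2(src, staged)
        # Atomic rename: the file appears in the spool directory complete.
        os.replace(staged, spool_dir / src.name)
        # Delete the original only after the spool copy is in place.
        src.unlink()
        delivered.append(src.name)
    return delivered
```

It would be run from cron shortly after the application's midnight rotation; because each rotated name carries a date, names in the spool directory stay unique.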
Re: Need suggestion on reliable source for log processing

Posted by Ahmed Vila <av...@devlogic.eu>.
Hi Saravana,

I don't think there is an override for the .COMPLETED suffix.
Also, I think there is no way for Flume to distinguish which files it has
already processed and which it hasn't.
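To illustrate why the suffix is hard to drop: in the spooling-directory model, renaming a fully ingested file is the record that it was processed, so a restart can skip it. The sketch below is a simplified, hypothetical consumer, not Flume's implementation; it reuses Flume's default .COMPLETED suffix only as a marker string and raises if a file changes size while being read, since spooled files are expected to be immutable. Separately, the Flume user guide lists a deletePolicy setting (never or immediate) for the spooling directory source, which may be worth checking if the aim is to avoid keeping suffixed files.

```python
from pathlib import Path

# Flume's default fileSuffix, reused here purely as a marker string.
COMPLETED_SUFFIX = ".COMPLETED"


def drain_spool_dir(spool_dir, handle_line, suffix=COMPLETED_SUFFIX):
    """Hand every line of every not-yet-completed file to handle_line.

    Hypothetical, simplified model of a spooling-directory consumer (not
    Flume's code): renaming a file to <name><suffix> is the only record
    that it was fully ingested, so a restart skips it.
    Returns the names of the files completed in this call.
    """
    completed = []
    for path in sorted(Path(spool_dir).iterdir()):
        if not path.is_file() or path.name.endswith(suffix):
            continue  # not a regular file, or already ingested earlier
        size_before = path.stat().st_size
        with open(path, encoding="utf-8") as f:
            for line in f:
                # If this raises, the file is not renamed and will be
                # re-read on the next run (at-least-once delivery).
                handle_line(line.rstrip("\n"))
        # Spooled files are expected to be immutable.
        if path.stat().st_size != size_before:
            raise RuntimeError(f"{path.name} changed size while being read")
        path.rename(path.with_name(path.name + suffix))
        completed.append(path.name)
    return completed
```

If handle_line fails partway through a file, that file keeps its original name and is read again from the start on the next call.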

On Thu, Nov 13, 2014 at 4:54 AM, SaravanaKumar TR <sa...@gmail.com>
wrote:

> Hi Ahmed,
>
> I have a query about the Flume spool directory option.
>
> Is it possible to skip the fileSuffix option in the spool dir source? It
> seems that by default it appends a .COMPLETED suffix. I don't want any
> suffix appended to the ingested file.
>
> Please let me know if it's possible.
>
> Thanks,
> Saravana
>
> On Mon, Oct 27, 2014 at 7:25 PM, Ahmed Vila <av...@devlogic.eu> wrote:
>
>> You're welcome.
>>
>> Well... there will be at least "failed due to burned-down hardware" :)
>>
>> Joke aside, there will be no solution with 100% certainty for a long time
>> to come.
>> As I see it, that is simply because of the difference in maturity between
>> pieces of software, so you have to use some mumbo-jumbo techniques to make
>> them work together without modifications.
>> I consider tail -f a mumbo-jumbo technique, but the Flume community has been
>> nice enough to support something that low-level.
>>
>> If you care, you can implement full object-level logging in your
>> application via Avro and use Flume to its full potential, as well as
>> handle back-offs as you see fit.
>> But for that purpose there is also Flume's log4j appender, so you
>> basically send all logs directly to Flume.
>> I'm not sure how back-offs are handled there, but that's the level at which
>> applications should communicate.
>>
>> On the other hand, the spooling directory source is mature down to its
>> finest details, supported by any application, and easily altered... so
>> that's why I have used it.
>>
>>
>> On Mon, Oct 27, 2014 at 2:39 PM, SaravanaKumar TR <saran0081986@gmail.com
>> > wrote:
>>
>>> Ahmed,
>>>
>>> Thanks for your detailed comments.
>>>
>>> Final point: in which cases would these logging solutions be considered
>>> a perfect system without any trade-offs?
>>>
>>> On Mon, Oct 27, 2014 at 6:47 PM, Ahmed Vila <av...@devlogic.eu> wrote:
>>>
>>>> Exactly to the point.
>>>>
>>>>
>>>>
>>>>
>>>> On Mon, Oct 27, 2014 at 1:57 PM, SaravanaKumar TR <
>>>> saran0081986@gmail.com> wrote:
>>>>
>>>>> That was a good point.
>>>>>
>>>>> So if a solution claims guaranteed data delivery, that guarantee only
>>>>> applies once the application has successfully delivered the event into
>>>>> the source/producer; from that point on, the system guarantees delivery
>>>>> of the event to the sink/consumer at the other end.
>>>>>
>>>>> It has no control over whether events properly reach the
>>>>> source/producer in the first place (hence possible data loss).
>>>>>
>>>>> So there will always be a chance of data loss when the system goes down,
>>>>> and certain trade-off measures have to be taken.
>>>>>
>>>>> On Mon, Oct 27, 2014 at 6:06 PM, Ahmed Vila <av...@devlogic.eu> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Flume, Kafka, or any other system can only be responsible for its
>>>>>> own actions. Looking from the perspective of the exec source in Flume: it
>>>>>> asks the shell to give it the command's stdout. It cannot
>>>>>> control what the shell will return.
>>>>>> Thus, to the source it's not a file, just a stream of text.
>>>>>>
>>>>>> The spooling directory source, on the other hand, will resume from
>>>>>> the file it failed on.
>>>>>> That reveals two approaches to event consumption: push and pull.
>>>>>>
>>>>>> When the push approach is used, the source cannot know what comes next
>>>>>> or what came before it started listening.
>>>>>>
>>>>>> Even so, some sources/producers, even if they use the pull approach, don't
>>>>>> necessarily know how to return to the last read event. It's up to the
>>>>>> implementation.
>>>>>>
>>>>>> Regards,
>>>>>> Ahmed
>>>>>>
>>>>>>
>>>>>> On Mon, Oct 27, 2014 at 12:48 PM, SaravanaKumar TR <
>>>>>> saran0081986@gmail.com> wrote:
>>>>>>
>>>>>>> Yes, I agree.
>>>>>>>
>>>>>>> I think no logging solution (a source in Flume, a producer in Kafka)
>>>>>>> has a marking feature that records the exact point up to which it has
>>>>>>> consumed the logfile, so that after a failure it can start reading again
>>>>>>> from that same point of the logfile.
>>>>>>>
>>>>>>> This is the main point where failures are difficult to ignore. Am I right?
>>>>>>>
>>>>>>> On Mon, Oct 27, 2014 at 4:51 PM, Ahmed Vila <av...@devlogic.eu>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> You can use the spillable memory channel, which stores events in memory
>>>>>>>> and spills them to disk once memory fills up.
>>>>>>>> You can also use the file channel, but it's only as fast as your disk,
>>>>>>>> and it's recommended to give it a separate disk because of its high IO,
>>>>>>>> preferably an SSD.
>>>>>>>>
>>>>>>>> But that will not solve the issue you might run into: if Flume fails
>>>>>>>> for whatever reason, you'll never be able to continue from the exact
>>>>>>>> point where it failed.
>>>>>>>> Yes, the file channel preserves its state, so it will continue with
>>>>>>>> whatever it already received, but what about the time while it was down?
>>>>>>>>
>>>>>>>> If you cannot change anything about the application that
>>>>>>>> produces the logs, then this has to be accepted as a trade-off.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Oct 27, 2014 at 12:09 PM, SaravanaKumar TR <
>>>>>>>> saran0081986@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Yes, I understand the concerns with this use case.
>>>>>>>>>
>>>>>>>>> If so, we need to configure failover for this scenario. Can we have
>>>>>>>>> it at the channel level or the sink level?
>>>>>>>>>
>>>>>>>>> Does Flume support configuring failover in case the channel fills up?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Oct 27, 2014 at 3:54 PM, Ahmed Vila <av...@devlogic.eu>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> In fact, this is not a problem with Flume.
>>>>>>>>>>
>>>>>>>>>> No solution will function reliably for your use case, simply
>>>>>>>>>> because all of them have to do some sort of tail -f or streaming of a
>>>>>>>>>> file, and if they can't keep up with it (they mostly don't at high-speed
>>>>>>>>>> entry points), they will drop some entries.
>>>>>>>>>> Please be kind to yourself and plan for failures: if you need
>>>>>>>>>> to restart Flume or any other solution, you'll face dropped entries
>>>>>>>>>> that you won't be able to re-ingest easily, as in most cases you won't
>>>>>>>>>> know which ones you've dropped.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Ahmed
>>>>>>>>>>
>>>>>>>>>> On Mon, Oct 27, 2014 at 11:13 AM, SaravanaKumar TR <
>>>>>>>>>> saran0081986@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Thanks for the comments, Ahmed.
>>>>>>>>>>>
>>>>>>>>>>> So from your comments, I take it that Flume doesn't have any
>>>>>>>>>>> reliable source option for the use case I described.
>>>>>>>>>>>
>>>>>>>>>>> If Flume can't provide it, can you point me to any other log
>>>>>>>>>>> collector solutions I could consider here to move real-time data to
>>>>>>>>>>> HDFS?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Oct 27, 2014 at 3:37 PM, Ahmed Vila <av...@devlogic.eu>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> Then you're out of luck in my opinion, as there is no way
>>>>>>>>>>>> other than tail -f.
>>>>>>>>>>>> The problem with tail -f is that tail will not wait for the
>>>>>>>>>>>> source/channel to keep up with it. If the channel is full, it will back
>>>>>>>>>>>> off to the source, and the source will just stop ingesting.
>>>>>>>>>>>>
>>>>>>>>>>>> It is possible to hack tail -f into writing to another file
>>>>>>>>>>>> and then custom-rotate that duplicate file.
>>>>>>>>>>>> But I wouldn't recommend it.
>>>>>>>>>>>>
>>>>>>>>>>>> Just a side note: if you're operating a Java application (Tomcat
>>>>>>>>>>>> or similar), you can create multiple output files via log4j.properties
>>>>>>>>>>>> configuration without the application itself knowing anything about it.
>>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> Ahmed
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Oct 27, 2014 at 10:56 AM, SaravanaKumar TR <
>>>>>>>>>>>> saran0081986@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Ahmed,
>>>>>>>>>>>>>
>>>>>>>>>>>>> In my case, the application renames the existing
>>>>>>>>>>>>> file to <logfile>.yesterdaydate and creates a new <logfile> at
>>>>>>>>>>>>> 00:00.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I can't change the application's log rotation policy for
>>>>>>>>>>>>> now, so I guess I should rule out the spooling directory
>>>>>>>>>>>>> source in my case.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Can you suggest any options other than the spooling
>>>>>>>>>>>>> directory source?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Oct 27, 2014 at 3:10 PM, Ahmed Vila <avila@devlogic.eu
>>>>>>>>>>>>> > wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It all depends on how log rotation is done and how the
>>>>>>>>>>>>>> application producing the log file handles it.
>>>>>>>>>>>>>> Most applications just reopen the log file when they
>>>>>>>>>>>>>> receive a signal via kill. For example, nginx reopens its log files
>>>>>>>>>>>>>> when it receives the USR1 signal, without stopping the process. Some
>>>>>>>>>>>>>> applications might restart as a result.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> If the application just reopens the log file, then you can
>>>>>>>>>>>>>> change your log rotation policy to rotate every minute.
>>>>>>>>>>>>>> logrotate won't handle that on its own, so
>>>>>>>>>>>>>> you'll have to set up a cron job to do it.
>>>>>>>>>>>>>> In that case, you would keep finished logs and the live log in
>>>>>>>>>>>>>> separate locations so the spooling directory source doesn't choke on
>>>>>>>>>>>>>> the active log file being appended to.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Anyway, the spooling directory source is the way to go, as it will
>>>>>>>>>>>>>> leave log files in place, just renamed.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>> Ahmed
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Oct 27, 2014 at 10:21 AM, SaravanaKumar TR <
>>>>>>>>>>>>>> saran0081986@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I am using Apache Flume 1.5.0. Quick setup explanation here.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Source: exec, tail -F command for a logfile.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Channel: file channel
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Sink: HDFS
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Use case: to move real-time data from a logfile to HDFS.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It appears that exec is not a reliable source, as we may
>>>>>>>>>>>>>>> lose data if the channel/source is down.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> So I tried the other option, the "spooling directory source",
>>>>>>>>>>>>>>> which is described as a reliable source. But here I have a single
>>>>>>>>>>>>>>> logfile that data gets appended to, so I don't see a way of moving
>>>>>>>>>>>>>>> the file to the spool directory.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Can anyone suggest any other reliable source option for the case
>>>>>>>>>>>>>>> where data gets appended to the logfile and logfile rotation
>>>>>>>>>>>>>>> happens only at the end of the day?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Saravana
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>> Ahmed Vila | Senior software developer
>>>>>>>> DevLogic | Sarajevo | Bosnia and Herzegovina
>>>>>>>>
>>>>>>>> Office : +387 33 942 123
>>>>>>>> Mobile: +387 62 139 348
>>>>>>>>
>>>>>>>> Website: www.devlogic.eu
>>>>>>>> E-mail   : avila@devlogic.eu
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>>
>>
>>
>
>


-- 

Best regards,
Ahmed Vila | Senior software developer
DevLogic | Sarajevo | Bosnia and Herzegovina

Office : +387 33 942 123
Mobile: +387 62 139 348

Website: www.devlogic.eu
E-mail   : avila@devlogic.eu

Re: Need suggestion on reliable source for log processing

Posted by SaravanaKumar TR <sa...@gmail.com>.
Hi Ahmed,

I have a query about the Flume spooling directory source.

Is it possible to disable the fileSuffix option in the spooling directory
source? By default it appends a .COMPLETED suffix, and I don't want any
suffix appended to the ingested file.

Please let me know if it's possible.

Thanks,
Saravana
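For reference, the spooling directory source renames each ingested file using its `fileSuffix` property (default `.COMPLETED`), and it also documents a `deletePolicy` property (`never` by default, or `immediate`) that deletes ingested files instead of renaming them. If the goal is to keep ingested files under their original names, one workaround is a periodic job that moves completed files out of the spool directory and strips the suffix. The Python sketch below is a hypothetical helper (the name `archive_completed` and the archive directory are assumptions, not part of Flume). It only touches files that already carry the suffix, so files Flume has not finished with are left alone.

```python
import os
import shutil

# Flume's default value for the spooling directory source's fileSuffix property.
COMPLETED_SUFFIX = ".COMPLETED"


def archive_completed(spool_dir: str, archive_dir: str,
                      suffix: str = COMPLETED_SUFFIX) -> list:
    """Move fully ingested files out of spool_dir, restoring their original names.

    Hypothetical helper, not part of Flume. archive_dir must be OUTSIDE
    spool_dir, otherwise Flume would treat the renamed files as new input.
    """
    os.makedirs(archive_dir, exist_ok=True)
    moved = []
    for name in sorted(os.listdir(spool_dir)):
        src = os.path.join(spool_dir, name)
        # Skip files Flume has not finished with, bare-suffix names, and directories.
        if len(name) <= len(suffix) or not name.endswith(suffix):
            continue
        if not os.path.isfile(src):
            continue
        original_name = name[: -len(suffix)]
        dst = os.path.join(archive_dir, original_name)
        shutil.move(src, dst)
        moved.append(dst)
    return moved
```

Run from cron, this leaves the spool directory holding only pending files, while the archive keeps the original names.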

On Mon, Oct 27, 2014 at 7:25 PM, Ahmed Vila <av...@devlogic.eu> wrote:

> You're welcome.
>
> Well... there will be at least "failed due to burned down hardware" :)
>
> Jokes aside, there will be no solution with 100% certainty for a long time
> to come.
> As I see it, that is simply because of the difference in maturity between
> pieces of software, so you have to use some mumbo-jumbo techniques to make
> them work together without modifications.
> I consider tail -f a mumbo-jumbo technique, but the Flume community has been
> nice enough to support something that low-level.
>
> If you care, you can implement full object-level logging in your
> application via Avro and use Flume to its full potential... as well as
> handle back-offs as you find appropriate.
> But for that purpose there is also Flume's implementation of a log4j
> appender, so you basically send all logs directly to Flume.
> I'm not sure how back-offs are handled there, but that's the level at which
> applications should communicate.
>
> On the other hand, the spooling directory approach is mature down to its
> finest details, supported by any application, and easily altered... so
> that's why I have used it.
>
>
> On Mon, Oct 27, 2014 at 2:39 PM, SaravanaKumar TR <sa...@gmail.com>
> wrote:
>
>> Ahmed,
>>
>> Thanks for your detailed comments.
>>
>> Final point: in which cases would these logging solutions be considered
>> a perfect system without any trade-offs?
>>
>> On Mon, Oct 27, 2014 at 6:47 PM, Ahmed Vila <av...@devlogic.eu> wrote:
>>
>>> Exactly to the point.
>>>
>>>
>>>
>>>
>>> On Mon, Oct 27, 2014 at 1:57 PM, SaravanaKumar TR <
>>> saran0081986@gmail.com> wrote:
>>>
>>>> That was a good point.
>>>>
>>>> So when a solution claims guaranteed data delivery, that applies only
>>>> once the application has successfully delivered the event to the
>>>> source/producer; from that point on, the system guarantees delivery of the
>>>> event through to the sink/consumer at the other end.
>>>>
>>>> It has no control over whether the event reaches the source/producer in
>>>> the first place (so data can be lost before that point).
>>>>
>>>> So there will always be some chance of data loss when the system goes
>>>> down, and certain trade-offs have to be accepted.
>>>>
>>>> On Mon, Oct 27, 2014 at 6:06 PM, Ahmed Vila <av...@devlogic.eu> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Flume, Kafka, or any other system can only be responsible for its own
>>>>> actions. From the perspective of Flume's exec source, it asks bash for
>>>>> the command's output on stdout. It cannot control what bash will return.
>>>>> To the source it is not a file, just a stream of text.
>>>>>
>>>>> The spooling directory source, on the other hand, will resume from the
>>>>> file it failed on.
>>>>> That reveals two approaches to event consumption: push and pull.
>>>>>
>>>>> With a push approach, the consumer cannot be aware of what comes next
>>>>> or of what was sent before it started to listen.
>>>>>
>>>>> Even so, a source/producer that uses a pull approach doesn't necessarily
>>>>> know how to return to the last event it read. It's up to the
>>>>> implementation.
>>>>>
>>>>> Regards,
>>>>> Ahmed
>>>>>
>>>>>
>>>>> On Mon, Oct 27, 2014 at 12:48 PM, SaravanaKumar TR <
>>>>> saran0081986@gmail.com> wrote:
>>>>>
>>>>>> Yes, I agree.
>>>>>>
>>>>>> I think no logging solution (a source in Flume, a producer in Kafka) has
>>>>>> a feature that marks the exact point up to which it has consumed the
>>>>>> logfile, so that after a failure it can resume reading from the same
>>>>>> point in the logfile.
>>>>>>
>>>>>> This is the main reason failures are difficult to ignore. Am I
>>>>>> right?
>>>>>>
>>>>>> On Mon, Oct 27, 2014 at 4:51 PM, Ahmed Vila <av...@devlogic.eu>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> You can use the spillable memory channel, which stores events in memory
>>>>>>> and spills to disk once memory fills up.
>>>>>>> You can also use the file channel, but it is only as fast as your disk,
>>>>>>> and because of its high I/O it's recommended to give it a separate disk,
>>>>>>> preferably an SSD.
>>>>>>>
>>>>>>> But that will not solve the issue you might run into: if Flume fails
>>>>>>> for whatever reason, you'll never be able to continue from the exact
>>>>>>> point where it failed.
>>>>>>> Yes, the file channel preserves its state, so it will continue with
>>>>>>> whatever it already received, but what about the time while it was down?
>>>>>>>
>>>>>>> If you cannot change anything about the application that produces the
>>>>>>> logs, then this has to be accepted as a trade-off.
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Oct 27, 2014 at 12:09 PM, SaravanaKumar TR <
>>>>>>> saran0081986@gmail.com> wrote:
>>>>>>>
>>>>>>>> Yes, I understand the concerns with this use case.
>>>>>>>>
>>>>>>>> If so, we need to configure failover in this scenario. Can we have it at
>>>>>>>> the channel level or the sink level?
>>>>>>>>
>>>>>>>> Does Flume support configuring failover in case the channel fills up?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Oct 27, 2014 at 3:54 PM, Ahmed Vila <av...@devlogic.eu>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> In fact, this is not a problem with Flume.
>>>>>>>>>
>>>>>>>>> No solution will function reliably for your use case, simply
>>>>>>>>> because all of them will have to do some sort of tail -f or streaming on
>>>>>>>>> a file, and if they can't keep up with it (they mostly don't at
>>>>>>>>> high-speed entry points), they will drop some entries.
>>>>>>>>> Please be kind to yourself and plan for failures: if you need to
>>>>>>>>> restart Flume or any other solution, you'll face dropped entries that
>>>>>>>>> you won't be able to re-ingest easily, because in most cases you won't
>>>>>>>>> know which ones you've dropped.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Ahmed
>>>>>>>>>
>>>>>>>>> On Mon, Oct 27, 2014 at 11:13 AM, SaravanaKumar TR <
>>>>>>>>> saran0081986@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Thanks for the comments, Ahmed.
>>>>>>>>>>
>>>>>>>>>> So from your comments, I take it that Flume doesn't have any
>>>>>>>>>> reliable source option for the use case I described.
>>>>>>>>>>
>>>>>>>>>> If Flume can't provide it, can you suggest any other log collector
>>>>>>>>>> solutions I could consider here to move real-time data to HDFS?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Oct 27, 2014 at 3:37 PM, Ahmed Vila <av...@devlogic.eu>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> Then, in my opinion, you're out of luck, as there is no way other
>>>>>>>>>>> than tail -f.
>>>>>>>>>>> The problem with tail -f is that tail will not wait for the
>>>>>>>>>>> source/channel to keep up with it. If the channel is full, it will back
>>>>>>>>>>> off to the source, and the source will just stop ingesting.
>>>>>>>>>>>
>>>>>>>>>>> It is possible to hack up tail -f into another file and then
>>>>>>>>>>> custom-rotate that duplicate file.
>>>>>>>>>>> But I wouldn't recommend that.
>>>>>>>>>>>
>>>>>>>>>>> Just a side note: if you're running a Java application (Tomcat or
>>>>>>>>>>> similar), you can create multiple output files via the log4j.properties
>>>>>>>>>>> configuration without the application itself knowing anything about it.
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Ahmed
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Oct 27, 2014 at 10:56 AM, SaravanaKumar TR <
>>>>>>>>>>> saran0081986@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Ahmed,
>>>>>>>>>>>>
>>>>>>>>>>>> In my case, the application renames the existing file to
>>>>>>>>>>>> <logfile>.yesterdaydate and creates a new <logfile> at 00:00.
>>>>>>>>>>>>
>>>>>>>>>>>> I can't change the application's log rotation policy for now, so I
>>>>>>>>>>>> guess I should rule out the spooling directory source in my case.
>>>>>>>>>>>>
>>>>>>>>>>>> Can you suggest any options other than the spooling directory
>>>>>>>>>>>> source?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Oct 27, 2014 at 3:10 PM, Ahmed Vila <av...@devlogic.eu>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> It all depends on how log rotation is done and how the application
>>>>>>>>>>>>> producing the log file handles it.
>>>>>>>>>>>>> Most applications just reopen the log file when they receive a
>>>>>>>>>>>>> signal via kill. For example, nginx reopens its log files when it
>>>>>>>>>>>>> receives the USR1 signal, without stopping the process. Some
>>>>>>>>>>>>> applications might restart as a result.
>>>>>>>>>>>>>
>>>>>>>>>>>>> If the application just reopens the log file, then you can
>>>>>>>>>>>>> change your log rotation policy to per-minute.
>>>>>>>>>>>>> logrotate won't handle that case, so you'll have to write a cron
>>>>>>>>>>>>> job to do it.
>>>>>>>>>>>>> In that case, you would keep the finished logs and the live log in
>>>>>>>>>>>>> separate locations, so the spooling directory source doesn't choke
>>>>>>>>>>>>> on an active log file that is still being appended to.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Anyway, the spooling directory source is the way to go, as it
>>>>>>>>>>>>> leaves log files in place, just renamed.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>> Ahmed
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Oct 27, 2014 at 10:21 AM, SaravanaKumar TR <
>>>>>>>>>>>>> saran0081986@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I am using Apache Flume 1.5.0. A quick explanation of the setup:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Source: exec, running tail -F on a logfile.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Channel: file channel
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Sink: HDFS
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Use case: move real-time data from the logfile to HDFS.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It appears that exec is not a reliable source, as we may lose
>>>>>>>>>>>>>> data if the channel/source is down.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So I tried the other option, the "spooling directory source", which
>>>>>>>>>>>>>> is described as a reliable source. But here I have a single logfile
>>>>>>>>>>>>>> that data gets appended to, so I don't see a way of moving the file to
>>>>>>>>>>>>>> the spool directory.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Can anyone suggest another reliable source option for the case
>>>>>>>>>>>>>> where the logfile is continuously appended to and logfile rotation
>>>>>>>>>>>>>> happens only at the end of the day?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Saravana
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>>
>
>

Re: Need suggestion on reliable source for log processing

Posted by Ahmed Vila <av...@devlogic.eu>.
You're welcome.

Well... there will be at least "failed due to burned down hardware" :)

Jokes aside, there will be no solution with 100% certainty for a long time
to come.
As I see it, that is simply because of the difference in maturity between
pieces of software, so you have to use some mumbo-jumbo techniques to make
them work together without modifications.
I consider tail -f a mumbo-jumbo technique, but the Flume community has been
nice enough to support something that low-level.
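The underlying weakness of tail -f discussed in this thread is that it keeps no record of how far it has read, so after a restart nothing can resume from the exact point of failure. As a rough illustration of that missing piece, here is a Python sketch (all names hypothetical, not a Flume API) of a reader that stores a byte offset plus the file's inode in a checkpoint file and only advances the checkpoint after the caller confirms delivery. That gives at-least-once behaviour: a crash between delivery and commit re-reads lines instead of losing them. (Later Flume releases, starting with 1.7.0, added a Taildir source that records read positions in a position file; it is not available in the 1.5.0 release discussed here.)

```python
import json
import os


def read_new_lines(log_path: str, checkpoint_path: str):
    """Return (lines, state): complete lines written since the last commit.

    Hypothetical helper. The checkpoint stores the inode and byte offset of
    the last committed read; if the inode changed (file rotated) or the file
    shrank (truncated), reading restarts from offset 0.
    """
    st = os.stat(log_path)
    offset = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            saved = json.load(f)
        if saved.get("inode") == st.st_ino and saved.get("offset", 0) <= st.st_size:
            offset = saved.get("offset", 0)

    lines = []
    with open(log_path, "rb") as f:
        f.seek(offset)
        for raw in f:
            if not raw.endswith(b"\n"):
                break  # partial line still being written; pick it up next time
            lines.append(raw[:-1].decode("utf-8", errors="replace"))
            offset += len(raw)
    return lines, {"inode": st.st_ino, "offset": offset}


def commit(checkpoint_path: str, state: dict) -> None:
    """Persist the read position; call only after the lines were delivered."""
    tmp_path = checkpoint_path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, checkpoint_path)  # atomic swap, never half-written
```

The sketch does not read the tail end of a rotated file that was written after the last commit; a real implementation would also have to drain the old file before switching to the new inode.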

If you care, you can implement full object-level logging in your
application via Avro and use Flume to its full potential... as well as
handle back-offs as you find appropriate.
But for that purpose there is also Flume's implementation of a log4j
appender, so you basically send all logs directly to Flume.
I'm not sure how back-offs are handled there, but that's the level at which
applications should communicate.

On the other hand, the spooling directory approach is mature down to its
finest details, supported by any application, and easily altered... so
that's why I have used it.
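Ahmed's earlier suggestion in this thread (rotate every minute from cron, keep the live log and finished logs in separate places, and only hand finished files to the spooling directory source) might look roughly like this. It is a hypothetical Python helper, not part of Flume or logrotate, and it assumes: the application reopens its log on SIGUSR1 (as nginx does); the live log, staging directory and spool directory share one filesystem so `os.rename` is atomic; and a short settle delay is enough for the application to stop writing to the old file handle. The Flume user guide states that a file written to after being placed in the spooling directory makes the source log an error and stop processing, which is why the move into the spool directory happens last.

```python
import os
import signal
import time
from datetime import datetime
from typing import Optional


def rotate_into_spool(live_log: str, staging_dir: str, spool_dir: str,
                      pid: Optional[int] = None, settle_seconds: float = 1.0,
                      now: Optional[datetime] = None) -> str:
    """Hand the current live log to Flume's spooling directory source.

    Hypothetical cron-job helper. All three paths must be on the same
    filesystem so that os.rename() is an atomic move.
    """
    now = now or datetime.now()
    rotated_name = f"{os.path.basename(live_log)}.{now:%Y%m%d%H%M}"
    staged_path = os.path.join(staging_dir, rotated_name)

    # 1. Move the live file out of the way; the app still holds its open handle.
    os.rename(live_log, staged_path)

    # 2. Ask the application to reopen its log (nginx uses SIGUSR1 for this).
    if pid is not None:
        os.kill(pid, signal.SIGUSR1)

    # 3. Give it time to finish writes to the old handle.
    time.sleep(settle_seconds)

    # 4. Only now expose the finished file to the spooling directory source.
    final_path = os.path.join(spool_dir, rotated_name)
    os.rename(staged_path, final_path)
    return final_path
```

The per-minute timestamp assumes one run per minute: on POSIX `os.rename` silently replaces an existing target, and the Flume user guide also says a reused file name makes the spooling directory source log an error and stop processing.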


On Mon, Oct 27, 2014 at 2:39 PM, SaravanaKumar TR <sa...@gmail.com>
wrote:

> Ahmed,
>
> Thanks for your detailed comments.
>
> Final point: in which cases would these logging solutions be considered a
> perfect system without any trade-offs?
>
> On Mon, Oct 27, 2014 at 6:47 PM, Ahmed Vila <av...@devlogic.eu> wrote:
>
>> Exactly to the point.
>>
>>
>>
>>
>> On Mon, Oct 27, 2014 at 1:57 PM, SaravanaKumar TR <
>> saran0081986@gmail.com> wrote:
>>
>>> That was a good point.
>>>
>>> So when a solution claims guaranteed data delivery, that applies only
>>> once the application has successfully delivered the event to the
>>> source/producer; from that point on, the system guarantees delivery of the
>>> event through to the sink/consumer at the other end.
>>>
>>> It has no control over whether the event reaches the source/producer in
>>> the first place (so data can be lost before that point).
>>>
>>> So there will always be some chance of data loss when the system goes
>>> down, and certain trade-offs have to be accepted.
>>>
>>> On Mon, Oct 27, 2014 at 6:06 PM, Ahmed Vila <av...@devlogic.eu> wrote:
>>>
>>>> Hi,
>>>>
>>>> Flume, Kafka, or any other system can only be responsible for its own
>>>> actions. From the perspective of Flume's exec source, it asks bash for
>>>> the command's output on stdout. It cannot control what bash will return.
>>>> To the source it is not a file, just a stream of text.
>>>>
>>>> The spooling directory source, on the other hand, will resume from the
>>>> file it failed on.
>>>> That reveals two approaches to event consumption: push and pull.
>>>>
>>>> With a push approach, the consumer cannot be aware of what comes next
>>>> or of what was sent before it started to listen.
>>>>
>>>> Even so, a source/producer that uses a pull approach doesn't necessarily
>>>> know how to return to the last event it read. It's up to the
>>>> implementation.
>>>>
>>>> Regards,
>>>> Ahmed
>>>>
>>>>
>>>> On Mon, Oct 27, 2014 at 12:48 PM, SaravanaKumar TR <
>>>> saran0081986@gmail.com> wrote:
>>>>
>>>>> Yes, I agree.
>>>>>
>>>>> I think no logging solution (a source in Flume, a producer in Kafka) has
>>>>> a feature that marks the exact point up to which it has consumed the
>>>>> logfile, so that after a failure it can resume reading from the same
>>>>> point in the logfile.
>>>>>
>>>>> This is the main reason failures are difficult to ignore. Am I
>>>>> right?
>>>>>
>>>>> On Mon, Oct 27, 2014 at 4:51 PM, Ahmed Vila <av...@devlogic.eu> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> You can use the spillable memory channel, which stores events in memory
>>>>>> and, once memory fills up, spills them to disk.
>>>>>> You can also use the file channel, but it's only as fast as your disk,
>>>>>> and because of its heavy IO it's recommended to give it a separate disk,
>>>>>> preferably an SSD.
>>>>>>
>>>>>> But that will not solve the issue you might run into: if Flume
>>>>>> fails for whatever reason, you'll never be able to continue from the
>>>>>> exact point where it failed.
>>>>>> Yes, the file channel preserves its state, so it will continue with
>>>>>> whatever it already received, but what about the time while it was down?
>>>>>>
>>>>>> If you cannot change anything about the application that produces
>>>>>> the logs, then you have to accept this as a tradeoff.
>>>>>>
>>>>>>
>>>>>> On Mon, Oct 27, 2014 at 12:09 PM, SaravanaKumar TR <
>>>>>> saran0081986@gmail.com> wrote:
>>>>>>
>>>>>>> Yes, I understand the concerns with this use case.
>>>>>>>
>>>>>>> If we need to configure failover in this scenario, can we do it at
>>>>>>> the channel level or the sink level?
>>>>>>>
>>>>>>> Does Flume support configuring failover in case the channel fills up?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Oct 27, 2014 at 3:54 PM, Ahmed Vila <av...@devlogic.eu>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> In fact, this is not a problem with Flume.
>>>>>>>>
>>>>>>>> No solution will function reliably for your use case, simply
>>>>>>>> because all of them have to do some sort of tail -f or streaming on a
>>>>>>>> file, and if they can't keep up with it (they mostly don't at high-volume
>>>>>>>> entry points), they will drop some entries.
>>>>>>>> Please be kind to yourself and plan for failures: if you need to
>>>>>>>> restart Flume or any other solution, you'll face dropped entries that
>>>>>>>> you won't be able to re-ingest easily, as in most cases you won't know
>>>>>>>> which ones were dropped.
>>>>>>>>
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Ahmed
>>>>>>>>
>>>>>>>> On Mon, Oct 27, 2014 at 11:13 AM, SaravanaKumar TR <
>>>>>>>> saran0081986@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Thanks for the comments, Ahmed.
>>>>>>>>>
>>>>>>>>> So from your comments, I take it that Flume doesn't have any
>>>>>>>>> reliable source option for the use case I described.
>>>>>>>>>
>>>>>>>>> If Flume can't provide it, can you suggest any other log
>>>>>>>>> collector solutions I could consider here to move real-time data to
>>>>>>>>> HDFS?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Oct 27, 2014 at 3:37 PM, Ahmed Vila <av...@devlogic.eu>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> Then you're out of luck, in my opinion, as there is no way other
>>>>>>>>>> than tail -f.
>>>>>>>>>> The problem with tail -f is that tail will not wait for the
>>>>>>>>>> source/channel to keep up with it. If the channel is full, it pushes
>>>>>>>>>> back on the source, and the source just stops ingesting.
>>>>>>>>>>
>>>>>>>>>> One possible hack is to redirect tail -f into another file
>>>>>>>>>> and then custom-rotate that duplicate file.
>>>>>>>>>> But I wouldn't recommend it.
>>>>>>>>>>
>>>>>>>>>> Just a side note: if you're running a Java application (Tomcat
>>>>>>>>>> or similar), you can create multiple output files via log4j.properties
>>>>>>>>>> configuration without the application itself knowing anything about it.
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Ahmed
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Oct 27, 2014 at 10:56 AM, SaravanaKumar TR <
>>>>>>>>>> saran0081986@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Ahmed,
>>>>>>>>>>>
>>>>>>>>>>> In my case, the application renames the existing file to
>>>>>>>>>>> <logfile>.yesterdaydate and creates a new <logfile> at 00:00.
>>>>>>>>>>>
>>>>>>>>>>> I can't change the application's log rotation policy for now, so
>>>>>>>>>>> I guess I should rule out the spooling directory source in
>>>>>>>>>>> my case.
>>>>>>>>>>>
>>>>>>>>>>> Can you suggest any options other than the spooling
>>>>>>>>>>> directory source?
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Oct 27, 2014 at 3:10 PM, Ahmed Vila <av...@devlogic.eu>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> It all depends on how log rotation is done and how the application
>>>>>>>>>>>> producing the log file handles log rotation.
>>>>>>>>>>>> Most applications just reopen the log file when they receive a
>>>>>>>>>>>> signal sent with kill. For example, nginx reopens its log files when it
>>>>>>>>>>>> receives the USR1 signal, without stopping the process. Some applications
>>>>>>>>>>>> might restart as a result.
>>>>>>>>>>>>
>>>>>>>>>>>> If the application just reopens the log file, then you can
>>>>>>>>>>>> change your log rotation policy to rotate every minute.
>>>>>>>>>>>> logrotate won't handle that schedule out of the box, so
>>>>>>>>>>>> you'll have to write a cron job to do it.
>>>>>>>>>>>> In that case, you would keep finished logs and the live log in
>>>>>>>>>>>> separate locations, so the spooling directory source doesn't choke on
>>>>>>>>>>>> the active log file being appended to.
>>>>>>>>>>>>
>>>>>>>>>>>> Anyway, the spooling directory source is the way to go, as it
>>>>>>>>>>>> leaves log files in place, just renamed.
>>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> Ahmed
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Oct 27, 2014 at 10:21 AM, SaravanaKumar TR <
>>>>>>>>>>>> saran0081986@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I am using Apache Flume 1.5.0. Quick setup explanation:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Source: exec, running tail -F on a log file.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Channel: file channel
>>>>>>>>>>>>>
>>>>>>>>>>>>> Sink: HDFS
>>>>>>>>>>>>>
>>>>>>>>>>>>> Use case: move real-time data from the log file to HDFS.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> It appears that exec is not a reliable source, as we may lose data
>>>>>>>>>>>>> if the channel/source is down.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> So I tried the other option, the spooling directory source, which
>>>>>>>>>>>>> is documented as a reliable source. But here I have a single log file that data
>>>>>>>>>>>>> gets appended to, so I don't see a way of moving the file to the spool
>>>>>>>>>>>>> directory.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Can anyone suggest any other reliable source
>>>>>>>>>>>>> option for the case where a log file is continuously appended to and log rotation
>>>>>>>>>>>>> happens only at the end of the day?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Saravana
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> ------------------------------------------------------------
>>>>>>>>>>>> ---------
>>>>>>>>>>>> This e-mail and any attachment is for authorised use by the
>>>>>>>>>>>> intended recipient(s) only. This email contains confidential information.
>>>>>>>>>>>> It should not be copied, disclosed to, retained or used by, any party other
>>>>>>>>>>>> than the intended recipient. Any unauthorised distribution, dissemination
>>>>>>>>>>>> or copying of this E-mail or its attachments, and/or any use of any
>>>>>>>>>>>> information contained in them, is strictly prohibited and may be illegal.
>>>>>>>>>>>> If you are not an intended recipient then please promptly delete this
>>>>>>>>>>>> e-mail and any attachment and all copies and inform the sender directly via
>>>>>>>>>>>> email. Any emails that you send to us may be monitored by systems or
>>>>>>>>>>>> persons other than the named communicant for the purposes of ascertaining
>>>>>>>>>>>> whether the communication complies with the law and company policies.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>>
>>>>>> Best regards,
>>>>>> Ahmed Vila | Senior software developer
>>>>>> DevLogic | Sarajevo | Bosnia and Herzegovina
>>>>>>
>>>>>> Office : +387 33 942 123
>>>>>> Mobile: +387 62 139 348
>>>>>>
>>>>>> Website: www.devlogic.eu
>>>>>> E-mail   : avila@devlogic.eu
>>>>>
>>>>
>>>
>>
>
>


-- 

Best regards,
Ahmed Vila | Senior software developer
DevLogic | Sarajevo | Bosnia and Herzegovina

Office : +387 33 942 123
Mobile: +387 62 139 348

Website: www.devlogic.eu
E-mail   : avila@devlogic.eu
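Ahmed's suggestion earlier in this thread (rotate the live log every minute from a cron job, keep finished files apart from the live log, and let the spooling directory source see only finished files) could be sketched in Python roughly as below. This is a minimal sketch under stated assumptions, not part of Flume: the helper name `rotate_into_spool`, the staging/spool directory layout and the one-second grace period are hypothetical; it assumes the application reopens its log file on a signal (nginx does so on USR1) and that the staging and spool directories are on the same filesystem as the live log, so `os.replace` is an atomic rename.

```python
import os
import signal
import tempfile
import time
from pathlib import Path
from typing import Optional


def rotate_into_spool(live_log: Path, staging_dir: Path, spool_dir: Path,
                      app_pid: Optional[int] = None,
                      reopen_signal: int = signal.SIGUSR1,
                      grace_seconds: float = 1.0) -> Optional[Path]:
    """Hand the current live log to a spool directory as a finished file.

    Hypothetical helper meant to be run from cron (e.g. every minute).
    Assumes all three locations share one filesystem, so os.replace()
    is an atomic rename rather than a copy.
    """
    staging_dir.mkdir(parents=True, exist_ok=True)
    spool_dir.mkdir(parents=True, exist_ok=True)

    if not live_log.exists() or live_log.stat().st_size == 0:
        return None  # nothing to hand over in this cycle

    # Unique, sortable name: the spooling directory source must never
    # see the same file name twice.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    staged = staging_dir / f"{live_log.name}.{stamp}"

    # 1. Move the live log aside. The application keeps writing to the
    #    same inode until it reopens its log file.
    os.replace(live_log, staged)

    # 2. Ask the application to reopen its log (nginx does this on USR1),
    #    then give in-flight writes a moment to land in the old file.
    if app_pid is not None:
        os.kill(app_pid, reopen_signal)
        time.sleep(grace_seconds)

    # 3. Only now publish the finished file where the spooling source looks.
    target = spool_dir / staged.name
    os.replace(staged, target)
    return target


if __name__ == "__main__":
    # Demo on a throwaway directory; no real application is signalled.
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        live = base / "app.log"
        live.write_text("event 1\nevent 2\n")
        print(rotate_into_spool(live, base / "staging", base / "spool"))
```

The two-step move is the design choice that matters: according to the Flume user guide, a file must not be written to after it has been placed in the spooling directory, and a reused file name also stops processing. Publishing the file only after the application has switched to a fresh log, under a timestamped name, is meant to avoid both.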

Re: Need suggestion on reliable source for log processing

Posted by SaravanaKumar TR <sa...@gmail.com>.
Ahmed,

Thanks for your detailed comments.

Final point: in which cases would these logging solutions be considered a
perfect system without any tradeoffs?
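The "marking feature" discussed earlier in this thread, remembering the exact point up to which a log file was consumed, can be sketched as a pull-based reader that persists its byte offset. This is a minimal sketch, not a Flume API: `read_with_checkpoint`, the JSON checkpoint format and the `deliver` callback (standing in for a hand-off to a channel) are hypothetical, and it assumes a single file that is only ever appended to; renaming or rotating the file is not handled. The offset is saved only after `deliver` returns, so a crash can re-deliver the last batch (at-least-once) but never silently skips lines.

```python
import json
import os
from pathlib import Path
from typing import Callable, List


def _load_offset(checkpoint: Path) -> int:
    """Return the saved byte offset, or 0 if no checkpoint exists yet."""
    if checkpoint.exists():
        return json.loads(checkpoint.read_text())["offset"]
    return 0


def _save_offset(checkpoint: Path, offset: int) -> None:
    # Write to a temp file and rename, so a crash never leaves a torn checkpoint.
    tmp = checkpoint.with_name(checkpoint.name + ".tmp")
    tmp.write_text(json.dumps({"offset": offset}))
    os.replace(tmp, checkpoint)


def read_with_checkpoint(log_path: Path, checkpoint: Path,
                         deliver: Callable[[List[bytes]], None],
                         batch_size: int = 100) -> int:
    """Deliver complete lines written since the last saved offset.

    Returns the number of lines delivered in this run.
    """
    offset = _load_offset(checkpoint)
    delivered = 0
    batch: List[bytes] = []
    with open(log_path, "rb") as f:
        f.seek(offset)
        for line in iter(f.readline, b""):
            if not line.endswith(b"\n"):
                break  # partially written line: pick it up on the next run
            batch.append(line)
            offset += len(line)
            if len(batch) >= batch_size:
                deliver(batch)                    # if this raises, offset is not saved
                _save_offset(checkpoint, offset)
                delivered += len(batch)
                batch = []
    if batch:
        deliver(batch)
        _save_offset(checkpoint, offset)
        delivered += len(batch)
    return delivered
```

Run repeatedly (or in a loop), this resumes from the last committed offset after a restart, which is exactly what an exec source running `tail -F` cannot do, since it only ever sees a stream of text.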


Re: Need suggestion on reliable source for log processing

Posted by Ahmed Vila <av...@devlogic.eu>.
Exactly to the point.




On Mon, Oct 27, 2014 at 1:57 PM, SaravanaKumar TR <sa...@gmail.com>
wrote:

> That was a good point.
>
> So when a solution claims guaranteed data delivery, that guarantee only
> applies once the application has successfully handed the event to the
> source/producer; from that point on, the system guarantees delivery all
> the way to the sink/consumer at the other end.
>
> It has no control over whether events actually reach the source/producer
> (hence the possible data loss).
>
> So there will always be a chance of data loss when the system goes down,
> and certain tradeoff measures have to be taken.
>
> On Mon, Oct 27, 2014 at 6:06 PM, Ahmed Vila <av...@devlogic.eu> wrote:
>
>> Hi,
>>
>> Flume, Kafka, or any other system can only be responsible for its own
>> actions. From the perspective of the exec source in Flume, it simply
>> asks bash for the output on its stdout. It cannot control what bash
>> returns.
>> So to the source it's not a file, just a stream of text.
>>
>> The spooling directory source, on the other hand, will resume from the
>> file it failed on.
>> That reveals two approaches to event consumption: push and pull.
>>
>> With the push approach, the consumer cannot know what comes next or
>> what came before it started listening.
>>
>> Even so, not every source/producer that uses the pull approach knows
>> how to return to the last read event. It's up to the implementation.
>>
>> Regards,
>> Ahmed
>>
>>
>> On Mon, Oct 27, 2014 at 12:48 PM, SaravanaKumar TR <
>> saran0081986@gmail.com> wrote:
>>
>>> Yes, I agree.
>>>
>>> I think no log-collection solution (a source in Flume, a producer in
>>> Kafka) has a feature that marks the exact point up to which it has
>>> consumed the logfile, so that after a failure it could resume reading
>>> from that same point.
>>>
>>> This is the main point where failures are hard to ignore. Am I right?
>>>
>>> On Mon, Oct 27, 2014 at 4:51 PM, Ahmed Vila <av...@devlogic.eu> wrote:
>>>
>>>> Hi,
>>>>
>>>> You can use the spillable memory channel, which stores events in
>>>> memory and spills them to disk once memory fills up.
>>>> You can also use the file channel, but it is only as fast as your disk,
>>>> and because of its heavy I/O it's recommended to put it on a separate
>>>> disk, preferably an SSD.
>>>>
>>>> But that will not solve the issue you might run into: if Flume fails
>>>> for whatever reason, you'll never be able to continue from the exact
>>>> point where it failed.
>>>> Yes, the file channel preserves its state, so it will continue with
>>>> whatever it had already received, but what about the time while it was
>>>> down?
>>>>
>>>> If you cannot change anything about the application that produces the
>>>> logs, then this has to be accepted as a trade-off.
>>>>
>>>>
>>>> On Mon, Oct 27, 2014 at 12:09 PM, SaravanaKumar TR <
>>>> saran0081986@gmail.com> wrote:
>>>>
>>>>> Yes, I understand the concerns with this use case.
>>>>>
>>>>> If so, we need to configure failover for this scenario. Can we have it
>>>>> at the channel level or at the sink level?
>>>>>
>>>>> Does Flume support configuring failover in case the channel fills up?
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Oct 27, 2014 at 3:54 PM, Ahmed Vila <av...@devlogic.eu> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> In fact, this is not a problem with Flume.
>>>>>>
>>>>>> No solution will work reliably for your use case, simply because all
>>>>>> of them have to do some sort of tail -f or streaming on a file, and if
>>>>>> they can't keep up with it (mostly they can't at high-volume entry
>>>>>> points), they will drop some entries.
>>>>>> Please be kind to yourself and plan for failures: if you need to
>>>>>> restart Flume or any other solution, you'll face dropped entries that
>>>>>> you won't be able to re-ingest easily, as in most cases you won't know
>>>>>> which ones were dropped.
>>>>>>
>>>>>>
>>>>>> Regards,
>>>>>> Ahmed
>>>>>>
>>>>>> On Mon, Oct 27, 2014 at 11:13 AM, SaravanaKumar TR <
>>>>>> saran0081986@gmail.com> wrote:
>>>>>>
>>>>>>> Thanks for your comments, Ahmed.
>>>>>>>
>>>>>>> So from your comments, I take it that Flume doesn't have any reliable
>>>>>>> source option for the use case I described.
>>>>>>>
>>>>>>> If Flume can't provide it, can you suggest any other log collector
>>>>>>> solutions I could consider for moving real-time data to HDFS?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Oct 27, 2014 at 3:37 PM, Ahmed Vila <av...@devlogic.eu>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Then you're out of luck, in my opinion, as there is no way other
>>>>>>>> than tail -f.
>>>>>>>> The problem with tail -f is that tail will not wait for the
>>>>>>>> source/channel to keep up with it. If the channel is full, it will
>>>>>>>> push back on the source, and the source will just stop ingesting.
>>>>>>>>
>>>>>>>> It is possible to hack around this by redirecting tail -f into another
>>>>>>>> file and then custom-rotating that duplicate file.
>>>>>>>> But I wouldn't recommend that.
>>>>>>>>
>>>>>>>> Just a side note: if you're running a Java application (Tomcat or
>>>>>>>> similar), you can create multiple output files via the log4j.properties
>>>>>>>> configuration without the application itself knowing anything about it.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Ahmed
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Oct 27, 2014 at 10:56 AM, SaravanaKumar TR <
>>>>>>>> saran0081986@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Ahmed,
>>>>>>>>>
>>>>>>>>> In my case, the application renames the existing file to
>>>>>>>>> <logfile>.yesterdaydate and creates a new <logfile> at 00:00.
>>>>>>>>>
>>>>>>>>> I can't change the application's log rotation policy for now, so I
>>>>>>>>> guess I should rule out the spooling directory source in my case.
>>>>>>>>>
>>>>>>>>> Can you suggest any options other than the spooling dir source?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> On Mon, Oct 27, 2014 at 3:10 PM, Ahmed Vila <av...@devlogic.eu>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> It all depends on how log rotation is done and how the application
>>>>>>>>>> producing the log file handles it.
>>>>>>>>>> Most applications just reopen the log file when they receive a
>>>>>>>>>> signal sent with kill. For example, nginx reopens its log files when
>>>>>>>>>> it receives the USR1 signal, without stopping the process. Some
>>>>>>>>>> applications might restart as a result.
>>>>>>>>>>
>>>>>>>>>> If the application just reopens the log file, then you can change
>>>>>>>>>> your log rotation policy to per minute.
>>>>>>>>>> logrotate won't cover that case, so you'll have to set up a cron
>>>>>>>>>> job to do it.
>>>>>>>>>> In that setup, keep the finished logs and the live log in separate
>>>>>>>>>> locations so the spooling directory source doesn't choke on an
>>>>>>>>>> active log file that is still being appended to.
>>>>>>>>>>
>>>>>>>>>> Either way, the spooling directory source is the way to go, as it
>>>>>>>>>> leaves the log files in place, just renamed.
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Ahmed
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Oct 27, 2014 at 10:21 AM, SaravanaKumar TR <
>>>>>>>>>> saran0081986@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I am using Apache Flume 1.5.0. A quick explanation of the setup:
>>>>>>>>>>>
>>>>>>>>>>> Source: exec, running tail -F on a logfile.
>>>>>>>>>>>
>>>>>>>>>>> Channel: file channel
>>>>>>>>>>>
>>>>>>>>>>> Sink: HDFS
>>>>>>>>>>>
>>>>>>>>>>> Use case: move real-time data from a logfile to HDFS.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> It appears that exec is not a reliable source, as we may lose data
>>>>>>>>>>> if the channel/source is down.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> So I tried the other option, the "spooling directory source", which
>>>>>>>>>>> is described as a reliable source. But here I have a single logfile
>>>>>>>>>>> that data keeps getting appended to, so I don't see a way of moving
>>>>>>>>>>> the file into the spool directory.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Can anyone suggest another reliable source option for the case where
>>>>>>>>>>> data is appended to a logfile and the logfile is rotated only at the
>>>>>>>>>>> end of the day?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> Saravana
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> Best regards,
>>>> Ahmed Vila | Senior software developer
>>>> DevLogic | Sarajevo | Bosnia and Herzegovina
>>>>
>>>> Office : +387 33 942 123
>>>> Mobile: +387 62 139 348
>>>>
>>>> Website: www.devlogic.eu
>>>> E-mail   : avila@devlogic.eu
>>>>
>>>
>>
>>
>


Re: Need suggestion on reliable source for log processing

Posted by SaravanaKumar TR <sa...@gmail.com>.
That was a good point.

So when a solution claims guaranteed data delivery, that guarantee only
applies once the application has successfully handed the event to the
source/producer; from that point on, the system guarantees delivery all
the way to the sink/consumer at the other end.

It has no control over whether events actually reach the source/producer
(hence the possible data loss).

So there will always be a chance of data loss when the system goes down,
and certain tradeoff measures have to be taken.
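One tradeoff suggested in this thread is to rotate the live log frequently (for example from a per-minute cron job) and hand only finished files to the spooling directory source, which expects every file in its spool directory to be complete and never modified again. Below is a minimal Python sketch of that rotation step, not a Flume feature. The function name, the staging directory and the `reopen` callback (which might, for instance, send nginx its USR1 signal) are hypothetical. It also assumes the live log, staging directory and spool directory share one filesystem, so each rename is atomic.

```python
import os
import time


def rotate_into_spool(live_path, staging_dir, spool_dir, reopen=None):
    """Hypothetical helper: move the live log into spool_dir as a finished file.

    Returns the new path inside spool_dir, or None if there was nothing to ship.
    All three locations must be on the same filesystem so that os.replace()
    is an atomic rename.
    """
    if not os.path.exists(live_path) or os.path.getsize(live_path) == 0:
        return None  # nothing to ship this round

    # A unique, never-reused name (the spooling source rejects reused names).
    name = "%s.%d" % (os.path.basename(live_path), time.time_ns())
    staged_path = os.path.join(staging_dir, name)

    # 1. Move the live file out of the way. The application keeps writing to
    #    it through its already-open file descriptor, so no lines are lost.
    os.replace(live_path, staged_path)

    # 2. Ask the application to reopen live_path (caller-supplied callback).
    #    Some applications reopen asynchronously, so a real job may need to
    #    wait briefly here before the old file is really finished.
    if reopen is not None:
        reopen()

    # 3. Publish the finished file; Flume must never see it change afterwards.
    final_path = os.path.join(spool_dir, name)
    os.replace(staged_path, final_path)
    return final_path
```

The file is renamed rather than copied, so the writer never has to pause. The spool directory stays separate from the live log, which keeps the spooling source from picking up a file that is still growing.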

On Mon, Oct 27, 2014 at 6:06 PM, Ahmed Vila <av...@devlogic.eu> wrote:

> Hi,
>
> Flume, Kafka, or any other system can only be responsible for its own
> actions. From the perspective of the exec source in Flume, it simply
> asks bash for the output on its stdout. It cannot control what bash
> returns.
> So to the source it's not a file, just a stream of text.
>
> The spooling directory source, on the other hand, will resume from the
> file it failed on.
> That reveals two approaches to event consumption: push and pull.
>
> With the push approach, the consumer cannot know what comes next or
> what came before it started listening.
>
> Even so, not every source/producer that uses the pull approach knows
> how to return to the last read event. It's up to the implementation.
>
> Regards,
> Ahmed
>
>
> On Mon, Oct 27, 2014 at 12:48 PM, SaravanaKumar TR <saran0081986@gmail.com>
> wrote:
>
>> Yes, I agree.
>>
>> I think no log-collection solution (a source in Flume, a producer in
>> Kafka) has a feature that marks the exact point up to which it has
>> consumed the logfile, so that after a failure it could resume reading
>> from that same point.
>>
>> This is the main point where failures are hard to ignore. Am I right?
>>
>> On Mon, Oct 27, 2014 at 4:51 PM, Ahmed Vila <av...@devlogic.eu> wrote:
>>
>>> Hi,
>>>
>>> You can use the spillable memory channel, which stores events in
>>> memory and spills them to disk once memory fills up.
>>> You can also use the file channel, but it is only as fast as your disk,
>>> and because of its heavy I/O it's recommended to put it on a separate
>>> disk, preferably an SSD.
>>>
>>> But that will not solve the issue you might run into: if Flume fails
>>> for whatever reason, you'll never be able to continue from the exact
>>> point where it failed.
>>> Yes, the file channel preserves its state, so it will continue with
>>> whatever it had already received, but what about the time while it was
>>> down?
>>>
>>> If you cannot change anything about the application that produces the
>>> logs, then this has to be accepted as a trade-off.
>>>
>>>
>>> On Mon, Oct 27, 2014 at 12:09 PM, SaravanaKumar TR <
>>> saran0081986@gmail.com> wrote:
>>>
>>>> Yes, I understand the concerns with this use case.
>>>>
>>>> If so, we need to configure failover for this scenario. Can we have it
>>>> at the channel level or at the sink level?
>>>>
>>>> Does Flume support configuring failover in case the channel fills up?
>>>>
>>>>
>>>>
>>>> On Mon, Oct 27, 2014 at 3:54 PM, Ahmed Vila <av...@devlogic.eu> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> In fact, this is not a problem with Flume.
>>>>>
>>>>> No solution will work reliably for your use case, simply because all
>>>>> of them have to do some sort of tail -f or streaming on a file, and if
>>>>> they can't keep up with it (mostly they can't at high-volume entry
>>>>> points), they will drop some entries.
>>>>> Please be kind to yourself and plan for failures: if you need to
>>>>> restart Flume or any other solution, you'll face dropped entries that
>>>>> you won't be able to re-ingest easily, as in most cases you won't know
>>>>> which ones were dropped.
>>>>>
>>>>>
>>>>> Regards,
>>>>> Ahmed
>>>>>
>>>>> On Mon, Oct 27, 2014 at 11:13 AM, SaravanaKumar TR <
>>>>> saran0081986@gmail.com> wrote:
>>>>>
>>>>>> Thanks for your comments, Ahmed.
>>>>>>
>>>>>> So from your comments, I take it that Flume doesn't have any reliable
>>>>>> source option for the use case I described.
>>>>>>
>>>>>> If Flume can't provide it, can you suggest any other log collector
>>>>>> solutions I could consider for moving real-time data to HDFS?
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Oct 27, 2014 at 3:37 PM, Ahmed Vila <av...@devlogic.eu>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Then you're out of luck, in my opinion, as there is no way other
>>>>>>> than tail -f.
>>>>>>> The problem with tail -f is that tail will not wait for the
>>>>>>> source/channel to keep up with it. If the channel is full, it will
>>>>>>> push back on the source, and the source will just stop ingesting.
>>>>>>>
>>>>>>> It is possible to hack around this by redirecting tail -f into another
>>>>>>> file and then custom-rotating that duplicate file.
>>>>>>> But I wouldn't recommend that.
>>>>>>>
>>>>>>> Just a side note: if you're running a Java application (Tomcat or
>>>>>>> similar), you can create multiple output files via the log4j.properties
>>>>>>> configuration without the application itself knowing anything about it.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Ahmed
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Oct 27, 2014 at 10:56 AM, SaravanaKumar TR <
>>>>>>> saran0081986@gmail.com> wrote:
>>>>>>>
>>>>>>>> Ahmed,
>>>>>>>>
>>>>>>>> In my case, the application renames the existing file to
>>>>>>>> <logfile>.yesterdaydate and creates a new <logfile> at 00:00.
>>>>>>>>
>>>>>>>> I can't change the application's log rotation policy for now, so I
>>>>>>>> guess I should rule out the spooling directory source in my case.
>>>>>>>>
>>>>>>>> Can you suggest any options other than the spooling dir source?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> On Mon, Oct 27, 2014 at 3:10 PM, Ahmed Vila <av...@devlogic.eu>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> It all depends on how log rotation is done and how the application
>>>>>>>>> producing the log file handles it.
>>>>>>>>> Most applications just reopen the log file when they receive a
>>>>>>>>> signal sent with kill. For example, nginx reopens its log files when
>>>>>>>>> it receives the USR1 signal, without stopping the process. Some
>>>>>>>>> applications might restart as a result.
>>>>>>>>>
>>>>>>>>> If the application just reopens the log file, then you can change
>>>>>>>>> your log rotation policy to per minute.
>>>>>>>>> logrotate won't cover that case, so you'll have to set up a cron
>>>>>>>>> job to do it.
>>>>>>>>> In that setup, keep the finished logs and the live log in separate
>>>>>>>>> locations so the spooling directory source doesn't choke on an
>>>>>>>>> active log file that is still being appended to.
>>>>>>>>>
>>>>>>>>> Either way, the spooling directory source is the way to go, as it
>>>>>>>>> leaves the log files in place, just renamed.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Ahmed
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Oct 27, 2014 at 10:21 AM, SaravanaKumar TR <
>>>>>>>>> saran0081986@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I am using Apache Flume 1.5.0. A quick explanation of the setup:
>>>>>>>>>>
>>>>>>>>>> Source: exec, running tail -F on a logfile.
>>>>>>>>>>
>>>>>>>>>> Channel: file channel
>>>>>>>>>>
>>>>>>>>>> Sink: HDFS
>>>>>>>>>>
>>>>>>>>>> Use case: move real-time data from a logfile to HDFS.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> It appears that exec is not a reliable source, as we may lose data
>>>>>>>>>> if the channel/source is down.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> So I tried the other option, the "spooling directory source", which
>>>>>>>>>> is described as a reliable source. But here I have a single logfile
>>>>>>>>>> that data keeps getting appended to, so I don't see a way of moving
>>>>>>>>>> the file into the spool directory.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Can anyone suggest another reliable source option for the case where
>>>>>>>>>> data is appended to a logfile and the logfile is rotated only at the
>>>>>>>>>> end of the day?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Saravana
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>
>

Re: Need suggestion on reliable source for log processing

Posted by Ahmed Vila <av...@devlogic.eu>.
Hi,

Flume, Kafka, or any other system can only be responsible for its own
actions. From the perspective of the exec source in Flume, it simply asks
bash for the output on its stdout. It cannot control what bash returns.
So to the source it's not a file, just a stream of text.

The spooling directory source, on the other hand, will resume from the file
it failed on.
That reveals two approaches to event consumption: push and pull.

With the push approach, the consumer cannot know what comes next or what
came before it started listening.

Even so, not every source/producer that uses the pull approach knows how to
return to the last read event. It's up to the implementation.
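A pull-based reader can return to the last read event if it records how far it got. The Python sketch below illustrates that idea. It is not part of Flume or Kafka. It stores the log file's inode and byte offset in a hypothetical JSON checkpoint file, and it only advances that checkpoint once the caller has delivered the lines. As a result, a crash causes lines to be re-read (at-least-once) rather than skipped. The sketch assumes a POSIX filesystem and UTF-8 log lines, and it does not recover lines left unread in a file that was rotated away.

```python
import json
import os


def load_checkpoint(state_path):
    """Return the saved {"inode": ..., "offset": ...} dict, or None on first run."""
    if not os.path.exists(state_path):
        return None
    with open(state_path) as f:
        return json.load(f)


def read_new_lines(log_path, checkpoint):
    """Read complete lines written after `checkpoint`; return (lines, new_checkpoint).

    Nothing is persisted here, so the caller decides when the read counts as done.
    """
    st = os.stat(log_path)
    offset = 0
    # Resume only if this is the same file (same inode) and it was not truncated.
    if (checkpoint is not None
            and checkpoint["inode"] == st.st_ino
            and checkpoint["offset"] <= st.st_size):
        offset = checkpoint["offset"]

    lines = []
    with open(log_path, "rb") as f:
        f.seek(offset)
        for raw in f:
            if not raw.endswith(b"\n"):
                break  # incomplete last line: leave it for the next call
            lines.append(raw.decode("utf-8").rstrip("\n"))
            offset += len(raw)
    return lines, {"inode": st.st_ino, "offset": offset}


def commit_checkpoint(state_path, checkpoint):
    """Persist the checkpoint; call only after the lines were delivered."""
    tmp_path = state_path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(checkpoint, f)
        f.flush()
        os.fsync(f.fileno())  # make sure the data is on disk before the rename
    os.replace(tmp_path, state_path)  # atomic swap of the old checkpoint
```

A caller would loop: read with `read_new_lines(path, load_checkpoint(state))`, deliver the lines to whatever collector is in use, then call `commit_checkpoint(state, cp)`. Committing before delivery would make this at-most-once, which brings back the silent loss described above.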

Regards,
Ahmed


On Mon, Oct 27, 2014 at 12:48 PM, SaravanaKumar TR <sa...@gmail.com>
wrote:

> Yes, I agree.
>
> I think no log-collection solution (a source in Flume, a producer in
> Kafka) has a feature that marks the exact point up to which it has
> consumed the logfile, so that after a failure it could resume reading
> from that same point.
>
> This is the main point where failures are hard to ignore. Am I right?
>
> On Mon, Oct 27, 2014 at 4:51 PM, Ahmed Vila <av...@devlogic.eu> wrote:
>
>> Hi,
>>
>> You can use the spillable memory channel, which stores events in memory
>> and spills them to disk once memory fills up.
>> You can also use the file channel, but it is only as fast as your disk,
>> and because of its heavy I/O it's recommended to put it on a separate
>> disk, preferably an SSD.
>>
>> But that will not solve the issue you might run into: if Flume fails for
>> whatever reason, you'll never be able to continue from the exact point
>> where it failed.
>> Yes, the file channel preserves its state, so it will continue with
>> whatever it had already received, but what about the time while it was
>> down?
>>
>> If you cannot change anything about the application that produces the
>> logs, then this has to be accepted as a trade-off.
>>
>>
>> On Mon, Oct 27, 2014 at 12:09 PM, SaravanaKumar TR <
>> saran0081986@gmail.com> wrote:
>>
>>> Yes, I understand the concerns with this use case.
>>>
>>> If so, we need to configure failover for this scenario. Can we have it
>>> at the channel level or at the sink level?
>>>
>>> Does Flume support configuring failover in case the channel fills up?
>>>
>>>
>>>
>>> On Mon, Oct 27, 2014 at 3:54 PM, Ahmed Vila <av...@devlogic.eu> wrote:
>>>
>>>> Hi,
>>>>
>>>> In fact, this is not the problem with Flume.
>>>>
>>>> No solution will function reliably for your use case, simply because
>>>> all of them will have to do some sort of tail-f or streaming on a file and
>>>> if they can't keep up with it (they mostly don't in high speed entry
>>>> points), they will drop some entries.
>>>> Please, be kind to yourself and plan for failures - if you need to
>>>> restart Flume or any other solution then you'll face dropped entries that
>>>> you'll not be able to re-ingest easily as in most cases you won't know
>>>> which ones you've dropped.
>>>>
>>>>
>>>> Regards,
>>>> Ahmed
>>>>
>>>> On Mon, Oct 27, 2014 at 11:13 AM, SaravanaKumar TR <
>>>> saran0081986@gmail.com> wrote:
>>>>
>>>>> Thanks for comments Ahmed.
>>>>>
>>>>> So from your comments , I consider that flume doesn't have any
>>>>> reliable source option for use case provided by me.
>>>>>
>>>>> If flume can't provide it, can you help me with any other log
>>>>> collector solutions which can I consider here to move real time data to
>>>>> HDFS.
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Oct 27, 2014 at 3:37 PM, Ahmed Vila <av...@devlogic.eu> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Then, you're out of luck in my opinion, as there is no way other than
>>>>>> tail -f.
>>>>>> The problem with fail-f is that tail will not wait for source/channel
>>>>>> to keep up with it. If Cnannel is full it will back-off to the source and
>>>>>> then the source will just stop ingesting.
>>>>>>
>>>>>> There is a possibility to hack up the tail -f into another file and
>>>>>> then custom-rotate that duplicate file.
>>>>>> But, I wouldn't recommend such case.
>>>>>>
>>>>>> Just a side note - If you're operating Java application (Tomcat or
>>>>>> similar), then you can create multiple output files via log4j.properties
>>>>>> configuration without application itself knowing anything about it.
>>>>>>
>>>>>> Regards,
>>>>>> Ahmed
>>>>>>
>>>>>>
>>>>>> On Mon, Oct 27, 2014 at 10:56 AM, SaravanaKumar TR <
>>>>>> saran0081986@gmail.com> wrote:
>>>>>>
>>>>>>> Ahmed,
>>>>>>>
>>>>>>> Here in my case , the application will rename the existing file as
>>>>>>> <logfile>.yesterdaydate and create a new file as <logfile> at 00:00 AM.
>>>>>>>
>>>>>>> I can't change the log rotation policy of application for now.So I
>>>>>>> guess I should rule out the option of using spooling directory source in my
>>>>>>> case.
>>>>>>>
>>>>>>> Can you suggest me with any other options other than spooling dir
>>>>>>> source.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> On Mon, Oct 27, 2014 at 3:10 PM, Ahmed Vila <av...@devlogic.eu>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> It all depends on how log rotation is done and how application
>>>>>>>> producing the log file handles log rotation.
>>>>>>>> Most of the applications just reopens the log file when it receives
>>>>>>>> a kill signal. For example, nginx reopens the log file when it receives
>>>>>>>> USR1 signal, but it doesn't stop the process. Some applications might
>>>>>>>> restart as a result.
>>>>>>>>
>>>>>>>> If the application just reopens the log file, then you can change
>>>>>>>> your log rotation policy to be per minute.
>>>>>>>> In that case logrotate daemon won't satisfy such case, so you'll
>>>>>>>> have to make a cron job to do it.
>>>>>>>> In such case, you would separate finished logs location and live
>>>>>>>> log location so the spooling directory source doesn't freak out about
>>>>>>>> active log file being appended.
>>>>>>>>
>>>>>>>> Anyway, spooling directory source is a way to go, as it will leave
>>>>>>>> log files in place, just renamed.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Ahmed
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Oct 27, 2014 at 10:21 AM, SaravanaKumar TR <
>>>>>>>> saran0081986@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I am using Apache flume 1.5.0.Quick setup explanation here.
>>>>>>>>>
>>>>>>>>> Source:exec , tail –F command for a logfile.
>>>>>>>>>
>>>>>>>>> Channel:  file channel
>>>>>>>>>
>>>>>>>>> Sink: HDFS
>>>>>>>>>
>>>>>>>>> Use case:to move real time data from logfile to HDFS.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> It appears like exec is not a reliable source , as we may data
>>>>>>>>> loss if channel/source is down.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> So i tried with other option "spooling directory source" which is
>>>>>>>>> mentioned as reliable source.But here I have a single logfile where data
>>>>>>>>> gets appended in , so I dont see option of moving the file to spool
>>>>>>>>> directory.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Can anyone help me with providing any other reliable source option
>>>>>>>>> in case where logfile gets appended with data and logfile rotation happens
>>>>>>>>> only at the end of the day.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Saravana
>>>>>>>>>
>>>>>>>>
>>>>>>>> ------------------------------------------------------------
>>>>>>>> ---------
>>>>>>>> This e-mail and any attachment is for authorised use by the
>>>>>>>> intended recipient(s) only. This email contains confidential information.
>>>>>>>> It should not be copied, disclosed to, retained or used by, any party other
>>>>>>>> than the intended recipient. Any unauthorised distribution, dissemination
>>>>>>>> or copying of this E-mail or its attachments, and/or any use of any
>>>>>>>> information contained in them, is strictly prohibited and may be illegal.
>>>>>>>> If you are not an intended recipient then please promptly delete this
>>>>>>>> e-mail and any attachment and all copies and inform the sender directly via
>>>>>>>> email. Any emails that you send to us may be monitored by systems or
>>>>>>>> persons other than the named communicant for the purposes of ascertaining
>>>>>>>> whether the communication complies with the law and company policies.
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> This e-mail and any attachment is for authorised use by the intended
>>>>>> recipient(s) only. This email contains confidential information. It should
>>>>>> not be copied, disclosed to, retained or used by, any party other than the
>>>>>> intended recipient. Any unauthorised distribution, dissemination or copying
>>>>>> of this E-mail or its attachments, and/or any use of any information
>>>>>> contained in them, is strictly prohibited and may be illegal. If you are
>>>>>> not an intended recipient then please promptly delete this e-mail and any
>>>>>> attachment and all copies and inform the sender directly via email. Any
>>>>>> emails that you send to us may be monitored by systems or persons other
>>>>>> than the named communicant for the purposes of ascertaining whether the
>>>>>> communication complies with the law and company policies.
>>>>>>
>>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> This e-mail and any attachment is for authorised use by the intended
>>>> recipient(s) only. This email contains confidential information. It should
>>>> not be copied, disclosed to, retained or used by, any party other than the
>>>> intended recipient. Any unauthorised distribution, dissemination or copying
>>>> of this E-mail or its attachments, and/or any use of any information
>>>> contained in them, is strictly prohibited and may be illegal. If you are
>>>> not an intended recipient then please promptly delete this e-mail and any
>>>> attachment and all copies and inform the sender directly via email. Any
>>>> emails that you send to us may be monitored by systems or persons other
>>>> than the named communicant for the purposes of ascertaining whether the
>>>> communication complies with the law and company policies.
>>>>
>>>
>>>
>>
>>
>> --
>>
>> Best regards,
>> Ahmed Vila | Senior software developer
>> DevLogic | Sarajevo | Bosnia and Herzegovina
>>
>> Office : +387 33 942 123
>> Mobile: +387 62 139 348
>>
>> Website: www.devlogic.eu
>> E-mail   : avila@devlogic.eu
>> ---------------------------------------------------------------------
>> This e-mail and any attachment is for authorised use by the intended
>> recipient(s) only. This email contains confidential information. It should
>> not be copied, disclosed to, retained or used by, any party other than the
>> intended recipient. Any unauthorised distribution, dissemination or copying
>> of this E-mail or its attachments, and/or any use of any information
>> contained in them, is strictly prohibited and may be illegal. If you are
>> not an intended recipient then please promptly delete this e-mail and any
>> attachment and all copies and inform the sender directly via email. Any
>> emails that you send to us may be monitored by systems or persons other
>> than the named communicant for the purposes of ascertaining whether the
>> communication complies with the law and company policies.
>>
>> ---------------------------------------------------------------------
>> This e-mail and any attachment is for authorised use by the intended
>> recipient(s) only. This email contains confidential information. It should
>> not be copied, disclosed to, retained or used by, any party other than the
>> intended recipient. Any unauthorised distribution, dissemination or copying
>> of this E-mail or its attachments, and/or any use of any information
>> contained in them, is strictly prohibited and may be illegal. If you are
>> not an intended recipient then please promptly delete this e-mail and any
>> attachment and all copies and inform the sender directly via email. Any
>> emails that you send to us may be monitored by systems or persons other
>> than the named communicant for the purposes of ascertaining whether the
>> communication complies with the law and company policies.
>>
>

-- 
---------------------------------------------------------------------
This e-mail and any attachment is for authorised use by the intended 
recipient(s) only. This email contains confidential information. It should 
not be copied, disclosed to, retained or used by, any party other than the 
intended recipient. Any unauthorised distribution, dissemination or copying 
of this E-mail or its attachments, and/or any use of any information 
contained in them, is strictly prohibited and may be illegal. If you are 
not an intended recipient then please promptly delete this e-mail and any 
attachment and all copies and inform the sender directly via email. Any 
emails that you send to us may be monitored by systems or persons other 
than the named communicant for the purposes of ascertaining whether the 
communication complies with the law and company policies.

Re: Need suggestion on reliable source for log processing

Posted by SaravanaKumar TR <sa...@gmail.com>.
Yes, I agree.

I think no logging solution (a source in Flume, a producer in Kafka) has a
marker feature that records the exact point in the log file up to which it
has consumed, so that after a failure it can resume reading from that same
point.

This is the main point where failures are difficult to ignore. Am I right?
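For illustration, a pull-style reader can keep such a marker itself by persisting the byte offset it has consumed up to. The sketch below is a hypothetical helper (consume_new_lines, _load_offset and _save_offset are invented names), not a Flume feature. It assumes the file is only appended to. A file that is smaller than the saved offset is treated as truncated and re-read from the start. It does not follow rename-based rotation, so lines written to the old file after the last read would be missed. The offset is saved only after deliver returns, so a crash in between re-delivers lines (at-least-once) rather than losing them.

```python
import os
import tempfile


def _load_offset(offset_path):
    """Return the saved byte offset, or 0 if nothing has been consumed yet."""
    try:
        with open(offset_path) as f:
            return int(f.read().strip() or 0)
    except FileNotFoundError:
        return 0


def _save_offset(offset_path, offset):
    """Write the offset to a temp file, then rename it over the old one."""
    tmp = offset_path + ".tmp"
    with open(tmp, "w") as f:
        f.write(str(offset))
    os.replace(tmp, offset_path)  # atomic on POSIX when on the same filesystem


def consume_new_lines(log_path, offset_path, deliver):
    """Deliver complete lines appended since the last saved offset.

    Hypothetical helper. Assumes log_path is only appended to; if it is
    smaller than the saved offset, it is treated as truncated and re-read.
    """
    offset = _load_offset(offset_path)
    if os.path.getsize(log_path) < offset:
        offset = 0

    with open(log_path, "rb") as f:
        f.seek(offset)
        data = f.read()

    # Consume only complete lines; a trailing partial line waits for next time.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8").splitlines()
    if lines:
        deliver(lines)  # if this raises, the offset below is not advanced
    _save_offset(offset_path, offset + end)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        log = os.path.join(d, "app.log")
        pos = os.path.join(d, "app.log.offset")
        with open(log, "wb") as f:
            f.write(b"line 1\nline 2\n")
        consume_new_lines(log, pos, print)  # ['line 1', 'line 2']
        with open(log, "ab") as f:
            f.write(b"line 3\npartial")
        consume_new_lines(log, pos, print)  # ['line 3']
```

This only helps while the reader itself is the one that restarts. It does not recover lines that are lost because the file is rotated away between reads.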

On Mon, Oct 27, 2014 at 4:51 PM, Ahmed Vila <av...@devlogic.eu> wrote:

> Hi,
>
> You can use the spillable memory channel, which stores events in memory
> and spills them to disk once the in-memory queue fills up.
> Alternatively, you can use the file channel, but it's only as fast as your
> disk, and because of its heavy I/O it's recommended to give it a separate
> disk, preferably an SSD.
>
> But that will not solve the issue you might run into: if Flume fails for
> whatever reason, you'll never be able to continue from the exact point
> where it failed.
> Yes, the file channel preserves its state, so it will continue with
> whatever it had already received, but what about the time while it was
> down?
>
> If you cannot change anything about the application that produces the
> logs, then this has to be accepted as a trade-off.

Re: Need suggestion on reliable source for log processing

Posted by Ahmed Vila <av...@devlogic.eu>.
Hi,

You can use the spillable memory channel, which stores events in memory and
spills them to disk once the in-memory queue fills up.
Alternatively, you can use the file channel, but it's only as fast as your
disk, and because of its heavy I/O it's recommended to give it a separate
disk, preferably an SSD.

But that will not solve the issue you might run into: if Flume fails for
whatever reason, you'll never be able to continue from the exact point
where it failed.
Yes, the file channel preserves its state, so it will continue with
whatever it had already received, but what about the time while it was
down?

If you cannot change anything about the application that produces the
logs, then this has to be accepted as a trade-off.
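As a rough illustration of the spill-to-disk idea, here is a toy Python queue. It keeps the first memory_capacity events in memory, appends the overflow to a file, and reads everything back in FIFO order. SpillableQueue and its methods are hypothetical. This is a sketch of the concept only, not Flume's Spillable Memory Channel implementation. It has no transactions, and events held in memory are lost if the process dies.

```python
import collections
import json
import os
import tempfile


class SpillableQueue:
    """Toy FIFO: up to memory_capacity events in memory, the rest on disk.

    Hypothetical sketch of the spill-to-disk idea only; it is not Flume's
    Spillable Memory Channel and has no transactions. Events held in memory
    are lost if the process dies.
    """

    def __init__(self, memory_capacity, overflow_path):
        self.memory = collections.deque()
        self.memory_capacity = memory_capacity
        self.overflow_path = overflow_path
        self.read_pos = 0  # byte offset of the next unread overflow record

    def _has_overflow(self):
        return (os.path.exists(self.overflow_path)
                and self.read_pos < os.path.getsize(self.overflow_path))

    def put(self, event):
        # Memory is used only while nothing is waiting on disk; otherwise a
        # newer event in memory could be taken before older spilled ones.
        if len(self.memory) < self.memory_capacity and not self._has_overflow():
            self.memory.append(event)
        else:
            with open(self.overflow_path, "ab") as f:
                f.write((json.dumps(event) + "\n").encode("utf-8"))

    def take(self):
        """Return the oldest event, or None if the queue is empty."""
        if self.memory:
            return self.memory.popleft()
        if self._has_overflow():
            with open(self.overflow_path, "rb") as f:
                f.seek(self.read_pos)
                line = f.readline()
                self.read_pos = f.tell()
            return json.loads(line)
        return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        q = SpillableQueue(2, os.path.join(d, "overflow.jsonl"))
        for e in ["e1", "e2", "e3"]:
            q.put(e)                         # e3 spills to disk
        print(q.take())                      # e1
        q.put("e4")                          # still spills: e3 is waiting on disk
        print([q.take() for _ in range(3)])  # ['e2', 'e3', 'e4']
```

Like any channel, this only buffers events that actually reached it. It does nothing for lines written while the agent was down, which is the limitation described above.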


On Mon, Oct 27, 2014 at 12:09 PM, SaravanaKumar TR <sa...@gmail.com>
wrote:

> Yes, I understand the concerns with this use case.
>
> If so, we need to configure failover for this scenario. Can we have it at
> the channel level or the sink level?
>
> Does Flume support configuring failover in case the channel fills up?


-- 

Best regards,
Ahmed Vila | Senior software developer
DevLogic | Sarajevo | Bosnia and Herzegovina

Office : +387 33 942 123
Mobile: +387 62 139 348

Website: www.devlogic.eu
E-mail   : avila@devlogic.eu

Re: Need suggestion on reliable source for log processing

Posted by SaravanaKumar TR <sa...@gmail.com>.
Yes, I understand the concerns with this use case.

If so, we need to configure failover for this scenario. Can we have it at
the channel level or at the sink level?

Does Flume support configuring failover in case a channel fills up?



On Mon, Oct 27, 2014 at 3:54 PM, Ahmed Vila <av...@devlogic.eu> wrote:

> Hi,
>
> In fact, this is not a problem with Flume.
>
> No solution will function reliably for your use case, simply because all
> of them will have to do some sort of tail -f or streaming on a file, and
> if they can't keep up with it (they mostly don't at high-speed entry
> points), they will drop some entries.
> Please be kind to yourself and plan for failures: if you need to restart
> Flume or any other solution, you'll face dropped entries that you won't
> be able to re-ingest easily, as in most cases you won't know which ones
> you've dropped.
>
>
> Regards,
> Ahmed

Re: Need suggestion on reliable source for log processing

Posted by Ahmed Vila <av...@devlogic.eu>.
Hi,

In fact, this is not a problem with Flume.

No solution will function reliably for your use case, simply because all of
them will have to do some sort of tail -f or streaming on a file, and if they
can't keep up with it (they mostly don't at high-speed entry points), they
will drop some entries.
Please be kind to yourself and plan for failures: if you need to restart
Flume or any other solution, you'll face dropped entries that you won't be
able to re-ingest easily, as in most cases you won't know which ones you've
dropped.


Regards,
Ahmed

On Mon, Oct 27, 2014 at 11:13 AM, SaravanaKumar TR <sa...@gmail.com>
wrote:

> Thanks for your comments, Ahmed.
>
> From your comments, I take it that Flume doesn't have any reliable
> source option for the use case I described.
>
> If Flume can't provide it, can you suggest any other log collector
> solutions I could consider for moving real-time data to HDFS?

Re: Need suggestion on reliable source for log processing

Posted by SaravanaKumar TR <sa...@gmail.com>.
Thanks for your comments, Ahmed.

From your comments, I take it that Flume doesn't have any reliable
source option for the use case I described.

If Flume can't provide it, can you suggest any other log collector
solutions I could consider for moving real-time data to HDFS?



On Mon, Oct 27, 2014 at 3:37 PM, Ahmed Vila <av...@devlogic.eu> wrote:

> Hi,
>
> Then you're out of luck, in my opinion, as there is no way other than
> tail -f.
> The problem with tail -f is that tail will not wait for the source/channel
> to keep up with it. If the channel is full, it will push back on the
> source, and then the source will just stop ingesting.
>
> It is possible to hack up the tail -f into another file and then
> custom-rotate that duplicate file, but I wouldn't recommend it.
>
> Just a side note: if you're running a Java application (Tomcat or
> similar), you can create multiple output files via log4j.properties
> configuration without the application itself knowing anything about it.
>
> Regards,
> Ahmed

Re: Need suggestion on reliable source for log processing

Posted by Ahmed Vila <av...@devlogic.eu>.
Hi,

Then you're out of luck, in my opinion, as there is no way other than
tail -f.
The problem with tail -f is that tail will not wait for the source/channel
to keep up with it. If the channel is full, it will push back on the
source, and then the source will just stop ingesting.
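The back-pressure problem described above can be illustrated with a toy Python model. This is not Flume's ExecSource code: it assumes a producer that never waits (standing in for `tail -F`), a bounded queue standing in for the channel, and a sink that drains one event every few lines. The names `simulate_exec_source`, `channel_capacity` and `drain_every` are hypothetical. Any line that arrives while the channel is full is simply lost, because nothing goes back to re-read it.

```python
import queue


def simulate_exec_source(lines, channel_capacity, drain_every):
    """Toy model of a non-waiting producer feeding a bounded channel.

    Hypothetical illustration only: the producer (standing in for `tail -F`)
    never blocks or retries, and the sink removes one event after every
    `drain_every` produced lines. Returns (delivered, dropped).
    """
    channel = queue.Queue(maxsize=channel_capacity)
    delivered, dropped = [], []
    for i, line in enumerate(lines, start=1):
        try:
            channel.put_nowait(line)   # producer does not wait for free space
        except queue.Full:
            dropped.append(line)       # lost: nothing will re-read this line
        if i % drain_every == 0 and not channel.empty():
            delivered.append(channel.get_nowait())  # sink is slower than producer
    while not channel.empty():         # producer stopped; sink catches up
        delivered.append(channel.get_nowait())
    return delivered, dropped


lines = [f"line {n}" for n in range(10)]
delivered, dropped = simulate_exec_source(lines, channel_capacity=3, drain_every=2)
print(f"delivered={len(delivered)} dropped={dropped}")
# prints: delivered=7 dropped=['line 5', 'line 7', 'line 9']
```

A larger capacity only delays the losses for as long as the sink stays slower than the producer.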

It is possible to hack up the tail -f into another file and then
custom-rotate that duplicate file, but I wouldn't recommend it.

Just a side note: if you're running a Java application (Tomcat or
similar), you can create multiple output files via log4j.properties
configuration without the application itself knowing anything about it.

Regards,
Ahmed


On Mon, Oct 27, 2014 at 10:56 AM, SaravanaKumar TR <sa...@gmail.com>
wrote:

> Ahmed,
>
> Here in my case, the application renames the existing file to
> <logfile>.yesterdaydate and creates a new <logfile> at 00:00.
>
> I can't change the application's log rotation policy for now, so I guess
> I should rule out using the spooling directory source in my case.
>
> Can you suggest any options other than the spooling directory source?
>
> Thanks,

Re: Need suggestion on reliable source for log processing

Posted by SaravanaKumar TR <sa...@gmail.com>.
Ahmed,

Here in my case, the application renames the existing file to
<logfile>.yesterdaydate and creates a new <logfile> at 00:00.

I can't change the application's log rotation policy for now, so I guess I
should rule out using the spooling directory source in my case.

Can you suggest any options other than the spooling directory source?
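If delivery a day late is acceptable, the rotation scheme described above could still feed a spooling directory source: once the application has renamed yesterday's file, move it into the spool directory with a single rename. A minimal Python sketch, assuming a `YYYY-MM-DD` date suffix (the thread does not say which format the application uses) and a spool directory on the same filesystem as the log, so that `os.replace` is an atomic rename and Flume never sees a half-written file. The function name and paths are hypothetical.

```python
import datetime
import os


def hand_off_rotated_log(log_path, spool_dir, day=None, date_fmt="%Y-%m-%d"):
    """Move <log_path>.<date> into spool_dir; return the new path or None.

    Assumptions (hypothetical, not from the thread): the application names
    the rotated file <log_path>.<day formatted with date_fmt>, and spool_dir
    is on the same filesystem, so os.replace is a single atomic rename.
    """
    if day is None:
        day = datetime.date.today() - datetime.timedelta(days=1)
    rotated = f"{log_path}.{day.strftime(date_fmt)}"
    if not os.path.exists(rotated):
        return None  # not rotated yet (or already handed off); retry later
    target = os.path.join(spool_dir, os.path.basename(rotated))
    # A rename never exposes a partially copied file to the spooling source;
    # across filesystems os.replace raises OSError instead of copying.
    os.replace(rotated, target)
    return target


# Example cron usage shortly after midnight (paths are placeholders):
# hand_off_rotated_log("/var/log/myapp/app.log", "/var/spool/flume/app")
```

This keeps the spooling directory source's reliability but gives up real-time delivery, which is exactly the trade-off the question is trying to avoid.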

Thanks,

On Mon, Oct 27, 2014 at 3:10 PM, Ahmed Vila <av...@devlogic.eu> wrote:

> Hi,
>
> It all depends on how log rotation is done and how the application
> producing the log file handles it.
> Most applications just reopen the log file when they receive a signal via
> kill. For example, nginx reopens its log files when it receives the USR1
> signal, without stopping the process. Some applications might restart as
> a result instead.
>
> If the application just reopens the log file, then you can change your log
> rotation policy to rotate every minute.
> logrotate won't handle per-minute rotation, so you'll have to write a cron
> job to do it. In that case, you would keep finished logs and the live log
> in separate locations, so the spooling directory source doesn't choke on
> an active log file being appended to.
>
> Anyway, the spooling directory source is the way to go, as it leaves the
> log files in place, just renamed.
>
> Regards,
> Ahmed

Re: Need suggestion on reliable source for log processing

Posted by Ahmed Vila <av...@devlogic.eu>.
Hi,

It all depends on how log rotation is done and how the application
producing the log file handles it.
Most applications just reopen the log file when they receive a signal via
kill. For example, nginx reopens its log files when it receives the USR1
signal, without stopping the process. Some applications might restart as
a result instead.

If the application just reopens the log file, then you can change your log
rotation policy to rotate every minute.
logrotate won't handle per-minute rotation, so you'll have to write a cron
job to do it. In that case, you would keep finished logs and the live log in
separate locations, so the spooling directory source doesn't choke on an
active log file being appended to.

Anyway, the spooling directory source is the way to go, as it leaves the log
files in place, just renamed.
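The per-minute rotation job described above might look roughly like the following Python sketch, run from cron once a minute. It is one possible layout, not a tested recipe: the staging and spool directories, the function name and the `settle_seconds` grace period are hypothetical, and it assumes the writer reopens its log on a signal (for nginx, `USR1` sent to the master process) and that all locations share one filesystem so each `os.replace` is an atomic rename. The timestamped name matters too, because the spooling directory source expects each file name to be used only once.

```python
import os
import signal
import time


def rotate_into_spool(live_log, staging_dir, spool_dir, pid=None,
                      sig=signal.SIGUSR1, settle_seconds=1.0, now=None):
    """One rotation step: live log -> staging dir -> Flume spool dir.

    Hypothetical sketch; assumes all paths are on one filesystem and that
    the process `pid` reopens `live_log` when it receives `sig`.
    Returns the path handed to Flume, or None if there was nothing to rotate.
    """
    if not os.path.exists(live_log):
        return None
    # Unique, sortable name: the spooling source must never see a reused name.
    stamp = time.strftime("%Y%m%d-%H%M%S", time.localtime(now))
    name = f"{os.path.basename(live_log)}.{stamp}"
    staged = os.path.join(staging_dir, name)

    # 1. Move the file out of the writer's path; its open descriptor still
    #    points at the same file, so late writes land in the staged copy.
    os.replace(live_log, staged)
    # 2. Ask the writer to reopen live_log (creating a fresh file).
    if pid is not None:
        os.kill(pid, sig)
    # 3. Crude grace period for writes still in flight on the old descriptor.
    time.sleep(settle_seconds)
    # 4. Only now expose the finished file to the spooling directory source.
    target = os.path.join(spool_dir, name)
    os.replace(staged, target)
    return target
```

The grace period is the weak point: the sketch cannot know when the writer has really switched files, and a file that is still growing after it reaches the spool directory is one way a "File has changed size since being read" error could arise.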

Regards,
Ahmed


On Mon, Oct 27, 2014 at 10:21 AM, SaravanaKumar TR <sa...@gmail.com>
wrote:

> Hi,
>
> I am using Apache Flume 1.5.0. Here is a quick explanation of the setup:
>
> Source: exec, running tail -F on a log file.
>
> Channel: file channel
>
> Sink: HDFS
>
> Use case: to move real-time data from the log file to HDFS.
>
>
> It appears that exec is not a reliable source, as we may lose data if the
> channel/source is down.
>
>
> So I tried the other option, the "spooling directory source", which is
> documented as a reliable source. But here I have a single log file that
> data gets appended to, so I don't see a way of moving the file into the
> spool directory.
>
>
> Can anyone suggest another reliable source option for the case where a
> log file keeps getting appended to and log rotation happens only at the
> end of the day?
>
>
> Thanks,
>
> Saravana
>
