Posted to user@flume.apache.org by Manohar CS <Ma...@itcinfotech.com> on 2014/12/05 13:19:01 UTC

Notification support from flume?

Hi All,

I wanted to know if there is a notification mechanism, or some other way of finding out whether Flume has finished transferring a given file from spoolDir to the HDFS sink. We know that we can look for .COMPLETED files in spoolDir and assume the transfer is complete, but is there a more reliable callback mechanism?


Thanks,
Manohar.
Please consider the environment before printing this e-mail

Disclaimer: This  communication  is  for the exclusive use of the intended recipient(s) and  shall  not attach any liability on the originator or ITC Infotech India Ltd./its  Holding company/ its Subsidiaries/ its Group Companies. If you are the addressee, the contents of this e-mail are intended for your use only and it shall  not be forwarded to any third party, without first obtaining written authorization from the originator or ITC Infotech India Ltd./ its Holding company/its  Subsidiaries/ its Group Companies. It may contain information which is confidential and legally privileged and the same shall not be used or dealt with  by any  third  party  in  any manner whatsoever without the specific consent  of  ITC  Infotech India Ltd./ its Holding company/ its Subsidiaries/ its Group Companies.

Re: Notification support from flume?

Posted by Ahmed Vila <av...@devlogic.eu>.
Unfortunately, there is no simple mechanism.
One thing complicates matters: a file to be transferred into HDFS consists
of an arbitrary number of events, and as soon as SpoolDir starts to read the
file it fills the channel with them.

This solution sounds like a hack, but it can be implemented with the least
amount of software development and it's probably the best one:
You can use the fileHeader and fileHeaderKey params of the SpoolDir source
to add the file's full path to the event header.
You'll also need to store the event header in HDFS.
Then you can do a line count for a file and compare it to the number of
events in HDFS that have the given header.
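
A minimal sketch of that comparison in Python. It assumes one event per line of the source file, and that the events written to HDFS can be read back as (headers, body) pairs; how the headers are serialized and read back is not covered in this thread, so the `events` iterable, the header key "file" and all function names here are hypothetical.

```python
from collections import Counter


def count_lines(path):
    """Count newline-delimited records in a local spooled file."""
    with open(path, "rb") as handle:
        return sum(1 for _ in handle)


def events_per_file(events, header_key="file"):
    """Tally events read back from HDFS by the source path in their header.

    `events` is an iterable of (headers, body) pairs; producing it from the
    files written by the HDFS sink is left out of this sketch.
    """
    return Counter(headers[header_key] for headers, _body in events
                   if header_key in headers)


def is_fully_transferred(local_path, header_value, events, header_key="file"):
    """True when HDFS holds at least as many events as the file has lines.

    `>=` rather than `==` is an assumption: retried transactions may
    deliver some events more than once.
    """
    return events_per_file(events, header_key)[header_value] >= count_lines(local_path)
```

The local path and the header value are passed separately because the file on disk may have been renamed (for example with the .COMPLETED suffix) after the header was recorded.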





-- 
---------------------------------------------------------------------
This e-mail and any attachment is for authorised use by the intended 
recipient(s) only. This email contains confidential information. It should 
not be copied, disclosed to, retained or used by, any party other than the 
intended recipient. Any unauthorised distribution, dissemination or copying 
of this E-mail or its attachments, and/or any use of any information 
contained in them, is strictly prohibited and may be illegal. If you are 
not an intended recipient then please promptly delete this e-mail and any 
attachment and all copies and inform the sender directly via email. Any 
emails that you send to us may be monitored by systems or persons other 
than the named communicant for the purposes of ascertaining whether the 
communication complies with the law and company policies.

RE: Notification support from flume?

Posted by Manohar CS <Ma...@itcinfotech.com>.
Thanks for the response. With the monitoring JSON output we can get basic metrics such as the number of events processed, but I have a custom requirement to signal completion of the transfer of a file.


Re: Notification support from flume?

Posted by Ahmed Vila <av...@devlogic.eu>.
Hi Manohar,

You can turn on the Flume monitoring web server, which reports the numbers as JSON.
http://archive.cloudera.com/cdh/3/flume-ng/FlumeUserGuide.html#json-reporting

It explains that the flume-ng agent should be started with the parameters
"-Dflume.monitoring.type=HTTP -Dflume.monitoring.port=34545", but there is
no info on how to configure those params in a packaged Flume, e.g. for Ubuntu.

For the Ubuntu package that you install via apt-get, you can
change /etc/flume-ng/conf/flume-env.sh and add those params to the JAVA_OPTS
variable export, something like this:
export JAVA_OPTS="-Xms256m -Xmx512m -Dflume.monitoring.type=HTTP -Dflume.monitoring.port=34545"
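
Once the agent is running with those options, the counters can be read from the /metrics path on the monitoring port. The following Python sketch shows one way to check whether a channel has been emptied. The host, the port 34545 and the channel key "CHANNEL.fileChannel" are assumptions for illustration; the key depends on the channel name in your agent configuration.

```python
import json
from urllib.request import urlopen


def fetch_metrics(host="localhost", port=34545):
    """Fetch the agent's counters from the /metrics endpoint as a dict."""
    with urlopen(f"http://{host}:{port}/metrics") as response:
        return json.loads(response.read().decode("utf-8"))


def channel_is_drained(metrics, channel_key):
    """True when every event put on the channel has also been taken off it.

    Flume reports counter values as strings, so they are converted first.
    """
    channel = metrics[channel_key]
    puts = int(channel["EventPutSuccessCount"])
    takes = int(channel["EventTakeSuccessCount"])
    return int(channel["ChannelSize"]) == 0 and puts == takes
```

Typical use would be `channel_is_drained(fetch_metrics(), "CHANNEL.fileChannel")`. Note that this only says the channel is empty at that moment; it does not identify which file the events came from.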






-- 

Best regards,
Ahmed Vila | Senior software developer
DevLogic | Sarajevo | Bosnia and Herzegovina

Office : +387 33 942 123
Mobile: +387 62 139 348

Website: www.devlogic.eu
E-mail   : avila@devlogic.eu

Re: Notification support from flume?

Posted by Arvind Prabhakar <ar...@streamsets.com>.
>
> You mentioned “You may be better off using something like an Oozie action
> to trigger a job when the dataset is complete.” But how will I know
> that Flume has completed its transfer? (Moreover, we want this to happen at
> regular intervals.)


You could set the Flume HDFS sink to roll files based on size, the number
of records written, etc. Of course this is not going to be the same as a
file copy from the original to the destination, hence my earlier
statement that Flume is not suitable out of the box for file transfers...
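
One way a downstream trigger could act on rolled files, sketched in Python. The HDFS sink marks files it is still writing with an in-use suffix (.tmp by default, configurable through hdfs.inUseSuffix) and renames them when they roll. Given a listing of the target directory, the check below waits until no in-use files remain. How the listing is obtained and the function names are assumptions for illustration, and the check cannot tell whether more events for the same source file are still waiting in the channel.

```python
def split_by_state(names, in_use_suffix=".tmp"):
    """Separate closed (rolled) files from files the sink is still writing."""
    closed = [n for n in names if not n.endswith(in_use_suffix)]
    in_use = [n for n in names if n.endswith(in_use_suffix)]
    return closed, in_use


def directory_is_settled(names, in_use_suffix=".tmp"):
    """True when the directory has data and nothing is still being written."""
    closed, in_use = split_by_state(names, in_use_suffix)
    return bool(closed) and not in_use
```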

Regards,
Arvind





On Mon, Dec 8, 2014 at 1:41 AM, Ahmed Vila <av...@devlogic.eu> wrote:

> Hi Arvind,
>
> Thanks for pointing out what is already implemented.
>
>
>
> On Mon, Dec 8, 2014 at 7:43 AM, Manohar CS <Ma...@itcinfotech.com>
> wrote:
>
>>  Hi Arvind,
>>
>> You mentioned “You may be better off using something like an Oozie
>> action to trigger a job when the dataset is complete.” But how will
>> I know that Flume has completed its transfer? (Moreover, we want this to
>> happen at regular intervals.)
>>
>>
>>
>> Thanks,
>>
>> Manohar
>>
>>
>>
>> *From:* Arvind Prabhakar [mailto:arvind@streamsets.com]
>> *Sent:* Monday, December 8, 2014 11:42 AM
>> *To:* user@flume.apache.org
>>
>> *Subject:* Re: Notification support from flume?
>>
>>
>>
>> Flume is not suited for file transfers as such. With that, please see my
>> comments below:
>>
>>
>>
>> - support for variable transaction size that could be set by the source
>> or interceptor
>>
>>
>>
>> The transactions are already variable sized. The only configuration that
>> applies on top is the maximum size of a transaction. How is this different
>> from what you are proposing?
>>
>>
>>
>>
>>
>>  - SpoolDir to support creation of one transaction per file
>>
>>
>>
>> If the file is large, you would run out of heap space quickly. Also, how
>> do you recover from intermittent failures?
>>
>>
>>
>>  - File and Memory channels to support spawning a process on successful
>> transaction commit. Such a process can be a bash script, but that would be
>> implemented in a pluggable class
>>
>>
>>
>> You may be better off using something like an Oozie action to trigger a
>> job when the dataset is complete.
>>
>>
>>
>> Regards,
>>
>> Arvind
>>
>> On Sun, Dec 7, 2014 at 12:55 PM, Ahmed Vila <av...@devlogic.eu> wrote:
>>
>>  Hi group,
>>
>>
>>
>> Manohar's requirements sound valid. I guess there are other cases where
>> such a "completion notification" could come in handy.
>>
>> Thus, I would propose these distinct features that would make this
>> possible via configuration:
>>
>>  - support for a variable transaction size that could be set by the source
>> or an interceptor
>>
>>  - SpoolDir to support creation of one transaction per file
>>
>>  - File and Memory channels to support spawning a process on successful
>> transaction commit. Such a process can be a bash script, but that would be
>> implemented in a pluggable class
>>
>> The one thing I'm not sure about, until I look at the code, is whether
>> HDFSSink will flush its cache to HDFS once it encounters no more events in
>> a transaction.
>>
>> What do you guys think?
>>
>>
>>
>>
>>
>> On Sat, Dec 6, 2014 at 7:31 AM, Manohar CS <Ma...@itcinfotech.com>
>> wrote:
>>
>>  Thanks, Hari, for your response.
>>
>> My requirement goes like this:
>>
>> 1) A bunch of files arrives in my spoolDir at regular intervals (hourly or
>> daily).
>>
>> 2) I want them to be moved into HDFS via the HDFS sink using a path pattern
>> like /target/%Y-%M%D, so that each day's files go to a different HDFS
>> destination.
>>
>> 3) Once Flume completes copying the files, I want to kick off my MR job.
>>
>>
>>
>> Thanks,
>>
>> Manohar
>>   ------------------------------
>>
>> *From:* Hari Shreedharan <hs...@cloudera.com>
>> *Sent:* Saturday, December 6, 2014 7:16 AM
>> *To:* user@flume.apache.org
>> *Cc:* user@flume.apache.org
>> *Subject:* Re: Notification support from flume?
>>
>>
>>
>> Looking at .COMPLETED is not an indication that the data has been written
>> out to HDFS. As of now, unfortunately, there is no way to tag an event as
>> coming from a specific file. I can’t think of a fool-proof way to do this
>> off the top of my head. What is your use case? There might be another way
>> to do the same thing.
>>
>>
>> Thanks,
>> Hari
>>
>>
>>

Re: Notification support from flume?

Posted by Ahmed Vila <av...@devlogic.eu>.
Hi Arvind,

Thanks for pointing out what is already implemented.



On Mon, Dec 8, 2014 at 7:43 AM, Manohar CS <Ma...@itcinfotech.com>
wrote:

>  Hi Aravind,
>
>
>
> You mentioned “You may be better off using something like an Oozie action
> to trigger a job when the dataset is complete. ”  - But how will I know
> that flume completed its transfer (moreover we want this to happen at
> regular intervals)
>
>
>
> Thanks,
>
> Manohar
>
>
>
> *From:* Arvind Prabhakar [mailto:arvind@streamsets.com]
> *Sent:* Monday, December 8, 2014 11:42 AM
> *To:* user@flume.apache.org
>
> *Subject:* Re: Notification support from flume?
>
>
>
> Flume is not suited for file transfers as such. With that, please see my
> comments below:
>
>
>
> - support for variable transaction size that could be set by the source or
> interceptor
>
>
>
> The transactions are already variable sized. The only configuration that
> applies on top is the maximum size of a transaction. How is this different
> from what you are proposing?
>
>
>
>
>
>  - SpoolDir to support creation of one transaction per file
>
>
>
> If the file is large, you would run out of heap space quickly. Also, how
> do you recover from intermittent failures?
>
>
>
>  - File and Memory channels to support spawning a process on transaction
> successful commit. Such process can be a bash script, but that would be
> implemented in plug-able class
>
>
>
> You may be better off using something like an Oozie action to trigger a
> job when the dataset is complete.
>
>
>
> Regards,
>
> Arvind
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> On Sun, Dec 7, 2014 at 12:55 PM, Ahmed Vila <av...@devlogic.eu> wrote:
>
>  Hi group,
>
>
>
> Manohar's requirements sound valid. Guess there are other cases such
> "completion notification" could come in handy.
>
>
>
> Thus, I would propose these distinct features that would make this
> possible via configuration:
>
>  - support for variable transaction size that could be set by the source
> or interceptor
>
>  - SpoolDir to support creation of one transaction per file
>
>  - File and Memory channels to support spawning a process on transaction
> successful commit. Such process can be a bash script, but that would be
> implemented in plug-able class
>
>
>
> The one thing I'm not sure about until I look at the code, if HDFSSink
> will write flush cache to the HDFS once it encounters no more events in a
> transaction.
>
>
>
> What do you guys think ?
>
>
>
>
>
> On Sat, Dec 6, 2014 at 7:31 AM, Manohar CS <Ma...@itcinfotech.com>
> wrote:
>
>  Thanks Hari for your response.
>
>
>
> My requirement goes like this -
>
>
>
> 1) There are bunch of files coming in at regular intervals (hourly or
> daily) in my spoolDir
>
> 2) I wan tthem to be moved into HDFS via HDFS sink using reg-ex like
> /target/%Y-%M%D so each day file gets into different destination HDFS
>
> 3) Now once this flume completes copying files , I want to kick off my MR
> job.
>
>
>
> Thanks,
>
> Manohar
>   ------------------------------
>
> *From:* Hari Shreedharan <hs...@cloudera.com>
> *Sent:* Saturday, December 6, 2014 7:16 AM
> *To:* user@flume.apache.org
> *Cc:* user@flume.apache.org
> *Subject:* Re: Notification support from flume?
>
>
>
> Looking at .COMPLETED is not an indication that the data has been written
> out to HDFS. As of now, unfortunately there is no way to tag an event as
> coming from a specific file. I can’t think of a way to do this in a
> fool-proof way off the top of my mind. What is your use-case, there might
> be another way to do the same thing?
>
>
> Thanks,
> Hari
>
>
>
> On Fri, Dec 5, 2014 at 4:19 AM, Manohar CS <Ma...@itcinfotech.com>
> wrote:
>
>  Hi All,
>
>
>
> I wanted to know if there is a way of notification mechanism or some way
> of finding out if flume has finished transfer of certain file from spoolDir
> to HDFS sink? We know by looking at .COMPLETED files in spoolDir we can
> assume its completed but wanted to know if there is more reliable way of
> call back mechanism ?
>
>
>
>
>
> Thanks,
>
> Manohar.
>
>
>
>
>
>
>
>
>
> --
>
> Best regards,
>
> Ahmed Vila | Senior software developer
>
> DevLogic | Sarajevo | Bosnia and Herzegovina
>
>
>
> Office : +387 33 942 123
>
> Mobile: +387 62 139 348
>
>
>
> Website: www.devlogic.eu
>
> E-mail   : avila@devlogic.eu
>
>
>
>
>
>
>


RE: Notification support from flume?

Posted by Manohar CS <Ma...@itcinfotech.com>.
Hi Arvind,

You mentioned "You may be better off using something like an Oozie action to trigger a job when the dataset is complete." But how will I know that Flume has completed its transfer? (Moreover, we want this to happen at regular intervals.)

Thanks,
Manohar
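
One possible way to approximate the check asked about above is sketched below in Python. It is only a heuristic and nobody in this thread has confirmed it: it treats a batch as done when every file in the spool directory carries the .COMPLETED suffix and no file in the HDFS target directory still carries an in-progress suffix. The function name `looks_complete`, its parameters, and the ".tmp" in-use suffix are assumptions for illustration; the directory listings are passed in as plain lists, so how they are obtained is left open. Events may still be sitting in the channel when both conditions hold, so this does not give a guarantee.

```python
from typing import Iterable


def looks_complete(spool_names: Iterable[str],
                   hdfs_names: Iterable[str],
                   completed_suffix: str = ".COMPLETED",
                   in_use_suffix: str = ".tmp") -> bool:
    """Heuristic only: True when every spooled file is marked as read
    and no HDFS file is still marked as in use."""
    spool = list(spool_names)
    hdfs = list(hdfs_names)
    # Every file in the spool directory has been fully read by the source.
    all_read = bool(spool) and all(n.endswith(completed_suffix) for n in spool)
    # At least one output file exists and none is still being written
    # (".tmp" is an assumed in-use suffix; adjust to your sink settings).
    none_open = bool(hdfs) and not any(n.endswith(in_use_suffix) for n in hdfs)
    return all_read and none_open


if __name__ == "__main__":
    print(looks_complete(["a.log.COMPLETED"], ["FlumeData.1"]))
    print(looks_complete(["a.log.COMPLETED"], ["FlumeData.1.tmp"]))
```

A scheduler (cron, or an Oozie job as suggested) could call such a check periodically and start the MR job only when it returns True.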

From: Arvind Prabhakar [mailto:arvind@streamsets.com]
Sent: Monday, December 8, 2014 11:42 AM
To: user@flume.apache.org
Subject: Re: Notification support from flume?

Flume is not suited for file transfers as such. With that, please see my comments below:

- support for variable transaction size that could be set by the source or interceptor

The transactions are already variable sized. The only configuration that applies on top is the maximum size of a transaction. How is this different from what you are proposing?


 - SpoolDir to support creation of one transaction per file

If the file is large, you would run out of heap space quickly. Also, how do you recover from intermittent failures?

 - File and Memory channels to support spawning a process on transaction successful commit. Such process can be a bash script, but that would be implemented in plug-able class

You may be better off using something like an Oozie action to trigger a job when the dataset is complete.

Regards,
Arvind







On Sun, Dec 7, 2014 at 12:55 PM, Ahmed Vila <av...@devlogic.eu> wrote:
Hi group,

Manohar's requirements sound valid. Guess there are other cases such "completion notification" could come in handy.

Thus, I would propose these distinct features that would make this possible via configuration:
 - support for variable transaction size that could be set by the source or interceptor
 - SpoolDir to support creation of one transaction per file
 - File and Memory channels to support spawning a process on transaction successful commit. Such process can be a bash script, but that would be implemented in plug-able class

The one thing I'm not sure about until I look at the code, if HDFSSink will write flush cache to the HDFS once it encounters no more events in a transaction.

What do you guys think ?


On Sat, Dec 6, 2014 at 7:31 AM, Manohar CS <Ma...@itcinfotech.com> wrote:

Thanks Hari for your response.



My requirement goes like this -



1) There are bunch of files coming in at regular intervals (hourly or daily) in my spoolDir

2) I want them to be moved into HDFS via HDFS sink using reg-ex like /target/%Y-%M%D so each day file gets into different destination HDFS

3) Now once this flume completes copying files , I want to kick off my MR job.



Thanks,

Manohar

________________________________
From: Hari Shreedharan <hs...@cloudera.com>
Sent: Saturday, December 6, 2014 7:16 AM
To: user@flume.apache.org
Cc: user@flume.apache.org
Subject: Re: Notification support from flume?

Looking at .COMPLETED is not an indication that the data has been written out to HDFS. As of now, unfortunately there is no way to tag an event as coming from a specific file. I can’t think of a way to do this in a fool-proof way off the top of my mind. What is your use-case, there might be another way to do the same thing?

Thanks,
Hari


On Fri, Dec 5, 2014 at 4:19 AM, Manohar CS <Ma...@itcinfotech.com> wrote:
Hi All,


I wanted to know if there is a way of notification mechanism or some way of finding out if flume has finished transfer of certain file from spoolDir to HDFS sink? We know by looking at .COMPLETED files in spoolDir we can assume its completed but wanted to know if there is more reliable way of call back mechanism ?




Thanks,
Manohar.






--

Best regards,
Ahmed Vila | Senior software developer
DevLogic | Sarajevo | Bosnia and Herzegovina

Office : +387 33 942 123
Mobile: +387 62 139 348

Website: www.devlogic.eu
E-mail   : avila@devlogic.eu

Re: Notification support from flume?

Posted by Arvind Prabhakar <ar...@streamsets.com>.
Flume is not suited for file transfers as such. With that said, please see my
comments below:

> - support for variable transaction size that could be set by the source or
> interceptor
>

The transactions are already variable sized. The only configuration that
applies on top is the maximum size of a transaction. How is this different
from what you are proposing?



>  - SpoolDir to support creation of one transaction per file
>

If the file is large, you would run out of heap space quickly. Also, how do
you recover from intermittent failures?
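
To make the heap-space concern concrete, here is a minimal Python sketch. It is not Flume code, and `batch_events` and `max_batch` are hypothetical names. It groups a file's lines into batches capped at a maximum size, so only one batch is held in memory at a time. A single transaction per file would instead have to buffer every event of the file before committing.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def batch_events(lines: Iterable[str], max_batch: int) -> Iterator[List[str]]:
    """Yield lists of at most max_batch events; the last one may be smaller."""
    if max_batch < 1:
        raise ValueError("max_batch must be at least 1")
    it = iter(lines)
    while True:
        # Pull up to max_batch items without reading the rest of the input.
        batch = list(islice(it, max_batch))
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    events = [f"event-{i}" for i in range(7)]
    for n, batch in enumerate(batch_events(events, max_batch=3)):
        # Each batch stands in for one transaction that is committed
        # (or rolled back and retried) independently of the others.
        print(n, batch)
```

With small, independent batches, a failure only forces a retry of the batch in flight rather than of the whole file.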


>  - File and Memory channels to support spawning a process on transaction
> successful commit. Such process can be a bash script, but that would be
> implemented in plug-able class


You may be better off using something like an Oozie action to trigger a job
when the dataset is complete.

Regards,
Arvind







On Sun, Dec 7, 2014 at 12:55 PM, Ahmed Vila <av...@devlogic.eu> wrote:

> Hi group,
>
> Manohar's requirements sound valid. Guess there are other cases such
> "completion notification" could come in handy.
>
> Thus, I would propose these distinct features that would make this
> possible via configuration:
>  - support for variable transaction size that could be set by the source
> or interceptor
>  - SpoolDir to support creation of one transaction per file
>  - File and Memory channels to support spawning a process on transaction
> successful commit. Such process can be a bash script, but that would be
> implemented in plug-able class
>
> The one thing I'm not sure about until I look at the code, if HDFSSink
> will write flush cache to the HDFS once it encounters no more events in a
> transaction.
>
> What do you guys think ?
>
>
> On Sat, Dec 6, 2014 at 7:31 AM, Manohar CS <Ma...@itcinfotech.com>
> wrote:
>
>>  Thanks Hari for your response.
>>
>>
>>  My requirement goes like this -
>>
>>
>>  1) There are bunch of files coming in at regular intervals (hourly or
>> daily) in my spoolDir
>>
>> 2) I want them to be moved into HDFS via HDFS sink using reg-ex like
>> /target/%Y-%M%D so each day file gets into different destination HDFS
>>
>> 3) Now once this flume completes copying files , I want to kick off my MR
>> job.
>>
>>
>>  Thanks,
>>
>> Manohar
>>  ------------------------------
>> *From:* Hari Shreedharan <hs...@cloudera.com>
>> *Sent:* Saturday, December 6, 2014 7:16 AM
>> *To:* user@flume.apache.org
>> *Cc:* user@flume.apache.org
>> *Subject:* Re: Notification support from flume?
>>
>>  Looking at .COMPLETED is not an indication that the data has been
>> written out to HDFS. As of now, unfortunately there is no way to tag an
>> event as coming from a specific file. I can’t think of a way to do this in
>> a fool-proof way off the top of my mind. What is your use-case, there might
>> be another way to do the same thing?
>>
>> Thanks,
>> Hari
>>
>>
>>  On Fri, Dec 5, 2014 at 4:19 AM, Manohar CS <Ma...@itcinfotech.com>
>> wrote:
>>
>>>  Hi All,
>>>
>>>
>>>
>>> I wanted to know if there is a way of notification mechanism or some way
>>> of finding out if flume has finished transfer of certain file from spoolDir
>>> to HDFS sink? We know by looking at .COMPLETED files in spoolDir we can
>>> assume its completed but wanted to know if there is more reliable way of
>>> call back mechanism ?
>>>
>>>
>>>
>>>
>>>
>>> Thanks,
>>>
>>> Manohar.
>>>
>>>
>>>
>>>
>>
>
>
>
> --
>
> Best regards,
> Ahmed Vila | Senior software developer
> DevLogic | Sarajevo | Bosnia and Herzegovina
>
> Office : +387 33 942 123
> Mobile: +387 62 139 348
>
> Website: www.devlogic.eu
> E-mail   : avila@devlogic.eu
>

Re: Notification support from flume?

Posted by Ahmed Vila <av...@devlogic.eu>.
Hi group,

Manohar's requirements sound valid. I guess there are other cases where such a
"completion notification" could come in handy.

Thus, I would propose these distinct features that would make this possible
via configuration:
 - support for a variable transaction size that could be set by the source or
interceptor
 - SpoolDir to support creation of one transaction per file
 - File and Memory channels to support spawning a process on successful
transaction commit. Such a process could be a bash script, but that would be
implemented in a pluggable class

The one thing I'm not sure about, until I look at the code, is whether HDFSSink
will flush its cache to HDFS once it encounters no more events in a
transaction.

What do you guys think?


On Sat, Dec 6, 2014 at 7:31 AM, Manohar CS <Ma...@itcinfotech.com>
wrote:

>  Thanks Hari for your response.
>
>
>  My requirement goes like this -
>
>
>  1) There are bunch of files coming in at regular intervals (hourly or
> daily) in my spoolDir
>
> 2) I want them to be moved into HDFS via HDFS sink using reg-ex like
> /target/%Y-%M%D so each day file gets into different destination HDFS
>
> 3) Now once this flume completes copying files , I want to kick off my MR
> job.
>
>
>  Thanks,
>
> Manohar
>  ------------------------------
> *From:* Hari Shreedharan <hs...@cloudera.com>
> *Sent:* Saturday, December 6, 2014 7:16 AM
> *To:* user@flume.apache.org
> *Cc:* user@flume.apache.org
> *Subject:* Re: Notification support from flume?
>
>  Looking at .COMPLETED is not an indication that the data has been
> written out to HDFS. As of now, unfortunately there is no way to tag an
> event as coming from a specific file. I can’t think of a way to do this in
> a fool-proof way off the top of my mind. What is your use-case, there might
> be another way to do the same thing?
>
> Thanks,
> Hari
>
>
>  On Fri, Dec 5, 2014 at 4:19 AM, Manohar CS <Ma...@itcinfotech.com>
> wrote:
>
>>  Hi All,
>>
>>
>>
>> I wanted to know if there is a way of notification mechanism or some way
>> of finding out if flume has finished transfer of certain file from spoolDir
>> to HDFS sink? We know by looking at .COMPLETED files in spoolDir we can
>> assume its completed but wanted to know if there is more reliable way of
>> call back mechanism ?
>>
>>
>>
>>
>>
>> Thanks,
>>
>> Manohar.
>>
>>
>>
>>
>



-- 

Best regards,
Ahmed Vila | Senior software developer
DevLogic | Sarajevo | Bosnia and Herzegovina

Office : +387 33 942 123
Mobile: +387 62 139 348

Website: www.devlogic.eu
E-mail   : avila@devlogic.eu
---------------------------------------------------------------------
This e-mail and any attachment is for authorised use by the intended
recipient(s) only. This email contains confidential information. It should
not be copied, disclosed to, retained or used by, any party other than the
intended recipient. Any unauthorised distribution, dissemination or copying
of this E-mail or its attachments, and/or any use of any information
contained in them, is strictly prohibited and may be illegal. If you are
not an intended recipient then please promptly delete this e-mail and any
attachment and all copies and inform the sender directly via email. Any
emails that you send to us may be monitored by systems or persons other
than the named communicant for the purposes of ascertaining whether the
communication complies with the law and company policies.


RE: Notification support from flume?

Posted by Manohar CS <Ma...@itcinfotech.com>.
Thanks Hari for your response.


My requirement goes like this -


1) There is a bunch of files coming in at regular intervals (hourly or daily) in my spoolDir

2) I want them to be moved into HDFS via the HDFS sink using a path pattern like /target/%Y-%M%D so each day's files get into a different HDFS destination

3) Once Flume completes copying the files, I want to kick off my MR job.


Thanks,

Manohar
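
A note on the path pattern in point 2, as a small Python sketch. It assumes the HDFS sink's escape sequences follow strftime-style meanings (worth checking against the Flume User Guide for your version): under that assumption %M is the minute, not the month, so a per-day directory would more likely be written as /target/%Y-%m-%d. The helper name `daily_target` is hypothetical, and Python's own strftime is used only to illustrate the letters.

```python
from datetime import datetime


def daily_target(ts: datetime, pattern: str = "/target/%Y-%m-%d") -> str:
    """Expand a strftime-style pattern into a per-day directory path."""
    return ts.strftime(pattern)


if __name__ == "__main__":
    ts = datetime(2014, 12, 6, 7, 31)
    print(daily_target(ts))   # one directory per calendar day
    print(ts.strftime("%M"))  # %M expands to the minute of the hour
```

With a per-day pattern, all files that arrive on the same date land in the same directory, which is the unit the MR job would then consume.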

________________________________
From: Hari Shreedharan <hs...@cloudera.com>
Sent: Saturday, December 6, 2014 7:16 AM
To: user@flume.apache.org
Cc: user@flume.apache.org
Subject: Re: Notification support from flume?

Looking at .COMPLETED is not an indication that the data has been written out to HDFS. As of now, unfortunately there is no way to tag an event as coming from a specific file. I can't think of a way to do this in a fool-proof way off the top of my mind. What is your use-case, there might be another way to do the same thing?

Thanks,
Hari



On Fri, Dec 5, 2014 at 4:19 AM, Manohar CS <Ma...@itcinfotech.com> wrote:
Hi All,


I wanted to know if there is a way of notification mechanism or some way of finding out if flume has finished transfer of certain file from spoolDir to HDFS sink? We know by looking at .COMPLETED files in spoolDir we can assume its completed but wanted to know if there is more reliable way of call back mechanism ?




Thanks,
Manohar.




Re: Notification support from flume?

Posted by Hari Shreedharan <hs...@cloudera.com>.
Looking at .COMPLETED is not an indication that the data has been written out to HDFS. As of now, unfortunately, there is no way to tag an event as coming from a specific file. I can't think of a fool-proof way to do this off the top of my head. What is your use case? There might be another way to do the same thing.


Thanks,
Hari

On Fri, Dec 5, 2014 at 4:19 AM, Manohar CS <Ma...@itcinfotech.com>
wrote:

> Hi All,
> I wanted to know if there is a way of notification mechanism or some way of finding out if flume has finished transfer of certain file from spoolDir to HDFS sink? We know by looking at .COMPLETED files in spoolDir we can assume its completed but wanted to know if there is more reliable way of call back mechanism ?
> Thanks,
> Manohar.