Posted to dev@flume.apache.org by Justin Workman <ju...@gmail.com> on 2017/01/12 19:20:14 UTC

hdfs.idleTime

Sorry for cross posting to user and dev. I have recently set up a Flume
configuration where we use the regex_extractor interceptor to parse the
actual event date from each record flowing through the Flume source, then
use that date to build the HDFS sink bucket path. However, it appears that
the hdfs.idleTimeout value is not honored in this configuration. It does
work when using the timestamp interceptor to build the output path.

I have set the hdfs.idleTimeout value for the HDFS sink, but the files are
never closed or renamed until I restart or shut down Flume. Our Flume is
configured to roll based on size or output path, and the files
rename/close/roll fine based on size; however, the last file in each output
path is always left with the .tmp extension until we restart Flume. I would
expect the file to be renamed and closed if no records are written to it
after the idleTimeout is reached.

Could I be missing something, or is this a known bug with the
regex_extractor interceptor?

Thanks
Justin

Re: hdfs.idleTime

Posted by Justin Workman <ju...@gmail.com>.
This was posted and resolved on the user thread; a typo in my configuration was the issue.

Thanks
Justin

Sent from my iPhone

> On Jan 17, 2017, at 12:42 AM, Tristan Stevens <tr...@cloudera.com> wrote:
> 
> Hi Justin,
> Please can you post your agent config and also any HDFS logs? Ideally you should be seeing INFO logs as follows: “Closing Idle Bucketwriter”.
> 
> Tristan
> 
> Tristan Stevens
> Senior Solutions Architect
> Cloudera, Inc. | www.cloudera.com
> m +44(0)7808 986422 | tristan@cloudera.com
> 
> 
>> On 12 January 2017 at 19:23:18, Justin Workman (justinjworkman@gmail.com) wrote:
>> 
>> More details 
>> 
>> Flume 1.6 - Core Apache version. 
>> KafkaSource (0.8.2) -> File Channel -> HDFS Sink (CDH5.5.2). 
>> 

Re: hdfs.idleTime

Posted by Tristan Stevens <tr...@cloudera.com>.
Hi Justin,
Please can you post your agent config and also any HDFS logs? Ideally you should be seeing INFO logs as follows: “Closing Idle Bucketwriter”.

Tristan

Tristan Stevens
Senior Solutions Architect
Cloudera, Inc. | www.cloudera.com
m +44(0)7808 986422 | tristan@cloudera.com


On 12 January 2017 at 19:23:18, Justin Workman (justinjworkman@gmail.com) wrote:

More details  

Flume 1.6 - Core Apache version.  
KafkaSource (0.8.2) -> File Channel -> HDFS Sink (CDH5.5.2).  


Re: hdfs.idleTime

Posted by Justin Workman <ju...@gmail.com>.
More details

Flume 1.6 - Core Apache version.
KafkaSource (0.8.2) -> File Channel -> HDFS Sink (CDH5.5.2).


Re: hdfs.idleTime

Posted by Justin Workman <ju...@gmail.com>.
Sorry for wasting anyones time. In reviewing my configuration, I have a
typo in the hdfs.idleTimeout configuration.
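
For the archive, the likely culprit, inferred from the agent config quoted below rather than spelled out by the poster: the sink is declared as fpssHdfsSink, while the idleTimeout lines use the name hdfsSink, so Flume silently ignores them. The fix would be to use the matching sink name:

```properties
# Sink properties must use the sink name declared in agent1.sinks
# (fpssHdfsSink here); properties set on an undeclared name such as
# "hdfsSink" are silently ignored by Flume.
agent1.sinks.fpssHdfsSink.hdfs.idleTimeout = 300
agent1.sinks.fpssHdfsSink.hdfs.useLocalTimeStamp = true
```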

On Fri, Jan 13, 2017 at 2:14 PM, Justin Workman <ju...@gmail.com>
wrote:

> I'll try to debug again. The output/regex seems to be fine, but I never see
> a call to close/rename the last files in each directory until Flume shuts
> down or restarts.
>
> I would expect to see this call when the idleTimeout value is reached.
>
> Sent from my iPhone
>
> On Jan 13, 2017, at 2:05 PM, iain wright <ia...@gmail.com> wrote:
>
> Might be worth trying the debug output (I forget exact sink name) to just
> log the headers being attached to events after the interceptor to validate
> the regex is working correctly, and for all events.
>
> I set up this exact config at a previous company, so I know it works.
>
> I also remember needing to escape the regex in an odd way due to how java
> was loading/parsing the config
>
> Best,
> Iain
>
> Sent from my iPhone
>
> On Jan 13, 2017, at 12:00 PM, Justin Workman <ju...@gmail.com>
> wrote:
>
> Absolutely, see below. Just to reiterate: when using the timestamp
> interceptor values to build the output path from the timestamp in the Flume
> header, things roll correctly. The files also roll just fine based on file
> size. However, when using the regex_extractor to get the actual event
> timestamp for use in the output path, the last file in each directory is
> never renamed/closed until Flume is restarted.
>
>
> *flume-conf.properties*
> agent1.sources  = fpssKafkaTopic
> agent1.channels = fpssHdfsFileChannel
> agent1.sinks = fpssHdfsSink
>
> agent1.sources.fpssKafkaTopic.type = org.apache.flume.source.kafka.KafkaSource
> agent1.sources.fpssKafkaTopic.zookeeperConnect = zk-host:2181
> agent1.sources.fpssKafkaTopic.topic = first-pass-stream-sessionized
> agent1.sources.fpssKafkaTopic.groupId = flume-first-pass-stream-sessionized
> agent1.sources.fpssKafkaTopic.kafka.auto.offset.reset = smallest
> agent1.sources.fpssKafkaTopic.channels = fpssHdfsFileChannel
> agent1.sources.fpssKafkaTopic.interceptors = i1 i2 i3
> agent1.sources.fpssKafkaTopic.interceptors.i1.type = timestamp
> agent1.sources.fpssKafkaTopic.interceptors.i1.preserveExisting = false
> agent1.sources.fpssKafkaTopic.interceptors.i2.type = org.apache.flume.interceptor.HostInterceptor$Builder
> agent1.sources.fpssKafkaTopic.interceptors.i2.hostHeader = hostname
> agent1.sources.fpssKafkaTopic.interceptors.i2.useIP = false
> agent1.sources.fpssKafkaTopic.interceptors.i2.preserveExisting = true
> agent1.sources.fpssKafkaTopic.interceptors.i3.type = regex_extractor
> agent1.sources.fpssKafkaTopic.interceptors.i3.regex = ^.*\\"entryId\\":\\{\\"date\\":\\"(\\d\\d\\d\\d)-(\\d\\d)-(\\d\\d)T(\\d\\d):.*\\"\\}.*$
> agent1.sources.fpssKafkaTopic.interceptors.i3.serializers = s1 s2 s3 s4
> agent1.sources.fpssKafkaTopic.interceptors.i3.serializers.s1.name = year
> agent1.sources.fpssKafkaTopic.interceptors.i3.serializers.s2.name = month
> agent1.sources.fpssKafkaTopic.interceptors.i3.serializers.s3.name = day
> agent1.sources.fpssKafkaTopic.interceptors.i3.serializers.s4.name = hour
> agent1.sources.fpssKafkaTopic.kafka.consumer.timeout.ms = 100
>
> agent1.channels.fpssHdfsFileChannel.type = file
> agent1.channels.fpssHdfsFileChannel.checkpointDir = /opt/flume/file-channel/fpss/checkpoint
> agent1.channels.fpssHdfsFileChannel.dataDirs = /opt/flume/file-channel/fpss/data
>
> agent1.sinks.fpssHdfsSink.type = hdfs
> agent1.sinks.fpssHdfsSink.hdfs.filePrefix = %{hostname}-log
> agent1.sinks.fpssHdfsSink.hdfs.inUseSuffix = .tmp
> agent1.sinks.fpssHdfsSink.hdfs.path = hdfs://prodcluster/flumedata/processed/first-pass-stream/%{year}/%{month}/%{day}/%{hour}-00
> agent1.sinks.fpssHdfsSink.hdfs.kerberosPrincipal = runtime@EXAMPLE.COM
> agent1.sinks.fpssHdfsSink.hdfs.kerberosKeytab = <keytab path removed for privacy>
> agent1.sinks.fpssHdfsSink.hdfs.rollInterval = 0
> agent1.sinks.fpssHdfsSink.hdfs.rollCount = 0
> ## Account for compression. See flume-2128
> ## My calculation: 512 * 1024 * 1024 * 2.75
> agent1.sinks.fpssHdfsSink.hdfs.rollSize = 1476395008
> # Close file if idle more than 300 seconds
> agent1.sinks.hdfsSink.hdfs.idleTimeout = 300
> agent1.sinks.hdfsSink.hdfs.useLocalTimeStamp = true
> agent1.sinks.fpssHdfsSink.hdfs.fileType = CompressedStream
> agent1.sinks.fpssHdfsSink.hdfs.codeC = snappy
> agent1.sinks.fpssHdfsSink.hdfs.writeFormat = Text
> agent1.sinks.fpssHdfsSink.channel = fpssHdfsFileChannel
> agent1.sinks.fpssHdfsSink.hdfs.batchSize = 10000
> agent1.sinks.fpssHdfsSink.hdfs.threadsPoolSize = 20
> agent1.sinks.fpssHdfsSink.hdfs.callTimeout = 20000
>
> *HDFS Output Since Midnight (Notice the last file is never closed/renamed)*
>  hdfs dfs -ls /flumedata/processed/first-pass-stream/2017/01/13/*/
> 17/01/13 12:38:52 WARN util.NativeCodeLoader: Unable to load native-hadoop
> library for your platform... using builtin-java classes where applicable
> Found 7 items
> -rw-r--r--   3 b2c_runtime hadoop  513710580 2017-01-13 00:09
> /flumedata/processed/first-pass-stream/2017/01/13/00-00/flumeload100-log.
> 1484290815397.snappy
> -rw-r--r--   3 b2c_runtime hadoop  514439844 2017-01-13 00:18
> /flumedata/processed/first-pass-stream/2017/01/13/00-00/flumeload100-log.
> 1484290815398.snappy
> -rw-r--r--   3 b2c_runtime hadoop  515125962 2017-01-13 00:28
> /flumedata/processed/first-pass-stream/2017/01/13/00-00/flumeload100-log.
> 1484290815399.snappy
> -rw-r--r--   3 b2c_runtime hadoop  513010837 2017-01-13 00:38
> /flumedata/processed/first-pass-stream/2017/01/13/00-00/flumeload100-log.
> 1484290815400.snappy
> -rw-r--r--   3 b2c_runtime hadoop  511315467 2017-01-13 00:49
> /flumedata/processed/first-pass-stream/2017/01/13/00-00/flumeload100-log.
> 1484290815401.snappy
> -rw-r--r--   3 b2c_runtime hadoop  508420966 2017-01-13 00:59
> /flumedata/processed/first-pass-stream/2017/01/13/00-00/flumeload100-log.
> 1484290815402.snappy
> -rw-r--r--   3 b2c_runtime hadoop    2503353 2017-01-13 00:59
> /flumedata/processed/first-pass-stream/2017/01/13/00-00/flumeload100-log.
> 1484290815403.snappy.tmp
> Found 6 items
> -rw-r--r--   3 b2c_runtime hadoop  509116221 2017-01-13 01:10
> /flumedata/processed/first-pass-stream/2017/01/13/01-00/flumeload100-log.
> 1484294415705.snappy
> -rw-r--r--   3 b2c_runtime hadoop  507800675 2017-01-13 01:21
> /flumedata/processed/first-pass-stream/2017/01/13/01-00/flumeload100-log.
> 1484294415706.snappy
> -rw-r--r--   3 b2c_runtime hadoop  504432110 2017-01-13 01:32
> /flumedata/processed/first-pass-stream/2017/01/13/01-00/flumeload100-log.
> 1484294415707.snappy
> -rw-r--r--   3 b2c_runtime hadoop  501932914 2017-01-13 01:42
> /flumedata/processed/first-pass-stream/2017/01/13/01-00/flumeload100-log.
> 1484294415708.snappy
> -rw-r--r--   3 b2c_runtime hadoop  498136257 2017-01-13 01:50
> /flumedata/processed/first-pass-stream/2017/01/13/01-00/flumeload100-log.
> 1484294415709.snappy
> -rw-r--r--   3 b2c_runtime hadoop      60539 2017-01-13 01:50
> /flumedata/processed/first-pass-stream/2017/01/13/01-00/flumeload100-log.
> 1484294415710.snappy.tmp
> Found 6 items
> -rw-r--r--   3 b2c_runtime hadoop  500879399 2017-01-13 02:11
> /flumedata/processed/first-pass-stream/2017/01/13/02-00/flumeload100-log.
> 1484298016017.snappy
> -rw-r--r--   3 b2c_runtime hadoop  501827071 2017-01-13 02:21
> /flumedata/processed/first-pass-stream/2017/01/13/02-00/flumeload100-log.
> 1484298016018.snappy
> -rw-r--r--   3 b2c_runtime hadoop  501489101 2017-01-13 02:32
> /flumedata/processed/first-pass-stream/2017/01/13/02-00/flumeload100-log.
> 1484298016019.snappy
> -rw-r--r--   3 b2c_runtime hadoop  501527838 2017-01-13 02:43
> /flumedata/processed/first-pass-stream/2017/01/13/02-00/flumeload100-log.
> 1484298016020.snappy
> -rw-r--r--   3 b2c_runtime hadoop  499393977 2017-01-13 02:54
> /flumedata/processed/first-pass-stream/2017/01/13/02-00/flumeload100-log.
> 1484298016021.snappy
> -rw-r--r--   3 b2c_runtime hadoop    1282327 2017-01-13 02:54
> /flumedata/processed/first-pass-stream/2017/01/13/02-00/flumeload100-log.
> 1484298016022.snappy.tmp
> Found 6 items
> -rw-r--r--   3 b2c_runtime hadoop  501033294 2017-01-13 03:10
> /flumedata/processed/first-pass-stream/2017/01/13/03-00/flumeload100-log.
> 1484301615579.snappy
> -rw-r--r--   3 b2c_runtime hadoop  500933906 2017-01-13 03:20
> /flumedata/processed/first-pass-stream/2017/01/13/03-00/flumeload100-log.
> 1484301615580.snappy
> -rw-r--r--   3 b2c_runtime hadoop  505869233 2017-01-13 03:31
> /flumedata/processed/first-pass-stream/2017/01/13/03-00/flumeload100-log.
> 1484301615581.snappy
> -rw-r--r--   3 b2c_runtime hadoop  502910608 2017-01-13 03:41
> /flumedata/processed/first-pass-stream/2017/01/13/03-00/flumeload100-log.
> 1484301615582.snappy
> -rw-r--r--   3 b2c_runtime hadoop  499561080 2017-01-13 03:52
> /flumedata/processed/first-pass-stream/2017/01/13/03-00/flumeload100-log.
> 1484301615583.snappy
> -rw-r--r--   3 b2c_runtime hadoop    3616826 2017-01-13 03:52
> /flumedata/processed/first-pass-stream/2017/01/13/03-00/flumeload100-log.
> 1484301615584.snappy.tmp
> Found 6 items
> -rw-r--r--   3 b2c_runtime hadoop  502243204 2017-01-13 04:11
> /flumedata/processed/first-pass-stream/2017/01/13/04-00/flumeload100-log.
> 1484305215893.snappy
> -rw-r--r--   3 b2c_runtime hadoop  508966498 2017-01-13 04:22
> /flumedata/processed/first-pass-stream/2017/01/13/04-00/flumeload100-log.
> 1484305215894.snappy
> -rw-r--r--   3 b2c_runtime hadoop  510972236 2017-01-13 04:34
> /flumedata/processed/first-pass-stream/2017/01/13/04-00/flumeload100-log.
> 1484305215895.snappy
> -rw-r--r--   3 b2c_runtime hadoop  513225577 2017-01-13 04:46
> /flumedata/processed/first-pass-stream/2017/01/13/04-00/flumeload100-log.
> 1484305215896.snappy
> -rw-r--r--   3 b2c_runtime hadoop  512743679 2017-01-13 04:57
> /flumedata/processed/first-pass-stream/2017/01/13/04-00/flumeload100-log.
> 1484305215897.snappy
> -rw-r--r--   3 b2c_runtime hadoop    3888775 2017-01-13 04:57
> /flumedata/processed/first-pass-stream/2017/01/13/04-00/flumeload100-log.
> 1484305215898.snappy.tmp
> Found 7 items
> -rw-r--r--   3 b2c_runtime hadoop  515832251 2017-01-13 05:11
> /flumedata/processed/first-pass-stream/2017/01/13/05-00/flumeload100-log.
> 1484308811983.snappy
> -rw-r--r--   3 b2c_runtime hadoop  518077964 2017-01-13 05:20
> /flumedata/processed/first-pass-stream/2017/01/13/05-00/flumeload100-log.
> 1484308811984.snappy
> -rw-r--r--   3 b2c_runtime hadoop  519490676 2017-01-13 05:29
> /flumedata/processed/first-pass-stream/2017/01/13/05-00/flumeload100-log.
> 1484308811985.snappy
> -rw-r--r--   3 b2c_runtime hadoop  519105563 2017-01-13 05:37
> /flumedata/processed/first-pass-stream/2017/01/13/05-00/flumeload100-log.
> 1484308811986.snappy
> -rw-r--r--   3 b2c_runtime hadoop  518672209 2017-01-13 05:46
> /flumedata/processed/first-pass-stream/2017/01/13/05-00/flumeload100-log.
> 1484308811987.snappy
> -rw-r--r--   3 b2c_runtime hadoop  520019853 2017-01-13 05:53
> /flumedata/processed/first-pass-stream/2017/01/13/05-00/flumeload100-log.
> 1484308811988.snappy
> -rw-r--r--   3 b2c_runtime hadoop    1574211 2017-01-13 05:53
> /flumedata/processed/first-pass-stream/2017/01/13/05-00/flumeload100-log.
> 1484308811989.snappy.tmp
> Found 9 items
> -rw-r--r--   3 b2c_runtime hadoop  521428204 2017-01-13 06:07
> /flumedata/processed/first-pass-stream/2017/01/13/06-00/flumeload100-log.
> 1484312413743.snappy
> -rw-r--r--   3 b2c_runtime hadoop  519885769 2017-01-13 06:15
> /flumedata/processed/first-pass-stream/2017/01/13/06-00/flumeload100-log.
> 1484312413744.snappy
> -rw-r--r--   3 b2c_runtime hadoop  519050891 2017-01-13 06:21
> /flumedata/processed/first-pass-stream/2017/01/13/06-00/flumeload100-log.
> 1484312413745.snappy
> -rw-r--r--   3 b2c_runtime hadoop  520691322 2017-01-13 06:29
> /flumedata/processed/first-pass-stream/2017/01/13/06-00/flumeload100-log.
> 1484312413746.snappy
> -rw-r--r--   3 b2c_runtime hadoop  520902319 2017-01-13 06:36
> /flumedata/processed/first-pass-stream/2017/01/13/06-00/flumeload100-log.
> 1484312413747.snappy
> -rw-r--r--   3 b2c_runtime hadoop  520831873 2017-01-13 06:42
> /flumedata/processed/first-pass-stream/2017/01/13/06-00/flumeload100-log.
> 1484312413748.snappy
> -rw-r--r--   3 b2c_runtime hadoop  519785647 2017-01-13 06:49
> /flumedata/processed/first-pass-stream/2017/01/13/06-00/flumeload100-log.
> 1484312413749.snappy
> -rw-r--r--   3 b2c_runtime hadoop  520590143 2017-01-13 06:55
> /flumedata/processed/first-pass-stream/2017/01/13/06-00/flumeload100-log.
> 1484312413750.snappy
> -rw-r--r--   3 b2c_runtime hadoop    4621367 2017-01-13 06:55
> /flumedata/processed/first-pass-stream/2017/01/13/06-00/flumeload100-log.
> 1484312413751.snappy.tmp
> Found 11 items
> -rw-r--r--   3 b2c_runtime hadoop  522623760 2017-01-13 07:06
> /flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.
> 1484316015214.snappy
> -rw-r--r--   3 b2c_runtime hadoop  523065112 2017-01-13 07:12
> /flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.
> 1484316015215.snappy
> -rw-r--r--   3 b2c_runtime hadoop  523445533 2017-01-13 07:18
> /flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.
> 1484316015216.snappy
> -rw-r--r--   3 b2c_runtime hadoop  523084945 2017-01-13 07:24
> /flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.
> 1484316015217.snappy
> -rw-r--r--   3 b2c_runtime hadoop  524283976 2017-01-13 07:30
> /flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.
> 1484316015218.snappy
> -rw-r--r--   3 b2c_runtime hadoop  523923379 2017-01-13 07:36
> /flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.
> 1484316015219.snappy
> -rw-r--r--   3 b2c_runtime hadoop  523910723 2017-01-13 07:42
> /flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.
> 1484316015220.snappy
> -rw-r--r--   3 b2c_runtime hadoop  524266095 2017-01-13 07:47
> /flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.
> 1484316015221.snappy
> -rw-r--r--   3 b2c_runtime hadoop  523002505 2017-01-13 07:53
> /flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.
> 1484316015222.snappy
> -rw-r--r--   3 b2c_runtime hadoop  520706211 2017-01-13 07:58
> /flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.
> 1484316015223.snappy
> -rw-r--r--   3 b2c_runtime hadoop    8051588 2017-01-13 07:58
> /flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.
> 1484316015224.snappy.tmp
> Found 11 items
> -rw-r--r--   3 b2c_runtime hadoop  520528155 2017-01-13 08:05
> /flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.
> 1484319618433.snappy
> -rw-r--r--   3 b2c_runtime hadoop  521761390 2017-01-13 08:11
> /flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.
> 1484319618434.snappy
> -rw-r--r--   3 b2c_runtime hadoop  522548272 2017-01-13 08:16
> /flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.
> 1484319618435.snappy
> -rw-r--r--   3 b2c_runtime hadoop  522616117 2017-01-13 08:22
> /flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.
> 1484319618436.snappy
> -rw-r--r--   3 b2c_runtime hadoop  525953759 2017-01-13 08:28
> /flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.
> 1484319618437.snappy
> -rw-r--r--   3 b2c_runtime hadoop  524475009 2017-01-13 08:34
> /flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.
> 1484319618438.snappy
> -rw-r--r--   3 b2c_runtime hadoop  523995339 2017-01-13 08:40
> /flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.
> 1484319618439.snappy
> -rw-r--r--   3 b2c_runtime hadoop  524188832 2017-01-13 08:47
> /flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.
> 1484319618440.snappy
> -rw-r--r--   3 b2c_runtime hadoop  525303001 2017-01-13 08:53
> /flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.
> 1484319618441.snappy
> -rw-r--r--   3 b2c_runtime hadoop  525606532 2017-01-13 08:59
> /flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.
> 1484319618442.snappy
> -rw-r--r--   3 b2c_runtime hadoop    4486982 2017-01-13 08:59
> /flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.
> 1484319618443.snappy.tmp
> Found 11 items
> -rw-r--r--   3 b2c_runtime hadoop  525207364 2017-01-13 09:06
> /flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.
> 1484323216987.snappy
> -rw-r--r--   3 b2c_runtime hadoop  526105891 2017-01-13 09:12
> /flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.
> 1484323216988.snappy
> -rw-r--r--   3 b2c_runtime hadoop  526426735 2017-01-13 09:18
> /flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.
> 1484323216989.snappy
> -rw-r--r--   3 b2c_runtime hadoop  525298099 2017-01-13 09:24
> /flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.
> 1484323216990.snappy
> -rw-r--r--   3 b2c_runtime hadoop  525282945 2017-01-13 09:30
> /flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.
> 1484323216991.snappy
> -rw-r--r--   3 b2c_runtime hadoop  523921005 2017-01-13 09:36
> /flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.
> 1484323216992.snappy
> -rw-r--r--   3 b2c_runtime hadoop  524827705 2017-01-13 09:42
> /flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.
> 1484323216993.snappy
> -rw-r--r--   3 b2c_runtime hadoop  524203463 2017-01-13 09:47
> /flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.
> 1484323216994.snappy
> -rw-r--r--   3 b2c_runtime hadoop  524678485 2017-01-13 09:53
> /flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.
> 1484323216995.snappy
> -rw-r--r--   3 b2c_runtime hadoop  524598220 2017-01-13 09:59
> /flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.
> 1484323216996.snappy
> -rw-r--r--   3 b2c_runtime hadoop    3877959 2017-01-13 09:59
> /flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.
> 1484323216997.snappy.tmp
> Found 10 items
> -rw-r--r--   3 b2c_runtime hadoop  523000460 2017-01-13 10:06
> /flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.
> 1484326813831.snappy
> -rw-r--r--   3 b2c_runtime hadoop  523455154 2017-01-13 10:12
> /flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.
> 1484326813832.snappy
> -rw-r--r--   3 b2c_runtime hadoop  525465618 2017-01-13 10:18
> /flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.
> 1484326813833.snappy
> -rw-r--r--   3 b2c_runtime hadoop  524630955 2017-01-13 10:24
> /flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.
> 1484326813834.snappy
> -rw-r--r--   3 b2c_runtime hadoop  527780298 2017-01-13 10:30
> /flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.
> 1484326813835.snappy
> -rw-r--r--   3 b2c_runtime hadoop  526565562 2017-01-13 10:37
> /flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.
> 1484326813836.snappy
> -rw-r--r--   3 b2c_runtime hadoop  524936336 2017-01-13 10:43
> /flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.
> 1484326813837.snappy
> -rw-r--r--   3 b2c_runtime hadoop  524565610 2017-01-13 10:49
> /flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.
> 1484326813838.snappy
> -rw-r--r--   3 b2c_runtime hadoop  524276950 2017-01-13 10:55
> /flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.
> 1484326813839.snappy
> -rw-r--r--   3 b2c_runtime hadoop     654810 2017-01-13 10:55
> /flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.
> 1484326813840.snappy.tmp
> Found 11 items
> -rw-r--r--   3 b2c_runtime hadoop  524174553 2017-01-13 11:06
> /flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.
> 1484330415712.snappy
> -rw-r--r--   3 b2c_runtime hadoop  524127864 2017-01-13 11:12
> /flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.
> 1484330415713.snappy
> -rw-r--r--   3 b2c_runtime hadoop  524778919 2017-01-13 11:18
> /flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.
> 1484330415714.snappy
> -rw-r--r--   3 b2c_runtime hadoop  524851182 2017-01-13 11:24
> /flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.
> 1484330415715.snappy
> -rw-r--r--   3 b2c_runtime hadoop  525156750 2017-01-13 11:30
> /flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.
> 1484330415716.snappy
> -rw-r--r--   3 b2c_runtime hadoop  525334538 2017-01-13 11:35
> /flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.
> 1484330415717.snappy
> -rw-r--r--   3 b2c_runtime hadoop  527346578 2017-01-13 11:41
> /flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.
> 1484330415718.snappy
> -rw-r--r--   3 b2c_runtime hadoop  525592734 2017-01-13 11:47
> /flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.
> 1484330415719.snappy
> -rw-r--r--   3 b2c_runtime hadoop  525502291 2017-01-13 11:53
> /flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.
> 1484330415720.snappy
> -rw-r--r--   3 b2c_runtime hadoop  523135186 2017-01-13 11:58
> /flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.
> 1484330415721.snappy
> -rw-r--r--   3 b2c_runtime hadoop    9967141 2017-01-13 11:58
> /flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.
> 1484330415722.snappy.tmp
> Found 7 items
> -rw-r--r--   3 b2c_runtime hadoop  520881970 2017-01-13 12:05
> /flumedata/processed/first-pass-stream/2017/01/13/12-00/flumeload100-log.
> 1484334016849.snappy
> -rw-r--r--   3 b2c_runtime hadoop  522340745 2017-01-13 12:11
> /flumedata/processed/first-pass-stream/2017/01/13/12-00/flumeload100-log.
> 1484334016850.snappy
> -rw-r--r--   3 b2c_runtime hadoop  524156495 2017-01-13 12:17
> /flumedata/processed/first-pass-stream/2017/01/13/12-00/flumeload100-log.
> 1484334016851.snappy
> -rw-r--r--   3 b2c_runtime hadoop  523482390 2017-01-13 12:23
> /flumedata/processed/first-pass-stream/2017/01/13/12-00/flumeload100-log.
> 1484334016852.snappy
> -rw-r--r--   3 b2c_runtime hadoop  524096591 2017-01-13 12:29
> /flumedata/processed/first-pass-stream/2017/01/13/12-00/flumeload100-log.
> 1484334016853.snappy
> -rw-r--r--   3 b2c_runtime hadoop  523184628 2017-01-13 12:35
> /flumedata/processed/first-pass-stream/2017/01/13/12-00/flumeload100-log.
> 1484334016854.snappy
> -rw-r--r--   3 b2c_runtime hadoop   10981218 2017-01-13 12:35
> /flumedata/processed/first-pass-stream/2017/01/13/12-00/flumeload100-log.
> 1484334016855.snappy.tmp
>
> *HDFS Stat On One Of The Files (keep in mind the output bucket is based on
> event time, which is MDT/MST, vs. the stat date in GMT)*
>  hadoop fs -stat "%y %n" /flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.1484326813840.snappy.tmp
> 17/01/13 12:57:07 WARN util.NativeCodeLoader: Unable to load native-hadoop
> library for your platform... using builtin-java classes where applicable
> 2017-01-13 17:55:35 flumeload100-log.1484326813840.snappy.tmp
>
> Thanks
> Justin
>
> On Thu, Jan 12, 2017 at 11:56 PM, Denes Arvay <de...@cloudera.com> wrote:
>
>> Hi Justin,
>>
>> Could you please share your config file with us?
>>
>> Thanks,
>> Denes
>>
>>
>> On Thu, Jan 12, 2017, 20:20 Justin Workman <ju...@gmail.com>
>> wrote:
>>
>>> Sorry for cross-posting to user and dev. I have recently set up a Flume
>>> configuration where we are using the regex_extractor interceptor to parse
>>> the actual event date from the record flowing through the Flume source,
>>> then using that date to build the HDFS sink bucket path. However, it
>>> appears that the hdfs.idleTimeout value is not honored in this
>>> configuration. It does work when using the timestamp interceptor to build
>>> the output path.
>>>
>>> I have set the hdfs.idleTimeout value for the HDFS sink, but the files
>>> are never closed or renamed until I restart or shut down Flume. Our Flume
>>> is configured to roll based on size or output path, and the files
>>> rename/close/roll fine based on size; however, the last file in each output
>>> path is always left with the .tmp extension until we restart Flume. I would
>>> expect the file to be renamed and closed if no records are written to it
>>> after the idleTimeout is reached.
>>>
>>> Could I be missing something, or is this a known bug with the
>>> regex_extractor interceptor?
>>>
>>> Thanks
>>> Justin
>>>
>>
>

Re: hdfs.idleTime

Posted by Justin Workman <ju...@gmail.com>.
I'll try to debug again. The output/regex seems to be fine, but I never see a call to close/rename the last files in each directory until Flume shuts down or restarts.

I would expect to see this call when the idleTimeout value is reached.
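Given the symptom — an idleTimeout that appears set but never fires — one thing worth ruling out is a misscoped property key, since Flume quietly ignores property lines whose sink name was never declared under `agent1.sinks`. A minimal Python sketch of that cross-check (the inline demo config is illustrative, standing in for the real flume-conf.properties):

```python
# Flume silently drops property lines whose sink name was never declared,
# so a typo'd name means settings like hdfs.idleTimeout never take effect.
# Cross-check declared sink names against the names actually referenced.
conf = """\
agent1.sinks = fpssHdfsSink
agent1.sinks.fpssHdfsSink.type = hdfs
agent1.sinks.hdfsSink.hdfs.idleTimeout = 300
"""

declared = set()
referenced = set()
for line in conf.splitlines():
    key = line.partition("=")[0].strip()
    parts = key.split(".")
    if parts[:2] == ["agent1", "sinks"]:
        if len(parts) == 2:                       # the agent1.sinks = ... line
            declared.update(line.partition("=")[2].split())
        else:                                     # agent1.sinks.<name>.<prop>
            referenced.add(parts[2])

print(referenced - declared)  # {'hdfsSink'} -- the misscoped idleTimeout line
```

In the demo config the third line uses an undeclared name, so its idleTimeout setting would be ignored without any error at startup.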

Sent from my iPhone

> On Jan 13, 2017, at 2:05 PM, iain wright <ia...@gmail.com> wrote:
> 
> Might be worth trying the debug output (I forget the exact sink name) to just log the headers being attached to events after the interceptor, to validate the regex is working correctly, and for all events.
> 
> I set up this exact config at a previous company, so I know it works.
> 
> I also remember needing to escape the regex in an odd way due to how Java was loading/parsing the config.
> 
> Best,
> Iain
> 
> Sent from my iPhone
> 
>> On Jan 13, 2017, at 12:00 PM, Justin Workman <ju...@gmail.com> wrote:
>> 
>> Absolutely, see below. Just to reiterate: when using the timestamp interceptor values to build the output path from the timestamp in the Flume header, things roll correctly. The files also roll just fine based on file size. However, when using the regex_extractor to get the actual event's timestamp to use in the output path, the last file in each directory never renames/closes until Flume is restarted.
>> 
>> 
>> flume-conf.properties
>> agent1.sources  = fpssKafkaTopic
>> agent1.channels = fpssHdfsFileChannel
>> agent1.sinks = fpssHdfsSink
>> 
>> agent1.sources.fpssKafkaTopic.type = org.apache.flume.source.kafka.KafkaSource
>> agent1.sources.fpssKafkaTopic.zookeeperConnect = zk-host:2181
>> agent1.sources.fpssKafkaTopic.topic = first-pass-stream-sessionized 
>> agent1.sources.fpssKafkaTopic.groupId =  flume-first-pass-stream-sessionized
>> agent1.sources.fpssKafkaTopic.kafka.auto.offset.reset = smallest
>> agent1.sources.fpssKafkaTopic.channels = fpssHdfsFileChannel
>> agent1.sources.fpssKafkaTopic.interceptors = i1 i2 i3
>> agent1.sources.fpssKafkaTopic.interceptors.i1.type = timestamp
>> agent1.sources.fpssKafkaTopic.interceptors.i1.preserveExisting = false
>> agent1.sources.fpssKafkaTopic.interceptors.i2.type = org.apache.flume.interceptor.HostInterceptor$Builder
>> agent1.sources.fpssKafkaTopic.interceptors.i2.hostHeader = hostname
>> agent1.sources.fpssKafkaTopic.interceptors.i2.useIP= false
>> agent1.sources.fpssKafkaTopic.interceptors.i2.preserveExisting = true
>> agent1.sources.fpssKafkaTopic.interceptors.i3.type = regex_extractor
>> agent1.sources.fpssKafkaTopic.interceptors.i3.regex = ^.*\\"entryId\\":\\{\\"date\\":\\"(\\d\\d\\d\\d)-(\\d\\d)-(\\d\\d)T(\\d\\d):.*\\"\\}.*$
>> agent1.sources.fpssKafkaTopic.interceptors.i3.serializers = s1 s2 s3 s4
>> agent1.sources.fpssKafkaTopic.interceptors.i3.serializers.s1.name = year
>> agent1.sources.fpssKafkaTopic.interceptors.i3.serializers.s2.name = month
>> agent1.sources.fpssKafkaTopic.interceptors.i3.serializers.s3.name = day
>> agent1.sources.fpssKafkaTopic.interceptors.i3.serializers.s4.name = hour
>> agent1.sources.fpssKafkaTopic.kafka.consumer.timeout.ms = 100
>> 
>> agent1.channels.fpssHdfsFileChannel.type = file
>> agent1.channels.fpssHdfsFileChannel.checkpointDir = /opt/flume/file-channel/fpss/checkpoint
>> agent1.channels.fpssHdfsFileChannel.dataDirs = /opt/flume/file-channel/fpss/data
>> 
>> agent1.sinks.fpssHdfsSink.type = hdfs
>> agent1.sinks.fpssHdfsSink.hdfs.filePrefix = %{hostname}-log
>> agent1.sinks.fpssHdfsSink.hdfs.inUseSuffix = .tmp
>> agent1.sinks.fpssHdfsSink.hdfs.path = hdfs://prodcluster/flumedata/processed/first-pass-stream/%{year}/%{month}/%{day}/%{hour}-00
>> agent1.sinks.fpssHdfsSink.hdfs.kerberosPrincipal = runtime@EXAMPLE.COM
>> agent1.sinks.fpssHdfsSink.hdfs.kerberosKeytab = <keytab path removed for privacy>
>> agent1.sinks.fpssHdfsSink.hdfs.rollInterval = 0 
>> agent1.sinks.fpssHdfsSink.hdfs.rollCount = 0
>> ## Account for compression. See FLUME-2128.
>> ## My calculation: 512 * 1024 * 1024 * 2.75
>> agent1.sinks.fpssHdfsSink.hdfs.rollSize = 1476395008
>> # Close file if idle more than 300 seconds
>> agent1.sinks.hdfsSink.hdfs.idleTimeout = 300
>> agent1.sinks.hdfsSink.hdfs.useLocalTimeStamp = true
>> agent1.sinks.fpssHdfsSink.hdfs.fileType = CompressedStream
>> agent1.sinks.fpssHdfsSink.hdfs.codeC = snappy
>> agent1.sinks.fpssHdfsSink.hdfs.writeFormat = Text
>> agent1.sinks.fpssHdfsSink.channel = fpssHdfsFileChannel
>> agent1.sinks.fpssHdfsSink.hdfs.batchSize = 10000
>> agent1.sinks.fpssHdfsSink.hdfs.threadsPoolSize = 20
>> agent1.sinks.fpssHdfsSink.hdfs.callTimeout = 20000
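As an aside, the hdfs.rollSize comment in the config above checks out; a one-line verification of the arithmetic in Python:

```python
# hdfs.rollSize target: 512 MiB scaled by the ~2.75x factor the config
# comment cites to account for compression (FLUME-2128: rollSize counts
# pre-compression bytes, so the on-disk snappy files land near 512 MiB).
roll_size = int(512 * 1024 * 1024 * 2.75)
print(roll_size)  # 1476395008, matching hdfs.rollSize in the config
```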
>> 
>> HDFS Output Since Midnight (Notice the last file is never closed/renamed)
>>  hdfs dfs -ls /flumedata/processed/first-pass-stream/2017/01/13/*/
>> 17/01/13 12:38:52 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>> Found 7 items
>> -rw-r--r--   3 b2c_runtime hadoop  513710580 2017-01-13 00:09 /flumedata/processed/first-pass-stream/2017/01/13/00-00/flumeload100-log.1484290815397.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  514439844 2017-01-13 00:18 /flumedata/processed/first-pass-stream/2017/01/13/00-00/flumeload100-log.1484290815398.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  515125962 2017-01-13 00:28 /flumedata/processed/first-pass-stream/2017/01/13/00-00/flumeload100-log.1484290815399.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  513010837 2017-01-13 00:38 /flumedata/processed/first-pass-stream/2017/01/13/00-00/flumeload100-log.1484290815400.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  511315467 2017-01-13 00:49 /flumedata/processed/first-pass-stream/2017/01/13/00-00/flumeload100-log.1484290815401.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  508420966 2017-01-13 00:59 /flumedata/processed/first-pass-stream/2017/01/13/00-00/flumeload100-log.1484290815402.snappy
>> -rw-r--r--   3 b2c_runtime hadoop    2503353 2017-01-13 00:59 /flumedata/processed/first-pass-stream/2017/01/13/00-00/flumeload100-log.1484290815403.snappy.tmp
>> Found 6 items
>> -rw-r--r--   3 b2c_runtime hadoop  509116221 2017-01-13 01:10 /flumedata/processed/first-pass-stream/2017/01/13/01-00/flumeload100-log.1484294415705.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  507800675 2017-01-13 01:21 /flumedata/processed/first-pass-stream/2017/01/13/01-00/flumeload100-log.1484294415706.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  504432110 2017-01-13 01:32 /flumedata/processed/first-pass-stream/2017/01/13/01-00/flumeload100-log.1484294415707.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  501932914 2017-01-13 01:42 /flumedata/processed/first-pass-stream/2017/01/13/01-00/flumeload100-log.1484294415708.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  498136257 2017-01-13 01:50 /flumedata/processed/first-pass-stream/2017/01/13/01-00/flumeload100-log.1484294415709.snappy
>> -rw-r--r--   3 b2c_runtime hadoop      60539 2017-01-13 01:50 /flumedata/processed/first-pass-stream/2017/01/13/01-00/flumeload100-log.1484294415710.snappy.tmp
>> Found 6 items
>> -rw-r--r--   3 b2c_runtime hadoop  500879399 2017-01-13 02:11 /flumedata/processed/first-pass-stream/2017/01/13/02-00/flumeload100-log.1484298016017.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  501827071 2017-01-13 02:21 /flumedata/processed/first-pass-stream/2017/01/13/02-00/flumeload100-log.1484298016018.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  501489101 2017-01-13 02:32 /flumedata/processed/first-pass-stream/2017/01/13/02-00/flumeload100-log.1484298016019.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  501527838 2017-01-13 02:43 /flumedata/processed/first-pass-stream/2017/01/13/02-00/flumeload100-log.1484298016020.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  499393977 2017-01-13 02:54 /flumedata/processed/first-pass-stream/2017/01/13/02-00/flumeload100-log.1484298016021.snappy
>> -rw-r--r--   3 b2c_runtime hadoop    1282327 2017-01-13 02:54 /flumedata/processed/first-pass-stream/2017/01/13/02-00/flumeload100-log.1484298016022.snappy.tmp
>> Found 6 items
>> -rw-r--r--   3 b2c_runtime hadoop  501033294 2017-01-13 03:10 /flumedata/processed/first-pass-stream/2017/01/13/03-00/flumeload100-log.1484301615579.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  500933906 2017-01-13 03:20 /flumedata/processed/first-pass-stream/2017/01/13/03-00/flumeload100-log.1484301615580.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  505869233 2017-01-13 03:31 /flumedata/processed/first-pass-stream/2017/01/13/03-00/flumeload100-log.1484301615581.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  502910608 2017-01-13 03:41 /flumedata/processed/first-pass-stream/2017/01/13/03-00/flumeload100-log.1484301615582.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  499561080 2017-01-13 03:52 /flumedata/processed/first-pass-stream/2017/01/13/03-00/flumeload100-log.1484301615583.snappy
>> -rw-r--r--   3 b2c_runtime hadoop    3616826 2017-01-13 03:52 /flumedata/processed/first-pass-stream/2017/01/13/03-00/flumeload100-log.1484301615584.snappy.tmp
>> Found 6 items
>> -rw-r--r--   3 b2c_runtime hadoop  502243204 2017-01-13 04:11 /flumedata/processed/first-pass-stream/2017/01/13/04-00/flumeload100-log.1484305215893.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  508966498 2017-01-13 04:22 /flumedata/processed/first-pass-stream/2017/01/13/04-00/flumeload100-log.1484305215894.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  510972236 2017-01-13 04:34 /flumedata/processed/first-pass-stream/2017/01/13/04-00/flumeload100-log.1484305215895.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  513225577 2017-01-13 04:46 /flumedata/processed/first-pass-stream/2017/01/13/04-00/flumeload100-log.1484305215896.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  512743679 2017-01-13 04:57 /flumedata/processed/first-pass-stream/2017/01/13/04-00/flumeload100-log.1484305215897.snappy
>> -rw-r--r--   3 b2c_runtime hadoop    3888775 2017-01-13 04:57 /flumedata/processed/first-pass-stream/2017/01/13/04-00/flumeload100-log.1484305215898.snappy.tmp
>> Found 7 items
>> -rw-r--r--   3 b2c_runtime hadoop  515832251 2017-01-13 05:11 /flumedata/processed/first-pass-stream/2017/01/13/05-00/flumeload100-log.1484308811983.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  518077964 2017-01-13 05:20 /flumedata/processed/first-pass-stream/2017/01/13/05-00/flumeload100-log.1484308811984.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  519490676 2017-01-13 05:29 /flumedata/processed/first-pass-stream/2017/01/13/05-00/flumeload100-log.1484308811985.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  519105563 2017-01-13 05:37 /flumedata/processed/first-pass-stream/2017/01/13/05-00/flumeload100-log.1484308811986.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  518672209 2017-01-13 05:46 /flumedata/processed/first-pass-stream/2017/01/13/05-00/flumeload100-log.1484308811987.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  520019853 2017-01-13 05:53 /flumedata/processed/first-pass-stream/2017/01/13/05-00/flumeload100-log.1484308811988.snappy
>> -rw-r--r--   3 b2c_runtime hadoop    1574211 2017-01-13 05:53 /flumedata/processed/first-pass-stream/2017/01/13/05-00/flumeload100-log.1484308811989.snappy.tmp
>> Found 9 items
>> -rw-r--r--   3 b2c_runtime hadoop  521428204 2017-01-13 06:07 /flumedata/processed/first-pass-stream/2017/01/13/06-00/flumeload100-log.1484312413743.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  519885769 2017-01-13 06:15 /flumedata/processed/first-pass-stream/2017/01/13/06-00/flumeload100-log.1484312413744.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  519050891 2017-01-13 06:21 /flumedata/processed/first-pass-stream/2017/01/13/06-00/flumeload100-log.1484312413745.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  520691322 2017-01-13 06:29 /flumedata/processed/first-pass-stream/2017/01/13/06-00/flumeload100-log.1484312413746.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  520902319 2017-01-13 06:36 /flumedata/processed/first-pass-stream/2017/01/13/06-00/flumeload100-log.1484312413747.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  520831873 2017-01-13 06:42 /flumedata/processed/first-pass-stream/2017/01/13/06-00/flumeload100-log.1484312413748.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  519785647 2017-01-13 06:49 /flumedata/processed/first-pass-stream/2017/01/13/06-00/flumeload100-log.1484312413749.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  520590143 2017-01-13 06:55 /flumedata/processed/first-pass-stream/2017/01/13/06-00/flumeload100-log.1484312413750.snappy
>> -rw-r--r--   3 b2c_runtime hadoop    4621367 2017-01-13 06:55 /flumedata/processed/first-pass-stream/2017/01/13/06-00/flumeload100-log.1484312413751.snappy.tmp
>> Found 11 items
>> -rw-r--r--   3 b2c_runtime hadoop  522623760 2017-01-13 07:06 /flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.1484316015214.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  523065112 2017-01-13 07:12 /flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.1484316015215.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  523445533 2017-01-13 07:18 /flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.1484316015216.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  523084945 2017-01-13 07:24 /flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.1484316015217.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  524283976 2017-01-13 07:30 /flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.1484316015218.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  523923379 2017-01-13 07:36 /flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.1484316015219.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  523910723 2017-01-13 07:42 /flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.1484316015220.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  524266095 2017-01-13 07:47 /flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.1484316015221.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  523002505 2017-01-13 07:53 /flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.1484316015222.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  520706211 2017-01-13 07:58 /flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.1484316015223.snappy
>> -rw-r--r--   3 b2c_runtime hadoop    8051588 2017-01-13 07:58 /flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.1484316015224.snappy.tmp
>> Found 11 items
>> -rw-r--r--   3 b2c_runtime hadoop  520528155 2017-01-13 08:05 /flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.1484319618433.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  521761390 2017-01-13 08:11 /flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.1484319618434.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  522548272 2017-01-13 08:16 /flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.1484319618435.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  522616117 2017-01-13 08:22 /flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.1484319618436.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  525953759 2017-01-13 08:28 /flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.1484319618437.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  524475009 2017-01-13 08:34 /flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.1484319618438.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  523995339 2017-01-13 08:40 /flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.1484319618439.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  524188832 2017-01-13 08:47 /flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.1484319618440.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  525303001 2017-01-13 08:53 /flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.1484319618441.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  525606532 2017-01-13 08:59 /flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.1484319618442.snappy
>> -rw-r--r--   3 b2c_runtime hadoop    4486982 2017-01-13 08:59 /flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.1484319618443.snappy.tmp
>> Found 11 items
>> -rw-r--r--   3 b2c_runtime hadoop  525207364 2017-01-13 09:06 /flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.1484323216987.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  526105891 2017-01-13 09:12 /flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.1484323216988.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  526426735 2017-01-13 09:18 /flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.1484323216989.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  525298099 2017-01-13 09:24 /flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.1484323216990.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  525282945 2017-01-13 09:30 /flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.1484323216991.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  523921005 2017-01-13 09:36 /flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.1484323216992.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  524827705 2017-01-13 09:42 /flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.1484323216993.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  524203463 2017-01-13 09:47 /flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.1484323216994.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  524678485 2017-01-13 09:53 /flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.1484323216995.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  524598220 2017-01-13 09:59 /flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.1484323216996.snappy
>> -rw-r--r--   3 b2c_runtime hadoop    3877959 2017-01-13 09:59 /flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.1484323216997.snappy.tmp
>> Found 10 items
>> -rw-r--r--   3 b2c_runtime hadoop  523000460 2017-01-13 10:06 /flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.1484326813831.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  523455154 2017-01-13 10:12 /flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.1484326813832.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  525465618 2017-01-13 10:18 /flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.1484326813833.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  524630955 2017-01-13 10:24 /flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.1484326813834.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  527780298 2017-01-13 10:30 /flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.1484326813835.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  526565562 2017-01-13 10:37 /flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.1484326813836.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  524936336 2017-01-13 10:43 /flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.1484326813837.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  524565610 2017-01-13 10:49 /flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.1484326813838.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  524276950 2017-01-13 10:55 /flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.1484326813839.snappy
>> -rw-r--r--   3 b2c_runtime hadoop     654810 2017-01-13 10:55 /flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.1484326813840.snappy.tmp
>> Found 11 items
>> -rw-r--r--   3 b2c_runtime hadoop  524174553 2017-01-13 11:06 /flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.1484330415712.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  524127864 2017-01-13 11:12 /flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.1484330415713.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  524778919 2017-01-13 11:18 /flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.1484330415714.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  524851182 2017-01-13 11:24 /flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.1484330415715.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  525156750 2017-01-13 11:30 /flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.1484330415716.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  525334538 2017-01-13 11:35 /flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.1484330415717.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  527346578 2017-01-13 11:41 /flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.1484330415718.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  525592734 2017-01-13 11:47 /flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.1484330415719.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  525502291 2017-01-13 11:53 /flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.1484330415720.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  523135186 2017-01-13 11:58 /flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.1484330415721.snappy
>> -rw-r--r--   3 b2c_runtime hadoop    9967141 2017-01-13 11:58 /flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.1484330415722.snappy.tmp
>> Found 7 items
>> -rw-r--r--   3 b2c_runtime hadoop  520881970 2017-01-13 12:05 /flumedata/processed/first-pass-stream/2017/01/13/12-00/flumeload100-log.1484334016849.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  522340745 2017-01-13 12:11 /flumedata/processed/first-pass-stream/2017/01/13/12-00/flumeload100-log.1484334016850.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  524156495 2017-01-13 12:17 /flumedata/processed/first-pass-stream/2017/01/13/12-00/flumeload100-log.1484334016851.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  523482390 2017-01-13 12:23 /flumedata/processed/first-pass-stream/2017/01/13/12-00/flumeload100-log.1484334016852.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  524096591 2017-01-13 12:29 /flumedata/processed/first-pass-stream/2017/01/13/12-00/flumeload100-log.1484334016853.snappy
>> -rw-r--r--   3 b2c_runtime hadoop  523184628 2017-01-13 12:35 /flumedata/processed/first-pass-stream/2017/01/13/12-00/flumeload100-log.1484334016854.snappy
>> -rw-r--r--   3 b2c_runtime hadoop   10981218 2017-01-13 12:35 /flumedata/processed/first-pass-stream/2017/01/13/12-00/flumeload100-log.1484334016855.snappy.tmp
>> 
>> HDFS Stat On One Of The Files (keep in mind the output bucket is based on event time, which is MDT/MST, vs. the stat date, which is GMT)
>>  hadoop fs -stat "%y %n"  /flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100
>> -log.1484326813840.snappy.tmp
>> 17/01/13 12:57:07 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>> 2017-01-13 17:55:35 flumeload100-log.1484326813840.snappy.tmp
>> 
>> Thanks
>> Justin
>> 
>>> On Thu, Jan 12, 2017 at 11:56 PM, Denes Arvay <de...@cloudera.com> wrote:
>>> Hi Justin,
>>> 
>>> Could you please share your config file with us?
>>> 
>>> Thanks,
>>> Denes
>>> 
>>> 
>>>> On Thu, Jan 12, 2017, 20:20 Justin Workman <ju...@gmail.com> wrote:
>>>> Sorry for cross-posting to user and dev. I have recently set up a Flume configuration where we are using the regex_extractor interceptor to parse the actual event date from the record flowing through the Flume source, then using that date to build the HDFS sink bucket path. However, it appears that the hdfs.idleTimeout value is not honored in this configuration. It does work when using the timestamp interceptor to build the output path.
>>>> 
>>>> I have set the hdfs.idleTimeout value for the HDFS sink, but the files are never closed or renamed until I restart or shut down Flume. Our Flume is configured to roll based on size or output path, and the files rename/close/roll fine based on size; however, the last file in each output path is always left with the .tmp extension until we restart Flume. I would expect the file to be renamed and closed if no records are written to it after the idleTimeout is reached.
>>>> 
>>>> Could I be missing something, or is this a known bug with the regex_extractor interceptor?
>>>> 
>>>> Thanks
>>>> Justin
>> 

Re: hdfs.idleTime

Posted by iain wright <ia...@gmail.com>.
Might be worth trying the debug output (I forget the exact sink name) to just log the headers being attached to events after the interceptor, to validate the regex is working correctly, and for all events.

I set up this exact config at a previous company, so I know it works.

I also remember needing to escape the regex in an odd way due to how Java was loading/parsing the config.
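For reference, that escaping works out as follows: the properties loader consumes one level of backslashes, so the pattern the interceptor compiles is the single-backslash form. A minimal Python sketch of the pattern from the config above (the sample record is hypothetical, invented for illustration — not taken from the actual topic):

```python
import re

# The Flume properties value doubles every backslash because the Java
# properties loader consumes one escaping level; this single-backslash
# pattern is what the regex_extractor actually compiles.
pattern = re.compile(
    r'^.*\"entryId\":\{\"date\":\"(\d\d\d\d)-(\d\d)-(\d\d)T(\d\d):.*\"\}.*$'
)

# Hypothetical sample record, illustrative only.
sample = '{"entryId":{"date":"2017-01-13T10:15:00Z"}}'

m = pattern.match(sample)
print(m.groups())  # ('2017', '01', '13', '10') -> year, month, day, hour headers
```

Each captured group becomes one of the s1–s4 header serializers, which the sink path then interpolates via %{year}/%{month}/%{day}/%{hour}.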

Best,
Iain

Sent from my iPhone

> On Jan 13, 2017, at 12:00 PM, Justin Workman <ju...@gmail.com> wrote:
> 
> Absolutely, see below. Just to reiterate: when using the timestamp interceptor values to build the output path from the timestamp in the Flume header, things roll correctly. The files also roll just fine based on file size. However, when using the regex_extractor to get the actual event's timestamp to use in the output path, the last file in each directory never renames/closes until Flume is restarted.
> 
> 
> flume-conf.properties
> agent1.sources  = fpssKafkaTopic
> agent1.channels = fpssHdfsFileChannel
> agent1.sinks = fpssHdfsSink
> 
> agent1.sources.fpssKafkaTopic.type = org.apache.flume.source.kafka.KafkaSource
> agent1.sources.fpssKafkaTopic.zookeeperConnect = zk-host:2181
> agent1.sources.fpssKafkaTopic.topic = first-pass-stream-sessionized 
> agent1.sources.fpssKafkaTopic.groupId =  flume-first-pass-stream-sessionized
> agent1.sources.fpssKafkaTopic.kafka.auto.offset.reset = smallest
> agent1.sources.fpssKafkaTopic.channels = fpssHdfsFileChannel
> agent1.sources.fpssKafkaTopic.interceptors = i1 i2 i3
> agent1.sources.fpssKafkaTopic.interceptors.i1.type = timestamp
> agent1.sources.fpssKafkaTopic.interceptors.i1.preserveExisting = false
> agent1.sources.fpssKafkaTopic.interceptors.i2.type = org.apache.flume.interceptor.HostInterceptor$Builder
> agent1.sources.fpssKafkaTopic.interceptors.i2.hostHeader = hostname
> agent1.sources.fpssKafkaTopic.interceptors.i2.useIP= false
> agent1.sources.fpssKafkaTopic.interceptors.i2.preserveExisting = true
> agent1.sources.fpssKafkaTopic.interceptors.i3.type = regex_extractor
> agent1.sources.fpssKafkaTopic.interceptors.i3.regex = ^.*\\"entryId\\":\\{\\"date\\":\\"(\\d\\d\\d\\d)-(\\d\\d)-(\\d\\d)T(\\d\\d):.*\\"\\}.*$
> agent1.sources.fpssKafkaTopic.interceptors.i3.serializers = s1 s2 s3 s4
> agent1.sources.fpssKafkaTopic.interceptors.i3.serializers.s1.name = year
> agent1.sources.fpssKafkaTopic.interceptors.i3.serializers.s2.name = month
> agent1.sources.fpssKafkaTopic.interceptors.i3.serializers.s3.name = day
> agent1.sources.fpssKafkaTopic.interceptors.i3.serializers.s4.name = hour
> agent1.sources.fpssKafkaTopic.kafka.consumer.timeout.ms = 100
> 
> agent1.channels.fpssHdfsFileChannel.type = file
> agent1.channels.fpssHdfsFileChannel.checkpointDir = /opt/flume/file-channel/fpss/checkpoint
> agent1.channels.fpssHdfsFileChannel.dataDirs = /opt/flume/file-channel/fpss/data
> 
> agent1.sinks.fpssHdfsSink.type = hdfs
> agent1.sinks.fpssHdfsSink.hdfs.filePrefix = %{hostname}-log
> agent1.sinks.fpssHdfsSink.hdfs.inUseSuffix = .tmp
> agent1.sinks.fpssHdfsSink.hdfs.path = hdfs://prodcluster/flumedata/processed/first-pass-stream/%{year}/%{month}/%{day}/%{hour}-00
> agent1.sinks.fpssHdfsSink.hdfs.kerberosPrincipal = runtime@EXAMPLE.COM
> agent1.sinks.fpssHdfsSink.hdfs.kerberosKeytab = <keytab path removed for privacy>
> agent1.sinks.fpssHdfsSink.hdfs.rollInterval = 0 
> agent1.sinks.fpssHdfsSink.hdfs.rollCount = 0
> ## Account for compression. See FLUME-2128.
> ## My calculation: 512 * 1024 * 1024 * 2.75
> agent1.sinks.fpssHdfsSink.hdfs.rollSize = 1476395008
> # Close file if idle more than 300 seconds
> agent1.sinks.hdfsSink.hdfs.idleTimeout = 300
> agent1.sinks.hdfsSink.hdfs.useLocalTimeStamp = true
> agent1.sinks.fpssHdfsSink.hdfs.fileType = CompressedStream
> agent1.sinks.fpssHdfsSink.hdfs.codeC = snappy
> agent1.sinks.fpssHdfsSink.hdfs.writeFormat = Text
> agent1.sinks.fpssHdfsSink.channel = fpssHdfsFileChannel
> agent1.sinks.fpssHdfsSink.hdfs.batchSize = 10000
> agent1.sinks.fpssHdfsSink.hdfs.threadsPoolSize = 20
> agent1.sinks.fpssHdfsSink.hdfs.callTimeout = 20000
> 
> HDFS Output Since Midnight (Notice the last file is never closed/renamed)
>  hdfs dfs -ls /flumedata/processed/first-pass-stream/2017/01/13/*/
> 17/01/13 12:38:52 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> Found 7 items
> -rw-r--r--   3 b2c_runtime hadoop  513710580 2017-01-13 00:09 /flumedata/processed/first-pass-stream/2017/01/13/00-00/flumeload100-log.1484290815397.snappy
> -rw-r--r--   3 b2c_runtime hadoop  514439844 2017-01-13 00:18 /flumedata/processed/first-pass-stream/2017/01/13/00-00/flumeload100-log.1484290815398.snappy
> -rw-r--r--   3 b2c_runtime hadoop  515125962 2017-01-13 00:28 /flumedata/processed/first-pass-stream/2017/01/13/00-00/flumeload100-log.1484290815399.snappy
> -rw-r--r--   3 b2c_runtime hadoop  513010837 2017-01-13 00:38 /flumedata/processed/first-pass-stream/2017/01/13/00-00/flumeload100-log.1484290815400.snappy
> -rw-r--r--   3 b2c_runtime hadoop  511315467 2017-01-13 00:49 /flumedata/processed/first-pass-stream/2017/01/13/00-00/flumeload100-log.1484290815401.snappy
> -rw-r--r--   3 b2c_runtime hadoop  508420966 2017-01-13 00:59 /flumedata/processed/first-pass-stream/2017/01/13/00-00/flumeload100-log.1484290815402.snappy
> -rw-r--r--   3 b2c_runtime hadoop    2503353 2017-01-13 00:59 /flumedata/processed/first-pass-stream/2017/01/13/00-00/flumeload100-log.1484290815403.snappy.tmp
> Found 6 items
> -rw-r--r--   3 b2c_runtime hadoop  509116221 2017-01-13 01:10 /flumedata/processed/first-pass-stream/2017/01/13/01-00/flumeload100-log.1484294415705.snappy
> -rw-r--r--   3 b2c_runtime hadoop  507800675 2017-01-13 01:21 /flumedata/processed/first-pass-stream/2017/01/13/01-00/flumeload100-log.1484294415706.snappy
> -rw-r--r--   3 b2c_runtime hadoop  504432110 2017-01-13 01:32 /flumedata/processed/first-pass-stream/2017/01/13/01-00/flumeload100-log.1484294415707.snappy
> -rw-r--r--   3 b2c_runtime hadoop  501932914 2017-01-13 01:42 /flumedata/processed/first-pass-stream/2017/01/13/01-00/flumeload100-log.1484294415708.snappy
> -rw-r--r--   3 b2c_runtime hadoop  498136257 2017-01-13 01:50 /flumedata/processed/first-pass-stream/2017/01/13/01-00/flumeload100-log.1484294415709.snappy
> -rw-r--r--   3 b2c_runtime hadoop      60539 2017-01-13 01:50 /flumedata/processed/first-pass-stream/2017/01/13/01-00/flumeload100-log.1484294415710.snappy.tmp
> Found 6 items
> -rw-r--r--   3 b2c_runtime hadoop  500879399 2017-01-13 02:11 /flumedata/processed/first-pass-stream/2017/01/13/02-00/flumeload100-log.1484298016017.snappy
> -rw-r--r--   3 b2c_runtime hadoop  501827071 2017-01-13 02:21 /flumedata/processed/first-pass-stream/2017/01/13/02-00/flumeload100-log.1484298016018.snappy
> -rw-r--r--   3 b2c_runtime hadoop  501489101 2017-01-13 02:32 /flumedata/processed/first-pass-stream/2017/01/13/02-00/flumeload100-log.1484298016019.snappy
> -rw-r--r--   3 b2c_runtime hadoop  501527838 2017-01-13 02:43 /flumedata/processed/first-pass-stream/2017/01/13/02-00/flumeload100-log.1484298016020.snappy
> -rw-r--r--   3 b2c_runtime hadoop  499393977 2017-01-13 02:54 /flumedata/processed/first-pass-stream/2017/01/13/02-00/flumeload100-log.1484298016021.snappy
> -rw-r--r--   3 b2c_runtime hadoop    1282327 2017-01-13 02:54 /flumedata/processed/first-pass-stream/2017/01/13/02-00/flumeload100-log.1484298016022.snappy.tmp
> Found 6 items
> -rw-r--r--   3 b2c_runtime hadoop  501033294 2017-01-13 03:10 /flumedata/processed/first-pass-stream/2017/01/13/03-00/flumeload100-log.1484301615579.snappy
> -rw-r--r--   3 b2c_runtime hadoop  500933906 2017-01-13 03:20 /flumedata/processed/first-pass-stream/2017/01/13/03-00/flumeload100-log.1484301615580.snappy
> -rw-r--r--   3 b2c_runtime hadoop  505869233 2017-01-13 03:31 /flumedata/processed/first-pass-stream/2017/01/13/03-00/flumeload100-log.1484301615581.snappy
> -rw-r--r--   3 b2c_runtime hadoop  502910608 2017-01-13 03:41 /flumedata/processed/first-pass-stream/2017/01/13/03-00/flumeload100-log.1484301615582.snappy
> -rw-r--r--   3 b2c_runtime hadoop  499561080 2017-01-13 03:52 /flumedata/processed/first-pass-stream/2017/01/13/03-00/flumeload100-log.1484301615583.snappy
> -rw-r--r--   3 b2c_runtime hadoop    3616826 2017-01-13 03:52 /flumedata/processed/first-pass-stream/2017/01/13/03-00/flumeload100-log.1484301615584.snappy.tmp
> Found 6 items
> -rw-r--r--   3 b2c_runtime hadoop  502243204 2017-01-13 04:11 /flumedata/processed/first-pass-stream/2017/01/13/04-00/flumeload100-log.1484305215893.snappy
> -rw-r--r--   3 b2c_runtime hadoop  508966498 2017-01-13 04:22 /flumedata/processed/first-pass-stream/2017/01/13/04-00/flumeload100-log.1484305215894.snappy
> -rw-r--r--   3 b2c_runtime hadoop  510972236 2017-01-13 04:34 /flumedata/processed/first-pass-stream/2017/01/13/04-00/flumeload100-log.1484305215895.snappy
> -rw-r--r--   3 b2c_runtime hadoop  513225577 2017-01-13 04:46 /flumedata/processed/first-pass-stream/2017/01/13/04-00/flumeload100-log.1484305215896.snappy
> -rw-r--r--   3 b2c_runtime hadoop  512743679 2017-01-13 04:57 /flumedata/processed/first-pass-stream/2017/01/13/04-00/flumeload100-log.1484305215897.snappy
> -rw-r--r--   3 b2c_runtime hadoop    3888775 2017-01-13 04:57 /flumedata/processed/first-pass-stream/2017/01/13/04-00/flumeload100-log.1484305215898.snappy.tmp
> Found 7 items
> -rw-r--r--   3 b2c_runtime hadoop  515832251 2017-01-13 05:11 /flumedata/processed/first-pass-stream/2017/01/13/05-00/flumeload100-log.1484308811983.snappy
> -rw-r--r--   3 b2c_runtime hadoop  518077964 2017-01-13 05:20 /flumedata/processed/first-pass-stream/2017/01/13/05-00/flumeload100-log.1484308811984.snappy
> -rw-r--r--   3 b2c_runtime hadoop  519490676 2017-01-13 05:29 /flumedata/processed/first-pass-stream/2017/01/13/05-00/flumeload100-log.1484308811985.snappy
> -rw-r--r--   3 b2c_runtime hadoop  519105563 2017-01-13 05:37 /flumedata/processed/first-pass-stream/2017/01/13/05-00/flumeload100-log.1484308811986.snappy
> -rw-r--r--   3 b2c_runtime hadoop  518672209 2017-01-13 05:46 /flumedata/processed/first-pass-stream/2017/01/13/05-00/flumeload100-log.1484308811987.snappy
> -rw-r--r--   3 b2c_runtime hadoop  520019853 2017-01-13 05:53 /flumedata/processed/first-pass-stream/2017/01/13/05-00/flumeload100-log.1484308811988.snappy
> -rw-r--r--   3 b2c_runtime hadoop    1574211 2017-01-13 05:53 /flumedata/processed/first-pass-stream/2017/01/13/05-00/flumeload100-log.1484308811989.snappy.tmp
> Found 9 items
> -rw-r--r--   3 b2c_runtime hadoop  521428204 2017-01-13 06:07 /flumedata/processed/first-pass-stream/2017/01/13/06-00/flumeload100-log.1484312413743.snappy
> -rw-r--r--   3 b2c_runtime hadoop  519885769 2017-01-13 06:15 /flumedata/processed/first-pass-stream/2017/01/13/06-00/flumeload100-log.1484312413744.snappy
> -rw-r--r--   3 b2c_runtime hadoop  519050891 2017-01-13 06:21 /flumedata/processed/first-pass-stream/2017/01/13/06-00/flumeload100-log.1484312413745.snappy
> -rw-r--r--   3 b2c_runtime hadoop  520691322 2017-01-13 06:29 /flumedata/processed/first-pass-stream/2017/01/13/06-00/flumeload100-log.1484312413746.snappy
> -rw-r--r--   3 b2c_runtime hadoop  520902319 2017-01-13 06:36 /flumedata/processed/first-pass-stream/2017/01/13/06-00/flumeload100-log.1484312413747.snappy
> -rw-r--r--   3 b2c_runtime hadoop  520831873 2017-01-13 06:42 /flumedata/processed/first-pass-stream/2017/01/13/06-00/flumeload100-log.1484312413748.snappy
> -rw-r--r--   3 b2c_runtime hadoop  519785647 2017-01-13 06:49 /flumedata/processed/first-pass-stream/2017/01/13/06-00/flumeload100-log.1484312413749.snappy
> -rw-r--r--   3 b2c_runtime hadoop  520590143 2017-01-13 06:55 /flumedata/processed/first-pass-stream/2017/01/13/06-00/flumeload100-log.1484312413750.snappy
> -rw-r--r--   3 b2c_runtime hadoop    4621367 2017-01-13 06:55 /flumedata/processed/first-pass-stream/2017/01/13/06-00/flumeload100-log.1484312413751.snappy.tmp
> Found 11 items
> -rw-r--r--   3 b2c_runtime hadoop  522623760 2017-01-13 07:06 /flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.1484316015214.snappy
> -rw-r--r--   3 b2c_runtime hadoop  523065112 2017-01-13 07:12 /flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.1484316015215.snappy
> -rw-r--r--   3 b2c_runtime hadoop  523445533 2017-01-13 07:18 /flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.1484316015216.snappy
> -rw-r--r--   3 b2c_runtime hadoop  523084945 2017-01-13 07:24 /flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.1484316015217.snappy
> -rw-r--r--   3 b2c_runtime hadoop  524283976 2017-01-13 07:30 /flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.1484316015218.snappy
> -rw-r--r--   3 b2c_runtime hadoop  523923379 2017-01-13 07:36 /flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.1484316015219.snappy
> -rw-r--r--   3 b2c_runtime hadoop  523910723 2017-01-13 07:42 /flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.1484316015220.snappy
> -rw-r--r--   3 b2c_runtime hadoop  524266095 2017-01-13 07:47 /flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.1484316015221.snappy
> -rw-r--r--   3 b2c_runtime hadoop  523002505 2017-01-13 07:53 /flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.1484316015222.snappy
> -rw-r--r--   3 b2c_runtime hadoop  520706211 2017-01-13 07:58 /flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.1484316015223.snappy
> -rw-r--r--   3 b2c_runtime hadoop    8051588 2017-01-13 07:58 /flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.1484316015224.snappy.tmp
> Found 11 items
> -rw-r--r--   3 b2c_runtime hadoop  520528155 2017-01-13 08:05 /flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.1484319618433.snappy
> -rw-r--r--   3 b2c_runtime hadoop  521761390 2017-01-13 08:11 /flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.1484319618434.snappy
> -rw-r--r--   3 b2c_runtime hadoop  522548272 2017-01-13 08:16 /flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.1484319618435.snappy
> -rw-r--r--   3 b2c_runtime hadoop  522616117 2017-01-13 08:22 /flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.1484319618436.snappy
> -rw-r--r--   3 b2c_runtime hadoop  525953759 2017-01-13 08:28 /flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.1484319618437.snappy
> -rw-r--r--   3 b2c_runtime hadoop  524475009 2017-01-13 08:34 /flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.1484319618438.snappy
> -rw-r--r--   3 b2c_runtime hadoop  523995339 2017-01-13 08:40 /flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.1484319618439.snappy
> -rw-r--r--   3 b2c_runtime hadoop  524188832 2017-01-13 08:47 /flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.1484319618440.snappy
> -rw-r--r--   3 b2c_runtime hadoop  525303001 2017-01-13 08:53 /flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.1484319618441.snappy
> -rw-r--r--   3 b2c_runtime hadoop  525606532 2017-01-13 08:59 /flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.1484319618442.snappy
> -rw-r--r--   3 b2c_runtime hadoop    4486982 2017-01-13 08:59 /flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.1484319618443.snappy.tmp
> Found 11 items
> -rw-r--r--   3 b2c_runtime hadoop  525207364 2017-01-13 09:06 /flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.1484323216987.snappy
> -rw-r--r--   3 b2c_runtime hadoop  526105891 2017-01-13 09:12 /flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.1484323216988.snappy
> -rw-r--r--   3 b2c_runtime hadoop  526426735 2017-01-13 09:18 /flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.1484323216989.snappy
> -rw-r--r--   3 b2c_runtime hadoop  525298099 2017-01-13 09:24 /flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.1484323216990.snappy
> -rw-r--r--   3 b2c_runtime hadoop  525282945 2017-01-13 09:30 /flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.1484323216991.snappy
> -rw-r--r--   3 b2c_runtime hadoop  523921005 2017-01-13 09:36 /flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.1484323216992.snappy
> -rw-r--r--   3 b2c_runtime hadoop  524827705 2017-01-13 09:42 /flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.1484323216993.snappy
> -rw-r--r--   3 b2c_runtime hadoop  524203463 2017-01-13 09:47 /flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.1484323216994.snappy
> -rw-r--r--   3 b2c_runtime hadoop  524678485 2017-01-13 09:53 /flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.1484323216995.snappy
> -rw-r--r--   3 b2c_runtime hadoop  524598220 2017-01-13 09:59 /flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.1484323216996.snappy
> -rw-r--r--   3 b2c_runtime hadoop    3877959 2017-01-13 09:59 /flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.1484323216997.snappy.tmp
> Found 10 items
> -rw-r--r--   3 b2c_runtime hadoop  523000460 2017-01-13 10:06 /flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.1484326813831.snappy
> -rw-r--r--   3 b2c_runtime hadoop  523455154 2017-01-13 10:12 /flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.1484326813832.snappy
> -rw-r--r--   3 b2c_runtime hadoop  525465618 2017-01-13 10:18 /flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.1484326813833.snappy
> -rw-r--r--   3 b2c_runtime hadoop  524630955 2017-01-13 10:24 /flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.1484326813834.snappy
> -rw-r--r--   3 b2c_runtime hadoop  527780298 2017-01-13 10:30 /flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.1484326813835.snappy
> -rw-r--r--   3 b2c_runtime hadoop  526565562 2017-01-13 10:37 /flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.1484326813836.snappy
> -rw-r--r--   3 b2c_runtime hadoop  524936336 2017-01-13 10:43 /flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.1484326813837.snappy
> -rw-r--r--   3 b2c_runtime hadoop  524565610 2017-01-13 10:49 /flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.1484326813838.snappy
> -rw-r--r--   3 b2c_runtime hadoop  524276950 2017-01-13 10:55 /flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.1484326813839.snappy
> -rw-r--r--   3 b2c_runtime hadoop     654810 2017-01-13 10:55 /flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.1484326813840.snappy.tmp
> Found 11 items
> -rw-r--r--   3 b2c_runtime hadoop  524174553 2017-01-13 11:06 /flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.1484330415712.snappy
> -rw-r--r--   3 b2c_runtime hadoop  524127864 2017-01-13 11:12 /flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.1484330415713.snappy
> -rw-r--r--   3 b2c_runtime hadoop  524778919 2017-01-13 11:18 /flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.1484330415714.snappy
> -rw-r--r--   3 b2c_runtime hadoop  524851182 2017-01-13 11:24 /flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.1484330415715.snappy
> -rw-r--r--   3 b2c_runtime hadoop  525156750 2017-01-13 11:30 /flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.1484330415716.snappy
> -rw-r--r--   3 b2c_runtime hadoop  525334538 2017-01-13 11:35 /flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.1484330415717.snappy
> -rw-r--r--   3 b2c_runtime hadoop  527346578 2017-01-13 11:41 /flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.1484330415718.snappy
> -rw-r--r--   3 b2c_runtime hadoop  525592734 2017-01-13 11:47 /flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.1484330415719.snappy
> -rw-r--r--   3 b2c_runtime hadoop  525502291 2017-01-13 11:53 /flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.1484330415720.snappy
> -rw-r--r--   3 b2c_runtime hadoop  523135186 2017-01-13 11:58 /flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.1484330415721.snappy
> -rw-r--r--   3 b2c_runtime hadoop    9967141 2017-01-13 11:58 /flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.1484330415722.snappy.tmp
> Found 7 items
> -rw-r--r--   3 b2c_runtime hadoop  520881970 2017-01-13 12:05 /flumedata/processed/first-pass-stream/2017/01/13/12-00/flumeload100-log.1484334016849.snappy
> -rw-r--r--   3 b2c_runtime hadoop  522340745 2017-01-13 12:11 /flumedata/processed/first-pass-stream/2017/01/13/12-00/flumeload100-log.1484334016850.snappy
> -rw-r--r--   3 b2c_runtime hadoop  524156495 2017-01-13 12:17 /flumedata/processed/first-pass-stream/2017/01/13/12-00/flumeload100-log.1484334016851.snappy
> -rw-r--r--   3 b2c_runtime hadoop  523482390 2017-01-13 12:23 /flumedata/processed/first-pass-stream/2017/01/13/12-00/flumeload100-log.1484334016852.snappy
> -rw-r--r--   3 b2c_runtime hadoop  524096591 2017-01-13 12:29 /flumedata/processed/first-pass-stream/2017/01/13/12-00/flumeload100-log.1484334016853.snappy
> -rw-r--r--   3 b2c_runtime hadoop  523184628 2017-01-13 12:35 /flumedata/processed/first-pass-stream/2017/01/13/12-00/flumeload100-log.1484334016854.snappy
> -rw-r--r--   3 b2c_runtime hadoop   10981218 2017-01-13 12:35 /flumedata/processed/first-pass-stream/2017/01/13/12-00/flumeload100-log.1484334016855.snappy.tmp
> 
> HDFS stat on one of the files (keep in mind the output bucket is based on event time, which is MDT/MST, vs. the stat date in GMT):
>  hadoop fs -stat "%y %n" /flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.1484326813840.snappy.tmp
> 17/01/13 12:57:07 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 2017-01-13 17:55:35 flumeload100-log.1484326813840.snappy.tmp
> 
> Thanks
> Justin
> 
>> On Thu, Jan 12, 2017 at 11:56 PM, Denes Arvay <de...@cloudera.com> wrote:
>> Hi Justin,
>> 
>> Could you please share your config file with us?
>> 
>> Thanks,
>> Denes
>> 
>> 
>>> On Thu, Jan 12, 2017, 20:20 Justin Workman <ju...@gmail.com> wrote:
>>> Sorry for cross-posting to user and dev. I have recently set up a Flume configuration where we are using the regex_extractor interceptor to parse the actual event date from the record flowing through the Flume source, then using that date to build the HDFS sink bucket path. However, it appears that the hdfs.idleTimeout value is not honored in this configuration. It does work when using the timestamp interceptor to build the output path.
>>> 
>>> I have set the hdfs.idleTimeout value for the HDFS sink, but the files are never closed or renamed until I restart or shut down Flume. Our Flume is configured to roll based on size or output path, and the files rename/close/roll fine based on size; however, the last file in each output path is always left with the .tmp extension until we restart Flume. I would expect the file to be renamed and closed if no records are written to it after the idleTimeout is reached.
>>> 
>>> Could I be missing something, or is this a known bug with the regex_extractor interceptor?
>>> 
>>> Thanks
>>> Justin
> 

Re: hdfs.idleTime

Posted by Justin Workman <ju...@gmail.com>.
Absolutely, see below. Just to reiterate: when using the timestamp
interceptor values to build the output path from the timestamp in the Flume
header, things roll correctly. The files also roll just fine based on file
size. However, when using the regex_extractor interceptor to get the actual
event timestamp for the output path, the last file in each directory is
never renamed/closed until Flume is restarted.

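As a side note for anyone debugging a setup like this, the i3 regex_extractor pattern from the config below can be exercised outside Flume. This is a minimal sketch; the sample record is hypothetical (the real event schema is not shown in the thread), and the properties-file double backslashes collapse to single escapes in Python:

```python
import re

# The i3 regex_extractor pattern from the config, with the properties-file
# double backslashes (\\") collapsed to Python's single escapes.
pattern = re.compile(
    r'^.*"entryId":\{"date":"(\d{4})-(\d{2})-(\d{2})T(\d{2}):.*"\}.*$'
)

# Hypothetical sample record; the real event schema is not shown here.
record = '{"entryId":{"date":"2017-01-13T10:55:12Z"},"payload":"..."}'

m = pattern.match(record)
assert m is not None, "regex did not match the sample record"
year, month, day, hour = m.groups()

# These are the headers the interceptor would set (year/month/day/hour),
# and the bucket that hdfs.path would expand to for this event:
print(f"year={year} month={month} day={day} hour={hour}")
print(f"/flumedata/processed/first-pass-stream/{year}/{month}/{day}/{hour}-00")
```

If the pattern fails to match here, the interceptor would leave the year/month/day/hour headers unset and the sink path would not expand as expected.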

*flume-conf.properties*
agent1.sources = fpssKafkaTopic
agent1.channels = fpssHdfsFileChannel
agent1.sinks = fpssHdfsSink

agent1.sources.fpssKafkaTopic.type = org.apache.flume.source.kafka.KafkaSource
agent1.sources.fpssKafkaTopic.zookeeperConnect = zk-host:2181
agent1.sources.fpssKafkaTopic.topic = first-pass-stream-sessionized
agent1.sources.fpssKafkaTopic.groupId = flume-first-pass-stream-sessionized
agent1.sources.fpssKafkaTopic.kafka.auto.offset.reset = smallest
agent1.sources.fpssKafkaTopic.channels = fpssHdfsFileChannel
agent1.sources.fpssKafkaTopic.interceptors = i1 i2 i3
agent1.sources.fpssKafkaTopic.interceptors.i1.type = timestamp
agent1.sources.fpssKafkaTopic.interceptors.i1.preserveExisting = false
agent1.sources.fpssKafkaTopic.interceptors.i2.type = org.apache.flume.interceptor.HostInterceptor$Builder
agent1.sources.fpssKafkaTopic.interceptors.i2.hostHeader = hostname
agent1.sources.fpssKafkaTopic.interceptors.i2.useIP = false
agent1.sources.fpssKafkaTopic.interceptors.i2.preserveExisting = true
agent1.sources.fpssKafkaTopic.interceptors.i3.type = regex_extractor
agent1.sources.fpssKafkaTopic.interceptors.i3.regex = ^.*\\"entryId\\":\\{\\"date\\":\\"(\\d\\d\\d\\d)-(\\d\\d)-(\\d\\d)T(\\d\\d):.*\\"\\}.*$
agent1.sources.fpssKafkaTopic.interceptors.i3.serializers = s1 s2 s3 s4
agent1.sources.fpssKafkaTopic.interceptors.i3.serializers.s1.name = year
agent1.sources.fpssKafkaTopic.interceptors.i3.serializers.s2.name = month
agent1.sources.fpssKafkaTopic.interceptors.i3.serializers.s3.name = day
agent1.sources.fpssKafkaTopic.interceptors.i3.serializers.s4.name = hour
agent1.sources.fpssKafkaTopic.kafka.consumer.timeout.ms = 100

agent1.channels.fpssHdfsFileChannel.type = file
agent1.channels.fpssHdfsFileChannel.checkpointDir = /opt/flume/file-channel/fpss/checkpoint
agent1.channels.fpssHdfsFileChannel.dataDirs = /opt/flume/file-channel/fpss/data

agent1.sinks.fpssHdfsSink.type = hdfs
agent1.sinks.fpssHdfsSink.hdfs.filePrefix = %{hostname}-log
agent1.sinks.fpssHdfsSink.hdfs.inUseSuffix = .tmp
agent1.sinks.fpssHdfsSink.hdfs.path = hdfs://prodcluster/flumedata/processed/first-pass-stream/%{year}/%{month}/%{day}/%{hour}-00
agent1.sinks.fpssHdfsSink.hdfs.kerberosPrincipal = runtime@EXAMPLE.COM
agent1.sinks.fpssHdfsSink.hdfs.kerberosKeytab = <keytab path removed for privacy>
agent1.sinks.fpssHdfsSink.hdfs.rollInterval = 0
agent1.sinks.fpssHdfsSink.hdfs.rollCount = 0
## Account for compression. See FLUME-2128
## My calculation: 512 * 1024 * 1024 * 2.75
agent1.sinks.fpssHdfsSink.hdfs.rollSize = 1476395008
# Close file if idle more than 300 seconds
agent1.sinks.hdfsSink.hdfs.idleTimeout = 300
agent1.sinks.hdfsSink.hdfs.useLocalTimeStamp = true
agent1.sinks.fpssHdfsSink.hdfs.fileType = CompressedStream
agent1.sinks.fpssHdfsSink.hdfs.codeC = snappy
agent1.sinks.fpssHdfsSink.hdfs.writeFormat = Text
agent1.sinks.fpssHdfsSink.channel = fpssHdfsFileChannel
agent1.sinks.fpssHdfsSink.hdfs.batchSize = 10000
agent1.sinks.fpssHdfsSink.hdfs.threadsPoolSize = 20
agent1.sinks.fpssHdfsSink.hdfs.callTimeout = 20000
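
Note that the root cause here (per the later resolution on the user thread) was a sink-name typo: the hdfs.idleTimeout and hdfs.useLocalTimeStamp keys above are set on `agent1.sinks.hdfsSink`, while the only declared sink is `fpssHdfsSink`, so Flume silently ignores those two properties. A rough sketch of a consistency check that flags such undeclared prefixes (the `config` string below is a trimmed, hypothetical excerpt of the file above, not the full config):

```python
import re

# Trimmed, hypothetical excerpt of the posted flume-conf.properties,
# including the mismatched sink-name prefix on the last two keys.
config = """
agent1.sinks = fpssHdfsSink
agent1.sinks.fpssHdfsSink.type = hdfs
agent1.sinks.fpssHdfsSink.hdfs.rollSize = 1476395008
agent1.sinks.hdfsSink.hdfs.idleTimeout = 300
agent1.sinks.hdfsSink.hdfs.useLocalTimeStamp = true
"""

declared = set()       # sink names listed on agent1.sinks
referenced = {}        # sink name -> keys that use that prefix
for line in config.strip().splitlines():
    key, _, value = (p.strip() for p in line.partition("="))
    if key == "agent1.sinks":
        declared.update(value.split())
    else:
        m = re.match(r"agent1\.sinks\.([^.]+)\.", key)
        if m:
            referenced.setdefault(m.group(1), []).append(key)

# Any referenced sink name that was never declared is a likely typo.
for name, keys in referenced.items():
    if name not in declared:
        for k in keys:
            print(f"undeclared sink '{name}' in: {k}")
```

Run against the excerpt above, this flags both `agent1.sinks.hdfsSink.*` keys, which is exactly why the idleTimeout setting appeared to be ignored.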

*HDFS Output Since Midnight (Notice the last file is never closed/renamed)*
 hdfs dfs -ls /flumedata/processed/first-pass-stream/2017/01/13/*/
17/01/13 12:38:52 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 7 items
-rw-r--r--   3 b2c_runtime hadoop  513710580 2017-01-13 00:09
/flumedata/processed/first-pass-stream/2017/01/13/00-00/flumeload100-log.1484290815397.snappy
-rw-r--r--   3 b2c_runtime hadoop  514439844 2017-01-13 00:18
/flumedata/processed/first-pass-stream/2017/01/13/00-00/flumeload100-log.1484290815398.snappy
-rw-r--r--   3 b2c_runtime hadoop  515125962 2017-01-13 00:28
/flumedata/processed/first-pass-stream/2017/01/13/00-00/flumeload100-log.1484290815399.snappy
-rw-r--r--   3 b2c_runtime hadoop  513010837 2017-01-13 00:38
/flumedata/processed/first-pass-stream/2017/01/13/00-00/flumeload100-log.1484290815400.snappy
-rw-r--r--   3 b2c_runtime hadoop  511315467 2017-01-13 00:49
/flumedata/processed/first-pass-stream/2017/01/13/00-00/flumeload100-log.1484290815401.snappy
-rw-r--r--   3 b2c_runtime hadoop  508420966 2017-01-13 00:59
/flumedata/processed/first-pass-stream/2017/01/13/00-00/flumeload100-log.1484290815402.snappy
-rw-r--r--   3 b2c_runtime hadoop    2503353 2017-01-13 00:59
/flumedata/processed/first-pass-stream/2017/01/13/00-00/flumeload100-log.1484290815403.snappy.tmp
Found 6 items
-rw-r--r--   3 b2c_runtime hadoop  509116221 2017-01-13 01:10
/flumedata/processed/first-pass-stream/2017/01/13/01-00/flumeload100-log.1484294415705.snappy
-rw-r--r--   3 b2c_runtime hadoop  507800675 2017-01-13 01:21
/flumedata/processed/first-pass-stream/2017/01/13/01-00/flumeload100-log.1484294415706.snappy
-rw-r--r--   3 b2c_runtime hadoop  504432110 2017-01-13 01:32
/flumedata/processed/first-pass-stream/2017/01/13/01-00/flumeload100-log.1484294415707.snappy
-rw-r--r--   3 b2c_runtime hadoop  501932914 2017-01-13 01:42
/flumedata/processed/first-pass-stream/2017/01/13/01-00/flumeload100-log.1484294415708.snappy
-rw-r--r--   3 b2c_runtime hadoop  498136257 2017-01-13 01:50
/flumedata/processed/first-pass-stream/2017/01/13/01-00/flumeload100-log.1484294415709.snappy
-rw-r--r--   3 b2c_runtime hadoop      60539 2017-01-13 01:50
/flumedata/processed/first-pass-stream/2017/01/13/01-00/flumeload100-log.1484294415710.snappy.tmp
Found 6 items
-rw-r--r--   3 b2c_runtime hadoop  500879399 2017-01-13 02:11
/flumedata/processed/first-pass-stream/2017/01/13/02-00/flumeload100-log.1484298016017.snappy
-rw-r--r--   3 b2c_runtime hadoop  501827071 2017-01-13 02:21
/flumedata/processed/first-pass-stream/2017/01/13/02-00/flumeload100-log.1484298016018.snappy
-rw-r--r--   3 b2c_runtime hadoop  501489101 2017-01-13 02:32
/flumedata/processed/first-pass-stream/2017/01/13/02-00/flumeload100-log.1484298016019.snappy
-rw-r--r--   3 b2c_runtime hadoop  501527838 2017-01-13 02:43
/flumedata/processed/first-pass-stream/2017/01/13/02-00/flumeload100-log.1484298016020.snappy
-rw-r--r--   3 b2c_runtime hadoop  499393977 2017-01-13 02:54
/flumedata/processed/first-pass-stream/2017/01/13/02-00/flumeload100-log.1484298016021.snappy
-rw-r--r--   3 b2c_runtime hadoop    1282327 2017-01-13 02:54
/flumedata/processed/first-pass-stream/2017/01/13/02-00/flumeload100-log.1484298016022.snappy.tmp
Found 6 items
-rw-r--r--   3 b2c_runtime hadoop  501033294 2017-01-13 03:10
/flumedata/processed/first-pass-stream/2017/01/13/03-00/flumeload100-log.1484301615579.snappy
-rw-r--r--   3 b2c_runtime hadoop  500933906 2017-01-13 03:20
/flumedata/processed/first-pass-stream/2017/01/13/03-00/flumeload100-log.1484301615580.snappy
-rw-r--r--   3 b2c_runtime hadoop  505869233 2017-01-13 03:31
/flumedata/processed/first-pass-stream/2017/01/13/03-00/flumeload100-log.1484301615581.snappy
-rw-r--r--   3 b2c_runtime hadoop  502910608 2017-01-13 03:41
/flumedata/processed/first-pass-stream/2017/01/13/03-00/flumeload100-log.1484301615582.snappy
-rw-r--r--   3 b2c_runtime hadoop  499561080 2017-01-13 03:52
/flumedata/processed/first-pass-stream/2017/01/13/03-00/flumeload100-log.1484301615583.snappy
-rw-r--r--   3 b2c_runtime hadoop    3616826 2017-01-13 03:52
/flumedata/processed/first-pass-stream/2017/01/13/03-00/flumeload100-log.1484301615584.snappy.tmp
Found 6 items
-rw-r--r--   3 b2c_runtime hadoop  502243204 2017-01-13 04:11
/flumedata/processed/first-pass-stream/2017/01/13/04-00/flumeload100-log.1484305215893.snappy
-rw-r--r--   3 b2c_runtime hadoop  508966498 2017-01-13 04:22
/flumedata/processed/first-pass-stream/2017/01/13/04-00/flumeload100-log.1484305215894.snappy
-rw-r--r--   3 b2c_runtime hadoop  510972236 2017-01-13 04:34
/flumedata/processed/first-pass-stream/2017/01/13/04-00/flumeload100-log.1484305215895.snappy
-rw-r--r--   3 b2c_runtime hadoop  513225577 2017-01-13 04:46
/flumedata/processed/first-pass-stream/2017/01/13/04-00/flumeload100-log.1484305215896.snappy
-rw-r--r--   3 b2c_runtime hadoop  512743679 2017-01-13 04:57
/flumedata/processed/first-pass-stream/2017/01/13/04-00/flumeload100-log.1484305215897.snappy
-rw-r--r--   3 b2c_runtime hadoop    3888775 2017-01-13 04:57
/flumedata/processed/first-pass-stream/2017/01/13/04-00/flumeload100-log.1484305215898.snappy.tmp
Found 7 items
-rw-r--r--   3 b2c_runtime hadoop  515832251 2017-01-13 05:11
/flumedata/processed/first-pass-stream/2017/01/13/05-00/flumeload100-log.1484308811983.snappy
-rw-r--r--   3 b2c_runtime hadoop  518077964 2017-01-13 05:20
/flumedata/processed/first-pass-stream/2017/01/13/05-00/flumeload100-log.1484308811984.snappy
-rw-r--r--   3 b2c_runtime hadoop  519490676 2017-01-13 05:29
/flumedata/processed/first-pass-stream/2017/01/13/05-00/flumeload100-log.1484308811985.snappy
-rw-r--r--   3 b2c_runtime hadoop  519105563 2017-01-13 05:37
/flumedata/processed/first-pass-stream/2017/01/13/05-00/flumeload100-log.1484308811986.snappy
-rw-r--r--   3 b2c_runtime hadoop  518672209 2017-01-13 05:46
/flumedata/processed/first-pass-stream/2017/01/13/05-00/flumeload100-log.1484308811987.snappy
-rw-r--r--   3 b2c_runtime hadoop  520019853 2017-01-13 05:53
/flumedata/processed/first-pass-stream/2017/01/13/05-00/flumeload100-log.1484308811988.snappy
-rw-r--r--   3 b2c_runtime hadoop    1574211 2017-01-13 05:53
/flumedata/processed/first-pass-stream/2017/01/13/05-00/flumeload100-log.1484308811989.snappy.tmp
Found 9 items
-rw-r--r--   3 b2c_runtime hadoop  521428204 2017-01-13 06:07
/flumedata/processed/first-pass-stream/2017/01/13/06-00/flumeload100-log.1484312413743.snappy
-rw-r--r--   3 b2c_runtime hadoop  519885769 2017-01-13 06:15
/flumedata/processed/first-pass-stream/2017/01/13/06-00/flumeload100-log.1484312413744.snappy
-rw-r--r--   3 b2c_runtime hadoop  519050891 2017-01-13 06:21
/flumedata/processed/first-pass-stream/2017/01/13/06-00/flumeload100-log.1484312413745.snappy
-rw-r--r--   3 b2c_runtime hadoop  520691322 2017-01-13 06:29
/flumedata/processed/first-pass-stream/2017/01/13/06-00/flumeload100-log.1484312413746.snappy
-rw-r--r--   3 b2c_runtime hadoop  520902319 2017-01-13 06:36
/flumedata/processed/first-pass-stream/2017/01/13/06-00/flumeload100-log.1484312413747.snappy
-rw-r--r--   3 b2c_runtime hadoop  520831873 2017-01-13 06:42
/flumedata/processed/first-pass-stream/2017/01/13/06-00/flumeload100-log.1484312413748.snappy
-rw-r--r--   3 b2c_runtime hadoop  519785647 2017-01-13 06:49
/flumedata/processed/first-pass-stream/2017/01/13/06-00/flumeload100-log.1484312413749.snappy
-rw-r--r--   3 b2c_runtime hadoop  520590143 2017-01-13 06:55
/flumedata/processed/first-pass-stream/2017/01/13/06-00/flumeload100-log.1484312413750.snappy
-rw-r--r--   3 b2c_runtime hadoop    4621367 2017-01-13 06:55
/flumedata/processed/first-pass-stream/2017/01/13/06-00/flumeload100-log.1484312413751.snappy.tmp
Found 11 items
-rw-r--r--   3 b2c_runtime hadoop  522623760 2017-01-13 07:06
/flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.1484316015214.snappy
-rw-r--r--   3 b2c_runtime hadoop  523065112 2017-01-13 07:12
/flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.1484316015215.snappy
-rw-r--r--   3 b2c_runtime hadoop  523445533 2017-01-13 07:18
/flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.1484316015216.snappy
-rw-r--r--   3 b2c_runtime hadoop  523084945 2017-01-13 07:24
/flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.1484316015217.snappy
-rw-r--r--   3 b2c_runtime hadoop  524283976 2017-01-13 07:30
/flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.1484316015218.snappy
-rw-r--r--   3 b2c_runtime hadoop  523923379 2017-01-13 07:36
/flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.1484316015219.snappy
-rw-r--r--   3 b2c_runtime hadoop  523910723 2017-01-13 07:42
/flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.1484316015220.snappy
-rw-r--r--   3 b2c_runtime hadoop  524266095 2017-01-13 07:47
/flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.1484316015221.snappy
-rw-r--r--   3 b2c_runtime hadoop  523002505 2017-01-13 07:53
/flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.1484316015222.snappy
-rw-r--r--   3 b2c_runtime hadoop  520706211 2017-01-13 07:58
/flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.1484316015223.snappy
-rw-r--r--   3 b2c_runtime hadoop    8051588 2017-01-13 07:58
/flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.1484316015224.snappy.tmp
Found 11 items
-rw-r--r--   3 b2c_runtime hadoop  520528155 2017-01-13 08:05
/flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.1484319618433.snappy
-rw-r--r--   3 b2c_runtime hadoop  521761390 2017-01-13 08:11
/flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.1484319618434.snappy
-rw-r--r--   3 b2c_runtime hadoop  522548272 2017-01-13 08:16
/flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.1484319618435.snappy
-rw-r--r--   3 b2c_runtime hadoop  522616117 2017-01-13 08:22
/flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.1484319618436.snappy
-rw-r--r--   3 b2c_runtime hadoop  525953759 2017-01-13 08:28
/flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.1484319618437.snappy
-rw-r--r--   3 b2c_runtime hadoop  524475009 2017-01-13 08:34
/flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.1484319618438.snappy
-rw-r--r--   3 b2c_runtime hadoop  523995339 2017-01-13 08:40
/flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.1484319618439.snappy
-rw-r--r--   3 b2c_runtime hadoop  524188832 2017-01-13 08:47
/flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.1484319618440.snappy
-rw-r--r--   3 b2c_runtime hadoop  525303001 2017-01-13 08:53
/flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.1484319618441.snappy
-rw-r--r--   3 b2c_runtime hadoop  525606532 2017-01-13 08:59
/flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.1484319618442.snappy
-rw-r--r--   3 b2c_runtime hadoop    4486982 2017-01-13 08:59
/flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.1484319618443.snappy.tmp
Found 11 items
-rw-r--r--   3 b2c_runtime hadoop  525207364 2017-01-13 09:06
/flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.1484323216987.snappy
-rw-r--r--   3 b2c_runtime hadoop  526105891 2017-01-13 09:12
/flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.1484323216988.snappy
-rw-r--r--   3 b2c_runtime hadoop  526426735 2017-01-13 09:18
/flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.1484323216989.snappy
-rw-r--r--   3 b2c_runtime hadoop  525298099 2017-01-13 09:24
/flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.1484323216990.snappy
-rw-r--r--   3 b2c_runtime hadoop  525282945 2017-01-13 09:30
/flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.1484323216991.snappy
-rw-r--r--   3 b2c_runtime hadoop  523921005 2017-01-13 09:36
/flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.1484323216992.snappy
-rw-r--r--   3 b2c_runtime hadoop  524827705 2017-01-13 09:42
/flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.1484323216993.snappy
-rw-r--r--   3 b2c_runtime hadoop  524203463 2017-01-13 09:47
/flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.1484323216994.snappy
-rw-r--r--   3 b2c_runtime hadoop  524678485 2017-01-13 09:53
/flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.1484323216995.snappy
-rw-r--r--   3 b2c_runtime hadoop  524598220 2017-01-13 09:59
/flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.1484323216996.snappy
-rw-r--r--   3 b2c_runtime hadoop    3877959 2017-01-13 09:59
/flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.1484323216997.snappy.tmp
Found 10 items
-rw-r--r--   3 b2c_runtime hadoop  523000460 2017-01-13 10:06
/flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.1484326813831.snappy
-rw-r--r--   3 b2c_runtime hadoop  523455154 2017-01-13 10:12
/flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.1484326813832.snappy
-rw-r--r--   3 b2c_runtime hadoop  525465618 2017-01-13 10:18
/flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.1484326813833.snappy
-rw-r--r--   3 b2c_runtime hadoop  524630955 2017-01-13 10:24
/flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.1484326813834.snappy
-rw-r--r--   3 b2c_runtime hadoop  527780298 2017-01-13 10:30
/flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.1484326813835.snappy
-rw-r--r--   3 b2c_runtime hadoop  526565562 2017-01-13 10:37
/flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.1484326813836.snappy
-rw-r--r--   3 b2c_runtime hadoop  524936336 2017-01-13 10:43
/flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.1484326813837.snappy
-rw-r--r--   3 b2c_runtime hadoop  524565610 2017-01-13 10:49
/flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.1484326813838.snappy
-rw-r--r--   3 b2c_runtime hadoop  524276950 2017-01-13 10:55
/flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.1484326813839.snappy
-rw-r--r--   3 b2c_runtime hadoop     654810 2017-01-13 10:55
/flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.1484326813840.snappy.tmp
Found 11 items
-rw-r--r--   3 b2c_runtime hadoop  524174553 2017-01-13 11:06
/flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.1484330415712.snappy
-rw-r--r--   3 b2c_runtime hadoop  524127864 2017-01-13 11:12
/flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.1484330415713.snappy
-rw-r--r--   3 b2c_runtime hadoop  524778919 2017-01-13 11:18
/flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.1484330415714.snappy
-rw-r--r--   3 b2c_runtime hadoop  524851182 2017-01-13 11:24
/flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.1484330415715.snappy
-rw-r--r--   3 b2c_runtime hadoop  525156750 2017-01-13 11:30
/flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.1484330415716.snappy
-rw-r--r--   3 b2c_runtime hadoop  525334538 2017-01-13 11:35
/flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.1484330415717.snappy
-rw-r--r--   3 b2c_runtime hadoop  527346578 2017-01-13 11:41
/flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.1484330415718.snappy
-rw-r--r--   3 b2c_runtime hadoop  525592734 2017-01-13 11:47
/flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.1484330415719.snappy
-rw-r--r--   3 b2c_runtime hadoop  525502291 2017-01-13 11:53
/flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.1484330415720.snappy
-rw-r--r--   3 b2c_runtime hadoop  523135186 2017-01-13 11:58
/flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.1484330415721.snappy
-rw-r--r--   3 b2c_runtime hadoop    9967141 2017-01-13 11:58
/flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.1484330415722.snappy.tmp
Found 7 items
-rw-r--r--   3 b2c_runtime hadoop  520881970 2017-01-13 12:05
/flumedata/processed/first-pass-stream/2017/01/13/12-00/flumeload100-log.1484334016849.snappy
-rw-r--r--   3 b2c_runtime hadoop  522340745 2017-01-13 12:11
/flumedata/processed/first-pass-stream/2017/01/13/12-00/flumeload100-log.1484334016850.snappy
-rw-r--r--   3 b2c_runtime hadoop  524156495 2017-01-13 12:17
/flumedata/processed/first-pass-stream/2017/01/13/12-00/flumeload100-log.1484334016851.snappy
-rw-r--r--   3 b2c_runtime hadoop  523482390 2017-01-13 12:23
/flumedata/processed/first-pass-stream/2017/01/13/12-00/flumeload100-log.1484334016852.snappy
-rw-r--r--   3 b2c_runtime hadoop  524096591 2017-01-13 12:29
/flumedata/processed/first-pass-stream/2017/01/13/12-00/flumeload100-log.1484334016853.snappy
-rw-r--r--   3 b2c_runtime hadoop  523184628 2017-01-13 12:35
/flumedata/processed/first-pass-stream/2017/01/13/12-00/flumeload100-log.1484334016854.snappy
-rw-r--r--   3 b2c_runtime hadoop   10981218 2017-01-13 12:35
/flumedata/processed/first-pass-stream/2017/01/13/12-00/flumeload100-log.1484334016855.snappy.tmp

*HDFS stat on one of the files (keep in mind the output bucket is based on
event time, which is MDT/MST, vs. the stat date, which is GMT)*
 hadoop fs -stat "%y %n" /flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.1484326813840.snappy.tmp
17/01/13 12:57:07 WARN util.NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
2017-01-13 17:55:35 flumeload100-log.1484326813840.snappy.tmp

Thanks
Justin

On Thu, Jan 12, 2017 at 11:56 PM, Denes Arvay <de...@cloudera.com> wrote:

> Hi Justin,
>
> Could you please share your config file with us?
>
> Thanks,
> Denes
>

Re: hdfs.idleTime

Posted by Denes Arvay <de...@cloudera.com>.
Hi Justin,

Could you please share your config file with us?

Thanks,
Denes


Re: hdfs.idleTime

Posted by Justin Workman <ju...@gmail.com>.
More details

Flume 1.6 - Core Apache version.
KafkaSource (0.8.2) -> File Channel -> HDFS Sink (CDH5.5.2).
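For readers finding this thread later: the shape of the setup being discussed (regex_extractor populating headers that the HDFS sink path references, with hdfs.idleTimeout set) looks roughly like the sketch below. All agent, component, and header names, the regex, and the roll sizes are illustrative assumptions, not the actual configuration from this thread; per the user-list follow-up, the real problem turned out to be a typo in the configuration.

```properties
# Illustrative sketch only -- names, regex, and values are assumptions,
# not the actual config from this thread.
agent.sources = kafka-src
agent.channels = file-ch
agent.sinks = hdfs-sink

# regex_extractor pulls the event date out of the record body and
# stores the capture groups in headers the sink path can reference.
agent.sources.kafka-src.interceptors = i1
agent.sources.kafka-src.interceptors.i1.type = regex_extractor
agent.sources.kafka-src.interceptors.i1.regex = ^(\\d{4})-(\\d{2})-(\\d{2})T(\\d{2})
agent.sources.kafka-src.interceptors.i1.serializers = s1 s2 s3 s4
agent.sources.kafka-src.interceptors.i1.serializers.s1.name = year
agent.sources.kafka-src.interceptors.i1.serializers.s2.name = month
agent.sources.kafka-src.interceptors.i1.serializers.s3.name = day
agent.sources.kafka-src.interceptors.i1.serializers.s4.name = hour

# HDFS sink buckets on the extracted headers (event time) instead of
# the timestamp header, rolls on size only, and should close files
# that receive no events for 60 seconds.
agent.sinks.hdfs-sink.type = hdfs
agent.sinks.hdfs-sink.channel = file-ch
agent.sinks.hdfs-sink.hdfs.path = /flumedata/processed/first-pass-stream/%{year}/%{month}/%{day}/%{hour}-00
agent.sinks.hdfs-sink.hdfs.idleTimeout = 60
agent.sinks.hdfs-sink.hdfs.rollSize = 524288000
agent.sinks.hdfs-sink.hdfs.rollCount = 0
agent.sinks.hdfs-sink.hdfs.rollInterval = 0
agent.sinks.hdfs-sink.hdfs.codeC = snappy
agent.sinks.hdfs-sink.hdfs.fileType = CompressedStream
```

With a config of this shape, an idle BucketWriter should log "Closing idle bucketWriter" and rename the .tmp file once the timeout elapses; if that log line never appears, the idleTimeout setting (or its property name) is worth double-checking.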
