Posted to dev@flume.apache.org by Justin Workman <ju...@gmail.com> on 2017/01/12 19:20:14 UTC
hdfs.idleTime
Sorry for cross-posting to user and dev. I have recently set up a Flume
configuration where we are using the regex_extractor interceptor to parse
the actual event date from the record flowing through the Flume source,
then using that date to build the HDFS sink bucket path. However, it
appears that the hdfs.idleTimeout value is not honored in this
configuration. It does work when using the timestamp interceptor to build
the output path.
I have set the hdfs.idleTimeout value for the HDFS sink, but the files are
never closed or renamed until I restart or shut down Flume. Our Flume is
configured to roll based on size or output path, and the files
rename/close/roll fine based on size; however, the last file in each output
path is always left with the .tmp extension until we restart Flume. I would
expect the file to be renamed and closed once no records have been
written to it for the idleTimeout period.
Could I be missing something, or is this a known bug with the
regex_extractor interceptor?
Thanks
Justin
Re: hdfs.idleTime
Posted by Justin Workman <ju...@gmail.com>.
This was posted and resolved on the user thread. A typo in my configuration was the issue.
Thanks
Justin
Sent from my iPhone
> On Jan 17, 2017, at 12:42 AM, Tristan Stevens <tr...@cloudera.com> wrote:
>
> Hi Justin,
> Please can you post your agent config and also any HDFS logs? Ideally you should be seeing INFO logs as follows: “Closing Idle Bucketwriter”.
>
> Tristan
>
> Tristan Stevens
> Senior Solutions Architect
> Cloudera, Inc. | www.cloudera.com
> m +44(0)7808 986422 | tristan@cloudera.com
>
> <hadoop10.png> Celebrating a decade of community accomplishments
> cloudera.com/hadoop10
> #hadoop10
>
>> On 12 January 2017 at 19:23:18, Justin Workman (justinjworkman@gmail.com) wrote:
>>
>> More details
>>
>> Flume 1.6 - Core Apache version.
>> KafkaSource (0.8.2) -> File Channel -> HDFS Sink (CDH5.5.2).
>>
>> On Thu, Jan 12, 2017 at 12:20 PM, Justin Workman <ju...@gmail.com>
>> wrote:
>>
>> > Sorry for cross-posting to user and dev. I have recently set up a Flume
>> > configuration where we are using the regex_extractor interceptor to parse
>> > the actual event date from the record flowing through the Flume source,
>> > then using that date to build the HDFS sink bucket path. However, it
>> > appears that the hdfs.idleTimeout value is not honored in this
>> > configuration. It does work when using the timestamp interceptor to build
>> > the output path.
>> >
>> > I have set the hdfs.idleTimeout value for the HDFS sink, but the files are
>> > never closed or renamed until I restart or shut down Flume. Our Flume is
>> > configured to roll based on size or output path, and the files
>> > rename/close/roll fine based on size; however, the last file in each output
>> > path is always left with the .tmp extension until we restart Flume. I would
>> > expect the file to be renamed and closed once no records have been
>> > written to it for the idleTimeout period.
>> >
>> > Could I be missing something, or is this a known bug with the
>> > regex_extractor interceptor?
>> >
>> > Thanks
>> > Justin
>> >
Re: hdfs.idleTime
Posted by Tristan Stevens <tr...@cloudera.com>.
Hi Justin,
Please can you post your agent config and also any HDFS logs? Ideally you should be seeing INFO logs as follows: “Closing Idle Bucketwriter”.
Tristan
Tristan Stevens
Senior Solutions Architect
Cloudera, Inc. | www.cloudera.com
m +44(0)7808 986422 | tristan@cloudera.com
Celebrating a decade of community accomplishments
cloudera.com/hadoop10
#hadoop10
On 12 January 2017 at 19:23:18, Justin Workman (justinjworkman@gmail.com) wrote:
More details
Flume 1.6 - Core Apache version.
KafkaSource (0.8.2) -> File Channel -> HDFS Sink (CDH5.5.2).
On Thu, Jan 12, 2017 at 12:20 PM, Justin Workman <ju...@gmail.com>
wrote:
> Sorry for cross-posting to user and dev. I have recently set up a Flume
> configuration where we are using the regex_extractor interceptor to parse
> the actual event date from the record flowing through the Flume source,
> then using that date to build the HDFS sink bucket path. However, it
> appears that the hdfs.idleTimeout value is not honored in this
> configuration. It does work when using the timestamp interceptor to build
> the output path.
>
> I have set the hdfs.idleTimeout value for the HDFS sink, but the files are
> never closed or renamed until I restart or shut down Flume. Our Flume is
> configured to roll based on size or output path, and the files
> rename/close/roll fine based on size; however, the last file in each output
> path is always left with the .tmp extension until we restart Flume. I would
> expect the file to be renamed and closed once no records have been
> written to it for the idleTimeout period.
>
> Could I be missing something, or is this a known bug with the
> regex_extractor interceptor?
>
> Thanks
> Justin
>
Re: hdfs.idleTime
Posted by Justin Workman <ju...@gmail.com>.
More details
Flume 1.6 - Core Apache version.
KafkaSource (0.8.2) -> File Channel -> HDFS Sink (CDH5.5.2).
On Thu, Jan 12, 2017 at 12:20 PM, Justin Workman <ju...@gmail.com>
wrote:
> Sorry for cross-posting to user and dev. I have recently set up a Flume
> configuration where we are using the regex_extractor interceptor to parse
> the actual event date from the record flowing through the Flume source,
> then using that date to build the HDFS sink bucket path. However, it
> appears that the hdfs.idleTimeout value is not honored in this
> configuration. It does work when using the timestamp interceptor to build
> the output path.
>
> I have set the hdfs.idleTimeout value for the HDFS sink, but the files are
> never closed or renamed until I restart or shut down Flume. Our Flume is
> configured to roll based on size or output path, and the files
> rename/close/roll fine based on size; however, the last file in each output
> path is always left with the .tmp extension until we restart Flume. I would
> expect the file to be renamed and closed once no records have been
> written to it for the idleTimeout period.
>
> Could I be missing something, or is this a known bug with the
> regex_extractor interceptor?
>
> Thanks
> Justin
>
Re: hdfs.idleTime
Posted by Justin Workman <ju...@gmail.com>.
Sorry for wasting anyone's time. In reviewing my configuration, I found a
typo in the hdfs.idleTimeout configuration.
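For readers scanning the thread, the likely culprit is visible in the agent config quoted below: the idleTimeout and useLocalTimeStamp lines reference a sink named hdfsSink, while every other sink property uses the declared sink name fpssHdfsSink, so Flume would silently ignore those two settings. A corrected sketch, assuming that is the typo Justin means:

```properties
# As posted -- these two properties name a sink that was never declared,
# so the idle timeout was never applied:
#   agent1.sinks.hdfsSink.hdfs.idleTimeout = 300
#   agent1.sinks.hdfsSink.hdfs.useLocalTimeStamp = true
# Corrected to match the sink declared as agent1.sinks = fpssHdfsSink:
agent1.sinks.fpssHdfsSink.hdfs.idleTimeout = 300
agent1.sinks.fpssHdfsSink.hdfs.useLocalTimeStamp = true
```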
On Fri, Jan 13, 2017 at 2:14 PM, Justin Workman <ju...@gmail.com>
wrote:
> I'll try debug again. The output/regex seems to be fine, but I never see
> a call to close/rename the last files in each directory until Flume shuts
> down or restarts.
>
> I would expect to see this call when the idleTimeout value is reached.
>
> Sent from my iPhone
>
> On Jan 13, 2017, at 2:05 PM, iain wright <ia...@gmail.com> wrote:
>
> Might be worth trying the debug output (I forget exact sink name) to just
> log the headers being attached to events after the interceptor to validate
> the regex is working correctly, and for all events.
>
> I set up this exact config at a previous company, so I know it works.
>
> I also remember needing to escape the regex in an odd way due to how Java
> was loading/parsing the config.
>
> Best,
> Iain
>
> Sent from my iPhone
>
> On Jan 13, 2017, at 12:00 PM, Justin Workman <ju...@gmail.com>
> wrote:
>
> Absolutely, see below. Just to reiterate: when using the timestamp
> interceptor values to build the output path from the timestamp in the Flume
> header, things roll correctly. The files also roll just fine based on file
> size. However, when using the regex_extractor interceptor to get the actual
> event's timestamp for the output path, the last file in each directory is
> never renamed/closed until Flume is restarted.
>
>
> *flume-conf.properties*
> agent1.sources = fpssKafkaTopic
> agent1.channels = fpssHdfsFileChannel
> agent1.sinks = fpssHdfsSink
>
> agent1.sources.fpssKafkaTopic.type = org.apache.flume.source.kafka.KafkaSource
> agent1.sources.fpssKafkaTopic.zookeeperConnect = zk-host:2181
> agent1.sources.fpssKafkaTopic.topic = first-pass-stream-sessionized
> agent1.sources.fpssKafkaTopic.groupId = flume-first-pass-stream-sessionized
> agent1.sources.fpssKafkaTopic.kafka.auto.offset.reset = smallest
> agent1.sources.fpssKafkaTopic.channels = fpssHdfsFileChannel
> agent1.sources.fpssKafkaTopic.interceptors = i1 i2 i3
> agent1.sources.fpssKafkaTopic.interceptors.i1.type = timestamp
> agent1.sources.fpssKafkaTopic.interceptors.i1.preserveExisting = false
> agent1.sources.fpssKafkaTopic.interceptors.i2.type = org.apache.flume.interceptor.HostInterceptor$Builder
> agent1.sources.fpssKafkaTopic.interceptors.i2.hostHeader = hostname
> agent1.sources.fpssKafkaTopic.interceptors.i2.useIP = false
> agent1.sources.fpssKafkaTopic.interceptors.i2.preserveExisting = true
> agent1.sources.fpssKafkaTopic.interceptors.i3.type = regex_extractor
> agent1.sources.fpssKafkaTopic.interceptors.i3.regex = ^.*\\"entryId\\":\\{\\"date\\":\\"(\\d\\d\\d\\d)-(\\d\\d)-(\\d\\d)T(\\d\\d):.*\\"\\}.*$
> agent1.sources.fpssKafkaTopic.interceptors.i3.serializers = s1 s2 s3 s4
> agent1.sources.fpssKafkaTopic.interceptors.i3.serializers.s1.name = year
> agent1.sources.fpssKafkaTopic.interceptors.i3.serializers.s2.name = month
> agent1.sources.fpssKafkaTopic.interceptors.i3.serializers.s3.name = day
> agent1.sources.fpssKafkaTopic.interceptors.i3.serializers.s4.name = hour
> agent1.sources.fpssKafkaTopic.kafka.consumer.timeout.ms = 100
>
> agent1.channels.fpssHdfsFileChannel.type = file
> agent1.channels.fpssHdfsFileChannel.checkpointDir = /opt/flume/file-channel/fpss/checkpoint
> agent1.channels.fpssHdfsFileChannel.dataDirs = /opt/flume/file-channel/fpss/data
>
> agent1.sinks.fpssHdfsSink.type = hdfs
> agent1.sinks.fpssHdfsSink.hdfs.filePrefix = %{hostname}-log
> agent1.sinks.fpssHdfsSink.hdfs.inUseSuffix = .tmp
> agent1.sinks.fpssHdfsSink.hdfs.path = hdfs://prodcluster/flumedata/processed/first-pass-stream/%{year}/%{month}/%{day}/%{hour}-00
> agent1.sinks.fpssHdfsSink.hdfs.kerberosPrincipal = runtime@EXAMPLE.COM
> agent1.sinks.fpssHdfsSink.hdfs.kerberosKeytab = <keytab path removed for privacy>
> agent1.sinks.fpssHdfsSink.hdfs.rollInterval = 0
> agent1.sinks.fpssHdfsSink.hdfs.rollCount = 0
> ## Account for compression. See FLUME-2128
> ## My calculation: 512 * 1024 * 1024 * 2.75
> agent1.sinks.fpssHdfsSink.hdfs.rollSize = 1476395008
> # Close file if idle more than 300 seconds
> agent1.sinks.hdfsSink.hdfs.idleTimeout = 300
> agent1.sinks.hdfsSink.hdfs.useLocalTimeStamp = true
> agent1.sinks.fpssHdfsSink.hdfs.fileType = CompressedStream
> agent1.sinks.fpssHdfsSink.hdfs.codeC = snappy
> agent1.sinks.fpssHdfsSink.hdfs.writeFormat = Text
> agent1.sinks.fpssHdfsSink.channel = fpssHdfsFileChannel
> agent1.sinks.fpssHdfsSink.hdfs.batchSize = 10000
> agent1.sinks.fpssHdfsSink.hdfs.threadsPoolSize = 20
> agent1.sinks.fpssHdfsSink.hdfs.callTimeout = 20000
>
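The i3 regex above, once the properties-file backslash doubling is stripped, can be sanity-checked outside Flume. A sketch in Python (the sample record is hypothetical, shaped like the entryId JSON the regex expects):

```python
import re

# The properties file doubles each backslash; after loading, the
# pattern Flume actually compiles looks like this:
pattern = re.compile(
    r'^.*\"entryId\":\{\"date\":\"(\d\d\d\d)-(\d\d)-(\d\d)T(\d\d):.*\"\}.*$'
)

# Hypothetical event body matching the shape the regex expects.
record = '{"entryId":{"date":"2017-01-13T10:55:35Z"}}'

m = pattern.match(record)
# These capture groups become the year/month/day/hour headers via
# serializers s1..s4, and then the %{year}/%{month}/... path tokens.
year, month, day, hour = m.groups()
print(year, month, day, hour)  # 2017 01 13 10
```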
> *HDFS Output Since Midnight (Notice the last file is never closed/renamed)*
> hdfs dfs -ls /flumedata/processed/first-pass-stream/2017/01/13/*/
> 17/01/13 12:38:52 WARN util.NativeCodeLoader: Unable to load native-hadoop
> library for your platform... using builtin-java classes where applicable
> Found 7 items
> -rw-r--r-- 3 b2c_runtime hadoop 513710580 2017-01-13 00:09
> /flumedata/processed/first-pass-stream/2017/01/13/00-00/flumeload100-log.
> 1484290815397.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 514439844 2017-01-13 00:18
> /flumedata/processed/first-pass-stream/2017/01/13/00-00/flumeload100-log.
> 1484290815398.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 515125962 2017-01-13 00:28
> /flumedata/processed/first-pass-stream/2017/01/13/00-00/flumeload100-log.
> 1484290815399.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 513010837 2017-01-13 00:38
> /flumedata/processed/first-pass-stream/2017/01/13/00-00/flumeload100-log.
> 1484290815400.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 511315467 2017-01-13 00:49
> /flumedata/processed/first-pass-stream/2017/01/13/00-00/flumeload100-log.
> 1484290815401.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 508420966 2017-01-13 00:59
> /flumedata/processed/first-pass-stream/2017/01/13/00-00/flumeload100-log.
> 1484290815402.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 2503353 2017-01-13 00:59
> /flumedata/processed/first-pass-stream/2017/01/13/00-00/flumeload100-log.
> 1484290815403.snappy.tmp
> Found 6 items
> -rw-r--r-- 3 b2c_runtime hadoop 509116221 2017-01-13 01:10
> /flumedata/processed/first-pass-stream/2017/01/13/01-00/flumeload100-log.
> 1484294415705.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 507800675 2017-01-13 01:21
> /flumedata/processed/first-pass-stream/2017/01/13/01-00/flumeload100-log.
> 1484294415706.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 504432110 2017-01-13 01:32
> /flumedata/processed/first-pass-stream/2017/01/13/01-00/flumeload100-log.
> 1484294415707.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 501932914 2017-01-13 01:42
> /flumedata/processed/first-pass-stream/2017/01/13/01-00/flumeload100-log.
> 1484294415708.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 498136257 2017-01-13 01:50
> /flumedata/processed/first-pass-stream/2017/01/13/01-00/flumeload100-log.
> 1484294415709.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 60539 2017-01-13 01:50
> /flumedata/processed/first-pass-stream/2017/01/13/01-00/flumeload100-log.
> 1484294415710.snappy.tmp
> Found 6 items
> -rw-r--r-- 3 b2c_runtime hadoop 500879399 2017-01-13 02:11
> /flumedata/processed/first-pass-stream/2017/01/13/02-00/flumeload100-log.
> 1484298016017.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 501827071 2017-01-13 02:21
> /flumedata/processed/first-pass-stream/2017/01/13/02-00/flumeload100-log.
> 1484298016018.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 501489101 2017-01-13 02:32
> /flumedata/processed/first-pass-stream/2017/01/13/02-00/flumeload100-log.
> 1484298016019.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 501527838 2017-01-13 02:43
> /flumedata/processed/first-pass-stream/2017/01/13/02-00/flumeload100-log.
> 1484298016020.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 499393977 2017-01-13 02:54
> /flumedata/processed/first-pass-stream/2017/01/13/02-00/flumeload100-log.
> 1484298016021.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 1282327 2017-01-13 02:54
> /flumedata/processed/first-pass-stream/2017/01/13/02-00/flumeload100-log.
> 1484298016022.snappy.tmp
> Found 6 items
> -rw-r--r-- 3 b2c_runtime hadoop 501033294 2017-01-13 03:10
> /flumedata/processed/first-pass-stream/2017/01/13/03-00/flumeload100-log.
> 1484301615579.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 500933906 2017-01-13 03:20
> /flumedata/processed/first-pass-stream/2017/01/13/03-00/flumeload100-log.
> 1484301615580.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 505869233 2017-01-13 03:31
> /flumedata/processed/first-pass-stream/2017/01/13/03-00/flumeload100-log.
> 1484301615581.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 502910608 2017-01-13 03:41
> /flumedata/processed/first-pass-stream/2017/01/13/03-00/flumeload100-log.
> 1484301615582.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 499561080 2017-01-13 03:52
> /flumedata/processed/first-pass-stream/2017/01/13/03-00/flumeload100-log.
> 1484301615583.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 3616826 2017-01-13 03:52
> /flumedata/processed/first-pass-stream/2017/01/13/03-00/flumeload100-log.
> 1484301615584.snappy.tmp
> Found 6 items
> -rw-r--r-- 3 b2c_runtime hadoop 502243204 2017-01-13 04:11
> /flumedata/processed/first-pass-stream/2017/01/13/04-00/flumeload100-log.
> 1484305215893.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 508966498 2017-01-13 04:22
> /flumedata/processed/first-pass-stream/2017/01/13/04-00/flumeload100-log.
> 1484305215894.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 510972236 2017-01-13 04:34
> /flumedata/processed/first-pass-stream/2017/01/13/04-00/flumeload100-log.
> 1484305215895.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 513225577 2017-01-13 04:46
> /flumedata/processed/first-pass-stream/2017/01/13/04-00/flumeload100-log.
> 1484305215896.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 512743679 2017-01-13 04:57
> /flumedata/processed/first-pass-stream/2017/01/13/04-00/flumeload100-log.
> 1484305215897.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 3888775 2017-01-13 04:57
> /flumedata/processed/first-pass-stream/2017/01/13/04-00/flumeload100-log.
> 1484305215898.snappy.tmp
> Found 7 items
> -rw-r--r-- 3 b2c_runtime hadoop 515832251 2017-01-13 05:11
> /flumedata/processed/first-pass-stream/2017/01/13/05-00/flumeload100-log.
> 1484308811983.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 518077964 2017-01-13 05:20
> /flumedata/processed/first-pass-stream/2017/01/13/05-00/flumeload100-log.
> 1484308811984.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 519490676 2017-01-13 05:29
> /flumedata/processed/first-pass-stream/2017/01/13/05-00/flumeload100-log.
> 1484308811985.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 519105563 2017-01-13 05:37
> /flumedata/processed/first-pass-stream/2017/01/13/05-00/flumeload100-log.
> 1484308811986.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 518672209 2017-01-13 05:46
> /flumedata/processed/first-pass-stream/2017/01/13/05-00/flumeload100-log.
> 1484308811987.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 520019853 2017-01-13 05:53
> /flumedata/processed/first-pass-stream/2017/01/13/05-00/flumeload100-log.
> 1484308811988.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 1574211 2017-01-13 05:53
> /flumedata/processed/first-pass-stream/2017/01/13/05-00/flumeload100-log.
> 1484308811989.snappy.tmp
> Found 9 items
> -rw-r--r-- 3 b2c_runtime hadoop 521428204 2017-01-13 06:07
> /flumedata/processed/first-pass-stream/2017/01/13/06-00/flumeload100-log.
> 1484312413743.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 519885769 2017-01-13 06:15
> /flumedata/processed/first-pass-stream/2017/01/13/06-00/flumeload100-log.
> 1484312413744.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 519050891 2017-01-13 06:21
> /flumedata/processed/first-pass-stream/2017/01/13/06-00/flumeload100-log.
> 1484312413745.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 520691322 2017-01-13 06:29
> /flumedata/processed/first-pass-stream/2017/01/13/06-00/flumeload100-log.
> 1484312413746.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 520902319 2017-01-13 06:36
> /flumedata/processed/first-pass-stream/2017/01/13/06-00/flumeload100-log.
> 1484312413747.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 520831873 2017-01-13 06:42
> /flumedata/processed/first-pass-stream/2017/01/13/06-00/flumeload100-log.
> 1484312413748.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 519785647 2017-01-13 06:49
> /flumedata/processed/first-pass-stream/2017/01/13/06-00/flumeload100-log.
> 1484312413749.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 520590143 2017-01-13 06:55
> /flumedata/processed/first-pass-stream/2017/01/13/06-00/flumeload100-log.
> 1484312413750.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 4621367 2017-01-13 06:55
> /flumedata/processed/first-pass-stream/2017/01/13/06-00/flumeload100-log.
> 1484312413751.snappy.tmp
> Found 11 items
> -rw-r--r-- 3 b2c_runtime hadoop 522623760 2017-01-13 07:06
> /flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.
> 1484316015214.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 523065112 2017-01-13 07:12
> /flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.
> 1484316015215.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 523445533 2017-01-13 07:18
> /flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.
> 1484316015216.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 523084945 2017-01-13 07:24
> /flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.
> 1484316015217.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 524283976 2017-01-13 07:30
> /flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.
> 1484316015218.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 523923379 2017-01-13 07:36
> /flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.
> 1484316015219.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 523910723 2017-01-13 07:42
> /flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.
> 1484316015220.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 524266095 2017-01-13 07:47
> /flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.
> 1484316015221.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 523002505 2017-01-13 07:53
> /flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.
> 1484316015222.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 520706211 2017-01-13 07:58
> /flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.
> 1484316015223.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 8051588 2017-01-13 07:58
> /flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.
> 1484316015224.snappy.tmp
> Found 11 items
> -rw-r--r-- 3 b2c_runtime hadoop 520528155 2017-01-13 08:05
> /flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.
> 1484319618433.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 521761390 2017-01-13 08:11
> /flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.
> 1484319618434.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 522548272 2017-01-13 08:16
> /flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.
> 1484319618435.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 522616117 2017-01-13 08:22
> /flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.
> 1484319618436.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 525953759 2017-01-13 08:28
> /flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.
> 1484319618437.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 524475009 2017-01-13 08:34
> /flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.
> 1484319618438.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 523995339 2017-01-13 08:40
> /flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.
> 1484319618439.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 524188832 2017-01-13 08:47
> /flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.
> 1484319618440.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 525303001 2017-01-13 08:53
> /flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.
> 1484319618441.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 525606532 2017-01-13 08:59
> /flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.
> 1484319618442.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 4486982 2017-01-13 08:59
> /flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.
> 1484319618443.snappy.tmp
> Found 11 items
> -rw-r--r-- 3 b2c_runtime hadoop 525207364 2017-01-13 09:06
> /flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.
> 1484323216987.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 526105891 2017-01-13 09:12
> /flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.
> 1484323216988.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 526426735 2017-01-13 09:18
> /flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.
> 1484323216989.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 525298099 2017-01-13 09:24
> /flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.
> 1484323216990.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 525282945 2017-01-13 09:30
> /flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.
> 1484323216991.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 523921005 2017-01-13 09:36
> /flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.
> 1484323216992.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 524827705 2017-01-13 09:42
> /flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.
> 1484323216993.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 524203463 2017-01-13 09:47
> /flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.
> 1484323216994.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 524678485 2017-01-13 09:53
> /flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.
> 1484323216995.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 524598220 2017-01-13 09:59
> /flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.
> 1484323216996.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 3877959 2017-01-13 09:59
> /flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.
> 1484323216997.snappy.tmp
> Found 10 items
> -rw-r--r-- 3 b2c_runtime hadoop 523000460 2017-01-13 10:06
> /flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.
> 1484326813831.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 523455154 2017-01-13 10:12
> /flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.
> 1484326813832.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 525465618 2017-01-13 10:18
> /flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.
> 1484326813833.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 524630955 2017-01-13 10:24
> /flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.
> 1484326813834.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 527780298 2017-01-13 10:30
> /flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.
> 1484326813835.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 526565562 2017-01-13 10:37
> /flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.
> 1484326813836.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 524936336 2017-01-13 10:43
> /flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.
> 1484326813837.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 524565610 2017-01-13 10:49
> /flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.
> 1484326813838.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 524276950 2017-01-13 10:55
> /flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.
> 1484326813839.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 654810 2017-01-13 10:55
> /flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.
> 1484326813840.snappy.tmp
> Found 11 items
> -rw-r--r-- 3 b2c_runtime hadoop 524174553 2017-01-13 11:06
> /flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.
> 1484330415712.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 524127864 2017-01-13 11:12
> /flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.
> 1484330415713.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 524778919 2017-01-13 11:18
> /flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.
> 1484330415714.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 524851182 2017-01-13 11:24
> /flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.
> 1484330415715.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 525156750 2017-01-13 11:30
> /flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.
> 1484330415716.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 525334538 2017-01-13 11:35
> /flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.
> 1484330415717.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 527346578 2017-01-13 11:41
> /flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.
> 1484330415718.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 525592734 2017-01-13 11:47
> /flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.
> 1484330415719.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 525502291 2017-01-13 11:53
> /flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.
> 1484330415720.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 523135186 2017-01-13 11:58
> /flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.
> 1484330415721.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 9967141 2017-01-13 11:58
> /flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.
> 1484330415722.snappy.tmp
> Found 7 items
> -rw-r--r-- 3 b2c_runtime hadoop 520881970 2017-01-13 12:05
> /flumedata/processed/first-pass-stream/2017/01/13/12-00/flumeload100-log.
> 1484334016849.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 522340745 2017-01-13 12:11
> /flumedata/processed/first-pass-stream/2017/01/13/12-00/flumeload100-log.
> 1484334016850.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 524156495 2017-01-13 12:17
> /flumedata/processed/first-pass-stream/2017/01/13/12-00/flumeload100-log.
> 1484334016851.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 523482390 2017-01-13 12:23
> /flumedata/processed/first-pass-stream/2017/01/13/12-00/flumeload100-log.
> 1484334016852.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 524096591 2017-01-13 12:29
> /flumedata/processed/first-pass-stream/2017/01/13/12-00/flumeload100-log.
> 1484334016853.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 523184628 2017-01-13 12:35
> /flumedata/processed/first-pass-stream/2017/01/13/12-00/flumeload100-log.
> 1484334016854.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 10981218 2017-01-13 12:35
> /flumedata/processed/first-pass-stream/2017/01/13/12-00/flumeload100-log.
> 1484334016855.snappy.tmp
>
> *HDFS Stat on One of the Files (Keep in mind the output bucket is based on
> event time, which is MDT/MST, vs. the stat date in GMT)*
> hadoop fs -stat "%y %n" /flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.1484326813840.snappy.tmp
> 17/01/13 12:57:07 WARN util.NativeCodeLoader: Unable to load native-hadoop
> library for your platform... using builtin-java classes where applicable
> 2017-01-13 17:55:35 flumeload100-log.1484326813840.snappy.tmp
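The seven-hour gap between the stat time (GMT) and the 10-00 event-time bucket checks out. A quick sketch using Python's zoneinfo (America/Denver is an assumption for Justin's MST/MDT zone):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# hadoop fs -stat reports the modification time in GMT/UTC.
stat_utc = datetime(2017, 1, 13, 17, 55, 35, tzinfo=timezone.utc)

# Converted to US Mountain time (MST in January, UTC-7) it lands in
# the 10-00 hour bucket seen in the listing above.
local = stat_utc.astimezone(ZoneInfo("America/Denver"))
print(local.strftime("%Y-%m-%d %H:%M:%S"))  # 2017-01-13 10:55:35
```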
>
> Thanks
> Justin
>
> On Thu, Jan 12, 2017 at 11:56 PM, Denes Arvay <de...@cloudera.com> wrote:
>
>> Hi Justin,
>>
>> Could you please share your config file with us?
>>
>> Thanks,
>> Denes
>>
>>
>> On Thu, Jan 12, 2017, 20:20 Justin Workman <ju...@gmail.com>
>> wrote:
>>
>>> sorry for cross posting to user and dev. I have recently set up a flume
>>> configuration where we are using the regex_extractor interceptor to parse
>>> the actual event date from the record flowing through the Flume source,
>>> then using that date to build the HDFS sink bucket path. However, it
>>> appears that the hdfs.idleTimeout value is not honored in this
>>> configuration. It does work when using the timestamp interceptor to build
>>> the output path.
>>>
>>> I have set the hdfs.idleTimeout value for the HDFS sink, but the files
>>> are never closed or renamed until I restart or shut down Flume. Our Flume is
>>> configured to roll based on size or output path, and the files
>>> rename/close/roll fine based on size, however the last file in each output
>>> path is always left with the .tmp extension until we restart Flume. I would
>>> expect that the file would be renamed and closed if there are no records
>>> written to this file after the idleTimeout is reached.
>>>
>>> Could I be missing something, or is this a known bug with the
>>> regex_extractor interceptor?
>>>
>>> Thanks
>>> Justin
>>>
>>
>
Re: hdfs.idleTime
Posted by Justin Workman <ju...@gmail.com>.
I'll try debugging again. The output and the regex seem to be fine, but I never see a call to close/rename the last file in each directory until Flume shuts down or restarts.
I would expect to see this call when the idleTimeout value is reached.
Sent from my iPhone
> On Jan 13, 2017, at 2:05 PM, iain wright <ia...@gmail.com> wrote:
>
> Might be worth trying the debug output (I forget the exact sink name) to just log the headers being attached to events after the interceptor, to validate that the regex is working correctly for all events.
>
> I set up this exact config at a previous company, so I know it works.
>
> I also remember needing to escape the regex in an odd way due to how Java was loading/parsing the config.
>
> Best,
> Iain
>
> Sent from my iPhone
>
>> On Jan 13, 2017, at 12:00 PM, Justin Workman <ju...@gmail.com> wrote:
>>
>> Absolutely, see below. Just to reiterate: when using the timestamp interceptor values to build the output path based on the timestamp in the Flume header, things roll correctly. The files also roll just fine based on file size. However, when using the regex_extractor to get the actual event's timestamp to use in the output path, the last file in each directory is never renamed/closed until Flume is restarted.
>>
>>
>> flume-conf.properties
>> agent1.sources = fpssKafkaTopic
>> agent1.channels = fpssHdfsFileChannel
>> agent1.sinks = fpssHdfsSink
>>
>> agent1.sources.fpssKafkaTopic.type = org.apache.flume.source.kafka.KafkaSource
>> agent1.sources.fpssKafkaTopic.zookeeperConnect = zk-host:2181
>> agent1.sources.fpssKafkaTopic.topic = first-pass-stream-sessionized
>> agent1.sources.fpssKafkaTopic.groupId = flume-first-pass-stream-sessionized
>> agent1.sources.fpssKafkaTopic.kafka.auto.offset.reset = smallest
>> agent1.sources.fpssKafkaTopic.channels = fpssHdfsFileChannel
>> agent1.sources.fpssKafkaTopic.interceptors = i1 i2 i3
>> agent1.sources.fpssKafkaTopic.interceptors.i1.type = timestamp
>> agent1.sources.fpssKafkaTopic.interceptors.i1.preserveExisting = false
>> agent1.sources.fpssKafkaTopic.interceptors.i2.type = org.apache.flume.interceptor.HostInterceptor$Builder
>> agent1.sources.fpssKafkaTopic.interceptors.i2.hostHeader = hostname
>> agent1.sources.fpssKafkaTopic.interceptors.i2.useIP= false
>> agent1.sources.fpssKafkaTopic.interceptors.i2.preserveExisting = true
>> agent1.sources.fpssKafkaTopic.interceptors.i3.type = regex_extractor
>> agent1.sources.fpssKafkaTopic.interceptors.i3.regex = ^.*\\"entryId\\":\\{\\"date\\":\\"(\\d\\d\\d\\d)-(\\d\\d)-(\\d\\d)T(\\d\\d):.*\\"\\}.*$
>> agent1.sources.fpssKafkaTopic.interceptors.i3.serializers = s1 s2 s3 s4
>> agent1.sources.fpssKafkaTopic.interceptors.i3.serializers.s1.name = year
>> agent1.sources.fpssKafkaTopic.interceptors.i3.serializers.s2.name = month
>> agent1.sources.fpssKafkaTopic.interceptors.i3.serializers.s3.name = day
>> agent1.sources.fpssKafkaTopic.interceptors.i3.serializers.s4.name = hour
>> agent1.sources.fpssKafkaTopic.kafka.consumer.timeout.ms = 100
>>
>> agent1.channels.fpssHdfsFileChannel.type = file
>> agent1.channels.fpssHdfsFileChannel.checkpointDir = /opt/flume/file-channel/fpss/checkpoint
>> agent1.channels.fpssHdfsFileChannel.dataDirs = /opt/flume/file-channel/fpss/data
>>
>> agent1.sinks.fpssHdfsSink.type = hdfs
>> agent1.sinks.fpssHdfsSink.hdfs.filePrefix = %{hostname}-log
>> agent1.sinks.fpssHdfsSink.hdfs.inUseSuffix = .tmp
>> agent1.sinks.fpssHdfsSink.hdfs.path = hdfs://prodcluster/flumedata/processed/first-pass-stream/%{year}/%{month}/%{day}/%{hour}-00
>> agent1.sinks.fpssHdfsSink.hdfs.kerberosPrincipal = runtime@EXAMPLE.COM
>> agent1.sinks.fpssHdfsSink.hdfs.kerberosKeytab = <keytab path removed for privacy>
>> agent1.sinks.fpssHdfsSink.hdfs.rollInterval = 0
>> agent1.sinks.fpssHdfsSink.hdfs.rollCount = 0
>> ## Account for compression. See flume-2128
>> ## My calculation: 512 * 1024 * 1024 * 2.75
>> agent1.sinks.fpssHdfsSink.hdfs.rollSize = 1476395008
>> # Close file if idle more than 300 seconds
>> agent1.sinks.hdfsSink.hdfs.idleTimeout = 300
>> agent1.sinks.hdfsSink.hdfs.useLocalTimeStamp = true
>> agent1.sinks.fpssHdfsSink.hdfs.fileType = CompressedStream
>> agent1.sinks.fpssHdfsSink.hdfs.codeC = snappy
>> agent1.sinks.fpssHdfsSink.hdfs.writeFormat = Text
>> agent1.sinks.fpssHdfsSink.channel = fpssHdfsFileChannel
>> agent1.sinks.fpssHdfsSink.hdfs.batchSize = 10000
>> agent1.sinks.fpssHdfsSink.hdfs.threadsPoolSize = 20
>> agent1.sinks.fpssHdfsSink.hdfs.callTimeout = 20000
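
[Editor's note: a likely culprit, consistent with the configuration typo Justin later reported as the fix, is that the hdfs.idleTimeout and hdfs.useLocalTimeStamp lines above reference a sink named hdfsSink, while the sink is declared as fpssHdfsSink; mismatched keys are silently ignored, so idleTimeout falls back to its default of 0, which disables idle closing. A corrected sketch of just those two lines:]

```properties
# The sink is declared as fpssHdfsSink, so the property keys must match it.
# Close file if idle more than 300 seconds
agent1.sinks.fpssHdfsSink.hdfs.idleTimeout = 300
agent1.sinks.fpssHdfsSink.hdfs.useLocalTimeStamp = true
```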
>>
>> HDFS Output Since Midnight (Notice the last file is never closed/renamed)
>> hdfs dfs -ls /flumedata/processed/first-pass-stream/2017/01/13/*/
>> 17/01/13 12:38:52 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>> Found 7 items
>> -rw-r--r-- 3 b2c_runtime hadoop 513710580 2017-01-13 00:09 /flumedata/processed/first-pass-stream/2017/01/13/00-00/flumeload100-log.1484290815397.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 514439844 2017-01-13 00:18 /flumedata/processed/first-pass-stream/2017/01/13/00-00/flumeload100-log.1484290815398.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 515125962 2017-01-13 00:28 /flumedata/processed/first-pass-stream/2017/01/13/00-00/flumeload100-log.1484290815399.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 513010837 2017-01-13 00:38 /flumedata/processed/first-pass-stream/2017/01/13/00-00/flumeload100-log.1484290815400.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 511315467 2017-01-13 00:49 /flumedata/processed/first-pass-stream/2017/01/13/00-00/flumeload100-log.1484290815401.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 508420966 2017-01-13 00:59 /flumedata/processed/first-pass-stream/2017/01/13/00-00/flumeload100-log.1484290815402.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 2503353 2017-01-13 00:59 /flumedata/processed/first-pass-stream/2017/01/13/00-00/flumeload100-log.1484290815403.snappy.tmp
>> Found 6 items
>> -rw-r--r-- 3 b2c_runtime hadoop 509116221 2017-01-13 01:10 /flumedata/processed/first-pass-stream/2017/01/13/01-00/flumeload100-log.1484294415705.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 507800675 2017-01-13 01:21 /flumedata/processed/first-pass-stream/2017/01/13/01-00/flumeload100-log.1484294415706.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 504432110 2017-01-13 01:32 /flumedata/processed/first-pass-stream/2017/01/13/01-00/flumeload100-log.1484294415707.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 501932914 2017-01-13 01:42 /flumedata/processed/first-pass-stream/2017/01/13/01-00/flumeload100-log.1484294415708.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 498136257 2017-01-13 01:50 /flumedata/processed/first-pass-stream/2017/01/13/01-00/flumeload100-log.1484294415709.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 60539 2017-01-13 01:50 /flumedata/processed/first-pass-stream/2017/01/13/01-00/flumeload100-log.1484294415710.snappy.tmp
>> Found 6 items
>> -rw-r--r-- 3 b2c_runtime hadoop 500879399 2017-01-13 02:11 /flumedata/processed/first-pass-stream/2017/01/13/02-00/flumeload100-log.1484298016017.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 501827071 2017-01-13 02:21 /flumedata/processed/first-pass-stream/2017/01/13/02-00/flumeload100-log.1484298016018.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 501489101 2017-01-13 02:32 /flumedata/processed/first-pass-stream/2017/01/13/02-00/flumeload100-log.1484298016019.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 501527838 2017-01-13 02:43 /flumedata/processed/first-pass-stream/2017/01/13/02-00/flumeload100-log.1484298016020.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 499393977 2017-01-13 02:54 /flumedata/processed/first-pass-stream/2017/01/13/02-00/flumeload100-log.1484298016021.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 1282327 2017-01-13 02:54 /flumedata/processed/first-pass-stream/2017/01/13/02-00/flumeload100-log.1484298016022.snappy.tmp
>> Found 6 items
>> -rw-r--r-- 3 b2c_runtime hadoop 501033294 2017-01-13 03:10 /flumedata/processed/first-pass-stream/2017/01/13/03-00/flumeload100-log.1484301615579.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 500933906 2017-01-13 03:20 /flumedata/processed/first-pass-stream/2017/01/13/03-00/flumeload100-log.1484301615580.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 505869233 2017-01-13 03:31 /flumedata/processed/first-pass-stream/2017/01/13/03-00/flumeload100-log.1484301615581.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 502910608 2017-01-13 03:41 /flumedata/processed/first-pass-stream/2017/01/13/03-00/flumeload100-log.1484301615582.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 499561080 2017-01-13 03:52 /flumedata/processed/first-pass-stream/2017/01/13/03-00/flumeload100-log.1484301615583.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 3616826 2017-01-13 03:52 /flumedata/processed/first-pass-stream/2017/01/13/03-00/flumeload100-log.1484301615584.snappy.tmp
>> Found 6 items
>> -rw-r--r-- 3 b2c_runtime hadoop 502243204 2017-01-13 04:11 /flumedata/processed/first-pass-stream/2017/01/13/04-00/flumeload100-log.1484305215893.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 508966498 2017-01-13 04:22 /flumedata/processed/first-pass-stream/2017/01/13/04-00/flumeload100-log.1484305215894.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 510972236 2017-01-13 04:34 /flumedata/processed/first-pass-stream/2017/01/13/04-00/flumeload100-log.1484305215895.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 513225577 2017-01-13 04:46 /flumedata/processed/first-pass-stream/2017/01/13/04-00/flumeload100-log.1484305215896.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 512743679 2017-01-13 04:57 /flumedata/processed/first-pass-stream/2017/01/13/04-00/flumeload100-log.1484305215897.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 3888775 2017-01-13 04:57 /flumedata/processed/first-pass-stream/2017/01/13/04-00/flumeload100-log.1484305215898.snappy.tmp
>> Found 7 items
>> -rw-r--r-- 3 b2c_runtime hadoop 515832251 2017-01-13 05:11 /flumedata/processed/first-pass-stream/2017/01/13/05-00/flumeload100-log.1484308811983.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 518077964 2017-01-13 05:20 /flumedata/processed/first-pass-stream/2017/01/13/05-00/flumeload100-log.1484308811984.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 519490676 2017-01-13 05:29 /flumedata/processed/first-pass-stream/2017/01/13/05-00/flumeload100-log.1484308811985.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 519105563 2017-01-13 05:37 /flumedata/processed/first-pass-stream/2017/01/13/05-00/flumeload100-log.1484308811986.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 518672209 2017-01-13 05:46 /flumedata/processed/first-pass-stream/2017/01/13/05-00/flumeload100-log.1484308811987.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 520019853 2017-01-13 05:53 /flumedata/processed/first-pass-stream/2017/01/13/05-00/flumeload100-log.1484308811988.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 1574211 2017-01-13 05:53 /flumedata/processed/first-pass-stream/2017/01/13/05-00/flumeload100-log.1484308811989.snappy.tmp
>> Found 9 items
>> -rw-r--r-- 3 b2c_runtime hadoop 521428204 2017-01-13 06:07 /flumedata/processed/first-pass-stream/2017/01/13/06-00/flumeload100-log.1484312413743.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 519885769 2017-01-13 06:15 /flumedata/processed/first-pass-stream/2017/01/13/06-00/flumeload100-log.1484312413744.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 519050891 2017-01-13 06:21 /flumedata/processed/first-pass-stream/2017/01/13/06-00/flumeload100-log.1484312413745.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 520691322 2017-01-13 06:29 /flumedata/processed/first-pass-stream/2017/01/13/06-00/flumeload100-log.1484312413746.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 520902319 2017-01-13 06:36 /flumedata/processed/first-pass-stream/2017/01/13/06-00/flumeload100-log.1484312413747.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 520831873 2017-01-13 06:42 /flumedata/processed/first-pass-stream/2017/01/13/06-00/flumeload100-log.1484312413748.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 519785647 2017-01-13 06:49 /flumedata/processed/first-pass-stream/2017/01/13/06-00/flumeload100-log.1484312413749.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 520590143 2017-01-13 06:55 /flumedata/processed/first-pass-stream/2017/01/13/06-00/flumeload100-log.1484312413750.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 4621367 2017-01-13 06:55 /flumedata/processed/first-pass-stream/2017/01/13/06-00/flumeload100-log.1484312413751.snappy.tmp
>> Found 11 items
>> -rw-r--r-- 3 b2c_runtime hadoop 522623760 2017-01-13 07:06 /flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.1484316015214.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 523065112 2017-01-13 07:12 /flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.1484316015215.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 523445533 2017-01-13 07:18 /flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.1484316015216.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 523084945 2017-01-13 07:24 /flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.1484316015217.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 524283976 2017-01-13 07:30 /flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.1484316015218.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 523923379 2017-01-13 07:36 /flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.1484316015219.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 523910723 2017-01-13 07:42 /flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.1484316015220.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 524266095 2017-01-13 07:47 /flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.1484316015221.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 523002505 2017-01-13 07:53 /flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.1484316015222.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 520706211 2017-01-13 07:58 /flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.1484316015223.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 8051588 2017-01-13 07:58 /flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.1484316015224.snappy.tmp
>> Found 11 items
>> -rw-r--r-- 3 b2c_runtime hadoop 520528155 2017-01-13 08:05 /flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.1484319618433.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 521761390 2017-01-13 08:11 /flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.1484319618434.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 522548272 2017-01-13 08:16 /flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.1484319618435.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 522616117 2017-01-13 08:22 /flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.1484319618436.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 525953759 2017-01-13 08:28 /flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.1484319618437.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 524475009 2017-01-13 08:34 /flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.1484319618438.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 523995339 2017-01-13 08:40 /flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.1484319618439.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 524188832 2017-01-13 08:47 /flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.1484319618440.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 525303001 2017-01-13 08:53 /flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.1484319618441.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 525606532 2017-01-13 08:59 /flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.1484319618442.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 4486982 2017-01-13 08:59 /flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.1484319618443.snappy.tmp
>> Found 11 items
>> -rw-r--r-- 3 b2c_runtime hadoop 525207364 2017-01-13 09:06 /flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.1484323216987.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 526105891 2017-01-13 09:12 /flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.1484323216988.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 526426735 2017-01-13 09:18 /flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.1484323216989.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 525298099 2017-01-13 09:24 /flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.1484323216990.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 525282945 2017-01-13 09:30 /flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.1484323216991.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 523921005 2017-01-13 09:36 /flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.1484323216992.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 524827705 2017-01-13 09:42 /flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.1484323216993.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 524203463 2017-01-13 09:47 /flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.1484323216994.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 524678485 2017-01-13 09:53 /flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.1484323216995.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 524598220 2017-01-13 09:59 /flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.1484323216996.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 3877959 2017-01-13 09:59 /flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.1484323216997.snappy.tmp
>> Found 10 items
>> -rw-r--r-- 3 b2c_runtime hadoop 523000460 2017-01-13 10:06 /flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.1484326813831.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 523455154 2017-01-13 10:12 /flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.1484326813832.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 525465618 2017-01-13 10:18 /flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.1484326813833.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 524630955 2017-01-13 10:24 /flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.1484326813834.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 527780298 2017-01-13 10:30 /flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.1484326813835.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 526565562 2017-01-13 10:37 /flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.1484326813836.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 524936336 2017-01-13 10:43 /flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.1484326813837.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 524565610 2017-01-13 10:49 /flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.1484326813838.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 524276950 2017-01-13 10:55 /flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.1484326813839.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 654810 2017-01-13 10:55 /flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.1484326813840.snappy.tmp
>> Found 11 items
>> -rw-r--r-- 3 b2c_runtime hadoop 524174553 2017-01-13 11:06 /flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.1484330415712.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 524127864 2017-01-13 11:12 /flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.1484330415713.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 524778919 2017-01-13 11:18 /flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.1484330415714.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 524851182 2017-01-13 11:24 /flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.1484330415715.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 525156750 2017-01-13 11:30 /flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.1484330415716.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 525334538 2017-01-13 11:35 /flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.1484330415717.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 527346578 2017-01-13 11:41 /flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.1484330415718.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 525592734 2017-01-13 11:47 /flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.1484330415719.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 525502291 2017-01-13 11:53 /flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.1484330415720.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 523135186 2017-01-13 11:58 /flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.1484330415721.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 9967141 2017-01-13 11:58 /flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.1484330415722.snappy.tmp
>> Found 7 items
>> -rw-r--r-- 3 b2c_runtime hadoop 520881970 2017-01-13 12:05 /flumedata/processed/first-pass-stream/2017/01/13/12-00/flumeload100-log.1484334016849.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 522340745 2017-01-13 12:11 /flumedata/processed/first-pass-stream/2017/01/13/12-00/flumeload100-log.1484334016850.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 524156495 2017-01-13 12:17 /flumedata/processed/first-pass-stream/2017/01/13/12-00/flumeload100-log.1484334016851.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 523482390 2017-01-13 12:23 /flumedata/processed/first-pass-stream/2017/01/13/12-00/flumeload100-log.1484334016852.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 524096591 2017-01-13 12:29 /flumedata/processed/first-pass-stream/2017/01/13/12-00/flumeload100-log.1484334016853.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 523184628 2017-01-13 12:35 /flumedata/processed/first-pass-stream/2017/01/13/12-00/flumeload100-log.1484334016854.snappy
>> -rw-r--r-- 3 b2c_runtime hadoop 10981218 2017-01-13 12:35 /flumedata/processed/first-pass-stream/2017/01/13/12-00/flumeload100-log.1484334016855.snappy.tmp
>>
>> HDFS Stat On One Of The Files (keep in mind the output bucket is based on event time, which is MDT/MST, vs. the stat date in GMT)
>> hadoop fs -stat "%y %n" /flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100
>> -log.1484326813840.snappy.tmp
>> 17/01/13 12:57:07 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>> 2017-01-13 17:55:35 flumeload100-log.1484326813840.snappy.tmp
>>
>> Thanks
>> Justin
>>
>>> On Thu, Jan 12, 2017 at 11:56 PM, Denes Arvay <de...@cloudera.com> wrote:
>>> Hi Justin,
>>>
>>> Could you please share your config file with us?
>>>
>>> Thanks,
>>> Denes
>>>
>>>
>>>> On Thu, Jan 12, 2017, 20:20 Justin Workman <ju...@gmail.com> wrote:
>>>> Sorry for cross-posting to user and dev. I have recently set up a Flume configuration where we are using the regex_extractor interceptor to parse the actual event date from the record flowing through the Flume source, then using that date to build the HDFS sink bucket path. However, it appears that the hdfs.idleTimeout value is not honored in this configuration. It does work when using the timestamp interceptor to build the output path.
>>>>
>>>> I have set the hdfs.idleTimeout value for the HDFS sink, but the files are never closed or renamed until I restart or shut down Flume. Our Flume is configured to roll based on size or output path, and the files rename/close/roll fine based on size; however, the last file in each output path is always left with the .tmp extension until we restart Flume. I would expect the file to be renamed and closed if no records are written to it after the idleTimeout is reached.
>>>>
>>>> Could I be missing something, or is this a known bug with the regex_extractor interceptor?
>>>>
>>>> Thanks
>>>> Justin
>>
Re: hdfs.idleTime
Posted by iain wright <ia...@gmail.com>.
Might be worth trying the debug output (I forget the exact sink name) to just log the headers being attached to events after the interceptor, to validate that the regex is working correctly for all events.
I set up this exact config at a previous company, so I know it works.
I also remember needing to escape the regex in an odd way due to how Java was loading/parsing the config.
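
[Editor's note: the "odd escaping" comes from the Java properties loader, which halves the doubled backslashes in the config before the regex ever reaches the interceptor. A minimal sketch of what the resulting pattern extracts, using a hypothetical sample record since the real event schema is not shown in the thread:]

```python
import re

# Pattern as it looks after the properties loader collapses each \\ to \
# (the \" escapes in the config are harmless: \" and " match the same thing).
pattern = r'^.*"entryId":\{"date":"(\d\d\d\d)-(\d\d)-(\d\d)T(\d\d):.*"\}.*$'

# Hypothetical sample record; the real event schema is not shown in the thread.
sample = '{"entryId":{"date":"2017-01-13T10:55:12Z"},"session":"abc"}'

m = re.match(pattern, sample)
if m:
    year, month, day, hour = m.groups()
    # These four groups become the %{year}/%{month}/%{day}/%{hour} headers
    # referenced in hdfs.path.
    print(f"{year}/{month}/{day}/{hour}-00")  # -> 2017/01/13/10-00
```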
Best,
Iain
Sent from my iPhone
> On Jan 13, 2017, at 12:00 PM, Justin Workman <ju...@gmail.com> wrote:
>
> Absolutely, see below. Just to reiterate: when using the timestamp interceptor values to build the output path based on the timestamp in the Flume header, things roll correctly. The files also roll just fine based on file size. However, when using the regex_extractor to get the actual event's timestamp to use in the output path, the last file in each directory is never renamed/closed until Flume is restarted.
>
>
> flume-conf.properties
> agent1.sources = fpssKafkaTopic
> agent1.channels = fpssHdfsFileChannel
> agent1.sinks = fpssHdfsSink
>
> agent1.sources.fpssKafkaTopic.type = org.apache.flume.source.kafka.KafkaSource
> agent1.sources.fpssKafkaTopic.zookeeperConnect = zk-host:2181
> agent1.sources.fpssKafkaTopic.topic = first-pass-stream-sessionized
> agent1.sources.fpssKafkaTopic.groupId = flume-first-pass-stream-sessionized
> agent1.sources.fpssKafkaTopic.kafka.auto.offset.reset = smallest
> agent1.sources.fpssKafkaTopic.channels = fpssHdfsFileChannel
> agent1.sources.fpssKafkaTopic.interceptors = i1 i2 i3
> agent1.sources.fpssKafkaTopic.interceptors.i1.type = timestamp
> agent1.sources.fpssKafkaTopic.interceptors.i1.preserveExisting = false
> agent1.sources.fpssKafkaTopic.interceptors.i2.type = org.apache.flume.interceptor.HostInterceptor$Builder
> agent1.sources.fpssKafkaTopic.interceptors.i2.hostHeader = hostname
> agent1.sources.fpssKafkaTopic.interceptors.i2.useIP= false
> agent1.sources.fpssKafkaTopic.interceptors.i2.preserveExisting = true
> agent1.sources.fpssKafkaTopic.interceptors.i3.type = regex_extractor
> agent1.sources.fpssKafkaTopic.interceptors.i3.regex = ^.*\\"entryId\\":\\{\\"date\\":\\"(\\d\\d\\d\\d)-(\\d\\d)-(\\d\\d)T(\\d\\d):.*\\"\\}.*$
> agent1.sources.fpssKafkaTopic.interceptors.i3.serializers = s1 s2 s3 s4
> agent1.sources.fpssKafkaTopic.interceptors.i3.serializers.s1.name = year
> agent1.sources.fpssKafkaTopic.interceptors.i3.serializers.s2.name = month
> agent1.sources.fpssKafkaTopic.interceptors.i3.serializers.s3.name = day
> agent1.sources.fpssKafkaTopic.interceptors.i3.serializers.s4.name = hour
> agent1.sources.fpssKafkaTopic.kafka.consumer.timeout.ms = 100
>
> agent1.channels.fpssHdfsFileChannel.type = file
> agent1.channels.fpssHdfsFileChannel.checkpointDir = /opt/flume/file-channel/fpss/checkpoint
> agent1.channels.fpssHdfsFileChannel.dataDirs = /opt/flume/file-channel/fpss/data
>
> agent1.sinks.fpssHdfsSink.type = hdfs
> agent1.sinks.fpssHdfsSink.hdfs.filePrefix = %{hostname}-log
> agent1.sinks.fpssHdfsSink.hdfs.inUseSuffix = .tmp
> agent1.sinks.fpssHdfsSink.hdfs.path = hdfs://prodcluster/flumedata/processed/first-pass-stream/%{year}/%{month}/%{day}/%{hour}-00
> agent1.sinks.fpssHdfsSink.hdfs.kerberosPrincipal = runtime@EXAMPLE.COM
> agent1.sinks.fpssHdfsSink.hdfs.kerberosKeytab = <keytab path removed for privacy>
> agent1.sinks.fpssHdfsSink.hdfs.rollInterval = 0
> agent1.sinks.fpssHdfsSink.hdfs.rollCount = 0
> ## Account for compression. See flume-2128
> ## My calculation: 512 * 1024 * 1024 * 2.75
> agent1.sinks.fpssHdfsSink.hdfs.rollSize = 1476395008
> # Close file if idle more than 300 seconds
> agent1.sinks.hdfsSink.hdfs.idleTimeout = 300
> agent1.sinks.hdfsSink.hdfs.useLocalTimeStamp = true
> agent1.sinks.fpssHdfsSink.hdfs.fileType = CompressedStream
> agent1.sinks.fpssHdfsSink.hdfs.codeC = snappy
> agent1.sinks.fpssHdfsSink.hdfs.writeFormat = Text
> agent1.sinks.fpssHdfsSink.channel = fpssHdfsFileChannel
> agent1.sinks.fpssHdfsSink.hdfs.batchSize = 10000
> agent1.sinks.fpssHdfsSink.hdfs.threadsPoolSize = 20
> agent1.sinks.fpssHdfsSink.hdfs.callTimeout = 20000
>
> HDFS Output Since Midnight (Notice the last file is never closed/renamed)
> hdfs dfs -ls /flumedata/processed/first-pass-stream/2017/01/13/*/
> 17/01/13 12:38:52 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> Found 7 items
> -rw-r--r-- 3 b2c_runtime hadoop 513710580 2017-01-13 00:09 /flumedata/processed/first-pass-stream/2017/01/13/00-00/flumeload100-log.1484290815397.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 514439844 2017-01-13 00:18 /flumedata/processed/first-pass-stream/2017/01/13/00-00/flumeload100-log.1484290815398.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 515125962 2017-01-13 00:28 /flumedata/processed/first-pass-stream/2017/01/13/00-00/flumeload100-log.1484290815399.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 513010837 2017-01-13 00:38 /flumedata/processed/first-pass-stream/2017/01/13/00-00/flumeload100-log.1484290815400.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 511315467 2017-01-13 00:49 /flumedata/processed/first-pass-stream/2017/01/13/00-00/flumeload100-log.1484290815401.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 508420966 2017-01-13 00:59 /flumedata/processed/first-pass-stream/2017/01/13/00-00/flumeload100-log.1484290815402.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 2503353 2017-01-13 00:59 /flumedata/processed/first-pass-stream/2017/01/13/00-00/flumeload100-log.1484290815403.snappy.tmp
> Found 6 items
> -rw-r--r-- 3 b2c_runtime hadoop 509116221 2017-01-13 01:10 /flumedata/processed/first-pass-stream/2017/01/13/01-00/flumeload100-log.1484294415705.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 507800675 2017-01-13 01:21 /flumedata/processed/first-pass-stream/2017/01/13/01-00/flumeload100-log.1484294415706.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 504432110 2017-01-13 01:32 /flumedata/processed/first-pass-stream/2017/01/13/01-00/flumeload100-log.1484294415707.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 501932914 2017-01-13 01:42 /flumedata/processed/first-pass-stream/2017/01/13/01-00/flumeload100-log.1484294415708.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 498136257 2017-01-13 01:50 /flumedata/processed/first-pass-stream/2017/01/13/01-00/flumeload100-log.1484294415709.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 60539 2017-01-13 01:50 /flumedata/processed/first-pass-stream/2017/01/13/01-00/flumeload100-log.1484294415710.snappy.tmp
> Found 6 items
> -rw-r--r-- 3 b2c_runtime hadoop 500879399 2017-01-13 02:11 /flumedata/processed/first-pass-stream/2017/01/13/02-00/flumeload100-log.1484298016017.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 501827071 2017-01-13 02:21 /flumedata/processed/first-pass-stream/2017/01/13/02-00/flumeload100-log.1484298016018.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 501489101 2017-01-13 02:32 /flumedata/processed/first-pass-stream/2017/01/13/02-00/flumeload100-log.1484298016019.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 501527838 2017-01-13 02:43 /flumedata/processed/first-pass-stream/2017/01/13/02-00/flumeload100-log.1484298016020.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 499393977 2017-01-13 02:54 /flumedata/processed/first-pass-stream/2017/01/13/02-00/flumeload100-log.1484298016021.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 1282327 2017-01-13 02:54 /flumedata/processed/first-pass-stream/2017/01/13/02-00/flumeload100-log.1484298016022.snappy.tmp
> Found 6 items
> -rw-r--r-- 3 b2c_runtime hadoop 501033294 2017-01-13 03:10 /flumedata/processed/first-pass-stream/2017/01/13/03-00/flumeload100-log.1484301615579.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 500933906 2017-01-13 03:20 /flumedata/processed/first-pass-stream/2017/01/13/03-00/flumeload100-log.1484301615580.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 505869233 2017-01-13 03:31 /flumedata/processed/first-pass-stream/2017/01/13/03-00/flumeload100-log.1484301615581.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 502910608 2017-01-13 03:41 /flumedata/processed/first-pass-stream/2017/01/13/03-00/flumeload100-log.1484301615582.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 499561080 2017-01-13 03:52 /flumedata/processed/first-pass-stream/2017/01/13/03-00/flumeload100-log.1484301615583.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 3616826 2017-01-13 03:52 /flumedata/processed/first-pass-stream/2017/01/13/03-00/flumeload100-log.1484301615584.snappy.tmp
> Found 6 items
> -rw-r--r-- 3 b2c_runtime hadoop 502243204 2017-01-13 04:11 /flumedata/processed/first-pass-stream/2017/01/13/04-00/flumeload100-log.1484305215893.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 508966498 2017-01-13 04:22 /flumedata/processed/first-pass-stream/2017/01/13/04-00/flumeload100-log.1484305215894.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 510972236 2017-01-13 04:34 /flumedata/processed/first-pass-stream/2017/01/13/04-00/flumeload100-log.1484305215895.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 513225577 2017-01-13 04:46 /flumedata/processed/first-pass-stream/2017/01/13/04-00/flumeload100-log.1484305215896.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 512743679 2017-01-13 04:57 /flumedata/processed/first-pass-stream/2017/01/13/04-00/flumeload100-log.1484305215897.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 3888775 2017-01-13 04:57 /flumedata/processed/first-pass-stream/2017/01/13/04-00/flumeload100-log.1484305215898.snappy.tmp
> Found 7 items
> -rw-r--r-- 3 b2c_runtime hadoop 515832251 2017-01-13 05:11 /flumedata/processed/first-pass-stream/2017/01/13/05-00/flumeload100-log.1484308811983.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 518077964 2017-01-13 05:20 /flumedata/processed/first-pass-stream/2017/01/13/05-00/flumeload100-log.1484308811984.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 519490676 2017-01-13 05:29 /flumedata/processed/first-pass-stream/2017/01/13/05-00/flumeload100-log.1484308811985.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 519105563 2017-01-13 05:37 /flumedata/processed/first-pass-stream/2017/01/13/05-00/flumeload100-log.1484308811986.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 518672209 2017-01-13 05:46 /flumedata/processed/first-pass-stream/2017/01/13/05-00/flumeload100-log.1484308811987.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 520019853 2017-01-13 05:53 /flumedata/processed/first-pass-stream/2017/01/13/05-00/flumeload100-log.1484308811988.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 1574211 2017-01-13 05:53 /flumedata/processed/first-pass-stream/2017/01/13/05-00/flumeload100-log.1484308811989.snappy.tmp
> Found 9 items
> -rw-r--r-- 3 b2c_runtime hadoop 521428204 2017-01-13 06:07 /flumedata/processed/first-pass-stream/2017/01/13/06-00/flumeload100-log.1484312413743.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 519885769 2017-01-13 06:15 /flumedata/processed/first-pass-stream/2017/01/13/06-00/flumeload100-log.1484312413744.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 519050891 2017-01-13 06:21 /flumedata/processed/first-pass-stream/2017/01/13/06-00/flumeload100-log.1484312413745.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 520691322 2017-01-13 06:29 /flumedata/processed/first-pass-stream/2017/01/13/06-00/flumeload100-log.1484312413746.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 520902319 2017-01-13 06:36 /flumedata/processed/first-pass-stream/2017/01/13/06-00/flumeload100-log.1484312413747.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 520831873 2017-01-13 06:42 /flumedata/processed/first-pass-stream/2017/01/13/06-00/flumeload100-log.1484312413748.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 519785647 2017-01-13 06:49 /flumedata/processed/first-pass-stream/2017/01/13/06-00/flumeload100-log.1484312413749.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 520590143 2017-01-13 06:55 /flumedata/processed/first-pass-stream/2017/01/13/06-00/flumeload100-log.1484312413750.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 4621367 2017-01-13 06:55 /flumedata/processed/first-pass-stream/2017/01/13/06-00/flumeload100-log.1484312413751.snappy.tmp
> Found 11 items
> -rw-r--r-- 3 b2c_runtime hadoop 522623760 2017-01-13 07:06 /flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.1484316015214.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 523065112 2017-01-13 07:12 /flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.1484316015215.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 523445533 2017-01-13 07:18 /flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.1484316015216.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 523084945 2017-01-13 07:24 /flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.1484316015217.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 524283976 2017-01-13 07:30 /flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.1484316015218.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 523923379 2017-01-13 07:36 /flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.1484316015219.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 523910723 2017-01-13 07:42 /flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.1484316015220.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 524266095 2017-01-13 07:47 /flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.1484316015221.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 523002505 2017-01-13 07:53 /flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.1484316015222.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 520706211 2017-01-13 07:58 /flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.1484316015223.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 8051588 2017-01-13 07:58 /flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.1484316015224.snappy.tmp
> Found 11 items
> -rw-r--r-- 3 b2c_runtime hadoop 520528155 2017-01-13 08:05 /flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.1484319618433.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 521761390 2017-01-13 08:11 /flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.1484319618434.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 522548272 2017-01-13 08:16 /flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.1484319618435.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 522616117 2017-01-13 08:22 /flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.1484319618436.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 525953759 2017-01-13 08:28 /flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.1484319618437.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 524475009 2017-01-13 08:34 /flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.1484319618438.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 523995339 2017-01-13 08:40 /flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.1484319618439.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 524188832 2017-01-13 08:47 /flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.1484319618440.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 525303001 2017-01-13 08:53 /flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.1484319618441.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 525606532 2017-01-13 08:59 /flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.1484319618442.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 4486982 2017-01-13 08:59 /flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.1484319618443.snappy.tmp
> Found 11 items
> -rw-r--r-- 3 b2c_runtime hadoop 525207364 2017-01-13 09:06 /flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.1484323216987.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 526105891 2017-01-13 09:12 /flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.1484323216988.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 526426735 2017-01-13 09:18 /flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.1484323216989.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 525298099 2017-01-13 09:24 /flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.1484323216990.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 525282945 2017-01-13 09:30 /flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.1484323216991.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 523921005 2017-01-13 09:36 /flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.1484323216992.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 524827705 2017-01-13 09:42 /flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.1484323216993.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 524203463 2017-01-13 09:47 /flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.1484323216994.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 524678485 2017-01-13 09:53 /flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.1484323216995.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 524598220 2017-01-13 09:59 /flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.1484323216996.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 3877959 2017-01-13 09:59 /flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.1484323216997.snappy.tmp
> Found 10 items
> -rw-r--r-- 3 b2c_runtime hadoop 523000460 2017-01-13 10:06 /flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.1484326813831.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 523455154 2017-01-13 10:12 /flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.1484326813832.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 525465618 2017-01-13 10:18 /flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.1484326813833.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 524630955 2017-01-13 10:24 /flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.1484326813834.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 527780298 2017-01-13 10:30 /flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.1484326813835.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 526565562 2017-01-13 10:37 /flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.1484326813836.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 524936336 2017-01-13 10:43 /flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.1484326813837.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 524565610 2017-01-13 10:49 /flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.1484326813838.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 524276950 2017-01-13 10:55 /flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.1484326813839.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 654810 2017-01-13 10:55 /flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.1484326813840.snappy.tmp
> Found 11 items
> -rw-r--r-- 3 b2c_runtime hadoop 524174553 2017-01-13 11:06 /flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.1484330415712.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 524127864 2017-01-13 11:12 /flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.1484330415713.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 524778919 2017-01-13 11:18 /flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.1484330415714.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 524851182 2017-01-13 11:24 /flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.1484330415715.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 525156750 2017-01-13 11:30 /flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.1484330415716.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 525334538 2017-01-13 11:35 /flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.1484330415717.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 527346578 2017-01-13 11:41 /flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.1484330415718.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 525592734 2017-01-13 11:47 /flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.1484330415719.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 525502291 2017-01-13 11:53 /flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.1484330415720.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 523135186 2017-01-13 11:58 /flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.1484330415721.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 9967141 2017-01-13 11:58 /flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.1484330415722.snappy.tmp
> Found 7 items
> -rw-r--r-- 3 b2c_runtime hadoop 520881970 2017-01-13 12:05 /flumedata/processed/first-pass-stream/2017/01/13/12-00/flumeload100-log.1484334016849.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 522340745 2017-01-13 12:11 /flumedata/processed/first-pass-stream/2017/01/13/12-00/flumeload100-log.1484334016850.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 524156495 2017-01-13 12:17 /flumedata/processed/first-pass-stream/2017/01/13/12-00/flumeload100-log.1484334016851.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 523482390 2017-01-13 12:23 /flumedata/processed/first-pass-stream/2017/01/13/12-00/flumeload100-log.1484334016852.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 524096591 2017-01-13 12:29 /flumedata/processed/first-pass-stream/2017/01/13/12-00/flumeload100-log.1484334016853.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 523184628 2017-01-13 12:35 /flumedata/processed/first-pass-stream/2017/01/13/12-00/flumeload100-log.1484334016854.snappy
> -rw-r--r-- 3 b2c_runtime hadoop 10981218 2017-01-13 12:35 /flumedata/processed/first-pass-stream/2017/01/13/12-00/flumeload100-log.1484334016855.snappy.tmp
>
> HDFS stat on one of the files (keep in mind the output bucket is based on event time, which is MDT/MST, vs. the stat date in GMT)
> hadoop fs -stat "%y %n" /flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.1484326813840.snappy.tmp
> 17/01/13 12:57:07 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 2017-01-13 17:55:35 flumeload100-log.1484326813840.snappy.tmp
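The counter embedded in each file name is a millisecond Unix epoch, so the apparent mismatch between the bucket hour (event time, MST) and the stat time (GMT) can be checked directly. A quick sketch, assuming a fixed UTC-7 offset for MST:

```python
from datetime import datetime, timedelta, timezone

# File-name counter from flumeload100-log.1484326813840.snappy.tmp,
# interpreted as milliseconds since the Unix epoch.
epoch_ms = 1484326813840

utc = datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc)
mst = utc.astimezone(timezone(timedelta(hours=-7)))  # MST = UTC-7 (assumed, no DST)

print(utc.strftime("%Y-%m-%d %H:%M"))  # 2017-01-13 17:00 (GMT, same hour as the stat output)
print(mst.strftime("%Y-%m-%d %H:%M"))  # 2017-01-13 10:00 (matches the 10-00 bucket)
```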
>
> Thanks
> Justin
>
>> On Thu, Jan 12, 2017 at 11:56 PM, Denes Arvay <de...@cloudera.com> wrote:
>> Hi Justin,
>>
>> Could you please share your config file with us?
>>
>> Thanks,
>> Denes
>>
>>
>>> On Thu, Jan 12, 2017, 20:20 Justin Workman <ju...@gmail.com> wrote:
>>> Sorry for cross posting to user and dev. I have recently set up a flume configuration where we are using the regex_extractor interceptor to parse the actual event date from the record flowing through the Flume source, then using that date to build the HDFS sink bucket path. However, it appears that the hdfs.idleTimeout value is not honored in this configuration. It does work when using the timestamp interceptor to build the output path.
>>>
>>> I have set the hdfs.idleTimeout value for the HDFS sink, but the files are never closed or renamed until I restart or shutdown Flume. Our flume is configured to roll based on size or output path, and the files rename/close/roll fine based on size, however the last file in each output path is always left with the .tmp extension until we restart Flume. I would expect that the file would be renamed and closed if there are no records written to this file after the idleTimeout is reached.
>>>
>>> Could I be missing something, or is this a known bug with the regex_extractor interceptor?
>>>
>>> Thanks
>>> Justin
>
Re: hdfs.idleTime
Posted by Justin Workman <ju...@gmail.com>.
Absolutely, see below. Just to reiterate: when using the timestamp interceptor
values to build the output path from the timestamp in the Flume header, things
roll correctly. The files also roll just fine based on file size. However,
when using the regex_extractor interceptor to get the actual event's timestamp
for the output path, the last file in each directory is never renamed/closed
until Flume is restarted.
*flume-conf.properties*
agent1.sources = fpssKafkaTopic
agent1.channels = fpssHdfsFileChannel
agent1.sinks = fpssHdfsSink
agent1.sources.fpssKafkaTopic.type = org.apache.flume.source.kafka.KafkaSource
agent1.sources.fpssKafkaTopic.zookeeperConnect = zk-host:2181
agent1.sources.fpssKafkaTopic.topic = first-pass-stream-sessionized
agent1.sources.fpssKafkaTopic.groupId = flume-first-pass-stream-sessionized
agent1.sources.fpssKafkaTopic.kafka.auto.offset.reset = smallest
agent1.sources.fpssKafkaTopic.channels = fpssHdfsFileChannel
agent1.sources.fpssKafkaTopic.interceptors = i1 i2 i3
agent1.sources.fpssKafkaTopic.interceptors.i1.type = timestamp
agent1.sources.fpssKafkaTopic.interceptors.i1.preserveExisting = false
agent1.sources.fpssKafkaTopic.interceptors.i2.type = org.apache.flume.interceptor.HostInterceptor$Builder
agent1.sources.fpssKafkaTopic.interceptors.i2.hostHeader = hostname
agent1.sources.fpssKafkaTopic.interceptors.i2.useIP= false
agent1.sources.fpssKafkaTopic.interceptors.i2.preserveExisting = true
agent1.sources.fpssKafkaTopic.interceptors.i3.type = regex_extractor
agent1.sources.fpssKafkaTopic.interceptors.i3.regex = ^.*\\"entryId\\":\\{\\"date\\":\\"(\\d\\d\\d\\d)-(\\d\\d)-(\\d\\d)T(\\d\\d):.*\\"\\}.*$
agent1.sources.fpssKafkaTopic.interceptors.i3.serializers = s1 s2 s3 s4
agent1.sources.fpssKafkaTopic.interceptors.i3.serializers.s1.name = year
agent1.sources.fpssKafkaTopic.interceptors.i3.serializers.s2.name = month
agent1.sources.fpssKafkaTopic.interceptors.i3.serializers.s3.name = day
agent1.sources.fpssKafkaTopic.interceptors.i3.serializers.s4.name = hour
agent1.sources.fpssKafkaTopic.kafka.consumer.timeout.ms = 100
agent1.channels.fpssHdfsFileChannel.type = file
agent1.channels.fpssHdfsFileChannel.checkpointDir = /opt/flume/file-channel/fpss/checkpoint
agent1.channels.fpssHdfsFileChannel.dataDirs = /opt/flume/file-channel/fpss/data
agent1.sinks.fpssHdfsSink.type = hdfs
agent1.sinks.fpssHdfsSink.hdfs.filePrefix = %{hostname}-log
agent1.sinks.fpssHdfsSink.hdfs.inUseSuffix = .tmp
agent1.sinks.fpssHdfsSink.hdfs.path = hdfs://prodcluster/flumedata/processed/first-pass-stream/%{year}/%{month}/%{day}/%{hour}-00
agent1.sinks.fpssHdfsSink.hdfs.kerberosPrincipal = runtime@EXAMPLE.COM
agent1.sinks.fpssHdfsSink.hdfs.kerberosKeytab = <keytab path removed for privacy>
agent1.sinks.fpssHdfsSink.hdfs.rollInterval = 0
agent1.sinks.fpssHdfsSink.hdfs.rollCount = 0
## Account for compression. See flume-2128
## My calculation: 512 * 1024 * 1024 * 2.75
agent1.sinks.fpssHdfsSink.hdfs.rollSize = 1476395008
# Close file if idle more than 300 seconds
agent1.sinks.hdfsSink.hdfs.idleTimeout = 300
agent1.sinks.hdfsSink.hdfs.useLocalTimeStamp = true
agent1.sinks.fpssHdfsSink.hdfs.fileType = CompressedStream
agent1.sinks.fpssHdfsSink.hdfs.codeC = snappy
agent1.sinks.fpssHdfsSink.hdfs.writeFormat = Text
agent1.sinks.fpssHdfsSink.channel = fpssHdfsFileChannel
agent1.sinks.fpssHdfsSink.hdfs.batchSize = 10000
agent1.sinks.fpssHdfsSink.hdfs.threadsPoolSize = 20
agent1.sinks.fpssHdfsSink.hdfs.callTimeout = 20000
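For reference, the i3 regex above (after Java-properties unescaping) can be exercised against a sample record to show how the year/month/day/hour headers that feed hdfs.path are extracted. The record body here is invented for illustration; the real event format is not shown in this thread.

```python
import re

# Effective regex after properties-file unescaping of the i3 config value.
pattern = re.compile(
    r'^.*"entryId":\{"date":"(\d\d\d\d)-(\d\d)-(\d\d)T(\d\d):.*"\}.*$'
)

# Hypothetical event body with an embedded entryId date.
record = '{"entryId":{"date":"2017-01-13T10:55:00.000Z"},"payload":"..."}'

m = pattern.match(record)
year, month, day, hour = m.groups()

# Mirror the %{year}/%{month}/%{day}/%{hour}-00 escape sequences in hdfs.path.
path = f"/flumedata/processed/first-pass-stream/{year}/{month}/{day}/{hour}-00"
print(path)  # /flumedata/processed/first-pass-stream/2017/01/13/10-00
```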
*HDFS Output Since Midnight (Notice the last file is never closed/renamed)*
hdfs dfs -ls /flumedata/processed/first-pass-stream/2017/01/13/*/
17/01/13 12:38:52 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 7 items
-rw-r--r-- 3 b2c_runtime hadoop 513710580 2017-01-13 00:09
/flumedata/processed/first-pass-stream/2017/01/13/00-00/flumeload100-log.1484290815397.snappy
-rw-r--r-- 3 b2c_runtime hadoop 514439844 2017-01-13 00:18
/flumedata/processed/first-pass-stream/2017/01/13/00-00/flumeload100-log.1484290815398.snappy
-rw-r--r-- 3 b2c_runtime hadoop 515125962 2017-01-13 00:28
/flumedata/processed/first-pass-stream/2017/01/13/00-00/flumeload100-log.1484290815399.snappy
-rw-r--r-- 3 b2c_runtime hadoop 513010837 2017-01-13 00:38
/flumedata/processed/first-pass-stream/2017/01/13/00-00/flumeload100-log.1484290815400.snappy
-rw-r--r-- 3 b2c_runtime hadoop 511315467 2017-01-13 00:49
/flumedata/processed/first-pass-stream/2017/01/13/00-00/flumeload100-log.1484290815401.snappy
-rw-r--r-- 3 b2c_runtime hadoop 508420966 2017-01-13 00:59
/flumedata/processed/first-pass-stream/2017/01/13/00-00/flumeload100-log.1484290815402.snappy
-rw-r--r-- 3 b2c_runtime hadoop 2503353 2017-01-13 00:59
/flumedata/processed/first-pass-stream/2017/01/13/00-00/flumeload100-log.1484290815403.snappy.tmp
Found 6 items
-rw-r--r-- 3 b2c_runtime hadoop 509116221 2017-01-13 01:10
/flumedata/processed/first-pass-stream/2017/01/13/01-00/flumeload100-log.1484294415705.snappy
-rw-r--r-- 3 b2c_runtime hadoop 507800675 2017-01-13 01:21
/flumedata/processed/first-pass-stream/2017/01/13/01-00/flumeload100-log.1484294415706.snappy
-rw-r--r-- 3 b2c_runtime hadoop 504432110 2017-01-13 01:32
/flumedata/processed/first-pass-stream/2017/01/13/01-00/flumeload100-log.1484294415707.snappy
-rw-r--r-- 3 b2c_runtime hadoop 501932914 2017-01-13 01:42
/flumedata/processed/first-pass-stream/2017/01/13/01-00/flumeload100-log.1484294415708.snappy
-rw-r--r-- 3 b2c_runtime hadoop 498136257 2017-01-13 01:50
/flumedata/processed/first-pass-stream/2017/01/13/01-00/flumeload100-log.1484294415709.snappy
-rw-r--r-- 3 b2c_runtime hadoop 60539 2017-01-13 01:50
/flumedata/processed/first-pass-stream/2017/01/13/01-00/flumeload100-log.1484294415710.snappy.tmp
Found 6 items
-rw-r--r-- 3 b2c_runtime hadoop 500879399 2017-01-13 02:11
/flumedata/processed/first-pass-stream/2017/01/13/02-00/flumeload100-log.1484298016017.snappy
-rw-r--r-- 3 b2c_runtime hadoop 501827071 2017-01-13 02:21
/flumedata/processed/first-pass-stream/2017/01/13/02-00/flumeload100-log.1484298016018.snappy
-rw-r--r-- 3 b2c_runtime hadoop 501489101 2017-01-13 02:32
/flumedata/processed/first-pass-stream/2017/01/13/02-00/flumeload100-log.1484298016019.snappy
-rw-r--r-- 3 b2c_runtime hadoop 501527838 2017-01-13 02:43
/flumedata/processed/first-pass-stream/2017/01/13/02-00/flumeload100-log.1484298016020.snappy
-rw-r--r-- 3 b2c_runtime hadoop 499393977 2017-01-13 02:54
/flumedata/processed/first-pass-stream/2017/01/13/02-00/flumeload100-log.1484298016021.snappy
-rw-r--r-- 3 b2c_runtime hadoop 1282327 2017-01-13 02:54
/flumedata/processed/first-pass-stream/2017/01/13/02-00/flumeload100-log.1484298016022.snappy.tmp
Found 6 items
-rw-r--r-- 3 b2c_runtime hadoop 501033294 2017-01-13 03:10
/flumedata/processed/first-pass-stream/2017/01/13/03-00/flumeload100-log.1484301615579.snappy
-rw-r--r-- 3 b2c_runtime hadoop 500933906 2017-01-13 03:20
/flumedata/processed/first-pass-stream/2017/01/13/03-00/flumeload100-log.1484301615580.snappy
-rw-r--r-- 3 b2c_runtime hadoop 505869233 2017-01-13 03:31
/flumedata/processed/first-pass-stream/2017/01/13/03-00/flumeload100-log.1484301615581.snappy
-rw-r--r-- 3 b2c_runtime hadoop 502910608 2017-01-13 03:41
/flumedata/processed/first-pass-stream/2017/01/13/03-00/flumeload100-log.1484301615582.snappy
-rw-r--r-- 3 b2c_runtime hadoop 499561080 2017-01-13 03:52
/flumedata/processed/first-pass-stream/2017/01/13/03-00/flumeload100-log.1484301615583.snappy
-rw-r--r-- 3 b2c_runtime hadoop 3616826 2017-01-13 03:52
/flumedata/processed/first-pass-stream/2017/01/13/03-00/flumeload100-log.1484301615584.snappy.tmp
Found 6 items
-rw-r--r-- 3 b2c_runtime hadoop 502243204 2017-01-13 04:11
/flumedata/processed/first-pass-stream/2017/01/13/04-00/flumeload100-log.1484305215893.snappy
-rw-r--r-- 3 b2c_runtime hadoop 508966498 2017-01-13 04:22
/flumedata/processed/first-pass-stream/2017/01/13/04-00/flumeload100-log.1484305215894.snappy
-rw-r--r-- 3 b2c_runtime hadoop 510972236 2017-01-13 04:34
/flumedata/processed/first-pass-stream/2017/01/13/04-00/flumeload100-log.1484305215895.snappy
-rw-r--r-- 3 b2c_runtime hadoop 513225577 2017-01-13 04:46
/flumedata/processed/first-pass-stream/2017/01/13/04-00/flumeload100-log.1484305215896.snappy
-rw-r--r-- 3 b2c_runtime hadoop 512743679 2017-01-13 04:57
/flumedata/processed/first-pass-stream/2017/01/13/04-00/flumeload100-log.1484305215897.snappy
-rw-r--r-- 3 b2c_runtime hadoop 3888775 2017-01-13 04:57
/flumedata/processed/first-pass-stream/2017/01/13/04-00/flumeload100-log.1484305215898.snappy.tmp
Found 7 items
-rw-r--r-- 3 b2c_runtime hadoop 515832251 2017-01-13 05:11
/flumedata/processed/first-pass-stream/2017/01/13/05-00/flumeload100-log.1484308811983.snappy
-rw-r--r-- 3 b2c_runtime hadoop 518077964 2017-01-13 05:20
/flumedata/processed/first-pass-stream/2017/01/13/05-00/flumeload100-log.1484308811984.snappy
-rw-r--r-- 3 b2c_runtime hadoop 519490676 2017-01-13 05:29
/flumedata/processed/first-pass-stream/2017/01/13/05-00/flumeload100-log.1484308811985.snappy
-rw-r--r-- 3 b2c_runtime hadoop 519105563 2017-01-13 05:37
/flumedata/processed/first-pass-stream/2017/01/13/05-00/flumeload100-log.1484308811986.snappy
-rw-r--r-- 3 b2c_runtime hadoop 518672209 2017-01-13 05:46
/flumedata/processed/first-pass-stream/2017/01/13/05-00/flumeload100-log.1484308811987.snappy
-rw-r--r-- 3 b2c_runtime hadoop 520019853 2017-01-13 05:53
/flumedata/processed/first-pass-stream/2017/01/13/05-00/flumeload100-log.1484308811988.snappy
-rw-r--r-- 3 b2c_runtime hadoop 1574211 2017-01-13 05:53
/flumedata/processed/first-pass-stream/2017/01/13/05-00/flumeload100-log.1484308811989.snappy.tmp
Found 9 items
-rw-r--r-- 3 b2c_runtime hadoop 521428204 2017-01-13 06:07
/flumedata/processed/first-pass-stream/2017/01/13/06-00/flumeload100-log.1484312413743.snappy
-rw-r--r-- 3 b2c_runtime hadoop 519885769 2017-01-13 06:15
/flumedata/processed/first-pass-stream/2017/01/13/06-00/flumeload100-log.1484312413744.snappy
-rw-r--r-- 3 b2c_runtime hadoop 519050891 2017-01-13 06:21
/flumedata/processed/first-pass-stream/2017/01/13/06-00/flumeload100-log.1484312413745.snappy
-rw-r--r-- 3 b2c_runtime hadoop 520691322 2017-01-13 06:29
/flumedata/processed/first-pass-stream/2017/01/13/06-00/flumeload100-log.1484312413746.snappy
-rw-r--r-- 3 b2c_runtime hadoop 520902319 2017-01-13 06:36
/flumedata/processed/first-pass-stream/2017/01/13/06-00/flumeload100-log.1484312413747.snappy
-rw-r--r-- 3 b2c_runtime hadoop 520831873 2017-01-13 06:42
/flumedata/processed/first-pass-stream/2017/01/13/06-00/flumeload100-log.1484312413748.snappy
-rw-r--r-- 3 b2c_runtime hadoop 519785647 2017-01-13 06:49
/flumedata/processed/first-pass-stream/2017/01/13/06-00/flumeload100-log.1484312413749.snappy
-rw-r--r-- 3 b2c_runtime hadoop 520590143 2017-01-13 06:55
/flumedata/processed/first-pass-stream/2017/01/13/06-00/flumeload100-log.1484312413750.snappy
-rw-r--r-- 3 b2c_runtime hadoop 4621367 2017-01-13 06:55
/flumedata/processed/first-pass-stream/2017/01/13/06-00/flumeload100-log.1484312413751.snappy.tmp
Found 11 items
-rw-r--r-- 3 b2c_runtime hadoop 522623760 2017-01-13 07:06
/flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.1484316015214.snappy
-rw-r--r-- 3 b2c_runtime hadoop 523065112 2017-01-13 07:12
/flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.1484316015215.snappy
-rw-r--r-- 3 b2c_runtime hadoop 523445533 2017-01-13 07:18
/flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.1484316015216.snappy
-rw-r--r-- 3 b2c_runtime hadoop 523084945 2017-01-13 07:24
/flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.1484316015217.snappy
-rw-r--r-- 3 b2c_runtime hadoop 524283976 2017-01-13 07:30
/flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.1484316015218.snappy
-rw-r--r-- 3 b2c_runtime hadoop 523923379 2017-01-13 07:36
/flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.1484316015219.snappy
-rw-r--r-- 3 b2c_runtime hadoop 523910723 2017-01-13 07:42
/flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.1484316015220.snappy
-rw-r--r-- 3 b2c_runtime hadoop 524266095 2017-01-13 07:47
/flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.1484316015221.snappy
-rw-r--r-- 3 b2c_runtime hadoop 523002505 2017-01-13 07:53
/flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.1484316015222.snappy
-rw-r--r-- 3 b2c_runtime hadoop 520706211 2017-01-13 07:58
/flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.1484316015223.snappy
-rw-r--r-- 3 b2c_runtime hadoop 8051588 2017-01-13 07:58
/flumedata/processed/first-pass-stream/2017/01/13/07-00/flumeload100-log.1484316015224.snappy.tmp
Found 11 items
-rw-r--r-- 3 b2c_runtime hadoop 520528155 2017-01-13 08:05
/flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.1484319618433.snappy
-rw-r--r-- 3 b2c_runtime hadoop 521761390 2017-01-13 08:11
/flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.1484319618434.snappy
-rw-r--r-- 3 b2c_runtime hadoop 522548272 2017-01-13 08:16
/flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.1484319618435.snappy
-rw-r--r-- 3 b2c_runtime hadoop 522616117 2017-01-13 08:22
/flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.1484319618436.snappy
-rw-r--r-- 3 b2c_runtime hadoop 525953759 2017-01-13 08:28
/flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.1484319618437.snappy
-rw-r--r-- 3 b2c_runtime hadoop 524475009 2017-01-13 08:34
/flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.1484319618438.snappy
-rw-r--r-- 3 b2c_runtime hadoop 523995339 2017-01-13 08:40
/flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.1484319618439.snappy
-rw-r--r-- 3 b2c_runtime hadoop 524188832 2017-01-13 08:47
/flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.1484319618440.snappy
-rw-r--r-- 3 b2c_runtime hadoop 525303001 2017-01-13 08:53
/flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.1484319618441.snappy
-rw-r--r-- 3 b2c_runtime hadoop 525606532 2017-01-13 08:59
/flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.1484319618442.snappy
-rw-r--r-- 3 b2c_runtime hadoop 4486982 2017-01-13 08:59
/flumedata/processed/first-pass-stream/2017/01/13/08-00/flumeload100-log.1484319618443.snappy.tmp
Found 11 items
-rw-r--r-- 3 b2c_runtime hadoop 525207364 2017-01-13 09:06
/flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.1484323216987.snappy
-rw-r--r-- 3 b2c_runtime hadoop 526105891 2017-01-13 09:12
/flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.1484323216988.snappy
-rw-r--r-- 3 b2c_runtime hadoop 526426735 2017-01-13 09:18
/flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.1484323216989.snappy
-rw-r--r-- 3 b2c_runtime hadoop 525298099 2017-01-13 09:24
/flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.1484323216990.snappy
-rw-r--r-- 3 b2c_runtime hadoop 525282945 2017-01-13 09:30
/flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.1484323216991.snappy
-rw-r--r-- 3 b2c_runtime hadoop 523921005 2017-01-13 09:36
/flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.1484323216992.snappy
-rw-r--r-- 3 b2c_runtime hadoop 524827705 2017-01-13 09:42
/flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.1484323216993.snappy
-rw-r--r-- 3 b2c_runtime hadoop 524203463 2017-01-13 09:47
/flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.1484323216994.snappy
-rw-r--r-- 3 b2c_runtime hadoop 524678485 2017-01-13 09:53
/flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.1484323216995.snappy
-rw-r--r-- 3 b2c_runtime hadoop 524598220 2017-01-13 09:59
/flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.1484323216996.snappy
-rw-r--r-- 3 b2c_runtime hadoop 3877959 2017-01-13 09:59
/flumedata/processed/first-pass-stream/2017/01/13/09-00/flumeload100-log.1484323216997.snappy.tmp
Found 10 items
-rw-r--r-- 3 b2c_runtime hadoop 523000460 2017-01-13 10:06
/flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.1484326813831.snappy
-rw-r--r-- 3 b2c_runtime hadoop 523455154 2017-01-13 10:12
/flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.1484326813832.snappy
-rw-r--r-- 3 b2c_runtime hadoop 525465618 2017-01-13 10:18
/flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.1484326813833.snappy
-rw-r--r-- 3 b2c_runtime hadoop 524630955 2017-01-13 10:24
/flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.1484326813834.snappy
-rw-r--r-- 3 b2c_runtime hadoop 527780298 2017-01-13 10:30
/flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.1484326813835.snappy
-rw-r--r-- 3 b2c_runtime hadoop 526565562 2017-01-13 10:37
/flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.1484326813836.snappy
-rw-r--r-- 3 b2c_runtime hadoop 524936336 2017-01-13 10:43
/flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.1484326813837.snappy
-rw-r--r-- 3 b2c_runtime hadoop 524565610 2017-01-13 10:49
/flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.1484326813838.snappy
-rw-r--r-- 3 b2c_runtime hadoop 524276950 2017-01-13 10:55
/flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.1484326813839.snappy
-rw-r--r-- 3 b2c_runtime hadoop 654810 2017-01-13 10:55
/flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100-log.1484326813840.snappy.tmp
Found 11 items
-rw-r--r-- 3 b2c_runtime hadoop 524174553 2017-01-13 11:06
/flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.1484330415712.snappy
-rw-r--r-- 3 b2c_runtime hadoop 524127864 2017-01-13 11:12
/flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.1484330415713.snappy
-rw-r--r-- 3 b2c_runtime hadoop 524778919 2017-01-13 11:18
/flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.1484330415714.snappy
-rw-r--r-- 3 b2c_runtime hadoop 524851182 2017-01-13 11:24
/flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.1484330415715.snappy
-rw-r--r-- 3 b2c_runtime hadoop 525156750 2017-01-13 11:30
/flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.1484330415716.snappy
-rw-r--r-- 3 b2c_runtime hadoop 525334538 2017-01-13 11:35
/flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.1484330415717.snappy
-rw-r--r-- 3 b2c_runtime hadoop 527346578 2017-01-13 11:41
/flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.1484330415718.snappy
-rw-r--r-- 3 b2c_runtime hadoop 525592734 2017-01-13 11:47
/flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.1484330415719.snappy
-rw-r--r-- 3 b2c_runtime hadoop 525502291 2017-01-13 11:53
/flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.1484330415720.snappy
-rw-r--r-- 3 b2c_runtime hadoop 523135186 2017-01-13 11:58
/flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.1484330415721.snappy
-rw-r--r-- 3 b2c_runtime hadoop 9967141 2017-01-13 11:58
/flumedata/processed/first-pass-stream/2017/01/13/11-00/flumeload100-log.1484330415722.snappy.tmp
Found 7 items
-rw-r--r-- 3 b2c_runtime hadoop 520881970 2017-01-13 12:05
/flumedata/processed/first-pass-stream/2017/01/13/12-00/flumeload100-log.1484334016849.snappy
-rw-r--r-- 3 b2c_runtime hadoop 522340745 2017-01-13 12:11
/flumedata/processed/first-pass-stream/2017/01/13/12-00/flumeload100-log.1484334016850.snappy
-rw-r--r-- 3 b2c_runtime hadoop 524156495 2017-01-13 12:17
/flumedata/processed/first-pass-stream/2017/01/13/12-00/flumeload100-log.1484334016851.snappy
-rw-r--r-- 3 b2c_runtime hadoop 523482390 2017-01-13 12:23
/flumedata/processed/first-pass-stream/2017/01/13/12-00/flumeload100-log.1484334016852.snappy
-rw-r--r-- 3 b2c_runtime hadoop 524096591 2017-01-13 12:29
/flumedata/processed/first-pass-stream/2017/01/13/12-00/flumeload100-log.1484334016853.snappy
-rw-r--r-- 3 b2c_runtime hadoop 523184628 2017-01-13 12:35
/flumedata/processed/first-pass-stream/2017/01/13/12-00/flumeload100-log.1484334016854.snappy
-rw-r--r-- 3 b2c_runtime hadoop 10981218 2017-01-13 12:35
/flumedata/processed/first-pass-stream/2017/01/13/12-00/flumeload100-log.1484334016855.snappy.tmp
*HDFS Stat On One Of The Files (keep in mind the output bucket is based on
event time, which is MST/MDT, vs. the stat date, which is GMT)*
hadoop fs -stat "%y %n"
/flumedata/processed/first-pass-stream/2017/01/13/10-00/flumeload100
-log.1484326813840.snappy.tmp
17/01/13 12:57:07 WARN util.NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
2017-01-13 17:55:35 flumeload100-log.1484326813840.snappy.tmp
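The seven-hour gap between the event-time bucket (10-00) and the GMT stat time can be checked by decoding the epoch-millisecond counter that the HDFS sink embeds in the file name (a quick sketch; the UTC-7 MST offset is taken from the note above, not from the config):

```python
from datetime import datetime, timezone, timedelta

ts_ms = 1484326813840  # epoch-millis counter from flumeload100-log.1484326813840.snappy.tmp
utc = datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc)
mst = utc.astimezone(timezone(timedelta(hours=-7)))  # MST = UTC-7

print(utc.strftime("%Y-%m-%d %H:%M"), "UTC")  # 2017-01-13 17:00 UTC
print(mst.strftime("%Y-%m-%d %H:%M"), "MST")  # 2017-01-13 10:00 MST -> the 10-00 bucket
```

So the file counter agrees with the 10-00 event-time bucket once the offset is applied.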
Thanks
Justin
On Thu, Jan 12, 2017 at 11:56 PM, Denes Arvay <de...@cloudera.com> wrote:
> Hi Justin,
>
> Could you please share your config file with us?
>
> Thanks,
> Denes
>
>
> On Thu, Jan 12, 2017, 20:20 Justin Workman <ju...@gmail.com>
> wrote:
>
>> Sorry for cross-posting to user and dev. I have recently set up a Flume
>> configuration where we are using the regex_extractor interceptor to parse
>> the actual event date from the record flowing through the Flume source,
>> then using that date to build the HDFS sink bucket path. However, it
>> appears that the hdfs.idleTimeout value is not honored in this
>> configuration. It does work when using the timestamp interceptor to build
>> the output path.
>>
>> I have set the hdfs.idleTimeout value for the HDFS sink, but the files
>> are never closed or renamed until I restart or shut down Flume. Our Flume is
>> configured to roll based on size or output path, and the files
>> rename/close/roll fine based on size, however the last file in each output
>> path is always left with the .tmp extension until we restart Flume. I would
>> expect that the file would be renamed and closed if there are no records
>> written to this file after the idleTimeout is reached.
>>
>> Could I be missing something, or is this a known bug with the
>> regex_extractor interceptor?
>>
>> Thanks
>> Justin
>>
>
Re: hdfs.idleTime
Posted by Denes Arvay <de...@cloudera.com>.
Hi Justin,
Could you please share your config file with us?
Thanks,
Denes
On Thu, Jan 12, 2017, 20:20 Justin Workman <ju...@gmail.com> wrote:
> Sorry for cross-posting to user and dev. I have recently set up a Flume
> configuration where we are using the regex_extractor interceptor to parse
> the actual event date from the record flowing through the Flume source,
> then using that date to build the HDFS sink bucket path. However, it
> appears that the hdfs.idleTimeout value is not honored in this
> configuration. It does work when using the timestamp interceptor to build
> the output path.
>
> I have set the hdfs.idleTimeout value for the HDFS sink, but the files are
> never closed or renamed until I restart or shut down Flume. Our Flume is
> configured to roll based on size or output path, and the files
> rename/close/roll fine based on size, however the last file in each output
> path is always left with the .tmp extension until we restart Flume. I would
> expect that the file would be renamed and closed if there are no records
> written to this file after the idleTimeout is reached.
>
> Could I be missing something, or is this a known bug with the
> regex_extractor interceptor?
>
> Thanks
> Justin
>
Re: hdfs.idleTime
Posted by Justin Workman <ju...@gmail.com>.
More details:
Flume 1.6 - Core Apache version.
KafkaSource (0.8.2) -> File Channel -> HDFS Sink (CDH5.5.2).
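For context, a minimal sketch of an agent config along these lines; every component name, the regex, and the thresholds below are illustrative assumptions, not the poster's actual settings:

```properties
# Hypothetical names throughout; the pattern and sizes are placeholders.
agent.sources = kafkaSrc
agent.channels = fileCh
agent.sinks = hdfsSink

# regex_extractor pulls the event date out of the record body and writes it
# into the "timestamp" header, which the HDFS path escapes then resolve against.
agent.sources.kafkaSrc.interceptors = i1
agent.sources.kafkaSrc.interceptors.i1.type = regex_extractor
agent.sources.kafkaSrc.interceptors.i1.regex = ^(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2})
agent.sources.kafkaSrc.interceptors.i1.serializers = s1
agent.sources.kafkaSrc.interceptors.i1.serializers.s1.type = org.apache.flume.interceptor.RegexExtractorInterceptorMillisSerializer
agent.sources.kafkaSrc.interceptors.i1.serializers.s1.name = timestamp
agent.sources.kafkaSrc.interceptors.i1.serializers.s1.pattern = yyyy-MM-dd HH:mm:ss

agent.sinks.hdfsSink.type = hdfs
agent.sinks.hdfsSink.channel = fileCh
agent.sinks.hdfsSink.hdfs.path = /flumedata/processed/first-pass-stream/%Y/%m/%d/%H-00
agent.sinks.hdfsSink.hdfs.idleTimeout = 300
agent.sinks.hdfsSink.hdfs.rollSize = 524288000
agent.sinks.hdfsSink.hdfs.rollCount = 0
agent.sinks.hdfsSink.hdfs.rollInterval = 0
```

With idleTimeout set, a BucketWriter that receives no events for that many seconds should be closed and its .tmp suffix removed; a typo in any of these property keys silently disables the setting, which is what the thread's resolution points to.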
On Thu, Jan 12, 2017 at 12:20 PM, Justin Workman <ju...@gmail.com>
wrote:
> Sorry for cross-posting to user and dev. I have recently set up a Flume
> configuration where we are using the regex_extractor interceptor to parse
> the actual event date from the record flowing through the Flume source,
> then using that date to build the HDFS sink bucket path. However, it
> appears that the hdfs.idleTimeout value is not honored in this
> configuration. It does work when using the timestamp interceptor to build
> the output path.
>
> I have set the hdfs.idleTimeout value for the HDFS sink, but the files are
> never closed or renamed until I restart or shut down Flume. Our Flume is
> configured to roll based on size or output path, and the files
> rename/close/roll fine based on size, however the last file in each output
> path is always left with the .tmp extension until we restart Flume. I would
> expect that the file would be renamed and closed if there are no records
> written to this file after the idleTimeout is reached.
>
> Could I be missing something, or is this a known bug with the
> regex_extractor interceptor?
>
> Thanks
> Justin
>