Posted to user@flume.apache.org by Mungeol Heo <mu...@gmail.com> on 2014/12/24 10:01:51 UTC

Is there any "1 hour ago" like option at HDFS sink?

Hello,

I am trying to use the configuration listed below to transfer logs to HDFS.

...
agent01.sources.source01.interceptors = interceptor01
agent01.sources.source01.interceptors.interceptor01.type = timestamp
...
agent01.sinks.sink01.type = hdfs
agent01.sinks.sink01.hdfs.path = /access/%Y%m%d/%H
...

The log files I have are hourly and are named like this:

...
access.2014122400
access.2014122401
access.2014122402
...

If the 'access.2014122402' file is transferred at 03:10, the HDFS path
will be '/access/20141224/03/"file name"'.
However, I need the path to be '/access/20141224/02/"file name"'
instead of '/access/20141224/03/"file name"'.
That is, the hour in the HDFS path should match the hour
in the file name or in the logs themselves.

Is there any way to do this other than changing the 'hdfs.timeZone' setting?
Something like the "1 hour ago" option of the date command in Linux.
Any help would be great.
Thanks.

- Mungeol

Re: Is there any "1 hour ago" like option at HDFS sink?

Posted by Mungeol Heo <mu...@gmail.com>.
Thank you, Paul.

It is working.

On Thu, Dec 25, 2014 at 1:52 AM, Paul Chavez <pc...@ntent.com> wrote:
> You can use a regex extractor interceptor to create the time stamp header from your data.

Re: Is there any "1 hour ago" like option at HDFS sink?

Posted by Paul Chavez <pc...@ntent.com>.
You can use a regex extractor interceptor to create the time stamp header from your data.
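
For readers landing on this thread, here is a minimal sketch of what that suggestion could look like. It is not the poster's actual fix, which the thread does not show. It assumes each log line starts with a timestamp such as "2014-12-24 02:15"; the regex and the date pattern have to be adapted to the real line format. The serializer label "serializer01" is made up for illustration. Note that the regex extractor matches against the event body, not the file name.

# Replaces the timestamp interceptor, which stamps the arrival time
agent01.sources.source01.interceptors = interceptor01
agent01.sources.source01.interceptors.interceptor01.type = regex_extractor
# Assumed line format: "2014-12-24 02:15 ..." at the start of each event body
agent01.sources.source01.interceptors.interceptor01.regex = ^(\\d\\d\\d\\d-\\d\\d-\\d\\d\\s\\d\\d:\\d\\d)
agent01.sources.source01.interceptors.interceptor01.serializers = serializer01
# Converts the captured text to epoch milliseconds and stores it in the "timestamp" header
agent01.sources.source01.interceptors.interceptor01.serializers.serializer01.type = org.apache.flume.interceptor.RegexExtractorInterceptorMillisSerializer
agent01.sources.source01.interceptors.interceptor01.serializers.serializer01.name = timestamp
agent01.sources.source01.interceptors.interceptor01.serializers.serializer01.pattern = yyyy-MM-dd HH:mm
# Unchanged: the escapes are resolved from the "timestamp" header
agent01.sinks.sink01.hdfs.path = /access/%Y%m%d/%H

With the header taken from the log line, %Y%m%d and %H in hdfs.path resolve to the event's own hour, so a line stamped 02:15 should land under /access/20141224/02 even when it is transferred at 03:10. Leave hdfs.useLocalTimeStamp at its default of false. If the timestamp interceptor is kept in the chain after this one, it would need preserveExisting = true so that it does not overwrite the header.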

