Posted to dev@flume.apache.org by Dibyajyoti Ghosh <di...@gmail.com> on 2013/03/15 20:16:25 UTC

Flume custom decorator for Rolling FileSink output bucketing

Dear Flume team,

I am using Flume 1.3.0, bundled with the Cloudera 4.2.0 distribution, to log
to the local file system. However, the current implementation of the FileSink
doesn't have inline decorators like the HDFS Sink, where output can be stored
in directories based on event metadata, e.g. the hostname of the event, the
timestamp, or some other attribute in the message object.

How can I do the same with the FileSink?


Thanks a lot,
- dib
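
For illustration, the header-based bucketing asked about here (choosing a directory from the event's host name or timestamp) can be sketched in Java as a small path-template resolver. This is a hypothetical helper, not Flume's API. The class name `PathTemplate`, the limited token set (`%{header}` plus `%Y`, `%m`, `%d`, `%H`), UTC formatting, and the assumption that the `timestamp` header holds epoch milliseconds are all illustrative choices. The HDFS sink performs its own, richer escaping through a utility class in the Flume source (`BucketPath`).

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Map;
import java.util.TimeZone;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper: expands a bucket path template from event headers.
 * Supports %{header} and the date tokens %Y, %m, %d, %H, which are read from
 * a "timestamp" header assumed to contain epoch milliseconds.
 */
public final class PathTemplate {
  // Matches either %{name} (group 1) or a single-letter date token (group 2).
  private static final Pattern TOKEN = Pattern.compile("%\\{([^}]+)\\}|%([YmdH])");

  private PathTemplate() {}

  public static String resolve(String template, Map<String, String> headers) {
    Matcher m = TOKEN.matcher(template);
    StringBuffer out = new StringBuffer();
    while (m.find()) {
      String value;
      if (m.group(1) != null) {
        // Assumption: a missing header expands to an empty string.
        String h = headers.get(m.group(1));
        value = h == null ? "" : h;
      } else {
        value = formatDate(m.group(2).charAt(0), headers.get("timestamp"));
      }
      // quoteReplacement stops '$' or '\' in header values being treated specially.
      m.appendReplacement(out, Matcher.quoteReplacement(value));
    }
    m.appendTail(out);
    return out.toString();
  }

  private static String formatDate(char token, String timestamp) {
    if (timestamp == null) {
      throw new IllegalArgumentException("date token needs a 'timestamp' header");
    }
    String pattern;
    switch (token) {
      case 'Y': pattern = "yyyy"; break;
      case 'm': pattern = "MM"; break;
      case 'd': pattern = "dd"; break;
      default:  pattern = "HH"; break;
    }
    SimpleDateFormat fmt = new SimpleDateFormat(pattern);
    fmt.setTimeZone(TimeZone.getTimeZone("UTC")); // assumption: bucket by UTC
    return fmt.format(new Date(Long.parseLong(timestamp)));
  }
}
```

With headers `host=web01` and `timestamp=1363378585000`, the template `/var/log/flume/%{host}/%Y-%m-%d` resolves to `/var/log/flume/web01/2013-03-15`.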

Re: Flume custom decorator for Rolling FileSink output bucketing

Posted by Juhani Connolly <ju...@cyberagent.co.jp>.
At this time, I don't think anyone is working on that. I would like to
do it myself, but to be honest, I have a lot of other things to deal with
right now, so I don't see myself working on it any time in the near future.

However, if someone were to post a patch, I should be able to find the time
to review and commit it.

On 03/19/2013 02:28 AM, Dibyajyoti Ghosh wrote:
> Hi Juhani,
>
> Thank you very much for clearing up the doubts I've had about the
> documentation for quite some time now. I downloaded the Flume source from
> git and am now looking into the HDFS sink code base. As you said, it will
> not be a small patch. I will keep the community posted about the changes.
>
> Are you aware of any plans to add output bucketing (i.e. dynamic paths) to
> the FileRoll sink in a near-future release of Flume?
>
> thanks a lot,
> - dib


Re: Flume custom decorator for Rolling FileSink output bucketing

Posted by Dibyajyoti Ghosh <di...@gmail.com>.
Hi Juhani,

Thank you very much for clearing up the doubts I've had about the
documentation for quite some time now. I downloaded the Flume source from
git and am now looking into the HDFS sink code base. As you said, it will
not be a small patch. I will keep the community posted about the changes.

Are you aware of any plans to add output bucketing (i.e. dynamic paths) to
the FileRoll sink in a near-future release of Flume?

thanks a lot,
- dib


On Sun, Mar 17, 2013 at 6:55 PM, Juhani Connolly <
juhani_connolly@cyberagent.co.jp> wrote:

> Dib, that article refers to Flume OG (0.95); it's not relevant to
> the current release.
>
> I had looked in the past at fixing the file sink to use the same
> bucketing available to the HDFS sink, but unfortunately it seemed like it
> would take more than a quick fix. The PathManager currently only works with
> one file at a time, and the rolling logic is tied to that. You'd
> basically have to replace most of the logic, ideally reusing the bucketing
> logic from the HDFS sink. As Mike said, you should probably just use the
> HDFS sink with file:// unless you feel like improving the current sink.

Re: Flume custom decorator for Rolling FileSink output bucketing

Posted by Juhani Connolly <ju...@cyberagent.co.jp>.
Dib, that article refers to Flume OG (0.95); it's not relevant
to the current release.

I had looked in the past at fixing the file sink to use the same
bucketing available to the HDFS sink, but unfortunately it seemed like
it would take more than a quick fix. The PathManager currently only
works with one file at a time, and the rolling logic is tied to
that. You'd basically have to replace most of the logic, ideally reusing
the bucketing logic from the HDFS sink. As Mike said, you should
probably just use the HDFS sink with file:// unless you feel like
improving the current sink.
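
The structural change Juhani describes can be made concrete with a rough sketch. Instead of one current output file, as with the file roll sink's PathManager, a bucketing file sink would keep one open writer per resolved bucket directory and close them all on each roll. Everything here is a hypothetical design, not Flume code: the class `BucketedFileWriter`, the `events-N.log` file naming, and rolling all buckets together. A real sink would also need channel transactions, cleanup of idle buckets, and a cap on open files.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: one open writer per bucket directory (rather than a
 * single current file), with all buckets rolled together.
 */
public final class BucketedFileWriter implements AutoCloseable {
  private final Map<String, BufferedWriter> open = new HashMap<>();
  private int generation = 0; // bumped on every roll; used in the file name

  /** Appends one event body to its bucket's file, opening the file lazily. */
  public void append(String bucketDir, String body) throws IOException {
    BufferedWriter w = open.get(bucketDir);
    if (w == null) {
      Path dir = Paths.get(bucketDir);
      Files.createDirectories(dir);
      Path file = dir.resolve("events-" + generation + ".log");
      w = Files.newBufferedWriter(file, StandardCharsets.UTF_8,
          StandardOpenOption.CREATE, StandardOpenOption.APPEND);
      open.put(bucketDir, w);
    }
    w.write(body);
    w.newLine();
  }

  /** Closes every open bucket file; the next append to a bucket starts a new file. */
  public void roll() throws IOException {
    closeAll();
    generation++;
  }

  /** Number of bucket files currently open. */
  public int openBuckets() {
    return open.size();
  }

  private void closeAll() throws IOException {
    IOException first = null;
    for (BufferedWriter w : open.values()) {
      try {
        w.close();
      } catch (IOException e) {
        if (first == null) first = e; // keep closing the rest, report the first failure
      }
    }
    open.clear();
    if (first != null) throw first;
  }

  @Override
  public void close() throws IOException {
    closeAll();
  }
}
```

Rolling every bucket at once keeps the logic close to the existing single-file roll. Per-bucket roll timers, as the HDFS sink keeps per open file, would be the natural next step.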

On 03/16/2013 06:20 AM, Dibyajyoti Ghosh wrote:
> Thanks, Mike, for the suggestion. The reason I am considering a regular
> file system for log storage is to avoid latency issues when retrieving
> files, and to let users scrape log files using grep / awk and the multitude
> of other powerful commands available on conventional storage.
>
> I am now thinking of writing my own decorator classes for the
> RollingFile sink. Any pointers on how I can get started on writing my
> custom decorators?
>
> Another quick question: can you, Mike, or somebody from the Flume community
> tell me how to use the commands documented here:
> http://archive.cloudera.com/cdh/3/flume/UserGuide/#_introducing_sink_decorators
>
> Is this available for Flume NG as distributed with Cloudera, i.e.
> Flume 1.3.0?
>
> Best and thanks a lot again,
> - dib


Re: Flume custom decorator for Rolling FileSink output bucketing

Posted by Dibyajyoti Ghosh <di...@gmail.com>.
Thanks, Mike, for the suggestion. The reason I am considering a regular
file system for log storage is to avoid latency issues when retrieving
files, and to let users scrape log files using grep / awk and the multitude
of other powerful commands available on conventional storage.

I am now thinking of writing my own decorator classes for the
RollingFile sink. Any pointers on how I can get started on writing my
custom decorators?

Another quick question: can you, Mike, or somebody from the Flume community
tell me how to use the commands documented here:
http://archive.cloudera.com/cdh/3/flume/UserGuide/#_introducing_sink_decorators

Is this available for Flume NG as distributed with Cloudera, i.e.
Flume 1.3.0?

Best and thanks a lot again,
- dib


On Fri, Mar 15, 2013 at 12:38 PM, Mike Percy <mp...@apache.org> wrote:

> Dib, you could use the HDFS sink with a file:// URL as an option.
>
> Regards,
> Mike

Re: Flume custom decorator for Rolling FileSink output bucketing

Posted by Mike Percy <mp...@apache.org>.
Dib, you could use the HDFS sink with a file:// URL as an option.

Regards,
Mike
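
A sketch of Mike's suggestion, assuming a Flume 1.3-era agent properties file: the HDFS sink's `hdfs.path` is given a `file://` URL, and its escape sequences produce the per-host, per-day directories asked about. The agent name `agent1`, the component names, and the target directory are placeholders. The source type and channel definitions are omitted. The `%{host}` and date escapes rely on `host` and `timestamp` event headers, added here by the host and timestamp interceptors. Because the sink writes through the Hadoop client libraries, those libraries must be on Flume's classpath.

```properties
# Placeholder names: agent1, src1, ch1, localFiles (source/channel definitions omitted).
agent1.sources = src1
agent1.channels = ch1
agent1.sinks = localFiles

# Add "host" (host name, not IP) and "timestamp" headers so the path can use them.
agent1.sources.src1.interceptors = hostInt tsInt
agent1.sources.src1.interceptors.hostInt.type = host
agent1.sources.src1.interceptors.hostInt.useIP = false
agent1.sources.src1.interceptors.tsInt.type = timestamp

# HDFS sink pointed at the local file system through a file:// URL.
agent1.sinks.localFiles.type = hdfs
agent1.sinks.localFiles.channel = ch1
agent1.sinks.localFiles.hdfs.path = file:///var/log/flume/%{host}/%Y-%m-%d
# Plain text output, so the files can be read with grep / awk.
agent1.sinks.localFiles.hdfs.fileType = DataStream
agent1.sinks.localFiles.hdfs.writeFormat = Text
# Roll every 300 seconds; 0 disables size- and count-based rolling.
agent1.sinks.localFiles.hdfs.rollInterval = 300
agent1.sinks.localFiles.hdfs.rollSize = 0
agent1.sinks.localFiles.hdfs.rollCount = 0
```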



On Fri, Mar 15, 2013 at 12:16 PM, Dibyajyoti Ghosh <
dibyajyotighosh@gmail.com> wrote:

> Dear Flume team,
>
> I am using Flume 1.3.0, bundled with the Cloudera 4.2.0 distribution, to log
> to the local file system. However, the current implementation of the FileSink
> doesn't have inline decorators like the HDFS Sink, where output can be stored
> in directories based on event metadata, e.g. the hostname of the event, the
> timestamp, or some other attribute in the message object.
>
> How can I do the same with the FileSink?
>
>
> Thanks a lot,
> - dib
>