Posted to dev@flume.apache.org by Dibyajyoti Ghosh <di...@gmail.com> on 2013/03/15 20:16:25 UTC
Flume custom decorator for Rolling FileSink output bucketing
Dear flume team,
I am using Flume 1.3.0, bundled with the Cloudera 4.2.0 distribution, to log to
the local file system. But the current implementation of FileSink doesn't have
inline decorators like the HDFS sink, where output can be stored in
directories based on event metadata, e.g. the hostname of the event, the
timestamp, or some other attribute in the message object.
How can I do the same for FileSink?
Thanks a lot,
- dib
Re: Flume custom decorator for Rolling FileSink output bucketing
Posted by Juhani Connolly <ju...@cyberagent.co.jp>.
At this time, I don't think anyone is working on that. I would like to
do it myself, but tbh, I have a lot of other stuff to deal with right
now, so I don't see myself working on it any time in the near future.
However, if someone were to post a patch, I should be able to find the time
to review and commit it.
Re: Flume custom decorator for Rolling FileSink output bucketing
Posted by Dibyajyoti Ghosh <di...@gmail.com>.
Hi Juhani,
Thank you very much for clarifying the doubts I have had about the documentation
for quite some time now. I downloaded the Flume source from git and am now
looking into the HDFS sink code base. Like you said, it will not be a small
patch. I will keep the community posted about the changes.
Are you aware of any plan to add output bucketing (i.e. dynamic
paths) to the FileRoll sink in near-future releases of Flume?
thanks a lot,
- dib
Re: Flume custom decorator for Rolling FileSink output bucketing
Posted by Juhani Connolly <ju...@cyberagent.co.jp>.
Dib, that article refers to Flume OG (0.95); it's not relevant
to the current release.
I had looked in the past at fixing the file sink to use the same
bucketing available to the HDFS sink, but unfortunately it seemed like
it would take more than a quick fix. The PathManager currently only
works with one File at a time, and the rolling logic is tied to
that. You'd basically have to replace most of the logic, ideally reusing
the bucketing logic from the HDFS sink. As Mike said, you should
probably just use the HDFS sink with file:// unless you feel like
improving the current sink.
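To make the point about "one File at a time" concrete, here is a minimal sketch of what per-event bucketing involves. It is not Flume code and not the HDFS sink's implementation: the class `BucketedFileWriterSketch` and its methods are hypothetical, it only understands `%{header}` escapes (no date escapes), and it does no rolling at all. The idea it illustrates is keeping one open writer per resolved path rather than a single current file.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, NOT part of Flume: one open writer per bucket
// instead of the single current file that the file sink manages.
class BucketedFileWriterSketch implements AutoCloseable {
    // Matches escapes of the form %{headerName}.
    private static final Pattern HEADER = Pattern.compile("%\\{([^}]+)\\}");

    private final String pathTemplate; // e.g. "/tmp/logs/%{host}/events.log"
    private final Map<String, BufferedWriter> writers = new HashMap<>();

    BucketedFileWriterSketch(String pathTemplate) {
        this.pathTemplate = pathTemplate;
    }

    // Replace each %{name} with the matching header value ("unknown" if absent).
    static String resolve(String template, Map<String, String> headers) {
        Matcher m = HEADER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = headers.getOrDefault(m.group(1), "unknown");
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Append one event body to the file selected by the event's headers.
    void append(Map<String, String> headers, String body) throws IOException {
        String path = resolve(pathTemplate, headers);
        BufferedWriter w = writers.get(path);
        if (w == null) {
            Path p = Paths.get(path);
            if (p.getParent() != null) {
                Files.createDirectories(p.getParent());
            }
            w = Files.newBufferedWriter(p, StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
            writers.put(path, w);
        }
        w.write(body);
        w.newLine();
    }

    // Close every bucket's writer.
    @Override
    public void close() throws IOException {
        for (BufferedWriter w : writers.values()) {
            w.close();
        }
        writers.clear();
    }
}
```

A real sink would also have to roll each bucket independently, close idle writers, and sanitize header values before using them in a path, which is roughly why this is more than a quick fix. The HDFS sink's own escaping (the `BucketPath` class, as far as I recall) additionally handles date escapes such as `%Y-%m-%d`.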
Re: Flume custom decorator for Rolling FileSink output bucketing
Posted by Dibyajyoti Ghosh <di...@gmail.com>.
Thanks, Mike, for the suggestion. The reason I am thinking of a regular file
system for log storage is to avoid latency issues in file retrieval, as
well as to let users scrape log files with grep, awk, and the multitude
of other powerful commands available on conventional storage.
I am now thinking of writing my own decorator classes for the
RollingFile sink. Any pointers on how I can get started on writing
custom decorators?
Another quick question: can you, Mike, or somebody from the Flume community tell
me how to use the commands documented here:
http://archive.cloudera.com/cdh/3/flume/UserGuide/#_introducing_sink_decorators
Are these available in the flume-ng distributed with the Cloudera solution, i.e.
Flume 1.3.0?
Best and thanks a lot again,
- dib
Re: Flume custom decorator for Rolling FileSink output bucketing
Posted by Mike Percy <mp...@apache.org>.
Dib, you could use the HDFS sink with a file:// URL as an option.
Regards,
Mike
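As a rough illustration of this suggestion, the fragment below points the HDFS sink at a local directory and buckets output by host and date. Treat it as a sketch under assumptions rather than a tested configuration: the agent and component names (`agent1`, `r1`, `c1`, `k1`) and the base directory `/var/log/flume` are placeholders, and the source and channel are assumed to be defined elsewhere in the same file. The `%Y-%m-%d` escape needs a `timestamp` header and `%{host}` needs a `host` header on each event, which is why the two interceptors are attached to the source.

```properties
# Placeholder names: agent1, r1, c1, k1 and /var/log/flume are assumptions.
# Source r1 and channel c1 are assumed to be defined elsewhere in this file.

# Add the headers that the path escapes below rely on.
agent1.sources.r1.interceptors = ts host
agent1.sources.r1.interceptors.ts.type = timestamp
agent1.sources.r1.interceptors.host.type = host
agent1.sources.r1.interceptors.host.useIP = false
agent1.sources.r1.interceptors.host.hostHeader = host

# HDFS sink pointed at the local file system through a file:// URL.
agent1.sinks = k1
agent1.sinks.k1.type = hdfs
agent1.sinks.k1.channel = c1
agent1.sinks.k1.hdfs.path = file:///var/log/flume/%{host}/%Y-%m-%d
agent1.sinks.k1.hdfs.filePrefix = events
agent1.sinks.k1.hdfs.fileType = DataStream
agent1.sinks.k1.hdfs.writeFormat = Text
# Roll on time only (every 10 minutes); 0 disables size- and count-based rolling.
agent1.sinks.k1.hdfs.rollInterval = 600
agent1.sinks.k1.hdfs.rollSize = 0
agent1.sinks.k1.hdfs.rollCount = 0
```

Note that the HDFS sink still needs the Hadoop libraries on the agent's classpath even when it writes to `file://`, which should already be the case on a Cloudera-bundled install. The output is plain text files on local disk, so grep and awk work on them as usual.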