Posted to dev@flume.apache.org by Shiva Ram <sh...@gmail.com> on 2015/10/03 13:28:47 UTC

With spooldir source, memory channel, hdfs sink --> Output files are very tiny files[766 bytes]

Hi

I am using a spooldir source, a memory channel, and an HDFS sink to collect log files
and store them in HDFS.

When I run the Flume agent, it creates very small files of only 766 bytes
each.

Input file: test.log [11.4 KB]
Output files: sales_web_log.1443871052640.log, etc. [all very small files of
766 bytes each]

*How can I increase the output file size?*

Thanks & Regards,

Shiva Ram
Website: http://datamaking.com
Facebook Page: www.facebook.com/datamaking

Re: With spooldir source, memory channel, hdfs sink --> Output files are very tiny files[766 bytes]

Posted by Gonzalo Herreros <gh...@gmail.com>.
Yep, you haven't configured hdfs.rollInterval and hdfs.rollSize in the sink,
so it will produce tiny files. The batch size is not related to the file
size.

Regards
Gonzalo
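For illustration, the roll-related properties might be added to the sink section like this (agent and sink names taken from the poster's configuration; the values are only a sketch, not tuned recommendations):

```properties
# Roll a new HDFS file every 10 minutes or at 128 MB, whichever comes first.
flumeAgentSpoolDir.sinks.hdfsSink.hdfs.rollInterval = 600
flumeAgentSpoolDir.sinks.hdfsSink.hdfs.rollSize = 134217728
# hdfs.rollCount defaults to 10 events, which also produces tiny files;
# setting it to 0 disables count-based rolling.
flumeAgentSpoolDir.sinks.hdfsSink.hdfs.rollCount = 0
```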

Re: With spooldir source, memory channel, hdfs sink --> Output files are very tiny files[766 bytes]

Posted by Shiva Ram <sh...@gmail.com>.
This is my conf file:

flumeAgentSpoolDir.sources = spoolDirSource
flumeAgentSpoolDir.channels = memoryChannel
flumeAgentSpoolDir.sinks = hdfsSink

# source
flumeAgentSpoolDir.sources.spoolDirSource.type = spooldir
flumeAgentSpoolDir.sources.spoolDirSource.channels = memoryChannel
flumeAgentSpoolDir.sources.spoolDirSource.spoolDir = /home/hduser/Downloads/app_log_data
flumeAgentSpoolDir.sources.spoolDirSource.fileHeader = true

# HDFS sinks
flumeAgentSpoolDir.sinks.hdfsSink.type = hdfs
flumeAgentSpoolDir.sinks.hdfsSink.hdfs.fileType = DataStream
# change to your host
flumeAgentSpoolDir.sinks.hdfsSink.hdfs.path = hdfs://192.168.234.181:8020/flume_ng/log_data
flumeAgentSpoolDir.sinks.hdfsSink.hdfs.filePrefix = sales_web_log
flumeAgentSpoolDir.sinks.hdfsSink.hdfs.fileSuffix = .log
flumeAgentSpoolDir.sinks.hdfsSink.hdfs.batchSize = 50
flumeAgentSpoolDir.sinks.hdfsSink.hdfs.bufferMaxLines = 50

# Use a channel which buffers events in memory
flumeAgentSpoolDir.channels.memoryChannel.type = memory
flumeAgentSpoolDir.channels.memoryChannel.capacity = 10000
flumeAgentSpoolDir.channels.memoryChannel.transactionCapacity = 1000
flumeAgentSpoolDir.channels.memoryChannel.byteCapacity = 100000000

# Bind the source and sink to the channel
flumeAgentSpoolDir.sources.spoolDirSource.channels = memoryChannel
flumeAgentSpoolDir.sinks.hdfsSink.channel = memoryChannel


Re: With spooldir source, memory channel, hdfs sink --> Output files are very tiny files[766 bytes]

Posted by IT CTO <go...@gmail.com>.
Sorry, I can't see the attached file.


Re: With spooldir source, memory channel, hdfs sink --> Output files are very tiny files[766 bytes]

Posted by Shiva Ram <sh...@gmail.com>.
Thanks for your inputs.

This is my conf. file.


Re: With spooldir source, memory channel, hdfs sink --> Output files are very tiny files[766 bytes]

Posted by IT CTO <go...@gmail.com>.
Can you share your conf file?
The file size is determined by a few parameters, such as the roll* settings
or idle-timeout.
Eran



Re: With spooldir source, memory channel, hdfs sink --> Output files are very tiny files[766 bytes]

Posted by Gonzalo Herreros <gh...@gmail.com>.
You forgot to attach the conf file.
Anyway, HDFS sink files are rotated much like Log4j files: by time or by
size, whichever limit is hit first.

By default the sink rolls every 30 seconds, at 1024 bytes, or after 10
events, whichever comes first, so in your case it is probably hitting one of
those default limits.
Change the properties hdfs.rollInterval, hdfs.rollSize, and hdfs.rollCount.

See https://flume.apache.org/FlumeUserGuide.html#hdfs-sink for details.
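As a hedged sketch of the fix (property names are from the Flume User Guide's hdfs sink section; the override values below are illustrative, not recommendations):

```properties
# Defaults: hdfs.rollInterval=30 (seconds), hdfs.rollSize=1024 (bytes),
# hdfs.rollCount=10 (events) -- hitting any one of these closes the current
# file. Setting a property to 0 disables that roll trigger.
flumeAgentSpoolDir.sinks.hdfsSink.hdfs.rollInterval = 3600
flumeAgentSpoolDir.sinks.hdfsSink.hdfs.rollSize = 67108864
flumeAgentSpoolDir.sinks.hdfsSink.hdfs.rollCount = 0
```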

Regards,
Gonzalo



Re: With spooldir source, memory channel, hdfs sink --> Output files are very tiny files[766 bytes]

Posted by Shiva Ram <sh...@gmail.com>.
My flume agent conf. file.

*How to increase the output file size? Thanks.*
