Posted to user@flume.apache.org by Jagadish Bihani <ja...@pubmatic.com> on 2012/10/22 13:48:05 UTC

File Channel performance and fsync

Hi

I am writing this on top of another thread where there was a discussion on 
"fsync lies" and the fact that only the file channel uses fsync, not the 
file sink:

-- I tested fsync performance on 2 machines (on one machine I was getting 
very good throughput using the file channel; on the other, with almost the 
same hardware configuration, it was almost 100 times slower) using the 
following code:


#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/time.h>

#define PAGESIZE 4096

int main(int argc, char *argv[])
{
         char my_read_str[PAGESIZE];
         char *read_filename = argv[1];
         int readfd, writefd;

         readfd = open(read_filename, O_RDONLY);
         writefd = open("written_file", O_WRONLY|O_CREAT, 0777);
         int len = lseek(readfd, 0, SEEK_END);  /* file size */
         lseek(readfd, 0, SEEK_SET);            /* rewind to start */
         int iterations = len/PAGESIZE;
         int i;
         struct timeval t0, t1;

         for (i = 0; i < iterations; i++)
         {
                 read(readfd, my_read_str, PAGESIZE);
                 write(writefd, my_read_str, PAGESIZE);
                 gettimeofday(&t0, 0);
                 fsync(writefd);                /* time only the fsync */
                 gettimeofday(&t1, 0);
                 long elapsed = (t1.tv_sec-t0.tv_sec)*1000000 +
                                t1.tv_usec-t0.tv_usec;
                 printf("Elapsed time is= %ld \n", elapsed);
         }
         close(readfd);
         close(writefd);
         return 0;
}
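
(To compile and run the above, assuming gcc and that the code is saved as 
fsync_test.c: "gcc fsync_test.c -o fsync_test", then 
"./fsync_test <some_input_file>". It copies the input file page by page 
and prints the latency of each fsync in microseconds.)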


-- As expected, fsync typically takes about 50000 microseconds to complete 
on one machine, while on the other it takes about 200-290 microseconds on 
average. So is the machine with the higher performance doing an 
'fsync lie'?
-- If I have understood it correctly, an "fsync lie" means the data is not 
actually written to disk but sits in some disk/controller buffer. 
I) If the disk loses power due to a shutdown or some other disaster, that 
data will be lost. II) Can data be lost even without that? (e.g. if data 
is kept in some disk buffer and fsync is invoked continuously, can that 
data also be lost?) If only part I is true, it can be acceptable, because 
the probability of a power loss is usually low in a production 
environment. But if even II is true, then there is a problem.

-- But on the machine whose disk doesn't lie, the performance of Flume 
using the File channel is very low (I have seen at most 100 KB/sec, even 
with sufficient DirectMemory allocation). Does anybody have stats on the 
throughput of the file channel? Is anybody getting better performance with 
the file channel (without fsync lies)? What is the recommended usage for 
an average scenario, i.e. transferring files of a few MBs to an HDFS sink 
continuously on typical hardware (16 core processors, 16 GB RAM etc.)?
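
As a sanity check on those numbers (assuming spinning 7200 RPM disks, 
which is a guess on my part): 7200 RPM is 120 rotations/sec, so one 
rotation takes about 1000000/120 ≈ 8333 microseconds, and an fsync that 
truly reaches the platter can rarely beat one rotation. ~50000 
microseconds is therefore plausible for an honest disk (seek plus 
rotation), while 200-290 microseconds is far below a single rotation time 
and strongly suggests the write was acknowledged from a volatile cache.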


Regards,
Jagadish

On 10/10/2012 11:30 PM, Brock Noland wrote:
> Hi,
>
> On Wed, Oct 10, 2012 at 11:22 AM, Jagadish Bihani
> <ja...@pubmatic.com> wrote:
>> Hi Brock
>>
>> I will surely look into 'fsync lies'.
>>
>> But as per my experiments I think "file channel" is causing the issue.
>> Because on those 2 machines (one with higher throughput and other with
>> lower)
>> I did following experiment:
>>
>> cat Source -memory channel - file sink.
>>
>> Now with this setup I got same throughput on both the machines. (around 3
>> MB/sec)
>> Now as I have used the "File sink", it should also do an "fsync" at some
>> point.
>> 'File Sink' and 'File Channel' both do disk writes.
>> So if there are differences in disk behaviour, they should be visible
>> even with the 'File Sink'.
>>
>> Am I missing something here?
> File sink does not call fsync.
>
>> Regards,
>> Jagadish
>>
>>
>>
>> On 10/10/2012 09:35 PM, Brock Noland wrote:
>>> OK your disk that is giving you 40KB/second is telling you the truth
>>> and the faster disk is lying to you. Look up "fsync lies" to see what
>>> I am referring to.
>>>
>>> A spinning disk can do 100 fsync operations per second (this is done
>>> at the end of every batch). That is how I estimated your event size:
>>> 40KB/second over 100 batches is 40KB / 100 ≈ 409 bytes per event.
>>>
>>> Once again, if you want increased performance, you should increase the
>>> batch size.
>>>
>>> Brock
>>>
>>> On Wed, Oct 10, 2012 at 11:00 AM, Jagadish Bihani
>>> <ja...@pubmatic.com> wrote:
>>>> Hi
>>>>
>>>> Yes. It is around 480 - 500 bytes.
>>>>
>>>>
>>>> On 10/10/2012 09:24 PM, Brock Noland wrote:
>>>>> How big are your events? Average about 400 bytes?
>>>>>
>>>>> Brock
>>>>>
>>>>> On Wed, Oct 10, 2012 at 5:11 AM, Jagadish Bihani
>>>>> <ja...@pubmatic.com> wrote:
>>>>>> Hi
>>>>>>
>>>>>> Thanks for the inputs, Brock. After several experiments the problem
>>>>>> eventually boiled down to the disks.
>>>>>>
>>>>>>     -- But I had used the same configuration on all 3 machines (so
>>>>>> all software components are the same on all 3 machines).
>>>>>> -- The User Guide says that if multiple file channel instances are
>>>>>> active on the same agent then different disks are preferable. But in
>>>>>> my case only one file channel is active per agent.
>>>>>> -- The only pattern I observed is that the machines where I got
>>>>>> better performance have multiple disks. But I don't understand how
>>>>>> that helps if I have only 1 active file channel.
>>>>>> -- What is the impact of the type of disk/disk device driver on
>>>>>> performance? I don't understand why with one disk I am getting
>>>>>> 40 KB/sec and with another 2 MB/sec.
>>>>>>
>>>>>> Could you please elaborate on the correlation between the file channel and disks.
>>>>>>
>>>>>> Regards,
>>>>>> Jagadish
>>>>>>
>>>>>>
>>>>>> On 10/09/2012 08:01 PM, Brock Noland wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Using file channel, in terms of performance, the number and type of
>>>>>> disks is going to be much more predictive of performance than CPU or
>>>>>> RAM. Note that consumer level drives/controllers will give you much
>>>>>> "better" performance because they lie to you about when your data is
>>>>>> actually written to the drive. If you search for "fsync lies" you'll
>>>>>> find more information on this.
>>>>>>
>>>>>> You probably want to increase the batch size to get better performance.
>>>>>>
>>>>>> Brock
>>>>>>
>>>>>> On Tue, Oct 9, 2012 at 2:46 AM, Jagadish Bihani
>>>>>> <ja...@pubmatic.com> wrote:
>>>>>>
>>>>>> Hi
>>>>>>
>>>>>> My flume setup is:
>>>>>>
>>>>>> Source Agent : cat source - File Channel - Avro Sink
>>>>>> Dest Agent :     avro source - File Channel - HDFS Sink.
>>>>>>
>>>>>> There is only 1 source agent and 1 destination agent.
>>>>>>
>>>>>> I measure throughput as the amount of data written to HDFS per second
>>>>>> (I have a rolling interval of 30 sec, so if a 60 MB file is generated
>>>>>> in 30 sec, the throughput is 2 MB/sec).
>>>>>>
>>>>>> I have run source agent on various machines with different hardware
>>>>>> configurations :
>>>>>> (In all cases I run flume agent with JAVA OPTIONS as
>>>>>> "-DJAVA_OPTS="-Xms500m -Xmx1g -Dcom.sun.management.jmxremote
>>>>>> -XX:MaxDirectMemorySize=2g")
>>>>>>
>>>>>> JDK is 32 bit.
>>>>>>
>>>>>> Experiment 1:
>>>>>> =====
>>>>>> RAM : 16 GB
>>>>>> Processor: Intel Xeon E5620 @ 2.40 GHz (16 cores).
>>>>>> 64 bit Processor with 64 bit Kernel.
>>>>>> Throughput: 2 MB/sec
>>>>>>
>>>>>> Experiment 2:
>>>>>> ======
>>>>>> RAM : 4 GB
>>>>>> Processor: Intel Xeon E5504 @ 2.00GHz (4 cores).
>>>>>> 64 bit Processor with 32 bit Kernel.
>>>>>> Throughput : 30 KB/sec
>>>>>>
>>>>>> Experiment 3:
>>>>>> ======
>>>>>> RAM : 8 GB
>>>>>> Processor: Intel Xeon E5520 @ 2.27 GHz (16 cores).
>>>>>> 64 bit Processor with 32 bit Kernel.
>>>>>> Throughput : 80 KB/sec
>>>>>>
>>>>>>     -- So, as can be seen, there is a huge difference in throughput
>>>>>> with the same configuration but different hardware.
>>>>>> -- In the first case, where throughput is higher, RES is around
>>>>>> 160 MB; in the other cases it is in the range of 40 MB - 50 MB.
>>>>>>
>>>>>> Can anybody please give insights into why there is this huge
>>>>>> difference in the throughput?
>>>>>> What is the correlation of RAM, and of a 32-bit vs. 64-bit kernel,
>>>>>> with file channel/HDFS sink performance?
>>>>>>
>>>>>> Regards,
>>>>>> Jagadish
>>>>>>
>>>>>>
>>>>>>
>>>
>
>


Re: File Channel performance and fsync

Posted by Juhani Connolly <ju...@cyberagent.co.jp>.
Without the fsync, guarantees are weakened a lot more than in the 
fsync-lying case.

Also, you didn't mention the batch size on the avro sink that is sending 
data to your avro source. This is a major factor in your throughput, 
because each batch causes one sync. If you have big batches, you'll have 
few fsyncs and significantly better performance.

I am weirded out by the fact that Denny is getting improved performance by 
running multiple parallel file channels... Are they each on separate disks 
or something? I can't imagine what could cause a performance gain if they 
were all on the same disk; if anything, I would expect more write-head 
seeking and some degradation...
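
(For reference, a single file channel can itself spread its data 
directories across disks via a comma-separated dataDirs list; a sketch 
only, with made-up agent name and paths:

agent.channels.fileChannel.type = file
agent.channels.fileChannel.checkpointDir = /disk1/flume/checkpoint
agent.channels.fileChannel.dataDirs = /disk2/flume/data,/disk3/flume/data

Whether that helps depends, as above, on the directories actually sitting 
on separate physical disks.)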



Re: File Channel performance and fsync

Posted by Jagadish Bihani <ja...@pubmatic.com>.
Hi Denny

Thanks for the inputs.
Btw, when you say you tested another case without 'fsync', I take it you 
changed the file channel code to comment out the 'flush' part of it. 
And if we rely on the OS flushing, it can still be reasonably reliable. 
Is that right?

Regards,
Jagadish



Re: File Channel performance and fsync

Posted by Denny Ye <de...@gmail.com>.
hi Jagadish,
   I have tested the performance of FileChannel recently, so I can share 
my test report for the thinking and questions in this thread.
    On the comparison between FileChannel and File Sink: FileChannel does 
both sequential writes and random reads, so there are many shifts of the 
magnetic head, which makes it much slower than purely sequential writing.
    The 'fsync' call consumes much more time than the write itself; a disk 
manages almost 100 fsyncs/sec, the same number Brock mentioned. Also, I 
don't know why there is such a difference between your two servers. It 
might be related to the OS version (whether fsync or fdatasync is used) or 
to the disk driver (RAID, caching strategy, and so on).
    The throughput of a single FileChannel is almost 3-5MB/sec in my 
environment, so I used 5 channels and got 18MB/sec. It's hard to believe 
the increase stays linear with more channels; meanwhile, it looks like the 
throughput limit comes from the 'fsync' operation. I tested another case 
without the 'fsync' operation after each batch and got almost 35-40MB/sec 
(I also removed the pre-allocation at disk writing in this case).
    Hope this is useful for you.

   PS: I heard that the OS has a daemon thread that flushes the page cache 
to disk asynchronously with second-level latency; is that effective for 
large amounts of data where some loss is tolerable?


-Regards
Denny Ye
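
On the fsync-vs-fdatasync point above, a minimal sketch for comparing the 
two calls on a given machine, in the same style as the C program earlier 
in the thread (the file name "sync_test" is arbitrary; fdatasync skips 
flushing metadata such as the modification time, so it can be measurably 
cheaper than fsync on some systems):

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/time.h>

/* Append one page, then time how long the given sync call takes. */
static long time_sync(int fd, int (*syncfn)(int))
{
        char page[4096] = {0};
        struct timeval t0, t1;

        write(fd, page, sizeof(page));
        gettimeofday(&t0, 0);
        syncfn(fd);                     /* fsync or fdatasync */
        gettimeofday(&t1, 0);
        return (t1.tv_sec - t0.tv_sec) * 1000000 + t1.tv_usec - t0.tv_usec;
}

int main(void)
{
        int fd = open("sync_test", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        int i;

        for (i = 0; i < 10; i++)
                printf("fsync: %ld us, fdatasync: %ld us\n",
                       time_sync(fd, fsync), time_sync(fd, fdatasync));
        close(fd);
        return 0;
}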


Re: File Channel performance and fsync

Posted by Juhani Connolly <ju...@cyberagent.co.jp>.
I missed this initially due to filters putting the ML-cc'ed mail in a 
different folder...

Anyway, you didn't post your first tier's conf; can you post that? Is it 
also using a file channel? Regardless, what is important at the second 
tier is that the batch size that arrives at your collector node is *not* 
the batch size from your first tier's source: it is the batch size 
designated at your first tier's avro sink (the avro sink decides how many 
messages to pull from the channel and then dumps them to the source on the 
next tier).

So if you haven't configured that, or it is low, you will have poor 
performance on the tier-2 file channel.
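
On the first tier that would look something like the following (a sketch 
only: the agent and sink names are made up, and "batch-size" is the Avro 
Sink parameter name as listed in the Flume user guide):

sourceAgent.sinks.avroSink.type = avro
sourceAgent.sinks.avroSink.channel = fileChannel
sourceAgent.sinks.avroSink.hostname = collector.example.com
sourceAgent.sinks.avroSink.port = 4545
sourceAgent.sinks.avroSink.batch-size = 1000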

We get many MB/s using the file channel, though I haven't checked the 
figures lately.

Is your Flume 1.2.0 the Cloudera release, or is it the raw one? I vaguely 
remember something important (to us at least) missing from it, and we just 
use a packaged version I maintain. 1.3.0 should be released soon if you 
can wait for that; I don't see any major issues with it. Or you could even 
just pull the current 1.3 head and compile that.

> On 10/22/2012 07:59 PM, Brock Noland wrote:
>> In this case, it's best to think about FileChannel as if it were a 
>> database. Let's pretend we are going to insert 1 million rows. If we 
>> committed on each row, would performance be "good"?  No, everyone 
>> knows that when you are inserting rows in databases, you want to 
>> batch 100-1000 rows into a single commit, if you want "good" 
>> performance. (Quoting good because it's subjective based on 
>> the scenario, but in this case we mean lots of MB/second).
>>
>> Part of the reason behind this logic is that when a database does a 
>> commit, it does an fsync operation to ensure that all data is written 
>> to disk and that you will not lose data due to a subsequent power loss.
>>
>> FileChannel behaves *exactly* the same. If your "batch" is only a 
>> single event, file channel will:
>>
>> write single event
>> fsync
>> write single event
>> fsync
>>
>> As such, if you want "good" performance with FileChannel, you must 
>> increase your batch size, just like a database. If you have a 
>> batchSize of say 100, then FileChannel will:
>>
>> write single event 0
>> write single event 1
>> ...
>> write single event 99
>> fsync
>>
>> Which will result in much "better" performance. It's worth noting 
>> that ExecSource in Flume 1.2, does not have a batchSize and as such 
>> each event is written and then committed. ExecSource in flume 1.3, 
>> which we will release soon, does have a configurable batchSize. If 
>> you want to try that out you can build it from the flume-1.3.0 branch.
>>
>> Brock
>>
>> On Mon, Oct 22, 2012 at 8:59 AM, Brock Noland <brock@cloudera.com 
>> <ma...@cloudera.com>> wrote:
>>
>>     Which version? 1.2 or trunk?
>>
>>     On Monday, October 22, 2012 at 8:18 AM, Jagadish Bihani wrote:
>>
>>>     Hi
>>>
>>>     This is the simplistic configuration with which I am getting
>>>     lower performance.
>>>     Even with 2-tier architecture (cat source - avro sinks - avro
>>>     source- HDFS sink)
>>>     I get the similar performance with file channel.
>>>
>>>     Configuration:
>>>     =========
>>>     adServerAgent.sources = avro-collection-source
>>>     adServerAgent.channels = fileChannel
>>>     adServerAgent.sinks = hdfsSink fileSink
>>>
>>>     # For each one of the sources, the type is defined
>>>     adServerAgent.sources.avro-collection-source.type=exec
>>>     adServerAgent.sources.avro-collection-source.command= cat
>>>     /home/hadoop/file.tsf
>>>
>>>     # The channel can be defined as follows.
>>>     adServerAgent.sources.avro-collection-source.channels = fileChannel
>>>
>>>     #Define file sink
>>>     adServerAgent.sinks.fileSink.type = file_roll
>>>     adServerAgent.sinks.fileSink.sink.directory = /home/hadoop/flume_sink
>>>     adServerAgent.sinks.fileSink.channel = fileChannel
>>>     adServerAgent.channels.fileChannel.type=file
>>>     adServerAgent.channels.fileChannel.dataDirs=/home/hadoop/flume/channel/dataDir5
>>>     adServerAgent.channels.fileChannel.checkpointDir=/home/hadoop/flume/channel/checkpointDir5
>>>     adServerAgent.channels.fileChannel.maxFileSize=4000000000
>>>
>>>     And it is run with :
>>>     JAVA_OPTS = -Xms500m -Xmx700m -Dcom.sun.management.jmxremote
>>>     -XX:MaxDirectMemorySize=2g
>>>
>>>     Regards,
>>>     Jagadish
>>>
>>>     On 10/22/2012 05:42 PM, Brock Noland wrote:
>>>>     Hi,
>>>>
>>>>     I'll respond in more depth later, but it would help if you
>>>>     posted your configuration file and the version of flume you are
>>>>     using.
>>>>
>>>>     Brock
>>>>
>


Re: File Channel performance and fsync

Posted by Jagadish Bihani <ja...@pubmatic.com>.
Hi Brock

I am using flume 1.2.0.

About batching: as per the user guide, the "exec source" does have a batch 
option in 1.2.0 (param name: batchSize, default value: 20) and I have 
tried it. Apparently it works fine. The file channel also has a parameter 
"transactionCapacity", set to 1000 by default. Is that the batch size of 
the file channel?

Anyway, even with increased batching I couldn't get past 110-150 KB/sec 
with the File Channel.
Could you please help me with the questions I asked in the original mail 
of this thread about fsync lies? With the disk which apparently does the 
"fsync lie" I get 3 MB/sec in one flow.
I don't know whether it actually does the "fsync lie", but there is a 
remarkable difference in fsync performance between the 2 machines, which 
have almost identical hardware.
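
(For scale, going by Brock's figure quoted below that a spinning disk can 
do about 100 fsync operations per second: 50000 microseconds per fsync is 
roughly 20 fsyncs/sec, which is physically plausible, while 200-290 
microseconds per fsync would be roughly 3500-5000 fsyncs/sec, far beyond 
what a spinning disk can do; that is why the faster machine looks like it 
is lying.)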

Regards
Jagadish



On 10/22/2012 07:59 PM, Brock Noland wrote:
> In this case, it's best to think about FileChannel as if it were a 
> database. Let's pretend we are going to insert 1 million rows. If we 
> committed on each row, would performance be "good"?  No, everyone 
> knows that when you are inserting rows in databases, you want to batch 
> 100-1000 rows into a single commit, if you want "good" performance. 
> (Quoting good because it's subjective based on the scenario, but in 
> this case we mean lots of MB/second).
>
> Part of the reason behind this logic is that when a database does a 
> commit, it does an fsync operation to ensure that all data is written 
> to disk and that you will not lose data due to a subsequent power loss.
>
> FileChannel behaves *exactly* the same. If your "batch" is only a 
> single event, file channel will:
>
> write single event
> fsync
> write single event
> fsync
>
> As such, if you want "good" performance with FileChannel, you must 
> increase your batch size, just like a database. If you have a 
> batchSize of say 100, then FileChannel will:
>
> write single event 0
> write single event 1
> ...
> write single event 99
> fsync
>
> Which will result in much "better" performance. It's worth noting that 
> ExecSource in Flume 1.2 does not have a batchSize, and as such each 
> event is written and then committed. ExecSource in Flume 1.3, which we 
> will release soon, does have a configurable batchSize. If you want to 
> try that out you can build it from the flume-1.3.0 branch.
>
> Brock
>
> On Mon, Oct 22, 2012 at 8:59 AM, Brock Noland <brock@cloudera.com 
> <ma...@cloudera.com>> wrote:
>
>     Which version? 1.2 or trunk?
>
>     On Monday, October 22, 2012 at 8:18 AM, Jagadish Bihani wrote:
>
>>     Hi
>>
>>     This is the simplistic configuration with which I am getting
>>     lower performance.
>>     Even with 2-tier architecture (cat source - avro sinks - avro
>>     source- HDFS sink)
>>     I get the similar performance with file channel.
>>
>>     Configuration:
>>     =========
>>     adServerAgent.sources = avro-collection-source
>>     adServerAgent.channels = fileChannel
>>     adServerAgent.sinks = hdfsSink fileSink
>>
>>     # For each one of the sources, the type is defined
>>     adServerAgent.sources.avro-collection-source.type=exec
>>     adServerAgent.sources.avro-collection-source.command= cat
>>     /home/hadoop/file.tsf
>>
>>     # The channel can be defined as follows.
>>     adServerAgent.sources.avro-collection-source.channels = fileChannel
>>
>>     #Define file sink
>>     adServerAgent.sinks.fileSink.type = file_roll
>>     adServerAgent.sinks.fileSink.sink.directory =
>>     /home/hadoop/flume_sink
>>     adServerAgent.sinks.fileSink.channel = fileChannel
>>     adServerAgent.channels.fileChannel.type=file
>>     adServerAgent.channels.fileChannel.dataDirs=/home/hadoop/flume/channel/dataDir5
>>     adServerAgent.channels.fileChannel.checkpointDir=/home/hadoop/flume/channel/checkpointDir5
>>     adServerAgent.channels.fileChannel.maxFileSize=4000000000
>>
>>     And it is run with :
>>     JAVA_OPTS = -Xms500m -Xmx700m -Dcom.sun.management.jmxremote
>>     -XX:MaxDirectMemorySize=2g
>>
>>     Regards,
>>     Jagadish
>>
>>     On 10/22/2012 05:42 PM, Brock Noland wrote:
>>>     Hi,
>>>
>>>     I'll respond in more depth later, but it would help if you
>>>     posted your configuration file and the version of flume you are
>>>     using.
>>>
>>>     Brock
>>>
>>>     On Mon, Oct 22, 2012 at 6:48 AM, Jagadish Bihani
>>>     <jagadish.bihani@pubmatic.com
>>>     <ma...@pubmatic.com>> wrote:
>>>>     Hi
>>>>
>>>>     I am writing this on top of another thread where there was
>>>>     discussion on "fsync lies" and
>>>>     only file channel used fsync and not file sink. :
>>>>
>>>>     -- I tested the fsync performance on 2 machines (On 1 machine I
>>>>     was getting very good throughput
>>>>     using file channel and on another almost 100 times slower with
>>>>     almost same hardware configuration.)
>>>>     using following code
>>>>
>>>>
>>>>     #define PAGESIZE 4096
>>>>
>>>>     int main(int argc, char *argv[])
>>>>     {
>>>>
>>>>             char my_write_str[PAGESIZE];
>>>>             char my_read_str[PAGESIZE];
>>>>             char *read_filename= argv[1];
>>>>             int readfd,writefd;
>>>>
>>>>             readfd = open(read_filename,O_RDONLY);
>>>>             writefd = open("written_file",O_WRONLY|O_CREAT,777);
>>>>             int len=lseek(readfd,0,2);
>>>>             lseek(readfd,0,0);
>>>>             int iterations = len/PAGESIZE;
>>>>             int i;
>>>>             struct timeval t0,t1;
>>>>
>>>>     for(i=0;i<iterations;i++)
>>>>             {
>>>>
>>>>     read(readfd,my_read_str,PAGESIZE);
>>>>     write(writefd,my_read_str,PAGESIZE);
>>>>     gettimeofday(&t0,0);
>>>>     fsync(writefd);
>>>>     gettimeofday(&t1,0);
>>>>                     long elapsed = (t1.tv_sec-t0.tv_sec)*1000000 +
>>>>     t1.tv_usec-t0.tv_usec;
>>>>     printf("Elapsed time is= %ld \n",elapsed);
>>>>              }
>>>>             close(readfd);
>>>>             close(writefd);
>>>>     }
>>>>
>>>>
>>>>     -- As expected it requires typically 50000 microseconds for
>>>>     fsync to complete on one machine and 200 microseconds
>>>>     on another machine it took 290 microseconds to complete on an
>>>>     average. So is machine with higher
>>>>     performance is doing a 'fsync lie'?
>>>>     -- If I have understood it clearly; "fsync lie" means the data
>>>>     is not actually written to disk and it is in
>>>>     some disk/controller buffer.  I) Now if disk loses power due to
>>>>     some shutdown or any other disaster, data will
>>>>     be lost. II) Can data be lost even without it ? (e.g. if it is
>>>>     keeping data in some disk buffer and if fsync is being
>>>>     invoked continuously then will that data can also be lost? If
>>>>     only part -I is true; then it can be acceptable
>>>>     because probability of shutdown is usually less in production
>>>>     environment. But if even II is true then there is a
>>>>     problem.
>>>>
>>>>     -- But on the machine where disk doesn't lie performance of
>>>>     flume using File channel is very low (I have seen it
>>>>     maximum 100 KB/sec even with sufficient DirectMemory
>>>>     allocation.) Does anybody have stats about throughput
>>>>     of file channel ? Is anybody getting better performance with
>>>>     file channel (without fsync lies). What is the recommended
>>>>     usage of it for an average scenario ? (Transferring files of
>>>>     few MBs to HDFS sink continuously on typical hardware
>>>>     (16 core processors, 16 GB RAM etc.)
>>>>
>>>>
>>>>     Regards,
>>>>     Jagadish
>>>>
>>>>     On 10/10/2012 11:30 PM, Brock Noland wrote:
>>>>>     Hi,
>>>>>
>>>>>     On Wed, Oct 10, 2012 at 11:22 AM, Jagadish Bihani
>>>>>     <ja...@pubmatic.com>  <ma...@pubmatic.com>  wrote:
>>>>>>     Hi Brock
>>>>>>
>>>>>>     I will surely look into 'fsync lies'.
>>>>>>
>>>>>>     But as per my experiments I think "file channel" is causing the issue.
>>>>>>     Because on those 2 machines (one with higher throughput and other with
>>>>>>     lower)
>>>>>>     I did following experiment:
>>>>>>
>>>>>>     cat Source -memory channel - file sink.
>>>>>>
>>>>>>     Now with this setup I got same throughput on both the machines. (around 3
>>>>>>     MB/sec)
>>>>>>     Now as I have used "File sink" it should also do "fsync" at some point of
>>>>>>     time.
>>>>>>     'File Sink' and 'File Channel' both do disk writes.
>>>>>>     So if there is differences in disk behaviour then even in the 'File Sink' it
>>>>>>     should be visible.
>>>>>>
>>>>>>     Am I missing something here?
>>>>>     File sink does not call fsync.
>>>>>
>>>>>>     Regards,
>>>>>>     Jagadish
>>>>>>
>>>>>>
>>>>>>
>>>>>>     On 10/10/2012 09:35 PM, Brock Noland wrote:
>>>>>>>     OK your disk that is giving you 40KB/second is telling you the truth
>>>>>>>     and the faster disk is lying to you. Look up "fsync lies" to see what
>>>>>>>     I am referring to.
>>>>>>>
>>>>>>>     A spinning disk can do 100 fsync operations per second (this is done
>>>>>>>     at the end of every batch). That is how I estimated your event size,
>>>>>>>     40KB/second is doing 40KB / 100 =  409 bytes.
>>>>>>>
>>>>>>>     Once again, if you want increased performance, you should increase the
>>>>>>>     batch size.
>>>>>>>
>>>>>>>     Brock
>>>>>>>
>>>>>>>     On Wed, Oct 10, 2012 at 11:00 AM, Jagadish Bihani
>>>>>>>     <ja...@pubmatic.com>  <ma...@pubmatic.com>  wrote:
>>>>>>>>     Hi
>>>>>>>>
>>>>>>>>     Yes. It is around 480 - 500 bytes.
>>>>>>>>
>>>>>>>>
>>>>>>>>     On 10/10/2012 09:24 PM, Brock Noland wrote:
>>>>>>>>>     How big are your events? Average about 400 bytes?
>>>>>>>>>
>>>>>>>>>     Brock
>>>>>>>>>
>>>>>>>>>     On Wed, Oct 10, 2012 at 5:11 AM, Jagadish Bihani
>>>>>>>>>     <ja...@pubmatic.com>  <ma...@pubmatic.com>  wrote:
>>>>>>>>>>     Hi
>>>>>>>>>>
>>>>>>>>>>     Thanks for the inputs Brock. After doing several experiments
>>>>>>>>>>     eventually problem boiled down to disks.
>>>>>>>>>>
>>>>>>>>>>         -- But I had used the same configuration (so all software components
>>>>>>>>>>     are
>>>>>>>>>>     same in all 3 machines)
>>>>>>>>>>     on all 3 machines.
>>>>>>>>>>     -- In User guide it is written that if multiple file channel instances
>>>>>>>>>>     are
>>>>>>>>>>     active on the same agent then
>>>>>>>>>>     different disks are preferable. But in my case only one file channel is
>>>>>>>>>>     active per agent.
>>>>>>>>>>     -- Only one pattern I observed that on the machines where I got better
>>>>>>>>>>     performance have multiple disks.
>>>>>>>>>>     But I don't understand how that will help if I have only 1 active file
>>>>>>>>>>     channel.
>>>>>>>>>>     -- What is the impact of the type of disk/disk device driver on
>>>>>>>>>>     performance?
>>>>>>>>>>     I mean I don't understand
>>>>>>>>>>     with 1 disk I am getting 40 KB/sec and with other 2 MB/sec.
>>>>>>>>>>
>>>>>>>>>>     Could you please elaborate on File channel and disks correlation.
>>>>>>>>>>
>>>>>>>>>>     Regards,
>>>>>>>>>>     Jagadish
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>     On 10/09/2012 08:01 PM, Brock Noland wrote:
>>>>>>>>>>
>>>>>>>>>>     Hi,
>>>>>>>>>>
>>>>>>>>>>     Using file channel, in terms of performance, the number and type of
>>>>>>>>>>     disks is going to be much more predictive of performance than CPU or
>>>>>>>>>>     RAM. Note that consumer level drives/controllers will give you much
>>>>>>>>>>     "better" performance because they lie to you about when your data is
>>>>>>>>>>     actually written to the drive. If you search for "fsync lies" you'll
>>>>>>>>>>     find more information on this.
>>>>>>>>>>
>>>>>>>>>>     You probably want to increase the batch size to get better performance.
>>>>>>>>>>
>>>>>>>>>>     Brock
>>>>>>>>>>
>>>>>>>>>>     On Tue, Oct 9, 2012 at 2:46 AM, Jagadish Bihani
>>>>>>>>>>     <ja...@pubmatic.com>  <ma...@pubmatic.com>  wrote:
>>>>>>>>>>
>>>>>>>>>>     Hi
>>>>>>>>>>
>>>>>>>>>>     My flume setup is:
>>>>>>>>>>
>>>>>>>>>>     Source Agent : cat source - File Channel - Avro Sink
>>>>>>>>>>     Dest Agent :     avro source - File Channel - HDFS Sink.
>>>>>>>>>>
>>>>>>>>>>     There is only 1 source agent and 1 destination agent.
>>>>>>>>>>
>>>>>>>>>>     I measure throughput as amount of data written to HDFS per second.
>>>>>>>>>>     ( I have rolling interval 30 sec; so If 60 MB file is generated in 30
>>>>>>>>>>     sec
>>>>>>>>>>     the
>>>>>>>>>>     throughput is : -- 2 MB/sec ).
>>>>>>>>>>
>>>>>>>>>>     I have run source agent on various machines with different hardware
>>>>>>>>>>     configurations :
>>>>>>>>>>     (In all cases I run flume agent with JAVA OPTIONS as
>>>>>>>>>>     "-DJAVA_OPTS="-Xms500m -Xmx1g -Dcom.sun.management.jmxremote
>>>>>>>>>>     -XX:MaxDirectMemorySize=2g")
>>>>>>>>>>
>>>>>>>>>>     JDK is 32 bit.
>>>>>>>>>>
>>>>>>>>>>     Experiment 1:
>>>>>>>>>>     =====
>>>>>>>>>>     RAM : 16 GB
>>>>>>>>>>     Processor: Intel Xeon E5620 @ 2.40 GHz (16 cores).
>>>>>>>>>>     64 bit Processor with 64 bit Kernel.
>>>>>>>>>>     Throughput: 2 MB/sec
>>>>>>>>>>
>>>>>>>>>>     Experiment 2:
>>>>>>>>>>     ======
>>>>>>>>>>     RAM : 4 GB
>>>>>>>>>>     Processor: Intel Xeon E5504  @ 2.00GHz (4 cores). 32 bit Processor
>>>>>>>>>>     64 bit Processor with 32 bit Kernel.
>>>>>>>>>>     Throughput : 30 KB/sec
>>>>>>>>>>
>>>>>>>>>>     Experiment 3:
>>>>>>>>>>     ======
>>>>>>>>>>     RAM : 8 GB
>>>>>>>>>>     Processor:Intel Xeon E5520 @ 2.27 GHz (16 cores).32 bit Processor
>>>>>>>>>>     64 bit Processor with 32 bit Kernel.
>>>>>>>>>>     Throughput : 80 KB/sec
>>>>>>>>>>
>>>>>>>>>>         -- So as can be seen there is huge difference in the throughput with
>>>>>>>>>>     same
>>>>>>>>>>     configuration but
>>>>>>>>>>     different hardware.
>>>>>>>>>>     -- In the first case where throughput is more RES is around 160 MB in
>>>>>>>>>>     other
>>>>>>>>>>     cases it is in
>>>>>>>>>>     the range of 40 MB - 50 MB.
>>>>>>>>>>
>>>>>>>>>>     Can anybody please give insights that why there is this huge difference
>>>>>>>>>>     in
>>>>>>>>>>     the throughput?
>>>>>>>>>>     What is the correlation between RAM and filechannel/HDFS sink
>>>>>>>>>>     performance
>>>>>>>>>>     and also
>>>>>>>>>>     with 32-bit/64 bit kernel?
>>>>>>>>>>
>>>>>>>>>>     Regards,
>>>>>>>>>>     Jagadish
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>
>>>
>>>
>>>
>>>     -- 
>>>     Apache MRUnit - Unit testing MapReduce -
>>>     http://incubator.apache.org/mrunit/
>>
>
>
>
>
> -- 
> Apache MRUnit - Unit testing MapReduce - 
> http://incubator.apache.org/mrunit/


Re: File Channel performance and fsync

Posted by Brock Noland <br...@cloudera.com>.
In this case, it's best to think about FileChannel as if it were a database.
Let's pretend we are going to insert 1 million rows. If we committed on
each row, would performance be "good"?  No, everyone knows that when you
are inserting rows in databases, you want to batch 100-1000 rows into a
single commit, if you want "good" performance. (Quoting good because it's
subjective based on the scenario, but in this case we mean lots of
MB/second).

Part of the reason behind this logic is that when a database does a commit,
it does an fsync operation to ensure that all data is written to disk and
that you will not lose data due to a subsequent power loss.

FileChannel behaves *exactly* the same. If your "batch" is only a single
event, file channel will:

write single event
fsync
write single event
fsync

As such, if you want "good" performance with FileChannel, you must increase
your batch size, just like a database. If you have a batchSize of say 100,
then FileChannel will:

write single event 0
write single event 1
...
write single event 99
fsync

Which will result in much "better" performance. It's worth noting that
ExecSource in Flume 1.2 does not have a batchSize, and as such each event
is written and then committed. ExecSource in Flume 1.3, which we will
release soon, does have a configurable batchSize. If you want to try that
out you can build it from the flume-1.3.0 branch.
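
To make the cost concrete, here is a minimal sketch (assuming a Linux
box, like the timing program earlier in this thread; the batch sizes and
iteration count are illustrative) that fsyncs once per batch of writes
instead of once per write:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/time.h>

#define PAGESIZE 4096

/* Write `iterations` pages, fsyncing once every `batch` writes.
   batch == 1 mimics an unbatched source; returns total microseconds. */
static long run(int batch, int iterations)
{
        char buf[PAGESIZE];
        struct timeval t0, t1;
        int fd, i;

        memset(buf, 'x', PAGESIZE);
        fd = open("written_file", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) { perror("open"); exit(1); }

        gettimeofday(&t0, 0);
        for (i = 0; i < iterations; i++) {
                write(fd, buf, PAGESIZE);
                if ((i + 1) % batch == 0)
                        fsync(fd);      /* one fsync per "commit" */
        }
        fsync(fd);                      /* flush any partial last batch */
        gettimeofday(&t1, 0);
        close(fd);
        return (t1.tv_sec - t0.tv_sec) * 1000000L + (t1.tv_usec - t0.tv_usec);
}

int main(void)
{
        printf("batch=1:   %ld us\n", run(1, 1000));
        printf("batch=100: %ld us\n", run(100, 1000));
        return 0;
}

On a disk that honours fsync, the batch=1 run should come out roughly two
orders of magnitude slower, in line with the ~100 fsyncs/second figure
mentioned earlier in the thread.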

Brock

On Mon, Oct 22, 2012 at 8:59 AM, Brock Noland <br...@cloudera.com> wrote:

>  Which version? 1.2 or trunk?
>
> On Monday, October 22, 2012 at 8:18 AM, Jagadish Bihani wrote:
>
>  Hi
>
> This is the simplistic configuration with which I am getting lower
> performance.
> Even with 2-tier architecture (cat source - avro sinks - avro source- HDFS
> sink)
> I get the similar performance with file channel.
>
> Configuration:
> =========
> adServerAgent.sources = avro-collection-source
> adServerAgent.channels = fileChannel
> adServerAgent.sinks = hdfsSink fileSink
>
> # For each one of the sources, the type is defined
> adServerAgent.sources.avro-collection-source.type=exec
> adServerAgent.sources.avro-collection-source.command= cat
> /home/hadoop/file.tsf
>
> # The channel can be defined as follows.
> adServerAgent.sources.avro-collection-source.channels = fileChannel
>
> #Define file sink
> adServerAgent.sinks.fileSink.type = file_roll
> adServerAgent.sinks.fileSink.sink.directory = /home/hadoop/flume_sink
> adServerAgent.sinks.fileSink.channel = fileChannel
> adServerAgent.channels.fileChannel.type=file
>
> adServerAgent.channels.fileChannel.dataDirs=/home/hadoop/flume/channel/dataDir5
>
> adServerAgent.channels.fileChannel.checkpointDir=/home/hadoop/flume/channel/checkpointDir5
> adServerAgent.channels.fileChannel.maxFileSize=4000000000
>
> And it is run with :
> JAVA_OPTS = -Xms500m -Xmx700m -Dcom.sun.management.jmxremote
> -XX:MaxDirectMemorySize=2g
>
> Regards,
> Jagadish
>
> On 10/22/2012 05:42 PM, Brock Noland wrote:
>
> Hi,
>
>  I'll respond in more depth later, but it would help if you posted your
> configuration file and the version of flume you are using.
>
>  Brock
>
>  On Mon, Oct 22, 2012 at 6:48 AM, Jagadish Bihani <
> jagadish.bihani@pubmatic.com> wrote:
>
>  Hi
>
> I am writing this on top of another thread where there was discussion on
> "fsync lies" and
> only file channel used fsync and not file sink. :
>
> -- I tested the fsync performance on 2 machines  (On 1 machine I was
> getting very good throughput
> using file channel and on another almost 100 times slower with almost same
> hardware configuration.)
> using following code
>
>
> #define PAGESIZE 4096
>
> int main(int argc, char *argv[])
> {
>
>         char my_write_str[PAGESIZE];
>         char my_read_str[PAGESIZE];
>         char *read_filename= argv[1];
>         int readfd,writefd;
>
>         readfd = open(read_filename,O_RDONLY);
>         writefd = open("written_file",O_WRONLY|O_CREAT,777);
>         int len=lseek(readfd,0,2);
>         lseek(readfd,0,0);
>         int iterations = len/PAGESIZE;
>         int i;
>         struct timeval t0,t1;
>
>        for(i=0;i<iterations;i++)
>         {
>
>                 read(readfd,my_read_str,PAGESIZE);
>                 write(writefd,my_read_str,PAGESIZE);
>                 gettimeofday(&t0,0);
>                 fsync(writefd);
>                 gettimeofday(&t1,0);
>                 long elapsed = (t1.tv_sec-t0.tv_sec)*1000000 +
> t1.tv_usec-t0.tv_usec;
>                 printf("Elapsed time is= %ld \n",elapsed);
>          }
>         close(readfd);
>         close(writefd);
> }
>
>
> -- As expected it requires typically 50000 microseconds for fsync to
> complete on one machine and 200 microseconds
> on another machine it took 290 microseconds to complete on an average. So
> is machine with higher
> performance is doing a 'fsync lie'?
> -- If I have understood it clearly; "fsync lie" means the data is not
> actually written to disk and it is in
> some disk/controller buffer.  I) Now if disk loses power due to some
> shutdown or any other disaster, data will
> be lost. II) Can data be lost even without it ? (e.g. if it is keeping
> data in some disk buffer and if fsync is being
> invoked continuously then will that data can also  be lost? If only part
> -I is true; then it can be acceptable
> because probability of shutdown is usually less in production environment.
> But if even II is true then there is a
> problem.
>
> -- But on the machine where disk doesn't lie performance of flume using
> File channel is very low (I have seen it
> maximum 100 KB/sec even with sufficient  DirectMemory allocation.) Does
> anybody have stats about throughput
> of file channel ? Is anybody getting better performance with file channel
> (without fsync lies). What is the recommended
> usage of it for an average scenario ? (Transferring files of few MBs to
> HDFS sink continuously on typical hardware
> (16 core processors, 16 GB RAM etc.)
>
>
> Regards,
> Jagadish
>
> On 10/10/2012 11:30 PM, Brock Noland wrote:
>
> Hi,
>
> On Wed, Oct 10, 2012 at 11:22 AM, Jagadish Bihani<ja...@pubmatic.com> <ja...@pubmatic.com> wrote:
>
> Hi Brock
>
> I will surely look into 'fsync lies'.
>
> But as per my experiments I think "file channel" is causing the issue.
> Because on those 2 machines (one with higher throughput and other with
> lower)
> I did following experiment:
>
> cat Source -memory channel - file sink.
>
> Now with this setup I got same throughput on both the machines. (around 3
> MB/sec)
> Now as I have used "File sink" it should also do "fsync" at some point of
> time.
> 'File Sink' and 'File Channel' both do disk writes.
> So if there is differences in disk behaviour then even in the 'File Sink' it
> should be visible.
>
> Am I missing something here?
>
> File sink does not call fsync.
>
>
> Regards,
> Jagadish
>
>
>
> On 10/10/2012 09:35 PM, Brock Noland wrote:
>
> OK your disk that is giving you 40KB/second is telling you the truth
> and the faster disk is lying to you. Look up "fsync lies" to see what
> I am referring to.
>
> A spinning disk can do 100 fsync operations per second (this is done
> at the end of every batch). That is how I estimated your event size,
> 40KB/second is doing 40KB / 100 =  409 bytes.
>
> Once again, if you want increased performance, you should increase the
> batch size.
>
> Brock
>
> On Wed, Oct 10, 2012 at 11:00 AM, Jagadish Bihani<ja...@pubmatic.com> <ja...@pubmatic.com> wrote:
>
> Hi
>
> Yes. It is around 480 - 500 bytes.
>
>
> On 10/10/2012 09:24 PM, Brock Noland wrote:
>
> How big are your events? Average about 400 bytes?
>
> Brock
>
> On Wed, Oct 10, 2012 at 5:11 AM, Jagadish Bihani<ja...@pubmatic.com> <ja...@pubmatic.com> wrote:
>
> Hi
>
> Thanks for the inputs Brock. After doing several experiments
> eventually problem boiled down to disks.
>
>    -- But I had used the same configuration (so all software components
> are
> same in all 3 machines)
> on all 3 machines.
> -- In User guide it is written that if multiple file channel instances
> are
> active on the same agent then
> different disks are preferable. But in my case only one file channel is
> active per agent.
> -- Only one pattern I observed that on the machines where I got better
> performance have multiple disks.
> But I don't understand how that will help if I have only 1 active file
> channel.
> -- What is the impact of the type of disk/disk device driver on
> performance?
> I mean I don't understand
> with 1 disk I am getting 40 KB/sec and with other 2 MB/sec.
>
> Could you please elaborate on File channel and disks correlation.
>
> Regards,
> Jagadish
>
>
> On 10/09/2012 08:01 PM, Brock Noland wrote:
>
> Hi,
>
> Using file channel, in terms of performance, the number and type of
> disks is going to be much more predictive of performance than CPU or
> RAM. Note that consumer level drives/controllers will give you much
> "better" performance because they lie to you about when your data is
> actually written to the drive. If you search for "fsync lies" you'll
> find more information on this.
>
> You probably want to increase the batch size to get better performance.
>
> Brock
>
> On Tue, Oct 9, 2012 at 2:46 AM, Jagadish Bihani<ja...@pubmatic.com> <ja...@pubmatic.com> wrote:
>
> Hi
>
> My flume setup is:
>
> Source Agent : cat source - File Channel - Avro Sink
> Dest Agent :     avro source - File Channel - HDFS Sink.
>
> There is only 1 source agent and 1 destination agent.
>
> I measure throughput as amount of data written to HDFS per second.
> ( I have rolling interval 30 sec; so If 60 MB file is generated in 30
> sec
> the
> throughput is : -- 2 MB/sec ).
>
> I have run source agent on various machines with different hardware
> configurations :
> (In all cases I run flume agent with JAVA OPTIONS as
> "-DJAVA_OPTS="-Xms500m -Xmx1g -Dcom.sun.management.jmxremote
> -XX:MaxDirectMemorySize=2g")
>
> JDK is 32 bit.
>
> Experiment 1:
> =====
> RAM : 16 GB
> Processor: Intel Xeon E5620 @ 2.40 GHz (16 cores).
> 64 bit Processor with 64 bit Kernel.
> Throughput: 2 MB/sec
>
> Experiment 2:
> ======
> RAM : 4 GB
> Processor: Intel Xeon E5504  @ 2.00GHz (4 cores). 32 bit Processor
> 64 bit Processor with 32 bit Kernel.
> Throughput : 30 KB/sec
>
> Experiment 3:
> ======
> RAM : 8 GB
> Processor:Intel Xeon E5520 @ 2.27 GHz (16 cores).32 bit Processor
> 64 bit Processor with 32 bit Kernel.
> Throughput : 80 KB/sec
>
>    -- So as can be seen there is huge difference in the throughput with
> same
> configuration but
> different hardware.
> -- In the first case where throughput is more RES is around 160 MB in
> other
> cases it is in
> the range of 40 MB - 50 MB.
>
> Can anybody please give insights that why there is this huge difference
> in
> the throughput?
> What is the correlation between RAM and filechannel/HDFS sink
> performance
> and also
> with 32-bit/64 bit kernel?
>
> Regards,
> Jagadish
>
>
>
>
>
>
>
>
>  --
> Apache MRUnit - Unit testing MapReduce -
> http://incubator.apache.org/mrunit/
>
>
>
>


-- 
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/

Re: File Channel performance and fsync

Posted by Brock Noland <br...@cloudera.com>.
Which version? 1.2 or trunk? 

-- 
Brock Noland
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


On Monday, October 22, 2012 at 8:18 AM, Jagadish Bihani wrote:

> Hi 
> 
> This is the simplistic configuration with which I am getting lower performance.
> Even with 2-tier architecture (cat source - avro sinks - avro source- HDFS sink) 
> I get the similar performance with file channel. 
> 
> Configuration:
> =========
> adServerAgent.sources = avro-collection-source
> adServerAgent.channels = fileChannel
> adServerAgent.sinks = hdfsSink fileSink
> 
> # For each one of the sources, the type is defined
> adServerAgent.sources.avro-collection-source.type=exec
> adServerAgent.sources.avro-collection-source.command= cat /home/hadoop/file.tsf
> 
> # The channel can be defined as follows.
> adServerAgent.sources.avro-collection-source.channels = fileChannel
> 
> #Define file sink
> adServerAgent.sinks.fileSink.type = file_roll
> adServerAgent.sinks.fileSink.sink.directory = /home/hadoop/flume_sink
> 
> adServerAgent.sinks.fileSink.channel = fileChannel
> adServerAgent.channels.fileChannel.type=file
> adServerAgent.channels.fileChannel.dataDirs=/home/hadoop/flume/channel/dataDir5
> adServerAgent.channels.fileChannel.checkpointDir=/home/hadoop/flume/channel/checkpointDir5
> adServerAgent.channels.fileChannel.maxFileSize=4000000000
> 
> And it is run with :
> JAVA_OPTS = -Xms500m -Xmx700m -Dcom.sun.management.jmxremote -XX:MaxDirectMemorySize=2g
> 
> Regards,
> Jagadish
> 
> On 10/22/2012 05:42 PM, Brock Noland wrote:
> > Hi, 
> > 
> > I'll respond in more depth later, but it would help if you posted your configuration file and the version of flume you are using. 
> > 
> > Brock
> > 
> > On Mon, Oct 22, 2012 at 6:48 AM, Jagadish Bihani <jagadish.bihani@pubmatic.com (mailto:jagadish.bihani@pubmatic.com)> wrote:
> > > Hi
> > > 
> > > I am writing this on top of another thread where there was discussion on "fsync lies" and
> > > only file channel used fsync and not file sink. :
> > > 
> > > -- I tested the fsync performance on 2 machines  (On 1 machine I was getting very good throughput
> > > using file channel and on another almost 100 times slower with almost same hardware configuration.)
> > > using following code
> > > 
> > > 
> > > #define PAGESIZE 4096
> > > 
> > > int main(int argc, char *argv[])
> > > {
> > > 
> > >         char my_write_str[PAGESIZE];
> > >         char my_read_str[PAGESIZE];
> > >         char *read_filename= argv[1];
> > >         int readfd,writefd;
> > > 
> > >         readfd = open(read_filename,O_RDONLY);
> > >         writefd = open("written_file",O_WRONLY|O_CREAT,777);
> > >         int len=lseek(readfd,0,2);
> > >         lseek(readfd,0,0);
> > >         int iterations = len/PAGESIZE;
> > >         int i;
> > >         struct timeval t0,t1;
> > > 
> > >        for(i=0;i<iterations;i++)
> > >         {
> > > 
> > >                 read(readfd,my_read_str,PAGESIZE);
> > >                 write(writefd,my_read_str,PAGESIZE);
> > >                 gettimeofday(&t0,0);
> > >                 fsync(writefd);
> > >               gettimeofday(&t1,0);
> > >                 long elapsed = (t1.tv_sec-t0.tv_sec)*1000000 + t1.tv_usec-t0.tv_usec;
> > >                 printf("Elapsed time is= %ld \n",elapsed);
> > >          }
> > >         close(readfd);
> > >         close(writefd);
> > > }
> > > 
> > > 
> > > -- As expected it requires typically 50000 microseconds for fsync to complete on one machine and 200 microseconds 
> > > on another machine it took 290 microseconds to complete on an average. So is machine with higher
> > > performance is doing a 'fsync lie'? 
> > > -- If I have understood it clearly; "fsync lie" means the data is not actually written to disk and it is in 
> > > some disk/controller buffer.  I) Now if disk loses power due to some shutdown or any other disaster, data will
> > > be lost. II) Can data be lost even without it ? (e.g. if it is keeping data in some disk buffer and if fsync is being
> > > invoked continuously then will that data can also  be lost? If only part -I is true; then it can be acceptable
> > > because probability of shutdown is usually less in production environment. But if even II is true then there is a
> > > problem.
> > > 
> > > -- But on the machine where disk doesn't lie performance of flume using File channel is very low (I have seen it 
> > > maximum 100 KB/sec even with sufficient  DirectMemory allocation.) Does anybody have stats about throughput
> > > of file channel ? Is anybody getting better performance with file channel (without fsync lies). What is the recommended
> > > usage of it for an average scenario ? (Transferring files of few MBs to HDFS sink continuously on typical hardware 
> > > (16 core processors, 16 GB RAM etc.)
> > > 
> > > 
> > > Regards,
> > > Jagadish
> > > 
> > -- 
> > Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
> 


Re: File Channel performance and fsync

Posted by Jagadish Bihani <ja...@pubmatic.com>.
Hi

This is the simplest configuration with which I am getting the lower 
performance.
Even with a 2-tier architecture (cat source - avro sinks - avro source - 
HDFS sink)
I get similar performance with the file channel.

Configuration:
=========
adServerAgent.sources = avro-collection-source
adServerAgent.channels = fileChannel
adServerAgent.sinks = hdfsSink fileSink

# For each one of the sources, the type is defined
adServerAgent.sources.avro-collection-source.type=exec
adServerAgent.sources.avro-collection-source.command= cat 
/home/hadoop/file.tsf

# The channel can be defined as follows.
adServerAgent.sources.avro-collection-source.channels = fileChannel

#Define file sink
adServerAgent.sinks.fileSink.type = file_roll
adServerAgent.sinks.fileSink.sink.directory = /home/hadoop/flume_sink
adServerAgent.sinks.fileSink.channel = fileChannel
adServerAgent.channels.fileChannel.type=file
adServerAgent.channels.fileChannel.dataDirs=/home/hadoop/flume/channel/dataDir5
adServerAgent.channels.fileChannel.checkpointDir=/home/hadoop/flume/channel/checkpointDir5
adServerAgent.channels.fileChannel.maxFileSize=4000000000

And it is run with :
JAVA_OPTS = -Xms500m -Xmx700m -Dcom.sun.management.jmxremote 
-XX:MaxDirectMemorySize=2g
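
For reference, a batched variant of the configuration above would look
along these lines (a sketch only: the batchSize parameter on the exec
source assumes a Flume version that supports it, hdfs.batchSize is the
HDFS sink's batch parameter, and the values are illustrative):

# Sketch: raise the batch knobs
adServerAgent.sources.avro-collection-source.batchSize = 1000
adServerAgent.sinks.hdfsSink.hdfs.batchSize = 1000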

Regards,
Jagadish

On 10/22/2012 05:42 PM, Brock Noland wrote:
> Hi,
>
> I'll respond in more depth later, but it would help if you posted your 
> configuration file and the version of flume you are using.
>
> Brock
>
> On Mon, Oct 22, 2012 at 6:48 AM, Jagadish Bihani 
> <jagadish.bihani@pubmatic.com <ma...@pubmatic.com>> 
> wrote:
>
>     Hi
>
>     I am writing this on top of another thread where there was
>     discussion on "fsync lies" and
>     only file channel used fsync and not file sink. :
>
>     -- I tested the fsync performance on 2 machines  (On 1 machine I
>     was getting very good throughput
>     using file channel and on another almost 100 times slower with
>     almost same hardware configuration.)
>     using following code
>
>
>     #define PAGESIZE 4096
>
>     int main(int argc, char *argv[])
>     {
>
>             char my_write_str[PAGESIZE];
>             char my_read_str[PAGESIZE];
>             char *read_filename= argv[1];
>             int readfd,writefd;
>
>             readfd = open(read_filename,O_RDONLY);
>             writefd = open("written_file",O_WRONLY|O_CREAT,777);
>             int len=lseek(readfd,0,2);
>             lseek(readfd,0,0);
>             int iterations = len/PAGESIZE;
>             int i;
>             struct timeval t0,t1;
>
>            for(i=0;i<iterations;i++)
>             {
>
>                     read(readfd,my_read_str,PAGESIZE);
>                     write(writefd,my_read_str,PAGESIZE);
>                     gettimeofday(&t0,0);
>                     fsync(writefd);
>                     gettimeofday(&t1,0);
>                     long elapsed = (t1.tv_sec-t0.tv_sec)*1000000 +
>     t1.tv_usec-t0.tv_usec;
>                     printf("Elapsed time is= %ld \n",elapsed);
>              }
>             close(readfd);
>             close(writefd);
>     }
>
>
>     -- As expected it requires typically 50000 microseconds for fsync
>     to complete on one machine and 200 microseconds
>     on another machine it took 290 microseconds to complete on an
>     average. So is machine with higher
>     performance is doing a 'fsync lie'?
>     -- If I have understood it clearly; "fsync lie" means the data is
>     not actually written to disk and it is in
>     some disk/controller buffer.  I) Now if disk loses power due to
>     some shutdown or any other disaster, data will
>     be lost. II) Can data be lost even without it ? (e.g. if it is
>     keeping data in some disk buffer and if fsync is being
>     invoked continuously then will that data can also  be lost? If
>     only part -I is true; then it can be acceptable
>     because probability of shutdown is usually less in production
>     environment. But if even II is true then there is a
>     problem.
>
>     -- But on the machine where disk doesn't lie performance of flume
>     using File channel is very low (I have seen it
>     maximum 100 KB/sec even with sufficient  DirectMemory allocation.)
>     Does anybody have stats about throughput
>     of file channel ? Is anybody getting better performance with file
>     channel (without fsync lies). What is the recommended
>     usage of it for an average scenario ? (Transferring files of few
>     MBs to HDFS sink continuously on typical hardware
>     (16 core processors, 16 GB RAM etc.)
>
>
>     Regards,
>     Jagadish
>
>     On 10/10/2012 11:30 PM, Brock Noland wrote:
>>     Hi,
>>
>>     On Wed, Oct 10, 2012 at 11:22 AM, Jagadish Bihani
>>     <ja...@pubmatic.com>  <ma...@pubmatic.com>  wrote:
>>>     Hi Brock
>>>
>>>     I will surely look into 'fsync lies'.
>>>
>>>     But as per my experiments I think "file channel" is causing the issue.
>>>     Because on those 2 machines (one with higher throughput and other with
>>>     lower)
>>>     I did following experiment:
>>>
>>>     cat Source -memory channel - file sink.
>>>
>>>     Now with this setup I got same throughput on both the machines. (around 3
>>>     MB/sec)
>>>     Now as I have used "File sink" it should also do "fsync" at some point of
>>>     time.
>>>     'File Sink' and 'File Channel' both do disk writes.
>>>     So if there is differences in disk behaviour then even in the 'File Sink' it
>>>     should be visible.
>>>
>>>     Am I missing something here?
>>     File sink does not call fsync.
>>
>>>     Regards,
>>>     Jagadish
>>>
>>>
>>>
>>>     On 10/10/2012 09:35 PM, Brock Noland wrote:
>>>>     OK your disk that is giving you 40KB/second is telling you the truth
>>>>     and the faster disk is lying to you. Look up "fsync lies" to see what
>>>>     I am referring to.
>>>>
>>>>     A spinning disk can do 100 fsync operations per second (this is done
>>>>     at the end of every batch). That is how I estimated your event size,
>>>>     40KB/second is doing 40KB / 100 =  409 bytes.
>>>>
>>>>     Once again, if you want increased performance, you should increase the
>>>>     batch size.
>>>>
>>>>     Brock
>>>>
>>>>     On Wed, Oct 10, 2012 at 11:00 AM, Jagadish Bihani
>>>>     <ja...@pubmatic.com>  <ma...@pubmatic.com>  wrote:
>>>>>     Hi
>>>>>
>>>>>     Yes. It is around 480 - 500 bytes.
>>>>>
>>>>>
>>>>>     On 10/10/2012 09:24 PM, Brock Noland wrote:
>>>>>>     How big are your events? Average about 400 bytes?
>>>>>>
>>>>>>     Brock
>>>>>>
>>>>>>     On Wed, Oct 10, 2012 at 5:11 AM, Jagadish Bihani
>>>>>>     <ja...@pubmatic.com>  <ma...@pubmatic.com>  wrote:
>>>>>>>     Hi
>>>>>>>
>>>>>>>     Thanks for the inputs Brock. After doing several experiments
>>>>>>>     eventually problem boiled down to disks.
>>>>>>>
>>>>>>>         -- But I had used the same configuration (so all software components
>>>>>>>     are
>>>>>>>     same in all 3 machines)
>>>>>>>     on all 3 machines.
>>>>>>>     -- In User guide it is written that if multiple file channel instances
>>>>>>>     are
>>>>>>>     active on the same agent then
>>>>>>>     different disks are preferable. But in my case only one file channel is
>>>>>>>     active per agent.
>>>>>>>     -- Only one pattern I observed that on the machines where I got better
>>>>>>>     performance have multiple disks.
>>>>>>>     But I don't understand how that will help if I have only 1 active file
>>>>>>>     channel.
>>>>>>>     -- What is the impact of the type of disk/disk device driver on
>>>>>>>     performance?
>>>>>>>     I mean I don't understand
>>>>>>>     with 1 disk I am getting 40 KB/sec and with other 2 MB/sec.
>>>>>>>
>>>>>>>     Could you please elaborate on File channel and disks correlation.
>>>>>>>
>>>>>>>     Regards,
>>>>>>>     Jagadish
>>>>>>>
>>>>>>>
>>>>>>>     On 10/09/2012 08:01 PM, Brock Noland wrote:
>>>>>>>
>>>>>>>     Hi,
>>>>>>>
>>>>>>>     Using file channel, in terms of performance, the number and type of
>>>>>>>     disks is going to be much more predictive of performance than CPU or
>>>>>>>     RAM. Note that consumer level drives/controllers will give you much
>>>>>>>     "better" performance because they lie to you about when your data is
>>>>>>>     actually written to the drive. If you search for "fsync lies" you'll
>>>>>>>     find more information on this.
>>>>>>>
>>>>>>>     You probably want to increase the batch size to get better performance.
>>>>>>>
>>>>>>>     Brock
>>>>>>>
>>>>>>>     On Tue, Oct 9, 2012 at 2:46 AM, Jagadish Bihani
>>>>>>>     <ja...@pubmatic.com>  <ma...@pubmatic.com>  wrote:
>>>>>>>
>>>>>>>     Hi
>>>>>>>
>>>>>>>     My flume setup is:
>>>>>>>
>>>>>>>     Source Agent : cat source - File Channel - Avro Sink
>>>>>>>     Dest Agent :     avro source - File Channel - HDFS Sink.
>>>>>>>
>>>>>>>     There is only 1 source agent and 1 destination agent.
>>>>>>>
>>>>>>>     I measure throughput as amount of data written to HDFS per second.
>>>>>>>     ( I have rolling interval 30 sec; so If 60 MB file is generated in 30
>>>>>>>     sec
>>>>>>>     the
>>>>>>>     throughput is : -- 2 MB/sec ).
>>>>>>>
>>>>>>>     I have run source agent on various machines with different hardware
>>>>>>>     configurations :
>>>>>>>     (In all cases I run flume agent with JAVA OPTIONS as
>>>>>>>     "-DJAVA_OPTS="-Xms500m -Xmx1g -Dcom.sun.management.jmxremote
>>>>>>>     -XX:MaxDirectMemorySize=2g")
>>>>>>>
>>>>>>>     JDK is 32 bit.
>>>>>>>
>>>>>>>     Experiment 1:
>>>>>>>     =====
>>>>>>>     RAM : 16 GB
>>>>>>>     Processor: Intel Xeon E5620 @ 2.40 GHz (16 cores).
>>>>>>>     64 bit Processor with 64 bit Kernel.
>>>>>>>     Throughput: 2 MB/sec
>>>>>>>
>>>>>>>     Experiment 2:
>>>>>>>     ======
>>>>>>>     RAM : 4 GB
>>>>>>>     Processor: Intel Xeon E5504  @ 2.00GHz (4 cores). 32 bit Processor
>>>>>>>     64 bit Processor with 32 bit Kernel.
>>>>>>>     Throughput : 30 KB/sec
>>>>>>>
>>>>>>>     Experiment 3:
>>>>>>>     ======
>>>>>>>     RAM : 8 GB
>>>>>>>     Processor:Intel Xeon E5520 @ 2.27 GHz (16 cores).32 bit Processor
>>>>>>>     64 bit Processor with 32 bit Kernel.
>>>>>>>     Throughput : 80 KB/sec
>>>>>>>
>>>>>>>         -- So as can be seen there is huge difference in the throughput with
>>>>>>>     same
>>>>>>>     configuration but
>>>>>>>     different hardware.
>>>>>>>     -- In the first case where throughput is more RES is around 160 MB in
>>>>>>>     other
>>>>>>>     cases it is in
>>>>>>>     the range of 40 MB - 50 MB.
>>>>>>>
>>>>>>>     Can anybody please give insights that why there is this huge difference
>>>>>>>     in
>>>>>>>     the throughput?
>>>>>>>     What is the correlation between RAM and filechannel/HDFS sink
>>>>>>>     performance
>>>>>>>     and also
>>>>>>>     with 32-bit/64 bit kernel?
>>>>>>>
>>>>>>>     Regards,
>>>>>>>     Jagadish
>>>>>>>
>>>>>>>
>>>>>>>
>
>
>
>
> -- 
> Apache MRUnit - Unit testing MapReduce - 
> http://incubator.apache.org/mrunit/