Posted to mapreduce-issues@hadoop.apache.org by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org> on 2010/07/06 13:18:52 UTC

[jira] Resolved: (MAPREDUCE-583) get rid of excessive flushes from PipeMapper/Reducer

     [ https://issues.apache.org/jira/browse/MAPREDUCE-583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu resolved MAPREDUCE-583.
-----------------------------------------------

    Resolution: Duplicate

Fixed by HADOOP-3429

> get rid of excessive flushes from PipeMapper/Reducer
> ----------------------------------------------------
>
>                 Key: MAPREDUCE-583
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-583
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: contrib/streaming
>            Reporter: Joydeep Sen Sarma
>
> There is a flush on the buffered output stream in the mapper/reducer for every row of data:
>       // 2/4 Hadoop to Tool
>       if (numExceptions_ == 0) {
>         if (!this.ignoreKey) {
>           write(key);
>           clientOut_.write('\t');
>         }
>         write(value);
>         if (!this.skipNewline) {
>           clientOut_.write('\n');
>         }
>         clientOut_.flush();
>       } else {
>         numRecSkipped_++;
>       }
> I tried to measure the impact of removing this flush. The number of context switches (the cs column) reported by vmstat shows a marked decline.
> with flush (10 second intervals):
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
>  4  2    784  23140  83352 3114648    0    0  4819 32397 1175 13220 59 11 13 17
>  1  2    784 129724  80704 3075696    0    0  4614 27196 1156 14797 49 11 19 21
>  4  0    784  24160  83440 3174880    0    0    96 36070 1337 10976 67 11  9 12
>  5  0    784 155872  84400 3158840    0    0   125 44084 1280 11044 68 14 10  8
>  2  1    784 365128  87048 2892032    0    0   119 38472 1317 11610 69 14 10  7
> without flush:
>  5  0    784  24652  56056 3217864    0    0   310 29499 1379  7603 76  9  7  8
>  5  3    784 118456  54568 3209992    0    0  3249 33426 1173  6828 63 11 12 14
>  0  2    784 227628  54820 3198560    0    0  7840 30063 1146  8899 60 10 15 15
>  3  1    784  25608  55048 3313512    0    0  3251 36276 1194  7915 60 10 15 15
>  1  2    784 197324  49968 3194572    0    0  4714 35479 1281  8204 62 13 12 13
> cs goes down by about 20-30%, but I am having trouble measuring the overall speed improvement (too many variables due to speculative execution etc.; a better benchmark is needed).
> Removing the flush can't hurt.
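
As a quick check on the vmstat figures quoted above, the following sketch averages the cs column of the two five-row samples and computes the relative drop. The class and method names (`VmstatCs`, `mean`, `percentDrop`) are made up for illustration; the numbers are copied from the quoted output. On these rows alone the mean falls from 12329.4 to 7889.8, a drop of about 36%, somewhat more than the 20-30% estimate in the description, which may well have been based on longer runs than the rows shown.

```java
public class VmstatCs {
    // cs column copied from the vmstat samples quoted in the issue description
    static final int[] WITH_FLUSH = {13220, 14797, 10976, 11044, 11610};
    static final int[] WITHOUT_FLUSH = {7603, 6828, 8899, 7915, 8204};

    /** Arithmetic mean of the samples. */
    static double mean(int[] xs) {
        long sum = 0;
        for (int x : xs) {
            sum += x;
        }
        return (double) sum / xs.length;
    }

    /** Percentage by which the mean drops from `before` to `after`. */
    static double percentDrop(int[] before, int[] after) {
        double b = mean(before);
        return 100.0 * (b - mean(after)) / b;
    }

    public static void main(String[] args) {
        System.out.printf("mean cs with flush:    %.1f%n", mean(WITH_FLUSH));
        System.out.printf("mean cs without flush: %.1f%n", mean(WITHOUT_FLUSH));
        System.out.printf("drop: %.1f%%%n", percentDrop(WITH_FLUSH, WITHOUT_FLUSH));
    }
}
```

Five samples per configuration is a small basis, so the result should be read as a rough indication only.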

-- 
This message is automatically generated by JIRA.