You are viewing a plain text version of this content. The canonical link for it is here.

Posted to mapreduce-issues@hadoop.apache.org by "Gopal V (JIRA)" <ji...@apache.org> on 2012/10/27 10:39:13 UTC

[jira] [Created] (MAPREDUCE-4755) Rewrite MapOutputBuffer to use direct buffers & allow parallel sort+collect

Gopal V created MAPREDUCE-4755:
----------------------------------

             Summary: Rewrite MapOutputBuffer to use direct buffers & allow parallel sort+collect
                 Key: MAPREDUCE-4755
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4755
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
    Affects Versions: 3.0.0
         Environment: Ubuntu 12.10 x86_64 (Bulldozer 8-core)
            Reporter: Gopal V
            Assignee: Gopal V
         Attachments: 0001-first-cut-of-MMapOutputBuffer.patch

The MapOutputBuffer has been written with a very severe constraint on the amount of memory it can consume. This results in code that has to page-in & page-out (i.e spill) data as it passes through the map buffers.

With the advent of the java.nio package, there is a fast and portable MMap alternative to handling your own buffers. This exists outside the GC space of Java and yet provides decently fast memory access to all the data.

The suggestion is that using mmap() direct buffers can be faster when a spill is involved and simpler than the current spill logic, when given enough address space & uses the buffer caches to deliver best effort I/O.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-4755) Rewrite MapOutputBuffer to use direct buffers & allow parallel sort+collect

Posted by "Gopal V (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13486748#comment-13486748 ] 

Gopal V commented on MAPREDUCE-4755:
------------------------------------

Quicksorting a 64Mb chunk off an mmap() turned out to be slow mostly because of the overhead of the ByteBuffers (i.e DatInputBuffer needs a ByteBuffer version as well). Quicksort is great with paging, because it does mostly linear scans on the partitions.

Actually, MAPREDUCE-3235 does some of the things I discovered on my own. The first being the kvmeta swapper, which seemed to get no benefit out of swapping single ints (KVINDEX), the indirection was a complete waste of CPU and cache lines - I used the INDEX space to dump in the vallen into the buffer instead.

Perhaps I'm being biased by my own hardware here, but since the cache lines are going to be 64 bytes wide (at least on any serious hardware), there was no real benefit from making the data only 32 byte wide anyway, while sorting. 

I think I need to redesign this into an async pipeline. I *really* hope that the popular Comparator implementations are thread-safe.


                
> Rewrite MapOutputBuffer to use direct buffers & allow parallel sort+collect
> ---------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4755
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4755
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 3.0.0
>         Environment: Ubuntu 12.10 x86_64 (Bulldozer 8-core)
>            Reporter: Gopal V
>            Assignee: Gopal V
>              Labels: optimization, sort
>
> The MapOutputBuffer has been written with a very severe constraint on the amount of memory it can consume. This results in code that has to page-in & page-out (i.e spill) data as it passes through the map buffers.
> With the advent of the java.nio package, there is a fast and portable MMap alternative to handling your own buffers. This exists outside the GC space of Java and yet provides decently fast memory access to all the data.
> The suggestion is that using mmap() direct buffers can be faster when a spill is involved and simpler than the current spill logic when given enough address space & uses the buffer caches to deliver best effort I/O.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-4755) Rewrite MapOutputBuffer to use direct buffers & allow parallel sort+collect

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13486098#comment-13486098 ] 

Todd Lipcon commented on MAPREDUCE-4755:
----------------------------------------

Hi Gopal,

I haven't looked at the patch yet, but how do you deal with compressing the spill files? Also, how much "handling your own buffers" do you anticipate obviating? We obviously still need to have bounded memory usage -- just using many GBs and letting the OS page stuff out is a recipe for swap usage and the machine grinding to a halt.

I agree that we could save CPU using direct buffers (both by avoiding copies and by using the more efficient CRC code), but I'm not sold on the mmap part. The other improvement you should look into if you're interested in improving the sort process would be to do the sort on L2-sized chunks and then merge at spill time. Right now our sort is horribly cache-inefficient.
                
> Rewrite MapOutputBuffer to use direct buffers & allow parallel sort+collect
> ---------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4755
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4755
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 3.0.0
>         Environment: Ubuntu 12.10 x86_64 (Bulldozer 8-core)
>            Reporter: Gopal V
>            Assignee: Gopal V
>              Labels: optimization, sort
>
> The MapOutputBuffer has been written with a very severe constraint on the amount of memory it can consume. This results in code that has to page-in & page-out (i.e spill) data as it passes through the map buffers.
> With the advent of the java.nio package, there is a fast and portable MMap alternative to handling your own buffers. This exists outside the GC space of Java and yet provides decently fast memory access to all the data.
> The suggestion is that using mmap() direct buffers can be faster when a spill is involved and simpler than the current spill logic when given enough address space & uses the buffer caches to deliver best effort I/O.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-4755) Rewrite MapOutputBuffer to use direct buffers & allow parallel sort+collect

Posted by "Gopal V (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13486251#comment-13486251 ] 

Gopal V commented on MAPREDUCE-4755:
------------------------------------

This is rather rough at the moment. Since I'm spilling via mmap(), I can't compress the spill till I am done sorting it.

As far as memory bounding goes, the real advantage is that we have a split between java GC space and the sort mb space, something which didn't exist before. Unfortunately, NIO doesn't provide me a portable unmap() operation, which makes a bit of this stuff fairly painful to unload from memory (perhaps an fadvise/madvise to unload pages).

Considering my experiment was to only sort the kvmeta here and not the kvbuffer, compression would cause significant trouble because I assume I can seek & read fast within the MappedByteBuffer.

I did look at sorting L2 sized chunks - which raises an interesting question, the comparator is responsible for blowing off the cache, the actual data (in say, a terasort) is actually not staying in cache during the loops. I might get a better bang for buck fixing the number of comparator calls over messing with the size of the sort block - perhaps hybridizing with something like a smoothsort after checking a few random positions for being sorted.
                
> Rewrite MapOutputBuffer to use direct buffers & allow parallel sort+collect
> ---------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4755
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4755
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 3.0.0
>         Environment: Ubuntu 12.10 x86_64 (Bulldozer 8-core)
>            Reporter: Gopal V
>            Assignee: Gopal V
>              Labels: optimization, sort
>
> The MapOutputBuffer has been written with a very severe constraint on the amount of memory it can consume. This results in code that has to page-in & page-out (i.e spill) data as it passes through the map buffers.
> With the advent of the java.nio package, there is a fast and portable MMap alternative to handling your own buffers. This exists outside the GC space of Java and yet provides decently fast memory access to all the data.
> The suggestion is that using mmap() direct buffers can be faster when a spill is involved and simpler than the current spill logic when given enough address space & uses the buffer caches to deliver best effort I/O.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-4755) Rewrite MapOutputBuffer to use direct buffers & allow parallel sort+collect

Posted by "Gopal V (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13485388#comment-13485388 ] 

Gopal V commented on MAPREDUCE-4755:
------------------------------------

TeraSort benchmarks for 10M entries showed improvement from 70s to 53s in wall-clock time (user cpu time is higher than wall-clock because of the asynchronous sort FutureTask)

running "terasort /tmp/data/ file:///tmp/t.$RANDOM/"

{code}
	Map-Reduce Framework
		Map input records=10000000
		Map output records=10000000
		GC time elapsed (ms)=81
		Total committed heap usage (bytes)=4242079744
real	0m53.355s
user	0m56.392s
sys	0m6.548s
{code}

{code}
	Map-Reduce Framework
		Map input records=10000000
		Map output records=10000000
		GC time elapsed (ms)=374
		Total committed heap usage (bytes)=4878761984
real	1m10.191s
user	1m8.908s
sys	0m8.609s
{code}

And the results from both runs are identical byte-for-byte

{code}
$ md5sum t.19982/part-r-00000 t.13037/part-r-00000 
d3368a9e0897ea8efcd2a290d8e27906  t.19982/part-r-00000
d3368a9e0897ea8efcd2a290d8e27906  t.13037/part-r-00000
{code}

The combiner remains to be tested and the counters+progress indicators need to be fixed.
                
> Rewrite MapOutputBuffer to use direct buffers & allow parallel sort+collect
> ---------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4755
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4755
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 3.0.0
>         Environment: Ubuntu 12.10 x86_64 (Bulldozer 8-core)
>            Reporter: Gopal V
>            Assignee: Gopal V
>              Labels: optimization, sort
>         Attachments: 0001-first-cut-of-MMapOutputBuffer.patch
>
>
> The MapOutputBuffer has been written with a very severe constraint on the amount of memory it can consume. This results in code that has to page-in & page-out (i.e spill) data as it passes through the map buffers.
> With the advent of the java.nio package, there is a fast and portable MMap alternative to handling your own buffers. This exists outside the GC space of Java and yet provides decently fast memory access to all the data.
> The suggestion is that using mmap() direct buffers can be faster when a spill is involved and simpler than the current spill logic, when given enough address space & uses the buffer caches to deliver best effort I/O.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-4755) Rewrite MapOutputBuffer to use direct buffers & allow parallel sort+collect

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13486304#comment-13486304 ] 

Todd Lipcon commented on MAPREDUCE-4755:
----------------------------------------

bq. This is rather rough at the moment. Since I'm spilling via mmap(), I can't compress the spill till I am done sorting it.
bq. Considering my experiment was to only sort the kvmeta here and not the kvbuffer, compression would cause significant trouble because I assume I can seek & read fast within the MappedByteBuffer.

Right -- we definitely don't want to be quicksorting a file which is mmapped.

bq. I did look at sorting L2 sized chunks - which raises an interesting question, the comparator is responsible for blowing off the cache, the actual data (in say, a terasort) is actually not staying in cache during the loops

Right, the trick would be to actually sort the _data_ in L2 sized chunks, not just do the indirection that we do now. For example:

- While data arrives:
-- Accumulate 2MB of data and associated indexes
-- Indirect-sort this data, which fits inside cache
-- 'Spill to RAM' -- actually rearrange the data to be in sorted order, and drop the indexes
-- If RAM is full:
--- merge the sorted segments in RAM to disk

There is actually an implementation of this in the facebook branch on github, if you want to take a look. I also had done some prototyping around improving cache efficiency in https://issues.apache.org/jira/browse/MAPREDUCE-3235
                
> Rewrite MapOutputBuffer to use direct buffers & allow parallel sort+collect
> ---------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4755
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4755
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 3.0.0
>         Environment: Ubuntu 12.10 x86_64 (Bulldozer 8-core)
>            Reporter: Gopal V
>            Assignee: Gopal V
>              Labels: optimization, sort
>
> The MapOutputBuffer has been written with a very severe constraint on the amount of memory it can consume. This results in code that has to page-in & page-out (i.e spill) data as it passes through the map buffers.
> With the advent of the java.nio package, there is a fast and portable MMap alternative to handling your own buffers. This exists outside the GC space of Java and yet provides decently fast memory access to all the data.
> The suggestion is that using mmap() direct buffers can be faster when a spill is involved and simpler than the current spill logic when given enough address space & uses the buffer caches to deliver best effort I/O.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (MAPREDUCE-4755) Rewrite MapOutputBuffer to use direct buffers & allow parallel sort+collect

Posted by "Gopal V (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-4755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gopal V updated MAPREDUCE-4755:
-------------------------------

    Description: 
The MapOutputBuffer has been written with a very severe constraint on the amount of memory it can consume. This results in code that has to page-in & page-out (i.e spill) data as it passes through the map buffers.

With the advent of the java.nio package, there is a fast and portable MMap alternative to handling your own buffers. This exists outside the GC space of Java and yet provides decently fast memory access to all the data.

The suggestion is that using mmap() direct buffers can be faster when a spill is involved and simpler than the current spill logic when given enough address space & uses the buffer caches to deliver best effort I/O.

  was:
The MapOutputBuffer has been written with a very severe constraint on the amount of memory it can consume. This results in code that has to page-in & page-out (i.e spill) data as it passes through the map buffers.

With the advent of the java.nio package, there is a fast and portable MMap alternative to handling your own buffers. This exists outside the GC space of Java and yet provides decently fast memory access to all the data.

The suggestion is that using mmap() direct buffers can be faster when a spill is involved and simpler than the current spill logic, when given enough address space & uses the buffer caches to deliver best effort I/O.

    
> Rewrite MapOutputBuffer to use direct buffers & allow parallel sort+collect
> ---------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4755
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4755
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 3.0.0
>         Environment: Ubuntu 12.10 x86_64 (Bulldozer 8-core)
>            Reporter: Gopal V
>            Assignee: Gopal V
>              Labels: optimization, sort
>         Attachments: 0001-first-cut-of-MMapOutputBuffer.patch
>
>
> The MapOutputBuffer has been written with a very severe constraint on the amount of memory it can consume. This results in code that has to page-in & page-out (i.e spill) data as it passes through the map buffers.
> With the advent of the java.nio package, there is a fast and portable MMap alternative to handling your own buffers. This exists outside the GC space of Java and yet provides decently fast memory access to all the data.
> The suggestion is that using mmap() direct buffers can be faster when a spill is involved and simpler than the current spill logic when given enough address space & uses the buffer caches to deliver best effort I/O.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (MAPREDUCE-4755) Rewrite MapOutputBuffer to use direct buffers & allow parallel sort+collect

Posted by "Gopal V (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-4755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gopal V updated MAPREDUCE-4755:
-------------------------------

    Attachment: 0001-first-cut-of-MMapOutputBuffer.patch

A address hungry version of MMapOutputBuffer
                
> Rewrite MapOutputBuffer to use direct buffers & allow parallel sort+collect
> ---------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4755
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4755
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 3.0.0
>         Environment: Ubuntu 12.10 x86_64 (Bulldozer 8-core)
>            Reporter: Gopal V
>            Assignee: Gopal V
>              Labels: optimization, sort
>         Attachments: 0001-first-cut-of-MMapOutputBuffer.patch
>
>
> The MapOutputBuffer has been written with a very severe constraint on the amount of memory it can consume. This results in code that has to page-in & page-out (i.e spill) data as it passes through the map buffers.
> With the advent of the java.nio package, there is a fast and portable MMap alternative to handling your own buffers. This exists outside the GC space of Java and yet provides decently fast memory access to all the data.
> The suggestion is that using mmap() direct buffers can be faster when a spill is involved and simpler than the current spill logic, when given enough address space & uses the buffer caches to deliver best effort I/O.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (MAPREDUCE-4755) Rewrite MapOutputBuffer to use direct buffers & allow parallel sort+collect

Posted by "Gopal V (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-4755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gopal V updated MAPREDUCE-4755:
-------------------------------

    Attachment:     (was: 0001-first-cut-of-MMapOutputBuffer.patch)
    
> Rewrite MapOutputBuffer to use direct buffers & allow parallel sort+collect
> ---------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4755
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4755
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 3.0.0
>         Environment: Ubuntu 12.10 x86_64 (Bulldozer 8-core)
>            Reporter: Gopal V
>            Assignee: Gopal V
>              Labels: optimization, sort
>
> The MapOutputBuffer has been written with a very severe constraint on the amount of memory it can consume. This results in code that has to page-in & page-out (i.e spill) data as it passes through the map buffers.
> With the advent of the java.nio package, there is a fast and portable MMap alternative to handling your own buffers. This exists outside the GC space of Java and yet provides decently fast memory access to all the data.
> The suggestion is that using mmap() direct buffers can be faster when a spill is involved and simpler than the current spill logic when given enough address space & uses the buffer caches to deliver best effort I/O.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira