Posted to mapreduce-issues@hadoop.apache.org by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org> on 2011/06/24 21:07:49 UTC

[jira] [Commented] (MAPREDUCE-1347) Missing synchronization in MultipleOutputFormat

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054631#comment-13054631 ] 

jiraposter@reviews.apache.org commented on MAPREDUCE-1347:
----------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/953/
-----------------------------------------------------------

Review request for hadoop-mapreduce and Todd Lipcon.


Summary
-------

Used makeComputingMap from Guava's MapMaker to create the RecordWriter cache in a thread-safe way.

For some reason, the map does not actually cache the results; instead it calls apply() over and over again for the same key-value pairs.
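
For reference, the caching behavior the patch is after (build each writer once per key, even with concurrent callers) can be sketched with java.util.concurrent alone. This is not the code under review and does not use MapMaker: it swaps in ConcurrentHashMap.computeIfAbsent, which needs Java 8 and so postdates this patch. WriterCache, WriterCacheDemo and the "part-00000" key are hypothetical names for illustration, and a StringBuilder stands in for a RecordWriter.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical cache keyed by output path; W stands in for RecordWriter.
class WriterCache<W> {
    private final ConcurrentHashMap<String, W> writers = new ConcurrentHashMap<>();
    private final Function<String, W> factory;

    WriterCache(Function<String, W> factory) {
        this.factory = factory;
    }

    W get(String path) {
        // ConcurrentHashMap.computeIfAbsent is atomic per key: the factory
        // is applied at most once for a given path, and every caller sees
        // the same cached instance afterwards.
        return writers.computeIfAbsent(path, factory);
    }
}

class WriterCacheDemo {
    public static void main(String[] args) throws InterruptedException {
        AtomicInteger created = new AtomicInteger();
        WriterCache<StringBuilder> cache = new WriterCache<>(path -> {
            created.incrementAndGet();       // count factory invocations
            return new StringBuilder(path);  // stand-in for opening a file
        });

        // Many threads ask for the same path at once.
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (int i = 0; i < 100; i++) {
            pool.execute(() -> cache.get("part-00000"));
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);

        // Prints how many times the factory actually ran.
        System.out.println("writers created: " + created.get());
    }
}
```

Threads asking for different paths can generally proceed in parallel, which is the fine-grained behavior the issue asks for. The factory runs while the map holds an internal lock for that key's bin, so it should not touch the map itself.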


This addresses bug MAPREDUCE-1347.
    http://issues.apache.org/jira/browse/MAPREDUCE-1347


Diffs
-----

  mapreduce/ivy.xml 85ee014 
  mapreduce/ivy/libraries.properties 9d40aaa 
  mapreduce/src/java/org/apache/hadoop/mapred/lib/MultipleOutputFormat.java b8944f1 
  mapreduce/src/test/mapred/org/apache/hadoop/mapred/TestMultipleTextOutputFormat.java 14c097d 

Diff: https://reviews.apache.org/r/953/diff


Testing
-------

Added a test case, but it fails with the current behavior of MapMaker's makeComputingMap() (it would pass if the map cached entries as expected).


Thanks,

Harsh



> Missing synchronization in MultipleOutputFormat
> -----------------------------------------------
>
>                 Key: MAPREDUCE-1347
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1347
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.20.2, 0.21.0, 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: Harsh J
>         Attachments: MAPREDUCE-1347.r2.diff, MAPREDUCE-1347.r3.diff, mapreduce.1347.r1.diff
>
>
> MultipleOutputFormat's RecordWriter implementation doesn't use synchronization when accessing the recordWriters member. When using multithreaded mappers or reducers, this can result in problems where two threads will both try to create the same file, causing AlreadyBeingCreatedException. Synchronizing at a finer granularity than the whole method is probably a good idea, so that multithreaded mappers can actually achieve parallelism when writing into separate output streams.
> From what I can tell, the new API's MultipleOutputs seems not to have this issue.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira