Posted to dev@mrunit.apache.org by "Yoni Ben-Meshulam (JIRA)" <ji...@apache.org> on 2012/11/30 18:31:59 UTC

[jira] [Created] (MRUNIT-165) MapReduceDriver calls Mapper#cleanup for each input instead of once

Yoni Ben-Meshulam created MRUNIT-165:
----------------------------------------

             Summary: MapReduceDriver calls Mapper#cleanup for each input instead of once
                 Key: MRUNIT-165
                 URL: https://issues.apache.org/jira/browse/MRUNIT-165
             Project: MRUnit
          Issue Type: Bug
    Affects Versions: 0.9.0
            Reporter: Yoni Ben-Meshulam


MapReduceDriver calls the {{run}} method for each input, causing the {{cleanup}} method to be called multiple times. 

I believe this is a bug, since the contract in MapReduce is that, for a single Mapper instance, the {{Mapper#cleanup}} method is only called once after all inputs to that mapper have been processed. I might be mistaken in my assumption here.

This would not be an issue, were it not for the fact that MapReduceDriver has only a single instance of Mapper.

One solution might be to pass the Mapper _class_ into the MapReduceDriver and create a new instance for each input. Another solution might be to call the MapDriver with multiple inputs (which AFAIK is not possible).

----

To reproduce, create a MapReduce job with some stateful mapper:

{code}
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class ClosedFormRegressionMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    public static final Text KEY = new Text("SomeKey");

    // Per-instance counter; it survives across map() calls on the same Mapper.
    private int someState = 0;

    /**
     * Increment someState for each input.
     *
     * @param key the input key (unused)
     * @param value the input value (unused)
     * @param context the Hadoop job Map context
     * @throws java.io.IOException
     */
    @Override
    public void map(
            LongWritable key,
            Text value,
            Context context
    ) throws IOException, InterruptedException {

        this.someState += 1;

    }

    /**
     * Runs once after all maps have occurred. Dumps the accumulated state to the output.
     * @param context the Hadoop job Map context
     */
    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        context.write(KEY, new IntWritable(this.someState));
    }

}
{code}
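
To make the difference concrete without pulling in Hadoop or MRUnit, here is a rough plain-Java sketch of the two call patterns. Everything in it is hypothetical and simplified: CountingMapper stands in for the stateful mapper above, its run method only mirrors the general shape of Mapper#run (map every record, then call cleanup once), and runPerInput / runOnce are made-up helpers rather than anything MapReduceDriver actually exposes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Simplified stand-in for the Mapper lifecycle. None of these types are
 * Hadoop or MRUnit classes; all names here are assumptions for illustration.
 */
class CleanupContractSketch {

    /** Hypothetical stateful mapper: counts inputs and emits the count in cleanup(). */
    static class CountingMapper {
        private int someState = 0;

        void map(String value) {
            someState += 1;
        }

        void cleanup(List<Integer> output) {
            output.add(someState);
        }

        /** Mirrors the shape of Mapper#run: map every record, then cleanup once. */
        void run(List<String> inputs, List<Integer> output) {
            for (String value : inputs) {
                map(value);
            }
            cleanup(output);
        }
    }

    /** Pattern described in the report: run() invoked once per input on one instance. */
    static List<Integer> runPerInput(List<String> inputs) {
        CountingMapper mapper = new CountingMapper();
        List<Integer> output = new ArrayList<>();
        for (String value : inputs) {
            mapper.run(Arrays.asList(value), output);
        }
        return output;
    }

    /** Expected contract: run() invoked once with all inputs, so cleanup() fires once. */
    static List<Integer> runOnce(List<String> inputs) {
        CountingMapper mapper = new CountingMapper();
        List<Integer> output = new ArrayList<>();
        mapper.run(inputs, output);
        return output;
    }

    public static void main(String[] args) {
        List<String> inputs = Arrays.asList("a", "b", "c");
        System.out.println(runPerInput(inputs)); // [1, 2, 3]
        System.out.println(runOnce(inputs));     // [3]
    }
}
```

With three inputs, the per-input pattern emits a partial count after every record, while the single-run pattern emits only the final count. That extra output is the kind of symptom a test on a stateful mapper would trip over.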

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (MRUNIT-165) MapReduceDriver calls Mapper#cleanup for each input instead of once

Posted by "Yoni Ben-Meshulam (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MRUNIT-165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yoni Ben-Meshulam updated MRUNIT-165:
-------------------------------------

    Description: 
MapReduceDriver calls the Mapper#run method for each input, causing the Mapper#cleanup method to be called multiple times. 

I believe this is a bug, since the contract in MapReduce is that, for a single Mapper instance, the Mapper#cleanup method is only called once after all inputs to that mapper have been processed. I might be mistaken in my assumption here.

This would not be an issue, were it not for the fact that MapReduceDriver has only a single instance of Mapper.

One solution might be to pass the Mapper _class_ into the MapReduceDriver and create a new instance for each input. Another solution might be to call the MapDriver with multiple inputs (which AFAIK is not possible).

----

To reproduce, create a MapReduce job with some stateful mapper:

{code}
public class StatefulMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    public static final Text KEY = new Text("SomeKey");
    private int someState = 0;

    /**
     * Increment someState for each input.
     *
     * @param context the Hadoop job Map context
     * @throws java.io.IOException
     */
    @Override
    public void map(
            LongWritable key,
            Text value,
            Context context
    ) throws IOException, InterruptedException {

        this.someState += 1;

    }

    /**
     * Runs once after all maps have occurred. Dumps the accumulated state to the output.
     * @param context the Hadoop job Map context
     */
    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        context.write(this.KEY, new IntWritable(this.someState));
    }

}
{code}

  was:
MapReduceDriver calls the Mapper#run method for each input, causing the Mapper#cleanup method to be called multiple times. 

I believe this is a bug, since the contract in MapReduce is that, for a single Mapper instance, the Mapper#cleanup method is only called once after all inputs to that mapper have been processed. I might be mistaken in my assumption here.

This would not be an issue, were it not for the fact that MapReduceDriver has only a single instance of Mapper.

One solution might be to pass the Mapper _class_ into the MapReduceDriver and create a new instance for each input. Another solution might be to call the MapDriver with multiple inputs (which AFAIK is not possible).

----

To reproduce, create a MapReduce job with some stateful mapper:

{code}
public class ClosedFormRegressionMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    public static final Text KEY = new Text("SomeKey");
    private int someState = 0;

    /**
     * Increment someState for each input.
     *
     * @param context the Hadoop job Map context
     * @throws java.io.IOException
     */
    @Override
    public void map(
            LongWritable key,
            Text value,
            Context context
    ) throws IOException, InterruptedException {

        this.someState += 1;

    }

    /**
     * Runs once after all maps have occurred. Dumps the accumulated state to the output.
     * @param context the Hadoop job Map context
     */
    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        context.write(this.KEY, new IntWritable(this.someState));
    }

}
{code}

    
> MapReduceDriver calls Mapper#cleanup for each input instead of once
> -------------------------------------------------------------------
>
>                 Key: MRUNIT-165
>                 URL: https://issues.apache.org/jira/browse/MRUNIT-165
>             Project: MRUnit
>          Issue Type: Bug
>    Affects Versions: 0.9.0
>            Reporter: Yoni Ben-Meshulam
>


[jira] [Updated] (MRUNIT-165) MapReduceDriver calls Mapper#cleanup for each input instead of once

Posted by "Yoni Ben-Meshulam (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MRUNIT-165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yoni Ben-Meshulam updated MRUNIT-165:
-------------------------------------

    Description: 
MapReduceDriver calls the Mapper#run method for each input, causing the Mapper#cleanup method to be called multiple times. 

I believe this is a bug, since the contract in MapReduce is that, for a single Mapper instance, the Mapper#cleanup method is only called once after all inputs to that mapper have been processed. I might be mistaken in my assumption here.

This would not be an issue, were it not for the fact that MapReduceDriver has only a single instance of Mapper.

One solution might be to pass the Mapper _class_ into the MapReduceDriver and create a new instance for each input. Another solution might be to call the MapDriver with multiple inputs (which AFAIK is not possible).

See attached patch for an example of a stateful mapper and a test which fails due to the bug.

  was:
MapReduceDriver calls the Mapper#run method for each input, causing the Mapper#cleanup method to be called multiple times. 

I believe this is a bug, since the contract in MapReduce is that, for a single Mapper instance, the Mapper#cleanup method is only called once after all inputs to that mapper have been processed. I might be mistaken in my assumption here.

This would not be an issue, were it not for the fact that MapReduceDriver has only a single instance of Mapper.

One solution might be to pass the Mapper _class_ into the MapReduceDriver and create a new instance for each input. Another solution might be to call the MapDriver with multiple inputs (which AFAIK is not possible).

----

To reproduce, create a MapReduce job with some stateful mapper:

{code}
public class StatefulMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    public static final Text KEY = new Text("SomeKey");
    private int someState = 0;

    /**
     * Increment someState for each input.
     *
     * @param context the Hadoop job Map context
     * @throws java.io.IOException
     */
    @Override
    public void map(
            LongWritable key,
            Text value,
            Context context
    ) throws IOException, InterruptedException {

        this.someState += 1;

    }

    /**
     * Runs once after all maps have occurred. Dumps the accumulated state to the output.
     * @param context the Hadoop job Map context
     */
    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        context.write(this.KEY, new IntWritable(this.someState));
    }

}
{code}

    
> MapReduceDriver calls Mapper#cleanup for each input instead of once
> -------------------------------------------------------------------
>
>                 Key: MRUNIT-165
>                 URL: https://issues.apache.org/jira/browse/MRUNIT-165
>             Project: MRUnit
>          Issue Type: Bug
>    Affects Versions: 0.9.0
>            Reporter: Yoni Ben-Meshulam
>            Assignee: Dave Beech
>         Attachments: reproduce_MRUNIT-165.patch
>
>


[jira] [Assigned] (MRUNIT-165) MapReduceDriver calls Mapper#cleanup for each input instead of once

Posted by "Dave Beech (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MRUNIT-165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dave Beech reassigned MRUNIT-165:
---------------------------------

    Assignee: Dave Beech
    
> MapReduceDriver calls Mapper#cleanup for each input instead of once
> -------------------------------------------------------------------
>
>                 Key: MRUNIT-165
>                 URL: https://issues.apache.org/jira/browse/MRUNIT-165
>             Project: MRUnit
>          Issue Type: Bug
>    Affects Versions: 0.9.0
>            Reporter: Yoni Ben-Meshulam
>            Assignee: Dave Beech
>


[jira] [Updated] (MRUNIT-165) MapReduceDriver calls Mapper#cleanup for each input instead of once

Posted by "Yoni Ben-Meshulam (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MRUNIT-165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yoni Ben-Meshulam updated MRUNIT-165:
-------------------------------------

    Attachment:     (was: reproduce_MRUNIT-165.patch)
    
> MapReduceDriver calls Mapper#cleanup for each input instead of once
> -------------------------------------------------------------------
>
>                 Key: MRUNIT-165
>                 URL: https://issues.apache.org/jira/browse/MRUNIT-165
>             Project: MRUnit
>          Issue Type: Bug
>    Affects Versions: 0.9.0
>            Reporter: Yoni Ben-Meshulam
>            Assignee: Dave Beech
>


[jira] [Updated] (MRUNIT-165) MapReduceDriver calls Mapper#cleanup for each input instead of once

Posted by "Yoni Ben-Meshulam (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MRUNIT-165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yoni Ben-Meshulam updated MRUNIT-165:
-------------------------------------

    Description: 
MapReduceDriver calls the Mapper#run method for each input, causing the Mapper#cleanup method to be called multiple times. 

I believe this is a bug, since the contract in MapReduce is that, for a single Mapper instance, the Mapper#cleanup method is only called once after all inputs to that mapper have been processed. I might be mistaken in my assumption here.

This would not be an issue, were it not for the fact that MapReduceDriver has only a single instance of Mapper.

One solution might be to pass the Mapper _class_ into the MapReduceDriver and create a new instance for each input. Another solution might be to call the MapDriver with multiple inputs (which AFAIK is not possible).

----

To reproduce, create a MapReduce job with some stateful mapper:

{code}
public class ClosedFormRegressionMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    public static final Text KEY = new Text("SomeKey");
    private int someState = 0;

    /**
     * Increment someState for each input.
     *
     * @param context the Hadoop job Map context
     * @throws java.io.IOException
     */
    @Override
    public void map(
            LongWritable key,
            Text value,
            Context context
    ) throws IOException, InterruptedException {

        this.someState += 1;

    }

    /**
     * Runs once after all maps have occurred. Dumps the accumulated state to the output.
     * @param context the Hadoop job Map context
     */
    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        context.write(this.KEY, new IntWritable(this.someState));
    }

}
{code}

  was:
MapReduceDriver calls the Mapper#run method for each input, causing the Mapper#cleanup method to be called multiple times. 

I believe this is a bug, since the contract in MapReduce is that, for a single Mapper instance, the Mapper#cleanup method is only called once after all inputs to that mapper have been processed. I might be mistaken in my assumption here.

This would not be an issue, were it not for the fact that MapReduceDriver has only a single instance of Mapper.

One solution might be to pass the Mapper _class_ into the MapReduceDriver and create a new instance for each input. Another solution might be to call the MapDriver with multiple inputs (which AFAIK is not possible).

----

To reproduce, create a MapReduce job with some stateful mapper:

{code}
public class ClosedFormRegressionMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    public static final Text KEY = new Text("SomeKey");
    private int someState = 0;

    /**
     * Increment someState for each input.
     *
     * @param context the Hadoop job Map context
     * @throws java.io.IOException
     */
    @Override
    public void map(
            LongWritable key,
            Text value,
            Context context
    ) throws IOException, InterruptedException {

        this.someState += 1;

    }

    /**
     * Runs once after all maps have occurred. Dumps the accumulated state to the output.
     * @param context the Hadoop job Map context
     */
    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        context.write(this.KEY, new IntWritable(someState));
    }

}
{code}

    
> MapReduceDriver calls Mapper#cleanup for each input instead of once
> -------------------------------------------------------------------
>
>                 Key: MRUNIT-165
>                 URL: https://issues.apache.org/jira/browse/MRUNIT-165
>             Project: MRUnit
>          Issue Type: Bug
>    Affects Versions: 0.9.0
>            Reporter: Yoni Ben-Meshulam
>


[jira] [Commented] (MRUNIT-165) MapReduceDriver calls Mapper#cleanup for each input instead of once

Posted by "Dave Beech (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MRUNIT-165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507706#comment-13507706 ] 

Dave Beech commented on MRUNIT-165:
-----------------------------------

Hi Yoni - could you please post your unit test code that exhibits this bug? I think this may not be a problem any longer since we now support multiple inputs, but I'd like to run your test again using the latest code to make sure. Thanks!
                
> MapReduceDriver calls Mapper#cleanup for each input instead of once
> -------------------------------------------------------------------
>
>                 Key: MRUNIT-165
>                 URL: https://issues.apache.org/jira/browse/MRUNIT-165
>             Project: MRUnit
>          Issue Type: Bug
>    Affects Versions: 0.9.0
>            Reporter: Yoni Ben-Meshulam
>            Assignee: Dave Beech
>


[jira] [Updated] (MRUNIT-165) MapReduceDriver calls Mapper#cleanup for each input instead of once

Posted by "Yoni Ben-Meshulam (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MRUNIT-165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yoni Ben-Meshulam updated MRUNIT-165:
-------------------------------------

    Attachment: reproduce_MRUNIT-165.patch
    
> MapReduceDriver calls Mapper#cleanup for each input instead of once
> -------------------------------------------------------------------
>
>                 Key: MRUNIT-165
>                 URL: https://issues.apache.org/jira/browse/MRUNIT-165
>             Project: MRUnit
>          Issue Type: Bug
>    Affects Versions: 0.9.0
>            Reporter: Yoni Ben-Meshulam
>            Assignee: Dave Beech
>         Attachments: reproduce_MRUNIT-165.patch
>
>


[jira] [Updated] (MRUNIT-165) MapReduceDriver calls Mapper#cleanup for each input instead of once

Posted by "Yoni Ben-Meshulam (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MRUNIT-165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yoni Ben-Meshulam updated MRUNIT-165:
-------------------------------------

    Attachment: reproduce_MRUNIT-165.patch

Attaching a simple stateful mapper and a test which fails due to the bug in MapReduceDriver.
                
> MapReduceDriver calls Mapper#cleanup for each input instead of once
> -------------------------------------------------------------------
>
>                 Key: MRUNIT-165
>                 URL: https://issues.apache.org/jira/browse/MRUNIT-165
>             Project: MRUnit
>          Issue Type: Bug
>    Affects Versions: 0.9.0
>            Reporter: Yoni Ben-Meshulam
>            Assignee: Dave Beech
>         Attachments: reproduce_MRUNIT-165.patch
>
>
> MapReduceDriver calls the Mapper#run method for each input, causing the Mapper#cleanup method to be called multiple times. 
> I believe this is a bug, since the contract in MapReduce is that, for a single Mapper instance, the Mapper#cleanup method is only called once after all inputs to that mapper have been processed. I might be mistaken in my assumption here.
> This would not be an issue, were it not for the fact that MapReduceDriver has only a single instance of Mapper.
> One solution might be to pass the Mapper _class_ into the MapReduceDriver and create a new instance for each input. Another solution might be to call the MapDriver with multiple inputs (which AFAIK is not possible).
> ----
> To reproduce, create a MapReduce job with some stateful mapper:
> {code}
> public class StatefulMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
>     public static final Text KEY = new Text("SomeKey");
>     private Int someState = 0;
>     /**
>      * Increment someState for each input.
>      *
>      * @param context the Hadoop job Map context
>      * @throws java.io.IOException
>      */
>     @Override
>     public void map(
>             LongWritable key,
>             Text value,
>             Context context
>     ) throws IOException, InterruptedException {
>         this.someState += 1;
>     }
>     /**
>      * Runs once after all maps have occurred. Dumps the accumulated state to the output.
>      * @param context the Hadoop job Map context
>      */
>     @Override
>     protected void cleanup(Context context) throws IOException, InterruptedException {
>         context.write(KEY, new IntWritable(this.someState));
>     }
> }
> {code}
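
The behaviour described in the issue can be illustrated without Hadoop or MRUnit on the classpath. The following is a minimal sketch, not MRUnit's actual implementation: CountingMapper, runOncePerInput and runOnceForAllInputs are hypothetical stand-ins, and run() only imitates the map-then-cleanup order of Hadoop's Mapper#run (setup is omitted). It assumes a single shared mapper instance, as the report says MapReduceDriver uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-ins for illustration; these are NOT Hadoop or MRUnit classes.
class CleanupLifecycleSketch {

    /** Stateful mapper in the spirit of StatefulMapper: counts inputs, emits in cleanup. */
    static class CountingMapper {
        private int someState = 0;

        void map(String value, List<Integer> output) {
            someState += 1; // accumulate state, emit nothing
        }

        void cleanup(List<Integer> output) {
            output.add(someState); // dump the accumulated state
        }

        /** One run = map over every input, then cleanup exactly once. */
        void run(List<String> inputs, List<Integer> output) {
            for (String value : inputs) {
                map(value, output);
            }
            cleanup(output);
        }
    }

    /** Reported behaviour: one run() per input on a shared mapper instance. */
    static List<Integer> runOncePerInput(List<String> inputs) {
        CountingMapper mapper = new CountingMapper();
        List<Integer> output = new ArrayList<>();
        for (String value : inputs) {
            mapper.run(Collections.singletonList(value), output);
        }
        return output;
    }

    /** Expected contract: a single run() over all inputs. */
    static List<Integer> runOnceForAllInputs(List<String> inputs) {
        CountingMapper mapper = new CountingMapper();
        List<Integer> output = new ArrayList<>();
        mapper.run(inputs, output);
        return output;
    }

    public static void main(String[] args) {
        List<String> inputs = Arrays.asList("a", "b", "c");
        System.out.println(runOncePerInput(inputs));     // prints [1, 2, 3]
        System.out.println(runOnceForAllInputs(inputs)); // prints [3]
    }
}
```

With three inputs, the per-input driver emits three records because cleanup fires after every input, while the single-run driver emits one record carrying the full count.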

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (MRUNIT-165) MapReduceDriver calls Mapper#cleanup for each input instead of once

Posted by "Yoni Ben-Meshulam (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MRUNIT-165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yoni Ben-Meshulam updated MRUNIT-165:
-------------------------------------

    Attachment: reproduce_MRUNIT-165.patch
    


[jira] [Resolved] (MRUNIT-165) MapReduceDriver calls Mapper#cleanup for each input instead of once

Posted by "Dave Beech (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MRUNIT-165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dave Beech resolved MRUNIT-165.
-------------------------------

       Resolution: Fixed
    Fix Version/s: 1.0.0

Thanks for the patch. I've confirmed that the test fails against the 0.9.0 code but passes on trunk. The recent multiple-input changes in MRUNIT-64 seem to have fixed this bug as a side effect.
                
> MapReduceDriver calls Mapper#cleanup for each input instead of once
> -------------------------------------------------------------------
>
>                 Key: MRUNIT-165
>                 URL: https://issues.apache.org/jira/browse/MRUNIT-165
>             Project: MRUnit
>          Issue Type: Bug
>    Affects Versions: 0.9.0
>            Reporter: Yoni Ben-Meshulam
>            Assignee: Dave Beech
>             Fix For: 1.0.0
>
>         Attachments: reproduce_MRUNIT-165.patch
>
>
> MapReduceDriver calls the Mapper#run method for each input, causing the Mapper#cleanup method to be called multiple times. 
> I believe this is a bug, since the contract in MapReduce is that, for a single Mapper instance, the Mapper#cleanup method is only called once after all inputs to that mapper have been processed. I might be mistaken in my assumption here.
> This would not be an issue, were it not for the fact that MapReduceDriver has only a single instance of Mapper.
> One solution might be to pass the Mapper _class_ into the MapReduceDriver and create a new instance for each input. Another solution might be to call the MapDriver with multiple inputs (which AFAIK is not possible).
> See attached patch for an example of a stateful mapper and a test which fails due to the bug.
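
The two workarounds proposed in the description behave differently for a stateful mapper. As a rough sketch (the classes below are hypothetical and are not MRUnit or Hadoop APIs): creating a fresh instance per input does restore one cleanup call per instance, but state never accumulates across inputs, whereas passing all inputs through a single run yields one cleanup call with the full count. The second variant appears consistent with the resolution note that the multiple-input changes in MRUNIT-64 fixed the bug.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical illustration only; none of these types exist in MRUnit or Hadoop.
class ProposedFixesSketch {

    /** Counts records during the map phase and emits the count once per run. */
    static class CountingMapper {
        private int someState = 0;

        void run(List<String> inputs, List<Integer> output) {
            for (String ignored : inputs) {
                someState += 1;    // map(): count the record
            }
            output.add(someState); // cleanup(): emit the count once per run
        }
    }

    /** Proposal 1: build a new mapper from a factory for every input. */
    static List<Integer> freshInstancePerInput(Supplier<CountingMapper> factory,
                                               List<String> inputs) {
        List<Integer> output = new ArrayList<>();
        for (String value : inputs) {
            factory.get().run(Collections.singletonList(value), output);
        }
        return output;
    }

    /** Proposal 2: keep one mapper and feed it all inputs in a single run. */
    static List<Integer> singleRunOverAllInputs(CountingMapper mapper,
                                                List<String> inputs) {
        List<Integer> output = new ArrayList<>();
        mapper.run(inputs, output);
        return output;
    }

    public static void main(String[] args) {
        List<String> inputs = Arrays.asList("a", "b", "c");
        System.out.println(freshInstancePerInput(CountingMapper::new, inputs));   // prints [1, 1, 1]
        System.out.println(singleRunOverAllInputs(new CountingMapper(), inputs)); // prints [3]
    }
}
```

Only the single-run variant reproduces what a real map task would emit for a mapper that aggregates across records.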


[jira] [Commented] (MRUNIT-165) MapReduceDriver calls Mapper#cleanup for each input instead of once

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MRUNIT-165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510918#comment-13510918 ] 

Hudson commented on MRUNIT-165:
-------------------------------

Integrated in mrunit-trunk #506 (See [https://builds.apache.org/job/mrunit-trunk/506/])
    MRUNIT-165: MapReduceDriver calls Mapper#cleanup for each input instead of once (unit test contributed by Yoni Ben-Meshulam) (Revision 92ad229dafea47db18fa978e809d13261721a806)

     Result = SUCCESS
dbeech : https://git-wip-us.apache.org/repos/asf?p=mrunit.git&a=commit&h=92ad229dafea47db18fa978e809d13261721a806
Files : 
* src/test/java/org/apache/hadoop/mrunit/mapreduce/StatefulMapper.java
* src/test/java/org/apache/hadoop/mrunit/mapreduce/TestStatefulMapReduce.java

                


[jira] [Updated] (MRUNIT-165) MapReduceDriver calls Mapper#cleanup for each input instead of once

Posted by "Yoni Ben-Meshulam (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MRUNIT-165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yoni Ben-Meshulam updated MRUNIT-165:
-------------------------------------

    Attachment:     (was: reproduce_MRUNIT-165.patch)
    


[jira] [Updated] (MRUNIT-165) MapReduceDriver calls Mapper#cleanup for each input instead of once

Posted by "Yoni Ben-Meshulam (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MRUNIT-165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yoni Ben-Meshulam updated MRUNIT-165:
-------------------------------------

    Description: 
MapReduceDriver calls the Mapper#run method for each input, causing the Mapper#cleanup method to be called multiple times. 

I believe this is a bug, since the contract in MapReduce is that, for a single Mapper instance, the Mapper#cleanup method is only called once after all inputs to that mapper have been processed. I might be mistaken in my assumption here.

This would not be an issue, were it not for the fact that MapReduceDriver has only a single instance of Mapper.

One solution might be to pass the Mapper _class_ into the MapReduceDriver and create a new instance for each input. Another solution might be to call the MapDriver with multiple inputs (which AFAIK is not possible).

----

To reproduce, create a MapReduce job with some stateful mapper:

{code}
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class ClosedFormRegressionMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    public static final Text KEY = new Text("SomeKey");

    // Number of records this mapper instance has seen so far.
    private int someState = 0;

    /**
     * Increment someState for each input. Nothing is emitted here.
     *
     * @param key the input key (unused)
     * @param value the input value (unused)
     * @param context the Hadoop job Map context
     * @throws java.io.IOException
     */
    @Override
    public void map(
            LongWritable key,
            Text value,
            Context context
    ) throws IOException, InterruptedException {

        this.someState += 1;

    }

    /**
     * Runs once after all maps have occurred. Dumps the accumulated state to the output.
     * @param context the Hadoop job Map context
     */
    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        context.write(KEY, new IntWritable(someState));
    }

}
{code}

  was:
MapReduceDriver calls the {{run}} method for each input, causing the {{cleanup}} method to be called multiple times. 

I believe this is a bug, since the contract in MapReduce is that, for a single Mapper instance, the {{Mapper#cleanup}} method is only called once after all inputs to that mapper have been processed. I might be mistaken in my assumption here.

This would not be an issue, were it not for the fact that MapReduceDriver has only a single instance of Mapper.

One solution might be to pass the Mapper _class_ into the MapReduceDriver and create a new instance for each input. Another solution might be to call the MapDriver with multiple inputs (which AFAIK is not possible).

----

To reproduce, create a MapReduce job with some stateful mapper:

{code}
public class ClosedFormRegressionMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    public static final Text KEY = new Text("SomeKey");
    private int someState = 0;

    /**
     * Increment someState for each input.
     *
     * @param context the Hadoop job Map context
     * @throws java.io.IOException
     */
    @Override
    public void map(
            LongWritable key,
            Text value,
            Context context
    ) throws IOException, InterruptedException {

        this.someState += 1;

    }

    /**
     * Runs once after all maps have occurred. Dumps the accumulated state to the output.
     * @param context the Hadoop job Map context
     */
    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        context.write(KEY, new IntWritable(someState));
    }

}
{code}

    


[jira] [Commented] (MRUNIT-165) MapReduceDriver calls Mapper#cleanup for each input instead of once

Posted by "Yoni Ben-Meshulam (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MRUNIT-165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13509810#comment-13509810 ] 

Yoni Ben-Meshulam commented on MRUNIT-165:
------------------------------------------

Awesome. Thanks Dave.
                
