Posted to dev@mrunit.apache.org by "Yoni Ben-Meshulam (JIRA)" <ji...@apache.org> on 2012/11/30 18:31:59 UTC
[jira] [Created] (MRUNIT-165) MapReduceDriver calls Mapper#cleanup
for each input instead of once
Yoni Ben-Meshulam created MRUNIT-165:
----------------------------------------
Summary: MapReduceDriver calls Mapper#cleanup for each input instead of once
Key: MRUNIT-165
URL: https://issues.apache.org/jira/browse/MRUNIT-165
Project: MRUnit
Issue Type: Bug
Affects Versions: 0.9.0
Reporter: Yoni Ben-Meshulam
MapReduceDriver calls the {{run}} method for each input, causing the {{cleanup}} method to be called multiple times.
I believe this is a bug, since the contract in MapReduce is that, for a single Mapper instance, the {{Mapper#cleanup}} method is only called once after all inputs to that mapper have been processed. I might be mistaken in my assumption here.
This would not be an issue, were it not for the fact that MapReduceDriver has only a single instance of Mapper.
One solution might be to pass the Mapper _class_ into the MapReduceDriver and create a new instance for each input. Another solution might be to call the MapDriver with multiple inputs (which AFAIK is not possible).
----
To reproduce, create a MapReduce job with some stateful mapper:
{code}
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class ClosedFormRegressionMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    public static final Text KEY = new Text("SomeKey");
    // Per-instance state; accumulates across map() calls on the same Mapper instance.
    private int someState = 0;

    /**
     * Increment someState for each input.
     *
     * @param key the input key (unused)
     * @param value the input value (unused)
     * @param context the Hadoop job Map context
     * @throws java.io.IOException
     */
    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        this.someState += 1;
    }

    /**
     * Runs once after all maps have occurred. Dumps the accumulated state to the output.
     *
     * @param context the Hadoop job Map context
     */
    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        context.write(KEY, new IntWritable(this.someState));
    }
}
{code}
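The lifecycle difference described above can be modelled without Hadoop or MRUnit on the classpath. What follows is only a minimal sketch under stated assumptions: CountingMapper, runPerInput and runOnce are hypothetical stand-ins invented for illustration, not Hadoop or MRUnit classes. The run method only mirrors the general shape of Mapper#run (map every record, then call cleanup once), and runPerInput models the behaviour reported here rather than reproducing the actual MapReduceDriver code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class CleanupLifecycleSketch {

    /** Hypothetical stand-in for a stateful mapper; not the Hadoop Mapper API. */
    static class CountingMapper {
        private int someState = 0;
        // Collects whatever cleanup() "writes", in order.
        final List<Integer> output = new ArrayList<Integer>();

        void map(String value) {
            someState += 1;
        }

        void cleanup() {
            output.add(someState);
        }

        /** Same shape as Mapper#run: map each record, then cleanup exactly once. */
        void run(List<String> records) {
            for (String record : records) {
                map(record);
            }
            cleanup();
        }
    }

    /** Models the reported behaviour: one run() call per input on a shared instance. */
    static List<Integer> runPerInput(List<String> inputs) {
        CountingMapper mapper = new CountingMapper();
        for (String input : inputs) {
            mapper.run(Collections.singletonList(input));
        }
        return mapper.output;
    }

    /** Models the expected contract: a single run() call over all inputs. */
    static List<Integer> runOnce(List<String> inputs) {
        CountingMapper mapper = new CountingMapper();
        mapper.run(inputs);
        return mapper.output;
    }

    public static void main(String[] args) {
        List<String> inputs = Arrays.asList("a", "b", "c");
        System.out.println("run per input: " + runPerInput(inputs)); // [1, 2, 3]
        System.out.println("run once:      " + runOnce(inputs));     // [3]
    }
}
```

With three inputs, the per-input variant emits one record per cleanup call (1, 2, 3), whereas the single-run variant emits only the final count (3). A test that expects a single ("SomeKey", 3) output would therefore fail against the per-input behaviour.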
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MRUNIT-165) MapReduceDriver calls Mapper#cleanup
for each input instead of once
Posted by "Yoni Ben-Meshulam (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MRUNIT-165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yoni Ben-Meshulam updated MRUNIT-165:
-------------------------------------
Description:
MapReduceDriver calls the Mapper#run method for each input, causing the Mapper#cleanup method to be called multiple times.
I believe this is a bug, since the contract in MapReduce is that, for a single Mapper instance, the Mapper#cleanup method is only called once after all inputs to that mapper have been processed. I might be mistaken in my assumption here.
This would not be an issue, were it not for the fact that MapReduceDriver has only a single instance of Mapper.
One solution might be to pass the Mapper _class_ into the MapReduceDriver and create a new instance for each input. Another solution might be to call the MapDriver with multiple inputs (which AFAIK is not possible).
----
To reproduce, create a MapReduce job with some stateful mapper:
{code}
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class StatefulMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    public static final Text KEY = new Text("SomeKey");
    // Per-instance state; accumulates across map() calls on the same Mapper instance.
    private int someState = 0;

    /**
     * Increment someState for each input.
     *
     * @param key the input key (unused)
     * @param value the input value (unused)
     * @param context the Hadoop job Map context
     * @throws java.io.IOException
     */
    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        this.someState += 1;
    }

    /**
     * Runs once after all maps have occurred. Dumps the accumulated state to the output.
     *
     * @param context the Hadoop job Map context
     */
    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        context.write(KEY, new IntWritable(this.someState));
    }
}
{code}
[jira] [Updated] (MRUNIT-165) MapReduceDriver calls Mapper#cleanup
for each input instead of once
Posted by "Yoni Ben-Meshulam (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MRUNIT-165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yoni Ben-Meshulam updated MRUNIT-165:
-------------------------------------
Description:
MapReduceDriver calls the Mapper#run method for each input, causing the Mapper#cleanup method to be called multiple times.
I believe this is a bug, since the contract in MapReduce is that, for a single Mapper instance, the Mapper#cleanup method is only called once after all inputs to that mapper have been processed. I might be mistaken in my assumption here.
This would not be an issue, were it not for the fact that MapReduceDriver has only a single instance of Mapper.
One solution might be to pass the Mapper _class_ into the MapReduceDriver and create a new instance for each input. Another solution might be to call the MapDriver with multiple inputs (which AFAIK is not possible).
See attached patch for an example of a stateful mapper and a test which fails due to the bug.
[jira] [Assigned] (MRUNIT-165) MapReduceDriver calls Mapper#cleanup
for each input instead of once
Posted by "Dave Beech (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MRUNIT-165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dave Beech reassigned MRUNIT-165:
---------------------------------
Assignee: Dave Beech
[jira] [Updated] (MRUNIT-165) MapReduceDriver calls Mapper#cleanup
for each input instead of once
Posted by "Yoni Ben-Meshulam (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MRUNIT-165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yoni Ben-Meshulam updated MRUNIT-165:
-------------------------------------
Attachment: (was: reproduce_MRUNIT-165.patch)
[jira] [Updated] (MRUNIT-165) MapReduceDriver calls Mapper#cleanup
for each input instead of once
Posted by "Yoni Ben-Meshulam (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MRUNIT-165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yoni Ben-Meshulam updated MRUNIT-165:
-------------------------------------
Description:
MapReduceDriver calls the Mapper#run method for each input, causing the Mapper#cleanup method to be called multiple times.
I believe this is a bug, since the contract in MapReduce is that, for a single Mapper instance, the Mapper#cleanup method is only called once after all inputs to that mapper have been processed. I might be mistaken in my assumption here.
This would not be an issue, were it not for the fact that MapReduceDriver has only a single instance of Mapper.
One solution might be to pass the Mapper _class_ into the MapReduceDriver and create a new instance for each input. Another solution might be to call the MapDriver with multiple inputs (which AFAIK is not possible).
----
To reproduce, create a MapReduce job with some stateful mapper:
{code}
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class ClosedFormRegressionMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    public static final Text KEY = new Text("SomeKey");
    // Per-instance state; accumulates across map() calls on the same Mapper instance.
    private int someState = 0;

    /**
     * Increment someState for each input.
     *
     * @param key the input key (unused)
     * @param value the input value (unused)
     * @param context the Hadoop job Map context
     * @throws java.io.IOException
     */
    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        this.someState += 1;
    }

    /**
     * Runs once after all maps have occurred. Dumps the accumulated state to the output.
     *
     * @param context the Hadoop job Map context
     */
    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        context.write(KEY, new IntWritable(this.someState));
    }
}
{code}
[jira] [Commented] (MRUNIT-165) MapReduceDriver calls
Mapper#cleanup for each input instead of once
Posted by "Dave Beech (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MRUNIT-165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507706#comment-13507706 ]
Dave Beech commented on MRUNIT-165:
-----------------------------------
Hi Yoni - could you please post your unit test code that exhibits this bug? I think this may not be a problem any longer since we now support multiple inputs, but I'd like to run your test again using the latest code to make sure. Thanks!
[jira] [Updated] (MRUNIT-165) MapReduceDriver calls Mapper#cleanup
for each input instead of once
Posted by "Yoni Ben-Meshulam (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MRUNIT-165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yoni Ben-Meshulam updated MRUNIT-165:
-------------------------------------
Attachment: reproduce_MRUNIT-165.patch
[jira] [Updated] (MRUNIT-165) MapReduceDriver calls Mapper#cleanup
for each input instead of once
Posted by "Yoni Ben-Meshulam (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MRUNIT-165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yoni Ben-Meshulam updated MRUNIT-165:
-------------------------------------
Attachment: reproduce_MRUNIT-165.patch
Attaching a simple stateful mapper and a test which fails due to the bug in MapReduceDriver.
[jira] [Updated] (MRUNIT-165) MapReduceDriver calls Mapper#cleanup
for each input instead of once
Posted by "Yoni Ben-Meshulam (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MRUNIT-165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yoni Ben-Meshulam updated MRUNIT-165:
-------------------------------------
Attachment: reproduce_MRUNIT-165.patch
> MapReduceDriver calls Mapper#cleanup for each input instead of once
> -------------------------------------------------------------------
>
> Key: MRUNIT-165
> URL: https://issues.apache.org/jira/browse/MRUNIT-165
> Project: MRUnit
> Issue Type: Bug
> Affects Versions: 0.9.0
> Reporter: Yoni Ben-Meshulam
> Assignee: Dave Beech
> Attachments: reproduce_MRUNIT-165.patch
>
>
> MapReduceDriver calls the Mapper#run method for each input, causing the Mapper#cleanup method to be called multiple times.
> I believe this is a bug, since the contract in MapReduce is that, for a single Mapper instance, the Mapper#cleanup method is only called once after all inputs to that mapper have been processed. I might be mistaken in my assumption here.
> This would not be an issue, were it not for the fact that MapReduceDriver has only a single instance of Mapper.
> One solution might be to pass the Mapper _class_ into the MapReduceDriver and create a new instance for each input. Another solution might be to call the MapDriver with multiple inputs (which AFAIK is not possible).
> ----
> To reproduce, create a MapReduce job with some stateful mapper:
> {code}
> import java.io.IOException;
>
> import org.apache.hadoop.io.IntWritable;
> import org.apache.hadoop.io.LongWritable;
> import org.apache.hadoop.io.Text;
> import org.apache.hadoop.mapreduce.Mapper;
>
> public class StatefulMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
>   public static final Text KEY = new Text("SomeKey");
>   // Number of records this mapper instance has seen so far.
>   private int someState = 0;
>
>   /**
>    * Increment someState for each input.
>    *
>    * @param context the Hadoop job Map context
>    * @throws java.io.IOException
>    */
>   @Override
>   public void map(
>       LongWritable key,
>       Text value,
>       Context context
>   ) throws IOException, InterruptedException {
>     this.someState += 1;
>   }
>
>   /**
>    * Runs once after all maps have occurred. Dumps the accumulated state to the output.
>    * @param context the Hadoop job Map context
>    */
>   @Override
>   protected void cleanup(Context context) throws IOException, InterruptedException {
>     context.write(KEY, new IntWritable(this.someState));
>   }
> }
> {code}
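For readers without the attached patch, the effect described above can be sketched in plain Java with no Hadoop or MRUnit dependency. The `CountingMapper` class and the `runPerInput` helper are hypothetical stand-ins, not MRUnit code: `run` only mimics the order that Hadoop's `Mapper#run` follows (map every record, then call cleanup), and the driver loop mimics the 0.9.0 behaviour as it is described in this report.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class CleanupPerInputSketch {

    /** Hypothetical stand-in for a stateful Mapper; not a Hadoop class. */
    static class CountingMapper {
        private int someState = 0;

        void map(String value) {
            someState += 1;
        }

        /** Emits the accumulated count, like the cleanup method in the report. */
        void cleanup(List<Integer> output) {
            output.add(someState);
        }

        /** Same order as Mapper#run: map every record, then cleanup once. */
        void run(List<String> records, List<Integer> output) {
            for (String record : records) {
                map(record);
            }
            cleanup(output);
        }
    }

    /** Assumed 0.9.0-style driver loop: one run call per input on a shared mapper. */
    public static List<Integer> runPerInput(List<String> inputs) {
        CountingMapper mapper = new CountingMapper();
        List<Integer> output = new ArrayList<>();
        for (String input : inputs) {
            mapper.run(Collections.singletonList(input), output);
        }
        return output;
    }

    public static void main(String[] args) {
        // cleanup fires after every input, so three partial counts are emitted
        System.out.println(runPerInput(Arrays.asList("a", "b", "c"))); // prints [1, 2, 3]
    }
}
```

A job that expects a single (SomeKey, 3) record from the mapper would see three records here instead.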
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (MRUNIT-165) MapReduceDriver calls Mapper#cleanup
for each input instead of once
Posted by "Dave Beech (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MRUNIT-165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dave Beech resolved MRUNIT-165.
-------------------------------
Resolution: Fixed
Fix Version/s: 1.0.0
Thanks for the patch. I've confirmed the test does fail against the 0.9.0 code but it passes on trunk. The recent multiple inputs changes in MRUNIT-64 seem to have fixed this bug as a side effect.
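The behaviour now seen on trunk can be pictured with the plain-Java sketch below. It rests on an assumption: that the reworked driver queues all inputs and hands them to a single run call. It is not the MRUNIT-64 code, and `CountingMapper` and `runOnce` are hypothetical names with no Hadoop or MRUnit dependency.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SingleRunSketch {

    /** Hypothetical stand-in for a stateful Mapper; not a Hadoop class. */
    static class CountingMapper {
        private int someState = 0;

        void map(String value) {
            someState += 1;
        }

        void cleanup(List<Integer> output) {
            output.add(someState);
        }

        /** Same order as Mapper#run: map every record, then cleanup once. */
        void run(List<String> records, List<Integer> output) {
            for (String record : records) {
                map(record);
            }
            cleanup(output);
        }
    }

    /** Assumed post-fix driver: all queued inputs go through one run call. */
    public static List<Integer> runOnce(List<String> inputs) {
        CountingMapper mapper = new CountingMapper();
        List<Integer> output = new ArrayList<>();
        mapper.run(inputs, output);
        return output;
    }

    public static void main(String[] args) {
        // cleanup fires once, after the last input, so only the total is emitted
        System.out.println(runOnce(Arrays.asList("a", "b", "c"))); // prints [3]
    }
}
```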
> MapReduceDriver calls Mapper#cleanup for each input instead of once
> -------------------------------------------------------------------
>
> Key: MRUNIT-165
> URL: https://issues.apache.org/jira/browse/MRUNIT-165
> Project: MRUnit
> Issue Type: Bug
> Affects Versions: 0.9.0
> Reporter: Yoni Ben-Meshulam
> Assignee: Dave Beech
> Fix For: 1.0.0
>
> Attachments: reproduce_MRUNIT-165.patch
>
>
> MapReduceDriver calls the Mapper#run method for each input, causing the Mapper#cleanup method to be called multiple times.
> I believe this is a bug, since the contract in MapReduce is that, for a single Mapper instance, the Mapper#cleanup method is only called once after all inputs to that mapper have been processed. I might be mistaken in my assumption here.
> This would not be an issue, were it not for the fact that MapReduceDriver has only a single instance of Mapper.
> One solution might be to pass the Mapper _class_ into the MapReduceDriver and create a new instance for each input. Another solution might be to call the MapDriver with multiple inputs (which AFAIK is not possible).
> See attached patch for an example of a stateful mapper and a test which fails due to the bug.
[jira] [Commented] (MRUNIT-165) MapReduceDriver calls
Mapper#cleanup for each input instead of once
Posted by "Hudson (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MRUNIT-165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510918#comment-13510918 ]
Hudson commented on MRUNIT-165:
-------------------------------
Integrated in mrunit-trunk #506 (See [https://builds.apache.org/job/mrunit-trunk/506/])
MRUNIT-165: MapReduceDriver calls Mapper#cleanup for each input instead of once (unit test contributed by Yoni Ben-Meshulam) (Revision 92ad229dafea47db18fa978e809d13261721a806)
Result = SUCCESS
dbeech : https://git-wip-us.apache.org/repos/asf?p=mrunit.git&a=commit&h=92ad229dafea47db18fa978e809d13261721a806
Files :
* src/test/java/org/apache/hadoop/mrunit/mapreduce/StatefulMapper.java
* src/test/java/org/apache/hadoop/mrunit/mapreduce/TestStatefulMapReduce.java
[jira] [Updated] (MRUNIT-165) MapReduceDriver calls Mapper#cleanup
for each input instead of once
Posted by "Yoni Ben-Meshulam (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MRUNIT-165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yoni Ben-Meshulam updated MRUNIT-165:
-------------------------------------
Attachment: (was: reproduce_MRUNIT-165.patch)
[jira] [Updated] (MRUNIT-165) MapReduceDriver calls Mapper#cleanup
for each input instead of once
Posted by "Yoni Ben-Meshulam (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MRUNIT-165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yoni Ben-Meshulam updated MRUNIT-165:
-------------------------------------
Description:
MapReduceDriver calls the Mapper#run method for each input, causing the Mapper#cleanup method to be called multiple times.
I believe this is a bug, since the contract in MapReduce is that, for a single Mapper instance, the Mapper#cleanup method is only called once after all inputs to that mapper have been processed. I might be mistaken in my assumption here.
This would not be an issue, were it not for the fact that MapReduceDriver has only a single instance of Mapper.
One solution might be to pass the Mapper _class_ into the MapReduceDriver and create a new instance for each input. Another solution might be to call the MapDriver with multiple inputs (which AFAIK is not possible).
----
To reproduce, create a MapReduce job with some stateful mapper:
{code}
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class ClosedFormRegressionMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
  public static final Text KEY = new Text("SomeKey");
  // Number of records this mapper instance has seen so far.
  private int someState = 0;

  /**
   * Increment someState for each input.
   *
   * @param context the Hadoop job Map context
   * @throws java.io.IOException
   */
  @Override
  public void map(
      LongWritable key,
      Text value,
      Context context
  ) throws IOException, InterruptedException {
    this.someState += 1;
  }

  /**
   * Runs once after all maps have occurred. Dumps the accumulated state to the output.
   * @param context the Hadoop job Map context
   */
  @Override
  protected void cleanup(Context context) throws IOException, InterruptedException {
    context.write(KEY, new IntWritable(someState));
  }
}
{code}
was:
MapReduceDriver calls the {{run}} method for each input, causing the {{cleanup}} method to be called multiple times.
I believe this is a bug, since the contract in MapReduce is that, for a single Mapper instance, the {{Mapper#cleanup}} method is only called once after all inputs to that mapper have been processed. I might be mistaken in my assumption here.
This would not be an issue, were it not for the fact that MapReduceDriver has only a single instance of Mapper.
One solution might be to pass the Mapper _class_ into the MapReduceDriver and create a new instance for each input. Another solution might be to call the MapDriver with multiple inputs (which AFAIK is not possible).
----
To reproduce, create a MapReduce job with some stateful mapper:
{code}
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class ClosedFormRegressionMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
  public static final Text KEY = new Text("SomeKey");
  // Number of records this mapper instance has seen so far.
  private int someState = 0;

  /**
   * Increment someState for each input.
   *
   * @param context the Hadoop job Map context
   * @throws java.io.IOException
   */
  @Override
  public void map(
      LongWritable key,
      Text value,
      Context context
  ) throws IOException, InterruptedException {
    this.someState += 1;
  }

  /**
   * Runs once after all maps have occurred. Dumps the accumulated state to the output.
   * @param context the Hadoop job Map context
   */
  @Override
  protected void cleanup(Context context) throws IOException, InterruptedException {
    context.write(KEY, new IntWritable(someState));
  }
}
{code}
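The first workaround suggested in the description, giving the driver a way to build mappers instead of one shared instance, might look roughly like the sketch below. It is plain Java with hypothetical names (`CountingMapper`, `runWithFreshMappers`), and a `Supplier` stands in for passing the Mapper class; nothing in this thread indicates that MRUnit's `MapReduceDriver` offers such an option.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

public class FreshMapperPerInputSketch {

    /** Hypothetical stand-in for a stateful Mapper; not a Hadoop class. */
    public static class CountingMapper {
        private int someState = 0;

        void map(String value) {
            someState += 1;
        }

        void cleanup(List<Integer> output) {
            output.add(someState);
        }

        /** Same order as Mapper#run: map every record, then cleanup once. */
        void run(List<String> records, List<Integer> output) {
            for (String record : records) {
                map(record);
            }
            cleanup(output);
        }
    }

    /** Builds a new mapper for every input, so each instance is cleaned up exactly once. */
    public static List<Integer> runWithFreshMappers(
            Supplier<CountingMapper> factory, List<String> inputs) {
        List<Integer> output = new ArrayList<>();
        for (String input : inputs) {
            factory.get().run(Collections.singletonList(input), output);
        }
        return output;
    }

    public static void main(String[] args) {
        // every instance sees one record and emits its own count
        System.out.println(
            runWithFreshMappers(CountingMapper::new, Arrays.asList("a", "b", "c"))); // prints [1, 1, 1]
    }
}
```

With this approach cleanup runs once per instance, but each instance only sees one record, so the per-instance counts still have to be summed downstream (for example in the reducer). That is not the same as a single mapper seeing all inputs and emitting one total.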
[jira] [Commented] (MRUNIT-165) MapReduceDriver calls
Mapper#cleanup for each input instead of once
Posted by "Yoni Ben-Meshulam (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MRUNIT-165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13509810#comment-13509810 ]
Yoni Ben-Meshulam commented on MRUNIT-165:
------------------------------------------
Awesome. Thanks Dave.