Posted to mapreduce-user@hadoop.apache.org by Ulul <ha...@ulul.org> on 2015/03/01 13:41:41 UTC
Re: cleanup() in hadoop results in aggregation of whole file/not
Hi
I probably misunderstood your question, because my impression is that
this is typically a job for a reducer. Emit "local" min and max with two
keys from each mapper and you will easily get the global min and max in the reducer.
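For example, here is a minimal sketch of that reducer-side approach (the class names, the column parsing and the "min"/"max" key strings are illustrative assumptions, not code from your job):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class MinMaxJob {

    public static class MinMaxMapper
            extends Mapper<LongWritable, Text, Text, LongWritable> {
        private long min = Long.MAX_VALUE;
        private long max = Long.MIN_VALUE;

        @Override
        public void map(LongWritable key, Text value, Context context) {
            // Assume the class label is the last whitespace-separated column.
            String[] cols = value.toString().trim().split("\\s+");
            long label = Long.parseLong(cols[cols.length - 1]);
            min = Math.min(min, label);
            max = Math.max(max, label);
        }

        @Override
        public void cleanup(Context context)
                throws IOException, InterruptedException {
            // Emit this task's "local" min and max under two fixed keys.
            context.write(new Text("min"), new LongWritable(min));
            context.write(new Text("max"), new LongWritable(max));
        }
    }

    public static class MinMaxReducer
            extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        public void reduce(Text key, Iterable<LongWritable> values, Context context)
                throws IOException, InterruptedException {
            boolean isMin = key.toString().equals("min");
            long result = isMin ? Long.MAX_VALUE : Long.MIN_VALUE;
            for (LongWritable v : values) {
                result = isMin ? Math.min(result, v.get()) : Math.max(result, v.get());
            }
            // One output record per key: the global min and the global max.
            context.write(key, new LongWritable(result));
        }
    }
}

With only two keys (and one reducer by default), the reduce side sees every task's local values and trivially computes the global ones.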
Ulul
On 28/02/2015 14:10, Shahab Yunus wrote:
> As far as I understand, cleanup is called once per task, i.e. in your
> case once per map task. To get an overall count or measure, you need to
> aggregate it yourself after the job is done.
>
> One way to do that is to use counters and then merge them
> programmatically at the end of the job.
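> For instance, a minimal sketch of that counter route, assuming one dynamic
> counter per label (the "CLASS_LABELS" group name is illustrative, and
> counters only suit a small number of distinct labels):
>
>     // Mapper side: bump a counter named after each label seen.
>     context.getCounter("CLASS_LABELS", label).increment(1);
>
>     // Driver side, after job.waitForCompletion(true) returns,
>     // the framework has already merged the per-task counters.
>     for (Counter c : job.getCounters().getGroup("CLASS_LABELS")) {
>         System.out.println(c.getName() + " occurred " + c.getValue() + " times");
>     }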
>
> Regards,
> Shahab
>
> On Saturday, February 28, 2015, unmesha sreeveni
> <unmeshabiju@gmail.com <ma...@gmail.com>> wrote:
>
>
> I have an input file whose last column is the class label:
> 7.4 0.29 0.5 1.8 0.042 35 127 0.9937 3.45 0.5 10.2 7 1
> 10 0.41 0.45 6.2 0.071 6 14 0.99702 3.21 0.49 11.8 7 -1
> 7.8 0.26 0.27 1.9 0.051 52 195 0.9928 3.23 0.5 10.9 6 1
> 6.9 0.32 0.3 1.8 0.036 28 117 0.99269 3.24 0.48 11 6 1
> ...................
> I am trying to get the unique class labels of the whole file.
> In order to do that, I am using the code below.
>
> public class MyMapper extends Mapper<LongWritable, Text,
>         IntWritable, FourvalueWritable> {
>
>     Set<String> uniqueLabel = new HashSet<String>();
>
>     public void map(LongWritable key, Text value, Context context) {
>         // Last column of the input is the class label.
>         Vector<String> cls = CustomParam.customLabel(value.toString(),
>                 delimiter, classindex);
>         uniqueLabel.add(cls.get(0));
>     }
>
>     public void cleanup(Context context) throws IOException {
>         // Find the min and max label among the labels this map task has seen.
>         long minLabel = Long.MAX_VALUE, maxLabel = Long.MIN_VALUE;
>         for (String label : uniqueLabel) {
>             minLabel = Math.min(minLabel, Long.parseLong(label));
>             maxLabel = Math.max(maxLabel, Long.parseLong(label));
>         }
>         context.getCounter(UpdateCost.MINLABEL).setValue(minLabel);
>         context.getCounter(UpdateCost.MAXLABEL).setValue(maxLabel);
>     }
> }
> cleanup() is executed only once.
>
> And after each map() call, does the set declared as "Set<String>
> uniqueLabel = new HashSet<>();" get updated? I hope the set is updated on
> every map() call, so that I am able to get the unique labels of the whole
> file in cleanup(). Please suggest if I am wrong.
>
> Thanks in advance.
>
>
Re: cleanup() in hadoop results in aggregation of whole file/not
Posted by Ulul <ha...@ulul.org>.
Edit: instead of buffering in a HashSet and then emitting at cleanup, you can
use a combiner. Likely slower, but easier to code if speed is not your
main concern.
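For example, a minimal sketch of that combiner route, where the mapper simply emits the label of every record and the same class serves as combiner and reducer to collapse duplicates (class names and the column parsing are illustrative assumptions):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class UniqueLabelJob {

    public static class LabelMapper
            extends Mapper<LongWritable, Text, Text, NullWritable> {
        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Emit the last column (the class label) for every record; no per-task buffering.
            String[] cols = value.toString().trim().split("\\s+");
            context.write(new Text(cols[cols.length - 1]), NullWritable.get());
        }
    }

    // Used both as combiner and reducer: writes each distinct label once.
    public static class LabelReducer
            extends Reducer<Text, NullWritable, Text, NullWritable> {
        @Override
        public void reduce(Text key, Iterable<NullWritable> values, Context context)
                throws IOException, InterruptedException {
            context.write(key, NullWritable.get());
        }
    }
}

// In the driver:
//   job.setCombinerClass(UniqueLabelJob.LabelReducer.class);
//   job.setReducerClass(UniqueLabelJob.LabelReducer.class);

The combiner only deduplicates within each map task's output; the reducer still does the global deduplication, so the final output is the set of unique labels of the whole input.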