Posted to common-user@hadoop.apache.org by unmesha sreeveni <un...@gmail.com> on 2015/02/28 06:24:27 UTC

cleanup() in hadoop results in aggregation of whole file/not

I have an input file whose last column is a class label:
7.4 0.29 0.5 1.8 0.042 35 127 0.9937 3.45 0.5 10.2 7 1
10 0.41 0.45 6.2 0.071 6 14 0.99702 3.21 0.49 11.8 7 -1
7.8 0.26 0.27 1.9 0.051 52 195 0.9928 3.23 0.5 10.9 6 1
6.9 0.32 0.3 1.8 0.036 28 117 0.99269 3.24 0.48 11 6 1
...................
I am trying to get the unique class labels of the whole file. To do that, I am
using the code below.

public class MyMapper extends Mapper<LongWritable, Text, IntWritable, FourvalueWritable> {
    Set<String> uniqueLabel = new HashSet<String>();

    public void map(LongWritable key, Text value, Context context) {
        // Last column of input is the class label.
        String line = value.toString();
        Vector<String> cls = CustomParam.customLabel(line, delimiter, classindex);
        uniqueLabel.add(cls.get(0));
    }

    public void cleanup(Context context) throws IOException {
        // Find the min and max label among the labels this task has seen.
        long minLabel = Long.MAX_VALUE;
        long maxLabel = Long.MIN_VALUE;
        for (String label : uniqueLabel) {
            minLabel = Math.min(minLabel, Long.parseLong(label));
            maxLabel = Math.max(maxLabel, Long.parseLong(label));
        }
        context.getCounter(UpdateCost.MINLABEL).setValue(minLabel);
        context.getCounter(UpdateCost.MAXLABEL).setValue(maxLabel);
    }
}
cleanup() is executed only once.

Is the set "uniqueLabel" updated after each map() call? I hope the set is
updated on every call to map(), and that I can therefore get the unique labels
of the whole file in cleanup().
Please correct me if I am wrong.

Thanks in advance.

Re: cleanup() in hadoop results in aggregation of whole file/not

Posted by Ulul <ha...@ulul.org>.
Edit: instead of buffering in a HashSet and then emitting at cleanup, you can
use a combiner. It is likely slower, but easier to code if speed is not your
main concern.
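
For reference, a minimal sketch of that combiner approach might look like this
(class names and the whitespace delimiter are illustrative assumptions, not
from the original code): the mapper emits each label as a key, and the same
class is registered as both combiner and reducer, so duplicate labels collapse
inside each map task before the shuffle.

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Mapper: emit each row's label (assumed to be the last whitespace-separated
// column) as a key; no per-task buffering needed.
class LabelMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().trim().split("\\s+");
        context.write(new Text(fields[fields.length - 1]), NullWritable.get());
    }
}

// Registered as both combiner and reducer: because the label is the key,
// each reduce() call sees one distinct label, so writing it once deduplicates.
class UniqueLabelReducer extends Reducer<Text, NullWritable, Text, NullWritable> {
    @Override
    protected void reduce(Text label, Iterable<NullWritable> ignored, Context context)
            throws IOException, InterruptedException {
        context.write(label, NullWritable.get());
    }
}

// Driver wiring:
// job.setCombinerClass(UniqueLabelReducer.class);
// job.setReducerClass(UniqueLabelReducer.class);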

On 01/03/2015 13:41, Ulul wrote:
> Hi
>
> I probably misunderstood your question, because my impression is that
> this is typically a job for a reducer. Emit "local" min and max under two
> keys from each mapper and you will easily get the global min and max in the reducer.
>
> Ulul
> On 28/02/2015 14:10, Shahab Yunus wrote:
>> As far as I understand, cleanup() is called once per task, in your case
>> per map task. To get an overall count or measure, you need to
>> aggregate it yourself after the job is done.
>>
>> One way to do that is to use counters and then merge them 
>> programmatically at the end of the job.
>>
>> Regards,
>> Shahab



Re: cleanup() in hadoop results in aggregation of whole file/not

Posted by Ulul <ha...@ulul.org>.
Hi

I probably misunderstood your question, because my impression is that
this is typically a job for a reducer. Emit "local" min and max under two
keys from each mapper and you will easily get the global min and max in the reducer.
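
A minimal sketch of that pattern, with illustrative class names and assuming
whitespace-delimited rows with the label in the last column (both assumptions,
not from the original code):

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Mapper: tracks the local min and max label and emits both once,
// from cleanup(), under the fixed keys "min" and "max".
class MinMaxMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    private long min = Long.MAX_VALUE;
    private long max = Long.MIN_VALUE;

    @Override
    protected void map(LongWritable key, Text value, Context context) {
        // Assumes whitespace-delimited rows with the label in the last column.
        String[] fields = value.toString().trim().split("\\s+");
        long label = Long.parseLong(fields[fields.length - 1]);
        min = Math.min(min, label);
        max = Math.max(max, label);
    }

    @Override
    protected void cleanup(Context context)
            throws IOException, InterruptedException {
        if (min <= max) { // skip empty splits
            context.write(new Text("min"), new LongWritable(min));
            context.write(new Text("max"), new LongWritable(max));
        }
    }
}

// Reducer: folds the per-mapper candidates into the global min and max.
class MinMaxReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException {
        boolean wantMin = key.toString().equals("min");
        long result = wantMin ? Long.MAX_VALUE : Long.MIN_VALUE;
        for (LongWritable v : values) {
            result = wantMin ? Math.min(result, v.get())
                             : Math.max(result, v.get());
        }
        context.write(key, new LongWritable(result));
    }
}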

Ulul
On 28/02/2015 14:10, Shahab Yunus wrote:
> As far as I understand, cleanup() is called once per task, in your case
> per map task. To get an overall count or measure, you need to
> aggregate it yourself after the job is done.
>
> One way to do that is to use counters and then merge them 
> programmatically at the end of the job.
>
> Regards,
> Shahab



Re: cleanup() in hadoop results in aggregation of whole file/not

Posted by Shahab Yunus <sh...@gmail.com>.
As far as I understand, cleanup() is called once per task, in your case
per map task. To get an overall count or measure, you need to aggregate
it yourself after the job is done.

One way to do that is to use counters and then merge them programmatically
at the end of the job.
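
A hedged driver-side sketch of reading the counters back after the job
finishes (it reuses the UpdateCost enum from the original post; the class and
method names are otherwise hypothetical). One caveat: the framework aggregates
same-named counters across tasks by summing them, so setValue() calls from
several map tasks will not compose into a true min or max, which is one reason
the reducer approach suggested above tends to fit better.

import org.apache.hadoop.mapreduce.Job;

public class Driver {
    // After the job completes, read the counters back and combine them.
    // Caveat: Hadoop sums same-named counters across tasks, so setValue()
    // from several map tasks does not give a true min or max.
    public static void printMinMax(Job job) throws Exception {
        if (job.waitForCompletion(true)) {
            long min = job.getCounters()
                          .findCounter(UpdateCost.MINLABEL).getValue();
            long max = job.getCounters()
                          .findCounter(UpdateCost.MAXLABEL).getValue();
            System.out.println("min label: " + min + ", max label: " + max);
        }
    }
}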

Regards,
Shahab

