
Posted to common-user@hadoop.apache.org by helena21 <ah...@gmail.com> on 2007/12/28 15:35:05 UTC

how to create collections in the mapper class

Hi Everybody,

I want to create an ArrayList in the mapper class that collects some objects
from the input, so that I can use the collection to filter my input. The
problem is that my ArrayList never holds even one object; its size is always
zero. Please point me to how I can create an ArrayList or other collection
objects. I made it a static object, but the ArrayList still doesn't collect
any objects.

Thanks
Helen

Re: how to create collections in the mapper class

Posted by Ted Dunning <td...@veoh.com>.

I would like to point out that this is a REALLY bad idiom.  You should use a
static initializer.

    private static Map usersMap = new HashMap();

Also, since this is a static field in a very small class, there is very
little reason to use a getter.  No need for 7 lines of code when one will
do.

On 12/31/07 1:17 AM, "helena21" <ah...@gmail.com> wrote:

> private static Map usersMap = null;
>
> public static Map getUsersMap() {
>     if (usersMap == null) {
>         usersMap = new HashMap();
>     }
>     return usersMap;
> }

Re: how to create collections in the mapper class

Posted by Ted Dunning <td...@veoh.com>.

Yes.  This is what I suspected.

I don't know exactly why your map is empty, but it probably has to do with
the fact that the map function is being invoked in more separate mapper
instances than you think.

Regardless of that, this is a very poor design for this problem.  It would
be much better if you were to simply use a map-reduce pass to eliminate
duplicate elements.  The basic idea would be to use the following functions:

   map: <key, value> -> <key, key, value>
   collect and reduce: <key, values> -> <null, first(values)>

This will give you an echo of your original file with all duplicates
removed.  You can then do the processing that you originally planned to do.

I should also point out that if your duplicate records are grouped together
in your input data, then this operation will be very efficient because the
collect function will do most of the duplicate elimination even before your
data is written to disk.
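
As a rough illustration of the two functions above, here is a minimal sketch
in plain Java. It only simulates the map, sort/group, and reduce steps in
memory and uses no Hadoop classes. The class name DedupSketch, the helper
keyOf, and the "id,name" record layout are assumptions made up for this
example; they are not taken from the original job.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory simulation of a duplicate-eliminating map/reduce pass.
// Nothing here is Hadoop API; all names are hypothetical.
class DedupSketch {

    // "map" step: derive the key that defines what counts as a duplicate.
    // Assumption: records look like "id,name" and the id is the key.
    static String keyOf(String record) {
        return record.split(",", 2)[0];
    }

    // Sort/group step: collect all records that share a key.
    // A TreeMap is used to mimic the sorted key order of a real shuffle.
    static Map<String, List<String>> group(List<String> records) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String record : records) {
            groups.computeIfAbsent(keyOf(record), k -> new ArrayList<>()).add(record);
        }
        return groups;
    }

    // "reduce" step: keep only the first value seen for each key.
    static String reduce(String key, List<String> values) {
        return values.get(0);
    }

    // Runs the whole pass and returns one record per distinct key.
    static List<String> dedup(List<String> records) {
        List<String> output = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : group(records).entrySet()) {
            output.add(reduce(e.getKey(), e.getValue()));
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("u2,Bob", "u1,Alice", "u2,Bobby");
        System.out.println(dedup(input));
    }
}
```

The point of the sketch is that no state is kept in the map step at all: the
grouping by key is what brings duplicates together, so the reduce step can
simply keep one value per key.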



Re: how to create collections in the mapper class

Posted by helena21 <ah...@gmail.com>.

Thanks for your response. Just to make my question clear: I want to have a
HashMap, declared as follows:

    public static class MapClass extends MapReduceBase implements Mapper {
        private final static LongWritable ONE = new LongWritable(1);
        private static Map usersMap = null;

        public static Map getUsersMap() {
            if (usersMap == null) {
                usersMap = new HashMap();
            }
            return usersMap;
        }

        ........

        public void map(WritableComparable key, Writable value,
                OutputCollector output, Reporter reporter) throws IOException {

            .......
            // nKey is an Object
            // name is a Text

            if (getUsersMap().get(nKey) == null) {
                output.collect(name, ONE);
                getUsersMap().put(nKey, data[12]);
            }

            ......
        }
    }

The problem is that my HashMap (usersMap) is always empty. I hope my problem
is clear now.

Thanks,

Helen







Re: how to create collections in the mapper class

Posted by Ted Dunning <td...@veoh.com>.

This sounds like there is a little bit of confusion going on here.

People who are starting with Hadoop are often surprised when static fields
of the mapper do not get shared across all parallel instances of the map
function.  This is, of course, because you are running many mappers.
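
To make that concrete, here is a small plain-Java simulation (not Hadoop
code). It assumes that every map task works with its own copy of the static
state, which is modeled here by giving each FilteringTask object its own set.
The names PerTaskStateSketch, FilteringTask, and runJob are hypothetical and
exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simulation only: each FilteringTask stands in for one map task that has
// its own private copy of the "seen" collection.
class PerTaskStateSketch {

    static class FilteringTask {
        // State local to this task; other tasks never see it.
        private final Set<String> seen = new HashSet<>();

        // Emits each key the first time this task encounters it.
        List<String> run(List<String> split) {
            List<String> emitted = new ArrayList<>();
            for (String key : split) {
                if (seen.add(key)) {
                    emitted.add(key);
                }
            }
            return emitted;
        }
    }

    // Runs one fresh task per input split and concatenates the outputs.
    static List<String> runJob(List<List<String>> splits) {
        List<String> all = new ArrayList<>();
        for (List<String> split : splits) {
            all.addAll(new FilteringTask().run(split));
        }
        return all;
    }

    public static void main(String[] args) {
        List<List<String>> splits = Arrays.asList(
                Arrays.asList("u1", "u2", "u1"),
                Arrays.asList("u2", "u3"));
        // Prints [u1, u2, u2, u3]: "u2" survives twice because the second
        // task never saw what the first task had already collected.
        System.out.println(runJob(splits));
    }
}
```

Duplicates are removed only within a single split, so a filter built this way
cannot deduplicate the input as a whole.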

Usually when people say what you are saying, the reason is that they are
trying to do something like removing duplicate elements.  The best way to do
that is to NOT try to put state into the map function, but rather to use the
reduce and sorting functions to do the work.  A good example is trying to
find all of the unique words in a set of documents.  If you just use a
word-counting function, you get what you want (a list of unique words).  If
you want a list of unique words per day, then you simply have to change the
program so that the mapper outputs a key that contains the word and the day
and do the count as before.
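
A minimal sketch of the word-and-day idea, again in plain Java rather than
against the Hadoop API. The class name UniqueWordsPerDaySketch, the
{day, text} input layout, and the tab-separated composite key are all
assumptions chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory word count keyed by (day, word). The distinct keys of the
// result are the unique words per day. All names are hypothetical.
class UniqueWordsPerDaySketch {

    // Builds the composite key a mapper would emit.
    // Assumption: a tab never occurs inside a day or a word.
    static String compositeKey(String day, String word) {
        return day + "\t" + word;
    }

    // Each input element is {day, text}. The loop plays the role of the
    // mapper (emit key, 1) and the merge plays the role of the reducer (sum).
    static Map<String, Integer> count(List<String[]> docs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String[] doc : docs) {
            String day = doc[0];
            for (String word : doc[1].toLowerCase().split("\\s+")) {
                if (word.isEmpty()) {
                    continue; // skip the empty token produced by leading whitespace
                }
                counts.merge(compositeKey(day, word), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String[]> docs = new ArrayList<>();
        docs.add(new String[] {"2007-12-28", "hadoop map hadoop"});
        docs.add(new String[] {"2007-12-29", "map"});
        for (Map.Entry<String, Integer> e : count(docs).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

Dropping the day from the key gives back the plain word count, whose keys
are the unique words across all documents.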

Remember also that your program may contain several map/reduce steps.

Perhaps if you say more about what you are trying to do, it would be easier
to help you.
