Posted to common-user@hadoop.apache.org by maha <ma...@umail.ucsb.edu> on 2011/02/07 20:21:26 UTC

Mappers reading from a Global inverted Index

Thanks Vijay. Now my question is: how can I build one inverted index and have it ready to be accessed by all mappers?

I had my main function initialize a global variable declared in the main class as:

  public static Hashtable<String,String> hashtable = new Hashtable<String,String>();

Yet the mappers find it null.

Any help is appreciated,


Maha

Depending on the scale of the data, of the two options it would be best to
store it in HDFS and use the built-in InputFormats, as that is more scalable.

If necessary (depending on how the data is stored), build a custom InputFormat
as per the API and set it for the job:
http://hadoop.apache.org/common/docs/r0.20.0/api/org/apache/hadoop/mapred/InputFormat.html



--
 Vijay



----- Original Message ----
> From: maha <ma...@umail.ucsb.edu>
> To: common-user <co...@hadoop.apache.org>
> Sent: Sun, February 6, 2011 5:09:38 PM
> Subject: Mapper reading from local directory or global variable?
> 
> Hello,
> 
> I'm wondering which option is more efficient for storing "People's Names" to
> be processed by mappers:
>
> 1. Store it in a global variable declared in the main class?
>
> 2. Store it in HDFS, to be distributed and read in each map.
>
> Note that the number of mappers so far is around 1000. I'd appreciate
> any thoughts :)
>
> Thank you,
> 
> Maha

Re: Mappers reading from a Global inverted Index

Posted by maha <ma...@umail.ucsb.edu>.
My question is simply how to have a global variable (e.g. a Hashtable) in Hadoop that is available to all mappers. Please help.
 
Thank you,

 Maha



Re: Mappers reading from a Global inverted Index

Posted by maha <ma...@umail.ucsb.edu>.
Thanks Ted, I needed to know that there is no way I can make my program less IO-intensive.

Maha

On Feb 7, 2011, at 12:04 PM, Ted Dunning wrote:

> That isn't going to happen.
> 
> Remember that all of the mappers are running in different JVMs on
> (typically) different machines.  They can't see each other.
> 
> If you want to collect data into one place, use a reducer.
> 


Re: Mappers reading from a Global inverted Index

Posted by Ted Dunning <td...@maprtech.com>.
That isn't going to happen.

Remember that all of the mappers are running in different JVMs on
(typically) different machines.  They can't see each other.
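
To make that concrete: a static field filled in by the driver's main() exists
only in the JVM that submits the job, so each map task has to rebuild the table
itself, usually once at task start, from a file it can read. The following is a
minimal sketch in plain Java, not Hadoop code. The class name SideDataLoader,
the load method, and the tab-separated "key<TAB>value" file layout are all
assumptions made for illustration. In a real job the Reader would presumably
wrap a file stored in HDFS or one shipped to the task (for example through the
DistributedCache), and load would be called from the mapper's one-time
initialization method (configure in the old org.apache.hadoop.mapred API, setup
in the newer one) rather than from map().

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: rebuilds a read-only lookup table inside each task.
final class SideDataLoader {

    // Parses "key<TAB>value" lines into a map; lines without a key are skipped.
    // The file layout is an assumption for this sketch.
    static Map<String, String> load(Reader source) {
        Map<String, String> table = new HashMap<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                int tab = line.indexOf('\t');
                if (tab <= 0) {
                    continue; // no tab, or empty key
                }
                table.put(line.substring(0, tab), line.substring(tab + 1));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return table;
    }

    public static void main(String[] args) {
        // Stand-in for the side file; a real task would open the actual file.
        String sample = "alice\tdoc1,doc7\nbob\tdoc3\n";
        Map<String, String> table = load(new StringReader(sample));
        System.out.println(table.get("alice"));
    }
}
```

Every task pays the cost of reading the file once, which is the I/O that a
shared in-memory variable would have avoided if one were possible.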

If you want to collect data into one place, use a reducer.
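
As a rough illustration of the reducer suggestion, here is a sketch of how an
inverted index falls out of the map/reduce shape: the map step emits one
(term, docId) pair per token, and the reduce step gathers all doc ids that
share a term. This is plain Java that only simulates the two phases in a single
process. The class InvertedIndexSketch, its method names, and the
whitespace tokenization are assumptions for illustration, not part of Hadoop.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical single-process simulation of map -> group-by-key -> reduce.
final class InvertedIndexSketch {

    // "Map" step: emit one {term, docId} pair per whitespace-separated token.
    static List<String[]> map(String docId, String text) {
        List<String[]> pairs = new ArrayList<>();
        for (String token : text.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                pairs.add(new String[] {token, docId});
            }
        }
        return pairs;
    }

    // "Shuffle and reduce" step: group pairs by term, keep distinct doc ids.
    static SortedMap<String, SortedSet<String>> reduce(List<String[]> pairs) {
        SortedMap<String, SortedSet<String>> index = new TreeMap<>();
        for (String[] pair : pairs) {
            index.computeIfAbsent(pair[0], k -> new TreeSet<>()).add(pair[1]);
        }
        return index;
    }

    // Runs the map step over every document, then reduces all emitted pairs.
    static SortedMap<String, SortedSet<String>> build(Map<String, String> docs) {
        List<String[]> emitted = new ArrayList<>();
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            emitted.addAll(map(doc.getKey(), doc.getValue()));
        }
        return reduce(emitted);
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("doc1", "Alice met Bob");
        docs.put("doc2", "Bob left");
        System.out.println(build(docs));
    }
}
```

In an actual job the grouping is done by the framework between the map and
reduce phases, and the reducer output would be written to HDFS, where a later
job could read it.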
