Posted to common-user@hadoop.apache.org by maha <ma...@umail.ucsb.edu> on 2011/02/07 20:21:26 UTC
Mappers reading from a Global inverted Index
Thanks Vijay. Now my question is: how can I build one inverted index and have it ready to be accessed by all Mappers?
I had my main function initialize a global variable declared in the main class as:
public static Hashtable<String,String> hashtable = new Hashtable<String,String>();
Yet the mappers find it null.
Any help is appreciated,
Maha
Depending on the scale of the data, of the two options it would be best to store it in HDFS
and use the built-in InputFormats, as that is more scalable.
If necessary (depending on how the data is stored), build a custom InputFormat
as per the API and set it for the job:
http://hadoop.apache.org/common/docs/r0.20.0/api/org/apache/hadoop/mapred/InputFormat.html
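To make the HDFS option concrete, here is a minimal sketch in plain Java (no Hadoop classes) of the usual pattern: each task loads the shared data itself, once, before it processes any records. LookupTask, setup and map are hypothetical names chosen for illustration; in a real job the Reader would presumably wrap a stream opened from a file in HDFS, and the loading would happen in the mapper's per-task initialization hook (configure(JobConf) in the org.apache.hadoop.mapred API). The tab-separated id/name layout is also an assumption.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a map task; this is not a Hadoop class.
class LookupTask {
    // Each task JVM holds its own copy of the shared, read-only table.
    private final Map<String, String> names = new HashMap<>();

    // Plays the role of the per-task initialization hook:
    // read the side data once, before any record is processed.
    void setup(Reader sideData) {
        try (BufferedReader in = new BufferedReader(sideData)) {
            String line;
            while ((line = in.readLine()) != null) {
                // Assumed layout: id<TAB>name, one entry per line.
                String[] parts = line.split("\t", 2);
                if (parts.length == 2) {
                    names.put(parts[0], parts[1]);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Plays the role of map(): look up one input key in the loaded table.
    String map(String key) {
        return names.getOrDefault(key, "UNKNOWN");
    }

    public static void main(String[] args) {
        LookupTask task = new LookupTask();
        // A StringReader stands in for the file that would live in HDFS.
        task.setup(new StringReader("p1\tAda\np2\tGrace\n"));
        System.out.println(task.map("p1"));
        System.out.println(task.map("p9"));
    }
}
```

With around 1000 mappers this costs one read of the side data per task rather than one per record, which is usually acceptable when the table fits in memory.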
--
Vijay
----- Original Message ----
> From: maha <ma...@umail.ucsb.edu>
> To: common-user <co...@hadoop.apache.org>
> Sent: Sun, February 6, 2011 5:09:38 PM
> Subject: Mapper reading from local directory or global variable?
>
> Hello,
>
> I'm wondering which option is more efficient to store "People's Names" to
> be processed by Mappers.
>
>
> 1. Store it in a global variable declared in the main class?
>
> 2. Store it in the HDFS to be distributed and read in each map.
>
>
> Note that the number of mappers until now is around 1000 mappers. Appreciate
> any thought :)
>
> Thank you,
>
> Maha
Re: Mappers reading from a Global inverted Index
Posted by maha <ma...@umail.ucsb.edu>.
My question is simply how to have a global variable (e.g. a Hashtable) in Hadoop
that is available to all mappers. Please help.
Thank you,
Maha
Re: Mappers reading from a Global inverted Index
Posted by maha <ma...@umail.ucsb.edu>.
Thanks Ted, I needed to know that there is no way I can make my program less IO-intensive.
Maha
On Feb 7, 2011, at 12:04 PM, Ted Dunning wrote:
> That isn't going to happen.
>
> Remember that all of the mappers are running in different JVMs on
> (typically) different machines. They can't see each other.
>
> If you want to collect data into one place, use a reducer.
>
Re: Mappers reading from a Global inverted Index
Posted by Ted Dunning <td...@maprtech.com>.
That isn't going to happen.
Remember that all of the mappers are running in different JVMs on
(typically) different machines. They can't see each other.
If you want to collect data into one place, use a reducer.
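As a rough illustration of the reducer suggestion, the sketch below builds a small inverted index in plain Java by imitating the map, group-by-key and reduce steps in memory. It does not use the Hadoop API; InvertedIndexSketch and its map and reduce methods are hypothetical names, and the tokenization (lowercase, split on non-word characters) is an assumption made only for the example.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// In-memory imitation of a map/reduce job; not Hadoop code.
class InvertedIndexSketch {
    // Map step: emit one (term, docId) pair per word in the document.
    static List<Map.Entry<String, String>> map(String docId, String text) {
        List<Map.Entry<String, String>> out = new ArrayList<>();
        for (String term : text.toLowerCase().split("\\W+")) {
            if (!term.isEmpty()) {
                out.add(new SimpleEntry<>(term, docId));
            }
        }
        return out;
    }

    // Shuffle and reduce steps: group the pairs by term and collect
    // the sorted, de-duplicated document ids for each term.
    static Map<String, SortedSet<String>> reduce(List<Map.Entry<String, String>> pairs) {
        Map<String, SortedSet<String>> index = new TreeMap<>();
        for (Map.Entry<String, String> pair : pairs) {
            index.computeIfAbsent(pair.getKey(), k -> new TreeSet<>()).add(pair.getValue());
        }
        return index;
    }

    public static void main(String[] args) {
        // Each call to map() stands in for an independent mapper.
        List<Map.Entry<String, String>> pairs = new ArrayList<>();
        pairs.addAll(map("doc1", "Ada met Grace"));
        pairs.addAll(map("doc2", "Grace met Alan"));
        System.out.println(reduce(pairs));
    }
}
```

The mappers never share state here: each one only emits pairs, and the index exists in one place only after the grouping step, which is the part a reducer would perform.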