Posted to common-user@hadoop.apache.org by Tarandeep Singh <ta...@gmail.com> on 2008/04/30 13:19:33 UTC

JobConf: How to pass List/Map

Hi,

How can I set a list or map on a JobConf so that I can access it in
the Mapper/Reducer class?
The get/setObject methods of Configuration have been deprecated, and
the documentation says:
"A side map of Configuration to Object should be used instead."
I could not follow this :(

Can someone please explain to me how to do this?

Thanks,
Taran

Re: JobConf: How to pass List/Map

Posted by Enis Soztutar <en...@gmail.com>.
It is exactly what DefaultStringifier does, ugly but useful *smile*.

Jason Venner wrote:
> We have been serializing to a ByteArrayOutputStream, then
> base64-encoding the underlying byte array and passing that string in
> the conf. It is ugly, but it works well as a stopgap until 0.17.

Re: JobConf: How to pass List/Map

Posted by Jason Venner <ja...@attributor.com>.
We have been serializing to a ByteArrayOutputStream, then
base64-encoding the underlying byte array and passing that string in
the conf. It is ugly, but it works well as a stopgap until 0.17.
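
A minimal sketch of that approach, using only the JDK so it can be run without Hadoop. The class name Base64ConfCodec, the key "my.stop.words" and the HashMap standing in for JobConf are all made up for illustration; with a real JobConf the put/get calls would be conf.set(key, value) and conf.get(key). It also assumes Java 8 or later for java.util.Base64, which did not exist in 2008.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Base64;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Base64ConfCodec {

    /** Serializes obj with Java serialization and returns the bytes as a base64 string. */
    public static String encode(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    /** Reverses encode(): base64 string, then bytes, then the original object. */
    public static Object decode(String encoded) throws IOException, ClassNotFoundException {
        byte[] raw = Base64.getDecoder().decode(encoded);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for JobConf (assumption): a plain String-to-String map.
        Map<String, String> conf = new HashMap<>();

        // Driver side: encode the list and store it under a key.
        ArrayList<String> stopWords = new ArrayList<>(Arrays.asList("a", "an", "the"));
        conf.put("my.stop.words", encode(stopWords));

        // Task side: read the string back and decode it.
        @SuppressWarnings("unchecked")
        List<String> restored = (List<String>) decode(conf.get("my.stop.words"));
        System.out.println(restored);
    }
}
```

In a Mapper or Reducer, the decode step would typically go in configure(JobConf) so the object is rebuilt once per task rather than once per record.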

Enis Soztutar wrote:
> Yes, Stringifier was committed in 0.17. What you can do in 0.16 is
> simulate DefaultStringifier. The key feature of the Stringifier is
> that it can convert any object to a string, and restore it again, by
> base64-encoding the binary (serialized) form of the object. If your
> objects can be easily converted to and from strings, then you can
> store them directly in the conf. The other obvious alternative would
> be to switch to 0.17 once it is out.
-- 
Jason Venner

Re: JobConf: How to pass List/Map

Posted by Enis Soztutar <en...@gmail.com>.
Yes, Stringifier was committed in 0.17. What you can do in 0.16 is
simulate DefaultStringifier. The key feature of the Stringifier is that
it can convert any object to a string, and restore it again, by
base64-encoding the binary (serialized) form of the object. If your
objects can be easily converted to and from strings, then you can store
them directly in the conf. The other obvious alternative would be to
switch to 0.17 once it is out.
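
For the "easily converted to and from strings" case, a rough sketch might look like the following. Everything here is hypothetical: the class name PlainStringConf, the keys, the comma and '=' delimiters, and the java.util.Properties object standing in for JobConf. It only works when keys and values never contain the delimiter characters.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

public class PlainStringConf {

    /** Joins items with commas; items must not contain a comma themselves. */
    public static String listToString(List<String> items) {
        return String.join(",", items);
    }

    /** Splits a comma-joined string back into its items. */
    public static List<String> stringToList(String s) {
        if (s.isEmpty()) {
            return Collections.<String>emptyList();
        }
        return Arrays.asList(s.split(","));
    }

    /** Encodes a map as key=value pairs separated by commas. */
    public static String mapToString(Map<String, Integer> map) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> e : map.entrySet()) {
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    /** Parses the key=value,key=value form produced by mapToString(). */
    public static Map<String, Integer> stringToMap(String s) {
        Map<String, Integer> map = new LinkedHashMap<>();
        if (s.isEmpty()) {
            return map;
        }
        for (String pair : s.split(",")) {
            int eq = pair.indexOf('=');
            map.put(pair.substring(0, eq), Integer.parseInt(pair.substring(eq + 1)));
        }
        return map;
    }

    public static void main(String[] args) {
        // Stand-in for JobConf (assumption): with Hadoop this would be conf.set / conf.get.
        Properties conf = new Properties();

        Map<String, Integer> weights = new LinkedHashMap<>();
        weights.put("title", 3);
        weights.put("body", 1);

        conf.setProperty("my.fields", listToString(Arrays.asList("title", "body")));
        conf.setProperty("my.weights", mapToString(weights));

        System.out.println(stringToList(conf.getProperty("my.fields")));
        System.out.println(stringToMap(conf.getProperty("my.weights")));
    }
}
```

The base64 route is the safer choice once values may contain arbitrary characters, since no delimiter has to be reserved.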

Tarandeep Singh wrote:
> Thanks, but I am using Hadoop 0.16, and Stringifier is a fix for version 0.17:
> https://issues.apache.org/jira/browse/HADOOP-3048
>
> Any thoughts on how to do this in version 0.16?
>
> thanks,
> Taran

Re: JobConf: How to pass List/Map

Posted by Tarandeep Singh <ta...@gmail.com>.
On Wed, Apr 30, 2008 at 5:11 AM, Enis Soztutar <en...@gmail.com> wrote:
> Hi,
>
> There are many ways in which you can pass objects using the configuration.
> Possibly the easiest is to use the Stringifier interface.
>
> For example, you can do:
>
> DefaultStringifier.store(conf, variable, "mykey");
>
> variable = DefaultStringifier.load(conf, "mykey", variableClass);

Thanks, but I am using Hadoop 0.16, and Stringifier is a fix for version 0.17:
https://issues.apache.org/jira/browse/HADOOP-3048

Any thoughts on how to do this in version 0.16?

thanks,
Taran


Re: JobConf: How to pass List/Map

Posted by Enis Soztutar <en...@gmail.com>.
Hi,

There are many ways in which you can pass objects using the
configuration. Possibly the easiest is to use the Stringifier interface.

For example, you can do:

DefaultStringifier.store(conf, variable, "mykey");

variable = DefaultStringifier.load(conf, "mykey", variableClass);

Take into account that the variable you pass to the configuration must
be serializable by the framework. That means it must implement the
Writable or Serializable interface. In your particular case, you might
want to look at the ArrayWritable and MapWritable classes.

That said, you should not pass large objects via the configuration,
since that can seriously increase job overhead. If the data you want to
pass is large, then you should use other alternatives (such as
DistributedCache, HDFS, etc.).
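
To illustrate what "serializable by the framework" asks of a class, here is a JDK-only sketch of a map that follows the same write(DataOutput) / readFields(DataInput) contract as Hadoop's Writable interface, plus store/load helpers that turn it into a base64 string. This is not DefaultStringifier's actual implementation: WritableStyleMap and its store/load helpers are hypothetical names, the class does not implement the real Writable interface (that would need Hadoop on the classpath), and java.util.Base64 assumes Java 8 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;

public class WritableStyleMap {
    private final Map<String, String> entries = new LinkedHashMap<>();

    public void put(String key, String value) { entries.put(key, value); }
    public String get(String key) { return entries.get(key); }
    public int size() { return entries.size(); }

    /** Same shape as Writable.write: entry count first, then each key and value. */
    public void write(DataOutput out) throws IOException {
        out.writeInt(entries.size());
        for (Map.Entry<String, String> e : entries.entrySet()) {
            out.writeUTF(e.getKey());
            out.writeUTF(e.getValue());
        }
    }

    /** Same shape as Writable.readFields: replaces the current contents with what is read. */
    public void readFields(DataInput in) throws IOException {
        entries.clear();
        int count = in.readInt();
        for (int i = 0; i < count; i++) {
            String key = in.readUTF();
            String value = in.readUTF();
            entries.put(key, value);
        }
    }

    /** Serializes the map and base64-encodes the bytes so they fit in a string property. */
    public static String store(WritableStyleMap map) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        map.write(new DataOutputStream(bytes));
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    /** Decodes the base64 string and rebuilds the map from the bytes. */
    public static WritableStyleMap load(String encoded) throws IOException {
        byte[] raw = Base64.getDecoder().decode(encoded);
        WritableStyleMap map = new WritableStyleMap();
        map.readFields(new DataInputStream(new ByteArrayInputStream(raw)));
        return map;
    }

    public static void main(String[] args) throws IOException {
        WritableStyleMap settings = new WritableStyleMap();
        settings.put("threshold", "0.75");
        settings.put("mode", "strict");

        String encoded = store(settings);
        // The encoded length is a quick check on how much the configuration grows.
        System.out.println("encoded length: " + encoded.length());
        System.out.println(load(encoded).get("threshold"));
    }
}
```

Base64 produces four output characters for every three input bytes, so the string stored in the configuration is about a third larger than the serialized object, which is one more reason to keep such values small.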

Tarandeep Singh wrote:
> Hi,
>
> How can I set a list or map on a JobConf so that I can access it in
> the Mapper/Reducer class?
> The get/setObject methods of Configuration have been deprecated, and
> the documentation says:
> "A side map of Configuration to Object should be used instead."
> I could not follow this :(
>
> Can someone please explain to me how to do this?
>
> Thanks,
> Taran
>
>