Posted to mapreduce-user@hadoop.apache.org by Peter Cogan <pe...@gmail.com> on 2013/02/08 16:15:19 UTC

Passing data via Configuration

Hi,

I have data stored in an object that I want to pass into my Mapper.

I see from Configuration that there are setters and getters for primitives,
but is there a way of doing this with non-primitives - either my own
classes or built-in classes (such as HashMap)?

thanks!
Peter

Re: Passing data via Configuration

Posted by Peter Cogan <pe...@gmail.com>.
Hi Rob,

thanks for the explanation - I had also thought about 'cheating' by
serialising - I guess that's the way to go in my case as the data structure
is really quite small.

thanks!


On Fri, Feb 8, 2013 at 3:23 PM, Robert Evans <ev...@yahoo-inc.com> wrote:

> You could, but this is generally discouraged.  Pig does something like
> this: it serializes the object into a byte array, then base64-encodes the
> bytes into a string that is put in the config.  The problem is that the
> config can grow very large.  In the 1.0 line of Hadoop, the maximum size
> of the job's config is limited to keep the JobTracker from running out of
> memory.  In V2 this is less of a concern, because it is your own
> application master that has to read it all in.
>
> In general, if it is a very small amount of data you can play games like
> this; if it is a large amount of data, you probably want to use the
> distributed cache instead.
>
> --Bobby
>
> From: Peter Cogan <pe...@gmail.com>
> Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
> Date: Friday, February 8, 2013 9:15 AM
> To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
> Subject: Passing data via Configuration
>
> Hi,
>
> I have data stored in an object that I want to pass into my Mapper.
>
> I see from Configuration that there are setters and getters for
> primitives, but is there a way of doing this with non-primitives - either
> my own classes or built-in classes (such as HashMap)?
>
> thanks!
> Peter
>

Re: Passing data via Configuration

Posted by Robert Evans <ev...@yahoo-inc.com>.
You could, but this is generally discouraged.  Pig does something like this: it serializes the object into a byte array, then base64-encodes the bytes into a string that is put in the config.  The problem is that the config can grow very large.  In the 1.0 line of Hadoop, the maximum size of the job's config is limited to keep the JobTracker from running out of memory.  In V2 this is less of a concern, because it is your own application master that has to read it all in.

In general, if it is a very small amount of data you can play games like this; if it is a large amount of data, you probably want to use the distributed cache instead.
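
For a small object, the serialize-then-base64 trick could look roughly like the sketch below. This is only an illustration, not Pig's actual code: the class name ConfigCodec, the methods encode and decode, and the key "my.lookup.table" are all made up for the example, and a plain Map stands in for Hadoop's Configuration so that the snippet runs without Hadoop on the classpath. In a real job the driver would call conf.set(key, encoded) and the Mapper's setup() would call conf.get(key). It assumes Java 8 or later for java.util.Base64 and that the object implements Serializable.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: turns a Serializable object into a string and back.
class ConfigCodec {

    // Java-serialize the object into a byte array, then base64-encode it.
    static String encode(Serializable obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The stream is closed (and flushed) before the bytes are read.
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    // Reverse of encode: base64-decode, then deserialize.
    static Object decode(String encoded) {
        byte[] raw = Base64.getDecoder().decode(encoded);
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        HashMap<String, Integer> lookup = new HashMap<>();
        lookup.put("alpha", 1);
        lookup.put("beta", 2);

        // Stand-in for the job Configuration (assumption: with Hadoop this
        // would be conf.set("my.lookup.table", ...) in the driver).
        Map<String, String> conf = new HashMap<>();
        conf.put("my.lookup.table", encode(lookup));

        // Mapper side (assumption: conf.get("my.lookup.table") in setup()).
        Object restored = decode(conf.get("my.lookup.table"));
        System.out.println(lookup.equals(restored));
    }
}
```

Base64 output is about a third larger than the raw bytes, and every task reads the whole config, so this only makes sense while the object stays small.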

--Bobby

From: Peter Cogan <pe...@gmail.com>
Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
Date: Friday, February 8, 2013 9:15 AM
To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
Subject: Passing data via Configuration

Hi,

I have data stored in an object that I want to pass into my Mapper.

I see from Configuration that there are setters and getters for primitives, but is there a way of doing this with non-primitives - either my own classes or built-in classes (such as HashMap)?

thanks!
Peter
