Posted to common-user@hadoop.apache.org by Huy Pham <ph...@yahoo-inc.com> on 2013/07/25 20:46:54 UTC

Is it safe to have static methods in Hadoop Framework

Hi All,
   I am writing a class (called Parser) with a couple of static methods, because I don't want millions of instances of this class to be created during the run.
   However, I realized that Hadoop will eventually run jobs in parallel, and if all jobs call static methods of this Parser class, would that be safe?
   In other words, will all Hadoop jobs share the same Parser class, or will each of them have its own? In the former case, if all jobs share the same class and I make the methods synchronized, then the jobs would have to wait for the locks on those methods to be released, which would hurt performance. In the latter case, there would be no problem.
Can someone provide some insights?
Thanks
Huy
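
A minimal sketch of the kind of utility class being described — a stateless Parser whose static methods read only their arguments — might look like the following. The class name, method name, and tab-splitting logic are all illustrative, not taken from the thread:

```java
// Illustrative sketch: a stateless utility class with static methods.
// Because there are no instance fields and no static mutable fields, any
// number of callers (threads or separate task JVMs) can use it safely
// without any locking.
public final class Parser {

    private Parser() {} // utility class: prevent instantiation

    // Reads only its argument and returns a fresh array: a pure function.
    public static String[] parse(String line) {
        return line.split("\t");
    }
}
```

As long as the methods touch no shared mutable state like this, no synchronized keyword is needed regardless of how Hadoop distributes the work.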


Re: Is it safe to have static methods in Hadoop Framework

Posted by Ashish Umrani <as...@gmail.com>.
Huy,

As I understand it, each mapper or reducer actually runs in its own JVM. So
if your class is used by a mapper or reducer, one copy of it will be loaded
for every mapper or reducer JVM. Since you have made those methods static,
only one such copy exists within each JVM.

I believe that unless you have threading logic inside your mapper or
reducer, you do not need to make the methods synchronized either.

Regards
ashish
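
The per-JVM scope of a static field can be shown with plain Java (Hadoop classes omitted; all names here are made up for illustration): threads inside one JVM share the field, while a separate task JVM would start with its own independent copy.

```java
// Illustration: a static field is one-per-JVM, shared by all threads in
// that JVM. A second JVM (e.g. another Hadoop task process) would begin
// with its own callCount of 0 — static state is never shared across JVMs.
public class StaticScopeDemo {
    static int callCount = 0; // one copy per JVM, not per object

    // synchronized matters only if several threads run in the SAME JVM
    static synchronized void recordCall() { callCount++; }

    public static void main(String[] args) throws InterruptedException {
        Runnable work = () -> { for (int i = 0; i < 1000; i++) recordCall(); };
        Thread t1 = new Thread(work);
        Thread t2 = new Thread(work);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(callCount); // prints 2000: both threads shared the field
    }
}
```

In the common single-threaded mapper case there is only one thread per task JVM, so even this lock is unnecessary.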



Re: Is it safe to have static methods in Hadoop Framework

Posted by Shahab Yunus <sh...@gmail.com>.
If each job (that is, each of its child tasks) runs in its own JVM, then
this should not be a problem.

Regards,
Shahab



Re: Is it safe to have static methods in Hadoop Framework

Posted by Serega Sheypak <se...@gmail.com>.
Hadoop encourages you to write pure functions; your code shouldn't depend
on any shared state. It takes X and returns Y.
There should be no problem with static methods.
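
The "takes X and returns Y" style above can be sketched as a plain static method with no state at all (the class and method names are hypothetical, chosen only for this example):

```java
// Hypothetical example of a pure static function: the result depends only
// on the argument, so concurrent callers cannot interfere with each other.
public final class WordCounter {

    private WordCounter() {} // utility class: prevent instantiation

    // Counts whitespace-separated tokens; no fields are read or written.
    public static int countWords(String text) {
        if (text == null || text.trim().isEmpty()) return 0;
        return text.trim().split("\\s+").length;
    }
}
```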

