Posted to mapreduce-user@hadoop.apache.org by Public Network Services <pu...@gmail.com> on 2013/05/18 02:33:21 UTC

Passing values from InputFormat via the Configuration object

Hi...

I need to communicate some proprietary numeric (long) values from the
getSplits() method of a custom InputFormat class to the Hadoop driver class
(used to launch the job), but the JobContext object passed to the
getSplits() method has no access to a Counters object.

From the source code, it seems that the Configuration object of the
launched job is passed around, so the JobContext object of getSplits() has
direct access to it via getConfiguration().

So, what about using a loop like

        Job job = ... // The launched job
        Configuration conf = job.getConfiguration();
        while (!job.isComplete()) {
            // Read the values from the configuration
        }

from the driver class, which presumably runs in the same framework that
creates the splits?

The getSplits() method of the custom InputFormat would set each of the
values once.

All this does seem like a hack, so I would like some expert advice before
starting implementation. That is,

   1. Will it work?
   2. Is there a better method?

Thanks!
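
The pattern asked about in the message above can be sketched in plain Java.
This is only an illustration under stated assumptions, not Hadoop code:
FakeConfiguration is a stand-in for Hadoop's Configuration (which does offer
setLong(String, long) and getLong(String, long)), computeSplits() stands in
for the custom getSplits(), and the property name is hypothetical. It also
assumes that the split computation and the driver see the same configuration
instance, which is what the question presumes.

```java
import java.util.HashMap;
import java.util.Map;

public class SharedConfSketch {

    /** Stand-in for Hadoop's Configuration: string key/value pairs only. */
    static class FakeConfiguration {
        private final Map<String, String> props = new HashMap<>();

        void setLong(String name, long value) {
            props.put(name, Long.toString(value));
        }

        long getLong(String name, long defaultValue) {
            String raw = props.get(name);
            return raw == null ? defaultValue : Long.parseLong(raw);
        }
    }

    // Hypothetical property name, chosen for this example only.
    static final String TOTAL_RECORDS_KEY = "example.inputformat.total.records";

    /** Stand-in for getSplits(): records one long value, once, in the shared conf. */
    static int computeSplits(FakeConfiguration conf, long[] splitSizes) {
        long total = 0;
        for (long size : splitSizes) {
            total += size;
        }
        conf.setLong(TOTAL_RECORDS_KEY, total);
        return splitSizes.length;
    }

    public static void main(String[] args) {
        // "Driver" side: owns the configuration and reads it back afterwards.
        FakeConfiguration conf = new FakeConfiguration();
        int splits = computeSplits(conf, new long[] {100L, 250L, 75L});
        long total = conf.getLong(TOTAL_RECORDS_KEY, -1L);
        System.out.println(splits + " splits, total = " + total);
    }
}
```

Running main prints "3 splits, total = 425". The default passed to getLong
(-1 here) lets the driver tell "not set yet" apart from a real value.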

Re: Passing values from InputFormat via the Configuration object

Posted by Public Network Services <pu...@gmail.com>.
I tested both options and the "hack" is the only one working. No counter
can be created in the custom InputFormat.

If anyone has any better alternative, please advise.
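
One possible reading of why the second option fails, offered as an assumption
rather than something confirmed in this thread: new Job(conf) builds a
separate, unsubmitted job around its own copy of the configuration, so nothing
done to that object reaches the job the driver launched. The copy semantics
can be sketched with java.util.Properties standing in for the configuration;
copyOf() is a hypothetical helper that models a copying constructor.

```java
import java.util.Properties;

public class CopiedConfSketch {

    /** Models a constructor that copies its configuration argument. */
    static Properties copyOf(Properties original) {
        Properties copy = new Properties();
        copy.putAll(original);
        return copy;
    }

    public static void main(String[] args) {
        // Configuration held by the launched job (stand-in).
        Properties launched = new Properties();
        launched.setProperty("example.key", "1");

        // A second object built from it gets its own copy.
        Properties copy = copyOf(launched);
        copy.setProperty("example.key", "2");

        // The write to the copy is not visible through the original.
        System.out.println(launched.getProperty("example.key"));
    }
}
```

Running main prints "1": the change made through the copy never shows up in
the original.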


On Fri, May 17, 2013 at 5:53 PM, Public Network Services <
publicnetworkservices@gmail.com> wrote:

> A more standard approach could be converting the JobContext parameter of
> the getSplits() method into a Job object, which then allows retrieving the
> counters, e.g.:
>
> public List<InputSplit> getSplits(JobContext job) throws IOException {
>     ...
>     Job work = new Job(job.getConfiguration());
>     Counters counters = work.getCounters();
> }
>
> Would that be correct?
>
>
> On Fri, May 17, 2013 at 5:33 PM, Public Network Services <
> publicnetworkservices@gmail.com> wrote:
>
>> Hi...
>>
>> I need to communicate some proprietary number (long) values from the
>> getSplits() method of a custom InputFormat class to the Hadoop driver class
>> (used to launch the job), but the JobContext object passed to the
>> getSplits() method has no access to a Counters object.
>>
>> From the source code, it seems that the Configuration object of the
>> launched job is passed around, so the JobContext object of getSplits() has
>> direct access to it via getConfiguration().
>>
>> So, what about using a loop like
>>
>>         Job job = ... // The launched job
>>         Configuration conf = job.getConfiguration();
>>         while (!job.isComplete()) {
>>         // Read the values from the configuration
>>         }
>>
>> from the driver class, which presumably runs in the same framework that
>> creates the splits?
>>
>> The getSplits() method of the custom InputFormat would set each of the
>> values once.
>>
>> All this does seem like a hack, so I would like some expert advice before
>> starting implementation. That is,
>>
>>    1. Will it work?
>>    2. Is there a better method?
>>
>> Thanks!
>>
>>
>

Re: Passing values from InputFormat via the Configuration object

Posted by Public Network Services <pu...@gmail.com>.
A more standard approach could be converting the JobContext parameter of
the getSplits() method into a Job object, which then allows retrieving the
counters, e.g.:

public List<InputSplit> getSplits(JobContext job) throws IOException {
    ...
    Job work = new Job(job.getConfiguration());
    Counters counters = work.getCounters();
}

Would that be correct?


On Fri, May 17, 2013 at 5:33 PM, Public Network Services <
publicnetworkservices@gmail.com> wrote:

> Hi...
>
> I need to communicate some proprietary number (long) values from the
> getSplits() method of a custom InputFormat class to the Hadoop driver class
> (used to launch the job), but the JobContext object passed to the
> getSplits() method has no access to a Counters object.
>
> From the source code, it seems that the Configuration object of the
> launched job is passed around, so the JobContext object of getSplits() has
> direct access to it via getConfiguration().
>
> So, what about using a loop like
>
>         Job job = ... // The launched job
>         Configuration conf = job.getConfiguration();
>         while (!job.isComplete()) {
>         // Read the values from the configuration
>         }
>
> from the driver class, which presumably runs in the same framework that
> creates the splits?
>
> The getSplits() method of the custom InputFormat would set each of the
> values once.
>
> All this does seem like a hack, so I would like some expert advice before
> starting implementation. That is,
>
>    1. Will it work?
>    2. Is there a better method?
>
> Thanks!
>
>
