Posted to mapreduce-user@hadoop.apache.org by Geoffry Roberts <ge...@gmail.com> on 2011/05/03 19:21:50 UTC

Three Questions

All,

I have three questions that I would appreciate anyone weighing in on.  I
apologise in advance if I sound whiny.

1.  The namenode logs, when I view them from a browser, are displayed with
the lines wrapped upon each other as if there were no new line characters
('\n') in the output.  I access these files using the dfshealth.jsp thing
that comes in the distribution.  Is this intentional, and can it be fixed?
If I use a browser to look at any other log4j log file, I don't get this.

2.  In my own MR jobs, I place log statements.  The log level in
$HADOOP_HOME/conf/log4j.properties is set to INFO.  My log statements are
set to INFO, but I get nothing in the user logs, which are a bugger to read
(see question 1).  Am I missing something?

3.  I am attempting to use
org.apache.hadoop.mapreduce.lib.output.MultipleOutputs:
      multipleOutputsObject.write(new Text("some text"), value, "my_file_name_here");
I am not getting any output files with the names I specify.  Instead I get
the part* file names we all know and love so well.  I've looked at the
source code for MultipleOutputs and found nothing obvious, but since the
logging is not working (see question 2), need I go on?  Is anyone else
having either trouble or success with multiple outputs using the
aforementioned class?

Thanks
-- 
Geoffry Roberts

Re: Three Questions

Posted by Geoffry Roberts <ge...@gmail.com>.
David and All,

Thanks again for helping.

I found my problem and I'll post it here.

I use Eclipse as my IDE.  When I set up my reduce class, I of course
extended Reducer.  Then I used the Eclipse Source > Override/Implement
Methods... command.  This brings up a dialog that lists the methods that are
available for stubbing in.  I clicked reduce and the reduce() method was
stubbed in thus:

@Override
protected void reduce(Text arg0, Iterable<Text> arg1,
        // This parameter is the problem.
        org.apache.hadoop.mapreduce.Reducer.Context arg2)
        throws IOException, InterruptedException {
    // TODO Auto-generated method stub
    super.reduce(arg0, arg1, arg2);
}

The correct parameter type is Context, not the raw Reducer.Context.
Eclipse will red-line its own mistake as a compile error, then recommend
removing the @Override annotation.
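
For anyone following along, here is the corrected stub, assuming a
Reducer<Text, Text, Text, Text> subclass; the identity body is just a
placeholder:

@Override
protected void reduce(Text key, Iterable<Text> values, Context context)
        throws IOException, InterruptedException {
    // Context here resolves to the inherited, fully parameterized
    // Reducer<Text, Text, Text, Text>.Context, so @Override compiles.
    for (Text value : values) {
        context.write(key, value);
    }
}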

I did puzzle over this and realized it must be some kind of bug, but did not
catch on to the fact that I was using the wrong Context.

Hope this helps somebody.

On 3 May 2011 15:08, David Rosenstrauch <da...@darose.net> wrote:

> On 05/03/2011 05:49 PM, Geoffry Roberts wrote:
>
>> David,
>>
>> Thanks for the response.
>>
>> Last thing first:
>>
>> I am using org.apache.hadoop.mapreduce.lib.output.MultipleOutputs
>>
>> which differs from what your link points to,
>> org.apache.hadoop.mapred.lib.MultipleOutputs. Using the class you propose
>> requires me to use a number of other classes from the same package.  These
>> used to be deprecated, but apparently are not any more.
>>
>> Question: Does my package even work?  Must I use the other?
>>
>
> My apologies, I sent you the wrong link.  Should be this:
>
>
> http://hadoop.apache.org/mapreduce/docs/r0.21.0/api/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.html#addNamedOutput%28org.apache.hadoop.mapreduce.Job,%20java.lang.String,%20java.lang.Class,%20java.lang.Class,%20java.lang.Class%29
>
> Point remains the same though:  you need to call
> MultipleOutputs.addNamedOutput() to configure the named output before you
> can start writing to it.
>
>
>  So far as the logging goes, I didn't quite follow your response.  You say
>> "You can find an individual map or reduce task's logs here:"  but there is
>> no link.
>>
>> I am familiar with the drill down that starts by clicking NameNode
>> Logs/userlogs/job*/attempt*r*/stdout.  Are you recommending something
>> different?
>>
>
> I am.  I wasn't recommending a specific link, but rather for you to click
> on links in one of your own M/R jobs in your Hadoop GUI.  I.e.:
>
> * go to http://<your job tracker>:50030
> * click on a job
>
> * click on (e.g.) the word "reduce" in the UI, which brings you to the "All
> Tasks" page
> ...
> etc.
>
>
>  btw,
>>
>> In my Reduce class, I have a System.out statement in the setup() method
>> that works (i.e., I get output), but similar statements in the reduce()
>> method yield nada.
>>
>
> System.out.println won't output to the job log like I'm describing above.
>  (Plus you'll have no control over the logging level.)  Using the logging
> framework is much preferred.
>
> HTH,
>
> DR
>



-- 
Geoffry Roberts

Re: Three Questions

Posted by David Rosenstrauch <da...@darose.net>.
On 05/03/2011 05:49 PM, Geoffry Roberts wrote:
> David,
>
> Thanks for the response.
>
> Last thing first:
>
> I am using org.apache.hadoop.mapreduce.lib.output.MultipleOutputs
>
> which differs from what your link points to,
> org.apache.hadoop.mapred.lib.MultipleOutputs. Using the class you propose
> requires me to use a number of other classes from the same package.  These
> used to be deprecated, but apparently are not any more.
>
> Question: Does my package even work?  Must I use the other?

My apologies, I sent you the wrong link.  Should be this:

http://hadoop.apache.org/mapreduce/docs/r0.21.0/api/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.html#addNamedOutput%28org.apache.hadoop.mapreduce.Job,%20java.lang.String,%20java.lang.Class,%20java.lang.Class,%20java.lang.Class%29

Point remains the same though:  you need to call 
MultipleOutputs.addNamedOutput() to configure the named output before 
you can start writing to it.
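
E.g., a minimal sketch against the new mapreduce API; the output name
"textout" and the Text key/value types are just placeholders:

// In the driver, before submitting the job:
Job job = new Job(conf, "multiple outputs example");
MultipleOutputs.addNamedOutput(job, "textout",
        TextOutputFormat.class, Text.class, Text.class);

// In the reducer, create the helper in setup() and close it in
// cleanup() so the named output files get flushed:
private MultipleOutputs<Text, Text> mos;

@Override
protected void setup(Context context) {
    mos = new MultipleOutputs<Text, Text>(context);
}

@Override
protected void reduce(Text key, Iterable<Text> values, Context context)
        throws IOException, InterruptedException {
    for (Text value : values) {
        mos.write("textout", key, value);
    }
}

@Override
protected void cleanup(Context context)
        throws IOException, InterruptedException {
    mos.close();
}

Note that even with the named output registered, the files come out as
textout-r-00000 and so on, with the part number still appended.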

> So far as the logging goes, I didn't quite follow your response.  You say
> "You can find an individual map or reduce task's logs here:"  but there is
> no link.
>
> I am familiar with the drill down that starts by clicking NameNode
> Logs/userlogs/job*/attempt*r*/stdout.  Are you recommending something
> different?

I am.  I wasn't recommending a specific link, but rather for you to 
click on links in one of your own M/R jobs in your Hadoop GUI.  I.e.:

* go to http://<your job tracker>:50030
* click on a job
* click on (e.g.) the word "reduce" in the UI, which brings you to the 
"All Tasks" page
...
etc.

> btw,
>
> In my Reduce class, I have a System.out statement in the setup() method
> that works (i.e., I get output), but similar statements in the reduce()
> method yield nada.

System.out.println won't output to the job log like I'm describing 
above.  (Plus you'll have no control over the logging level.)  Using the 
logging framework is much preferred.
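
For example, a minimal sketch using Commons Logging (which Hadoop itself
uses); the class name and log message are just placeholders:

import java.io.IOException;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MyReducer extends Reducer<Text, Text, Text, Text> {
    private static final Log LOG = LogFactory.getLog(MyReducer.class);

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // This lands in the task's syslog at INFO level, viewable
        // through the task links described above.
        LOG.info("reducing key: " + key);
        for (Text value : values) {
            context.write(key, value);
        }
    }
}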

HTH,

DR

Re: Three Questions

Posted by Geoffry Roberts <ge...@gmail.com>.
David,

Thanks for the response.

Last thing first:

I am using org.apache.hadoop.mapreduce.lib.output.MultipleOutputs

which differs from what your link points to,
org.apache.hadoop.mapred.lib.MultipleOutputs. Using the class you propose
requires me to use a number of other classes from the same package.  These
used to be deprecated, but apparently are not any more.

Question: Does my package even work?  Must I use the other?

So far as the logging goes, I didn't quite follow your response.  You say
"You can find an individual map or reduce task's logs here:"  but there is
no link.

I am familiar with the drill down that starts by clicking NameNode
Logs/userlogs/job*/attempt*r*/stdout.  Are you recommending something
different?

btw,

In my Reduce class, I have a System.out statement in the setup() method
that works (i.e., I get output), but similar statements in the reduce()
method yield nada.

On 3 May 2011 13:39, David Rosenstrauch <da...@darose.net> wrote:

> On 05/03/2011 01:21 PM, Geoffry Roberts wrote:
>
>> All,
>>
>> I have three questions that I would appreciate anyone weighing in on.  I
>> apologise in advance if I sound whiny.
>>
>> 1.  The namenode logs, when I view them from a browser, are displayed with
>> the lines wrapped upon each other as if there were no new line characters
>> ('\n') in the output.  I access these files using the dfshealth.jsp thing
>> that comes in the distribution.  Is this intentional, and can it be fixed?
>> If I use a browser to look at any other log4j log file, I don't get this.
>>
>> 2.  In my own MR jobs, I place log statements.  The log level in
>> $HADOOP_HOME/conf/log4j.properties is set to INFO.  My log statements are
>> set to INFO, but I get nothing in the user logs, which are a bugger to
>> read
>> (see question 1).  Am I missing something?
>>
>
> You can find an individual map or reduce task's logs here:
>
> * click on (e.g.) the word "reduce" in the UI, which brings you to the "All
> Tasks" page
> * click on a given task ID (e.g., task_201105030249_0004_r_000000)
> * In the "Task logs" column, click on "All"
>
>
>  3.  I am attempting to use
>> org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.
>>       multipleOutputsObject.write(new Text("some text"), value,
>> "my_file_name_here");
>> I am not getting any output files with the names I specify.  Instead I get
>> the part* file names we all know and love so well.  I've looked at the
>> source code for MultipleOutputs and found nothing obvious, but since the
>> logging is not working (see question 2), need I go on?  Is anyone else
>> having either trouble or success with multiple outputs using the
>> aforementioned class?
>>
>
> We use MultipleOutputs pretty heavily here, and it works fine.  You need to
> initialize each named output before you use it, by doing this:
>
>
> http://hadoop.apache.org/common/docs/r0.20.1/api/org/apache/hadoop/mapred/lib/MultipleOutputs.html#addNamedOutput%28org.apache.hadoop.mapred.JobConf,%20java.lang.String,%20java.lang.Class,%20java.lang.Class,%20java.lang.Class%29
>
> HTH,
>
> DR
>



-- 
Geoffry Roberts

Re: Three Questions

Posted by David Rosenstrauch <da...@darose.net>.
On 05/03/2011 01:21 PM, Geoffry Roberts wrote:
> All,
>
> I have three questions that I would appreciate anyone weighing in on.  I
> apologise in advance if I sound whiny.
>
> 1.  The namenode logs, when I view them from a browser, are displayed with
> the lines wrapped upon each other as if there were no new line characters
> ('\n') in the output.  I access these files using the dfshealth.jsp thing
> that comes in the distribution.  Is this intentional, and can it be fixed?
> If I use a browser to look at any other log4j log file, I don't get this.
>
> 2.  In my own MR jobs, I place log statements.  The log level in
> $HADOOP_HOME/conf/log4j.properties is set to INFO.  My log statements are
> set to INFO, but I get nothing in the user logs, which are a bugger to read
> (see question 1).  Am I missing something?

You can find an individual map or reduce task's logs here:

* click on (e.g.) the word "reduce" in the UI, which brings you to the 
"All Tasks" page
* click on a given task ID (e.g., task_201105030249_0004_r_000000)
* In the "Task logs" column, click on "All"

> 3.  I am attempting to use
> org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.
>        multipleOutputsObject.write(new Text("some text"), value,
> "my_file_name_here");
> I am not getting any output files with the names I specify.  Instead I get
> the part* file names we all know and love so well.  I've looked at the
> source code for MultipleOutputs and found nothing obvious, but since the
> logging is not working (see question 2), need I go on?  Is anyone else
> having either trouble or success with multiple outputs using the
> aforementioned class?

We use MultipleOutputs pretty heavily here, and it works fine.  You need 
to initialize each named output before you use it, by doing this:

http://hadoop.apache.org/common/docs/r0.20.1/api/org/apache/hadoop/mapred/lib/MultipleOutputs.html#addNamedOutput%28org.apache.hadoop.mapred.JobConf,%20java.lang.String,%20java.lang.Class,%20java.lang.Class,%20java.lang.Class%29
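
A minimal sketch of that call against the old mapred API; the output name
"textout" and the Text key/value types are just placeholders:

// In the driver, before submitting the job:
JobConf conf = new JobConf(MyDriver.class);
MultipleOutputs.addNamedOutput(conf, "textout",
        TextOutputFormat.class, Text.class, Text.class);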

HTH,

DR