Posted to general@hadoop.apache.org by Rita Liu <cr...@gmail.com> on 2010/08/15 05:37:15 UTC

Hadoop basics

Hi!

I am a total beginner, but I am very interested in hadoop. I've already
downloaded hadoop 0.19.2 and run it on Ubuntu in single-node mode. Now I want
to do two things:

1. Explore how hadoop works internally with one of the example applications
hadoop provides
2. Write an application on my own

Those two goals raise the following questions:

a. debugger?
I am stuck since I don't know how to "explore" hadoop. I usually trace
through code using a debugger, but in this case, I don't know if there
is a good debugger to use; or -- maybe a debugger is not necessary for
hadoop? If not, then how do you trace through the code to either debug or
just gain an understanding about the system? May I know what you,
experienced experts, do? :)

b. Where to run hadoop?
Also -- may I know where you run your hadoop? Do you run on linux, or on VM
-- in particular, Cloudera? I heard that Cloudera is good for writing
mapreduce applications with hadoop itself as a blackbox; is it true? If my
ultimate goal is to understand how hadoop works internally, would it be
better if I directly run it on linux?

c. Single-node or multi-node?
In the beginning (just like my case :p) would it be better to use
single-node or multi-node? If the latter is true, should I obtain more
machines, or should I use more virtual machines to create more nodes?

As a newbie, I am sorry for all those basic (and silly, I know :$)
questions. If possible, please help me out? Any suggestion or advice will be
greatly appreciated. Thank you very much!

Best,
Rita :)

P.S. If my questions are not suitable for this mailing-list, please let me
apologize, and then, could you please direct me to other mailing-lists?
Sorry, and thanks a lot! :)

Re: Hadoop basics

Posted by Rita Liu <cr...@gmail.com>.

Thanks so much! I did do that, and now I want to move to the next step. If
possible, may I know more to move on? Thank you very much!
-Rita :)

On Sat, Aug 14, 2010 at 8:49 PM, smith jack <th...@gmail.com> wrote:

> There is still a long way to go before you start debugging hadoop.
> The first point is that you should run hadoop first!
> How to do this:
> simply run bin/start-all.sh, then hadoop can respond to your requests.
> You can use FsShell when you begin.
> Hope you will enjoy it.

Re: Hadoop basics

Posted by smith jack <th...@gmail.com>.

There is still a long way to go before you start debugging hadoop.
The first point is that you should run hadoop first!
How to do this:
simply run bin/start-all.sh, then hadoop can respond to your requests.
You can use FsShell when you begin.
Hope you will enjoy it.


Re: Hadoop basics

Posted by Rita Liu <cr...@gmail.com>.

Hi Piyush and Amit:

Thanks so much for your kind suggestions!! I am trying log4j now :))
Here is a beginner-level question -- where should I put the loggers?

Say I start with a MapReduce application (say, WordCount.java), and I
want to trace the code so that I could know which methods (from which
classes) have been called and what has been done while they are being
called, before hadoop finishes executing the application. In order to
write log files to record that information, I have to know where
(i.e. in which files) to put my loggers. However, without knowing
which methods (from which classes) are called, how do I know where to
put the loggers? If I just put my logger inside the main method of
WordCount.java, it probably doesn't make too much sense ...

Is there any way to trace the call stack so that I would know where to
put my loggers (with log4j)? Or: there might be a smart way for me to
create a logger so that I would get what I need?
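
On the question of finding out which methods are called: one possible approach, offered as a sketch and not as an established hadoop practice, is to record the call stack from inside a method you already know runs (for example your own map method). The class CallTrace and its methods are hypothetical names made up for illustration; only Throwable.getStackTrace() and StackTraceElement come from the JDK.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of hadoop: reports who called the current method.
class CallTrace {

    // Returns "ClassName.methodName" for each frame above this helper,
    // nearest caller first, limited to maxFrames entries.
    static List<String> callers(int maxFrames) {
        StackTraceElement[] stack = new Throwable().getStackTrace();
        List<String> frames = new ArrayList<>();
        // stack[0] is callers() itself, so start at index 1.
        for (int i = 1; i < stack.length && frames.size() < maxFrames; i++) {
            frames.add(stack[i].getClassName() + "." + stack[i].getMethodName());
        }
        return frames;
    }

    // Stand-in for a method you want to understand, e.g. a map() implementation.
    static List<String> map() {
        return callers(5);
    }

    public static void main(String[] args) {
        for (String frame : map()) {
            System.out.println("  called via " + frame);
        }
    }
}
```

Logging these frames once from the map and reduce methods of WordCount.java should show which framework classes call into them, which in turn suggests where further log statements might be useful.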

Also -- although I know debugging in a distributed system could be a
pain, I wonder if I could just load the whole hadoop project into
Eclipse, say, and trace the code locally without actually running the
application on the cluster. There are many libs and consequent
dependencies in hadoop -- how may I load hadoop so that I could
locally trace it?

If possible, please help me out ... Piyush, Amit, and all the experts
here? Thank you very much!

Best,
Rita :))


On Sun, Aug 15, 2010 at 11:27 PM, amit kumar verma <v....@verchaska.com> wrote:
>
>  Hi Rita,
>
> If you have reached the point where you need to use an API like Hadoop, forget about debugging the code. Your code must be syntactically and logically error-free; for everything else, logging is enough. Try log4j only.
>
> Thanks,
> Amit Kumar Verma
> Verchaska Infotech Pvt. Ltd.

Re: Hadoop basics

Posted by amit kumar verma <v....@verchaska.com>.

  Hi Rita,

If you have reached the point where you need to use an API like Hadoop,
forget about debugging the code. Your code must be syntactically and
logically error-free; for everything else, logging is enough. Try
log4j only.

Thanks,
Amit Kumar Verma
Verchaska Infotech Pvt. Ltd.



On 08/15/2010 11:10 AM, Rita Liu wrote:
> Hi Harsh and Piyush! Thank you very much. So it seems like it would be best
> if I use log4j to trace, and debugging with a debugger is still possible if
> I set "mapred.job.tracker" to be "local" and "fs.default.name" to be
> "local", in hadoop-site.xml. Plus: in hadoop-env.sh, I should specify
> HADOOP_OPTS to be:
>
> "-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=8000" (why
> 8000? also, what does "-agentlib:jdwp=transport=dt_socket" mean?)
>
> ... in order to use a debugger. Is my understanding correct? :)
>
> If so -- then which debugger do you use? May I know? Thanks a lot! I am also
> going to try log4j now!
>
> Many thanks,
> -Rita :))

Re: Hadoop basics

Posted by Rita Liu <cr...@gmail.com>.

Hi Harsh and Piyush! Thank you very much. So it seems like it would be best
if I use log4j to trace, and debugging with a debugger is still possible if
I set "mapred.job.tracker" to be "local" and "fs.default.name" to be
"local", in hadoop-site.xml. Plus: in hadoop-env.sh, I should specify
HADOOP_OPTS to be:

"-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=8000" (why
8000? also, what does "-agentlib:jdwp=transport=dt_socket" mean?)

... in order to use a debugger. Is my understanding correct? :)

If so -- then which debugger do you use? May I know? Thanks a lot! I am also
going to try log4j now!

Many thanks,
-Rita :))

On Sat, Aug 14, 2010 at 10:22 PM, Piyush Garg <pi...@gmail.com> wrote:

> Hi Smith,
>
> Step debugging also works in hadoop, as with other java applications:
>
> export HADOOP_OPTS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=8000"
>
> 'suspend=y' makes the JVM wait until the remote debugger is
> attached.
>
> Thanks and Regards
> Piyush Garg

Re: Hadoop basics

Posted by Piyush Garg <pi...@gmail.com>.

Hi Smith,

Step debugging also works in hadoop, as with other java applications:

export HADOOP_OPTS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=8000"

'suspend=y' makes the JVM wait until the remote debugger is attached.
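
Since the question of what each part of this option means came up in the thread, here is a small sketch that splits the option string into its keys and also checks whether the current JVM was actually started with it. JdwpOptions is a hypothetical helper name, not part of hadoop; ManagementFactory.getRuntimeMXBean().getInputArguments() is the standard JDK call for reading the JVM's startup arguments. The per-key notes in the comments reflect the usual JDWP meanings.

```java
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for inspecting a JDWP agent option; not part of hadoop.
class JdwpOptions {

    static final String PREFIX = "-agentlib:jdwp=";

    // Splits "-agentlib:jdwp=k1=v1,k2=v2" into an ordered key/value map.
    // Returns an empty map if the argument is not a JDWP agent option.
    static Map<String, String> parse(String jvmArg) {
        Map<String, String> options = new LinkedHashMap<>();
        if (jvmArg == null || !jvmArg.startsWith(PREFIX)) {
            return options;
        }
        for (String pair : jvmArg.substring(PREFIX.length()).split(",")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                options.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return options;
    }

    public static void main(String[] args) {
        // Arguments the current JVM was started with (e.g. passed via HADOOP_OPTS).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        for (String arg : jvmArgs) {
            Map<String, String> options = parse(arg);
            if (!options.isEmpty()) {
                System.out.println("JDWP agent enabled: " + options);
            }
        }
        // The option string from this thread, key by key:
        //   transport=dt_socket  the debugger talks to the JVM over a TCP socket
        //   server=y             this JVM listens and the debugger connects to it
        //   suspend=y            the JVM waits at startup until a debugger attaches
        //   address=8000         the port to listen on; 8000 is only a common
        //                        choice, any free port should work
        System.out.println(parse(
            "-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=8000"));
    }
}
```

With these settings, a debugger such as Eclipse would attach through a "Remote Java Application" debug configuration pointing at the host and port 8000.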

Thanks and Regards
Piyush Garg


On Sunday 15 August 2010 10:39 AM, smith jack wrote:
> That means you can only trace by log,
> and it is not possible to debug hadoop using step debugging, haha.
> Distributed systems always introduce extra complexity and confusing issues.
>
> 2010/8/15 Piyush Garg <pi...@gmail.com>:
>   
>> Hi Rita,
>>
>> You can put log4j logger debug statements in the code. log4j library is
>> part of hadoop framework and there is already a log4j.properties file in
>> hadoop conf directory and all the output logs are saved in hadoop logs
>> directory.
>>
>> Thanks and Regards
>> Piyush Garg
>>
>>
>> On Sunday 15 August 2010 10:20 AM, Rita Liu wrote:
>>     
>>> Thank you very much, Piyush! :) May I know more about how to use "traces"?
>>>
>>> And -- yes, please teach me if possible, experts! :)
>>>
>>> Thanks a lot,
>>> -Rita :))
>>>
>>> On Sat, Aug 14, 2010 at 9:42 PM, Piyush Garg <pi...@gmail.com> wrote:
>>>
>>>
>>>       
>>>> Hi Rita,
>>>>
>>>> I have just started to learn hadoop as well, I know there is a long way
>>>> to go.
>>>> I found some useful links which I am sharing with you.
>>>>
>>>> Hadoop Tutorial - YDN
>>>> <http://developer.yahoo.com/hadoop/tutorial/index.html> excellent
>>>> beginners tutorial and well organized.
>>>> Running Hadoop On Ubuntu Linux (Single-Node Cluster) - Michael G. Noll
>>>> <http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_%28Single-Node_Cluster%29>
>>>> Running Hadoop On Ubuntu Linux (Multi-Node Cluster)
>>>> <http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_%28Multi-Node_Cluster%29>
>>>> The tutorial on the hadoop wiki
>>>> <http://hadoop.apache.org/common/docs/r0.20.0/mapred_tutorial.html> is
>>>> too much for a beginner.
>>>>
>>>> Debugger:
>>>> I do not think you can easily debug using a remote debugger. This
>>>> is natural: since hadoop programs are not sequential, it would be very
>>>> difficult to debug its apps.
>>>> The only way to debug is to use traces.
>>>>
>>>> I think you can learn how to set up a multi-node cluster, but for practice
>>>> sessions you can use a single-node setup.
>>>>
>>>> Let's see what the experts say.
>>>>
>>>> Thanks and Regards
>>>> Piyush Garg

Re: Hadoop basics

Posted by Harsh J <qw...@gmail.com>.

If you start your job in a single-node cluster with a "local"
configuration (conf.set("mapred.job.tracker", "local") and
conf.set("fs.default.name", "local")), you can _almost_ debug all the
vital parts. I use this method (though it's been deprecated) to debug
my Map and Reduce functions locally.
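
For anyone who prefers a configuration file to conf.set(...) calls, the
same two settings could be sketched roughly as below. This assumes the
single conf/hadoop-site.xml override file that 0.19-era releases read
(an assumption about your setup), and it spells the local filesystem as
file:/// on the assumption that this is the non-deprecated form of "local".

```xml
<?xml version="1.0"?>
<!-- conf/hadoop-site.xml: sketch of a "local" setup for debugging.
     The file name and the file:/// value are assumptions; adjust as needed. -->
<configuration>
  <property>
    <!-- "local" runs the job in-process instead of contacting a JobTracker -->
    <name>mapred.job.tracker</name>
    <value>local</value>
  </property>
  <property>
    <!-- use the local filesystem rather than HDFS -->
    <name>fs.default.name</name>
    <value>file:///</value>
  </property>
</configuration>
```

With values like these the job runs inside the JVM that submits it, which
is what lets an IDE debugger stop at breakpoints in the Map and Reduce code.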

Remote debugging with multiple nodes would still be cool to have though.

On Sun, Aug 15, 2010 at 10:40 AM, Rita Liu <cr...@gmail.com> wrote:
> Thank you very much, Piyush! I'll do as you say :DD Thanks a lot!!
>
> Thanks Smith :) hmm ... I see. ok :)
>
> Please give me more guidance and suggestions if possible, dear experts!
> -Rita :))

-- 
Harsh J
www.harshj.com

Re: Hadoop basics

Posted by Rita Liu <cr...@gmail.com>.

Thank you very much, Piyush! I'll do as you say :DD Thanks a lot!!

Thanks Smith :) hmm ... I see. ok :)

Please give me more guidance and suggestions if possible, dear experts!
-Rita :))

On Sat, Aug 14, 2010 at 10:09 PM, smith jack <th...@gmail.com> wrote:

> that means you can only trace by log,
> and not possible to debug hadoop using step debug, haha
> distributed system always introduce extra complexity and confusing issues.

Re: Hadoop basics

Posted by smith jack <th...@gmail.com>.

That means you can only trace by log,
and it is not possible to step-debug hadoop, haha.
Distributed systems always introduce extra complexity and confusing issues.

2010/8/15 Piyush Garg <pi...@gmail.com>:
> Hi Rita,
>
> You can put log4j logger debug statements in the code. log4j library is
> part of hadoop framework and there is already a log4j.properties file in
> hadoop conf directory and all the output logs are saved in hadoop logs
> directory.
>
> Thanks and Regards
> Piyush Garg

Re: Hadoop basics

Posted by Piyush Garg <pi...@gmail.com>.

Hi Rita,

You can put log4j logger debug statements in the code. The log4j library is
part of the hadoop framework; there is already a log4j.properties file in
the hadoop conf directory, and all the output logs are saved in the hadoop
logs directory.
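
A minimal sketch of what such a change could look like in
conf/log4j.properties, using standard log4j 1.x logger syntax. The package
com.example.wordcount is a hypothetical name for your own job's classes, and
raising org.apache.hadoop.mapred to DEBUG is only one possible choice.

```properties
# Raise the log level for Hadoop's MapReduce classes (one possible choice).
log4j.logger.org.apache.hadoop.mapred=DEBUG

# Hypothetical package holding your own Mapper/Reducer classes.
log4j.logger.com.example.wordcount=DEBUG
```

Debug statements in classes under those packages should then show up in the
log output, alongside the messages hadoop already writes.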

Thanks and Regards
Piyush Garg


On Sunday 15 August 2010 10:20 AM, Rita Liu wrote:
> Thank you very much, Piyush! :) May I know more about how to use "traces"?
>
> And -- yes, please teach me if possible, experts! :)
>
> Thanks a lot,
> -Rita :))

Re: Hadoop basics

Posted by Rita Liu <cr...@gmail.com>.

Thank you very much, Piyush! :) May I know more about how to use "traces"?

And -- yes, please teach me if possible, experts! :)

Thanks a lot,
-Rita :))

On Sat, Aug 14, 2010 at 9:42 PM, Piyush Garg <pi...@gmail.com> wrote:

> Hi Rita,
>
> I have just started to learn hadoop as well, I know there is a long way
> to go.
> I found some useful links which I am sharing with you.
>
> Hadoop Tutorial - YDN
> <http://developer.yahoo.com/hadoop/tutorial/index.html> excellent
> beginners tutorial and well organized.
> Running Hadoop On Ubuntu Linux (Single-Node Cluster) - Michael G. Noll
> <http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_%28Single-Node_Cluster%29>
> Running_Hadoop_On_Ubuntu_Linux_(Multi-Node_Cluster)
> <http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_%28Multi-Node_Cluster%29>
> The tutorial on the hadoop wiki
> <http://hadoop.apache.org/common/docs/r0.20.0/mapred_tutorial.html> is
> too much for a beginner.
>
> Debugger:
> I do not think you can easily do debugging using remote debugger. This
> is natural since hadoop is not sequential programming, it would be very
> difficult to debug its apps.
> The only way to debug is to use traces.
>
> I think you can learn how to setup multi-node cluster, but for practice
> session you can use single node setup.
>
> Lets see what the experts say.
>
> Thanks and Regards
> Piyush Garg

Re: Hadoop basics

Posted by Piyush Garg <pi...@gmail.com>.

Hi Rita,

I have just started to learn hadoop as well, and I know there is a long way
to go.
I found some useful links which I am sharing with you.

Hadoop Tutorial - YDN
<http://developer.yahoo.com/hadoop/tutorial/index.html> is an excellent,
well-organized beginner's tutorial.
Running Hadoop On Ubuntu Linux (Single-Node Cluster) - Michael G. Noll
<http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_%28Single-Node_Cluster%29>
Running Hadoop On Ubuntu Linux (Multi-Node Cluster)
<http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_%28Multi-Node_Cluster%29>
The tutorial on the hadoop wiki
<http://hadoop.apache.org/common/docs/r0.20.0/mapred_tutorial.html> is
too much for a beginner.

Debugger:
I do not think you can easily debug using a remote debugger. This
is natural: since hadoop programs are not sequential, it would be very
difficult to debug its apps.
The only way to debug is to use traces.

I think you can learn how to set up a multi-node cluster, but for practice
sessions you can use a single-node setup.

Let's see what the experts say.

Thanks and Regards
Piyush Garg

On Sunday 15 August 2010 09:07 AM, Rita Liu wrote:
> Hi!
>
> I am a total beginner, but I am very interested in hadoop. I've already
> downloaded hadoop 0.19.2 and run on Ubuntu in single-node mode. Now I want
> to do two things:
>
> 1. Explore how hadoop works internally with one of the example applications
> hadoop provides
> 2. Write an application on my own
>
> Those two things bring me following questions:
>
> a. debugger?
> I am stuck since I don't know how to "explore" hadoop. I used to trace
> through the code using a debugger, but in this case, I don't know if there
> is a good debugger to use; or -- maybe a debugger is not necessary for
> hadoop? If not, then how do you trace through the code to either debug or
> just gain an understanding about the system? May I know what you,
> experienced experts, do? :)
>
> b. Where to run hadoop?
> Also -- may I know where you run your hadoop? Do you run on linux, or on VM
> -- in particular, Cloudera? I heard that Cloudera is good for writing
> mapreduce applications with hadoop itself as a blackbox; is it true? If my
> ultimate goal is to understand how hadoop works internally, would it be
> better if I directly run it on linux?
>
> c. Single-node or multi-node?
> In the beginning (just like my case :p) would it be better to use
> single-node or multi-node? If the latter is true, should I obtain more
> machines, or should I use more virtual machines to create more nodes?
>
> As a newbie, I am sorry for all those basic (and silly, I know :$)
> questions. If possible, please help me out? Any suggestion or advice will be
> greatly appreciated. Thank you very much!
>
> Best,
> Rita :)
>
> P.S. If my questions are not suitable for this mailing-list, please let me
> apologize, and then, could you please direct me to other mailing-lists?
> Sorry, and thanks a lot! :)
>
>