Posted to common-dev@hadoop.apache.org by Ajit Ratnaparkhi <aj...@gmail.com> on 2009/03/01 09:06:24 UTC

Re: Contributing to hadoop

Hi,
thanks for your help.

I tried the above-mentioned script (the one Raghu posted), but whenever I
execute it, the following message gets displayed:
*datanode running as process <process_id>. Stop it first.*
I am starting the single-node cluster with bin/start-dfs.sh first,
and then executing the script to start a second datanode.

I also tried supplying a separate, changed configuration from a separate
directory by running:
*bin/hadoop-daemons.sh --config <config-directory-path> start datanode*
It still gives the same message as above.

Also, earlier in this thread Ramya mentioned DataNodeCluster.java. That
would help, but I am not sure how to execute this class. Can you please
help with this?

thanks,
-Ajit.



On Thu, Feb 26, 2009 at 6:43 PM, Raghu Angadi <ra...@yahoo-inc.com> wrote:

>
> You can run with a small shell script. You need to override a couple of
> environment and config variables.
>
> Something like:
>
> run_datanode () {
>        DN=$2
>        HADOOP_LOG_DIR=logs$DN
>        HADOOP_PID_DIR=$HADOOP_LOG_DIR
>        bin/hadoop-daemon.sh $1 datanode \
>          -Dhadoop.tmp.dir=/some/dir/dfs$DN \
>          -Ddfs.datanode.address=0.0.0.0:5001$DN \
>          -Ddfs.datanode.http.address=0.0.0.0:5008$DN \
>          -Ddfs.datanode.ipc.address=0.0.0.0:5002$DN
> }
>
> You can start a second datanode like: run_datanode start 2
>
> Pretty useful for testing.
>
> Raghu.
>
>
> Ajit Ratnaparkhi wrote:
>
>> Raghu,
>>
>> Can you please tell me how to run multiple datanodes on one machine.
>>
>> thanks,
>> -Ajit.
>>
>> On Thu, Feb 26, 2009 at 9:23 AM, Pradeep Fernando
>> <pradeepfn@gmail.com> wrote:
>>
>>> Raghu,
>>>
>>>> I guess you are asking if it would be more convenient if one had
>>>> access to a larger cluster for development.
>>>
>>> exactly.....
>>>
>>>> I have access to many machines and clusters.. but about 99% of my
>>>> development happens using a single machine for testing. I would guess
>>>> that is true for most of the Hadoop developers.
>>>
>>> well this is the answer I was looking for....  :D
>>> It seems I have enough resources to contribute to this project.
>>> Thanks a lot, Raghu.
>>>
>>> regards,
>>> Pradeep Fernando.
>>>
>>>
>>
>

Re: Contributing to hadoop

Posted by Jakob Homan <jh...@yahoo-inc.com>.
There is definitely something to be said for developing via TDD, as
Lohit mentioned.

Hadoop has an extensive set of tools for writing unit tests that run
on simulated clusters (see
http://www.cloudera.com/blog/2008/12/16/testing-hadoop/ for an
excellent tutorial). This will save you time in the long run because
your tests can be contributed along with the actual patch, and there's
no need to muck about with configuring clusters, manually starting
datanodes, etc.
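
For example, most of the simulated-cluster tests run straight from the
source tree's build; a sketch, assuming the 0.19/0.20-era build.xml
targets (check the targets in your own checkout):

# run the whole core test suite against in-process mini clusters
ant test-core

# or just one test class, e.g. the HDFS shell tests
ant test-core -Dtestcase=TestDFSShell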

Actually needing a cluster to test or develop patches against is  
pretty rare and indicative of a problem somewhere else.

-Jakob



On Mar 4, 2009, at 11:08 AM, Raghu Angadi wrote:

> Ajit Ratnaparkhi wrote:
>> Hi,
>> thanks for your help.
>> I tried the above-mentioned script (the one Raghu posted), but
>> whenever I execute it, the following message gets displayed:
>> *datanode running as process <process_id>. Stop it first.*
>> I am starting the single-node cluster with bin/start-dfs.sh first,
>> and then executing the script to start a second datanode.
>
> Did you try to do what the error message asks you to? Better still,
> you should try to find where the message is coming from. I realize
> this is not a particularly useful reply for a user, but for a
> developer I hope it is.
>
> I just wrote the example script in the mail editor and did not test
> it; maybe an 'export' before setting the HADOOP_* env variables in
> the script is required. Currently I use a different (a bit less
> elegant) method for starting multiple nodes. When I switch to this
> method, I will post the script.
>
> Better still, post your script once you get it working.
>
> Raghu.
>
>> I also tried supplying a separate, changed configuration from a
>> separate directory by running:
>> *bin/hadoop-daemons.sh --config <config-directory-path> start datanode*
>> It still gives the same message as above.
>> Also, earlier in this thread Ramya mentioned DataNodeCluster.java. That
>> would help, but I am not sure how to execute this class. Can you please
>> help with this?
>> thanks,
>> -Ajit.
>> On Thu, Feb 26, 2009 at 6:43 PM, Raghu Angadi
>> <rangadi@yahoo-inc.com> wrote:
>>> You can run with a small shell script. You need to override a couple
>>> of environment and config variables.
>>>
>>> Something like:
>>>
>>> run_datanode () {
>>>       DN=$2
>>>       HADOOP_LOG_DIR=logs$DN
>>>       HADOOP_PID_DIR=$HADOOP_LOG_DIR
>>>       bin/hadoop-daemon.sh $1 datanode \
>>>         -Dhadoop.tmp.dir=/some/dir/dfs$DN \
>>>         -Ddfs.datanode.address=0.0.0.0:5001$DN \
>>>         -Ddfs.datanode.http.address=0.0.0.0:5008$DN \
>>>         -Ddfs.datanode.ipc.address=0.0.0.0:5002$DN
>>> }
>>>
>>> You can start a second datanode like: run_datanode start 2
>>>
>>> Pretty useful for testing.
>>>
>>> Raghu.
>>>
>>>
>>> Ajit Ratnaparkhi wrote:
>>>
>>>> Raghu,
>>>>
>>>> Can you please tell me how to run multiple datanodes on one  
>>>> machine.
>>>>
>>>> thanks,
>>>> -Ajit.
>>>>
>>>> On Thu, Feb 26, 2009 at 9:23 AM, Pradeep Fernando
>>>> <pradeepfn@gmail.com> wrote:
>>>>
>>>>> Raghu,
>>>>>
>>>>>> I guess you are asking if it would be more convenient if one had
>>>>>> access to a larger cluster for development.
>>>>>
>>>>> exactly.....
>>>>>
>>>>>> I have access to many machines and clusters.. but about 99% of my
>>>>>> development happens using a single machine for testing. I would
>>>>>> guess that is true for most of the Hadoop developers.
>>>>>
>>>>> well this is the answer I was looking for....  :D
>>>>> It seems I have enough resources to contribute to this project.
>>>>> Thanks a lot, Raghu.
>>>>>
>>>>> regards,
>>>>> Pradeep Fernando.
>>>>>
>>>>>
>


Re: Contributing to hadoop

Posted by Raghu Angadi <ra...@yahoo-inc.com>.
Ajit Ratnaparkhi wrote:
> Hi,
> thanks for your help.
> 
> I tried the above-mentioned script (the one Raghu posted), but whenever I
> execute it, the following message gets displayed:
> *datanode running as process <process_id>. Stop it first.*
> I am starting the single-node cluster with bin/start-dfs.sh first,
> and then executing the script to start a second datanode.

Did you try to do what the error message asks you to? Better still, you
should try to find where the message is coming from. I realize this is
not a particularly useful reply for a user, but for a developer I hope
it is.
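
In case it saves you a grep: the message comes from bin/hadoop-daemon.sh,
which keeps one pid file per daemon and refuses to start a second copy.
Roughly like this (quoting from memory, so check your own copy):

pid=$HADOOP_PID_DIR/hadoop-$HADOOP_IDENT_STRING-$command.pid
if [ -f $pid ]; then
  # kill -0 only probes whether that process is still alive
  if kill -0 `cat $pid` > /dev/null 2>&1; then
    echo $command running as process `cat $pid`.  Stop it first.
    exit 1
  fi
fi

So the second datanode needs its own HADOOP_PID_DIR; with the default it
finds the first datanode's pid file and bails out.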

I just wrote the example script in the mail editor and did not test it;
maybe an 'export' before setting the HADOOP_* env variables in the
script is required. Currently I use a different (a bit less elegant)
method for starting multiple nodes. When I switch to this method, I
will post the script.
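
Something like this, with the exports added (again untested on my side;
treat it as a sketch and adjust the directories and port prefixes to
your setup):

run_datanode () {
        DN=$2
        # export these so that bin/hadoop-daemon.sh, a child process,
        # actually sees the per-instance log and pid directories
        export HADOOP_LOG_DIR=logs$DN
        export HADOOP_PID_DIR=$HADOOP_LOG_DIR
        bin/hadoop-daemon.sh $1 datanode \
          -Dhadoop.tmp.dir=/some/dir/dfs$DN \
          -Ddfs.datanode.address=0.0.0.0:5001$DN \
          -Ddfs.datanode.http.address=0.0.0.0:5008$DN \
          -Ddfs.datanode.ipc.address=0.0.0.0:5002$DN
}

run_datanode start 2
jps | grep DataNode    # should now list one DataNode per instance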

Better still, post your script once you get it working.

Raghu.

> I also tried supplying a separate, changed configuration from a separate
> directory by running:
> *bin/hadoop-daemons.sh --config <config-directory-path> start datanode*
> It still gives the same message as above.
> 
> Also, earlier in this thread Ramya mentioned DataNodeCluster.java. That
> would help, but I am not sure how to execute this class. Can you please
> help with this?
> 
> thanks,
> -Ajit.
> 
> 
> 
> On Thu, Feb 26, 2009 at 6:43 PM, Raghu Angadi <ra...@yahoo-inc.com> wrote:
> 
>> You can run with a small shell script. You need to override a couple of
>> environment and config variables.
>>
>> Something like:
>>
>> run_datanode () {
>>        DN=$2
>>        HADOOP_LOG_DIR=logs$DN
>>        HADOOP_PID_DIR=$HADOOP_LOG_DIR
>>        bin/hadoop-daemon.sh $1 datanode \
>>          -Dhadoop.tmp.dir=/some/dir/dfs$DN \
>>          -Ddfs.datanode.address=0.0.0.0:5001$DN \
>>          -Ddfs.datanode.http.address=0.0.0.0:5008$DN \
>>          -Ddfs.datanode.ipc.address=0.0.0.0:5002$DN
>> }
>>
>> You can start a second datanode like: run_datanode start 2
>>
>> Pretty useful for testing.
>>
>> Raghu.
>>
>>
>> Ajit Ratnaparkhi wrote:
>>
>>> Raghu,
>>>
>>> Can you please tell me how to run multiple datanodes on one machine.
>>>
>>> thanks,
>>> -Ajit.
>>>
>>> On Thu, Feb 26, 2009 at 9:23 AM, Pradeep Fernando
>>> <pradeepfn@gmail.com> wrote:
>>>
>>>> Raghu,
>>>>
>>>>> I guess you are asking if it would be more convenient if one had
>>>>> access to a larger cluster for development.
>>>>
>>>> exactly.....
>>>>
>>>>> I have access to many machines and clusters.. but about 99% of my
>>>>> development happens using a single machine for testing. I would
>>>>> guess that is true for most of the Hadoop developers.
>>>>
>>>> well this is the answer I was looking for....  :D
>>>> It seems I have enough resources to contribute to this project.
>>>> Thanks a lot, Raghu.
>>>>
>>>> regards,
>>>> Pradeep Fernando.
>>>>
>>>>
> 


Re: Contributing to hadoop

Posted by Bharat Jain <bh...@gmail.com>.
Hi,

I have a question about how to go about contributing. I have 7+ years of
experience; while at AOL I worked on search-related tech like Lucene,
Solr, and crawlers. I have set up Hadoop on a cluster before. Are there
any issues or problems I can start looking into to get hands-on
experience? Basically, how do I go about doing some serious work?

Thanks
Bharat Jain


On Sun, Mar 1, 2009 at 3:06 AM, Ajit Ratnaparkhi
<ajit.ratnaparkhi@gmail.com> wrote:

> Hi,
> thanks for your help.
>
> I tried the above-mentioned script (the one Raghu posted), but whenever I
> execute it, the following message gets displayed:
> *datanode running as process <process_id>. Stop it first.*
> I am starting the single-node cluster with bin/start-dfs.sh first,
> and then executing the script to start a second datanode.
>
> I also tried supplying a separate, changed configuration from a
> separate directory by running:
> *bin/hadoop-daemons.sh --config <config-directory-path> start datanode*
> It still gives the same message as above.
>
> Also, earlier in this thread Ramya mentioned DataNodeCluster.java. That
> would help, but I am not sure how to execute this class. Can you please
> help with this?
>
> thanks,
> -Ajit.
>
>
>
> On Thu, Feb 26, 2009 at 6:43 PM, Raghu Angadi <ra...@yahoo-inc.com>
> wrote:
>
> >
> > You can run with a small shell script. You need to override a couple of
> > environment and config variables.
> >
> > Something like:
> >
> > run_datanode () {
> >        DN=$2
> >        HADOOP_LOG_DIR=logs$DN
> >        HADOOP_PID_DIR=$HADOOP_LOG_DIR
> >        bin/hadoop-daemon.sh $1 datanode \
> >          -Dhadoop.tmp.dir=/some/dir/dfs$DN \
> >          -Ddfs.datanode.address=0.0.0.0:5001$DN \
> >          -Ddfs.datanode.http.address=0.0.0.0:5008$DN \
> >          -Ddfs.datanode.ipc.address=0.0.0.0:5002$DN
> > }
> >
> > You can start a second datanode like: run_datanode start 2
> >
> > Pretty useful for testing.
> >
> > Raghu.
> >
> >
> > Ajit Ratnaparkhi wrote:
> >
> >> Raghu,
> >>
> >> Can you please tell me how to run multiple datanodes on one machine.
> >>
> >> thanks,
> >> -Ajit.
> >>
> >> On Thu, Feb 26, 2009 at 9:23 AM, Pradeep Fernando
> >> <pradeepfn@gmail.com> wrote:
> >>
> >>> Raghu,
> >>>
> >>>> I guess you are asking if it would be more convenient if one had
> >>>> access to a larger cluster for development.
> >>>
> >>> exactly.....
> >>>
> >>>> I have access to many machines and clusters.. but about 99% of my
> >>>> development happens using a single machine for testing. I would
> >>>> guess that is true for most of the Hadoop developers.
> >>>
> >>> well this is the answer I was looking for....  :D
> >>> It seems I have enough resources to contribute to this project.
> >>> Thanks a lot, Raghu.
> >>>
> >>> regards,
> >>> Pradeep Fernando.
> >>>
> >>>
> >>
> >
>