Posted to common-user@hadoop.apache.org by Dmitry Pushkarev <um...@stanford.edu> on 2008/09/01 09:28:50 UTC

datanodes in virtual networks.

Dear hadoop users,

 

Our lab is slowly switching from SGE to hadoop, but not everything seems
to be easy and obvious. We are in no way computer scientists; we're just
physicists, biologists and a couple of statisticians trying to solve our
computational problems, so please bear with us if some of the questions
look obvious to you.

Our setup:

1.       Data cluster: 4 RAIDed servers running Hadoop, with 2TB of storage
each. They all have real IP addresses, and one of them is reserved for the
NameNode.

2.       Computational cluster: 100 dual-core servers running Sun Grid
Engine. They live on a virtual network (10.0.0.x) and can connect to the
outside world, but are not reachable from outside the cluster. We don't have
root access on these, and they are shared via SGE with other people, who get
reasonably nervous when they see idle reserved servers.

 

The basic idea is to create an on-demand computational cluster which, when
needed, will reserve servers from the second cluster, run jobs, and release
the servers.

 

Currently this is done via a script that reserves one server for the
namenode and 25 servers for datanodes, copies data from the first cluster,
runs the job, sends the results back, and releases the servers. I would
still like to make the two clusters work together using one namenode.

 

After a week of playing with hadoop I couldn't answer some of my questions
via thorough RTFM, so I'd really appreciate it if you could answer at least
some of them in our context:

 

1.       Is it possible to connect servers from the second cluster to the
first cluster's namenode? What worries me is the implementation of the
data-transfer protocol: some of the nodes cannot be reached from outside,
although they can easily reach any other node. Will hadoop try to establish
connections in both directions to transfer data between nodes?

 

2.       Is it possible to specify the "reliability" of a node, that is, to
make a replica on a RAIDed node count as two replicas, since its probability
of failure is much lower?

 

3.       I also bumped into problems with decommissioning: after I add the
hosts I want to free to the dfs.hosts.exclude file and run refreshNodes,
they stay marked as "Decommission in progress" for days, even though their
data is removed within the first several minutes. What I currently do is
shut them down after some delay, but I really hope to see "Decommissioned"
one day. What am I probably doing wrong?
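For reference, the flow described above boils down to one config entry plus
one admin command; this is only a sketch, and the file path is just an
example:

```xml
<!-- hadoop-site.xml on the namenode: point dfs.hosts.exclude at a plain
     text file listing one hostname per line (the path is an example).
     After adding hosts to that file, run:  bin/hadoop dfsadmin -refreshNodes -->
<property>
  <name>dfs.hosts.exclude</name>
  <value>/opt/hadoop/conf/excludes</value>
</property>
```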

 

4.       The same question about dead hosts. I do a simple exercise: I
create 20 datanodes on an empty cluster, then kill 15 of them and try to
store a file on HDFS. Hadoop fails, because some nodes it still thinks are
"in service" aren't accessible. Is it possible to tell hadoop to remove
these nodes from the list and not try to store data on them? My current
solution is a hadoop stop/start via cron every hour.

 

5.       We also have some external secure storage that can be accessed via
NFS from the first (data) cluster, and it'd be great if I could somehow
mount this storage to an HDFS folder and tell hadoop that data written to
that folder shouldn't be replicated but should instead go directly to NFS.

 

6.       Ironically, none of us who use the cluster knows Java, and most
tasks are launched via streaming with C++ programs/perl scripts. The problem
is how to write/read files from HDFS in this context. We currently use
things like -moveFromLocal, but that doesn't seem to be the right answer,
because it slows things down a lot.
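For what it's worth, the streaming contract itself is easy to exercise
locally without hadoop at all: a mapper is just a program that reads lines
on stdin and writes key<TAB>value lines on stdout. A toy word-count mapper
(purely illustrative, not our actual pipeline) can be tested with a plain
pipe:

```shell
# Toy streaming-style mapper: split stdin into words, emit "word<TAB>1".
# Demo run, feeding it one sample line:
printf 'foo bar foo\n' \
  | tr -s ' \t' '\n' \
  | grep -v '^$' \
  | while read -r word; do printf '%s\t1\n' "$word"; done
```

Saved as a script, the same pipeline (minus the sample printf) can be passed
to hadoop-streaming with -mapper; hadoop then runs one copy per input split.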

 

7.       On one of the data-cluster machines we run a pretty large MySQL
database, and we're wondering whether it is possible to spread the database
across the cluster. Has anyone tried that?

 

8.       Fuse-hdfs works great, but we really hope to be able to write to
HDFS through it someday. How can that be enabled?

 

9.       And maybe someone can point out where to look for ways to specify
how to partition data for the map jobs. In some of our tasks, processing one
line of the input file takes several minutes; currently we split these files
into many one-line files and process them independently. A simple
streaming-compatible way to tell hadoop that, for example, we want each map
task to take 10 lines, or to split a 10kb input file into 10000 map tasks,
would help us a lot!
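Until there is a built-in knob for this, the pre-splitting described above
can at least be automated with coreutils; a sketch (file and directory names
are made up, and it assumes a POSIX `split`):

```shell
# Split input.txt into 10-line chunks; each chunk then becomes one map
# task's input once the chunks directory is uploaded with bin/hadoop dfs -put.
seq 1 25 > input.txt            # stand-in input: 25 lines
mkdir -p chunks
split -l 10 input.txt chunks/part-
ls chunks                       # part-aa part-ab part-ac (10, 10 and 5 lines)
```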

 

 

 

Thanks in advance. 

 

 


Re: datanodes in virtual networks.

Posted by Andrey Pankov <ap...@iponweb.net>.
Hi, Dmitry!

Please take a look at the WebDAV server for HDFS. It already supports
read/write; more details at http://www.hadoop.iponweb.net/




-- 
Andrey Pankov

Re: datanodes in virtual networks.

Posted by Steve Loughran <st...@apache.org>.
Dmitry Pushkarev wrote:
> Dear hadoop users,
> 
>  
> 
> Our lab in slowly switching from SGE to hadoop, however not everything seems
> to be easy and obvious. We are in no way computer scientists, we're just
> physicists, biologist and couple of statisticians trying to solve our
> computational problems, please take this into consideration if questions
> will look to you obvious..  
> 
> Our setup:
> 
> 1.       Data cluster - 4 Raided and Hadooped servers, with 2TB of storage
> each, they all have real IP addresses, one of them reserved for NameNode.
> 
> 2.       Computational cluster:  100 dualcore servers running Sun Grid
> Engine, they live on virtual network (10.0.0.X) and can connect to outside
> world, but not accessible from out of the cluster. On these we don't have
> root access, and these are shared via SGE with other people, who get
> reasonably nervous when see idle reserved servers. 
> 
>  
> 
> Basic Idea is to create on-demand computational cluster,  which when needed
> will reserve servers from second cluster run jobs and let them go.

OK. There are some hints that you can run hadoop atop SGE, though I've 
not tried it.

> 
>  
> 
> Currently it is done via script that reserves server for namenode 25 servers
> for datanode copies data from first cluster, runs job, send result back and
> releases servers. I still want to make them work together using one
> namenode. 
> 
>  
> 
> After a week playing with hadoop I couldn't answer some of my question vie
> thorough RTFM, so I'd really appreciate is you can answer at least some of
> them in our context:
> 

I'll answer the questions I can; leave the other q's to others

>  
> 
> 1.       Is it possible to connect servers from second cluster to first
> namenode? What worries me is implementation of data-transfer protocol,
> because some of the nodes cannot be reached but they can easily reach any
> other node.  Will hadoop try to establish connection both ways to transfer
> data between nodes?

There's an assumption that every datanode belongs to a single namenode. 
You can bring up task trackers on separate machines/networks from the 
job tracker, as long as they are set up to point to it. The task-tracker 
to job-tracker communications should be ok; it's the transfer of data 
between the task trackers and the filesystem that you have to worry about.
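To make "set up to point to it" concrete, a 0.18-era sketch (hostnames and
ports here are placeholders, not real machines) is a hadoop-site.xml on each
compute node naming the remote filesystem and job tracker:

```xml
<configuration>
  <!-- placeholders: substitute the data cluster's real hostnames/ports -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode.data-cluster.example:9000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>jobtracker.data-cluster.example:9001</value>
  </property>
</configuration>
```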

>  
> 
> 2.       It is possible to specify "reliability" of the node, that is to
> make replica on the node with raid installed counts as two replicas as
> probability of failure is much lower. 

Not that I'm aware of.

> 
>  
> 
> 3.       I also bumped into problems with decommissioning, after I add hosts
> to free to dfs.hosts.exclude file and refreshNodes, they are marked as
> "Decommission in progress" for days, even though data is removed from them
> within first several minutes. What I currently do is shoot them down with
> some delay, but I really hope to see "Decommissioned" one day. What am I
> probably doing wrong?
> 
>  
> 
> 4.       The same question about dead hosts. I do a simple exercise: I
> create 20 datanodes on empty cluster, then I kill 15 of them and try to
> store a file on HDFS, hadoop fails because some nodes that it thinks "in
> service" aren't accessible. Is it possible to tell hadoop to remove these
> nodes from the list and do not try to store data on them? My current
> solution is hadoop-stop/start via cron every hour.

It sounds like the namenode should be doing more checking that the 
datanodes are live.

> 
>  
> 
> 5.       We also have some external secure storage that can be accesses via
> NFS from fists DATA cluster,  and it'd be great if I could somehow mount
> this storage to HDFS folder and tell hadoop that all data written to that
> folder shouldn't be replicated rather they should go directly to NFS.

You can certainly copy data in and out of NFS filestores without using 
HDFS; you can run tasks against NFS data without even running an HDFS 
filesystem. That is probably your best tactic. Trying to run HDFS on top 
of NFS is something that worries me; too many points of failure are 
being stacked up.
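One hedged sketch of that tactic, assuming every node mounts the NFS share
at the same path: leave HDFS out entirely and make the local filesystem the
default, so job input/output paths resolve straight to the NFS mount:

```xml
<!-- hypothetical hadoop-site.xml override: no HDFS at all; paths like
     /mnt/secure-nfs/... then go through the local filesystem -->
<property>
  <name>fs.default.name</name>
  <value>file:///</value>
</property>
```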


> 
>  
> 
> 6.       Ironically none of us who uses cluster knows java, and most tasks
> are launched via streaming with C++ programs/perl scripts.  The problem is
> how to write/read files from HDFS in this context, we currently use things
> like   -moveFromLocal  but it doesn't seems to be right answer, because it
> slows things down a lot.
> 
>  
> 
> 7.       On one of the DataCluster machines with run pretty large MySQL
> database, and just thinking whether it is possible to spread database across
> the cluster, has anyone tried that?


HBase

> 
>  
> 
> 8.       Fuse-hdfs works great, but we really hope to be able to write to
> HDFS someday, how to enable it?

There is a patch in SVN_HEAD for a Thrift API to HDFS; this is 
accessible from C++ and Perl.