Posted to common-user@hadoop.apache.org by firantika <fi...@gmail.com> on 2011/10/21 02:03:40 UTC

running sqoop on hadoop cluster

Hi all,
I'm a newbie to Hadoop.

If I install Hadoop on 2 nodes, where does HDFS run? On the master node or
the slave node?

And if I run Sqoop to export from a DBMS to Hive, is there any speed-up
between Hadoop running on a single node and Hadoop running on multiple
nodes?

Could you please explain?


Thanks


-- 
View this message in context: http://old.nabble.com/running-sqoop-on-hadoop-cluster-tp32693398p32693398.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.

Re: running sqoop on hadoop cluster

Posted by Bejoy KS <be...@gmail.com>.

Hi Firantika,
           HDFS is the underlying file system. Its metadata is stored on the
NameNode and the actual data blocks are stored on the DataNodes. You can run
a NameNode and a DataNode on the same physical machine, in which case the
metadata and some of the data blocks sit on the same machine, but production
clusters normally avoid this. It is better to have a somewhat larger cluster
so that HDFS replication can actually provide data reliability. AFAIK HDFS is
not a single process that runs on one node. There are five basic processes
(daemons) in Hadoop:

   1. NameNode
   2. Secondary NameNode
   3. JobTracker
   4. DataNode
   5. TaskTracker
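
As a rough illustration of how these five daemons might be laid out on a
two-node cluster like the one in the question, here is a small Python sketch.
The host names "master" and "slave1" are made up, and running a DataNode and
TaskTracker on the master as well is only one possible choice, not a
recommendation.

```python
# Hypothetical layout of the five Hadoop daemons on a two-node cluster.
# Host names are assumptions for illustration only.
LAYOUT = {
    "master": ["NameNode", "SecondaryNameNode", "JobTracker",
               "DataNode", "TaskTracker"],
    "slave1": ["DataNode", "TaskTracker"],
}

# Daemons that together provide HDFS (the rest belong to MapReduce).
HDFS_DAEMONS = {"NameNode", "SecondaryNameNode", "DataNode"}


def hosts_running(daemon, layout=LAYOUT):
    """Return the sorted host names on which the given daemon runs."""
    return sorted(host for host, daemons in layout.items() if daemon in daemons)


def hdfs_hosts(layout=LAYOUT):
    """Return the sorted hosts that run at least one HDFS daemon."""
    return sorted(host for host, daemons in layout.items()
                  if HDFS_DAEMONS & set(daemons))


if __name__ == "__main__":
    print("NameNode on:", hosts_running("NameNode"))
    print("DataNode on:", hosts_running("DataNode"))
    print("HDFS daemons on:", hdfs_hosts())
```

In this layout the metadata lives on one host only, while data blocks are
spread over every host that runs a DataNode.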

            Sqoop uses MapReduce under the hood for its import/export
processes. If you have more nodes, or rather more TaskTracker map slots with
adequate memory for each, you can run more parallel tasks for a single Sqoop
import. But Sqoop's parallelism also depends on your source database and on
how many parallel connections it can handle.
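
To make that concrete, here is a minimal Python sketch of the reasoning, not
of anything Sqoop itself computes. It assumes the number of import tasks that
really run at once is capped by the smallest of three numbers: the mappers
you request, the free map slots in the cluster, and the connections the
database accepts. The slot counts, the connection limit, the JDBC URL and the
table name below are all made-up values.

```python
def effective_parallelism(requested_mappers, free_map_slots, db_max_connections):
    """Upper bound on concurrently running import tasks in this simple model."""
    limit = min(requested_mappers, free_map_slots, db_max_connections)
    if limit < 1:
        raise ValueError("all three limits must be at least 1")
    return limit


def sqoop_import_args(jdbc_url, table, num_mappers):
    """Build the argument list for a 'sqoop import' that loads into Hive."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--hive-import",
        "--num-mappers", str(num_mappers),
    ]


if __name__ == "__main__":
    # Assumed numbers: 2 map slots per node, a database allowing 8 connections.
    single_node = effective_parallelism(4, free_map_slots=2, db_max_connections=8)
    two_nodes = effective_parallelism(4, free_map_slots=4, db_max_connections=8)
    print("single node:", single_node)
    print("two nodes:", two_nodes)
    # Hypothetical JDBC URL and table name.
    print(" ".join(sqoop_import_args("jdbc:mysql://dbhost/sales", "orders", two_nodes)))
```

Under these assumptions, adding the second node doubles the usable
parallelism, but only until the database's connection limit becomes the
smallest of the three numbers.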
             Bottom line: you need more nodes in your cluster before using
it in production. For development purposes a two-node configuration is fine.
There are a good number of tutorials on the Cloudera and Yahoo blogs that
should give you better insight into your questions.

Hope it helps!...

Thank You

Regards
Bejoy.K.S



On Fri, Oct 21, 2011 at 12:42 PM, Alexander C.H. Lorenz <
wget.null@googlemail.com> wrote:

> Hi,
>
> First set up a valid cluster:
> NameNode, Secondary NameNode, JobTracker, plus DataNodes with TaskTrackers.
>
> After that, install Sqoop on a DataNode and play with it ;)
>
> Here is a howto for RedHat (CentOS):
> http://mapredit.blogspot.com/p/get-hadoop-cluster-running-in-20.html
>
> and for Ubuntu:
>
> http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/
>
> regards,
>  Alex
>
> --
> Alexander Lorenz
> http://mapredit.blogspot.com
>

Re: running sqoop on hadoop cluster

Posted by "Alexander C.H. Lorenz" <wg...@googlemail.com>.

Hi,

First set up a valid cluster:
NameNode, Secondary NameNode, JobTracker, plus DataNodes with TaskTrackers.

After that, install Sqoop on a DataNode and play with it ;)

Here is a howto for RedHat (CentOS):
http://mapredit.blogspot.com/p/get-hadoop-cluster-running-in-20.html

and for Ubuntu:
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/

regards,
 Alex

-- 
Alexander Lorenz
http://mapredit.blogspot.com

Re: running sqoop on hadoop cluster

Posted by Luca Pireddu <pi...@crs4.it>.

On 10/24/2011 12:26 PM, oleksiy wrote:
>
> Hello,
>
> As for the first question, you can find out which node is the master by
> looking at the "masters" file in Hadoop's "conf" folder. That is where you
> specify the master node.

Actually, the "masters" file contains the name of the secondary namenode 
(see http://developer.yahoo.com/hadoop/tutorial/module7.html#configs). 
You probably want to leave that blank on a two node cluster.
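
A small Python sketch of how one could read these files to see which hosts
play which role. It assumes the usual layout of one host name per line, and
it assumes that blank lines and "#" comments can be ignored. The "slaves"
file (the list of hosts that run a DataNode) is not mentioned above and is
brought in here only for illustration, and the host names are made up.

```python
def parse_host_file(text):
    """Return the host names listed in a masters/slaves-style file.

    Assumes one host per line; blank lines and '#' comments are skipped.
    """
    hosts = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop any trailing comment
        if line:
            hosts.append(line)
    return hosts


def describe_cluster(masters_text, slaves_text):
    """Map the contents of conf/masters and conf/slaves to daemon roles."""
    return {
        "secondary_namenode_hosts": parse_host_file(masters_text),
        "datanode_hosts": parse_host_file(slaves_text),
    }


if __name__ == "__main__":
    # Hypothetical two-node cluster: empty masters file, both hosts as slaves.
    print(describe_cluster("", "master\nslave1\n"))
```

Note that neither file says where the NameNode runs. That is decided by the
node on which HDFS is started.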

> firantika wrote:
>>
>> Hi All,
>> I'm a newbie to Hadoop.
>>
>> If I install Hadoop on 2 nodes, where does HDFS run? On the master node
>> or the slave node?

HDFS runs on all nodes.  The NameNode runs on the head node (the one 
from which you started HDFS with start-dfs.sh or start-all.sh).  The 
slave nodes each run a DataNode daemon.

>> And if I run Sqoop to export from a DBMS to Hive, is there any speed-up
>> between Hadoop running on a single node and Hadoop running on multiple
>> nodes?
>>
>> Could you please explain?

Sorry, don't know.

-- 
Luca Pireddu
CRS4 - Distributed Computing Group
Loc. Pixina Manna Edificio 1
Pula 09010 (CA), Italy
Tel: +39 0709250452

Re: running sqoop on hadoop cluster

Posted by oleksiy <ga...@mail.ru>.

Hello,

As for the first question, you can find out which node is the master by
looking at the "masters" file in Hadoop's "conf" folder. That is where you
specify the master node.

-- 
View this message in context: http://old.nabble.com/running-sqoop-on-hadoop-cluster-tp32693398p32709265.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.