Posted to mapreduce-user@hadoop.apache.org by Cao Yi <ir...@gmail.com> on 2015/01/14 09:32:41 UTC
How to run a MapReduce program from a machine that is not a node of the Hadoop cluster?
Hi,
I wrote some MapReduce code in my project *my_prj*. *my_prj* will be
deployed on a machine that is not a node of the cluster.
How can *my_prj* run a MapReduce job in this case?
Thank you!
Best Regards,
Iridium
Re: How to run a MapReduce program from a machine that is not a node of the Hadoop cluster?
Posted by Cao Yi <ir...@gmail.com>.
Thanks, Ahmed!
Since my Hadoop cluster is built on virtual machines, I cloned a node
(either a namenode or a datanode works), gave the clone a new hostname and
static IP, and it works!
Best Regards,
Iridium
On Tue, Jan 20, 2015 at 4:46 PM, Ahmed Ossama <ah...@aossama.com> wrote:
> The naming differs: Dell calls it an edge node, Cloudera calls it a gateway.
>
> In the end, it's just a machine that has the Hadoop libraries and ecosystem
> deployed to it and works as a client.
>
> Building this node is similar to building the rest of the nodes, except that
> it doesn't run any services. You can deploy pig, oozie, hue, hdfs and yarn on
> it and submit jobs to your cluster from this node.
>
> I guess you came across this link, but it's worth mentioning
> http://www.dummies.com/how-to/content/edge-nodes-in-hadoop-clusters.html
>
>
> On 01/20/2015 10:15 AM, Cao Yi wrote:
>
> thank you, Ahmed! I have another question: how do I build an edge node and
> how do I use it? Can you point me to some docs?
>
> PS, I searched and found many pages; some call it a "client node",
> but no page gives the details of building an edge node or how to use it.
>
> Best Regards,
> Iridium
>
> On Wed, Jan 14, 2015 at 11:05 PM, Ahmed Ossama <ah...@aossama.com> wrote:
>
>> The node that the project will be deployed on should have the same
>> configuration as the cluster, and the Hadoop executables as well. It doesn't
>> have to be one of the cluster nodes.
>>
>> This node is typically called a gateway or edge node; it has all
>> the client programs (hadoop execs, pig, etc.) and you use this node to
>> submit jobs to the cluster.
>>
>> The executables use the configuration to know where to submit jobs and
>> where your HDFS NameNode is located, and so on.
>>
>> On 01/14/2015 04:39 PM, Cao Yi wrote:
>>
>> The program will be used in a production environment.
>> Do you mean that the program must be deployed on a node of the
>> cluster?
>>
>>
>> I have some experience operating databases: I can
>> query/edit/add/remove data on the OS on which the database is installed, or
>> operate it remotely from another machine. Can I use Hadoop remotely in a
>> similar way?
>>
>> Best Regards,
>> Iridium
>>
>> On Wed, Jan 14, 2015 at 9:15 PM, unmesha sreeveni <un...@gmail.com>
>> wrote:
>>
>>> Your data won't get split, so your program runs as a single mapper and a
>>> single reducer, and your intermediate data is not shuffled and sorted. But
>>> you can use this for debugging.
>>> On Jan 14, 2015 2:04 PM, "Cao Yi" <ir...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I wrote some MapReduce code in my project *my_prj*. *my_prj* will be
>>>> deployed on a machine that is not a node of the cluster.
>>>> How can *my_prj* run a MapReduce job in this case?
>>>>
>>>> thank you!
>>>>
>>>> Best Regards,
>>>> Iridium
>>>>
>>>
>>
>> --
>> Regards,
>> Ahmed Ossama
>>
>>
>
> --
> Regards,
> Ahmed Ossama
>
>
Re: How to run a MapReduce program from a machine that is not a node of the Hadoop cluster?
Posted by Ahmed Ossama <ah...@aossama.com>.
The naming differs: Dell calls it an edge node, Cloudera calls it a gateway.
In the end, it's just a machine that has the Hadoop libraries and ecosystem
deployed to it and works as a client.
Building this node is similar to building the rest of the nodes, except that
it doesn't run any services. You can deploy pig, oozie, hue, hdfs and yarn on
it and submit jobs to your cluster from this node.
I guess you came across this link, but it's worth mentioning
http://www.dummies.com/how-to/content/edge-nodes-in-hadoop-clusters.html
On 01/20/2015 10:15 AM, Cao Yi wrote:
> thank you, Ahmed! I have another question: how do I build an edge node
> and how do I use it? Can you point me to some docs?
>
> PS, I searched and found many pages; some call it a "client node",
> but no page gives the details of building an edge node or how to use it.
>
> Best Regards,
> Iridium
>
> On Wed, Jan 14, 2015 at 11:05 PM, Ahmed Ossama <ahmed@aossama.com> wrote:
>
> The node that the project will be deployed on should have the same
> configuration as the cluster, and the Hadoop executables as well. It
> doesn't have to be one of the cluster nodes.
>
> This node is typically called a gateway or edge node; it
> has all the client programs (hadoop execs, pig, etc.) and you
> use this node to submit jobs to the cluster.
>
> The executables use the configuration to know where to submit jobs
> and where your HDFS NameNode is located, and so on.
>
> On 01/14/2015 04:39 PM, Cao Yi wrote:
>> The program will be used in a production environment.
>> Do you mean that the program must be deployed on a node of
>> the cluster?
>>
>>
>> I have some experience operating databases: I can
>> query/edit/add/remove data on the OS on which the database is
>> installed, or operate it remotely from another machine. Can I use Hadoop
>> remotely in a similar way?
>>
>> Best Regards,
>> Iridium
>>
>> On Wed, Jan 14, 2015 at 9:15 PM, unmesha sreeveni <unmeshabiju@gmail.com> wrote:
>>
>> Your data won't get split, so your program runs as a single
>> mapper and a single reducer, and your intermediate data is not
>> shuffled and sorted. But you can use this for debugging.
>>
>> On Jan 14, 2015 2:04 PM, "Cao Yi" <iridiumcao@gmail.com> wrote:
>>
>> Hi,
>>
>> I wrote some MapReduce code in my project /my_prj/.
>> /my_prj/ will be deployed on a machine that is not a
>> node of the cluster.
>> How can /my_prj/ run a MapReduce job in this case?
>>
>> thank you!
>>
>> Best Regards,
>> Iridium
>>
>>
>
> --
> Regards,
> Ahmed Ossama
>
>
--
Regards,
Ahmed Ossama
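[Editor's note] To make the advice above concrete, here is a minimal sketch of building an edge node by hand. Everything specific in it is an assumption: the Hadoop 2.6.0 tarball, the install path /opt, the cluster host cluster-node-1, the config path /etc/hadoop/conf, and the job jar my_prj.jar with main class com.example.MyJob are all hypothetical placeholders for your own environment.

```shell
# Sketch: provisioning an edge node (all hostnames and paths are hypothetical).
# 1. Install the same Hadoop distribution as the cluster, but start no daemons.
tar -xzf hadoop-2.6.0.tar.gz -C /opt
export HADOOP_HOME=/opt/hadoop-2.6.0

# 2. Copy the client configuration from an existing cluster node, so the
#    client tools know where the NameNode and ResourceManager live.
scp "cluster-node-1:/etc/hadoop/conf/core-site.xml" \
    "cluster-node-1:/etc/hadoop/conf/yarn-site.xml" \
    "cluster-node-1:/etc/hadoop/conf/mapred-site.xml" \
    "$HADOOP_HOME/etc/hadoop/"

# 3. Verify HDFS access, then submit the job from this machine.
"$HADOOP_HOME/bin/hdfs" dfs -ls /
"$HADOOP_HOME/bin/hadoop" jar my_prj.jar com.example.MyJob /input /output
```

These are the same steps a management tool (Cloudera's "gateway" role, for example) automates: install the client bits, sync the configs, start nothing.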
Re: How to run a MapReduce program from a machine that is not a node of the Hadoop cluster?
Posted by Cao Yi <ir...@gmail.com>.
thank you, Ahmed! I have another question: how do I build an edge node and
how do I use it? Can you point me to some docs?
PS, I searched and found many pages; some call it a "client node", but
no page gives the details of building an edge node or how to use it.
Best Regards,
Iridium
On Wed, Jan 14, 2015 at 11:05 PM, Ahmed Ossama <ah...@aossama.com> wrote:
> The node that the project will be deployed on should have the same
> configuration as the cluster, and the Hadoop executables as well. It doesn't
> have to be one of the cluster nodes.
>
> This node is typically called a gateway or edge node; it has all
> the client programs (hadoop execs, pig, etc.) and you use this node to
> submit jobs to the cluster.
>
> The executables use the configuration to know where to submit jobs and
> where your HDFS NameNode is located, and so on.
>
> On 01/14/2015 04:39 PM, Cao Yi wrote:
>
> The program will be used in a production environment.
> Do you mean that the program must be deployed on a node of the cluster?
>
>
> I have some experience operating databases: I can
> query/edit/add/remove data on the OS on which the database is installed, or
> operate it remotely from another machine. Can I use Hadoop remotely in a
> similar way?
>
> Best Regards,
> Iridium
>
> On Wed, Jan 14, 2015 at 9:15 PM, unmesha sreeveni <un...@gmail.com>
> wrote:
>
>> Your data won't get split, so your program runs as a single mapper and a
>> single reducer, and your intermediate data is not shuffled and sorted. But
>> you can use this for debugging.
>> On Jan 14, 2015 2:04 PM, "Cao Yi" <ir...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I wrote some MapReduce code in my project *my_prj*. *my_prj* will be
>>> deployed on a machine that is not a node of the cluster.
>>> How can *my_prj* run a MapReduce job in this case?
>>>
>>> thank you!
>>>
>>> Best Regards,
>>> Iridium
>>>
>>
>
> --
> Regards,
> Ahmed Ossama
>
>
Re: How to run a MapReduce program from a machine that is not a node of the Hadoop cluster?
Posted by Ahmed Ossama <ah...@aossama.com>.
The node that the project will be deployed on should have the same
configuration as the cluster, and the Hadoop executables as well. It doesn't
have to be one of the cluster nodes.
This node is typically called a gateway or edge node; it has all
the client programs (hadoop execs, pig, etc.) and you use this node to
submit jobs to the cluster.
The executables use the configuration to know where to submit jobs and
where your HDFS NameNode is located, and so on.
On 01/14/2015 04:39 PM, Cao Yi wrote:
> The program will be used in a production environment.
> Do you mean that the program must be deployed on a node of the
> cluster?
>
>
> I have some experience operating databases: I can
> query/edit/add/remove data on the OS on which the database is installed,
> or operate it remotely from another machine. Can I use Hadoop remotely
> in a similar way?
>
> Best Regards,
> Iridium
>
> On Wed, Jan 14, 2015 at 9:15 PM, unmesha sreeveni
> <unmeshabiju@gmail.com <ma...@gmail.com>> wrote:
>
> Your data won't get split, so your program runs as a single mapper
> and a single reducer, and your intermediate data is not shuffled and
> sorted. But you can use this for debugging.
>
> On Jan 14, 2015 2:04 PM, "Cao Yi" <iridiumcao@gmail.com
> <ma...@gmail.com>> wrote:
>
> Hi,
>
> I wrote some MapReduce code in my project /my_prj/. /my_prj/
> will be deployed on a machine which is not a node of the
> cluster.
> How does /my_prj/ run a MapReduce job in this case?
>
> thank you!
>
> Best Regards,
> Iridium
>
>
--
Regards,
Ahmed Ossama
Re: How to run a mapreduce program not on the node of hadoop cluster?
Posted by Cao Yi <ir...@gmail.com>.
The program will be used in a production environment.
Do you mean that the program must be deployed on a node of the cluster?
I have some experience operating databases: I can query/edit/add/remove
data on the machine where the database is installed, or operate it
remotely from another machine. Can I use Hadoop remotely in a similar
way?
Best Regards,
Iridium
On Wed, Jan 14, 2015 at 9:15 PM, unmesha sreeveni <un...@gmail.com>
wrote:
> Your data won't get split, so your program runs as a single mapper and
> a single reducer, and your intermediate data is not shuffled and sorted.
> But you can use this for debugging.
> On Jan 14, 2015 2:04 PM, "Cao Yi" <ir...@gmail.com> wrote:
>
>> Hi,
>>
>> I wrote some MapReduce code in my project *my_prj*. *my_prj* will be
>> deployed on a machine which is not a node of the cluster.
>> How does *my_prj* run a MapReduce job in this case?
>>
>> thank you!
>>
>> Best Regards,
>> Iridium
>>
>
Re: How to run a mapreduce program not on the node of hadoop cluster?
Posted by unmesha sreeveni <un...@gmail.com>.
Your data won't get split, so your program runs as a single mapper and
a single reducer, and your intermediate data is not shuffled and sorted.
But you can use this for debugging.
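What is described here sounds like Hadoop's local job runner, which is what a client falls back to when no cluster is configured. A sketch of the property that controls this (Hadoop 2.x property name; treat it as illustrative):

```xml
<!-- mapred-site.xml: with "local", jobs run in a single client JVM
     (no cluster, no distributed shuffle) - handy for debugging.
     Set it to "yarn" to submit to a real cluster instead. -->
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>local</value>
  </property>
</configuration>
```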
On Jan 14, 2015 2:04 PM, "Cao Yi" <ir...@gmail.com> wrote:
> Hi,
>
> I wrote some MapReduce code in my project *my_prj*. *my_prj* will be
> deployed on a machine which is not a node of the cluster.
> How does *my_prj* run a MapReduce job in this case?
>
> thank you!
>
> Best Regards,
> Iridium
>