Posted to hdfs-user@hadoop.apache.org by fab wol <da...@gmail.com> on 2014/07/03 16:59:07 UTC

Multi-Cluster Setup

hey everyone,

MapR offers the possibility to access another cluster's HDFS/MapR-FS from
one cluster (e.g. a compute-only cluster without much storage capacity);
see http://doc.mapr.com/display/MapR/mapr-clusters.conf. In times of
Hadoop-as-a-Service this becomes very interesting. Is this somehow possible
with the "normal" Hadoop distributions (CDH and HDP, I'm looking at you
;-) ), or even without help from those distributors? Any hacks, tricks, or
even specific functions are welcome. If this is not possible, has anyone
filed this as a ticket somewhere? Forwarding ticket numbers is also
appreciated ...

Cheers
Wolli

Re: Multi-Cluster Setup

Posted by fab wol <da...@gmail.com>.
hey Rahul,

thanks for pointing me to that page. It's definitely worth a read. Do both
clusters need to be at least v2.3 for that?

I was also digging a little bit further. There is the property fs.defaultFS,
which might be the exact setting I was actually looking for. Unfortunately
MapR routes access through the CLDB rather than directly through a NameNode,
which makes this setting useless for us right now (we have a lot of data in
a MapR cluster, but want to access it in another way).
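
For a plain HDFS cluster on the remote side a client would not even need to
change fs.defaultFS, since it can address the remote namespace by URI. A
minimal sketch in Java, assuming a hypothetical remote NameNode at
nn-remote.example.com:8020:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class RemoteLs {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // The explicit URI overrides fs.defaultFS for this instance only.
            FileSystem remote = FileSystem.get(
                URI.create("hdfs://nn-remote.example.com:8020"), conf);
            // List a directory that lives on the other cluster.
            for (FileStatus st : remote.listStatus(new Path("/data"))) {
                System.out.println(st.getPath());
            }
        }
    }

This only works against a NameNode, though, so it does not help with a MapR
CLDB.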

Thanks to everyone who helped here.

Cheers
Wolli


Re: Multi-Cluster Setup

Posted by Rahul Chaudhari <ra...@gmail.com>.
Fabian,
   I see this as a classic case of federation of Hadoop clusters. An MR job
can refer to a specific hdfs://<file location> as input while at the same
time running on another cluster.
You can refer to the following link for further details on federation.

http://hadoop.apache.org/docs/r2.3.0/hadoop-project-dist/hadoop-hdfs/Federation.html
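
As a minimal sketch of that idea, assuming hypothetical NameNodes
nn-remote.example.com:8020 (where the input lives) and
nn-local.example.com:8020 (where the job runs and writes), it could look
like this:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class CrossClusterJob {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "cross-cluster");
            job.setJarByClass(CrossClusterJob.class);
            // Fully qualified URI: the input is read from the other cluster
            // instead of being resolved against fs.defaultFS.
            FileInputFormat.addInputPath(job,
                new Path("hdfs://nn-remote.example.com:8020/data/input"));
            // The output stays on the cluster the job runs on.
            FileOutputFormat.setOutputPath(job,
                new Path("hdfs://nn-local.example.com:8020/data/output"));
            // No mapper/reducer set, so the identity defaults apply.
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }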

Regards,
Rahul Chaudhari


Re: Multi-Cluster Setup

Posted by fab wol <da...@gmail.com>.
Hey Nitin,

I'm not talking about the concept; I'm talking about how to actually do it
technically and how to set it up. Imagine this: I have two clusters, both
running fine, and they are (setup-wise) the same, except that one has way
more TaskTrackers/NodeManagers than the other. Now I want to incorporate
some data from the small cluster into an analysis on the big cluster. How
could I access that data natively (just giving the job another HDFS folder
as input)? In MapR I configure the file mentioned above and then have
another folder in MapR-FS with all the content from the other cluster ...
Could I somehow tell one NameNode to look up another NameNode and
incorporate all the foreign files?
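
In stock Hadoop, the closest analogue to that "extra folder" seems to be a
ViewFS client-side mount table (covered in the federation docs linked
above), which stitches several namespaces into one client-side view. A
rough sketch, with hypothetical host names:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ViewFsSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // One logical namespace ("combined") over two physical ones.
            conf.set("fs.defaultFS", "viewfs://combined");
            // /big is served by the big cluster's NameNode ...
            conf.set("fs.viewfs.mounttable.combined.link./big",
                    "hdfs://nn-big.example.com:8020/data");
            // ... and /small shows the small cluster's data as just
            // another folder in the same namespace.
            conf.set("fs.viewfs.mounttable.combined.link./small",
                    "hdfs://nn-small.example.com:8020/data");
            FileSystem fs = FileSystem.get(conf);
            System.out.println(fs.exists(new Path("/small")));
        }
    }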

Cheers
Fabian


Re: Multi-Cluster Setup

Posted by Nitin Pawar <ni...@gmail.com>.
Nothing is stopping you from implementing the cluster the way you want.
You can have storage-only nodes for your HDFS and not run TaskTrackers on
them.

Start a bunch of machines with high RAM and high CPU counts but no storage.

The only thing to worry about then would be the network bandwidth to carry
data from HDFS to the tasks and back to HDFS.


-- 
Nitin Pawar
