Posted to mapreduce-user@hadoop.apache.org by Avery Ching <ac...@apache.org> on 2011/12/06 01:01:25 UTC
Running YARN on top of legacy HDFS (i.e. 0.20)
Hi,
I've been playing with 0.23.0; really nice stuff! I was able to set up a
small test cluster (40 nodes) and launch the example jobs. I was also
able to recompile old Hadoop programs with the new jars and start up
those programs as well. My question is the following:
We have an HDFS instance based on 0.20 that I would like to hook up to
YARN. This appears to be a bit of work. Launching the jobs gives me
the following error:
2011-12-05 15:48:05,023 INFO ipc.YarnRPC (YarnRPC.java:create(47)) -
Creating YarnRPC for org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC
2011-12-05 15:48:05,040 INFO mapred.ResourceMgrDelegate
(ResourceMgrDelegate.java:<init>(95)) - Connecting to ResourceManager at
{removed}.{xxx}/{removed}:50177
2011-12-05 15:48:05,041 INFO ipc.HadoopYarnRPC
(HadoopYarnProtoRPC.java:getProxy(48)) - Creating a HadoopYarnProtoRpc
proxy for protocol interface org.apache.hadoop.yarn.api.ClientRMProtocol
2011-12-05 15:48:05,121 INFO mapred.ResourceMgrDelegate
(ResourceMgrDelegate.java:<init>(99)) - Connected to ResourceManager at
{removed}.{xxx}/{removed}:50177
2011-12-05 15:48:05,133 INFO mapreduce.Cluster
(Cluster.java:initialize(116)) - Failed to use
org.apache.hadoop.mapred.YarnClientProtocolProvider due to error:
java.lang.ClassNotFoundException: org.apache.hadoop.fs.Hdfs
Exception in thread "main" java.io.IOException: Cannot initialize
Cluster. Please check your configuration for mapreduce.framework.name
and the correspond server addresses.
at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:123)
at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:85)
at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:78)
at org.apache.hadoop.mapreduce.Job$1.run(Job.java:1129)
at org.apache.hadoop.mapreduce.Job$1.run(Job.java:1125)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1152)
at org.apache.hadoop.mapreduce.Job.connect(Job.java:1124)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1153)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1176)
at org.apache.giraph.graph.GiraphJob.run(GiraphJob.java:560)
at
org.apache.giraph.benchmark.PageRankBenchmark.run(PageRankBenchmark.java:193)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:83)
at
org.apache.giraph.benchmark.PageRankBenchmark.main(PageRankBenchmark.java:201)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:189)
After doing a little digging, it appears that YarnClientProtocolProvider
creates a YARNRunner that uses org.apache.hadoop.fs.Hdfs, a class that
is not available in older versions of HDFS.
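For reference, the configuration the exception message points at is the
property that routes job submission through YARN. A minimal mapred-site.xml
sketch (property name as documented for 0.23; nothing site-specific in it):

```xml
<!-- mapred-site.xml: selects the YARN submission path. When this is set to
     "yarn", Cluster.initialize() tries YarnClientProtocolProvider, which is
     where the ClassNotFoundException above gets raised. -->
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
```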
What versions of HDFS are currently supported and what HDFS versions are
planned for support? It would be great to be able to run YARN on legacy
HDFS installations.
Thanks,
Avery
Re: Running YARN on top of legacy HDFS (i.e. 0.20)
Posted by Avery Ching <ac...@apache.org>.
Well, UserGroupInformation and its subclasses in 0.20 are very different
from those in 0.23. For instance, in 0.23 UserGroupInformation has a
private constructor, which causes issues because UnixUserGroupInformation
extends UserGroupInformation. The RPC class also appears to be very
different between 0.23 and 0.20: YARN's ClientRMProtocolPBClientImpl uses
methods like RPC.setProtocolEngine that aren't available in the 0.20 RPC.
For what it's worth, I was able to get YARN to start a job with the 0.20
RPC (after a lot of hacks), but then ran into
Caused by: com.google.protobuf.ServiceException:
org.apache.hadoop.ipc.RPC$VersionMismatch: Server IPC version 5 cannot
communicate with client version 6
at
org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:131)
at $Proxy0.getClusterMetrics(Unknown Source)
at
org.apache.hadoop.yarn.api.impl.pb.client.ClientRMProtocolPBClientImpl.getClusterMetrics(ClientRMProtocolPBClientImpl.java:128)
... 24 more
Caused by: org.apache.hadoop.ipc.RPC$VersionMismatch: Server IPC version
5 cannot communicate with client version 6
at org.apache.hadoop.ipc.Client.call(Client.java:1125)
at org.apache.hadoop.ipc.Client.call(Client.java:1095)
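For anyone wondering why the handshake fails hard: Hadoop IPC at this point
requires the client and server wire-protocol versions to match exactly.
A self-contained sketch of that check (illustrative only, not the actual
Hadoop code):

```java
// Illustrative only: the shape of the Hadoop IPC version handshake, not the
// real implementation. The server refuses any client whose wire-protocol
// version differs from its own, which is exactly the 5-vs-6 failure above.
class IpcVersionCheck {
    static final int SERVER_IPC_VERSION = 5; // e.g. an 0.20-era server

    static String accept(int clientVersion) {
        if (clientVersion != SERVER_IPC_VERSION) {
            // Mirrors the wording of RPC$VersionMismatch in the trace above.
            throw new IllegalStateException("Server IPC version "
                + SERVER_IPC_VERSION
                + " cannot communicate with client version " + clientVersion);
        }
        return "connected";
    }

    public static void main(String[] args) {
        try {
            accept(6); // a 0.23 client speaking the newer wire version
        } catch (IllegalStateException e) {
            // prints: Server IPC version 5 cannot communicate with client version 6
            System.out.println(e.getMessage());
        }
    }
}
```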
Avery
On 12/9/11 3:15 PM, Arun C Murthy wrote:
> I assume you have security switched off.
>
> What issues are you running into?
>
Re: Running YARN on top of legacy HDFS (i.e. 0.20)
Posted by Arun C Murthy <ac...@hortonworks.com>.
I assume you have security switched off.
What issues are you running into?
On Dec 8, 2011, at 1:30 PM, Avery Ching wrote:
> I was able to convert FileContext to FileSystem and related methods fairly straightforwardly, but am running into issues dealing with security incompatibilities (i.e. UserGroupInformation, etc.). Yuck.
>
> Avery
Re: Running YARN on top of legacy HDFS (i.e. 0.20)
Posted by Avery Ching <ac...@apache.org>.
I was able to convert FileContext to FileSystem and related methods
fairly straightforwardly, but am running into issues dealing with
security incompatibilities (i.e. UserGroupInformation, etc.). Yuck.
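The mechanical part of that conversion is mostly an adapter exercise:
mapping calls written against the newer interface onto the older one.
A toy sketch of the pattern (stand-in interfaces only; these are not the
real Hadoop classes):

```java
// Toy stand-ins that echo Hadoop's FileContext/FileSystem split; these are
// NOT the real Hadoop classes, just the shape of the porting exercise.
interface NewStyleFs {               // 0.23-style entry point
    boolean mkdir(String path);
}

interface OldStyleFs {               // 0.20-style entry point
    boolean mkdirs(String path);
}

// Adapter: code written against the new interface runs on the old one.
class OldFsAdapter implements NewStyleFs {
    private final OldStyleFs fs;

    OldFsAdapter(OldStyleFs fs) { this.fs = fs; }

    @Override
    public boolean mkdir(String path) {
        return fs.mkdirs(path);      // delegate to the legacy call
    }
}
```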
Avery
On 12/6/11 3:50 PM, Arun C Murthy wrote:
> Avery,
>
> If you could take a look at what it would take, I'd be grateful. I'm hoping it isn't very much effort.
>
> thanks,
> Arun
Re: Running YARN on top of legacy HDFS (i.e. 0.20)
Posted by Arun C Murthy <ac...@hortonworks.com>.
Avery,
If you could take a look at what it would take, I'd be grateful. I'm hoping it isn't very much effort.
thanks,
Arun
On Dec 6, 2011, at 10:05 AM, Avery Ching wrote:
> I think it would be nice if YARN could work on existing older HDFS instances; a lot of folks will be slow to upgrade HDFS with all their important data on it. I could also go that route I guess.
>
> Avery
Re: Running YARN on top of legacy HDFS (i.e. 0.20)
Posted by Avery Ching <ac...@apache.org>.
I think it would be nice if YARN could work on existing older HDFS
instances; a lot of folks will be slow to upgrade HDFS with all their
important data on it. I could also go that route, I guess.
Avery
On 12/6/11 8:51 AM, Arun C Murthy wrote:
> Avery,
>
> They aren't 'API changes'. HDFS just has a new set of APIs in hadoop-0.23 (aka the FileContext APIs). Both the old (FileSystem) APIs and the new ones are supported in hadoop-0.23.
>
> We have used the new HDFS apis in YARN in some places.
>
> hth,
> Arun
Re: Running YARN on top of legacy HDFS (i.e. 0.20)
Posted by Arun C Murthy <ac...@hortonworks.com>.
Avery,
They aren't 'API changes'. HDFS just has a new set of APIs in hadoop-0.23 (aka the FileContext APIs). Both the old (FileSystem) APIs and the new ones are supported in hadoop-0.23.
We have used the new HDFS APIs in YARN in some places.
hth,
Arun
On Dec 5, 2011, at 10:59 PM, Avery Ching wrote:
> Thank you for the response, that's what I thought as well =). I spent the day trying to port the required 0.23 APIs to 0.20 HDFS. There have been a lot of API changes!
>
> Avery
Re: Running YARN on top of legacy HDFS (i.e. 0.20)
Posted by Avery Ching <ac...@apache.org>.
Thank you for the response, that's what I thought as well =). I spent
the day trying to port the required 0.23 APIs to 0.20 HDFS. There have
been a lot of API changes!
Avery
On 12/5/11 9:14 PM, Mahadev Konar wrote:
> Avery,
>> Currently we have only tested 0.23 MRv2 with 0.23 HDFS. I might be
>> wrong, but looking at the HDFS APIs it doesn't look like it would
>> be a lot of work to get it working with the 0.20 APIs. We had been
>> using the FileContext APIs initially but have transitioned back to the
>> old APIs.
>
> Hope that helps.
>
> mahadev
Re: Running YARN on top of legacy HDFS (i.e. 0.20)
Posted by Mahadev Konar <ma...@hortonworks.com>.
Avery,
Currently we have only tested 0.23 MRv2 with 0.23 HDFS. I might be
wrong, but looking at the HDFS APIs it doesn't look like it would
be a lot of work to get it working with the 0.20 APIs. We had been
using the FileContext APIs initially but have transitioned back to the
old APIs.
Hope that helps.
mahadev
On Mon, Dec 5, 2011 at 4:01 PM, Avery Ching <ac...@apache.org> wrote:
> Hi,
>
> I've been playing with 0.23.0, really nice stuff! I was able to set up a
> small test cluster (40 nodes) and launch the example jobs. I was also able
> to recompile old Hadoop programs with the new jars and start up those
> programs as well. My question is the following:
>
> We have an HDFS instance based on 0.20 that I would like to hook up to YARN.
> This appears to be a bit of work. Launching the jobs gives me the
> following error:
>
> 2011-12-05 15:48:05,023 INFO ipc.YarnRPC (YarnRPC.java:create(47)) -
> Creating YarnRPC for org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC
> 2011-12-05 15:48:05,040 INFO mapred.ResourceMgrDelegate
> (ResourceMgrDelegate.java:<init>(95)) - Connecting to ResourceManager at
> {removed}.{xxx}/{removed}:50177
> 2011-12-05 15:48:05,041 INFO ipc.HadoopYarnRPC
> (HadoopYarnProtoRPC.java:getProxy(48)) - Creating a HadoopYarnProtoRpc proxy
> for protocol interface org.apache.hadoop.yarn.api.ClientRMProtocol
> 2011-12-05 15:48:05,121 INFO mapred.ResourceMgrDelegate
> (ResourceMgrDelegate.java:<init>(99)) - Connected to ResourceManager at
> {removed}.{xxx}/{removed}:50177
> 2011-12-05 15:48:05,133 INFO mapreduce.Cluster
> (Cluster.java:initialize(116)) - Failed to use
> org.apache.hadoop.mapred.YarnClientProtocolProvider due to error:
> java.lang.ClassNotFoundException: org.apache.hadoop.fs.Hdfs
> Exception in thread "main" java.io.IOException: Cannot initialize Cluster.
> Please check your configuration for mapreduce.framework.name and the
> correspond server addresses.
> at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:123)
> at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:85)
> at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:78)
> at org.apache.hadoop.mapreduce.Job$1.run(Job.java:1129)
> at org.apache.hadoop.mapreduce.Job$1.run(Job.java:1125)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1152)
> at org.apache.hadoop.mapreduce.Job.connect(Job.java:1124)
> at org.apache.hadoop.mapreduce.Job.submit(Job.java:1153)
> at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1176)
> at org.apache.giraph.graph.GiraphJob.run(GiraphJob.java:560)
> at
> org.apache.giraph.benchmark.PageRankBenchmark.run(PageRankBenchmark.java:193)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:83)
> at
> org.apache.giraph.benchmark.PageRankBenchmark.main(PageRankBenchmark.java:201)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:189)
>
> After doing a little digging, it appears that YarnClientProtocolProvider
> creates a YARNRunner that uses org.apache.hadoop.fs.Hdfs, a class that is
> not available in older versions of HDFS.
>
> What versions of HDFS are currently supported and what HDFS versions are
> planned for support? It would be great to be able to run YARN on legacy
> HDFS installations.
>
> Thanks,
>
> Avery
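For reference, the "Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name" message in the logs above is also what appears when the client is not configured for YARN at all; in 0.23 the relevant mapred-site.xml entry looks like this (a minimal sketch, separate from the HDFS-version issue discussed in the thread):

```xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```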