Posted to mapreduce-user@hadoop.apache.org by Avery Ching <ac...@apache.org> on 2011/12/06 01:01:25 UTC

Running YARN on top of legacy HDFS (i.e. 0.20)

Hi,

I've been playing with 0.23.0, really nice stuff!  I was able to set 
up a small test cluster (40 nodes) and launch the example jobs.  I was 
also able to recompile old Hadoop programs with the new jars and start 
up those programs as well.  My question is the following:

We have an HDFS instance based on 0.20 that I would like to hook up to 
YARN.  This appears to be a bit of work.  Launching the jobs gives me 
the following error:

2011-12-05 15:48:05,023 INFO  ipc.YarnRPC (YarnRPC.java:create(47)) - 
Creating YarnRPC for org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC
2011-12-05 15:48:05,040 INFO  mapred.ResourceMgrDelegate 
(ResourceMgrDelegate.java:<init>(95)) - Connecting to ResourceManager at 
{removed}.{xxx}/{removed}:50177
2011-12-05 15:48:05,041 INFO  ipc.HadoopYarnRPC 
(HadoopYarnProtoRPC.java:getProxy(48)) - Creating a HadoopYarnProtoRpc 
proxy for protocol interface org.apache.hadoop.yarn.api.ClientRMProtocol
2011-12-05 15:48:05,121 INFO  mapred.ResourceMgrDelegate 
(ResourceMgrDelegate.java:<init>(99)) - Connected to ResourceManager at 
{removed}.{xxx}/{removed}:50177
2011-12-05 15:48:05,133 INFO  mapreduce.Cluster 
(Cluster.java:initialize(116)) - Failed to use 
org.apache.hadoop.mapred.YarnClientProtocolProvider due to error: 
java.lang.ClassNotFoundException: org.apache.hadoop.fs.Hdfs
Exception in thread "main" java.io.IOException: Cannot initialize 
Cluster. Please check your configuration for mapreduce.framework.name 
and the correspond server addresses.
     at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:123)
     at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:85)
     at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:78)
     at org.apache.hadoop.mapreduce.Job$1.run(Job.java:1129)
     at org.apache.hadoop.mapreduce.Job$1.run(Job.java:1125)
     at java.security.AccessController.doPrivileged(Native Method)
     at javax.security.auth.Subject.doAs(Subject.java:396)
     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1152)
     at org.apache.hadoop.mapreduce.Job.connect(Job.java:1124)
     at org.apache.hadoop.mapreduce.Job.submit(Job.java:1153)
     at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1176)
     at org.apache.giraph.graph.GiraphJob.run(GiraphJob.java:560)
     at org.apache.giraph.benchmark.PageRankBenchmark.run(PageRankBenchmark.java:193)
     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69)
     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:83)
     at org.apache.giraph.benchmark.PageRankBenchmark.main(PageRankBenchmark.java:201)
     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
     at java.lang.reflect.Method.invoke(Method.java:597)
     at org.apache.hadoop.util.RunJar.main(RunJar.java:189)
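
For reference, the client only goes down the YARN path because the job 
configuration selects it.  The mapred-site.xml fragment below is a 
minimal sketch of that setting (the property name is the standard 0.23 
one; everything else about the cluster config is omitted); the "Cannot 
initialize Cluster" error fires when no ClientProtocolProvider can be 
loaded for the configured framework, which here traces back to the 
missing class rather than a bad property value.

```xml
<!-- mapred-site.xml (minimal sketch): selects the YARN runtime.
     The failure above is not a bad value here; the provider for
     "yarn" loads but then cannot find org.apache.hadoop.fs.Hdfs. -->
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
```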

After doing a little digging, it appears that YarnClientProtocolProvider 
creates a YARNRunner that uses org.apache.hadoop.fs.Hdfs, a class that 
is not available in older versions of HDFS.
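
A quick way to confirm that diagnosis is to probe the client classpath 
directly.  This is just an illustrative sketch (the probe class and its 
names are mine, not part of Hadoop): it tries to resolve each class the 
same way the provider ultimately does, without initializing it.

```java
// Illustrative probe: reports whether a class is resolvable on the
// current classpath. On a 0.20-only classpath, the FileContext-era
// class org.apache.hadoop.fs.Hdfs is absent, which is what surfaces
// above as java.lang.ClassNotFoundException.
public class ClasspathProbe {
    public static boolean resolves(String className) {
        try {
            // Load without initializing, so probing has no side effects.
            Class.forName(className, false,
                          ClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] needed = {
            "org.apache.hadoop.fs.FileSystem", // old API, in 0.20 and 0.23
            "org.apache.hadoop.fs.Hdfs"        // new API, not in 0.20
        };
        for (String name : needed) {
            System.out.println((resolves(name) ? "FOUND   " : "MISSING ")
                               + name);
        }
    }
}
```

Run with the same classpath your job client uses; a MISSING line for 
org.apache.hadoop.fs.Hdfs reproduces the failure above.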

What versions of HDFS are currently supported and what HDFS versions are 
planned for support?  It would be great to be able to run YARN on legacy 
HDFS installations.

Thanks,

Avery

Re: Running YARN on top of legacy HDFS (i.e. 0.20)

Posted by Avery Ching <ac...@apache.org>.
Well, UserGroupInformation and its subclasses in 0.20 are very different 
from those in 0.23.  For instance, in 0.23, UserGroupInformation has a 
private constructor, which causes issues because UnixUserGroupInformation 
extends UserGroupInformation.  Also, the RPC class appears to be very 
different between 0.23 and 0.20.  YARN's ClientRMProtocolPBClientImpl 
uses methods like RPC.setProtocolEngine which aren't available in the 
0.20 RPC.
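
To make that concrete, here is an illustrative reflection check (the 
helper class is mine; only the Hadoop class and method names come from 
the discussion above) that reports whether the RPC class on the current 
classpath exposes setProtocolEngine:

```java
import java.lang.reflect.Method;

// Illustrative check: does the org.apache.hadoop.ipc.RPC class on this
// classpath expose a setProtocolEngine method? Against 0.20 jars the
// answer is no, which is why ClientRMProtocolPBClientImpl cannot link.
public class RpcMethodCheck {
    public static boolean hasMethod(String className, String methodName) {
        try {
            for (Method m : Class.forName(className).getMethods()) {
                if (m.getName().equals(methodName)) {
                    return true;
                }
            }
        } catch (ClassNotFoundException e) {
            // Class itself missing: certainly no such method.
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(
            hasMethod("org.apache.hadoop.ipc.RPC", "setProtocolEngine"));
    }
}
```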

For what it's worth, I was able to get YARN to start a job with the 0.20 
RPC (after a lot of hacks), but then ran into:

Caused by: com.google.protobuf.ServiceException: 
org.apache.hadoop.ipc.RPC$VersionMismatch: Server IPC version 5 cannot 
communicate with client version 6
at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:131)
at $Proxy0.getClusterMetrics(Unknown Source)
at org.apache.hadoop.yarn.api.impl.pb.client.ClientRMProtocolPBClientImpl.getClusterMetrics(ClientRMProtocolPBClientImpl.java:128)
... 24 more
Caused by: org.apache.hadoop.ipc.RPC$VersionMismatch: Server IPC version 
5 cannot communicate with client version 6
at org.apache.hadoop.ipc.Client.call(Client.java:1125)
at org.apache.hadoop.ipc.Client.call(Client.java:1095)
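
The version check behind that error is simple to picture.  The toy 
sketch below is not Hadoop's actual code; it just mirrors the handshake 
rule and the version numbers from the trace, to show why no amount of 
client-side patching helps once the wire versions differ:

```java
// Toy model of an IPC version handshake (illustrative only, not
// Hadoop's implementation). The server compares the version carried in
// the connection header and refuses anything that doesn't match, so a
// version-6 client (0.23) can never talk to a version-5 server (0.20).
public class IpcVersionCheck {
    static final int SERVER_IPC_VERSION = 5;  // 0.20-era server

    public static void accept(int clientVersion) {
        if (clientVersion != SERVER_IPC_VERSION) {
            throw new IllegalStateException(
                "Server IPC version " + SERVER_IPC_VERSION
                + " cannot communicate with client version "
                + clientVersion);
        }
    }

    public static void main(String[] args) {
        accept(5);      // a 0.20-era client is accepted
        try {
            accept(6);  // a 0.23-era client is rejected
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```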

Avery

On 12/9/11 3:15 PM, Arun C Murthy wrote:
> I assume you have security switched off.
>
> What issues are you running into?

Re: Running YARN on top of legacy HDFS (i.e. 0.20)

Posted by Arun C Murthy <ac...@hortonworks.com>.
I assume you have security switched off.

What issues are you running into?

On Dec 8, 2011, at 1:30 PM, Avery Ching wrote:

> I was able to convert FileContext to FileSystem and related methods fairly straightforwardly, but am running into issues of dealing with security incompatibilities (i.e. UserGroupInformation, etc.).  Yuck.
> 
> Avery


Re: Running YARN on top of legacy HDFS (i.e. 0.20)

Posted by Avery Ching <ac...@apache.org>.
I was able to convert FileContext to FileSystem and related methods 
fairly straightforwardly, but am running into issues of dealing with 
security incompatibilities (i.e. UserGroupInformation, etc.).  Yuck.

Avery

On 12/6/11 3:50 PM, Arun C Murthy wrote:
> Avery,
>
> If you could take a look at what it would take, I'd be grateful. I'm hoping it isn't very much effort.
>
> thanks,
> Arun


Re: Running YARN on top of legacy HDFS (i.e. 0.20)

Posted by Arun C Murthy <ac...@hortonworks.com>.
Avery,

If you could take a look at what it would take, I'd be grateful. I'm hoping it isn't very much effort.

thanks,
Arun

On Dec 6, 2011, at 10:05 AM, Avery Ching wrote:

> I think it would be nice if YARN could work on existing older HDFS instances, a lot of folks will be slow to upgrade HDFS with all their important data on it.  I could also go that route I guess.
> 
> Avery


Re: Running YARN on top of legacy HDFS (i.e. 0.20)

Posted by Avery Ching <ac...@apache.org>.
I think it would be nice if YARN could work on existing older HDFS 
instances; a lot of folks will be slow to upgrade HDFS with all their 
important data on it.  I could also go that route, I guess.

Avery

On 12/6/11 8:51 AM, Arun C Murthy wrote:
> Avery,
>
>   They aren't 'api changes'. HDFS just has a new set of apis in hadoop-0.23 (aka FileContext apis). Both the old (FileSystem apis) and new are supported in hadoop-0.23.
>
>   We have used the new HDFS apis in YARN in some places.
>
> hth,
> Arun


Re: Running YARN on top of legacy HDFS (i.e. 0.20)

Posted by Arun C Murthy <ac...@hortonworks.com>.
Avery, 

 They aren't 'API changes'. HDFS simply has a new set of APIs in hadoop-0.23 (the FileContext APIs). Both the old (FileSystem) APIs and the new ones are supported in hadoop-0.23.

 We have used the new HDFS APIs in YARN in some places.
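Which is exactly where the ClassNotFoundException above comes from: the 0.23-only class org.apache.hadoop.fs.Hdfs has to be on the client classpath. A tiny self-contained sketch of checking for it (the helper class is hypothetical, not part of Hadoop; the probed class name is the real one from the log):

```java
// Hypothetical helper, not from the Hadoop source tree: probe the
// classpath for the 0.23-only class before committing to an API path.
public class FsClassProbe {
    static boolean isClassPresent(String className) {
        try {
            // initialize=false: we only care whether the class can be found,
            // not about running its static initializers.
            Class.forName(className, false, FsClassProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // On a classpath with only 0.20-era jars this prints "false",
        // matching the ClassNotFoundException in the log above.
        System.out.println(isClassPresent("org.apache.hadoop.fs.Hdfs"));
    }
}
```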

hth,
Arun

On Dec 5, 2011, at 10:59 PM, Avery Ching wrote:

> Thank you for the response, that's what I thought as well =).  I spent the day trying to port the required 0.23 APIs to 0.20 HDFS.  There have been a lot of API changes!
> 
> Avery
> 
> On 12/5/11 9:14 PM, Mahadev Konar wrote:
>> Avery,
>>  Currently we have only tested 0.23 MRv2 with 0.23 HDFS. I might be
>> wrong, but looking at the HDFS APIs it doesn't look like it would be a
>> lot of work to get it working with the 0.20 APIs. We had been using
>> the FileContext APIs initially but have transitioned back to the old
>> APIs.
>> 
>> Hope that helps.
>> 
>> mahadev


Re: Running YARN on top of legacy HDFS (i.e. 0.20)

Posted by Avery Ching <ac...@apache.org>.
Thank you for the response, that's what I thought as well =).  I spent 
the day trying to port the required 0.23 APIs to 0.20 HDFS.  There have 
been a lot of API changes!

Avery

On 12/5/11 9:14 PM, Mahadev Konar wrote:
> Avery,
>   Currently we have only tested 0.23 MRv2 with 0.23 HDFS. I might be
> wrong, but looking at the HDFS APIs it doesn't look like it would be a
> lot of work to get it working with the 0.20 APIs. We had been using
> the FileContext APIs initially but have transitioned back to the old
> APIs.
>
> Hope that helps.
>
> mahadev


Re: Running YARN on top of legacy HDFS (i.e. 0.20)

Posted by Mahadev Konar <ma...@hortonworks.com>.
Avery,
 Currently we have only tested 0.23 MRv2 with 0.23 HDFS. I might be
wrong, but looking at the HDFS APIs it doesn't look like it would be a
lot of work to get it working with the 0.20 APIs. We had been using the
FileContext APIs initially but have transitioned back to the old APIs.

Hope that helps.

mahadev
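
For what it's worth, a minimal sketch of the fallback direction described above (the class below is hypothetical, not Hadoop code; only the probed FileContext class name is real): prefer the new FileContext entry point when its class is available, otherwise stay on the old FileSystem API.

```java
// Hypothetical shim, not from Hadoop: pick the newer API family when its
// classes are present on the classpath, otherwise fall back to the old one.
public class ApiSelector {
    static String chooseApi() {
        try {
            // Real 0.23 class name; absent on a 0.20-only classpath.
            Class.forName("org.apache.hadoop.fs.FileContext");
            return "FileContext (0.23 API)";
        } catch (ClassNotFoundException e) {
            return "FileSystem (0.20 API)";
        }
    }

    public static void main(String[] args) {
        System.out.println("Selected: " + chooseApi());
    }
}
```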
