Posted to dev@spark.apache.org by Chester Chen <ch...@alpinenow.com> on 2014/08/21 01:39:06 UTC

Is Branch-1.1 SBT build broken for yarn-alpha?

I just updated today's build and tried branch-1.1 for both yarn and
yarn-alpha.

For the yarn build, this command seems to work fine.

sbt/sbt -Pyarn -Dhadoop.version=2.3.0-cdh5.0.1 projects

For the yarn-alpha build,

sbt/sbt -Pyarn-alpha -Dhadoop.version=2.0.5-alpha projects

I got the following error.

Any ideas?


Chester

᚛ |branch-1.1|$ sbt/sbt -Pyarn-alpha -Dhadoop.version=2.0.5-alpha projects

Using /Library/Java/JavaVirtualMachines/1.6.0_51-b11-457.jdk/Contents/Home
as default JAVA_HOME.

Note, this will be overridden by -java-home if it is set.

[info] Loading project definition from
/Users/chester/projects/spark/project/project

[info] Loading project definition from
/Users/chester/.sbt/0.13/staging/ec3aa8f39111944cc5f2/sbt-pom-reader/project

[warn] Multiple resolvers having different access mechanism configured with
same name 'sbt-plugin-releases'. To avoid conflict, Remove duplicate
project resolvers (`resolvers`) or rename publishing resolver (`publishTo`).

[info] Loading project definition from /Users/chester/projects/spark/project

org.apache.maven.model.building.ModelBuildingException: 1 problem was
encountered while building the effective model for
org.apache.spark:spark-yarn-alpha_2.10:1.1.0

[FATAL] Non-resolvable parent POM: Could not find artifact
org.apache.spark:yarn-parent_2.10:pom:1.1.0 in central (
http://repo.maven.apache.org/maven2)
and 'parent.relativePath' points at wrong local POM @ line 20, column 11


 at
org.apache.maven.model.building.DefaultModelProblemCollector.newModelBuildingException(DefaultModelProblemCollector.java:195)

at
org.apache.maven.model.building.DefaultModelBuilder.readParentExternally(DefaultModelBuilder.java:841)

at
org.apache.maven.model.building.DefaultModelBuilder.readParent(DefaultModelBuilder.java:664)

at
org.apache.maven.model.building.DefaultModelBuilder.build(DefaultModelBuilder.java:310)

at
org.apache.maven.model.building.DefaultModelBuilder.build(DefaultModelBuilder.java:232)

at
com.typesafe.sbt.pom.MvnPomResolver.loadEffectivePom(MavenPomResolver.scala:61)

at com.typesafe.sbt.pom.package$.loadEffectivePom(package.scala:41)

at
com.typesafe.sbt.pom.MavenProjectHelper$.makeProjectTree(MavenProjectHelper.scala:128)

at
com.typesafe.sbt.pom.MavenProjectHelper$$anonfun$12.apply(MavenProjectHelper.scala:129)

at
com.typesafe.sbt.pom.MavenProjectHelper$$anonfun$12.apply(MavenProjectHelper.scala:129)

at
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)

at
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)

at
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)

at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)

at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)

at scala.collection.AbstractTraversable.map(Traversable.scala:105)

at
com.typesafe.sbt.pom.MavenProjectHelper$.makeProjectTree(MavenProjectHelper.scala:129)

at
com.typesafe.sbt.pom.MavenProjectHelper$$anonfun$12.apply(MavenProjectHelper.scala:129)

at
com.typesafe.sbt.pom.MavenProjectHelper$$anonfun$12.apply(MavenProjectHelper.scala:129)

at
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)

at
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)

at
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)

at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)

at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)

at scala.collection.AbstractTraversable.map(Traversable.scala:105)

at
com.typesafe.sbt.pom.MavenProjectHelper$.makeProjectTree(MavenProjectHelper.scala:129)

at
com.typesafe.sbt.pom.MavenProjectHelper$.makeReactorProject(MavenProjectHelper.scala:49)

at com.typesafe.sbt.pom.PomBuild$class.projectDefinitions(PomBuild.scala:28)

at SparkBuild$.projectDefinitions(SparkBuild.scala:165)

at sbt.Load$.sbt$Load$$projectsFromBuild(Load.scala:458)

at sbt.Load$$anonfun$24.apply(Load.scala:415)

at sbt.Load$$anonfun$24.apply(Load.scala:415)

at scala.collection.immutable.Stream.flatMap(Stream.scala:442)

at sbt.Load$.loadUnit(Load.scala:415)

at sbt.Load$$anonfun$15$$anonfun$apply$11.apply(Load.scala:256)

at sbt.Load$$anonfun$15$$anonfun$apply$11.apply(Load.scala:256)

at
sbt.BuildLoader$$anonfun$componentLoader$1$$anonfun$apply$4$$anonfun$apply$5$$anonfun$apply$6.apply(BuildLoader.scala:93)

at
sbt.BuildLoader$$anonfun$componentLoader$1$$anonfun$apply$4$$anonfun$apply$5$$anonfun$apply$6.apply(BuildLoader.scala:92)

at sbt.BuildLoader.apply(BuildLoader.scala:143)

at sbt.Load$.loadAll(Load.scala:312)

at sbt.Load$.loadURI(Load.scala:264)

at sbt.Load$.load(Load.scala:260)

at sbt.Load$.load(Load.scala:251)

at sbt.Load$.apply(Load.scala:134)

at sbt.Load$.defaultLoad(Load.scala:37)

at sbt.BuiltinCommands$.doLoadProject(Main.scala:473)

at sbt.BuiltinCommands$$anonfun$loadProjectImpl$2.apply(Main.scala:467)

at sbt.BuiltinCommands$$anonfun$loadProjectImpl$2.apply(Main.scala:467)

at
sbt.Command$$anonfun$applyEffect$1$$anonfun$apply$2.apply(Command.scala:60)

at
sbt.Command$$anonfun$applyEffect$1$$anonfun$apply$2.apply(Command.scala:60)

at
sbt.Command$$anonfun$applyEffect$2$$anonfun$apply$3.apply(Command.scala:62)

at
sbt.Command$$anonfun$applyEffect$2$$anonfun$apply$3.apply(Command.scala:62)

at sbt.Command$.process(Command.scala:95)

at sbt.MainLoop$$anonfun$1$$anonfun$apply$1.apply(MainLoop.scala:100)

at sbt.MainLoop$$anonfun$1$$anonfun$apply$1.apply(MainLoop.scala:100)

at sbt.State$$anon$1.process(State.scala:179)

at sbt.MainLoop$$anonfun$1.apply(MainLoop.scala:100)

at sbt.MainLoop$$anonfun$1.apply(MainLoop.scala:100)

at sbt.ErrorHandling$.wideConvert(ErrorHandling.scala:18)

at sbt.MainLoop$.next(MainLoop.scala:100)

at sbt.MainLoop$.run(MainLoop.scala:93)

at sbt.MainLoop$$anonfun$runWithNewLog$1.apply(MainLoop.scala:71)

at sbt.MainLoop$$anonfun$runWithNewLog$1.apply(MainLoop.scala:66)

at sbt.Using.apply(Using.scala:25)

at sbt.MainLoop$.runWithNewLog(MainLoop.scala:66)

at sbt.MainLoop$.runAndClearLast(MainLoop.scala:49)

at sbt.MainLoop$.runLoggedLoop(MainLoop.scala:33)

at sbt.MainLoop$.runLogged(MainLoop.scala:25)

at sbt.StandardMain$.runManaged(Main.scala:57)

at sbt.xMain.run(Main.scala:29)

at xsbt.boot.Launch$$anonfun$run$1.apply(Launch.scala:109)

at xsbt.boot.Launch$.withContextLoader(Launch.scala:129)

at xsbt.boot.Launch$.run(Launch.scala:109)

at xsbt.boot.Launch$$anonfun$apply$1.apply(Launch.scala:36)

at xsbt.boot.Launch$.launch(Launch.scala:117)

at xsbt.boot.Launch$.apply(Launch.scala:19)

at xsbt.boot.Boot$.runImpl(Boot.scala:44)

at xsbt.boot.Boot$.main(Boot.scala:20)

at xsbt.boot.Boot.main(Boot.scala)

[error] org.apache.maven.model.building.ModelBuildingException: 1 problem
was encountered while building the effective model for
org.apache.spark:spark-yarn-alpha_2.10:1.1.0

[error] [FATAL] Non-resolvable parent POM: Could not find artifact
org.apache.spark:yarn-parent_2.10:pom:1.1.0 in central (
http://repo.maven.apache.org/maven2) and 'parent.relativePath' points at
wrong local POM @ line 20, column 11

[error] Use 'last' for the full log.

Project loading failed: (r)etry, (q)uit, (l)ast, or (i)gnore? q

Re: Is Branch-1.1 SBT build broken for yarn-alpha?

Posted by Sean Owen <so...@cloudera.com>.
Maven is just telling you that there is no version 1.1.0 of
yarn-parent, and indeed, it has not been released. To build the branch
you would need to "mvn install" to compile and make available local
copies of artifacts along the way. (You may have these for
1.1.0-SNAPSHOT locally already). Use Maven, not SBT, for building
releases.
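
For example, from the branch-1.1 checkout, something along these lines
(the exact profile and Hadoop version are whatever you are targeting,
and -DskipTests just speeds it up):

mvn -Pyarn-alpha -Dhadoop.version=2.0.5-alpha -DskipTests install

That puts the intermediate artifacts, including yarn-parent, into your
local ~/.m2 repository, so the parent POM lookup can then resolve.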

On Thu, Aug 21, 2014 at 12:39 AM, Chester Chen <ch...@alpinenow.com> wrote:
> [FATAL] Non-resolvable parent POM: Could not find artifact
> org.apache.spark:yarn-parent_2.10:pom:1.1.0 in central (
> http://repo.maven.apache.org/maven2)
> and 'parent.relativePath' points at wrong local POM @ line 20, column 11

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org


Re: Is Branch-1.1 SBT build broken for yarn-alpha?

Posted by Chester Chen <ch...@alpinenow.com>.
Mridul,
     Thanks for the suggestion.

     I just updated the build today and changed the yarn/alpha/pom.xml to

   <version>1.1.1-SNAPSHOT</version>

then the command worked.

I will create a JIRA and PR for it.


Chester




On Thu, Aug 21, 2014 at 8:03 AM, Chester @work <ch...@alpinenow.com>
wrote:

> Do we have Jenkins tests for these? It should be pretty easy to set up just
> to test the basic build.
>
> Sent from my iPhone
>
> > On Aug 21, 2014, at 6:45 AM, Mridul Muralidharan <mr...@gmail.com>
> wrote:
> >
> > Weird that Patrick did not face this while creating the RC.
> > Essentially the yarn alpha pom.xml has not been updated properly in
> > the 1.1 branch.
> >
> > Just change version to '1.1.1-SNAPSHOT' for yarn/alpha/pom.xml (to
> > make it same as any other pom).
> >
> >
> > Regards,
> > Mridul
> >
> >
> >> On Thu, Aug 21, 2014 at 5:09 AM, Chester Chen <ch...@alpinenow.com>
> wrote:
> >> [original message and full stack trace snipped; see the first post in this thread]

Re: Is Branch-1.1 SBT build broken for yarn-alpha?

Posted by "Chester @work" <ch...@alpinenow.com>.
Do we have Jenkins tests for these? It should be pretty easy to set up just to test the basic build.

Sent from my iPhone

> On Aug 21, 2014, at 6:45 AM, Mridul Muralidharan <mr...@gmail.com> wrote:
> 
> Weird that Patrick did not face this while creating the RC.
> Essentially the yarn alpha pom.xml has not been updated properly in
> the 1.1 branch.
> 
> Just change version to '1.1.1-SNAPSHOT' for yarn/alpha/pom.xml (to
> make it same as any other pom).
> 
> 
> Regards,
> Mridul
> 
> 
>> On Thu, Aug 21, 2014 at 5:09 AM, Chester Chen <ch...@alpinenow.com> wrote:
>> [original message and full stack trace snipped; see the first post in this thread]



Re: Is Branch-1.1 SBT build broken for yarn-alpha?

Posted by Mridul Muralidharan <mr...@gmail.com>.
Weird that Patrick did not face this while creating the RC.
Essentially the yarn alpha pom.xml has not been updated properly in
the 1.1 branch.

Just change version to '1.1.1-SNAPSHOT' for yarn/alpha/pom.xml (to
make it same as any other pom).


Regards,
Mridul


On Thu, Aug 21, 2014 at 5:09 AM, Chester Chen <ch...@alpinenow.com> wrote:
> [original message and full stack trace snipped; see the first post in this thread]



RE: RE: Failed to active namenode when config HA

Posted by Brahma Reddy Battula <br...@huawei.com>.
Can you please send out the ZKFC logs and configuration?




Thanks & Regards



Brahma Reddy Battula




________________________________
From: 清如许 [475053586@qq.com]
Sent: Tuesday, September 30, 2014 2:20 PM
To: user
Subject: Re: RE: Failed to active namenode when config HA

Thank you very much!
I use ZooKeeper to do automatic failover. Even though HAAdmin still cannot determine the four namenodes, the cluster launched successfully.
I think I should do more research on it. :)

------------------ Original ------------------
From:  "Brahma Reddy Battula";<br...@huawei.com>;
Send time: Tuesday, Sep 30, 2014 12:04 PM
To: "user@hadoop.apache.org"<us...@hadoop.apache.org>;
Subject:  RE: Failed to active namenode when config HA

You need to start the ZKFC process, which will monitor and manage the state of the NameNode.





Automatic failover adds two new components to an HDFS deployment: a ZooKeeper quorum, and the ZKFailoverController process (abbreviated as ZKFC).

Apache ZooKeeper is a highly available service for maintaining small amounts of coordination data, notifying clients of changes in that data, and monitoring clients for failures. The implementation of automatic HDFS failover relies on ZooKeeper for the following things:

  *   Failure detection - each of the NameNode machines in the cluster maintains a persistent session in ZooKeeper. If the machine crashes, the ZooKeeper session will expire, notifying the other NameNode that a failover should be triggered.
  *   Active NameNode election - ZooKeeper provides a simple mechanism to exclusively elect a node as active. If the current active NameNode crashes, another node may take a special exclusive lock in ZooKeeper indicating that it should become the next active.

The ZKFailoverController (ZKFC) is a new component which is a ZooKeeper client which also monitors and manages the state of the NameNode. Each of the machines which runs a NameNode also runs a ZKFC, and that ZKFC is responsible for:

  *   Health monitoring - the ZKFC pings its local NameNode on a periodic basis with a health-check command. So long as the NameNode responds in a timely fashion with a healthy status, the ZKFC considers the node healthy. If the node has crashed, frozen, or otherwise entered an unhealthy state, the health monitor will mark it as unhealthy.
  *   ZooKeeper session management - when the local NameNode is healthy, the ZKFC holds a session open in ZooKeeper. If the local NameNode is active, it also holds a special "lock" znode. This lock uses ZooKeeper's support for "ephemeral" nodes; if the session expires, the lock node will be automatically deleted.
  *   ZooKeeper-based election - if the local NameNode is healthy, and the ZKFC sees that no other node currently holds the lock znode, it will itself try to acquire the lock. If it succeeds, then it has "won the election", and is responsible for running a failover to make its local NameNode active. The failover process is similar to the manual failover described above: first, the previous active is fenced if necessary, and then the local NameNode transitions to active state.



Please go through the following link for more details:


http://hadoop.apache.org/docs/r2.5.1/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html
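
In short, the extra steps on top of a manual-failover HA setup are roughly the
following (script names and paths can differ a little between releases, so
treat this as a sketch):

hdfs zkfc -formatZK                                # run once from one NameNode to create the znode in ZooKeeper
hadoop-daemon.sh --script hdfs start zkfc          # start a ZKFC alongside each NameNode

Once the ZKFCs are running, they elect one NameNode active on their own, so
the manual transitionToActive is not needed.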




Thanks & Regards



Brahma Reddy Battula



________________________________
From: 清如许 [475053586@qq.com]
Sent: Tuesday, September 30, 2014 8:54 AM
To: user
Subject: Re: Failed to active namenode when config HA

Hi, Matt

Thank you very much for your response!

There were some mistakes in my description, as I wrote the mail in a hurry. I put those properties in hdfs-site.xml, not core-site.xml.

There are four namenodes because I am also using HDFS federation, so there are two nameservices in the property
<name>dfs.nameservices</name>
and each nameservice has two namenodes.

If I configure only HA (a single nameservice), everything is OK, and HAAdmin can determine the namenodes nn1 and nn3.

But if I configure two nameservices and set namenodes nn1,nn3 for nameservice1 and nn2,nn4 for nameservice2, I can start these namenodes successfully and they are all in standby state at the beginning. But when I want to change one namenode to the active state with the command
hdfs haadmin -transitionToActive nn1
HAAdmin throws an exception, as it cannot determine the four namenodes (nn1, nn2, nn3, nn4) at all.
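
(Maybe the nameservice has to be named explicitly once there is more than one,
something like

hdfs haadmin -ns ns1 -transitionToActive nn1

but I have not verified that option in this setup.)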

Have you configured HA & federation before, and do you know what may cause this problem?

Thanks,
Lucy

------------------ Original ------------------
From:  "Matt Narrell";<ma...@gmail.com>;
Send time: Monday, Sep 29, 2014 6:28 AM
To: "user"<us...@hadoop.apache.org>;
Subject:  Re: Failed to active namenode when config HA

I’m pretty sure HDFS HA is relegated to two name nodes (not four), designated active and standby.  Secondly, I believe these properties should be in hdfs-site.xml NOT core-site.xml.

Furthermore, I think your HDFS nameservices are misconfigured.  Consider the following:

<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/var/data/hadoop/hdfs/nn</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/var/data/hadoop/hdfs/dn</value>
  </property>

    <property>
      <name>dfs.ha.automatic-failover.enabled</name>
      <value>true</value>
    </property>
    <property>
      <name>dfs.nameservices</name>
      <value>hdfs-cluster</value>
    </property>

    <property>
      <name>dfs.ha.namenodes.hdfs-cluster</name>
      <value>nn1,nn2</value>
    </property>
      <property>
        <name>dfs.namenode.rpc-address.hdfs-cluster.nn1</name>
        <value>namenode1:8020</value>
      </property>
      <property>
        <name>dfs.namenode.http-address.hdfs-cluster.nn1</name>
        <value>namenode1:50070</value>
      </property>
      <property>
        <name>dfs.namenode.rpc-address.hdfs-cluster.nn2</name>
        <value>namenode2:8020</value>
      </property>
      <property>
        <name>dfs.namenode.http-address.hdfs-cluster.nn2</name>
        <value>namenode2:50070</value>
      </property>

    <property>
      <name>dfs.namenode.shared.edits.dir</name>
      <value>qjournal://journalnode1:8485;journalnode2:8485;journalnode3:8485/hdfs-cluster</value>
    </property>

    <property>
      <name>dfs.client.failover.proxy.provider.hdfs-cluster</name>
      <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>

    <property>
      <name>dfs.ha.fencing.methods</name>
      <value>sshfence</value>
    </property>
    <property>
      <name>dfs.ha.fencing.ssh.private-key-files</name>
      <value>/home/hadoop/.ssh/id_rsa</value>
    </property>
</configuration>

mn

On Sep 28, 2014, at 12:56 PM, 清如许 <47...@qq.com> wrote:

> Hi,
>
> I'm new to Hadoop and ran into some problems when configuring HA.
> Below is some important configuration in core-site.xml:
>
>   <property>
>     <name>dfs.nameservices</name>
>     <value>ns1,ns2</value>
>   </property>
>   <property>
>     <name>dfs.ha.namenodes.ns1</name>
>     <value>nn1,nn3</value>
>   </property>
>   <property>
>     <name>dfs.ha.namenodes.ns2</name>
>     <value>nn2,nn4</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address.ns1.nn1</name>
>     <value>namenode1:9000</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address.ns1.nn3</name>
>     <value>namenode3:9000</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address.ns2.nn2</name>
>     <value>namenode2:9000</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address.ns2.nn4</name>
>     <value>namenode4:9000</value>
>   </property>
>   <property>
>     <name>dfs.namenode.shared.edits.dir</name>
>     <value>qjournal://datanode2:8485;datanode3:8485;datanode4:8485/ns1</value>
>   </property>
>   <property>
>     <name>dfs.client.failover.proxy.provider.ns1</name>
>     <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
>   </property>
>   <property>
>     <name>dfs.ha.fencing.methods</name>
>     <value>sshfence</value>
>   </property>
>   <property>
>     <name>dfs.ha.fencing.ssh.private-key-files</name>
>     <value>/home/hduser/.ssh/id_rsa</value>
>   </property>
>   <property>
>     <name>dfs.ha.fencing.ssh.connect-timeout</name>
>     <value>30000</value>
>   </property>
>   <property>
>     <name>dfs.journalnode.edits.dir</name>
>     <value>/home/hduser/mydata/hdfs/journalnode</value>
>   </property>
>
> (The two nameservices ns1, ns2 are for configuring federation later. In this step, I only want to launch ns1 on namenode1 and namenode3.)
>
> After configuration, I did the following steps:
> first, I started the journalnode on datanode2, datanode3, and datanode4;
> second, I formatted datanode1 and started the namenode on it;
> then I ran 'hdfs namenode -bootstrapStandby' on the other namenode and started the namenode on it.
>
> Everything seems fine except that no namenode is active now, so I tried to activate one by running
> hdfs haadmin -transitionToActive nn1 on namenode1
> but strangely it says "Illegal argument: Unable to determine the nameservice id."
>
> Could anyone tell me why it cannot determine nn1 from my configuration?
> Is there something wrong in my configuration?
>
> Thanks a lot!!!
>
>

RE: RE: Failed to active namenode when config HA

Posted by Brahma Reddy Battula <br...@huawei.com>.
Can you please send out ZKFC logs and configurations..




Thanks & Regards



Brahma Reddy Battula




________________________________
From: 清如许 [475053586@qq.com]
Sent: Tuesday, September 30, 2014 2:20 PM
To: user
Subject: Re: RE: Failed to active namenode when config HA

Thank you very much!
I use zookeeper to do automatic failover. Even HAAdmin still can not determine the four namenodes, but the cluster launched successfully.
I think i should do more research on it. :)

------------------ Original ------------------
From:  "Brahma Reddy Battula";<br...@huawei.com>;
Send time: Tuesday, Sep 30, 2014 12:04 PM
To: "user@hadoop.apache.org"<us...@hadoop.apache.org>;
Subject:  RE: Failed to active namenode when config HA

You need to start the ZKFC process which will monitor and manage  the state of namenode.





Automatic failover adds two new components to an HDFS deployment: a ZooKeeper quorum, and the ZKFailoverController process (abbreviated as ZKFC).

Apache ZooKeeper is a highly available service for maintaining small amounts of coordination data, notifying clients of changes in that data, and monitoring clients for failures. The implementation of automatic HDFS failover relies on ZooKeeper for the following things:

  *   Failure detection - each of the NameNode machines in the cluster maintains a persistent session in ZooKeeper. If the machine crashes, the ZooKeeper session will expire, notifying the other NameNode that a failover should be triggered.
  *   Active NameNode election - ZooKeeper provides a simple mechanism to exclusively elect a node as active. If the current active NameNode crashes, another node may take a special exclusive lock in ZooKeeper indicating that it should become the next active.

The ZKFailoverController (ZKFC) is a new component which is a ZooKeeper client which also monitors and manages the state of the NameNode. Each of the machines which runs a NameNode also runs a ZKFC, and that ZKFC is responsible for:

  *   Health monitoring - the ZKFC pings its local NameNode on a periodic basis with a health-check command. So long as the NameNode responds in a timely fashion with a healthy status, the ZKFC considers the node healthy. If the node has crashed, frozen, or otherwise entered an unhealthy state, the health monitor will mark it as unhealthy.
  *   ZooKeeper session management - when the local NameNode is healthy, the ZKFC holds a session open in ZooKeeper. If the local NameNode is active, it also holds a special "lock" znode. This lock uses ZooKeeper's support for "ephemeral" nodes; if the session expires, the lock node will be automatically deleted.
  *   ZooKeeper-based election - if the local NameNode is healthy, and the ZKFC sees that no other node currently holds the lock znode, it will itself try to acquire the lock. If it succeeds, then it has "won the election", and is responsible for running a failover to make its local NameNode active. The failover process is similar to the manual failover described above: first, the previous active is fenced if necessary, and then the local NameNode transitions to active state.



Please go through following link for more details..


http://hadoop.apache.org/docs/r2.5.1/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html




Thanks & Regards



Brahma Reddy Battula



________________________________
From: 清如许 [475053586@qq.com]
Sent: Tuesday, September 30, 2014 8:54 AM
To: user
Subject: Re: Failed to active namenode when config HA

Hi, Matt

Thank you very much for your response!

There were some mistakes in my description as i wrote this mail in a hurry. I put those properties is in hdfs-site.xml not core-site.xml.

There are four name nodes because i also using HDFS federation, so there are two nameservices in porperty
<name>dfs.nameservices</name>
and each nameservice will have two namenodes.

If i configure only HA (only one nameservice), everything is ok, and HAAdmin can determine the namenodes nn1, nn3.

But if i configure two nameservice and set namenodes nn1,nn3 for nameservice1 and nn2,nn4 for nameservices2. I can start these namenodes successfully and the namenodes are all in standby state at th beginning. But if i want to change one namenode to active state, use command
hdfs haadmin -transitionToActive nn1
HAAdmin throw exception as it cannot determine the four namenodes(nn1,nn2,nn3,nn4) at all.

Do you used to configure HA&Federation and know what may cause these problem?

Thanks,
Lucy

------------------ Original ------------------
From:  "Matt Narrell";<ma...@gmail.com>;
Send time: Monday, Sep 29, 2014 6:28 AM
To: "user"<us...@hadoop.apache.org>;
Subject:  Re: Failed to active namenode when config HA

I’m pretty sure HDFS HA is relegated to two name nodes (not four), designated active and standby.  Secondly, I believe these properties should be in hdfs-site.xml NOT core-site.xml.

Furthermore, I think your HDFS nameservices are misconfigured.  Consider the following:

<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/var/data/hadoop/hdfs/nn</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/var/data/hadoop/hdfs/dn</value>
  </property>

    <property>
      <name>dfs.ha.automatic-failover.enabled</name>
      <value>true</value>
    </property>
    <property>
      <name>dfs.nameservices</name>
      <value>hdfs-cluster</value>
    </property>

    <property>
      <name>dfs.ha.namenodes.hdfs-cluster</name>
      <value>nn1,nn2</value>
    </property>
      <property>
        <name>dfs.namenode.rpc-address.hdfs-cluster.nn1</name>
        <value>namenode1:8020</value>
      </property>
      <property>
        <name>dfs.namenode.http-address.hdfs-cluster.nn1</name>
        <value>namenode1:50070</value>
      </property>
      <property>
        <name>dfs.namenode.rpc-address.hdfs-cluster.nn2</name>
        <value>namenode2:8020</value>
      </property>
      <property>
        <name>dfs.namenode.http-address.hdfs-cluster.nn2</name>
        <value>namenode2:50070</value>
      </property>

    <property>
      <name>dfs.namenode.shared.edits.dir</name>
      <value>qjournal://journalnode1:8485;journalnode2:8485;journalnode3:8485/hdfs-cluster</value>
    </property>

    <property>
      <name>dfs.client.failover.proxy.provider.hdfs-cluster</name>
      <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>

    <property>
      <name>dfs.ha.fencing.methods</name>
      <value>sshfence</value>
    </property>
    <property>
      <name>dfs.ha.fencing.ssh.private-key-files</name>
      <value>/home/hadoop/.ssh/id_rsa</value>
    </property>
</configuration>

mn

On Sep 28, 2014, at 12:56 PM, 清如许 <47...@qq.com> wrote:

> Hi,
>
> I'm new to hadoop and meet some problems when config HA.
> Below are some important configuration in core-site.xml
>
>   <property>
>     <name>dfs.nameservices</name>
>     <value>ns1,ns2</value>
>   </property>
>   <property>
>     <name>dfs.ha.namenodes.ns1</name>
>     <value>nn1,nn3</value>
>   </property>
>   <property>
>     <name>dfs.ha.namenodes.ns2</name>
>     <value>nn2,nn4</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address.ns1.nn1</name>
>     <value>namenode1:9000</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address.ns1.nn3</name>
>     <value>namenode3:9000</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address.ns2.nn2</name>
>     <value>namenode2:9000</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address.ns2.nn4</name>
>     <value>namenode4:9000</value>
>   </property>
>   <property>
>     <name>dfs.namenode.shared.edits.dir</name>
>     <value>qjournal://datanode2:8485;datanode3:8485;datanode4:8485/ns1</value>
>   </property>
>   <property>
>     <name>dfs.client.failover.proxy.provider.ns1</name>
>     <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
>   </property>
>   <property>
>     <name>dfs.ha.fencing.methods</name>
>     <value>sshfence</value>
>   </property>
>   <property>
>     <name>dfs.ha.fencing.ssh.private-key-files</name>
>     <value>/home/hduser/.ssh/id_rsa</value>
>   </property>
>   <property>
>     <name>dfs.ha.fencing.ssh.connect-timeout</name>
>     <value>30000</value>
>   </property>
>   <property>
>     <name>dfs.journalnode.edits.dir</name>
>     <value>/home/hduser/mydata/hdfs/journalnode</value>
>   </property>
>
> (two nameservice ns1,ns2 is for configuring federation later. In this step, I only want launch ns1 on namenode1,namenode3)
>
> After configuration, I did the following steps
> firstly,  I start jornalnode on datanode2,datanode3,datanode4
> secondly I format datanode1 and start namenode on it
> then i run 'hdfs namenode -bootstrapStandby' on the other namenode and start namenode on it
>
> Everything seems fine unless no namenode is active now, then i tried to active one by running
> hdfs haadmin -transitionToActive nn1 on namenode1
> but strangely it says "Illegal argument: Unable to determine the nameservice id."
>
> Could anyone tell me why it cannot determine nn1 from my configuration?
> Is there something wrong in my configuraion?
>
> Thanks a lot!!!
>
>

RE: RE: Failed to active namenode when config HA

Posted by Brahma Reddy Battula <br...@huawei.com>.
Can you please send out ZKFC logs and configurations..




Thanks & Regards



Brahma Reddy Battula




________________________________
From: 清如许 [475053586@qq.com]
Sent: Tuesday, September 30, 2014 2:20 PM
To: user
Subject: Re: RE: Failed to active namenode when config HA

Thank you very much!
I use zookeeper to do automatic failover. Even HAAdmin still can not determine the four namenodes, but the cluster launched successfully.
I think i should do more research on it. :)

------------------ Original ------------------
From:  "Brahma Reddy Battula";<br...@huawei.com>;
Send time: Tuesday, Sep 30, 2014 12:04 PM
To: "user@hadoop.apache.org"<us...@hadoop.apache.org>;
Subject:  RE: Failed to active namenode when config HA

You need to start the ZKFC process which will monitor and manage  the state of namenode.





Automatic failover adds two new components to an HDFS deployment: a ZooKeeper quorum, and the ZKFailoverController process (abbreviated as ZKFC).

Apache ZooKeeper is a highly available service for maintaining small amounts of coordination data, notifying clients of changes in that data, and monitoring clients for failures. The implementation of automatic HDFS failover relies on ZooKeeper for the following things:

  *   Failure detection - each of the NameNode machines in the cluster maintains a persistent session in ZooKeeper. If the machine crashes, the ZooKeeper session will expire, notifying the other NameNode that a failover should be triggered.
  *   Active NameNode election - ZooKeeper provides a simple mechanism to exclusively elect a node as active. If the current active NameNode crashes, another node may take a special exclusive lock in ZooKeeper indicating that it should become the next active.

The ZKFailoverController (ZKFC) is a new component which is a ZooKeeper client which also monitors and manages the state of the NameNode. Each of the machines which runs a NameNode also runs a ZKFC, and that ZKFC is responsible for:

  *   Health monitoring - the ZKFC pings its local NameNode on a periodic basis with a health-check command. So long as the NameNode responds in a timely fashion with a healthy status, the ZKFC considers the node healthy. If the node has crashed, frozen, or otherwise entered an unhealthy state, the health monitor will mark it as unhealthy.
  *   ZooKeeper session management - when the local NameNode is healthy, the ZKFC holds a session open in ZooKeeper. If the local NameNode is active, it also holds a special "lock" znode. This lock uses ZooKeeper's support for "ephemeral" nodes; if the session expires, the lock node will be automatically deleted.
  *   ZooKeeper-based election - if the local NameNode is healthy, and the ZKFC sees that no other node currently holds the lock znode, it will itself try to acquire the lock. If it succeeds, then it has "won the election", and is responsible for running a failover to make its local NameNode active. The failover process is similar to the manual failover described above: first, the previous active is fenced if necessary, and then the local NameNode transitions to active state.



Please go through the following link for more details.


http://hadoop.apache.org/docs/r2.5.1/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html
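
For reference, a minimal sketch of the commands this involves on a Hadoop 2.x cluster. It assumes ha.zookeeper.quorum is already set in core-site.xml and that the sbin scripts are on the PATH; the exact daemon-script invocation can differ slightly between releases, so treat this as illustrative rather than authoritative:

hdfs zkfc -formatZK                     # one-time step: creates the HA znodes in ZooKeeper
sbin/hadoop-daemon.sh start zkfc        # run on every NameNode host
hdfs haadmin -getServiceState nn1       # check which NameNode the ZKFC elected active (add -ns <nameservice> with federation)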




Thanks & Regards



Brahma Reddy Battula



________________________________
From: 清如许 [475053586@qq.com]
Sent: Tuesday, September 30, 2014 8:54 AM
To: user
Subject: Re: Failed to active namenode when config HA

Hi, Matt

Thank you very much for your response!

There were some mistakes in my description, as I wrote that mail in a hurry. I put those properties in hdfs-site.xml, not core-site.xml.

There are four namenodes because I am also using HDFS federation, so there are two nameservices in the property
<name>dfs.nameservices</name>
and each nameservice has two namenodes.

If I configure only HA (only one nameservice), everything is OK, and HAAdmin can determine the namenodes nn1 and nn3.

But if I configure two nameservices, with namenodes nn1,nn3 for nameservice1 and nn2,nn4 for nameservice2, I can still start the namenodes successfully and they are all in standby state at the beginning. But when I want to change one namenode to active state with the command
hdfs haadmin -transitionToActive nn1
HAAdmin throws an exception, as it cannot determine any of the four namenodes (nn1, nn2, nn3, nn4).
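
A hedged sketch of how this is usually handled on Hadoop 2.x: hdfs haadmin resolves a bare namenode id against a single nameservice, so with federation it generally has to be told which nameservice to use via the -ns option (check the haadmin usage output of your release). Using the ids from this thread:

hdfs haadmin -ns ns1 -transitionToActive nn1    # resolve nn1 within nameservice ns1
hdfs haadmin -ns ns1 -getServiceState nn3       # likewise for the other ns1 namenode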

Have you configured HA & Federation before, and do you know what may cause this problem?

Thanks,
Lucy

------------------ Original ------------------
From:  "Matt Narrell";<ma...@gmail.com>;
Send time: Monday, Sep 29, 2014 6:28 AM
To: "user"<us...@hadoop.apache.org>;
Subject:  Re: Failed to active namenode when config HA

I’m pretty sure HDFS HA is relegated to two name nodes (not four), designated active and standby.  Secondly, I believe these properties should be in hdfs-site.xml NOT core-site.xml.

Furthermore, I think your HDFS nameservices are misconfigured.  Consider the following:

<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/var/data/hadoop/hdfs/nn</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/var/data/hadoop/hdfs/dn</value>
  </property>

    <property>
      <name>dfs.ha.automatic-failover.enabled</name>
      <value>true</value>
    </property>
    <property>
      <name>dfs.nameservices</name>
      <value>hdfs-cluster</value>
    </property>

    <property>
      <name>dfs.ha.namenodes.hdfs-cluster</name>
      <value>nn1,nn2</value>
    </property>
      <property>
        <name>dfs.namenode.rpc-address.hdfs-cluster.nn1</name>
        <value>namenode1:8020</value>
      </property>
      <property>
        <name>dfs.namenode.http-address.hdfs-cluster.nn1</name>
        <value>namenode1:50070</value>
      </property>
      <property>
        <name>dfs.namenode.rpc-address.hdfs-cluster.nn2</name>
        <value>namenode2:8020</value>
      </property>
      <property>
        <name>dfs.namenode.http-address.hdfs-cluster.nn2</name>
        <value>namenode2:50070</value>
      </property>

    <property>
      <name>dfs.namenode.shared.edits.dir</name>
      <value>qjournal://journalnode1:8485;journalnode2:8485;journalnode3:8485/hdfs-cluster</value>
    </property>

    <property>
      <name>dfs.client.failover.proxy.provider.hdfs-cluster</name>
      <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>

    <property>
      <name>dfs.ha.fencing.methods</name>
      <value>sshfence</value>
    </property>
    <property>
      <name>dfs.ha.fencing.ssh.private-key-files</name>
      <value>/home/hadoop/.ssh/id_rsa</value>
    </property>
</configuration>

mn
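
One way to check how a client (and therefore haadmin) actually resolves these ids is hdfs getconf. A small sketch, assuming the hdfs-site.xml above is the one on the client's classpath:

hdfs getconf -confKey dfs.nameservices    # prints the configured nameservice id(s)
hdfs getconf -namenodes                   # lists the namenode hosts resolved from the config
hdfs getconf -nnRpcAddresses              # RPC address for each nameservice and namenode id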

On Sep 28, 2014, at 12:56 PM, 清如许 <47...@qq.com> wrote:

> Hi,
>
> I'm new to Hadoop and met some problems when configuring HA.
> Below is some of the important configuration in core-site.xml
>
>   <property>
>     <name>dfs.nameservices</name>
>     <value>ns1,ns2</value>
>   </property>
>   <property>
>     <name>dfs.ha.namenodes.ns1</name>
>     <value>nn1,nn3</value>
>   </property>
>   <property>
>     <name>dfs.ha.namenodes.ns2</name>
>     <value>nn2,nn4</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address.ns1.nn1</name>
>     <value>namenode1:9000</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address.ns1.nn3</name>
>     <value>namenode3:9000</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address.ns2.nn2</name>
>     <value>namenode2:9000</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address.ns2.nn4</name>
>     <value>namenode4:9000</value>
>   </property>
>   <property>
>     <name>dfs.namenode.shared.edits.dir</name>
>     <value>qjournal://datanode2:8485;datanode3:8485;datanode4:8485/ns1</value>
>   </property>
>   <property>
>     <name>dfs.client.failover.proxy.provider.ns1</name>
>     <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
>   </property>
>   <property>
>     <name>dfs.ha.fencing.methods</name>
>     <value>sshfence</value>
>   </property>
>   <property>
>     <name>dfs.ha.fencing.ssh.private-key-files</name>
>     <value>/home/hduser/.ssh/id_rsa</value>
>   </property>
>   <property>
>     <name>dfs.ha.fencing.ssh.connect-timeout</name>
>     <value>30000</value>
>   </property>
>   <property>
>     <name>dfs.journalnode.edits.dir</name>
>     <value>/home/hduser/mydata/hdfs/journalnode</value>
>   </property>
>
> (The two nameservices ns1,ns2 are for configuring federation later. In this step, I only want to launch ns1 on namenode1,namenode3.)
>
> After configuration, I did the following steps:
> first, I start the journalnode on datanode2,datanode3,datanode4
> second, I format datanode1 and start the namenode on it
> then I run 'hdfs namenode -bootstrapStandby' on the other namenode and start the namenode on it
>
> Everything seems fine except that no namenode is active now, so I tried to activate one by running
> hdfs haadmin -transitionToActive nn1 on namenode1
> but strangely it says "Illegal argument: Unable to determine the nameservice id."
>
> Could anyone tell me why it cannot determine nn1 from my configuration?
> Is there something wrong in my configuration?
>
> Thanks a lot!!!
>
>
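
To make the quoted sequence concrete, a hedged sketch of the same steps as Hadoop 2.x shell commands; the hosts follow the thread's description and the daemon-script form may differ slightly between releases:

sbin/hadoop-daemon.sh start journalnode    # on datanode2, datanode3 and datanode4
hdfs namenode -format                      # on the first ns1 namenode host
sbin/hadoop-daemon.sh start namenode       # on that host
hdfs namenode -bootstrapStandby            # on the other ns1 namenode host
sbin/hadoop-daemon.sh start namenode       # then start the namenode there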

Re: RE: Failed to active namenode when config HA

Posted by 清如许 <47...@qq.com>.
Thank you very much!
I use ZooKeeper to do automatic failover. Even though HAAdmin still cannot determine the four namenodes, the cluster launched successfully.
I think I should do more research on it. :)


------------------ Original ------------------
From:  "Brahma Reddy Battula";<br...@huawei.com>;
Send time: Tuesday, Sep 30, 2014 12:04 PM
To: "user@hadoop.apache.org"<us...@hadoop.apache.org>; 

Subject:  RE:  Failed to active namenode when config HA



 You need to start the ZKFC process, which will monitor and manage the state of the NameNode.
 
 
 
 
  
Automatic failover adds two new components to an HDFS deployment: a ZooKeeper quorum, and the ZKFailoverController process (abbreviated as ZKFC).

Apache ZooKeeper is a highly available service for maintaining small amounts of coordination data, notifying clients of changes in that data, and monitoring clients for failures. The implementation of automatic HDFS failover relies on ZooKeeper for the following things:

  *   Failure detection - each of the NameNode machines in the cluster maintains a persistent session in ZooKeeper. If the machine crashes, the ZooKeeper session will expire, notifying the other NameNode that a failover should be triggered.
  *   Active NameNode election - ZooKeeper provides a simple mechanism to exclusively elect a node as active. If the current active NameNode crashes, another node may take a special exclusive lock in ZooKeeper indicating that it should become the next active.

The ZKFailoverController (ZKFC) is a new component which is a ZooKeeper client which also monitors and manages the state of the NameNode. Each of the machines which runs a NameNode also runs a ZKFC, and that ZKFC is responsible for:

  *   Health monitoring - the ZKFC pings its local NameNode on a periodic basis with a health-check command. So long as the NameNode responds in a timely fashion with a healthy status, the ZKFC considers the node healthy. If the node has crashed, frozen, or otherwise entered an unhealthy state, the health monitor will mark it as unhealthy.
  *   ZooKeeper session management - when the local NameNode is healthy, the ZKFC holds a session open in ZooKeeper. If the local NameNode is active, it also holds a special "lock" znode. This lock uses ZooKeeper's support for "ephemeral" nodes; if the session expires, the lock node will be automatically deleted.
  *   ZooKeeper-based election - if the local NameNode is healthy, and the ZKFC sees that no other node currently holds the lock znode, it will itself try to acquire the lock. If it succeeds, then it has "won the election", and is responsible for running a failover to make its local NameNode active. The failover process is similar to the manual failover described above: first, the previous active is fenced if necessary, and then the local NameNode transitions to active state.

Please go through the following link for more details.

http://hadoop.apache.org/docs/r2.5.1/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html
 
 
 
  
Thanks & Regards
 
 
 
Brahma Reddy Battula
 
 
 
 From: 清如许 [475053586@qq.com]
 Sent: Tuesday, September 30, 2014 8:54 AM
 To: user
 Subject: Re: Failed to active namenode when config HA
 
 
 Hi, Matt
 
 Thank you very much for your response!
 
 There were some mistakes in my description, as I wrote that mail in a hurry. I put those properties in hdfs-site.xml, not core-site.xml.

 There are four namenodes because I am also using HDFS federation, so there are two nameservices in the property
 <name>dfs.nameservices</name>
 and each nameservice has two namenodes.

 If I configure only HA (only one nameservice), everything is OK, and HAAdmin can determine the namenodes nn1 and nn3.

 But if I configure two nameservices, with namenodes nn1,nn3 for nameservice1 and nn2,nn4 for nameservice2, I can still start the namenodes successfully and they are all in standby state at the beginning. But when I want to change one namenode to active state with the command
 hdfs haadmin -transitionToActive nn1
 HAAdmin throws an exception, as it cannot determine any of the four namenodes (nn1, nn2, nn3, nn4).

 Have you configured HA & Federation before, and do you know what may cause this problem?
 
 Thanks,
 Lucy
  
 
 ------------------ Original ------------------
  From:  "Matt Narrell";<ma...@gmail.com>;
 Send time: Monday, Sep 29, 2014 6:28 AM
 To: "user"<us...@hadoop.apache.org>; 
 Subject:  Re: Failed to active namenode when config HA
 
 
 
 I’m pretty sure HDFS HA is relegated to two name nodes (not four), designated active and standby.  Secondly, I believe these properties should be in hdfs-site.xml NOT core-site.xml.
 
 Furthermore, I think your HDFS nameservices are misconfigured.  Consider the following:
 
 <?xml version="1.0"?>
 <configuration>
   <property>
     <name>dfs.replication</name>
     <value>3</value>
   </property>
   <property>
     <name>dfs.namenode.name.dir</name>
     <value>file:/var/data/hadoop/hdfs/nn</value>
   </property>
   <property>
     <name>dfs.datanode.data.dir</name>
     <value>file:/var/data/hadoop/hdfs/dn</value>
   </property>
 
     <property>
       <name>dfs.ha.automatic-failover.enabled</name>
       <value>true</value>
     </property>
     <property>
       <name>dfs.nameservices</name>
       <value>hdfs-cluster</value>
     </property>
 
     <property>
       <name>dfs.ha.namenodes.hdfs-cluster</name>
       <value>nn1,nn2</value>
     </property>
       <property>
         <name>dfs.namenode.rpc-address.hdfs-cluster.nn1</name>
         <value>namenode1:8020</value>
       </property>
       <property>
         <name>dfs.namenode.http-address.hdfs-cluster.nn1</name>
         <value>namenode1:50070</value>
       </property>
       <property>
         <name>dfs.namenode.rpc-address.hdfs-cluster.nn2</name>
         <value>namenode2:8020</value>
       </property>
       <property>
         <name>dfs.namenode.http-address.hdfs-cluster.nn2</name>
         <value>namenode2:50070</value>
       </property>
 
     <property>
       <name>dfs.namenode.shared.edits.dir</name>
       <value>qjournal://journalnode1:8485;journalnode2:8485;journalnode3:8485/hdfs-cluster</value>
     </property>
 
     <property>
       <name>dfs.client.failover.proxy.provider.hdfs-cluster</name>
       <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
     </property>
 
     <property>
       <name>dfs.ha.fencing.methods</name>
       <value>sshfence</value>
     </property>
     <property>
       <name>dfs.ha.fencing.ssh.private-key-files</name>
       <value>/home/hadoop/.ssh/id_rsa</value>
     </property>
 </configuration>
 
 mn
 
 On Sep 28, 2014, at 12:56 PM, 清如许 <47...@qq.com> wrote:
 
 > Hi,
 > 
 > I'm new to Hadoop and met some problems when configuring HA.
 > Below is some of the important configuration in core-site.xml
 > 
 >   <property>
 >     <name>dfs.nameservices</name>
 >     <value>ns1,ns2</value>
 >   </property>
 >   <property>
 >     <name>dfs.ha.namenodes.ns1</name>
 >     <value>nn1,nn3</value>
 >   </property>
 >   <property>
 >     <name>dfs.ha.namenodes.ns2</name>
 >     <value>nn2,nn4</value>
 >   </property>
 >   <property>
 >     <name>dfs.namenode.rpc-address.ns1.nn1</name>
 >     <value>namenode1:9000</value>
 >   </property>
 >   <property>
 >     <name>dfs.namenode.rpc-address.ns1.nn3</name>
 >     <value>namenode3:9000</value>
 >   </property>
 >   <property>
 >     <name>dfs.namenode.rpc-address.ns2.nn2</name>
 >     <value>namenode2:9000</value>
 >   </property>
 >   <property>
 >     <name>dfs.namenode.rpc-address.ns2.nn4</name>
 >     <value>namenode4:9000</value>
 >   </property>
 >   <property>
 >     <name>dfs.namenode.shared.edits.dir</name>
 >     <value>qjournal://datanode2:8485;datanode3:8485;datanode4:8485/ns1</value>
 >   </property>
 >   <property>
 >     <name>dfs.client.failover.proxy.provider.ns1</name>
 >     <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
 >   </property>
 >   <property>
 >     <name>dfs.ha.fencing.methods</name>
 >     <value>sshfence</value>
 >   </property>
 >   <property>
 >     <name>dfs.ha.fencing.ssh.private-key-files</name>
 >     <value>/home/hduser/.ssh/id_rsa</value>
 >   </property>
 >   <property>
 >     <name>dfs.ha.fencing.ssh.connect-timeout</name>
 >     <value>30000</value>
 >   </property>
 >   <property>
 >     <name>dfs.journalnode.edits.dir</name>
 >     <value>/home/hduser/mydata/hdfs/journalnode</value>
 >   </property>
 > 
 > (The two nameservices ns1,ns2 are for configuring federation later. In this step, I only want to launch ns1 on namenode1,namenode3.)
 > 
 > After configuration, I did the following steps:
 > first, I start the journalnode on datanode2,datanode3,datanode4
 > second, I format datanode1 and start the namenode on it
 > then I run 'hdfs namenode -bootstrapStandby' on the other namenode and start the namenode on it
 > 
 > Everything seems fine except that no namenode is active now, so I tried to activate one by running
 > hdfs haadmin -transitionToActive nn1 on namenode1
 > but strangely it says "Illegal argument: Unable to determine the nameservice id."
 > 
 > Could anyone tell me why it cannot determine nn1 from my configuration?
 > Is there something wrong in my configuration?
 > 
 > Thanks a lot!!!
 > 
 >

RE: Failed to active namenode when config HA

Posted by Brahma Reddy Battula <br...@huawei.com>.
You need to start the ZKFC process, which will monitor and manage the state of the NameNode.





Automatic failover adds two new components to an HDFS deployment: a ZooKeeper quorum, and the ZKFailoverController process (abbreviated as ZKFC).

Apache ZooKeeper is a highly available service for maintaining small amounts of coordination data, notifying clients of changes in that data, and monitoring clients for failures. The implementation of automatic HDFS failover relies on ZooKeeper for the following things:

  *   Failure detection - each of the NameNode machines in the cluster maintains a persistent session in ZooKeeper. If the machine crashes, the ZooKeeper session will expire, notifying the other NameNode that a failover should be triggered.
  *   Active NameNode election - ZooKeeper provides a simple mechanism to exclusively elect a node as active. If the current active NameNode crashes, another node may take a special exclusive lock in ZooKeeper indicating that it should become the next active.

The ZKFailoverController (ZKFC) is a new component which is a ZooKeeper client which also monitors and manages the state of the NameNode. Each of the machines which runs a NameNode also runs a ZKFC, and that ZKFC is responsible for:

  *   Health monitoring - the ZKFC pings its local NameNode on a periodic basis with a health-check command. So long as the NameNode responds in a timely fashion with a healthy status, the ZKFC considers the node healthy. If the node has crashed, frozen, or otherwise entered an unhealthy state, the health monitor will mark it as unhealthy.
  *   ZooKeeper session management - when the local NameNode is healthy, the ZKFC holds a session open in ZooKeeper. If the local NameNode is active, it also holds a special "lock" znode. This lock uses ZooKeeper's support for "ephemeral" nodes; if the session expires, the lock node will be automatically deleted.
  *   ZooKeeper-based election - if the local NameNode is healthy, and the ZKFC sees that no other node currently holds the lock znode, it will itself try to acquire the lock. If it succeeds, then it has "won the election", and is responsible for running a failover to make its local NameNode active. The failover process is similar to the manual failover described above: first, the previous active is fenced if necessary, and then the local NameNode transitions to active state.



Please go through the following link for more details.


http://hadoop.apache.org/docs/r2.5.1/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html




Thanks & Regards



Brahma Reddy Battula



________________________________
From: 清如许 [475053586@qq.com]
Sent: Tuesday, September 30, 2014 8:54 AM
To: user
Subject: Re: Failed to active namenode when config HA

Hi, Matt

Thank you very much for your response!

There were some mistakes in my description, as I wrote that mail in a hurry. I put those properties in hdfs-site.xml, not core-site.xml.

There are four namenodes because I am also using HDFS federation, so there are two nameservices in the property
<name>dfs.nameservices</name>
and each nameservice has two namenodes.

If I configure only HA (only one nameservice), everything is OK, and HAAdmin can determine the namenodes nn1 and nn3.

But if I configure two nameservices, with namenodes nn1,nn3 for nameservice1 and nn2,nn4 for nameservice2, I can still start the namenodes successfully and they are all in standby state at the beginning. But when I want to change one namenode to active state with the command
hdfs haadmin -transitionToActive nn1
HAAdmin throws an exception, as it cannot determine any of the four namenodes (nn1, nn2, nn3, nn4).

Have you configured HA & Federation before, and do you know what may cause this problem?

Thanks,
Lucy

------------------ Original ------------------
From:  "Matt Narrell";<ma...@gmail.com>;
Send time: Monday, Sep 29, 2014 6:28 AM
To: "user"<us...@hadoop.apache.org>;
Subject:  Re: Failed to active namenode when config HA

I’m pretty sure HDFS HA is relegated to two name nodes (not four), designated active and standby.  Secondly, I believe these properties should be in hdfs-site.xml NOT core-site.xml.

Furthermore, I think your HDFS nameservices are misconfigured.  Consider the following:

<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/var/data/hadoop/hdfs/nn</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/var/data/hadoop/hdfs/dn</value>
  </property>

    <property>
      <name>dfs.ha.automatic-failover.enabled</name>
      <value>true</value>
    </property>
    <property>
      <name>dfs.nameservices</name>
      <value>hdfs-cluster</value>
    </property>

    <property>
      <name>dfs.ha.namenodes.hdfs-cluster</name>
      <value>nn1,nn2</value>
    </property>
      <property>
        <name>dfs.namenode.rpc-address.hdfs-cluster.nn1</name>
        <value>namenode1:8020</value>
      </property>
      <property>
        <name>dfs.namenode.http-address.hdfs-cluster.nn1</name>
        <value>namenode1:50070</value>
      </property>
      <property>
        <name>dfs.namenode.rpc-address.hdfs-cluster.nn2</name>
        <value>namenode2:8020</value>
      </property>
      <property>
        <name>dfs.namenode.http-address.hdfs-cluster.nn2</name>
        <value>namenode2:50070</value>
      </property>

    <property>
      <name>dfs.namenode.shared.edits.dir</name>
      <value>qjournal://journalnode1:8485;journalnode2:8485;journalnode3:8485/hdfs-cluster</value>
    </property>

    <property>
      <name>dfs.client.failover.proxy.provider.hdfs-cluster</name>
      <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>

    <property>
      <name>dfs.ha.fencing.methods</name>
      <value>sshfence</value>
    </property>
    <property>
      <name>dfs.ha.fencing.ssh.private-key-files</name>
      <value>/home/hadoop/.ssh/id_rsa</value>
    </property>
</configuration>

mn
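
A side note, as a sketch assuming the logical nameservice name hdfs-cluster used above: for clients to resolve that name through the ConfiguredFailoverProxyProvider, core-site.xml would typically also point the default filesystem at it,

  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hdfs-cluster</value>
  </property>

so that paths like hdfs://hdfs-cluster/tmp reach whichever NameNode is currently active.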

On Sep 28, 2014, at 12:56 PM, 清如许 <47...@qq.com> wrote:

> Hi,
>
> I'm new to Hadoop and met some problems when configuring HA.
> Below are some important configuration in core-site.xml
>
>   <property>
>     <name>dfs.nameservices</name>
>     <value>ns1,ns2</value>
>   </property>
>   <property>
>     <name>dfs.ha.namenodes.ns1</name>
>     <value>nn1,nn3</value>
>   </property>
>   <property>
>     <name>dfs.ha.namenodes.ns2</name>
>     <value>nn2,nn4</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address.ns1.nn1</name>
>     <value>namenode1:9000</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address.ns1.nn3</name>
>     <value>namenode3:9000</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address.ns2.nn2</name>
>     <value>namenode2:9000</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address.ns2.nn4</name>
>     <value>namenode4:9000</value>
>   </property>
>   <property>
>     <name>dfs.namenode.shared.edits.dir</name>
>     <value>qjournal://datanode2:8485;datanode3:8485;datanode4:8485/ns1</value>
>   </property>
>   <property>
>     <name>dfs.client.failover.proxy.provider.ns1</name>
>     <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
>   </property>
>   <property>
>     <name>dfs.ha.fencing.methods</name>
>     <value>sshfence</value>
>   </property>
>   <property>
>     <name>dfs.ha.fencing.ssh.private-key-files</name>
>     <value>/home/hduser/.ssh/id_rsa</value>
>   </property>
>   <property>
>     <name>dfs.ha.fencing.ssh.connect-timeout</name>
>     <value>30000</value>
>   </property>
>   <property>
>     <name>dfs.journalnode.edits.dir</name>
>     <value>/home/hduser/mydata/hdfs/journalnode</value>
>   </property>
>
> (the two nameservices ns1, ns2 are for configuring federation later. In this step, I only want to launch ns1 on namenode1, namenode3)
>
> After configuration, I did the following steps:
> firstly, I start the journalnode on datanode2, datanode3, datanode4
> secondly, I format datanode1 and start the namenode on it
> then I run 'hdfs namenode -bootstrapStandby' on the other namenode and start the namenode on it
>
> Everything seems fine except that no namenode is active now, so I tried to activate one by running
> hdfs haadmin -transitionToActive nn1 on namenode1
> but strangely it says "Illegal argument: Unable to determine the nameservice id."
>
> Could anyone tell me why it cannot determine nn1 from my configuration?
> Is there something wrong in my configuration?
>
> Thanks a lot!!!
>
>

Re: Failed to active namenode when config HA

Posted by Matt Narrell <ma...@gmail.com>.
Lucy, 

I’m sorry, I’m only doing HDFS HA, not federated HDFS.

mn

On Sep 29, 2014, at 9:24 PM, 清如许 <47...@qq.com> wrote:

> Hi, Matt
> 
> Thank you very much for your response!
> 
> There were some mistakes in my description as I wrote this mail in a hurry. I put those properties in hdfs-site.xml, not core-site.xml.
>
> There are four namenodes because I am also using HDFS federation, so there are two nameservices in the property
> <name>dfs.nameservices</name>
> and each nameservice will have two namenodes.
>
> If I configure only HA (only one nameservice), everything is ok, and HAAdmin can determine the namenodes nn1, nn3.
>
> But if I configure two nameservices and set namenodes nn1,nn3 for nameservice1 and nn2,nn4 for nameservice2, I can start these namenodes successfully and the namenodes are all in standby state at the beginning. But when I want to change one namenode to the active state with the command
> hdfs haadmin -transitionToActive nn1
> HAAdmin throws an exception, as it cannot determine the four namenodes (nn1,nn2,nn3,nn4) at all.
>
> Have you configured HA & Federation before, and do you know what may cause this problem?
> 
> Thanks,
> Lucy
> 
> ------------------ Original ------------------
> From:  "Matt Narrell";<ma...@gmail.com>;
> Send time: Monday, Sep 29, 2014 6:28 AM
> To: "user"<us...@hadoop.apache.org>;
> Subject:  Re: Failed to active namenode when config HA
> 
> I’m pretty sure HDFS HA is relegated to two name nodes (not four), designated active and standby.  Secondly, I believe these properties should be in hdfs-site.xml NOT core-site.xml.
> 
> Furthermore, I think your HDFS nameservices are misconfigured.  Consider the following:
> 
> <?xml version="1.0"?>
> <configuration>
>   <property>
>     <name>dfs.replication</name>
>     <value>3</value>
>   </property>
>   <property>
>     <name>dfs.namenode.name.dir</name>
>     <value>file:/var/data/hadoop/hdfs/nn</value>
>   </property>
>   <property>
>     <name>dfs.datanode.data.dir</name>
>     <value>file:/var/data/hadoop/hdfs/dn</value>
>   </property>
> 
>     <property>
>       <name>dfs.ha.automatic-failover.enabled</name>
>       <value>true</value>
>     </property>
>     <property>
>       <name>dfs.nameservices</name>
>       <value>hdfs-cluster</value>
>     </property>
> 
>     <property>
>       <name>dfs.ha.namenodes.hdfs-cluster</name>
>       <value>nn1,nn2</value>
>     </property>
>       <property>
>         <name>dfs.namenode.rpc-address.hdfs-cluster.nn1</name>
>         <value>namenode1:8020</value>
>       </property>
>       <property>
>         <name>dfs.namenode.http-address.hdfs-cluster.nn1</name>
>         <value>namenode1:50070</value>
>       </property>
>       <property>
>         <name>dfs.namenode.rpc-address.hdfs-cluster.nn2</name>
>         <value>namenode2:8020</value>
>       </property>
>       <property>
>         <name>dfs.namenode.http-address.hdfs-cluster.nn2</name>
>         <value>namenode2:50070</value>
>       </property>
> 
>     <property>
>       <name>dfs.namenode.shared.edits.dir</name>
>       <value>qjournal://journalnode1:8485;journalnode2:8485;journalnode3:8485/hdfs-cluster</value>
>     </property>
> 
>     <property>
>       <name>dfs.client.failover.proxy.provider.hdfs-cluster</name>
>       <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
>     </property>
> 
>     <property>
>       <name>dfs.ha.fencing.methods</name>
>       <value>sshfence</value>
>     </property>
>     <property>
>       <name>dfs.ha.fencing.ssh.private-key-files</name>
>       <value>/home/hadoop/.ssh/id_rsa</value>
>     </property>
> </configuration>
> 
> mn
> 
> On Sep 28, 2014, at 12:56 PM, 清如许 <47...@qq.com> wrote:
> 
> > Hi,
> > 
> > I'm new to Hadoop and met some problems when configuring HA.
> > Below are some important configuration in core-site.xml
> > 
> >   <property>
> >     <name>dfs.nameservices</name>
> >     <value>ns1,ns2</value>
> >   </property>
> >   <property>
> >     <name>dfs.ha.namenodes.ns1</name>
> >     <value>nn1,nn3</value>
> >   </property>
> >   <property>
> >     <name>dfs.ha.namenodes.ns2</name>
> >     <value>nn2,nn4</value>
> >   </property>
> >   <property>
> >     <name>dfs.namenode.rpc-address.ns1.nn1</name>
> >     <value>namenode1:9000</value>
> >   </property>
> >   <property>
> >     <name>dfs.namenode.rpc-address.ns1.nn3</name>
> >     <value>namenode3:9000</value>
> >   </property>
> >   <property>
> >     <name>dfs.namenode.rpc-address.ns2.nn2</name>
> >     <value>namenode2:9000</value>
> >   </property>
> >   <property>
> >     <name>dfs.namenode.rpc-address.ns2.nn4</name>
> >     <value>namenode4:9000</value>
> >   </property>
> >   <property>
> >     <name>dfs.namenode.shared.edits.dir</name>
> >     <value>qjournal://datanode2:8485;datanode3:8485;datanode4:8485/ns1</value>
> >   </property>
> >   <property>
> >     <name>dfs.client.failover.proxy.provider.ns1</name>
> >     <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
> >   </property>
> >   <property>
> >     <name>dfs.ha.fencing.methods</name>
> >     <value>sshfence</value>
> >   </property>
> >   <property>
> >     <name>dfs.ha.fencing.ssh.private-key-files</name>
> >     <value>/home/hduser/.ssh/id_rsa</value>
> >   </property>
> >   <property>
> >     <name>dfs.ha.fencing.ssh.connect-timeout</name>
> >     <value>30000</value>
> >   </property>
> >   <property>
> >     <name>dfs.journalnode.edits.dir</name>
> >     <value>/home/hduser/mydata/hdfs/journalnode</value>
> >   </property>
> > 
> > (the two nameservices ns1, ns2 are for configuring federation later. In this step, I only want to launch ns1 on namenode1, namenode3)
> >
> > After configuration, I did the following steps:
> > firstly, I start the journalnode on datanode2, datanode3, datanode4
> > secondly, I format datanode1 and start the namenode on it
> > then I run 'hdfs namenode -bootstrapStandby' on the other namenode and start the namenode on it
> >
> > Everything seems fine except that no namenode is active now, so I tried to activate one by running
> > hdfs haadmin -transitionToActive nn1 on namenode1
> > but strangely it says "Illegal argument: Unable to determine the nameservice id."
> >
> > Could anyone tell me why it cannot determine nn1 from my configuration?
> > Is there something wrong in my configuration?
> > 
> > Thanks a lot!!!
> > 
> > 


Re: Failed to active namenode when config HA

Posted by 清如许 <47...@qq.com>.
Hi, Matt

Thank you very much for your response!

There were some mistakes in my description, as I wrote that mail in a hurry. I put those properties in hdfs-site.xml, not core-site.xml.

There are four namenodes because I am also using HDFS federation, so there are two nameservices in the property
<name>dfs.nameservices</name>
and each nameservice has two namenodes.

If I configure HA only (a single nameservice), everything is OK, and HAAdmin can determine the namenodes nn1, nn3.

But if I configure two nameservices and set namenodes nn1,nn3 for nameservice1 and nn2,nn4 for nameservice2, I can start these namenodes successfully and they are all in standby state at the beginning. But when I try to transition one namenode to the active state with the command
hdfs haadmin -transitionToActive nn1
HAAdmin throws an exception, as it cannot determine any of the four namenodes (nn1,nn2,nn3,nn4).

Have you configured HA & federation before, and do you know what may cause this problem?

Thanks,
Lucy
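
[Aside, not part of this message: with more than one nameservice configured, haadmin generally has to be told which nameservice the target namenode ID belongs to. Assuming the -ns option that hdfs haadmin provides for federated Hadoop 2.x clusters, the call would look along these lines:]

  # hypothetical invocation -- ns1 is the nameservice that nn1 belongs to
  hdfs haadmin -ns ns1 -transitionToActive nn1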


------------------ Original ------------------
From:  "Matt Narrell";<ma...@gmail.com>;
Send time: Monday, Sep 29, 2014 6:28 AM
To: "user"<us...@hadoop.apache.org>; 

Subject:  Re: Failed to active namenode when config HA



I’m pretty sure HDFS HA is relegated to two name nodes (not four), designated active and standby.  Secondly, I believe these properties should be in hdfs-site.xml NOT core-site.xml.

Furthermore, I think your HDFS nameservices are misconfigured.  Consider the following:

<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/var/data/hadoop/hdfs/nn</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/var/data/hadoop/hdfs/dn</value>
  </property>

    <property>
      <name>dfs.ha.automatic-failover.enabled</name>
      <value>true</value>
    </property>
    <property>
      <name>dfs.nameservices</name>
      <value>hdfs-cluster</value>
    </property>

    <property>
      <name>dfs.ha.namenodes.hdfs-cluster</name>
      <value>nn1,nn2</value>
    </property>
      <property>
        <name>dfs.namenode.rpc-address.hdfs-cluster.nn1</name>
        <value>namenode1:8020</value>
      </property>
      <property>
        <name>dfs.namenode.http-address.hdfs-cluster.nn1</name>
        <value>namenode1:50070</value>
      </property>
      <property>
        <name>dfs.namenode.rpc-address.hdfs-cluster.nn2</name>
        <value>namenode2:8020</value>
      </property>
      <property>
        <name>dfs.namenode.http-address.hdfs-cluster.nn2</name>
        <value>namenode2:50070</value>
      </property>

    <property>
      <name>dfs.namenode.shared.edits.dir</name>
      <value>qjournal://journalnode1:8485;journalnode2:8485;journalnode3:8485/hdfs-cluster</value>
    </property>

    <property>
      <name>dfs.client.failover.proxy.provider.hdfs-cluster</name>
      <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>

    <property>
      <name>dfs.ha.fencing.methods</name>
      <value>sshfence</value>
    </property>
    <property>
      <name>dfs.ha.fencing.ssh.private-key-files</name>
      <value>/home/hadoop/.ssh/id_rsa</value>
    </property>
</configuration>

mn

On Sep 28, 2014, at 12:56 PM, 清如许 <47...@qq.com> wrote:

> Hi,
> 
> I'm new to hadoop and meet some problems when config HA.
> Below are some important configuration in core-site.xml
> 
>   <property>
>     <name>dfs.nameservices</name>
>     <value>ns1,ns2</value>
>   </property>
>   <property>
>     <name>dfs.ha.namenodes.ns1</name>
>     <value>nn1,nn3</value>
>   </property>
>   <property>
>     <name>dfs.ha.namenodes.ns2</name>
>     <value>nn2,nn4</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address.ns1.nn1</name>
>     <value>namenode1:9000</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address.ns1.nn3</name>
>     <value>namenode3:9000</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address.ns2.nn2</name>
>     <value>namenode2:9000</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address.ns2.nn4</name>
>     <value>namenode4:9000</value>
>   </property>
>   <property>
>     <name>dfs.namenode.shared.edits.dir</name>
>     <value>qjournal://datanode2:8485;datanode3:8485;datanode4:8485/ns1</value>
>   </property>
>   <property>
>     <name>dfs.client.failover.proxy.provider.ns1</name>
>     <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
>   </property>
>   <property>
>     <name>dfs.ha.fencing.methods</name>
>     <value>sshfence</value>
>   </property>
>   <property>
>     <name>dfs.ha.fencing.ssh.private-key-files</name>
>     <value>/home/hduser/.ssh/id_rsa</value>
>   </property>
>   <property>
>     <name>dfs.ha.fencing.ssh.connect-timeout</name>
>     <value>30000</value>
>   </property>
>   <property>
>     <name>dfs.journalnode.edits.dir</name>
>     <value>/home/hduser/mydata/hdfs/journalnode</value>
>   </property>
> 
> (two nameservice ns1,ns2 is for configuring federation later. In this step, I only want launch ns1 on namenode1,namenode3)
> 
> After configuration, I did the following steps
> firstly,  I start jornalnode on datanode2,datanode3,datanode4
> secondly I format datanode1 and start namenode on it
> then i run 'hdfs namenode -bootstrapStandby' on the other namenode and start namenode on it
> 
> Everything seems fine unless no namenode is active now, then i tried to active one by running 
> hdfs haadmin -transitionToActive nn1 on namenode1
> but strangely it says "Illegal argument: Unable to determine the nameservice id."
> 
> Could anyone tell me why it cannot determine nn1 from my configuration?
> Is there something wrong in my configuraion?
> 
> Thanks a lot!!!
> 
>

Re: Failed to active namenode when config HA

Posted by Matt Narrell <ma...@gmail.com>.
I’m pretty sure HDFS HA is relegated to two name nodes (not four), designated active and standby.  Secondly, I believe these properties should be in hdfs-site.xml NOT core-site.xml.

Furthermore, I think your HDFS nameservices are misconfigured.  Consider the following:

<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/var/data/hadoop/hdfs/nn</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/var/data/hadoop/hdfs/dn</value>
  </property>

    <property>
      <name>dfs.ha.automatic-failover.enabled</name>
      <value>true</value>
    </property>
    <property>
      <name>dfs.nameservices</name>
      <value>hdfs-cluster</value>
    </property>

    <property>
      <name>dfs.ha.namenodes.hdfs-cluster</name>
      <value>nn1,nn2</value>
    </property>
      <property>
        <name>dfs.namenode.rpc-address.hdfs-cluster.nn1</name>
        <value>namenode1:8020</value>
      </property>
      <property>
        <name>dfs.namenode.http-address.hdfs-cluster.nn1</name>
        <value>namenode1:50070</value>
      </property>
      <property>
        <name>dfs.namenode.rpc-address.hdfs-cluster.nn2</name>
        <value>namenode2:8020</value>
      </property>
      <property>
        <name>dfs.namenode.http-address.hdfs-cluster.nn2</name>
        <value>namenode2:50070</value>
      </property>

    <property>
      <name>dfs.namenode.shared.edits.dir</name>
      <value>qjournal://journalnode1:8485;journalnode2:8485;journalnode3:8485/hdfs-cluster</value>
    </property>

    <property>
      <name>dfs.client.failover.proxy.provider.hdfs-cluster</name>
      <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>

    <property>
      <name>dfs.ha.fencing.methods</name>
      <value>sshfence</value>
    </property>
    <property>
      <name>dfs.ha.fencing.ssh.private-key-files</name>
      <value>/home/hadoop/.ssh/id_rsa</value>
    </property>
</configuration>

mn
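
[Aside, not part of this message: since dfs.ha.automatic-failover.enabled is true above, a complete setup would normally also point clients at the logical nameservice and give the failover controllers a ZooKeeper quorum in core-site.xml. A minimal sketch, with zk1/zk2/zk3 as placeholder hosts:]

<?xml version="1.0"?>
<configuration>
  <!-- clients address the logical nameservice, not an individual namenode -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hdfs-cluster</value>
  </property>
  <!-- ZooKeeper ensemble used by the ZKFC daemons for automatic failover -->
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>zk1:2181,zk2:2181,zk3:2181</value>
  </property>
</configuration>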

On Sep 28, 2014, at 12:56 PM, 清如许 <47...@qq.com> wrote:

> Hi,
> 
> I'm new to hadoop and meet some problems when config HA.
> Below are some important configuration in core-site.xml
> 
>   <property>
>     <name>dfs.nameservices</name>
>     <value>ns1,ns2</value>
>   </property>
>   <property>
>     <name>dfs.ha.namenodes.ns1</name>
>     <value>nn1,nn3</value>
>   </property>
>   <property>
>     <name>dfs.ha.namenodes.ns2</name>
>     <value>nn2,nn4</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address.ns1.nn1</name>
>     <value>namenode1:9000</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address.ns1.nn3</name>
>     <value>namenode3:9000</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address.ns2.nn2</name>
>     <value>namenode2:9000</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address.ns2.nn4</name>
>     <value>namenode4:9000</value>
>   </property>
>   <property>
>     <name>dfs.namenode.shared.edits.dir</name>
>     <value>qjournal://datanode2:8485;datanode3:8485;datanode4:8485/ns1</value>
>   </property>
>   <property>
>     <name>dfs.client.failover.proxy.provider.ns1</name>
>     <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
>   </property>
>   <property>
>     <name>dfs.ha.fencing.methods</name>
>     <value>sshfence</value>
>   </property>
>   <property>
>     <name>dfs.ha.fencing.ssh.private-key-files</name>
>     <value>/home/hduser/.ssh/id_rsa</value>
>   </property>
>   <property>
>     <name>dfs.ha.fencing.ssh.connect-timeout</name>
>     <value>30000</value>
>   </property>
>   <property>
>     <name>dfs.journalnode.edits.dir</name>
>     <value>/home/hduser/mydata/hdfs/journalnode</value>
>   </property>
> 
> (two nameservice ns1,ns2 is for configuring federation later. In this step, I only want launch ns1 on namenode1,namenode3)
> 
> After configuration, I did the following steps
> firstly,  I start jornalnode on datanode2,datanode3,datanode4
> secondly I format datanode1 and start namenode on it
> then i run 'hdfs namenode -bootstrapStandby' on the other namenode and start namenode on it
> 
> Everything seems fine unless no namenode is active now, then i tried to active one by running 
> hdfs haadmin -transitionToActive nn1 on namenode1
> but strangely it says "Illegal argument: Unable to determine the nameservice id."
> 
> Could anyone tell me why it cannot determine nn1 from my configuration?
> Is there something wrong in my configuraion?
> 
> Thanks a lot!!!
> 
> 


Re: is Branch-1.1 SBT build broken for yarn-alpha ?

Posted by witgo <wi...@qq.com>.
There's a related discussion 
https://issues.apache.org/jira/browse/SPARK-2815
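
[Aside, not from this thread: a hedged workaround, assuming the standard top-level Maven build of branch-1.1, is to build the yarn-alpha profile with Maven, which resolves yarn-parent_2.10 from the local reactor instead of Maven Central:]

  mvn -Pyarn-alpha -Dhadoop.version=2.0.5-alpha -DskipTests package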




------------------ Original Message ------------------
From:  "Chester Chen"<ch...@alpinenow.com>; 
Sent: Thursday, August 21, 2014, 7:42 AM
To: "dev"<de...@spark.apache.org>; 
Subject: Re: is Branch-1.1 SBT build broken for yarn-alpha ?



Just tried on master branch, and the master branch works fine for yarn-alpha


On Wed, Aug 20, 2014 at 4:39 PM, Chester Chen <ch...@alpinenow.com> wrote:

> I just updated today's build and tried branch-1.1 for both yarn and
> yarn-alpha.
>
> For yarn build, this command seem to work fine.
>
> sbt/sbt -Pyarn -Dhadoop.version=2.3.0-cdh5.0.1 projects
>
> for yarn-alpha
>
> sbt/sbt -Pyarn-alpha -Dhadoop.version=2.0.5-alpha projects
>
> I got the following
>
> Any ideas
>
>
> Chester
>
> ᚛ |branch-1.1|$  *sbt/sbt -Pyarn-alpha -Dhadoop.version=2.0.5-alpha
> projects*
>
> Using /Library/Java/JavaVirtualMachines/1.6.0_51-b11-457.jdk/Contents/Home
> as default JAVA_HOME.
>
> Note, this will be overridden by -java-home if it is set.
>
> [info] Loading project definition from
> /Users/chester/projects/spark/project/project
>
> [info] Loading project definition from
> /Users/chester/.sbt/0.13/staging/ec3aa8f39111944cc5f2/sbt-pom-reader/project
>
> [warn] Multiple resolvers having different access mechanism configured
> with same name 'sbt-plugin-releases'. To avoid conflict, Remove duplicate
> project resolvers (`resolvers`) or rename publishing resolver (`publishTo`).
>
> [info] Loading project definition from
> /Users/chester/projects/spark/project
>
> org.apache.maven.model.building.ModelBuildingException: 1 problem was
> encountered while building the effective model for
> org.apache.spark:spark-yarn-alpha_2.10:1.1.0
>
> *[FATAL] Non-resolvable parent POM: Could not find artifact
> org.apache.spark:yarn-parent_2.10:pom:1.1.0 in central (
> http://repo.maven.apache.org/maven2 <http://repo.maven.apache.org/maven2>)
> and 'parent.relativePath' points at wrong local POM @ line 20, column 11*
>
>
>  at
> org.apache.maven.model.building.DefaultModelProblemCollector.newModelBuildingException(DefaultModelProblemCollector.java:195)
>
> at
> org.apache.maven.model.building.DefaultModelBuilder.readParentExternally(DefaultModelBuilder.java:841)
>
> at
> org.apache.maven.model.building.DefaultModelBuilder.readParent(DefaultModelBuilder.java:664)
>
> at
> org.apache.maven.model.building.DefaultModelBuilder.build(DefaultModelBuilder.java:310)
>
> at
> org.apache.maven.model.building.DefaultModelBuilder.build(DefaultModelBuilder.java:232)
>
> at
> com.typesafe.sbt.pom.MvnPomResolver.loadEffectivePom(MavenPomResolver.scala:61)
>
> at com.typesafe.sbt.pom.package$.loadEffectivePom(package.scala:41)
>
> at
> com.typesafe.sbt.pom.MavenProjectHelper$.makeProjectTree(MavenProjectHelper.scala:128)
>
> at
> com.typesafe.sbt.pom.MavenProjectHelper$$anonfun$12.apply(MavenProjectHelper.scala:129)
>
> at
> com.typesafe.sbt.pom.MavenProjectHelper$$anonfun$12.apply(MavenProjectHelper.scala:129)
>
> at
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>
> at
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>
> at
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>
> at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>
> at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>
> at
> com.typesafe.sbt.pom.MavenProjectHelper$.makeProjectTree(MavenProjectHelper.scala:129)
>
> at
> com.typesafe.sbt.pom.MavenProjectHelper$$anonfun$12.apply(MavenProjectHelper.scala:129)
>
> at
> com.typesafe.sbt.pom.MavenProjectHelper$$anonfun$12.apply(MavenProjectHelper.scala:129)
>
> at
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>
> at
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>
> at
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>
> at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>
> at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>
> at
> com.typesafe.sbt.pom.MavenProjectHelper$.makeProjectTree(MavenProjectHelper.scala:129)
>
> at
> com.typesafe.sbt.pom.MavenProjectHelper$.makeReactorProject(MavenProjectHelper.scala:49)
>
> at
> com.typesafe.sbt.pom.PomBuild$class.projectDefinitions(PomBuild.scala:28)
>
> at SparkBuild$.projectDefinitions(SparkBuild.scala:165)
>
> at sbt.Load$.sbt$Load$$projectsFromBuild(Load.scala:458)
>
> at sbt.Load$$anonfun$24.apply(Load.scala:415)
>
> at sbt.Load$$anonfun$24.apply(Load.scala:415)
>
> at scala.collection.immutable.Stream.flatMap(Stream.scala:442)
>
> at sbt.Load$.loadUnit(Load.scala:415)
>
> at sbt.Load$$anonfun$15$$anonfun$apply$11.apply(Load.scala:256)
>
> at sbt.Load$$anonfun$15$$anonfun$apply$11.apply(Load.scala:256)
>
> at
> sbt.BuildLoader$$anonfun$componentLoader$1$$anonfun$apply$4$$anonfun$apply$5$$anonfun$apply$6.apply(BuildLoader.scala:93)
>
> at
> sbt.BuildLoader$$anonfun$componentLoader$1$$anonfun$apply$4$$anonfun$apply$5$$anonfun$apply$6.apply(BuildLoader.scala:92)
>
> at sbt.BuildLoader.apply(BuildLoader.scala:143)
>
> at sbt.Load$.loadAll(Load.scala:312)
>
> at sbt.Load$.loadURI(Load.scala:264)
>
> at sbt.Load$.load(Load.scala:260)
>
> at sbt.Load$.load(Load.scala:251)
>
> at sbt.Load$.apply(Load.scala:134)
>
> at sbt.Load$.defaultLoad(Load.scala:37)
>
> at sbt.BuiltinCommands$.doLoadProject(Main.scala:473)
>
> at sbt.BuiltinCommands$$anonfun$loadProjectImpl$2.apply(Main.scala:467)
>
> at sbt.BuiltinCommands$$anonfun$loadProjectImpl$2.apply(Main.scala:467)
>
> at
> sbt.Command$$anonfun$applyEffect$1$$anonfun$apply$2.apply(Command.scala:60)
>
> at
> sbt.Command$$anonfun$applyEffect$1$$anonfun$apply$2.apply(Command.scala:60)
>
> at
> sbt.Command$$anonfun$applyEffect$2$$anonfun$apply$3.apply(Command.scala:62)
>
> at
> sbt.Command$$anonfun$applyEffect$2$$anonfun$apply$3.apply(Command.scala:62)
>
> at sbt.Command$.process(Command.scala:95)
>
> at sbt.MainLoop$$anonfun$1$$anonfun$apply$1.apply(MainLoop.scala:100)
>
> at sbt.MainLoop$$anonfun$1$$anonfun$apply$1.apply(MainLoop.scala:100)
>
> at sbt.State$$anon$1.process(State.scala:179)
>
> at sbt.MainLoop$$anonfun$1.apply(MainLoop.scala:100)
>
> at sbt.MainLoop$$anonfun$1.apply(MainLoop.scala:100)
>
> at sbt.ErrorHandling$.wideConvert(ErrorHandling.scala:18)
>
> at sbt.MainLoop$.next(MainLoop.scala:100)
>
> at sbt.MainLoop$.run(MainLoop.scala:93)
>
> at sbt.MainLoop$$anonfun$runWithNewLog$1.apply(MainLoop.scala:71)
>
> at sbt.MainLoop$$anonfun$runWithNewLog$1.apply(MainLoop.scala:66)
>
> at sbt.Using.apply(Using.scala:25)
>
> at sbt.MainLoop$.runWithNewLog(MainLoop.scala:66)
>
> at sbt.MainLoop$.runAndClearLast(MainLoop.scala:49)
>
> at sbt.MainLoop$.runLoggedLoop(MainLoop.scala:33)
>
> at sbt.MainLoop$.runLogged(MainLoop.scala:25)
>
> at sbt.StandardMain$.runManaged(Main.scala:57)
>
> at sbt.xMain.run(Main.scala:29)
>
> at xsbt.boot.Launch$$anonfun$run$1.apply(Launch.scala:109)
>
> at xsbt.boot.Launch$.withContextLoader(Launch.scala:129)
>
> at xsbt.boot.Launch$.run(Launch.scala:109)
>
> at xsbt.boot.Launch$$anonfun$apply$1.apply(Launch.scala:36)
>
> at xsbt.boot.Launch$.launch(Launch.scala:117)
>
> at xsbt.boot.Launch$.apply(Launch.scala:19)
>
> at xsbt.boot.Boot$.runImpl(Boot.scala:44)
>
> at xsbt.boot.Boot$.main(Boot.scala:20)
>
> at xsbt.boot.Boot.main(Boot.scala)
>
> [error] org.apache.maven.model.building.ModelBuildingException: 1 problem
> was encountered while building the effective model for
> org.apache.spark:spark-yarn-alpha_2.10:1.1.0
>
> [error] [FATAL] Non-resolvable parent POM: Could not find artifact
> org.apache.spark:yarn-parent_2.10:pom:1.1.0 in central (
> http://repo.maven.apache.org/maven2) and 'parent.relativePath' points at
> wrong local POM @ line 20, column 11
>
> [error] Use 'last' for the full log.
>
> Project loading failed: (r)etry, (q)uit, (l)ast, or (i)gnore? q
>

Re: is Branch-1.1 SBT build broken for yarn-alpha ?

Posted by Chester Chen <ch...@alpinenow.com>.
Just tried on master branch, and the master branch works fine for yarn-alpha
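
[Aside: the same comparison expressed as commands; the git checkouts are inferred, not quoted from the thread:]

  git checkout master
  sbt/sbt -Pyarn-alpha -Dhadoop.version=2.0.5-alpha projects    # project list loads
  git checkout branch-1.1
  sbt/sbt -Pyarn-alpha -Dhadoop.version=2.0.5-alpha projects    # fails: non-resolvable parent POM yarn-parent_2.10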


On Wed, Aug 20, 2014 at 4:39 PM, Chester Chen <ch...@alpinenow.com> wrote:

> I just updated today's build and tried branch-1.1 for both yarn and
> yarn-alpha.
>
> For yarn build, this command seem to work fine.
>
> sbt/sbt -Pyarn -Dhadoop.version=2.3.0-cdh5.0.1 projects
>
> for yarn-alpha
>
> sbt/sbt -Pyarn-alpha -Dhadoop.version=2.0.5-alpha projects
>
> I got the following
>
> Any ideas
>
>
> Chester
>
> ᚛ |branch-1.1|$  *sbt/sbt -Pyarn-alpha -Dhadoop.version=2.0.5-alpha
> projects*
>
> Using /Library/Java/JavaVirtualMachines/1.6.0_51-b11-457.jdk/Contents/Home
> as default JAVA_HOME.
>
> Note, this will be overridden by -java-home if it is set.
>
> [info] Loading project definition from
> /Users/chester/projects/spark/project/project
>
> [info] Loading project definition from
> /Users/chester/.sbt/0.13/staging/ec3aa8f39111944cc5f2/sbt-pom-reader/project
>
> [warn] Multiple resolvers having different access mechanism configured
> with same name 'sbt-plugin-releases'. To avoid conflict, Remove duplicate
> project resolvers (`resolvers`) or rename publishing resolver (`publishTo`).
>
> [info] Loading project definition from
> /Users/chester/projects/spark/project
>
> org.apache.maven.model.building.ModelBuildingException: 1 problem was
> encountered while building the effective model for
> org.apache.spark:spark-yarn-alpha_2.10:1.1.0
>
> *[FATAL] Non-resolvable parent POM: Could not find artifact
> org.apache.spark:yarn-parent_2.10:pom:1.1.0 in central (
> http://repo.maven.apache.org/maven2 <http://repo.maven.apache.org/maven2>)
> and 'parent.relativePath' points at wrong local POM @ line 20, column 11*
>
>
>  at
> org.apache.maven.model.building.DefaultModelProblemCollector.newModelBuildingException(DefaultModelProblemCollector.java:195)
>
> at
> org.apache.maven.model.building.DefaultModelBuilder.readParentExternally(DefaultModelBuilder.java:841)
>
> at
> org.apache.maven.model.building.DefaultModelBuilder.readParent(DefaultModelBuilder.java:664)
>
> at
> org.apache.maven.model.building.DefaultModelBuilder.build(DefaultModelBuilder.java:310)
>
> at
> org.apache.maven.model.building.DefaultModelBuilder.build(DefaultModelBuilder.java:232)
>
> at
> com.typesafe.sbt.pom.MvnPomResolver.loadEffectivePom(MavenPomResolver.scala:61)
>
> at com.typesafe.sbt.pom.package$.loadEffectivePom(package.scala:41)
>
> at
> com.typesafe.sbt.pom.MavenProjectHelper$.makeProjectTree(MavenProjectHelper.scala:128)
>
> at
> com.typesafe.sbt.pom.MavenProjectHelper$$anonfun$12.apply(MavenProjectHelper.scala:129)
>
> at
> com.typesafe.sbt.pom.MavenProjectHelper$$anonfun$12.apply(MavenProjectHelper.scala:129)
>
> at
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>
> at
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>
> at
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>
> at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>
> at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>
> at
> com.typesafe.sbt.pom.MavenProjectHelper$.makeProjectTree(MavenProjectHelper.scala:129)
>
> at
> com.typesafe.sbt.pom.MavenProjectHelper$$anonfun$12.apply(MavenProjectHelper.scala:129)
>
> at
> com.typesafe.sbt.pom.MavenProjectHelper$$anonfun$12.apply(MavenProjectHelper.scala:129)
>
> at
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>
> at
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>
> at
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>
> at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>
> at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>
> at
> com.typesafe.sbt.pom.MavenProjectHelper$.makeProjectTree(MavenProjectHelper.scala:129)
>
> at
> com.typesafe.sbt.pom.MavenProjectHelper$.makeReactorProject(MavenProjectHelper.scala:49)
>
> at
> com.typesafe.sbt.pom.PomBuild$class.projectDefinitions(PomBuild.scala:28)
>
> at SparkBuild$.projectDefinitions(SparkBuild.scala:165)
>
> at sbt.Load$.sbt$Load$$projectsFromBuild(Load.scala:458)
>
> at sbt.Load$$anonfun$24.apply(Load.scala:415)
>
> at sbt.Load$$anonfun$24.apply(Load.scala:415)
>
> at scala.collection.immutable.Stream.flatMap(Stream.scala:442)
>
> at sbt.Load$.loadUnit(Load.scala:415)
>
> at sbt.Load$$anonfun$15$$anonfun$apply$11.apply(Load.scala:256)
>
> at sbt.Load$$anonfun$15$$anonfun$apply$11.apply(Load.scala:256)
>
> at
> sbt.BuildLoader$$anonfun$componentLoader$1$$anonfun$apply$4$$anonfun$apply$5$$anonfun$apply$6.apply(BuildLoader.scala:93)
>
> at
> sbt.BuildLoader$$anonfun$componentLoader$1$$anonfun$apply$4$$anonfun$apply$5$$anonfun$apply$6.apply(BuildLoader.scala:92)
>
> at sbt.BuildLoader.apply(BuildLoader.scala:143)
>
> at sbt.Load$.loadAll(Load.scala:312)
>
> at sbt.Load$.loadURI(Load.scala:264)
>
> at sbt.Load$.load(Load.scala:260)
>
> at sbt.Load$.load(Load.scala:251)
>
> at sbt.Load$.apply(Load.scala:134)
>
> at sbt.Load$.defaultLoad(Load.scala:37)
>
> at sbt.BuiltinCommands$.doLoadProject(Main.scala:473)
>
> at sbt.BuiltinCommands$$anonfun$loadProjectImpl$2.apply(Main.scala:467)
>
> at sbt.BuiltinCommands$$anonfun$loadProjectImpl$2.apply(Main.scala:467)
>
> at
> sbt.Command$$anonfun$applyEffect$1$$anonfun$apply$2.apply(Command.scala:60)
>
> at
> sbt.Command$$anonfun$applyEffect$1$$anonfun$apply$2.apply(Command.scala:60)
>
> at
> sbt.Command$$anonfun$applyEffect$2$$anonfun$apply$3.apply(Command.scala:62)
>
> at
> sbt.Command$$anonfun$applyEffect$2$$anonfun$apply$3.apply(Command.scala:62)
>
> at sbt.Command$.process(Command.scala:95)
>
> at sbt.MainLoop$$anonfun$1$$anonfun$apply$1.apply(MainLoop.scala:100)
>
> at sbt.MainLoop$$anonfun$1$$anonfun$apply$1.apply(MainLoop.scala:100)
>
> at sbt.State$$anon$1.process(State.scala:179)
>
> at sbt.MainLoop$$anonfun$1.apply(MainLoop.scala:100)
>
> at sbt.MainLoop$$anonfun$1.apply(MainLoop.scala:100)
>
> at sbt.ErrorHandling$.wideConvert(ErrorHandling.scala:18)
>
> at sbt.MainLoop$.next(MainLoop.scala:100)
>
> at sbt.MainLoop$.run(MainLoop.scala:93)
>
> at sbt.MainLoop$$anonfun$runWithNewLog$1.apply(MainLoop.scala:71)
>
> at sbt.MainLoop$$anonfun$runWithNewLog$1.apply(MainLoop.scala:66)
>
> at sbt.Using.apply(Using.scala:25)
>
> at sbt.MainLoop$.runWithNewLog(MainLoop.scala:66)
>
> at sbt.MainLoop$.runAndClearLast(MainLoop.scala:49)
>
> at sbt.MainLoop$.runLoggedLoop(MainLoop.scala:33)
>
> at sbt.MainLoop$.runLogged(MainLoop.scala:25)
>
> at sbt.StandardMain$.runManaged(Main.scala:57)
>
> at sbt.xMain.run(Main.scala:29)
>
> at xsbt.boot.Launch$$anonfun$run$1.apply(Launch.scala:109)
>
> at xsbt.boot.Launch$.withContextLoader(Launch.scala:129)
>
> at xsbt.boot.Launch$.run(Launch.scala:109)
>
> at xsbt.boot.Launch$$anonfun$apply$1.apply(Launch.scala:36)
>
> at xsbt.boot.Launch$.launch(Launch.scala:117)
>
> at xsbt.boot.Launch$.apply(Launch.scala:19)
>
> at xsbt.boot.Boot$.runImpl(Boot.scala:44)
>
> at xsbt.boot.Boot$.main(Boot.scala:20)
>
> at xsbt.boot.Boot.main(Boot.scala)
>
> [error] org.apache.maven.model.building.ModelBuildingException: 1 problem
> was encountered while building the effective model for
> org.apache.spark:spark-yarn-alpha_2.10:1.1.0
>
> [error] [FATAL] Non-resolvable parent POM: Could not find artifact
> org.apache.spark:yarn-parent_2.10:pom:1.1.0 in central (
> http://repo.maven.apache.org/maven2) and 'parent.relativePath' points at
> wrong local POM @ line 20, column 11
>
> [error] Use 'last' for the full log.
>
> Project loading failed: (r)etry, (q)uit, (l)ast, or (i)gnore? q
>