Posted to dev@giraph.apache.org by Milinda Pathirage <mp...@umail.iu.edu> on 2013/10/15 21:23:25 UTC

Some questions related to Giraph Pure YARN implementation

Hi Eli,

I tried the scripts (giraph, giraph-env) in the bin directory to run the
Giraph example from the Quick Start guide, but I ran into some issues
and had to do some patching to get it into a working state (job
submission works, but execution fails). Below are some things I
noticed:

  1. The giraph script in the 'bin' directory uses the -libjars option, but this
doesn't work with GiraphYarnClient. It should be -yj.
  2. We need to add $GIRAPH_HOME and $VERTEX_IMPL_JAR_DIR (the directory
containing the vertex implementation jar) to CLASSPATH manually because of the
way YarnUtils.getLocalFiles is implemented. Basically, we have to add the
parent directories of the YARN jars to the classpath. I am not sure which is
the correct solution:
     * fixing getLocalFiles
     * a CLASSPATH-based approach
  3. The YarnUtils.populateJars method uses fileNames.contains(f.getName())
to decide whether to add a jar to the local resource map. But if we use the
giraph script, fileNames contains the absolute paths of the YARN lib jars. I got
this working by using absolute paths (getAbsolutePath()) instead of getName().
  4. After the above changes, we can successfully launch a job in a YARN
cluster using the giraph script, but the job fails due to a file path issue.
When submitting the job, we serialize the Giraph configuration to
giraph-conf.xml. However, the "giraph.yarn.libjars" property contains the list
of files with absolute paths from the client machine used to submit
the job. For example, in my scenario the giraph jar is
"/Users/mpathira/giraph-bin/giraph-0.2-SNAPSHOT-for-hadoop-2.0.6-alpha/giraph-0.2-SNAPSHOT-for-hadoop-2.0.6-alpha.jar".
GiraphApplicationMaster tries to access these files and fails
because no file with that name exists in HDFS.
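Items 2 and 3 both come down to how requested jar entries are matched against files found on the classpath. The sketch below shows one possible shape for the "fixing getLocalFiles" option, assuming (as described above) that the lookup walks CLASSPATH entries and compares bare file names. JarLookupSketch, toBaseNames, collect and findJars are hypothetical names, not Giraph's actual YarnUtils API. Requested entries are reduced to base names so that both "giraph-core.jar" and an absolute path match, and a jar listed directly on CLASSPATH is accepted without having to add its parent directory.

```java
import java.io.File;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper, not Giraph's YarnUtils: sketches name-based jar lookup.
class JarLookupSketch {

  // Reduce "/abs/path/foo.jar" or "foo.jar" to "foo.jar" so both forms match.
  static Set<String> toBaseNames(Set<String> requested) {
    Set<String> names = new HashSet<>();
    for (String entry : requested) {
      names.add(new File(entry).getName());
    }
    return names;
  }

  // Recursively collect absolute paths of files under dir whose names were requested.
  static void collect(File dir, Set<String> names, Set<String> found) {
    File[] children = dir.listFiles();
    if (children == null) {
      return; // not a directory, missing, or unreadable
    }
    for (File f : children) {
      if (f.isDirectory()) {
        collect(f, names, found);
      } else if (f.isFile() && names.contains(f.getName())) {
        found.add(f.getAbsolutePath());
      }
    }
  }

  // Search every CLASSPATH entry; a trailing "/*" wildcard is reduced to its directory.
  static Set<String> findJars(String classPath, Set<String> requested) {
    Set<String> names = toBaseNames(requested);
    Set<String> found = new LinkedHashSet<>();
    for (String entry : classPath.split(File.pathSeparator)) {
      if (entry.isEmpty()) {
        continue;
      }
      if (entry.endsWith("/*")) {
        entry = entry.substring(0, entry.length() - 2);
      }
      File base = new File(entry);
      if (base.isFile() && names.contains(base.getName())) {
        // A jar listed directly on CLASSPATH matches without adding its parent dir.
        found.add(base.getAbsolutePath());
      } else {
        collect(base, names, found);
      }
    }
    return found;
  }
}
```

Matching on base names means two different jars with the same file name in different directories would both be picked up; a real fix would still need to decide which one wins.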

If we use only jar names instead of paths for the 'yarnjars' option, we
should be able to fix 4. But I am not sure whether that is the correct
approach. Maybe we need to change how we serialize giraph-conf.xml into
HDFS, using HDFS paths instead of paths from the client machine.
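For item 4, the jar-name-only idea can be sketched as two small steps: the client strips its local directories from the giraph.yarn.libjars value before writing giraph-conf.xml, and the ApplicationMaster resolves each bare name against the HDFS directory the client uploaded the jars to. This assumes the client copies each jar into a per-application HDFS directory under its base name; LibJarsRewriteSketch, its methods and the staging path used below are hypothetical, and plain strings stand in for Hadoop's Configuration and Path types so the sketch runs without Hadoop on the classpath.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: plain strings stand in for Hadoop Configuration/Path.
class LibJarsRewriteSketch {

  // Client side: "/Users/a/x.jar,/opt/y.jar" -> "x.jar,y.jar" before giraph-conf.xml is written.
  static String toBaseNames(String libJars) {
    List<String> names = new ArrayList<>();
    for (String entry : libJars.split(",")) {
      String trimmed = entry.trim();
      if (!trimmed.isEmpty()) {
        names.add(new File(trimmed).getName());
      }
    }
    return String.join(",", names);
  }

  // ApplicationMaster side: resolve each bare name against the (assumed) HDFS upload directory.
  static List<String> resolveAgainst(String stagingDir, String libJarNames) {
    String base = stagingDir.endsWith("/") ? stagingDir : stagingDir + "/";
    List<String> paths = new ArrayList<>();
    for (String name : libJarNames.split(",")) {
      if (!name.isEmpty()) {
        paths.add(base + name);
      }
    }
    return paths;
  }
}
```

With the path from item 4, toBaseNames returns "giraph-0.2-SNAPSHOT-for-hadoop-2.0.6-alpha.jar", which no longer depends on the submitting machine's directory layout. The other option mentioned above, writing full HDFS paths into the config, would simply move the resolveAgainst step to the client.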

@Eli
I would really appreciate your comments on the above. I can create a JIRA
ticket if needed.

Thanks
Milinda

-- 
Milinda Pathirage

twitter: milindalakmal
skype: milinda.pathirage
blog: http://milinda.pathirage.org

Re: Fwd: Some questions related to Giraph Pure YARN implementation

Posted by Matthew Laird <la...@sfu.ca>.
Hmm, sounds like Giraph + YARN is definitely on the bleeding edge...
thanks for all the work you folks are doing to get it working. I guess
I'll lurk on the dev list for a while until you guys figure the pieces
out. :)

Thanks again!

On 13-10-15 01:14 PM, Milinda Pathirage wrote:
> Forwarding to user list.

-- 
Matthew Laird
Lead Software Developer, Bioinformatics
Brinkman Laboratory
Simon Fraser University, Burnaby, BC, Canada

Fwd: Some questions related to Giraph Pure YARN implementation

Posted by Milinda Pathirage <mp...@umail.iu.edu>.
Forwarding to user list.

-- 
Milinda Pathirage

twitter: milindalakmal
skype: milinda.pathirage
blog: http://milinda.pathirage.org