Posted to user@spark.apache.org by ayan guha <gu...@gmail.com> on 2017/02/27 00:52:25 UTC
Spark - YARN Cluster Mode
Hi
I am facing an issue with cluster mode in PySpark.
Here is my code:
from pyspark import SparkConf, SparkContext
from pyspark.sql import HiveContext

conf = SparkConf()
conf.setAppName("Spark Ingestion")
conf.set("spark.yarn.queue", "root.Applications")
conf.set("spark.executor.instances", "50")
conf.set("spark.executor.memory", "22g")
conf.set("spark.yarn.executor.memoryOverhead", "4096")
conf.set("spark.executor.cores", "4")
conf.set("spark.sql.hive.convertMetastoreParquet", "false")
sc = SparkContext(conf=conf)
sqlContext = HiveContext(sc)
r = sc.parallelize(range(1, 10000))
print(r.count())
sc.stop()
The problem is that none of my config settings are passed on to YARN.
spark-submit --master yarn --deploy-mode cluster ayan_test.py
I tried the same code with --deploy-mode client and all configs are passed
fine.
Am I missing something? Would introducing --properties-file be of any help?
Can anybody share a working example?
Best
Ayan
--
Best Regards,
Ayan Guha
Re: Spark - YARN Cluster Mode
Posted by ayan guha <gu...@gmail.com>.
Also, I wanted to add that if I specify the conf on the command line, it
works.
For example, if I use
spark-submit --master yarn --deploy-mode cluster --conf spark.yarn.queue=root.Application ayan_test.py 10
then the job goes to the correct queue.
Any help would be great
Best
Ayan
Re: Spark - YARN Cluster Mode
Posted by ayan guha <gu...@gmail.com>.
Hi
Thanks a lot. I used a properties file to resolve the issue. I think the
documentation should mention this, though.
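For reference, a minimal sketch of such a properties file (the file name my-spark.conf is hypothetical; the settings mirror the SparkConf calls in my original post, using the standard whitespace-separated spark-defaults format):

```
# my-spark.conf -- hypothetical name; pass it via spark-submit --properties-file
spark.yarn.queue                        root.Applications
spark.executor.instances                50
spark.executor.memory                   22g
spark.yarn.executor.memoryOverhead      4096
spark.executor.cores                    4
spark.sql.hive.convertMetastoreParquet  false
```

Settings given on the command line or in this file are visible to spark-submit before the YARN application is created, which is why the queue takes effect here but not when set via SparkConf in cluster mode.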
On Tue, 28 Feb 2017 at 5:05 am, Marcelo Vanzin <va...@cloudera.com> wrote:
> > none of my Config settings
>
> Is it none of the configs or just the queue? You can't set the YARN
> queue in cluster mode through code; it has to be set on the command
> line. It's a chicken & egg problem (in cluster mode, the YARN app is
> created before your code runs).
>
> --properties-file works the same as setting options on the command
> line, so you can use that instead.
--
Best Regards,
Ayan Guha
Re: Spark - YARN Cluster Mode
Posted by Marcelo Vanzin <va...@cloudera.com>.
> none of my Config settings
Is it none of the configs or just the queue? You can't set the YARN
queue in cluster mode through code; it has to be set on the command
line. It's a chicken & egg problem (in cluster mode, the YARN app is
created before your code runs).
--properties-file works the same as setting options on the command
line, so you can use that instead.
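A sketch of both approaches (the script name ayan_test.py and the queue value are taken from the thread; the properties file name is hypothetical):

```shell
# Set the queue at submit time, before the YARN application is created:
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.yarn.queue=root.Applications \
  ayan_test.py

# Or keep the settings in a file and point spark-submit at it:
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --properties-file my-spark.conf \
  ayan_test.py
```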
--
Marcelo