Posted to user@spark.apache.org by ayan guha <gu...@gmail.com> on 2017/02/27 00:52:25 UTC

Spark - YARN Cluster Mode

Hi

I am facing an issue with cluster mode in PySpark.

Here is my code:

        from pyspark import SparkConf, SparkContext
        from pyspark.sql import HiveContext

        conf = SparkConf()
        conf.setAppName("Spark Ingestion")
        conf.set("spark.yarn.queue", "root.Applications")
        conf.set("spark.executor.instances", "50")
        conf.set("spark.executor.memory", "22g")
        conf.set("spark.yarn.executor.memoryOverhead", "4096")
        conf.set("spark.executor.cores", "4")
        conf.set("spark.sql.hive.convertMetastoreParquet", "false")
        sc = SparkContext(conf=conf)
        sqlContext = HiveContext(sc)

        r = sc.parallelize(xrange(1, 10000))
        print r.count()

        sc.stop()

The problem is that none of my config settings are passed on to YARN.

spark-submit --master yarn --deploy-mode cluster ayan_test.py

I tried the same code with --deploy-mode client and all configs pass
through fine.

Am I missing something? Would introducing --properties-file be of any help?
Can anybody share a working example?

Best
Ayan

-- 
Best Regards,
Ayan Guha

Re: Spark - YARN Cluster Mode

Posted by ayan guha <gu...@gmail.com>.
Also, I wanted to add that if I specify the conf on the command line, it
works.

For example, if I use

spark-submit --master yarn --deploy-mode cluster --conf
spark.yarn.queue=root.Application ayan_test.py 10

Then it goes to the correct queue.

Any help would be great

Best
Ayan

On Mon, Feb 27, 2017 at 11:52 AM, ayan guha <gu...@gmail.com> wrote:

> [...]



-- 
Best Regards,
Ayan Guha

Re: Spark - YARN Cluster Mode

Posted by ayan guha <gu...@gmail.com>.
Hi

Thanks a lot, I used a properties file to resolve the issue. I think the
documentation should mention this, though.
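A minimal sketch of the properties-file approach (file name and layout are illustrative; the values simply mirror the SparkConf settings from the original post):

```shell
# Write the submit-time settings into a properties file (name is arbitrary).
cat > spark.properties <<'EOF'
spark.yarn.queue                        root.Applications
spark.executor.instances                50
spark.executor.memory                   22g
spark.yarn.executor.memoryOverhead      4096
spark.executor.cores                    4
spark.sql.hive.convertMetastoreParquet  false
EOF

# Hand the file to spark-submit so YARN sees the settings before the
# application master is created (needs a live YARN cluster, hence commented out):
# spark-submit --master yarn --deploy-mode cluster \
#     --properties-file spark.properties ayan_test.py
```

This works because spark-submit reads the file before contacting YARN, unlike conf.set() calls that only run inside the already-created application.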

On Tue, 28 Feb 2017 at 5:05 am, Marcelo Vanzin <va...@cloudera.com> wrote:

> >  none of my Config settings
>
> Is it none of the configs or just the queue? You can't set the YARN
> queue in cluster mode through code, it has to be set in the command
> line. It's a chicken & egg problem (in cluster mode, the YARN app is
> created before your code runs).
>
>  --properties-file works the same as setting options on the command
> line, so you can use that instead.
>
>
> On Sun, Feb 26, 2017 at 4:52 PM, ayan guha <gu...@gmail.com> wrote:
> > [...]
-- 
Best Regards,
Ayan Guha

Re: Spark - YARN Cluster Mode

Posted by Marcelo Vanzin <va...@cloudera.com>.
>  none of my Config settings

Is it none of the configs or just the queue? You can't set the YARN
queue in cluster mode through code; it has to be set on the command
line. It's a chicken-and-egg problem: in cluster mode, the YARN
application is created before your code runs.

--properties-file works the same as setting options on the command
line, so you can use that instead.
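Equivalently, the submit-time settings can be moved out of the driver code and onto the command line as --conf flags; a sketch (the invocation is only echoed here, since actually submitting requires a live YARN cluster, and the values are taken from the original post):

```shell
# Submit-time settings expressed as --conf flags instead of conf.set() calls;
# in cluster mode these must be known before the YARN application is created.
conf_flags=(
  --conf spark.yarn.queue=root.Applications
  --conf spark.executor.instances=50
  --conf spark.executor.memory=22g
  --conf spark.yarn.executor.memoryOverhead=4096
  --conf spark.executor.cores=4
  --conf spark.sql.hive.convertMetastoreParquet=false
)

# Print the full command rather than running it:
echo spark-submit --master yarn --deploy-mode cluster "${conf_flags[@]}" ayan_test.py
```

With this split, the driver script no longer needs to build a SparkConf at all for these settings; spark-submit carries them to YARN directly.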


On Sun, Feb 26, 2017 at 4:52 PM, ayan guha <gu...@gmail.com> wrote:
> [...]



-- 
Marcelo
