Posted to user@spark.apache.org by ", Roy" <rp...@njit.edu> on 2017/03/24 11:38:11 UTC
spark-submit config via file
Hi,
I am trying to deploy a Spark job using spark-submit, which takes a bunch of parameters, like:
spark-submit --class StreamingEventWriterDriver --master yarn --deploy-mode cluster --executor-memory 3072m --executor-cores 4 --files streaming.conf spark_streaming_2.11-assembly-1.0-SNAPSHOT.jar -conf "streaming.conf"
I was looking for a way to put all these flags in a file to pass to spark-submit, to keep my spark-submit command simple, like this:
spark-submit --class StreamingEventWriterDriver --master yarn --deploy-mode cluster --properties-file properties.conf --files streaming.conf spark_streaming_2.11-assembly-1.0-SNAPSHOT.jar -conf "streaming.conf"
properties.conf has the following contents:
spark.executor.memory 3072m
spark.executor.cores 4
But I am getting the following error:
17/03/24 11:36:26 INFO Client: Use hdfs cache file as spark.yarn.archive for HDP, hdfsCacheFile:hdfs:///hdp/apps/2.6.0.0-403/spark2/spark2-hdp-yarn-archive.tar.gz
17/03/24 11:36:26 WARN AzureFileSystemThreadPoolExecutor: Disabling threads for Delete operation as thread count 0 is <= 1
17/03/24 11:36:26 INFO AzureFileSystemThreadPoolExecutor: Time taken for Delete operation is: 1 ms with threads: 0
17/03/24 11:36:27 INFO Client: Deleted staging directory wasb://abc@abc.blob.core.windows.net/user/sshuser/.sparkStaging/application_1488402758319_0492
Exception in thread "main" java.io.IOException: Incomplete HDFS URI, no host: hdfs:///hdp/apps/2.6.0.0-403/spark2/spark2-hdp-yarn-archive.tar.gz
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:154)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2791)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:99)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2825)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2807)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:386)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.spark.deploy.yarn.Client.copyFileToRemote(Client.scala:364)
at org.apache.spark.deploy.yarn.Client.org$apache$spark$deploy$yarn$Client$$distribute$1(Client.scala:480)
at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:552)
at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:881)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:170)
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1218)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1277)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:745)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
17/03/24 11:36:27 INFO MetricsSystemImpl: Stopping azure-file-system metrics system...
Does anyone know if this is even possible?
Thanks...
Roy
Re: spark-submit config via file
Posted by "Thakrar, Jayesh" <jt...@conversantmedia.com>.
Roy - can you check if you have HADOOP_CONF_DIR and YARN_CONF_DIR set to the directory containing the HDFS and YARN configuration files?
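If they are not set, a minimal sketch of setting them (the /etc/hadoop/conf path is an assumption based on the usual HDP layout; use wherever core-site.xml, hdfs-site.xml, and yarn-site.xml actually live on your cluster):

```shell
# Point the Spark/YARN client at the Hadoop configuration files.
# /etc/hadoop/conf is the typical HDP location -- adjust for your cluster.
export HADOOP_CONF_DIR=/etc/hadoop/conf
export YARN_CONF_DIR=/etc/hadoop/conf
```

With these exported in the shell (or in spark-env.sh), spark-submit can resolve the default filesystem and the YARN ResourceManager from the cluster config.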
Re: spark-submit config via file
Posted by Sandeep Nemuri <nh...@gmail.com>.
You should try adding your NameNode host and port in the URL.
--
Regards
Sandeep Nemuri
Re: spark-submit config via file
Posted by Saisai Shao <sa...@gmail.com>.
It's quite obvious your HDFS URL is not complete; look at the exception: your HDFS URI doesn't have a host or port. Normally the short form would be fine if HDFS were your default FS.
I think the problem is that you're running on HDI, where the default FS is wasb. So a short name without host:port leads to an error. This looks like an HDI-specific issue; you'd better ask the HDI folks.
Exception in thread "main" java.io.IOException: Incomplete HDFS URI, no host: hdfs:///hdp/apps/2.6.0.0-403/spark2/spark2-hdp-yarn-archive.tar.gz
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:154)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2791)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:99)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2825)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2807)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:386)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
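Concretely, one workaround along these lines is to override spark.yarn.archive with a fully qualified URI at submit time. A sketch (the NameNode host and port here are placeholders; take the real endpoint from fs.defaultFS in your HDFS cluster's core-site.xml):

```shell
# Hypothetical NameNode endpoint -- replace with your cluster's actual value.
NN_URI="hdfs://namenode.example.com:8020"

spark-submit --class StreamingEventWriterDriver --master yarn --deploy-mode cluster \
  --properties-file properties.conf \
  --conf spark.yarn.archive="$NN_URI/hdp/apps/2.6.0.0-403/spark2/spark2-hdp-yarn-archive.tar.gz" \
  --files streaming.conf \
  spark_streaming_2.11-assembly-1.0-SNAPSHOT.jar -conf "streaming.conf"
```

The same key could instead go into properties.conf; either way the URI must carry an explicit host:port so it doesn't fall back to the wasb default FS.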
Re: spark-submit config via file
Posted by Yong Zhang <ja...@hotmail.com>.
Of course it is possible.
You can always set any configuration in your application through the API, instead of passing it in through the CLI:
val sparkConf = new SparkConf()
  .setAppName(properties.getProperty("appName"))
  .setMaster(properties.getProperty("master"))
  .set("spark.executor.memory", properties.getProperty("spark.executor.memory"))
Your error is an environment problem.
Yong