Posted to users@zeppelin.apache.org by "Jung, Soonoh" <so...@gmail.com> on 2017/09/01 09:33:15 UTC

Setting spark config properties in Zeppelin 0.7.2

Hi zeppelin users,

I have an issue after upgrading Zeppelin from 0.6.2 to 0.7.2.

I am using the spark-redis 0.3.2 library to load Redis values.

To use that library, I have to set the "redis.host" property on the Spark
config instance. This used to work in Zeppelin 0.6.2 but does not in 0.7.2.

How can I set a Spark config property in Zeppelin 0.7.2?

The interpreter property settings:

name            value
----            -----
args            (empty)
master          yarn-client
redis.host      xxx.cache.amazonaws.com
redis.timeout   60000

A sample test note:

-----------
%spark

import com.redislabs.provider.redis._

sc.getConf.getAll.foreach(println)

val rdd = sc.fromRedisKV("test_key")
----------


I expect "(redis.host,xxx.cache.amazonaws.com)" in the output, but it does
not appear.

Output:

--------------
(spark.eventLog.enabled,true)
(spark.submit.pyArchives,pyspark.zip:py4j-0.9-src.zip:py4j-0.8.2.1-src.zip:py4j-0.10.1-src.zip:py4j-0.10.3-src.zip:py4j-0.10.4-src.zip)
(spark.network.timeout,300s)
(spark.executor.instances,15)
(spark.driver.extraLibraryPath,/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native)
(spark.app.id,application_1504247249188_0005)
(spark.executor.memory,5g)
(spark.driver.memory,5g)
(spark.executor.cores,4)
(spark.submit.pyFiles,file:/usr/lib/spark/python/lib/pyspark.zip,file:/usr/lib/spark/python/lib/py4j-0.10.4-src.zip)
(spark.serializer,org.apache.spark.serializer.KryoSerializer)
(spark.executor.extraJavaOptions,-verbose:gc -XX:+PrintGCDetails
-XX:+PrintGCDateStamps -XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70
-XX:+CMSClassUnloadingEnabled -XX:OnOutOfMemoryError='kill -9 %p')
(spark.org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.param.PROXY_URI_BASES,http://ip-172-31-2-15.ap-northeast-1.compute.internal:20888/proxy/application_1504247249188_0005)
(spark.eventLog.dir,hdfs:///var/log/spark/apps)
(spark.sql.hive.metastore.sharedPrefixes,com.amazonaws.services.dynamodbv2)
(spark.sql.warehouse.dir,hdfs:///user/spark/warehouse)
(spark.jars,file:/usr/lib/zeppelin/interpreter/spark/zeppelin-spark_2.10-0.7.2.jar)
(spark.repl.class.outputDir,/mnt/tmp/spark-fa28d8f7-d675-4181-99d2-1bd6ef67db5c)
(spark.submit.deployMode,client)
(spark.yarn.dist.archives,/usr/lib/spark/R/lib/sparkr.zip#sparkr)
(spark.history.fs.logDirectory,hdfs:///var/log/spark/apps)
(spark.ui.filters,org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter)
(spark.driver.extraJavaOptions, -Dfile.encoding=UTF-8
-Dlog4j.configuration=file:///etc/zeppelin/conf/log4j.properties
-Dzeppelin.log.file=/var/log/zeppelin/zeppelin-interpreter-spark-zeppelin-ip-172-31-2-15.log)
(spark.app.name,am-zeppelin-segmentation-prod)
(spark.driver.port,34316)
(spark.executor.extraClassPath,/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar)
(spark.org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.param.PROXY_HOSTS,ip-172-31-2-15.ap-northeast-1.compute.internal)
(spark.history.ui.port,18080)
(spark.sql.catalogImplementation,in-memory)
(spark.home,/usr/lib/spark)
(master,yarn)
(spark.shuffle.service.enabled,true)
(spark.master,yarn-client)
(spark.hadoop.yarn.timeline-service.enabled,false)
(spark.scheduler.mode,FAIR)
(spark.executor.id,driver)
(spark.dynamicAllocation.cachedExecutorIdleTimeout ,1200s)
(spark.dynamicAllocation.executorIdleTimeout,30s)
(spark.yarn.dist.files,file:/etc/spark/conf/hive-site.xml,file:/usr/lib/spark/python/lib/pyspark.zip,file:/usr/lib/spark/python/lib/py4j-0.10.4-src.zip)
(spark.executorEnv.PYTHONPATH,/usr/lib/spark/python/lib/py4j-src.zip:/usr/lib/spark/python/:<CPS>{{PWD}}/pyspark.zip<CPS>{{PWD}}/py4j-src.zip)
(spark.yarn.historyServer.address,ip-172-31-2-15.ap-northeast-1.compute.internal:18080)
(spark.driver.appUIAddress,http://ip-172-31-2-15.ap-northeast-1.compute.internal:4040)
(spark.executor.extraLibraryPath,/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native)
(spark.yarn.isPython,true)
(spark.dynamicAllocation.enabled,true)
(spark.driver.host,172.31.2.15)
(spark.repl.class.uri,spark://172.31.2.15:34316/classes)
(spark.driver.extraClassPath,:/usr/lib/zeppelin/local-repo/2CRQYKAQF/*:/usr/lib/zeppelin/interpreter/spark/*:/usr/lib/zeppelin/lib/interpreter/*::/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/lib/zeppelin/interpreter/spark/zeppelin-spark_2.10-0.7.2.jar)
...
-------------


Best regards,
Soonoh

Re: Setting spark config properties in Zeppelin 0.7.2

Posted by "Jung, Soonoh" <so...@gmail.com>.
Thank you for creating the issue and for the upcoming fix.

Regards,
Soonoh

On 1 September 2017 at 18:50, Jeff Zhang <zj...@gmail.com> wrote:

> [quoted text trimmed]

Re: Setting spark config properties in Zeppelin 0.7.2

Posted by Jeff Zhang <zj...@gmail.com>.
It is because only spark.* keys are accepted as Spark properties; it seems
there are still some libraries using non-spark.* keys.
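
In other words, Zeppelin 0.7.2 copies an interpreter property into the
SparkConf only if its key starts with "spark.". A minimal sketch of that
behavior (illustrative Scala, not Zeppelin's actual code):

-----------
// Interpreter properties as entered in the settings UI.
val interpreterProps = Map(
  "spark.executor.memory" -> "5g",                      // kept: starts with "spark."
  "redis.host"            -> "xxx.cache.amazonaws.com", // dropped by the filter
  "redis.timeout"         -> "60000"                    // dropped by the filter
)

// Only spark.*-prefixed keys survive, so redis.* never reaches the SparkConf.
val sparkProps = interpreterProps.filter { case (k, _) => k.startsWith("spark.") }
-----------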

I created https://issues.apache.org/jira/browse/ZEPPELIN-2893 for that, and
will fix it in 0.7.3.
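
Until 0.7.3 is out, one possible workaround is to build the spark-redis
configuration explicitly in the note instead of relying on the SparkConf.
This is an untested sketch based on the spark-redis README; RedisConfig and
RedisEndpoint may differ in 0.3.2, so please verify against your version:

-----------
%spark

import com.redislabs.provider.redis._

// Hypothetical workaround: pass the Redis endpoint explicitly so the
// dropped "redis.host" interpreter property is no longer needed.
implicit val redisConfig = new RedisConfig(
  new RedisEndpoint(host = "xxx.cache.amazonaws.com", port = 6379, timeout = 60000)
)

val rdd = sc.fromRedisKV("test_key")
-----------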

Jung, Soonoh <so...@gmail.com> wrote on Friday, 1 September 2017 at 5:33 PM:

> [quoted text trimmed]