Posted to issues@spark.apache.org by "xinzhang (JIRA)" <ji...@apache.org> on 2017/09/14 08:33:00 UTC
[jira] [Updated] (SPARK-22007) spark-submit on yarn or local, got different result
[ https://issues.apache.org/jira/browse/SPARK-22007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
xinzhang updated SPARK-22007:
-----------------------------
Description:
Submit the py script locally:
/opt/spark/spark-bin/bin/spark-submit --master local test_hive.py
Result:
+------------+
|databaseName|
+------------+
| default|
| zzzz|
| xxxxx|
+------------+
Submit the py script on YARN:
/opt/spark/spark-bin/bin/spark-submit --master yarn --deploy-mode cluster test_hive.py
Result:
+------------+
|databaseName|
+------------+
| default|
+------------+
The py script:
[yangtt@dc-gateway119 test]$ cat test_hive.py
#!/usr/bin/env python
# coding=utf-8
from os.path import abspath

from pyspark.conf import SparkConf
from pyspark.sql import SparkSession


def squared(s):
    return s * s


# warehouse_location points to the default location for managed databases and tables
warehouse_location = abspath('/group/user/yangtt/meta/hive-temp-table')

spark = SparkSession \
    .builder \
    .appName("Python_Spark_SQL_Hive") \
    .config("spark.sql.warehouse.dir", warehouse_location) \
    .config(conf=SparkConf()) \
    .enableHiveSupport() \
    .getOrCreate()

spark.udf.register("squared", squared)
spark.sql("show databases").show()
Q: Why does Spark load a different Hive metastore in each mode? Does YARN always use Derby?
17/09/14 16:10:55 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
My metastore is configured in MySQL.
Any suggestions would be helpful.
Thanks.
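A likely cause (an assumption based on the symptoms above, not confirmed in this thread) is that hive-site.xml exists only on the gateway host. With --deploy-mode cluster the driver runs on an arbitrary YARN node, finds no Hive configuration there, and silently falls back to an embedded Derby metastore. A sketch of shipping the configuration along with the job (the /etc/hive/conf path is hypothetical; use your installation's actual hive-site.xml):

```shell
# Ship the gateway's Hive config to the YARN application so the driver
# resolves the MySQL-backed metastore instead of an embedded Derby one.
# /etc/hive/conf/hive-site.xml is a hypothetical path for this sketch.
/opt/spark/spark-bin/bin/spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --files /etc/hive/conf/hive-site.xml \
  test_hive.py
```

Alternatively, pointing the job at the metastore service directly, e.g. --conf spark.hadoop.hive.metastore.uris=thrift://<metastore-host>:9083, avoids depending on a local hive-site.xml at all.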
> spark-submit on yarn or local , got different result
> ----------------------------------------------------
>
> Key: SPARK-22007
> URL: https://issues.apache.org/jira/browse/SPARK-22007
> Project: Spark
> Issue Type: Bug
> Components: Spark Core, Spark Shell, Spark Submit
> Affects Versions: 2.1.0
> Reporter: xinzhang
>
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)