Posted to issues@spark.apache.org by "xinzhang (JIRA)" <ji...@apache.org> on 2017/09/14 08:33:00 UTC

[jira] [Updated] (SPARK-22007) spark-submit on yarn or local, got different result

     [ https://issues.apache.org/jira/browse/SPARK-22007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

xinzhang updated SPARK-22007:
-----------------------------
    Description: 
Submit the Python script in local mode:
/opt/spark/spark-bin/bin/spark-submit --master local test_hive.py
Result:
+------------+
|databaseName|
+------------+
|     default|
|        zzzz|
|       xxxxx|
+------------+

Submit the Python script on YARN in cluster mode:
/opt/spark/spark-bin/bin/spark-submit --master yarn --deploy-mode cluster test_hive.py
Result:
+------------+
|databaseName|
+------------+
|     default|
+------------+

The Python script:

[yangtt@dc-gateway119 test]$ cat test_hive.py 
#!/usr/bin/env python
#coding=utf-8

from os.path import expanduser, join, abspath

from pyspark.sql import SparkSession
from pyspark.sql import Row
from pyspark.conf import SparkConf

def squared(s):
  return s * s

warehouse_location = abspath('/group/user/yangtt/meta/hive-temp-table')

spark = SparkSession \
    .builder \
    .appName("Python_Spark_SQL_Hive") \
    .config("spark.sql.warehouse.dir", warehouse_location) \
    .config(conf=SparkConf()) \
    .enableHiveSupport() \
    .getOrCreate()

spark.udf.register("squared", squared)

spark.sql("show databases").show()
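One way to rule out config pickup differences between the two modes is to point the session at the metastore service explicitly rather than relying on hive-site.xml being found on the classpath. This is a sketch only; the thrift host and port below are placeholder assumptions, not values from this report:

```python
# Sketch: pin the session to a remote Hive metastore explicitly.
# ASSUMPTION: a metastore service is running; "metastore-host:9083" is a
# placeholder for the real host/port of the MySQL-backed metastore.
from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .appName("Python_Spark_SQL_Hive") \
    .config("hive.metastore.uris", "thrift://metastore-host:9083") \
    .enableHiveSupport() \
    .getOrCreate()

# If this now lists the same databases in both local and yarn mode,
# the original difference was config pickup, not the metastore itself.
spark.sql("show databases").show()
```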



Q: Why does Spark load a different Hive metastore depending on the mode, and why does the YARN run always use Derby?
17/09/14 16:10:55 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
My metastore is configured in MySQL.
Any suggestions would be helpful.
Thanks.
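A common cause of this symptom in yarn cluster mode is that the driver runs on a NodeManager host and never sees the gateway's local Hive client configuration, so Spark falls back to an embedded Derby metastore. A hedged workaround sketch is to ship hive-site.xml with the job; the /etc/hive/conf path is an assumption about the install, not taken from this report:

```shell
# ASSUMPTION: the Hive client config lives in /etc/hive/conf on the gateway.
# --files copies hive-site.xml into the working directory of the remote
# driver and executors, so the MySQL-backed metastore is found instead of
# Spark creating a local Derby metastore.
/opt/spark/spark-bin/bin/spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --files /etc/hive/conf/hive-site.xml \
  test_hive.py
```

Alternatively, placing hive-site.xml in Spark's conf directory on every node should have the same effect.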



> spark-submit on yarn or local, got different result
> ---------------------------------------------------
>
>                 Key: SPARK-22007
>                 URL: https://issues.apache.org/jira/browse/SPARK-22007
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, Spark Shell, Spark Submit
>    Affects Versions: 2.1.0
>            Reporter: xinzhang
>



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org