Posted to user@spark.apache.org by ya <xi...@126.com> on 2019/06/05 03:50:31 UTC

installation of spark

Dear list,


I am very new to Spark, and I am having trouble installing it on my Mac. I have the following questions; please give me some guidance. Thank you very much.


1. Which software, and how much of it, should I install before installing Spark? I have been searching online, and people report different experiences: some say there is no need to install Hadoop before installing Spark, while others say Hadoop has to be installed first. Some say Scala has to be installed separately, whereas others say Scala is included with Spark and is set up automatically when Spark is installed. So I am confused about what to install to get started.


2. Is there a simple way to configure this software, for instance an all-in-one configuration file? It takes forever for me to configure things before I can really use Spark for data analysis.


I hope my questions make sense. Thank you very much.


Best regards,


YA

Re: sparksql in sparkR?

Posted by Felix Cheung <fe...@hotmail.com>.
This seems to be more a question about the spark-sql shell? May I suggest you change the email title to get more attention.

________________________________
From: ya <xi...@126.com>
Sent: Wednesday, June 5, 2019 11:48:17 PM
To: user@spark.apache.org
Subject: sparksql in sparkR?

Dear list,

I am trying to use Spark SQL from within R. I have the following questions; could you give me some advice please? Thank you very much.

1. I connect R and Spark using the SparkR library; presumably some members here are also R users? Do I understand correctly that Spark SQL can be reached and triggered via SparkR and used from within R (rather than in Spark's sparkR shell)?

2. I loaded the SparkR library in R and tried to create a new SQL database and a table, but I could not get the database and the table I want. The code looks like this:

library(SparkR)
Sys.setenv(SPARK_HOME='/Users/ya/Downloads/soft/spark-2.4.3-bin-hadoop2.7')
sparkR.session(sparkHome=Sys.getenv('/Users/ya/Downloads/soft/spark-2.4.3-bin-hadoop2.7'))
sql("create database learnsql; use learnsql")
sql("
create table employee_tbl
(emp_id varchar(10) not null,
emp_name char(10) not null,
emp_st_addr char(10) not null,
emp_city char(10) not null,
emp_st char(10) not null,
emp_zip integer(5) not null,
emp_phone integer(10) null,
emp_pager integer(10) null);
insert into employee_tbl values ('0001','john','yanlanjie 1','gz','jiaoqiaojun','510006','1353');
select*from employee_tbl;
“)

I ran the following code in the spark-sql shell. I get the database learnsql; however, I still can't get the table.

spark-sql> create database learnsql;show databases;
19/06/06 14:42:36 INFO HiveMetaStore: 0: create_database: Database(name:learnsql, description:, locationUri:file:/Users/ya/spark-warehouse/learnsql.db, parameters:{})
19/06/06 14:42:36 INFO audit: ugi=ya    ip=unknown-ip-addr      cmd=create_database: Database(name:learnsql, description:, locationUri:file:/Users/ya/spark-warehouse/learnsql.db, parameters:{})
Error in query: org.apache.hadoop.hive.metastore.api.AlreadyExistsException: Database learnsql already exists;

spark-sql> create table employee_tbl
         > (emp_id varchar(10) not null,
         > emp_name char(10) not null,
         > emp_st_addr char(10) not null,
         > emp_city char(10) not null,
         > emp_st char(10) not null,
         > emp_zip integer(5) not null,
         > emp_phone integer(10) null,
         > emp_pager integer(10) null);
Error in query:
no viable alternative at input 'create table employee_tbl\n(emp_id varchar(10) not'(line 2, pos 20)

== SQL ==
create table employee_tbl
(emp_id varchar(10) not null,
--------------------^^^
emp_name char(10) not null,
emp_st_addr char(10) not null,
emp_city char(10) not null,
emp_st char(10) not null,
emp_zip integer(5) not null,
emp_phone integer(10) null,
emp_pager integer(10) null)

spark-sql> insert into employee_tbl values ('0001','john','yanlanjie 1','gz','jiaoqiaojun','510006','1353');
19/06/06 14:43:43 INFO HiveMetaStore: 0: get_table : db=default tbl=employee_tbl
19/06/06 14:43:43 INFO audit: ugi=ya    ip=unknown-ip-addr      cmd=get_table : db=default tbl=employee_tbl
Error in query: Table or view not found: employee_tbl; line 1 pos 0


Does Spark SQL have a different syntax? What did I miss?

Thank you very much.

Best regards,

YA




---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org


Re: Spark SQL in R?

Posted by Felix Cheung <fe...@hotmail.com>.
I don’t think you should get a hive-site.xml from the internet.

It should contain connection information for a running Hive metastore. If you don’t have a Hive metastore service because you are running locally (from a laptop?), then you don’t really need it. You can get Spark to work with its own.
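
For a purely local experiment, something like the sketch below should work with no hive-site.xml at all; Spark then keeps its metadata in an embedded Derby metastore and writes tables under a local spark-warehouse directory. Treat it as a minimal sketch rather than a definitive recipe: the database and table names come from your earlier mail, the column list is shortened, and the DDL drops the NOT NULL constraints and the integer(n) precision that your spark-sql parse error was pointing at. Note also that sql() runs one statement per call.

library(SparkR)

# Local session; enableHiveSupport defaults to TRUE, and without a
# hive-site.xml Spark falls back to its own embedded metastore.
sparkR.session(master = "local[*]", appName = "learnsql")

# One statement per sql() call.
sql("CREATE DATABASE IF NOT EXISTS learnsql")
sql("USE learnsql")
sql("CREATE TABLE IF NOT EXISTS employee_tbl (
       emp_id    STRING,
       emp_name  STRING,
       emp_city  STRING,
       emp_zip   INT,
       emp_phone STRING)")
sql("INSERT INTO employee_tbl VALUES ('0001', 'john', 'gz', 510006, '1353')")
head(sql("SELECT * FROM employee_tbl"))

If you later do need a real, shared Hive metastore, that is when a hive-site.xml (with hive.metastore.uris, as Rishikesh mentioned) comes into play.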



________________________________
From: ya <xi...@126.com>
Sent: Friday, June 7, 2019 8:26:27 PM
To: Rishikesh Gawade; felixcheung_m@hotmail.com; user@spark.apache.org
Subject: Spark SQL in R?

Dear Felix, Rishikesh, and list,

Thank you very much for your previous help. So far I have tried two ways to trigger Spark SQL: one is to use R with the sparklyr and SparkR libraries; the other is to use the sparkR shell that ships with Spark. I am not connecting to a remote Spark cluster, but to a local one. Both failed, with or without hive-site.xml. I suspect the hive-site.xml content I found online was not appropriate for this case, as the Spark session cannot be initialized after adding it. My questions are:

1. Is there any example for the content of hive-site.xml for this case?

2. I used the sql() function to call Spark SQL; is this the right way to do it?

###################################
##Here is the content in the hive-site.xml:##
###################################

<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://192.168.76.100:3306/hive?createDatabaseIfNotExist=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>

<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>

<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
<description>username to use against metastore database</description>
</property>

<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>123</value>
<description>password to use against metastore database</description>
</property>
</configuration>



################################
##Here is what happened in R:##
################################

> library(sparklyr) # load sparklyr package
> sc=spark_connect(master="local",spark_home="/Users/ya/Downloads/soft/spark-2.4.3-bin-hadoop2.7") # connect sparklyr with spark
> sql('create database learnsql')
Error in sql("create database learnsql") : could not find function "sql"
> library(SparkR)

Attaching package: ‘SparkR’

The following object is masked from ‘package:sparklyr’:

    collect

The following objects are masked from ‘package:stats’:

    cov, filter, lag, na.omit, predict, sd, var, window

The following objects are masked from ‘package:base’:

    as.data.frame, colnames, colnames<-, drop, endsWith, intersect, rank, rbind,
    sample, startsWith, subset, summary, transform, union

> sql('create database learnsql')
Error in getSparkSession() : SparkSession not initialized
> Sys.setenv(SPARK_HOME='/Users/ya/Downloads/soft/spark-2.4.3-bin-hadoop2.7')
> sparkR.session(sparkHome=Sys.getenv('/Users/ya/Downloads/soft/spark-2.4.3-bin-hadoop2.7'))
Spark not found in SPARK_HOME:
Spark package found in SPARK_HOME: /Users/ya/Downloads/soft/spark-2.4.3-bin-hadoop2.7
Launching java with spark-submit command /Users/ya/Downloads/soft/spark-2.4.3-bin-hadoop2.7/bin/spark-submit   sparkr-shell /var/folders/d8/7j6xswf92c3gmhwy_lrk63pm0000gn/T//Rtmpz22kK9/backend_port103d4cfcfd2c
19/06/08 11:14:57 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Error in handleErrors(returnStatus, conn) :

…… hundreds of lines of log output and errors here ……

> sql('create database learnsql')
Error in getSparkSession() : SparkSession not initialized



###################################
##Here is what happened in SparkR shell:##
####################################

Error in handleErrors(returnStatus, conn) :
  java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionStateBuilder':
at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1107)
at org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:145)
at org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:144)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:144)
at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:141)
at org.apache.spark.sql.api.r.SQLUtils$$anonfun$setSparkContextSessionConf$2.apply(SQLUtils.scala:80)
at org.apache.spark.sql.api.r.SQLUtils$$anonfun$setSparkContextSessionConf$2.apply(SQLUtils.scala:79)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
at scala.collection.Iterator$class.foreach(Iterator.sca
> sql('create database learnsql')
Error in getSparkSession() : SparkSession not initialized



Thank you very much.

YA







On Jun 8, 2019, at 1:44 AM, Rishikesh Gawade <ri...@gmail.com> wrote:

Hi.
1. Yes, you can connect to Spark via R. If you are connecting to a remote Spark cluster, then you'll need EITHER a Spark binary along with hive-site.xml in its config directory on the machine running R, OR a Livy server installed on the cluster. You can then go on to use sparklyr, which has almost the same functions as SparkR but is generally recommended over it (see the sparklyr sketch after point 3 below).
For the first method mentioned above, use
sc <- sparklyr::spark_connect(master = "yarn-client", spark_home = Sys.getenv("SPARK_HOME"), conf = spark_config())
For the second method, use
sc <- sparklyr::spark_connect( master = "livyserverIP:port", method = "livy", conf = livy_config(conf = spark_config(), username = "foo", password = "bar"))

2. The reason you're not getting the desired result could be that hive-site.xml is missing. To be able to connect to Hive from spark-shell/spark-submit/SparkR/sparklyr and perform SQL operations, you need to have hive-site.xml in the $SPARK_HOME/conf directory. This hive-site.xml should contain one and only one configuration property, 'hive.metastore.uris'.

3. In the case of the spark-sql shell, it should work after putting the aforementioned hive-site.xml in Spark's config directory. If it doesn't work, then please check the syntax.
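
One aside on point 1: sparklyr does not provide SparkR's sql() function, so SQL statements are usually issued through the DBI interface (or the data is worked on via dplyr verbs). A minimal local sketch, assuming the same Spark home path used earlier in this thread:

library(sparklyr)
library(DBI)

# Local connection; the spark_home path is the one from the earlier mails.
sc <- spark_connect(master = "local",
                    spark_home = "/Users/ya/Downloads/soft/spark-2.4.3-bin-hadoop2.7")

# DDL and queries go through DBI rather than a sql() function.
dbGetQuery(sc, "CREATE DATABASE IF NOT EXISTS learnsql")
dbGetQuery(sc, "SHOW DATABASES")

spark_disconnect(sc)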

Regards,
Rishikesh Gawade


On Thu, Jun 6, 2019, 12:18 PM ya <xi...@126.com>> wrote:
Dear list,

I am trying to use sparksql within my R, I am having the following questions, could you give me some advice please? Thank you very much.

1. I connect my R and spark using the library sparkR, probably some of the members here also are R users? Do I understand correctly that SparkSQL can be connected and triggered via SparkR and used in R (not in sparkR shell of spark)?

2. I ran sparkR library in R, trying to create a new sql database and a table, I could not get the database and the table I want. The code looks like below:

library(SparkR)
Sys.setenv(SPARK_HOME='/Users/ya/Downloads/soft/spark-2.4.3-bin-hadoop2.7')
sparkR.session(sparkHome=Sys.getenv('/Users/ya/Downloads/soft/spark-2.4.3-bin-hadoop2.7'))
sql("create database learnsql; use learnsql")
sql("
create table employee_tbl
(emp_id varchar(10) not null,
emp_name char(10) not null,
emp_st_addr char(10) not null,
emp_city char(10) not null,
emp_st char(10) not null,
emp_zip integer(5) not null,
emp_phone integer(10) null,
emp_pager integer(10) null);
insert into employee_tbl values ('0001','john','yanlanjie 1','gz','jiaoqiaojun','510006','1353');
select*from employee_tbl;
“)

I ran the following code in spark-sql shell, I get the database learnsql, however, I still can’t get the table.

spark-sql> create database learnsql;show databases;
19/06/06 14:42:36 INFO HiveMetaStore: 0: create_database: Database(name:learnsql, description:, locationUri:file:/Users/ya/spark-warehouse/learnsql.db, parameters:{})
19/06/06 14:42:36 INFO audit: ugi=ya    ip=unknown-ip-addr      cmd=create_database: Database(name:learnsql, description:, locationUri:file:/Users/ya/spark-warehouse/learnsql.db, parameters:{})
Error in query: org.apache.hadoop.hive.metastore.api.AlreadyExistsException: Database learnsql already exists;

spark-sql> create table employee_tbl
         > (emp_id varchar(10) not null,
         > emp_name char(10) not null,
         > emp_st_addr char(10) not null,
         > emp_city char(10) not null,
         > emp_st char(10) not null,
         > emp_zip integer(5) not null,
         > emp_phone integer(10) null,
         > emp_pager integer(10) null);
Error in query:
no viable alternative at input 'create table employee_tbl\n(emp_id varchar(10) not'(line 2, pos 20)

== SQL ==
create table employee_tbl
(emp_id varchar(10) not null,
--------------------^^^
emp_name char(10) not null,
emp_st_addr char(10) not null,
emp_city char(10) not null,
emp_st char(10) not null,
emp_zip integer(5) not null,
emp_phone integer(10) null,
emp_pager integer(10) null)

spark-sql> insert into employee_tbl values ('0001','john','yanlanjie 1','gz','jiaoqiaojun','510006','1353');
19/06/06 14:43:43 INFO HiveMetaStore: 0: get_table : db=default tbl=employee_tbl
19/06/06 14:43:43 INFO audit: ugi=ya    ip=unknown-ip-addr      cmd=get_table : db=default tbl=employee_tbl
Error in query: Table or view not found: employee_tbl; line 1 pos 0


Does sparkSQL has different coding grammar? What did I miss?

Thank you very much.

Best regards,

YA




---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org<ma...@spark.apache.org>



Re: installation of spark

Posted by Alonso Isidoro Roman <al...@gmail.com>.
When using macOS, it is recommended to install Java, Scala and Spark using
Homebrew.

Run these commands on a terminal:

brew update

brew install scala

brew install sbt

brew cask install java

brew install apache-spark


There is no need to install HDFS; you can use your local file system
without a problem.
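
Once those are installed, a quick smoke test from R could look like the sketch below. This assumes the SparkR package that ships with the Spark distribution is on your R library path; /etc/hosts is just an arbitrary local file to read.

library(SparkR)

# Purely local session: no HDFS and no cluster, just the local file system.
sparkR.session(master = "local[*]", appName = "smoke-test")

# Read a plain local file to confirm the installation works end to end.
df <- read.text("file:///etc/hosts")
head(df)

sparkR.session.stop()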


*How to set JAVA_HOME on Mac OS X temporarily*

   1. Open *Terminal*.
   2. Confirm you have a JDK by typing “which java”. ...
   3. Check you have the needed version of Java by typing “java -version”.
   4. Set JAVA_HOME using this command in *Terminal*: export JAVA_HOME=/Library/Java/Home
   5. echo $JAVA_HOME in *Terminal* to confirm the path.
   6. You should now be able to run your application.


*How to set JAVA_HOME on Mac OS X permanently*

$ vim .bash_profile

$ export JAVA_HOME=$(/usr/libexec/java_home)

$ source .bash_profile

$ echo $JAVA_HOME


Have fun!

Alonso


On Wed, Jun 5, 2019 at 6:10, Jack Kolokasis (<ko...@ics.forth.gr>)
wrote:

> Hello,
>
>     at first you will need to make sure that JAVA is installed, or install
> it otherwise. Then install scala and a build tool (sbt or maven). In my
> point of view, IntelliJ IDEA is a good option to create your Spark
> applications.  At the end you have to install a distributed file system e.g
> HDFS.
>
>     I think there is no an all-in-one configuration. But there are
> examples about how to configure you Spark cluster (e.g
> https://github.com/jaceklaskowski/mastering-apache-spark-book/blob/master/spark-standalone-example-2-workers-on-1-node-cluster.adoc
> ).
> Best,
> --Iacovos
> On 5/6/19 5:50 π.μ., ya wrote:
>
> Dear list,
>
> I am very new to spark, and I am having trouble installing it on my mac. I
> have following questions, please give me some guidance. Thank you very much.
>
> 1. How many and what software should I install before installing spark? I
> have been searching online, people discussing their experiences on this
> topic with different opinions, some says there is no need to install hadoop
> before install spark, some says hadoop has to be installed before spark.
> Some other people say scala has to be installed, whereas others say scala
> is included in spark, and it is installed automatically once spark in
> installed. So I am confused what to install for a start.
>
> 2.  Is there an simple way to configure these software? for instance, an
> all-in-one configuration file? It takes forever for me to configure things
> before I can really use it for data analysis.
>
> I hope my questions make sense. Thank you very much.
>
> Best regards,
>
> YA
>
>

-- 
Alonso Isidoro Roman

Re: installation of spark

Posted by Jack Kolokasis <ko...@ics.forth.gr>.
Hello,

     First, you will need to make sure that Java is installed, and
install it otherwise. Then install Scala and a build tool (sbt or
Maven). In my view, IntelliJ IDEA is a good option for creating
your Spark applications. At the end you have to install a distributed
file system, e.g. HDFS.

     I think there is no all-in-one configuration, but there are
examples of how to configure your Spark cluster (e.g.
https://github.com/jaceklaskowski/mastering-apache-spark-book/blob/master/spark-standalone-example-2-workers-on-1-node-cluster.adoc).
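
For what it's worth, once such a standalone cluster (or just a local master) is running, connecting to it from R could look like the sketch below. This is only an illustration; the master URL "spark://localhost:7077" is a placeholder for whatever your standalone setup exposes.

library(SparkR)

# Point the session at the standalone master instead of a local[*] master.
sparkR.session(master = "spark://localhost:7077", appName = "cluster-test")

# Ship a small built-in R data frame to the cluster and aggregate it there.
df <- as.DataFrame(faithful)
head(summarize(groupBy(df, df$waiting), count = n(df$waiting)))

sparkR.session.stop()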

Best,
--Iacovos
On 5/6/19 5:50 π.μ., ya wrote:
> Dear list,
>
> I am very new to spark, and I am having trouble installing it on my 
> mac. I have following questions, please give me some guidance. Thank 
> you very much.
>
> 1. How many and what software should I install before installing 
> spark? I have been searching online, people discussing their 
> experiences on this topic with different opinions, some says there is 
> no need to install hadoop before install spark, some says hadoop has 
> to be installed before spark. Some other people say scala has to be 
> installed, whereas others say scala is included in spark, and it is 
> installed automatically once spark in installed. So I am confused what 
> to install for a start.
>
> 2.  Is there an simple way to configure these software? for instance, 
> an all-in-one configuration file? It takes forever for me to configure 
> things before I can really use it for data analysis.
>
> I hope my questions make sense. Thank you very much.
>
> Best regards,
>
> YA