Posted to user@spark.apache.org by Peter Zhang <zh...@gmail.com> on 2016/01/19 05:23:57 UTC
SparkR with Hive integration
Hi all,
http://spark.apache.org/docs/latest/sparkr.html#sparkr-dataframes
From Hive tables
You can also create SparkR DataFrames from Hive tables. To do this we will need to create a HiveContext which can access tables in the Hive MetaStore. Note that Spark should have been built with Hive support and more details on the difference between SQLContext and HiveContext can be found in the SQL programming guide.
# sc is an existing SparkContext.
hiveContext <- sparkRHive.init(sc)
sql(hiveContext, "CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
sql(hiveContext, "LOAD DATA LOCAL INPATH 'examples/src/main/resources/kv1.txt' INTO TABLE src")
# Queries can be expressed in HiveQL.
results <- sql(hiveContext, "FROM src SELECT key, value")
# results is now a DataFrame
head(results)
## key value
## 1 238 val_238
## 2 86 val_86
## 3 311 val_311
I use RStudio to run the commands above. When I run sql(hiveContext, "CREATE TABLE IF NOT EXISTS src (key INT, value STRING)"), I get this exception:

16/01/19 12:11:51 INFO FileUtils: Creating directory if it doesn't exist: file:/user/hive/warehouse/src
16/01/19 12:11:51 ERROR DDLTask: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:file:/user/hive/warehouse/src is not a directory or unable to create one)

How can I use HDFS instead of the local file system (file:)? Which parameter should I set?
Thanks a lot.
Peter Zhang
--
Google
Sent with Airmail
Re: SparkR with Hive integration
Posted by Felix Cheung <fe...@hotmail.com>.
You might need hive-site.xml
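For reference, a minimal hive-site.xml sketch that points the metastore warehouse at HDFS rather than the local file system; the namenode host/port and warehouse path below are assumptions, not values from this thread:

```xml
<configuration>
  <!-- Default location for managed tables. An HDFS URI here keeps
       CREATE TABLE from writing to the local file system (file:/). -->
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>hdfs://namenode:8020/user/hive/warehouse</value>
  </property>
</configuration>
```

The file typically needs to be on Spark's classpath (for example in Spark's conf/ directory) so the HiveContext picks it up.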
_____________________________
From: Peter Zhang <zh...@gmail.com>
Sent: Monday, January 18, 2016 9:08 PM
Subject: Re: SparkR with Hive integration
To: Jeff Zhang <zj...@gmail.com>
Cc: <us...@spark.apache.org>
Thanks,
I will try.
Peter
--
Google
Sent with Airmail
On January 19, 2016 at 12:44:46, Jeff Zhang (zjffdu@gmail.com) wrote: Please make sure you export environment variable HADOOP_CONF_DIR which contains the core-site.xml
Re: SparkR with Hive integration
Posted by Peter Zhang <zh...@gmail.com>.
Thanks,
I will try.
Peter
--
Google
Sent with Airmail
On January 19, 2016 at 12:44:46, Jeff Zhang (zjffdu@gmail.com) wrote:
Please make sure you export environment variable HADOOP_CONF_DIR which contains the core-site.xml
Re: SparkR with Hive integration
Posted by Jeff Zhang <zj...@gmail.com>.
Please make sure you export environment variable HADOOP_CONF_DIR which
contains the core-site.xml
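A sketch of this suggestion, assuming the Hadoop client configuration lives in /etc/hadoop/conf (an assumed path; substitute your cluster's config directory):

```shell
# Point Spark at the Hadoop client configuration so the default
# filesystem comes from core-site.xml (HDFS) instead of file:/.
# /etc/hadoop/conf is an assumed path.
export HADOOP_CONF_DIR=/etc/hadoop/conf

echo "HADOOP_CONF_DIR=$HADOOP_CONF_DIR"
```

RStudio has to be launched from an environment where this variable is set (for example from a shell that has exported it), or it won't be visible to the SparkR backend.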
--
Best Regards
Jeff Zhang