Posted to user@spark.apache.org by Peter Zhang <zh...@gmail.com> on 2016/01/19 05:23:57 UTC
SparkR with Hive integration
Hi all,
http://spark.apache.org/docs/latest/sparkr.html#sparkr-dataframes
From Hive tables
You can also create SparkR DataFrames from Hive tables. To do this we will need to create a HiveContext which can access tables in the Hive MetaStore. Note that Spark should have been built with Hive support and more details on the difference between SQLContext and HiveContext can be found in the SQL programming guide.
# sc is an existing SparkContext.
hiveContext <- sparkRHive.init(sc)
sql(hiveContext, "CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
sql(hiveContext, "LOAD DATA LOCAL INPATH 'examples/src/main/resources/kv1.txt' INTO TABLE src")
# Queries can be expressed in HiveQL.
results <- sql(hiveContext, "FROM src SELECT key, value")
# results is now a DataFrame
head(results)
## key value
## 1 238 val_238
## 2 86 val_86
## 3 311 val_311
I use RStudio to run the commands above. When I run sql(hiveContext, "CREATE TABLE IF NOT EXISTS src (key INT, value STRING)"), I get this exception:

16/01/19 12:11:51 INFO FileUtils: Creating directory if it doesn't exist: file:/user/hive/warehouse/src
16/01/19 12:11:51 ERROR DDLTask: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:file:/user/hive/warehouse/src is not a directory or unable to create one)

How can I use HDFS instead of the local file system (file:)? Which parameter should I set?
Thanks a lot.
Peter Zhang
--
Google
Sent with Airmail
Re: SparkR with Hive integration
Posted by Felix Cheung <fe...@hotmail.com>.
You might need hive-site.xml
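For reference, a minimal hive-site.xml sketch that points the metastore warehouse at HDFS rather than the local file system; the namenode host/port and warehouse path below are assumptions, not values from this thread:

```xml
<configuration>
  <!-- Default location for managed tables. An HDFS URI here keeps
       CREATE TABLE from writing to the local file system (file:/). -->
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>hdfs://namenode:8020/user/hive/warehouse</value>
  </property>
</configuration>
```

The file typically needs to be on Spark's classpath (for example in Spark's conf/ directory) so the HiveContext picks it up.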
_____________________________
From: Peter Zhang <zh...@gmail.com>
Sent: Monday, January 18, 2016 9:08 PM
Subject: Re: SparkR with Hive integration
To: Jeff Zhang <zj...@gmail.com>
Cc: <us...@spark.apache.org>
Thanks,
I will try.
Peter
--
Google
Sent with Airmail
On January 19, 2016 at 12:44:46, Jeff Zhang (zjffdu@gmail.com) wrote: Please make sure you export environment variable HADOOP_CONF_DIR which contains the core-site.xml
Re: SparkR with Hive integration
Posted by Peter Zhang <zh...@gmail.com>.
Thanks,
I will try.
Peter
--
Google
Sent with Airmail
On January 19, 2016 at 12:44:46, Jeff Zhang (zjffdu@gmail.com) wrote:
Please make sure you export environment variable HADOOP_CONF_DIR which contains the core-site.xml
Re: SparkR with Hive integration
Posted by Jeff Zhang <zj...@gmail.com>.
Please make sure you export environment variable HADOOP_CONF_DIR which
contains the core-site.xml
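A sketch of this suggestion, assuming the Hadoop client configuration lives in /etc/hadoop/conf (an assumed path; substitute your cluster's config directory):

```shell
# Point Spark at the Hadoop client configuration so the default
# filesystem comes from core-site.xml (HDFS) instead of file:/.
# /etc/hadoop/conf is an assumed path.
export HADOOP_CONF_DIR=/etc/hadoop/conf

echo "HADOOP_CONF_DIR=$HADOOP_CONF_DIR"
```

RStudio has to be launched from an environment where this variable is set (for example from a shell that has exported it), or it won't be visible to the SparkR backend.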
--
Best Regards
Jeff Zhang