Posted to dev@spark.apache.org by "Eskilson,Aleksander" <Al...@Cerner.com> on 2015/06/08 22:38:34 UTC

SparkR Reading Tables from Hive

Hi there,

I’m testing out the new SparkR-Hive interop right now. I’m noticing an apparent disconnect between the Hive store into which I’ve loaded my data and the store that sparkRHive.init() connects to. For example, in beeline:

0: jdbc:hive2://quickstart.cloudera:10000> show databases;
+---------------+--+
| database_name |
+---------------+--+
| default       |
+---------------+--+
0: jdbc:hive2://quickstart.cloudera:10000> show tables;
+---------------+--+
| tab_name      |
+---------------+--+
| my_table      |
+---------------+--+

But in sparkR:

> hqlContext <- sparkRHive.init(sc)
> showDF(sql(hqlContext, "show databases"))
+---------+
| result  |
+---------+
| default |
+---------+
> showDF(tables(hqlContext, "default"))
+-----------+-------------+
| tableName | isTemporary |
+-----------+-------------+
+-----------+-------------+
> showDF(sql(hqlContext, "show tables"))
+-----------+-------------+
| tableName | isTemporary |
+-----------+-------------+
+-----------+-------------+

The data in my_table was landed into Hive from a CSV via kite-dataset. The installation of Spark I’m working with was built separately and operates in standalone mode. Could it be that sparkRHive.init() is picking up the wrong address for the Hive metastore? How could I peer into the context to see what the address is set to, and, if it’s wrong, reset it?
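One thing I tried, to see what the context actually resolved: issuing SET queries through the HiveContext. A sketch, assuming the same hqlContext as above and that these standard Hive property names are the relevant ones (an empty hive.metastore.uris would mean Spark fell back to a local Derby metastore, which could explain the empty table listing):

```r
# Ask the HiveContext which configuration values it resolved.
showDF(sql(hqlContext, "SET hive.metastore.uris"))
showDF(sql(hqlContext, "SET javax.jdo.option.ConnectionURL"))
```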

Ultimately, I’d like to be able to read my_table from Hive into a SparkR DataFrame, which ought to be possible with:
> result <- sql(hqlContext, "SELECT * FROM my_table")
But this fails with:
org.apache.spark.sql.AnalysisException: no such table my_table; line 1 pos 14
which is expected, I suppose, since we don’t see the table in the listing above.

Any thoughts?

Thanks,
Alek Eskilson

CONFIDENTIALITY NOTICE This message and any included attachments are from Cerner Corporation and are intended only for the addressee. The information contained in this message is confidential and may constitute inside or non-public information under international, federal, or state securities laws. Unauthorized forwarding, printing, copying, distribution, or use of such information is strictly prohibited and may be unlawful. If you are not the addressee, please promptly delete this message and notify the sender of the delivery error by e-mail or you may call Cerner's corporate offices in Kansas City, Missouri, U.S.A at (+1) (816)221-1024.

Re: SparkR Reading Tables from Hive

Posted by Shivaram Venkataraman <sh...@eecs.berkeley.edu>.
Thanks for the confirmation - I was just going to send a pointer to the
documentation that talks about hive-site.xml.
http://people.apache.org/~pwendell/spark-releases/latest/sql-programming-guide.html#hive-tables
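In case it helps anyone searching the archives: the gist is that Spark's HiveContext picks up conf/hive-site.xml from the Spark installation. A minimal sketch of the relevant property — the host and port here are placeholders, with 9083 being the conventional metastore Thrift port:

```xml
<!-- $SPARK_HOME/conf/hive-site.xml (placeholder values) -->
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://quickstart.cloudera:9083</value>
  </property>
</configuration>
```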

Thanks
Shivaram

On Mon, Jun 8, 2015 at 1:57 PM, Eskilson,Aleksander <
Alek.Eskilson@cerner.com> wrote:

>  Resolved, my hive-site.xml wasn’t in the conf folder. I can load tables
> into DataFrames as expected.
>
>  Thanks,
> Alek

Re: SparkR Reading Tables from Hive

Posted by "Eskilson,Aleksander" <Al...@Cerner.com>.
Resolved, my hive-site.xml wasn’t in the conf folder. I can load tables into DataFrames as expected.

Thanks,
Alek
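(For the archives: with hive-site.xml in Spark’s conf directory and the shell restarted, the queries from my first mail behave as hoped — a quick sketch against the same session:)

```r
# my_table should now show up in the listing, and be loadable:
showDF(sql(hqlContext, "show tables"))
result <- sql(hqlContext, "SELECT * FROM my_table")
head(result)
```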
