Posted to user@hive.apache.org by Thomas Achache <t....@criteo.com> on 2016/02/04 11:18:03 UTC

Querying Hive from R via SparkR

Hello everyone,

We run a Hive cluster in a Kerberos environment, which we usually access via SSH from our local Windows machines. I would like to query Hive directly from R on those same Windows machines using the SparkR package (https://spark.apache.org/docs/1.5.2/sparkr.html), as described here:
https://spark.apache.org/docs/1.5.2/sparkr.html#from-hive-tables

Does anyone here have any experience doing that? I suspect this might not be the best mailing list for this question, so don't hesitate to redirect me if needed.

I would like to know:
a) whether what I'm trying to achieve is even possible (perhaps it only works if R is installed directly on the Hive cluster and we use RStudio Server?)
b) whether anyone could point me to a web resource that explains how to set up the SparkContext and/or the HiveContext
c) whether there is a simpler way to query Hive from R (for instance, we also use a JDBC connection to Vertica, and it works just fine in the setup described above)

Sorry if this is a bit off-topic, but I'm completely lost in the documentation and I don't know where to ask for help :(

All the best,

Thomas