Posted to user@spark.apache.org by Eric Pederson <er...@gmail.com> on 2015/07/08 20:27:08 UTC

Communication between driver, cluster and HiveServer

All:

I recently ran into a scenario where spark-shell could communicate with
Hive but another application of mine (Spark Notebook) could not.  When I
tried to get a reference to a table using sql.table("tab"), Spark Notebook
threw an exception but spark-shell did not.

I tried to track down the difference between the two applications but had
a hard time pinpointing it.

The problem was resolved by tweaking a few security settings in
hive-site.xml, but I'm still curious about how it works.

It seems like spark-shell knows how to look at
$SPARK_HOME/conf/hive-site.xml and communicate with the HiveServer
directly.  But my other application doesn't know anything about
hive-site.xml and must communicate with another piece of Spark to get the
information.  Originally this indirect communication didn't work, but after
the tweak to hive-site.xml it does.
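
One way to picture the lookup, assuming (as in the Spark 1.x launch scripts) that spark-shell puts $SPARK_HOME/conf on the JVM classpath: Hive reads hive-site.xml as a classpath resource, and a resource lookup roughly scans the classpath in order and returns the first match. In Scala, getClass.getClassLoader.getResource("hive-site.xml") reports which file actually won. The Python sketch below is only a model of that first-match rule; the function name first_on_classpath and the directory names are hypothetical.

```python
import os
from typing import Callable, Iterable, Optional


def first_on_classpath(
    entries: Iterable[str],
    resource: str = "hive-site.xml",
    is_file: Callable[[str], bool] = os.path.isfile,
) -> Optional[str]:
    """Return the first `entry/resource` path that exists, or None.

    Hypothetical model of classpath resource lookup: entries are scanned in
    order and the first match wins, so any later copy is never read.
    Only plain directories are modeled; JAR entries are ignored.
    """
    for entry in entries:
        candidate = os.path.join(entry, resource)
        # is_file is injectable so the lookup can be tried without real files.
        if is_file(candidate):
            return candidate
    return None
```

Passing the directories in the order a given application actually uses shows which copy of hive-site.xml that application would read.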

How does the communication between the driver and Hive work?  And is
spark-shell somehow special in this regard?

Thanks,

-- Eric

Re: Communication between driver, cluster and HiveServer

Posted by Eric Pederson <er...@gmail.com>.
A couple of other things.

The Spark Notebook application does have hive-site.xml in its classpath.
It is a copy of the original $SPARK_HOME/conf/hive-site.xml that originally
worked for spark-shell.  After the security tweaks were made to
$SPARK_HOME/conf/hive-site.xml, Spark Notebook started working.  But the
same tweaks did *not* need to be applied to the copy in Spark Notebook's
classpath.

I'm running Spark 1.3.1, Hive 0.13.1 and MapR 4.1.0.  The tweaks were
hive.metastore.sasl.enabled=false, hive.server2.authentication=PAM, and
hive.server2.authentication.pam.services=login,sshd,sudo.
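
Because the tweaks went into $SPARK_HOME/conf/hive-site.xml but not into the Notebook's copy, the two files now differ in exactly these keys. A minimal Python sketch for confirming what each copy carries, assuming the standard Hadoop configuration layout (a <configuration> element holding <property> entries with <name> and <value>); the helper names and any file paths are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, Optional, Tuple, Union

# The three settings that were changed, as listed above.
SECURITY_KEYS = (
    "hive.metastore.sasl.enabled",
    "hive.server2.authentication",
    "hive.server2.authentication.pam.services",
)


def parse_hive_site(xml_data: Union[str, bytes]) -> Dict[str, str]:
    """Map each <property>'s <name> to its <value> in Hadoop-style config XML."""
    root = ET.fromstring(xml_data)
    props: Dict[str, str] = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name:
            # A property without a <value> element is recorded as "".
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def load_hive_site(path: str) -> Dict[str, str]:
    """Read a hive-site.xml file as bytes so its XML declaration is honored."""
    with open(path, "rb") as f:
        return parse_hive_site(f.read())


def diff_settings(
    a: Dict[str, str], b: Dict[str, str], keys: Iterable[str] = SECURITY_KEYS
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {key: (value_in_a, value_in_b)} for every key whose values differ."""
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}
```

Calling diff_settings(load_hive_site(notebook_copy), load_hive_site(spark_copy)) with the two paths lists only the keys on which the copies disagree, with None marking a key that one file does not set.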

Thanks,

-- Eric
