Posted to users@zeppelin.apache.org by Ruslan Dautkhanov <da...@gmail.com> on 2016/11/21 21:52:36 UTC

"You must build Spark with Hive. Export 'SPARK_HIVE=true'"

I'm getting:
You must build Spark with Hive. Export 'SPARK_HIVE=true'
See the full stack trace in [2] below.

I'm using the Spark 1.6 that ships with CDH 5.8.3, so it's definitely
compiled with Hive. We use Jupyter notebooks in the same environment without
problems.
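
One quick way to double-check that claim is to look for the Hive classes
inside the Spark assembly jar. A sketch, assuming the usual CDH parcel
layout for the jar path:

import glob, zipfile
# Assumed location: CDH parcels normally keep the assembly under $SPARK_HOME/lib.
jar = glob.glob('/opt/cloudera/parcels/CDH/lib/spark/lib/spark-assembly*.jar')[0]
names = zipfile.ZipFile(jar).namelist()
# Hive support is compiled in iff the HiveContext class is inside the assembly.
print(any(n.startswith('org/apache/spark/sql/hive/HiveContext') for n in names))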

Using Zeppelin 0.6.2, downloaded as zeppelin-0.6.2-bin-all.tgz from
apache.org

Is Zeppelin compiled with Hive too? I guess so.
Not sure what else is missing.

Tried to play with ZEPPELIN_SPARK_USEHIVECONTEXT, but it does not make a
difference.


[1]
$ cat zeppelin-env.sh
export JAVA_HOME=/usr/java/java7
export SPARK_HOME=/opt/cloudera/parcels/CDH/lib/spark
export SPARK_SUBMIT_OPTIONS="--principal xxxx --keytab yyy --conf
spark.driver.memory=7g --conf spark.executor.cores=2 --conf
spark.executor.memory=8g"
export SPARK_APP_NAME="Zeppelin notebook"
export HADOOP_CONF_DIR=/etc/hadoop/conf
export HIVE_CONF_DIR=/etc/hive/conf
export HIVE_HOME=/opt/cloudera/parcels/CDH/lib/hive
export PYSPARK_PYTHON="/opt/cloudera/parcels/Anaconda/bin/python2"
export PYTHONPATH="/opt/cloudera/parcels/CDH/lib/spark/python:/opt/cloudera/parcels/CDH/lib/spark/python/lib/py4j-0.9-src.zip"
export MASTER="yarn-client"
export ZEPPELIN_SPARK_USEHIVECONTEXT=true




[2]

You must build Spark with Hive. Export 'SPARK_HIVE=true' and run build/sbt assembly
Traceback (most recent call last):
  File "/tmp/zeppelin_pyspark-9143637669637506477.py", line 267, in <module>
    raise Exception(traceback.format_exc())
Exception: Traceback (most recent call last):
  File "/tmp/zeppelin_pyspark-9143637669637506477.py", line 265, in <module>
    exec(code)
  File "<stdin>", line 9, in <module>
  File "/opt/cloudera/parcels/CDH/lib/spark/python/pyspark/sql/context.py", line 580, in sql

[3]
I also have the correct symlinks in $ZEPPELIN_HOME/conf for:
- hive-site.xml
- hdfs-site.xml
- core-site.xml
- yarn-site.xml



Thank you,
Ruslan Dautkhanov

Re: "You must build Spark with Hive. Export 'SPARK_HIVE=true'"

Posted by Ruslan Dautkhanov <da...@gmail.com>.
Yes, I can work with HiveContext from spark-shell.

Back to the original question: I'm still getting
You must build Spark with Hive. Export 'SPARK_HIVE=true'
See the full stack trace in [2] above.

Any ideas?



-- 
Ruslan Dautkhanov


Re: "You must build Spark with Hive. Export 'SPARK_HIVE=true'"

Posted by Jeff Zhang <zj...@gmail.com>.
My point is that I suspect CDH also didn't compile Spark with Hive; you can
run spark-shell to verify that.



Re: "You must build Spark with Hive. Export 'SPARK_HIVE=true'"

Posted by Ruslan Dautkhanov <da...@gmail.com>.
Yep, CDH doesn't ship Spark compiled with the Thrift server.
My understanding is that Zeppelin uses the spark-shell REPL, not the Spark
Thrift server.

Thank you.



-- 
Ruslan Dautkhanov


Re: "You must build Spark with Hive. Export 'SPARK_HIVE=true'"

Posted by Jeff Zhang <zj...@gmail.com>.
AFAIK, the Spark that ships with CDH doesn't support the Spark Thrift
server, so it is possible it is not compiled with Hive. Can you run
spark-shell to verify that? If it is built with Hive, a HiveContext will be
created in spark-shell.
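
A similar check can be done from the pyspark shell of the same Spark build.
A sketch, assuming Spark 1.6 behavior, where the shell silently falls back
to a plain SQLContext when the Hive classes are missing:

# Run inside the `pyspark` shell that ships with this Spark build.
print(type(sqlContext))  # expect <class 'pyspark.sql.context.HiveContext'>
sqlContext.sql("SHOW DATABASES").show()  # only succeeds against a Hive metastore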


Re: "You must build Spark with Hive. Export 'SPARK_HIVE=true'"

Posted by Ruslan Dautkhanov <da...@gmail.com>.
I can't reproduce this in %spark or %sql; it seems to be %pyspark-specific.

It also seems to run fine the first time I start Zeppelin; after that it
shows this error:
You must build Spark with Hive. Export 'SPARK_HIVE=true' and run build/sbt
assembly


from pyspark.sql import HiveContext  # explicit here; Zeppelin's pyspark bootstrap may already import it
sqlc = HiveContext(sc)
sqlc.sql("select count(*) from hivedb.someTable")

It runs fine only the first time; after that:

You must build Spark with Hive. Export 'SPARK_HIVE=true' and run build/sbt assembly
Traceback (most recent call last):
  File "/tmp/zeppelin_pyspark-8000586427786928449.py", line 267, in <module>
    raise Exception(traceback.format_exc())
Exception: Traceback (most recent call last):
  File "/tmp/zeppelin_pyspark-8000586427786928449.py", line 265, in <module>
    exec(code)
  File "<stdin>", line 2, in <module>
  File "/opt/cloudera/parcels/CDH/lib/spark/python/pyspark/sql/context.py", line 580, in sql
    return DataFrame(self._ssql_ctx.sql(sqlQuery), self)
  File "/opt/cloudera/parcels/CDH/lib/spark/python/pyspark/sql/context.py", line 683, in _ssql_ctx
    self._scala_HiveContext = self._get_hive_ctx()
  File "/opt/cloudera/parcels/CDH/lib/spark/python/pyspark/sql/context.py", line 692, in _get_hive_ctx
    return self._jvm.HiveContext(self._jsc.sc())
  File "/opt/cloudera/parcels/CDH/lib/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 1064, in __call__
    answer, self._gateway_client, None, self._fqn)
  File "/opt/cloudera/parcels/CDH/lib/spark/python/pyspark/sql/utils.py", line 45, in deco
    return f(*a, **kw)



I don't see any more detail in the logs than the error stack above.


-- 
Ruslan Dautkhanov


Re: "You must build Spark with Hive. Export 'SPARK_HIVE=true'"

Posted by Felix Cheung <fe...@hotmail.com>.
Hmm, SPARK_HOME is set, so it should pick up the right Spark.

Does this work with the Scala Spark interpreter instead of pyspark? If it doesn't, is there more info in the log?



Re: "You must build Spark with Hive. Export 'SPARK_HIVE=true'"

Posted by Ruslan Dautkhanov <da...@gmail.com>.
That's what we will have to do. It's hard to explain to users, though, that
in Zeppelin you can assign HiveContext to a variable only once; we didn't
have this problem in Jupyter. Is this hard to fix? Created
https://issues.apache.org/jira/browse/ZEPPELIN-1728

If somebody forgets about this rule, it's only fixable by restarting the
Zeppelin server, which is super inconvenient.
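
Until that's fixed, one workaround sketch is to guard the assignment so that
re-running a paragraph reuses the existing context instead of constructing a
new one (hc is an illustrative name):

%pyspark
try:
    hc  # reuse the context created by an earlier paragraph, if any
except NameError:
    from pyspark.sql import HiveContext
    hc = HiveContext(sc)  # constructed at most once per interpreter lifetime
hc.sql("show databases").show()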

Thanks.



-- 
Ruslan Dautkhanov


Re: "You must build Spark with Hive. Export 'SPARK_HIVE=true'"

Posted by Felix Cheung <fe...@hotmail.com>.
Can you reuse the HiveContext instead of making new ones with HiveContext(sc)?
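
For example, in %pyspark, something like the sketch below, assuming Zeppelin
has already injected sqlContext (a HiveContext when
ZEPPELIN_SPARK_USEHIVECONTEXT=true) and reusing the thread's
hivedb.someTable as the example table:

%pyspark
# Reuse the context Zeppelin created at interpreter startup instead of
# calling HiveContext(sc) again in every paragraph.
sqlContext.sql("select count(*) from hivedb.someTable").show()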



Re: "You must build Spark with Hive. Export 'SPARK_HIVE=true'"

Posted by Ruslan Dautkhanov <da...@gmail.com>.
Also, to get rid of this problem (once HiveContext(sc) has been assigned to
a variable at least twice), the only fix is to restart Zeppelin :-(


-- 
Ruslan Dautkhanov


Re: "You must build Spark with Hive. Export 'SPARK_HIVE=true'"

Posted by Ruslan Dautkhanov <da...@gmail.com>.
I found a pattern for when this happens.

When I run
sqlCtx = HiveContext(sc)

it works as expected.

The second time, and any time after that, it gives the exception stack I
reported earlier in this email chain.

> sqlCtx = HiveContext(sc)
> sqlCtx.sql('select * from marketview.spend_dim')

You must build Spark with Hive. Export 'SPARK_HIVE=true' and run build/sbt assembly
Traceback (most recent call last):
  File "/tmp/zeppelin_pyspark-6752406810533348793.py", line 267, in <module>
    raise Exception(traceback.format_exc())
Exception: Traceback (most recent call last):
  File "/tmp/zeppelin_pyspark-6752406810533348793.py", line 265, in <module>
    exec(code)
  File "<stdin>", line 2, in <module>
  File "/opt/cloudera/parcels/CDH/lib/spark/python/pyspark/sql/context.py", line 580, in sql
    return DataFrame(self._ssql_ctx.sql(sqlQuery), self)
  File "/opt/cloudera/parcels/CDH/lib/spark/python/pyspark/sql/context.py", line 683, in _ssql_ctx
    self._scala_HiveContext = self._get_hive_ctx()
  File "/opt/cloudera/parcels/CDH/lib/spark/python/pyspark/sql/context.py", line 692, in _get_hive_ctx
    return self._jvm.HiveContext(self._jsc.sc())
  File "/opt/cloudera/parcels/CDH/lib/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 1064, in __call__
    answer, self._gateway_client, None, self._fqn)
  File "/opt/cloudera/parcels/CDH/lib/spark/python/pyspark/sql/utils.py", line 45, in deco
    return f(*a, **kw)


The key to reproducing this issue: assign HiveContext(sc) to a variable more
than once, and use that variable between the assignments.
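
In other words, a minimal repro sketch ("select 1" stands in for any query
that forces the lazy JVM HiveContext to be created):

sqlCtx = HiveContext(sc)
sqlCtx.sql("select 1").collect()  # first assignment: works
sqlCtx = HiveContext(sc)          # second assignment to the same name
sqlCtx.sql("select 1").collect()  # now fails with "You must build Spark with Hive ..."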


-- 
Ruslan Dautkhanov
