Posted to user@phoenix.apache.org by "Cox, Jonathan A" <ja...@sandia.gov> on 2015/12/09 06:08:23 UTC

Re: [EXTERNAL] Re: Confusion Installing Phoenix Spark Plugin / Various Errors

Alright, I reproduced what you did exactly, and it now works. The problem is that the Phoenix client JAR is not working correctly with the Spark builds that include Hadoop.

When I downloaded the Spark build with user-provided Hadoop and also installed Hadoop manually, Spark works with Phoenix correctly!
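
For anyone else hitting this, here is a rough sketch of that kind of setup. The SPARK_DIST_CLASSPATH line is the hook described in Spark's "Hadoop free" build documentation for wiring in an external Hadoop install; the paths are only illustrative:

# conf/spark-env.sh -- point the "user provided Hadoop" Spark build at a separate Hadoop install
export HADOOP_HOME=/usr/local/hadoop-2.6.0
export SPARK_DIST_CLASSPATH=$("$HADOOP_HOME/bin/hadoop" classpath)

# conf/spark-defaults.conf -- put the Phoenix client JAR on the driver and executor classpaths
spark.executor.extraClassPath /usr/local/phoenix/phoenix-4.6.0-HBase-1.1-client.jar
spark.driver.extraClassPath /usr/local/phoenix/phoenix-4.6.0-HBase-1.1-client.jar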

Thank you much,
Jonathan

Sent from my iPhone

On Dec 8, 2015, at 8:54 PM, Josh Mahonin <jm...@gmail.com> wrote:

Hi Jonathan,

Spark only needs the client JAR. It contains all the other Phoenix dependencies as well.
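
If you want to sanity-check that, listing the JAR's contents should show the bundled HBase and Hadoop classes; for example (purely an illustrative check, run from wherever the JAR lives):

# the client JAR is a fat JAR; e.g. the HBase configuration class ships inside it
jar tf phoenix-4.6.0-HBase-1.1-client.jar | grep HBaseConfiguration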

I'm not sure exactly what the issue you're seeing is. I just downloaded and extracted fresh copies of Spark 1.5.2 (pre-built with user-provided Hadoop), and the latest Phoenix 4.6.0 binary release.

I copied the 'phoenix-4.6.0-HBase-1.1-client.jar' to /tmp and created a 'spark-defaults.conf' in the 'conf' folder of the Spark install with the following:

spark.executor.extraClassPath /tmp/phoenix-4.6.0-HBase-1.1-client.jar
spark.driver.extraClassPath /tmp/phoenix-4.6.0-HBase-1.1-client.jar

I then launched the 'spark-shell', and was able to execute:

import org.apache.phoenix.spark._

From there, you should be able to use the methods provided by the phoenix-spark integration within the Spark shell.
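
As a quick smoke test, something along these lines should work against an existing Phoenix table (the table name and zkUrl below are placeholders for your own setup):

// read a Phoenix table into a DataFrame; "TABLE1" and "localhost:2181" are placeholders
val df = sqlContext.load("org.apache.phoenix.spark",
  Map("table" -> "TABLE1", "zkUrl" -> "localhost:2181"))
df.show()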

Good luck,

Josh

On Tue, Dec 8, 2015 at 8:51 PM, Cox, Jonathan A <ja...@sandia.gov> wrote:
I am trying to get Spark up and running with Phoenix, but the installation instructions are not clear to me, or there is something else wrong. I’m using Spark 1.5.2, HBase 1.1.2 and Phoenix 4.6.0 in a standalone install (no HDFS or cluster) on Debian Linux 8 (Jessie) x64. I’m also using Java 1.8.0_40.

The instructions state:

1.       Ensure that all requisite Phoenix / HBase platform dependencies are available on the classpath for the Spark executors and drivers

2.       One method is to add the phoenix-4.4.0-client.jar to ‘SPARK_CLASSPATH’ in spark-env.sh, or setting both ‘spark.executor.extraClassPath’ and ‘spark.driver.extraClassPath’ in spark-defaults.conf

First off, what are “all requisite Phoenix / HBase platform dependencies”? #2 suggests that all I need to do is add  ‘phoenix-4.6.0-HBase-1.1-client.jar’ to Spark’s class path. But what about ‘phoenix-spark-4.6.0-HBase-1.1.jar’ or ‘phoenix-core-4.6.0-HBase-1.1.jar’? Do either of these (or anything else) need to be added to Spark’s class path?

Secondly, if I follow the instructions exactly, and add only ‘phoenix-4.6.0-HBase-1.1-client.jar’ to ‘spark-defaults.conf’:
spark.executor.extraClassPath   /usr/local/phoenix/phoenix-4.6.0-HBase-1.1-client.jar
spark.driver.extraClassPath     /usr/local/phoenix/phoenix-4.6.0-HBase-1.1-client.jar
Then I get the following error when starting the interactive Spark shell with ‘spark-shell’:
15/12/08 18:38:05 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
15/12/08 18:38:05 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
15/12/08 18:38:05 WARN Hive: Failed to access metastore. This class should not accessed in runtime.
org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
                at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1236)
…

<console>:10: error: not found: value sqlContext
       import sqlContext.implicits._
              ^
<console>:10: error: not found: value sqlContext
       import sqlContext.sql

On the other hand, if I include all three of the aforementioned JARs, I get the same error. However, if I include only the ‘phoenix-spark-4.6.0-HBase-1.1.jar’, spark-shell seems to launch without error. Nevertheless, if I then try the simple tutorial commands in spark-shell, I get the following:
Spark output: SQL context available as sqlContext.

scala> import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext
import org.apache.phoenix.spark._

val sqlContext = new SQLContext(sc)

val df = sqlContext.load("org.apache.phoenix.spark", Map("table" -> "TABLE1", "zkUrl" -> "phoenix-server:2181"))

Spark error:
java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/HBaseConfiguration
                at org.apache.phoenix.spark.PhoenixRDD.getPhoenixConfiguration(PhoenixRDD.scala:71)
                at org.apache.phoenix.spark.PhoenixRDD.phoenixConf$lzycompute(PhoenixRDD.scala:39)
                at org.apache.phoenix.spark.PhoenixRDD.phoenixConf(PhoenixRDD.scala:38)
                at org.apache.phoenix.spark.PhoenixRDD.<init>(PhoenixRDD.scala:42)
                at org.apache.phoenix.spark.PhoenixRelation.schema(PhoenixRelation.scala:50)
                at org.apache.spark.sql.execution.datasources.LogicalRelation.<init>(LogicalRelation.scala:37)
                at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:120)

This final error seems similar to the one in the mailing list post Phoenix-spark : NoClassDefFoundError: HBaseConfiguration <http://mail-archives.apache.org/mod_mbox/phoenix-user/201511.mbox/ajax/%3CCAKwwsRSEJHkotiF28kzumDZM6kgBVeTJNGUoJnZcLiuEGCTjHQ%40mail.gmail.com%3E>. But the question does not seem to have been answered satisfactorily. Also note that if I include all three JARs, as he did, I get an error when launching spark-shell.

Can you please clarify what is the proper way to install and configure Phoenix with Spark?

Sincerely,
Jonathan


Re: [EXTERNAL] Re: Confusion Installing Phoenix Spark Plugin / Various Errors

Posted by Josh Mahonin <jm...@gmail.com>.
Definitely. I'd like to dig into what the root cause is, but it might be
optimistic to think I'll be able to get to that any time soon.

I'll try to get the docs updated today.


Re: [EXTERNAL] Re: Confusion Installing Phoenix Spark Plugin / Various Errors

Posted by James Taylor <ja...@apache.org>.
Would it make sense to tweak the Spark installation instructions slightly
with this information, Josh?


RE: [EXTERNAL] Re: Confusion Installing Phoenix Spark Plugin / Various Errors

Posted by "Cox, Jonathan A" <ja...@sandia.gov>.
Josh,

Previously, I was using the SPARK_CLASSPATH, but then read that it was deprecated and switched to the spark-defaults.conf file. The result was the same.

Also, I was using ‘spark-1.5.2-bin-hadoop2.6.tgz’, which includes some Hadoop 2.6 JARs. This caused the trouble. However, by separately downloading Hadoop 2.6 and Spark without Hadoop, the errors went away.
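
For what it's worth, the overlap is easy to see by counting the Hadoop classes bundled into that build's assembly JAR (the lib/ layout and wildcard below are assumptions about the standard binary package; adjust for your install):

# how many Hadoop classes does the hadoop2.6 Spark assembly bundle?
jar tf "$SPARK_HOME"/lib/spark-assembly-*.jar | grep -c '^org/apache/hadoop/'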

-Jonathan


Re: [EXTERNAL] Re: Confusion Installing Phoenix Spark Plugin / Various Errors

Posted by Josh Mahonin <jm...@gmail.com>.
Hi Jonathan,

Thanks for the information. If you're able, could you also try the
'SPARK_CLASSPATH' environment variable instead of the spark-defaults.conf
setting, and let us know if that works? The exact Spark package you're using
would also be helpful (built from source, prebuilt for Hadoop 2.6+, 2.4+,
CDH, etc.).
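
For reference, that would be something like the following in conf/spark-env.sh (path illustrative; Spark prints a deprecation warning for this variable, but it should still work for a quick test):

# deprecated, but still honoured: add the Phoenix client JAR via SPARK_CLASSPATH
export SPARK_CLASSPATH=/usr/local/phoenix/phoenix-4.6.0-HBase-1.1-client.jar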

Thanks,

Josh
