Posted to commits@phoenix.apache.org by jm...@apache.org on 2015/12/21 17:24:31 UTC
svn commit: r1721207 - in /phoenix/site: publish/phoenix_spark.html
source/src/site/markdown/phoenix_spark.md
Author: jmahonin
Date: Mon Dec 21 16:24:30 2015
New Revision: 1721207
URL: http://svn.apache.org/viewvc?rev=1721207&view=rev
Log:
Update phoenix_spark docs: new client-spark jar and pyspark
Modified:
phoenix/site/publish/phoenix_spark.html
phoenix/site/source/src/site/markdown/phoenix_spark.md
Modified: phoenix/site/publish/phoenix_spark.html
URL: http://svn.apache.org/viewvc/phoenix/site/publish/phoenix_spark.html?rev=1721207&r1=1721206&r2=1721207&view=diff
==============================================================================
--- phoenix/site/publish/phoenix_spark.html (original)
+++ phoenix/site/publish/phoenix_spark.html Mon Dec 21 16:24:30 2015
@@ -1,7 +1,7 @@
<!DOCTYPE html>
<!--
- Generated by Apache Maven Doxia at 2015-10-31
+ Generated by Apache Maven Doxia at 2015-12-21
Rendered using Reflow Maven Skin 1.1.0 (http://andriusvelykis.github.io/reflow-maven-skin)
-->
<html xml:lang="en" lang="en">
@@ -150,7 +150,7 @@
<h4 id="Prerequisites">Prerequisites</h4>
<ul>
<li>Phoenix 4.4.0+</li>
- <li>Spark 1.3.1+</li>
+ <li>Spark 1.3.1+ (prebuilt with Hadoop 2.4 recommended)</li>
</ul>
</div>
<div class="section">
@@ -162,9 +162,8 @@
<div class="section">
<h4 id="Spark_setup">Spark setup</h4>
<ol style="list-style-type: decimal">
- <li>Ensure that all requisite Phoenix / HBase platform dependencies are available on the classpath for the Spark executors and drivers</li>
- <li>One method is to add the phoenix-4.4.0-client.jar to ‘SPARK_CLASSPATH’ in spark-env.sh, or setting both ‘spark.executor.extraClassPath’ and ‘spark.driver.extraClassPath’ in spark-defaults.conf</li>
- <li>To help your IDE, you may want to add the following ‘provided’ dependency:</li>
+ <li>To ensure that all requisite Phoenix / HBase platform dependencies are available on the classpath for the Spark executors and drivers, set both ‘<i>spark.executor.extraClassPath</i>’ and ‘<i>spark.driver.extraClassPath</i>’ in spark-defaults.conf to include the ‘phoenix-<i><tt><version></tt></i>-client-<b>spark</b>.jar’. Note that for Phoenix versions <tt><</tt> 4.7.0, you must use the ‘phoenix-<i><tt><version></tt></i>-client.jar’.</li>
+ <li>Add the following dependency to your build:</li>
</ol>
<div class="source">
<pre><dependency>
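To make the classpath step above concrete, the corresponding entries in spark-defaults.conf would look roughly like this; a minimal sketch, with the jar path being an assumption to adjust for your installation:

```
# spark-defaults.conf (sketch): put the Phoenix client-spark jar on both classpaths
spark.executor.extraClassPath  /path/to/phoenix-<version>-client-spark.jar
spark.driver.extraClassPath    /path/to/phoenix-<version>-client-spark.jar
```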
@@ -292,7 +291,7 @@ CREATE TABLE OUTPUT_TABLE (id BIGINT NOT
</div>
<div class="source">
<pre>import org.apache.spark.SparkContext
-import org.apache.spark.sql.SQLContext
+import org.apache.phoenix.spark.sql._
import org.apache.phoenix.spark._
// Load INPUT_TABLE
@@ -307,6 +306,35 @@ df.save("org.apache.phoenix.spark&q
</pre>
</div>
</div>
+ </div>
+ <div class="section">
+ <h3 id="PySpark">PySpark</h3>
+ <p>With Spark’s DataFrame support, you can also use <tt>pyspark</tt> to read and write from Phoenix tables.</p>
+ <div class="section">
+ <h4 id="Load_a_DataFrame">Load a DataFrame</h4>
+ <p>Given a table <i>TABLE1</i> and a Zookeeper URL of <tt>localhost:2181</tt>, you can load the table as a DataFrame using the following Python code in <tt>pyspark</tt>:</p>
+ <div class="source">
+ <pre>df = sqlContext.read \
+ .format("org.apache.phoenix.spark") \
+ .option("table", "TABLE1") \
+ .option("zkUrl", "localhost:2181") \
+ .load()
+</pre>
+ </div>
+ </div>
+ <div class="section">
+ <h4 id="Save_a_DataFrame">Save a DataFrame</h4>
+ <p>Given the same table and Zookeeper URL as above, you can save a DataFrame to a Phoenix table using the following code:</p>
+ <div class="source">
+ <pre>df.write \
+ .format("org.apache.phoenix.spark") \
+ .mode("overwrite") \
+ .option("table", "TABLE1") \
+ .option("zkUrl", "localhost:2181") \
+ .save()
+</pre>
+ </div>
+ </div>
</div>
<div class="section">
<h3 id="Notes">Notes</h3>
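A practical note on the PySpark section above: the `pyspark` shell needs the same Phoenix client jar on its driver and executor classpaths. A sketch of one way to supply it at launch, with the jar path again an assumption:

```
pyspark --conf "spark.executor.extraClassPath=/path/to/phoenix-<version>-client-spark.jar" \
        --conf "spark.driver.extraClassPath=/path/to/phoenix-<version>-client-spark.jar"
```

Once the shell is up, the read and write snippets above can be pasted in as-is; `df.show()` is a quick check that the load worked.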
Modified: phoenix/site/source/src/site/markdown/phoenix_spark.md
URL: http://svn.apache.org/viewvc/phoenix/site/source/src/site/markdown/phoenix_spark.md?rev=1721207&r1=1721206&r2=1721207&view=diff
==============================================================================
--- phoenix/site/source/src/site/markdown/phoenix_spark.md (original)
+++ phoenix/site/source/src/site/markdown/phoenix_spark.md Mon Dec 21 16:24:30 2015
@@ -6,7 +6,7 @@ as RDDs or DataFrames, and enables persi
#### Prerequisites
* Phoenix 4.4.0+
-* Spark 1.3.1+
+* Spark 1.3.1+ (prebuilt with Hadoop 2.4 recommended)
#### Why not JDBC?
@@ -24,11 +24,11 @@ The choice of which method to use to acc
#### Spark setup
-1. Ensure that all requisite Phoenix / HBase platform dependencies are available on the classpath for the Spark executors and drivers
-2. One method is to add the phoenix-4.4.0-client.jar to 'SPARK_CLASSPATH' in spark-env.sh,
-or setting both 'spark.executor.extraClassPath' and 'spark.driver.extraClassPath' in
-spark-defaults.conf
-3. To help your IDE, you may want to add the following 'provided' dependency:
+1. To ensure that all requisite Phoenix / HBase platform dependencies are available on the classpath
+for the Spark executors and drivers, set both '_spark.executor.extraClassPath_' and
+'_spark.driver.extraClassPath_' in spark-defaults.conf to include the 'phoenix-_`<version>`_-client-**spark**.jar'.
+Note that for Phoenix versions `<` 4.7.0, you must use the 'phoenix-_`<version>`_-client.jar'.
+2. Add the following dependency to your build:
```
<dependency>
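The diff context cuts the dependency block short here. For reference, a sketch of how such a block typically looks in a Maven POM; the version and the `provided` scope are assumptions, not taken from the commit:

```
<!-- sketch: match the version to your Phoenix release; scope is provided
     because the client jar is already supplied via extraClassPath -->
<dependency>
  <groupId>org.apache.phoenix</groupId>
  <artifactId>phoenix-spark</artifactId>
  <version>4.7.0-HBase-1.1</version>
  <scope>provided</scope>
</dependency>
```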
@@ -157,7 +157,7 @@ CREATE TABLE OUTPUT_TABLE (id BIGINT NOT
```scala
import org.apache.spark.SparkContext
-import org.apache.spark.sql.SQLContext
+import org.apache.phoenix.spark.sql._
import org.apache.phoenix.spark._
// Load INPUT_TABLE
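For context around the import change above, here is a minimal end-to-end sketch of the DataFrame load/save flow on the Spark 1.3-era API; it keeps the standard `SQLContext` import, and the table names and Zookeeper quorum are placeholders:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.{SQLContext, SaveMode}
import org.apache.phoenix.spark._

val sc = new SparkContext("local", "phoenix-sketch")
val sqlContext = new SQLContext(sc)

// Load INPUT_TABLE through the Phoenix data source
val df = sqlContext.load("org.apache.phoenix.spark",
  Map("table" -> "INPUT_TABLE", "zkUrl" -> "localhost:2181"))

// Persist the rows to OUTPUT_TABLE; phoenix-spark expects SaveMode.Overwrite,
// though writes are executed as upserts
df.save("org.apache.phoenix.spark", SaveMode.Overwrite,
  Map("table" -> "OUTPUT_TABLE", "zkUrl" -> "localhost:2181"))
```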
@@ -171,6 +171,38 @@ df.save("org.apache.phoenix.spark", Save
"zkUrl" -> hbaseConnectionString))
```
+### PySpark
+
+With Spark's DataFrame support, you can also use `pyspark` to read and write from Phoenix tables.
+
+#### Load a DataFrame
+
+Given a table _TABLE1_ and a Zookeeper URL of `localhost:2181`, you can load the table as a
+DataFrame using the following Python code in `pyspark`:
+
+```python
+df = sqlContext.read \
+ .format("org.apache.phoenix.spark") \
+ .option("table", "TABLE1") \
+ .option("zkUrl", "localhost:2181") \
+ .load()
+```
+
+#### Save a DataFrame
+
+Given the same table and Zookeeper URL as above, you can save a DataFrame to a Phoenix table
+using the following code:
+
+```python
+df.write \
+ .format("org.apache.phoenix.spark") \
+ .mode("overwrite") \
+ .option("table", "TABLE1") \
+ .option("zkUrl", "localhost:2181") \
+ .save()
+```
+
+
### Notes
The functions `phoenixTableAsDataFrame`, `phoenixTableAsRDD` and `saveToPhoenix` all support