Posted to commits@phoenix.apache.org by jm...@apache.org on 2015/12/21 17:24:31 UTC

svn commit: r1721207 - in /phoenix/site: publish/phoenix_spark.html source/src/site/markdown/phoenix_spark.md

Author: jmahonin
Date: Mon Dec 21 16:24:30 2015
New Revision: 1721207

URL: http://svn.apache.org/viewvc?rev=1721207&view=rev
Log:
Update phoenix_spark docs: new client-spark jar and pyspark

Modified:
    phoenix/site/publish/phoenix_spark.html
    phoenix/site/source/src/site/markdown/phoenix_spark.md

Modified: phoenix/site/publish/phoenix_spark.html
URL: http://svn.apache.org/viewvc/phoenix/site/publish/phoenix_spark.html?rev=1721207&r1=1721206&r2=1721207&view=diff
==============================================================================
--- phoenix/site/publish/phoenix_spark.html (original)
+++ phoenix/site/publish/phoenix_spark.html Mon Dec 21 16:24:30 2015
@@ -1,7 +1,7 @@
 
 <!DOCTYPE html>
 <!--
- Generated by Apache Maven Doxia at 2015-10-31
+ Generated by Apache Maven Doxia at 2015-12-21
  Rendered using Reflow Maven Skin 1.1.0 (http://andriusvelykis.github.io/reflow-maven-skin)
 -->
 <html  xml:lang="en" lang="en">
@@ -150,7 +150,7 @@
    <h4 id="Prerequisites">Prerequisites</h4> 
    <ul> 
     <li>Phoenix 4.4.0+</li> 
-    <li>Spark 1.3.1+</li> 
+    <li>Spark 1.3.1+ (prebuilt with Hadoop 2.4 recommended)</li> 
    </ul> 
   </div> 
   <div class="section"> 
@@ -162,9 +162,8 @@
   <div class="section"> 
    <h4 id="Spark_setup">Spark setup</h4> 
    <ol style="list-style-type: decimal"> 
-    <li>Ensure that all requisite Phoenix / HBase platform dependencies are available on the classpath for the Spark executors and drivers</li> 
-    <li>One method is to add the phoenix-4.4.0-client.jar to ‘SPARK_CLASSPATH’ in spark-env.sh, or setting both ‘spark.executor.extraClassPath’ and ‘spark.driver.extraClassPath’ in spark-defaults.conf</li> 
-    <li>To help your IDE, you may want to add the following ‘provided’ dependency:</li> 
+    <li>To ensure that all requisite Phoenix / HBase platform dependencies are available on the classpath for the Spark executors and drivers, set both ‘<i>spark.executor.extraClassPath</i>’ and ‘<i>spark.driver.extraClassPath</i>’ in spark-defaults.conf to include the ‘phoenix-<i><tt>&lt;version&gt;</tt></i>-client-<b>spark</b>.jar’. Note that for Phoenix versions <tt>&lt;</tt> 4.7.0, you must use the ‘phoenix-<i><tt>&lt;version&gt;</tt></i>-client.jar’.</li> 
+    <li>Add the following dependency to your build:</li> 
    </ol> 
    <div class="source"> 
     <pre>&lt;dependency&gt;
@@ -292,7 +291,7 @@ CREATE TABLE OUTPUT_TABLE (id BIGINT NOT
    </div> 
    <div class="source"> 
     <pre>import org.apache.spark.SparkContext
-import org.apache.spark.sql.SQLContext
+import org.apache.spark.sql._
 import org.apache.phoenix.spark._
 
 // Load INPUT_TABLE
@@ -307,6 +306,35 @@ df.save(&quot;org.apache.phoenix.spark&q
 </pre> 
    </div> 
   </div> 
+ </div> 
+ <div class="section"> 
+  <h3 id="PySpark">PySpark</h3> 
+  <p>With Spark’s DataFrame support, you can also use <tt>pyspark</tt> to read from and write to Phoenix tables.</p> 
+  <div class="section"> 
+   <h4 id="Load_a_DataFrame">Load a DataFrame</h4> 
+   <p>Given a table <i>TABLE1</i> and a ZooKeeper URL of <tt>localhost:2181</tt>, you can load the table as a DataFrame using the following Python code in <tt>pyspark</tt>:</p> 
+   <div class="source"> 
+    <pre>df = sqlContext.read \
+  .format(&quot;org.apache.phoenix.spark&quot;) \
+  .option(&quot;table&quot;, &quot;TABLE1&quot;) \
+  .option(&quot;zkUrl&quot;, &quot;localhost:2181&quot;) \
+  .load()
+</pre> 
+   </div> 
+  </div> 
+  <div class="section"> 
+   <h4 id="Save_a_DataFrame">Save a DataFrame</h4> 
+   <p>Given the same table and ZooKeeper URL as above, you can save a DataFrame to a Phoenix table using the following code:</p> 
+   <div class="source"> 
+    <pre>df.write \
+  .format(&quot;org.apache.phoenix.spark&quot;) \
+  .mode(&quot;overwrite&quot;) \
+  .option(&quot;table&quot;, &quot;TABLE1&quot;) \
+  .option(&quot;zkUrl&quot;, &quot;localhost:2181&quot;) \
+  .save()
+</pre> 
+   </div> 
+  </div> 
  </div> 
  <div class="section"> 
   <h3 id="Notes">Notes</h3> 

Modified: phoenix/site/source/src/site/markdown/phoenix_spark.md
URL: http://svn.apache.org/viewvc/phoenix/site/source/src/site/markdown/phoenix_spark.md?rev=1721207&r1=1721206&r2=1721207&view=diff
==============================================================================
--- phoenix/site/source/src/site/markdown/phoenix_spark.md (original)
+++ phoenix/site/source/src/site/markdown/phoenix_spark.md Mon Dec 21 16:24:30 2015
@@ -6,7 +6,7 @@ as RDDs or DataFrames, and enables persi
 #### Prerequisites
 
 * Phoenix 4.4.0+
-* Spark 1.3.1+
+* Spark 1.3.1+ (prebuilt with Hadoop 2.4 recommended)
 
 #### Why not JDBC?
 
@@ -24,11 +24,11 @@ The choice of which method to use to acc
 
 #### Spark setup
 
-1. Ensure that all requisite Phoenix / HBase platform dependencies are available on the classpath for the Spark executors and drivers
-2. One method is to add the phoenix-4.4.0-client.jar to 'SPARK_CLASSPATH' in spark-env.sh,
-or setting both 'spark.executor.extraClassPath' and 'spark.driver.extraClassPath' in
-spark-defaults.conf
-3. To help your IDE, you may want to add the following 'provided' dependency:
+1. To ensure that all requisite Phoenix / HBase platform dependencies are available on the classpath 
+for the Spark executors and drivers, set both '_spark.executor.extraClassPath_' and 
+'_spark.driver.extraClassPath_' in spark-defaults.conf to include the 'phoenix-_`<version>`_-client-**spark**.jar'.
+Note that for Phoenix versions `<` 4.7.0, you must use the 'phoenix-_`<version>`_-client.jar'.
+2. Add the following dependency to your build:
 
 ```
 <dependency>
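
For reference, a minimal sketch of the spark-defaults.conf entries that the classpath step above calls for; the install path and version string are placeholders to adjust for your deployment:

```
# spark-defaults.conf -- jar path and version are placeholders
spark.executor.extraClassPath  /path/to/phoenix-<version>-client-spark.jar
spark.driver.extraClassPath    /path/to/phoenix-<version>-client-spark.jar
```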
@@ -157,7 +157,7 @@ CREATE TABLE OUTPUT_TABLE (id BIGINT NOT
 
 ```scala
 import org.apache.spark.SparkContext
-import org.apache.spark.sql.SQLContext
+import org.apache.spark.sql._
 import org.apache.phoenix.spark._
 
 // Load INPUT_TABLE
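
The imports above feed a load/save example that this diff only shows in part. A hedged sketch of that flow against the Spark 1.3-era DataFrame API is below; the table names and `hbaseConnectionString` follow the names used elsewhere on the page, and it is an illustration rather than the verbatim page content:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql._
import org.apache.phoenix.spark._

val sc = new SparkContext("local", "phoenix-test")
val sqlContext = new SQLContext(sc)

// ZooKeeper quorum URL for the cluster, e.g. "localhost:2181"
val hbaseConnectionString = "localhost:2181"

// Load INPUT_TABLE as a DataFrame via the phoenix-spark data source
val df = sqlContext.load("org.apache.phoenix.spark",
  Map("table" -> "INPUT_TABLE", "zkUrl" -> hbaseConnectionString))

// Save the DataFrame to OUTPUT_TABLE (SaveMode.Overwrite is required)
df.save("org.apache.phoenix.spark", SaveMode.Overwrite,
  Map("table" -> "OUTPUT_TABLE", "zkUrl" -> hbaseConnectionString))
```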
@@ -171,6 +171,38 @@ df.save("org.apache.phoenix.spark", Save
   "zkUrl" -> hbaseConnectionString))
 ```
 
+### PySpark
+
+With Spark's DataFrame support, you can also use `pyspark` to read from and write to Phoenix tables.
+
+#### Load a DataFrame
+
+Given a table _TABLE1_ and a ZooKeeper URL of `localhost:2181`, you can load the table as a
+DataFrame using the following Python code in `pyspark`:
+
+```python
+df = sqlContext.read \
+  .format("org.apache.phoenix.spark") \
+  .option("table", "TABLE1") \
+  .option("zkUrl", "localhost:2181") \
+  .load()
+```
+
+#### Save a DataFrame
+
+Given the same table and ZooKeeper URL as above, you can save a DataFrame to a Phoenix table
+using the following code:
+
+```python
+df.write \
+  .format("org.apache.phoenix.spark") \
+  .mode("overwrite") \
+  .option("table", "TABLE1") \
+  .option("zkUrl", "localhost:2181") \
+  .save()
+```
+
+
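
The PySpark examples above assume the client jar is already on the driver and executor classpaths, for example via the spark-defaults.conf entries from the Spark setup section. If it is not, one possible way to supply it when launching the shell is shown below; the jar path is a placeholder:

```
# placeholder jar path -- adjust to your installation
pyspark --driver-class-path /path/to/phoenix-<version>-client-spark.jar \
        --conf "spark.executor.extraClassPath=/path/to/phoenix-<version>-client-spark.jar"
```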
 ### Notes
 
 The functions `phoenixTableAsDataFrame`, `phoenixTableAsRDD` and `saveToPhoenix` all support