Posted to commits@spark.apache.org by me...@apache.org on 2015/05/18 17:46:36 UTC

spark git commit: [SPARK-7272] [MLLIB] User guide for PMML model export

Repository: spark
Updated Branches:
  refs/heads/master 1ecfac6e3 -> 814b3dabd


[SPARK-7272] [MLLIB] User guide for PMML model export

https://issues.apache.org/jira/browse/SPARK-7272

Author: Vincenzo Selvaggio <vs...@hotmail.it>

Closes #6219 from selvinsource/mllib_pmml_model_export_SPARK-7272 and squashes the following commits:

c866fb8 [Vincenzo Selvaggio] Update mllib-pmml-model-export.md
1beda98 [Vincenzo Selvaggio] [SPARK-7272] Initial user guide for pmml export
d670662 [Vincenzo Selvaggio] Update mllib-pmml-model-export.md
2731375 [Vincenzo Selvaggio] Update mllib-pmml-model-export.md
680dc33 [Vincenzo Selvaggio] Update mllib-pmml-model-export.md
2e298b5 [Vincenzo Selvaggio] Update mllib-pmml-model-export.md
a932f51 [Vincenzo Selvaggio] Create mllib-pmml-model-export.md


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/814b3dab
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/814b3dab
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/814b3dab

Branch: refs/heads/master
Commit: 814b3dabdf01abc7a2f25aa32284caccadeb7798
Parents: 1ecfac6
Author: Vincenzo Selvaggio <vs...@hotmail.it>
Authored: Mon May 18 08:46:33 2015 -0700
Committer: Xiangrui Meng <me...@databricks.com>
Committed: Mon May 18 08:46:33 2015 -0700

----------------------------------------------------------------------
 docs/mllib-guide.md             |  1 +
 docs/mllib-pmml-model-export.md | 86 ++++++++++++++++++++++++++++++++++++
 2 files changed, 87 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/814b3dab/docs/mllib-guide.md
----------------------------------------------------------------------
diff --git a/docs/mllib-guide.md b/docs/mllib-guide.md
index f8e8794..de7d66f 100644
--- a/docs/mllib-guide.md
+++ b/docs/mllib-guide.md
@@ -39,6 +39,7 @@ filtering, dimensionality reduction, as well as underlying optimization primitiv
 * [Optimization (developer)](mllib-optimization.html)
   * stochastic gradient descent
   * limited-memory BFGS (L-BFGS)
+* [PMML model export](mllib-pmml-model-export.html)
 
 MLlib is under active development.
 The APIs marked `Experimental`/`DeveloperApi` may change in future releases, 

http://git-wip-us.apache.org/repos/asf/spark/blob/814b3dab/docs/mllib-pmml-model-export.md
----------------------------------------------------------------------
diff --git a/docs/mllib-pmml-model-export.md b/docs/mllib-pmml-model-export.md
new file mode 100644
index 0000000..42ea2ca
--- /dev/null
+++ b/docs/mllib-pmml-model-export.md
@@ -0,0 +1,86 @@
+---
+layout: global
+title: PMML model export - MLlib
+displayTitle: <a href="mllib-guide.html">MLlib</a> - PMML model export
+---
+
+* Table of contents
+{:toc}
+
+## MLlib supported models
+
+MLlib supports model export to Predictive Model Markup Language ([PMML](http://en.wikipedia.org/wiki/Predictive_Model_Markup_Language)).
+
+The table below lists the MLlib models that can be exported to PMML and their equivalent PMML models.
+
+<table class="table">
+  <thead>
+    <tr><th>MLlib model</th><th>PMML model</th></tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td>KMeansModel</td><td>ClusteringModel</td>
+    </tr>    
+    <tr>
+      <td>LinearRegressionModel</td><td>RegressionModel (functionName="regression")</td>
+    </tr>
+    <tr>
+      <td>RidgeRegressionModel</td><td>RegressionModel (functionName="regression")</td>
+    </tr>
+    <tr>
+      <td>LassoModel</td><td>RegressionModel (functionName="regression")</td>
+    </tr>
+    <tr>
+      <td>SVMModel</td><td>RegressionModel (functionName="classification" normalizationMethod="none")</td>
+    </tr>
+    <tr>
+      <td>Binary LogisticRegressionModel</td><td>RegressionModel (functionName="classification" normalizationMethod="logit")</td>
+    </tr>
+  </tbody>
+</table>
+
+## Examples
+<div class="codetabs">
+
+<div data-lang="scala" markdown="1">
+To export a supported `model` (see table above) to PMML, simply call `model.toPMML`.
+
+Here is a complete example of building a KMeansModel and printing it out in PMML format:
+{% highlight scala %}
+import org.apache.spark.mllib.clustering.KMeans
+import org.apache.spark.mllib.linalg.Vectors
+
+// Load and parse the data
+val data = sc.textFile("data/mllib/kmeans_data.txt")
+val parsedData = data.map(s => Vectors.dense(s.split(' ').map(_.toDouble))).cache()
+
+// Cluster the data into two classes using KMeans
+val numClusters = 2
+val numIterations = 20
+val clusters = KMeans.train(parsedData, numClusters, numIterations)
+
+// Export to PMML
+println("PMML Model:\n" + clusters.toPMML)
+{% endhighlight %}
+
+In addition to exporting the PMML model to a String (`model.toPMML`, as in the example above), you can export it to other destinations:
+
+{% highlight scala %}
+// Export the model to a String in PMML format
+clusters.toPMML
+
+// Export the model to a local file in PMML format
+clusters.toPMML("/tmp/kmeans.xml")
+
+// Export the model to a directory on a distributed file system in PMML format
+clusters.toPMML(sc, "/tmp/kmeans")
+
+// Export the model to the OutputStream in PMML format
+clusters.toPMML(System.out)
+{% endhighlight %}
+
+For unsupported models, either the `.toPMML` method is not available or calling it throws an `IllegalArgumentException`.
+
+</div>
+
+</div>

