Posted to dev@predictionio.apache.org by "Takako Shimamoto (JIRA)" <ji...@apache.org> on 2018/11/02 05:50:00 UTC

[jira] [Created] (PIO-192) Enhance PySpark support

Takako Shimamoto created PIO-192:
------------------------------------

             Summary: Enhance PySpark support
                 Key: PIO-192
                 URL: https://issues.apache.org/jira/browse/PIO-192
             Project: PredictionIO
          Issue Type: Improvement
          Components: Core
    Affects Versions: 0.13.0
            Reporter: Takako Shimamoto
            Assignee: Takako Shimamoto


h3. Summary
Enhance pypio, the Python API for PIO.

h3. Goals
With the current Python support, developers always need access to sbt to build an engine. This enhancement removes that build phase.

h3. Description
A Python engine template requires three files:

* Python code, whose path is passed to the --main-py-file option
* template.json
{code:json}
{"pio": {"version": { "min": "0.14.0-SNAPSHOT" }}}
{code}
* engine.json
{code:json}
{
  "id": "default",
  "description": "Default settings",
  "engineFactory": "org.apache.predictionio.e2.engine.PythonEngine",
  "algorithms": [
    {
      "name": "default",
      "params": {
        "name": "BHPApp"
      }
    }
  ]
}
{code}
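As an illustration of the three-file layout above, the following sketch writes template.json and engine.json with the same contents as the examples, plus the Python file. It also reads back the "min" field of template.json and compares it with an installed PIO version. Everything here is hypothetical: the helper names, the choice of main.py as the file name for --main-py-file, and the rule that a -SNAPSHOT version sorts just before the release of the same number. This is not PIO's own template or version-check logic.

{code:python}
import json
from pathlib import Path

# Engine factory taken from the engine.json example above.
ENGINE_FACTORY = "org.apache.predictionio.e2.engine.PythonEngine"


def write_python_template(root, app_name, main_py_source, min_pio="0.14.0-SNAPSHOT"):
    """Hypothetical helper: write the three files of a Python engine template.

    The file name "main.py" (the file given to --main-py-file) is an assumption.
    """
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    template = {"pio": {"version": {"min": min_pio}}}
    engine = {
        "id": "default",
        "description": "Default settings",
        "engineFactory": ENGINE_FACTORY,
        "algorithms": [{"name": "default", "params": {"name": app_name}}],
    }
    (root / "template.json").write_text(json.dumps(template, indent=2))
    (root / "engine.json").write_text(json.dumps(engine, indent=2))
    (root / "main.py").write_text(main_py_source)
    return root


def parse_version(version):
    """Turn "0.14.0-SNAPSHOT" into (0, 14, 0, 0) and "0.14.0" into (0, 14, 0, 1).

    The trailing flag makes a qualified (e.g. -SNAPSHOT) version sort before
    the plain release with the same numbers (an assumed ordering).
    """
    base, _, qualifier = version.partition("-")
    major, minor, patch = (int(part) for part in base.split("."))
    return (major, minor, patch, 0 if qualifier else 1)


def meets_min_version(root, installed):
    """Hypothetical check of an installed PIO version against template.json."""
    template = json.loads((Path(root) / "template.json").read_text())
    required = template["pio"]["version"]["min"]
    return parse_version(installed) >= parse_version(required)
{code}

For example, after write_python_template(d, "BHPApp", source) in some directory d, meets_min_version(d, "0.14.0") returns True and meets_min_version(d, "0.13.0") returns False.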

h4. pypio module
Developers can use the pypio module from a Jupyter notebook or from plain Python code.

First, import the pypio module.

{code:python}
from pypio import pypio
{code}

Once the module is imported, the first step is to initialize it.

{code:python}
pypio.init()
{code}

Next, read the app's data from the event store by passing the app name to find.

{code:python}
event_df = pypio.find('BHPApp')
{code}

Finally, save the trained model.

{code:python}
# model is a PipelineModel, which is produced after a Pipeline’s fit() method runs
model = ...
pypio.save(model)
{code}




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)