Posted to dev@predictionio.apache.org by "Naoki Takezoe (JIRA)" <ji...@apache.org> on 2018/12/10 13:07:00 UTC
[jira] [Resolved] (PIO-192) Enhance PySpark support
[ https://issues.apache.org/jira/browse/PIO-192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Naoki Takezoe resolved PIO-192.
-------------------------------
Resolution: Done
Fix Version/s: 0.14.0
> Enhance PySpark support
> -----------------------
>
> Key: PIO-192
> URL: https://issues.apache.org/jira/browse/PIO-192
> Project: PredictionIO
> Issue Type: Improvement
> Components: Core
> Affects Versions: 0.13.0
> Reporter: Takako Shimamoto
> Assignee: Takako Shimamoto
> Priority: Major
> Fix For: 0.14.0
>
>
> h3. Summary
> Enhance pypio, the Python API for PIO.
> h3. Goals
> The current Python support always requires developers to have access to sbt. This enhancement removes the build phase.
> h3. Description
> A Python engine requires no additional setup. Developers can use the pypio module from a Jupyter notebook or from Python code.
> First, import the necessary modules.
> {code:python}
> import pypio
> {code}
> Once the module is imported, the first step is to initialize the pypio module.
> {code:python}
> pypio.init()
> {code}
> Next, read data from the event store.
> {code:python}
> event_df = pypio.find_events('BHPApp')
> {code}
> Then train and save the model.
> {code:python}
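> from pyspark.ml import Pipeline
>
> # train_df: training DataFrame, assumed to be derived from event_df (preparation not shown)
> # model is a PipelineModel, which is produced after a Pipeline's fit() method runs
> pipeline = Pipeline(...)
> model = pipeline.fit(train_df)
> engine_instance_id = pypio.save_model(model, ["prediction"])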
> {code}
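The four steps above (initialize, read events, fit, save) can be combined into one function. The sketch below is only an illustration: `train_and_save`, its parameters, and the `prepare` hook are hypothetical names, not part of pypio. The pypio module is passed in as an argument so the flow can be exercised without a running PredictionIO environment, and the code assumes only the three calls shown in the issue: `init()`, `find_events(app_name)`, and `save_model(model, columns)`.

```python
def train_and_save(pio, pipeline, app_name, predict_columns, prepare=None):
    """Run the init -> find_events -> fit -> save_model flow.

    pio:             the imported pypio module (or any object exposing
                     init, find_events and save_model as shown in the issue)
    pipeline:        an object with a fit(df) method, e.g. pyspark.ml.Pipeline
    app_name:        name of the PredictionIO app whose events are read
    predict_columns: list passed through as the second save_model argument
    prepare:         optional hypothetical hook that turns the event DataFrame
                     into a training DataFrame; identity when omitted
    """
    pio.init()
    event_df = pio.find_events(app_name)
    train_df = prepare(event_df) if prepare is not None else event_df
    model = pipeline.fit(train_df)
    # In the issue, the value returned by save_model is the engine instance ID
    # that is later passed to `pio deploy`.
    return pio.save_model(model, predict_columns)
```

In a real script this would be called after `import pypio`, for example as `train_and_save(pypio, Pipeline(...), 'BHPApp', ['prediction'])`.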
> h4. Run & Deploy
> h5. Run Jupyter
> {code:sh}
> pio-shell --with-pyspark
> {code}
> h5. Run on Spark
> {code:sh}
> pio train --main-py-file xxxx.py
> {code}
> h5. Deploy App
> {code:sh}
> pio deploy --engine-instance-id <engine_instance_id>
> {code}
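The train and deploy commands above can also be driven from Python. This is a minimal sketch using only the standard library. The helper names `train_command`, `deploy_command`, and `render` are hypothetical, and the only flags used are the ones shown in the issue (`--main-py-file` and `--engine-instance-id`).

```python
import shlex


def train_command(main_py_file):
    """Return the argv list for `pio train` with a Python entry point."""
    return ["pio", "train", "--main-py-file", main_py_file]


def deploy_command(engine_instance_id):
    """Return the argv list for `pio deploy` of a saved engine instance."""
    return ["pio", "deploy", "--engine-instance-id", engine_instance_id]


def render(argv):
    """Quote an argv list as a single shell-safe string, e.g. for logging."""
    return " ".join(shlex.quote(arg) for arg in argv)
```

Each list can be passed to `subprocess.run(argv, check=True)` on a machine where `pio` is on the PATH. Keeping the arguments as a list avoids shell quoting problems with file names that contain spaces.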