Posted to issues@spark.apache.org by "Hollin Wilkins (JIRA)" <ji...@apache.org> on 2015/07/16 01:08:04 UTC
[jira] [Created] (SPARK-9084) Add in support for realtime data
predictions using ML PipelineModel
Hollin Wilkins created SPARK-9084:
-------------------------------------
Summary: Add in support for realtime data predictions using ML PipelineModel
Key: SPARK-9084
URL: https://issues.apache.org/jira/browse/SPARK-9084
Project: Spark
Issue Type: New Feature
Components: Spark Core
Reporter: Hollin Wilkins
Priority: Critical
Currently, ML provides excellent support for feature manipulation, model selection, and prediction on large datasets. The fitted models can all be serialized easily, but they cannot be used without a DataFrame, which means they are only suitable for batch processing. To support realtime ML pipelines, I propose adding three new methods to the Transformer class:
def transform(row: StructuredRow): StructuredRow
def transform(row: StructuredRow, paramMap: ParamMap): StructuredRow
def transform(row: StructuredRow, firstParamPair: ParamPair[_], otherParamPairs: ParamPair[_]*): StructuredRow
Here, StructuredRow is a case class that combines an org.apache.spark.sql.Row with an org.apache.spark.sql.types.StructType.
This change requires adding the new transform methods to each implementor of the Transformer class.
Following this change, it would be trivial to include the Spark JARs in an API server, deserialize an ML PipelineModel object, take incoming data from users, convert it into a StructuredRow, and feed it into the PipelineModel to get a realtime result.
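To make the proposal concrete, here is a minimal sketch of what a row-level transform could look like. Only StructuredRow and the transform(row) signature come from this issue. RowTransformer and LinearScorer are hypothetical names invented for illustration and are not part of Spark. The sketch assumes the Spark SQL JAR is on the classpath and does not deserialize a real PipelineModel.

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{DoubleType, StructField, StructType}

// Proposed in this issue: a Row paired with the schema that describes it.
case class StructuredRow(row: Row, schema: StructType)

// Hypothetical stand-in for Transformer, carrying only the proposed
// single-row method (assumption: not an existing Spark trait).
trait RowTransformer {
  def transform(row: StructuredRow): StructuredRow
}

// Hypothetical implementor: appends a "prediction" column computed as
// weight * input + bias, without touching a DataFrame.
class LinearScorer(inputCol: String, weight: Double, bias: Double)
    extends RowTransformer {

  override def transform(in: StructuredRow): StructuredRow = {
    // Look up the input column position from the schema.
    val idx = in.schema.fieldNames.indexOf(inputCol)
    require(idx >= 0, s"missing input column: $inputCol")

    val prediction = weight * in.row.getDouble(idx) + bias

    // Return a new row and a new schema, each extended by one field.
    StructuredRow(
      Row.fromSeq(in.row.toSeq :+ prediction),
      StructType(in.schema.fields :+ StructField("prediction", DoubleType))
    )
  }
}

object RealtimeExample {
  def main(args: Array[String]): Unit = {
    // In an API server, these values would come from an incoming request.
    val schema = StructType(Seq(StructField("x", DoubleType)))
    val scorer = new LinearScorer("x", weight = 2.0, bias = 1.0)

    val out = scorer.transform(StructuredRow(Row(3.0), schema))
    println(out.schema.fieldNames.mkString(", "))
    println(out.row)
  }
}
```

Carrying the schema alongside the row lets each stage resolve columns by name, much as the DataFrame-based transform does, while keeping the call path free of a SparkContext.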
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)