Posted to issues@spark.apache.org by "Varun Kruthiventi (Jira)" <ji...@apache.org> on 2022/08/16 08:29:00 UTC

[jira] [Commented] (SPARK-38648) SPIP: Simplified API for DL Inferencing

    [ https://issues.apache.org/jira/browse/SPARK-38648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17580153#comment-17580153 ] 

Varun Kruthiventi commented on SPARK-38648:
-------------------------------------------

Support for integration with DL frameworks like TensorFlow and PyTorch would be very helpful. Could we use a common model specification like ONNX for porting the models to Spark?

> SPIP: Simplified API for DL Inferencing
> ---------------------------------------
>
>                 Key: SPARK-38648
>                 URL: https://issues.apache.org/jira/browse/SPARK-38648
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>    Affects Versions: 3.0.0
>            Reporter: Lee Yang
>            Priority: Minor
>
> h1. Background and Motivation
> The deployment of deep learning (DL) models to Spark clusters can be a point of friction today. DL practitioners often aren't well-versed in Spark, and Spark experts often aren't well-versed in the fast-changing DL frameworks. Currently, trained DL models are deployed in a fairly ad hoc manner, and each model integration usually requires significant effort.
> To simplify this process, we propose adding an integration layer for each major DL framework that can introspect its saved models, so that these models can be integrated into Spark applications more easily. You can find a detailed proposal here: [https://docs.google.com/document/d/1n7QPHVZfmQknvebZEXxzndHPV2T71aBsDnP4COQa_v0]
> h1. Goals
>  - Simplify the deployment of pre-trained single-node DL models to Spark inference applications.
>  - Follow the pandas_udf API for simple inference use cases.
>  - Follow the Spark ML Pipelines APIs for transfer-learning use cases.
>  - Enable integrations with popular third-party DL frameworks like TensorFlow, PyTorch, and Hugging Face.
>  - Focus on PySpark, since most of the DL frameworks use Python.
>  - Take advantage of built-in Spark features like GPU scheduling and Arrow integration.
>  - Enable inference on both CPU and GPU.
> h1. Non-goals
>  - DL model training.
>  - Inference with distributed models, i.e. "model parallel" inference.
> h1. Target Personas
>  - Data scientists who need to deploy DL models on Spark.
>  - Developers who need to deploy DL models on Spark.
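The goal of following pandas_udf for simple inference can be illustrated with a minimal sketch, assuming only pandas and NumPy. The names `make_predict_fn` and `batch_inference` are hypothetical, and a fixed linear layer stands in for a real saved TensorFlow, PyTorch, or ONNX model; this is not the API proposed in the SPIP, only the general batch-inference pattern it builds on.

```python
from typing import Callable, Iterator

import numpy as np
import pandas as pd


def make_predict_fn() -> Callable[[np.ndarray], np.ndarray]:
    """Hypothetical model loader. In a real job this would load a saved
    TensorFlow, PyTorch, or ONNX model; here a fixed linear layer stands in."""
    weights = np.array([0.5, -1.0, 2.0])  # placeholder "model" parameters

    def predict(features: np.ndarray) -> np.ndarray:
        # Batched matrix-vector product stands in for real DL inference.
        return features @ weights

    return predict


def batch_inference(batches: Iterator[pd.Series]) -> Iterator[pd.Series]:
    """Body shaped like an iterator-of-Series pandas_udf: the model is
    loaded once, then applied to each incoming batch of rows."""
    predict = make_predict_fn()  # expensive load happens once, not per batch
    for batch in batches:
        # Each row holds a fixed-length feature vector; stack into a 2-D array.
        features = np.stack(batch.to_numpy())
        yield pd.Series(predict(features))
```

In PySpark, this function could be registered with `pandas_udf(batch_inference, "double")`; the `Iterator[pd.Series]` type hints select the iterator-of-Series UDF variant (Spark 3.0+), which lets the model be loaded once per task rather than once per Arrow batch.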



--
This message was sent by Atlassian Jira
(v8.20.10#820010)