Posted to dev@spark.apache.org by Matei Zaharia <ma...@gmail.com> on 2018/07/04 01:15:36 UTC

Re: Revisiting Online serving of Spark models?

Just wondering, is there an update on this? I haven’t seen a summary of the offline discussion but maybe I’ve missed it.

Matei 

> On Jun 11, 2018, at 8:51 PM, Holden Karau <ho...@gmail.com> wrote:
> 
> > So I kicked off a thread on user@ to collect people's feedback there, but I'll summarize the offline results later this week too.
> 
> On Tue, Jun 12, 2018, 5:03 AM Liang-Chi Hsieh <vi...@gmail.com> wrote:
> 
> Hi,
> 
> > It'd be great if any notes from the offline discussion could be shared. Thanks!
> 
> 
> 
> Holden Karau wrote
> > We’re by the registration sign and going to start walking over at 4:05.
> > 
> > On Wed, Jun 6, 2018 at 2:43 PM Maximiliano Felice <maximilianofelice@...> wrote:
> > 
> >> Hi!
> >>
> >> Do we meet at the entrance?
> >>
> >> See you
> >>
> >>
> >> On Tue, Jun 5, 2018 at 3:07 PM, Nick Pentreath <nick.pentreath@...> wrote:
> >>
> >>> I will aim to join up at 4pm tomorrow (Wed) too. Look forward to it.
> >>>
> >>> On Sun, 3 Jun 2018 at 00:24 Holden Karau <holden@...> wrote:
> >>>
> >>>> On Sat, Jun 2, 2018 at 8:39 PM, Maximiliano Felice <maximilianofelice@...> wrote:
> >>>>
> >>>>> Hi!
> >>>>>
> >>>>> We're already in San Francisco waiting for the summit. We even think
> >>>>> that we spotted @holdenk this afternoon.
> >>>>>
> >>>> Unless you happened to be walking by my garage, probably not super
> >>>> likely; I spent the day working on scooters/motorcycles (my style is a
> >>>> little less unique in SF :)). Also, if you see me, feel free to say hi,
> >>>> unless I look like I haven't had my first coffee of the day. I love
> >>>> chatting with folks IRL :)
> >>>>
> >>>>>
> >>>>> @chris, we're really interested in the Meetup you're hosting. My team
> >>>>> will probably join it from the beginning if you have room for us, and
> >>>>> I'll join later after discussing the topics on this thread. I'll send
> >>>>> you an email regarding this request.
> >>>>>
> >>>>> Thanks
> >>>>>
> >>>>> On Fri, Jun 1, 2018 at 7:26 AM, Saikat Kanjilal <sxk1969@...> wrote:
> >>>>>
> >>>>>> @Chris This sounds fantastic, please send summary notes for Seattle
> >>>>>> folks
> >>>>>>
> >>>>>> @Felix I work in downtown Seattle and am wondering if we should have a
> >>>>>> tech meetup around model serving in Spark at my work or somewhere else
> >>>>>> close by, thoughts?  I’m actually in the midst of building microservices
> >>>>>> to manage models, and when I say models I mean much more than machine
> >>>>>> learning models (think OR and process models as well).
> >>>>>>
> >>>>>> Regards
> >>>>>>
> >>>>>> Sent from my iPhone
> >>>>>>
> >>>>>> On May 31, 2018, at 10:32 PM, Chris Fregly <chris@...> wrote:
> >>>>>>
> >>>>>> Hey everyone!
> >>>>>>
> >>>>>> @Felix:  thanks for putting this together.  i sent some of you a
> >>>>>> quick
> >>>>>> calendar event - mostly for me, so i don’t forget!  :)
> >>>>>>
> >>>>>> Coincidentally, this is the focus of June 6th's *Advanced Spark and
> >>>>>> TensorFlow Meetup*
> >>>>>> <https://www.meetup.com/Advanced-Spark-and-TensorFlow-Meetup/events/250924195/>
> >>>>>> @5:30pm
> >>>>>> on June 6th (same night) here in SF!
> >>>>>>
> >>>>>> Everybody is welcome to come.  Here’s the link to the meetup that
> >>>>>> includes the signup link:
> >>>>>> <https://www.meetup.com/Advanced-Spark-and-TensorFlow-Meetup/events/250924195/>
> >>>>>>
> >>>>>> We have an awesome lineup of speakers covering a lot of deep, technical
> >>>>>> ground.
> >>>>>>
> >>>>>> For those who can’t attend in person, we’ll be broadcasting live -
> >>>>>> and
> >>>>>> posting the recording afterward.
> >>>>>>
> >>>>>> All details are in the meetup link above…
> >>>>>>
> >>>>>> @holden/felix/nick/joseph/maximiliano/saikat/leif:  you’re more than
> >>>>>> welcome to give a talk. I can move things around to make room.
> >>>>>>
> >>>>>> @joseph:  I’d personally like an update on the direction of the
> >>>>>> Databricks proprietary ML Serving export format which is similar to
> >>>>>> PMML
> >>>>>> but not a standard in any way.
> >>>>>>
> >>>>>> Also, the Databricks ML Serving Runtime is only available to
> >>>>>> Databricks customers.  This seems in conflict with the community
> >>>>>> efforts
> >>>>>> described here.  Can you comment on behalf of Databricks?
> >>>>>>
> >>>>>> Look forward to your response, joseph.
> >>>>>>
> >>>>>> See you all soon!
> >>>>>>
> >>>>>> —
> >>>>>>
> >>>>>>
> >>>>>> *Chris Fregly*
> >>>>>> Founder @ *PipelineAI* <https://pipeline.ai/> (100,000 Users)
> >>>>>> Organizer @ *Advanced Spark and TensorFlow Meetup*
> >>>>>> <https://www.meetup.com/Advanced-Spark-and-TensorFlow-Meetup/> (85,000 Global Members)
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> *San Francisco - Chicago - Austin -
> >>>>>> Washington DC - London - Dusseldorf *
> >>>>>> *Try our PipelineAI Community Edition with GPUs and TPUs!!*
> >>>>>> <http://community.pipeline.ai/>
> >>>>>>
> >>>>>>
> >>>>>> On May 30, 2018, at 9:32 AM, Felix Cheung <felixcheung_m@...> wrote:
> >>>>>>
> >>>>>> Hi!
> >>>>>>
> >>>>>> Thank you! Let’s meet then
> >>>>>>
> >>>>>> June 6 4pm
> >>>>>>
> >>>>>> Moscone West Convention Center
> >>>>>> 800 Howard Street, San Francisco, CA 94103
> >>>>>> <https://maps.google.com/?q=800+Howard+Street,+San+Francisco,+CA+94103&entry=gmail&source=g>
> >>>>>>
> >>>>>> Ground floor (outside of conference area - should be available for
> >>>>>> all) - we will meet and decide where to go
> >>>>>>
> >>>>>> (Would not send invite because that would be too much noise for dev@)
> >>>>>>
> >>>>>> To paraphrase Joseph, we will use this to kick off the discussion and
> >>>>>> post notes after and follow up online. As for Seattle, I would be very
> >>>>>> interested to meet in person later and discuss ;)
> >>>>>>
> >>>>>>
> >>>>>> _____________________________
> >>>>>> From: Saikat Kanjilal <sxk1969@...>
> >>>>>> Sent: Tuesday, May 29, 2018 11:46 AM
> >>>>>> Subject: Re: Revisiting Online serving of Spark models?
> >>>>>> To: Maximiliano Felice <maximilianofelice@...>
> >>>>>> Cc: Felix Cheung <felixcheung_m@...>, Holden Karau <holden@...>,
> >>>>>> Joseph Bradley <joseph@...>, Leif Walsh <leif.walsh@...>, dev <dev@spark.apache.org>
> >>>>>>
> >>>>>>
> >>>>>> Would love to join but am in Seattle, thoughts on how to make this
> >>>>>> work?
> >>>>>>
> >>>>>> Regards
> >>>>>>
> >>>>>> Sent from my iPhone
> >>>>>>
> >>>>>> On May 29, 2018, at 10:35 AM, Maximiliano Felice <maximilianofelice@...> wrote:
> >>>>>>
> >>>>>> Big +1 to a meeting with fresh air.
> >>>>>>
> >>>>>> Could anyone send the invites? I don't really know which place Holden
> >>>>>> is talking about.
> >>>>>>
> >>>>>> 2018-05-29 14:27 GMT-03:00 Felix Cheung <felixcheung_m@...>:
> >>>>>>
> >>>>>>> You had me at blue bottle!
> >>>>>>>
> >>>>>>> _____________________________
> >>>>>>> From: Holden Karau <holden@...>
> >>>>>>> Sent: Tuesday, May 29, 2018 9:47 AM
> >>>>>>> Subject: Re: Revisiting Online serving of Spark models?
> >>>>>>> To: Felix Cheung <felixcheung_m@...>
> >>>>>>> Cc: Saikat Kanjilal <sxk1969@...>, Maximiliano Felice <maximilianofelice@...>,
> >>>>>>> Joseph Bradley <joseph@...>, Leif Walsh <leif.walsh@...>, dev <dev@spark.apache.org>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> I'm down for that. We could all go for a walk, maybe to the Mint Plaza
> >>>>>>> Blue Bottle, and grab coffee (and if the weather holds, have our design
> >>>>>>> meeting outside :p)?
> >>>>>>>
> >>>>>>> On Tue, May 29, 2018 at 9:37 AM, Felix Cheung <felixcheung_m@...> wrote:
> >>>>>>>
> >>>>>>>> Bump.
> >>>>>>>>
> >>>>>>>> ------------------------------
> >>>>>>>> *From:* Felix Cheung <felixcheung_m@...>
> >>>>>>>> *Sent:* Saturday, May 26, 2018 1:05:29 PM
> >>>>>>>> *To:* Saikat Kanjilal; Maximiliano Felice; Joseph Bradley
> >>>>>>>> *Cc:* Leif Walsh; Holden Karau; dev
> >>>>>>>>
> >>>>>>>> *Subject:* Re: Revisiting Online serving of Spark models?
> >>>>>>>>
> >>>>>>>> Hi! How about we meet with the community and discuss on June 6 at 4pm
> >>>>>>>> at (near) the Summit?
> >>>>>>>>
> >>>>>>>> (I propose we meet at the venue entrance so we can accommodate people
> >>>>>>>> who might not be in the conference)
> >>>>>>>>
> >>>>>>>> ------------------------------
> >>>>>>>> *From:* Saikat Kanjilal <sxk1969@...>
> >>>>>>>> *Sent:* Tuesday, May 22, 2018 7:47:07 AM
> >>>>>>>> *To:* Maximiliano Felice
> >>>>>>>> *Cc:* Leif Walsh; Felix Cheung; Holden Karau; Joseph Bradley; dev
> >>>>>>>> *Subject:* Re: Revisiting Online serving of Spark models?
> >>>>>>>>
> >>>>>>>> I’m in the same exact boat as Maximiliano and have use cases as
> >>>>>>>> well
> >>>>>>>> for model serving and would love to join this discussion.
> >>>>>>>>
> >>>>>>>> Sent from my iPhone
> >>>>>>>>
> >>>>>>>> On May 22, 2018, at 6:39 AM, Maximiliano Felice <maximilianofelice@...> wrote:
> >>>>>>>>
> >>>>>>>> Hi!
> >>>>>>>>
> >>>>>>>> I don't usually write a lot on this list, but I keep up to date with
> >>>>>>>> the discussions and I'm a heavy user of Spark. This topic caught my
> >>>>>>>> attention, as we're currently facing this issue at work. I'm attending
> >>>>>>>> the summit and was wondering if it would be possible for me to join
> >>>>>>>> that meeting. I might be able to share some helpful use cases and ideas.
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> Maximiliano Felice
> >>>>>>>>
> >>>>>>>> On Tue, May 22, 2018 at 9:14 AM, Leif Walsh <leif.walsh@...> wrote:
> >>>>>>>>
> >>>>>>>>> I’m with you on JSON being more readable than Parquet, but we’ve had
> >>>>>>>>> success using pyarrow’s parquet reader and have been quite happy with
> >>>>>>>>> it so far. If your target is Python (and probably if not now, then
> >>>>>>>>> soon, R), you should look into it.
> >>>>>>>>>
> >>>>>>>>> On Mon, May 21, 2018 at 16:52 Joseph Bradley <joseph@...> wrote:
> >>>>>>>>>
> >>>>>>>>>> Regarding model reading and writing, I'll give quick thoughts
> >>>>>>>>>> here:
> >>>>>>>>>> * Our approach was to use the same format but write JSON instead
> >>>>>>>>>> of Parquet.  It's easier to parse JSON without Spark, and using
> >>>>>>>>>> the same
> >>>>>>>>>> format simplifies architecture.  Plus, some people want to check
> >>>>>>>>>> files into
> >>>>>>>>>> version control, and JSON is nice for that.
> >>>>>>>>>> * The reader/writer APIs could be extended to take format
> >>>>>>>>>> parameters (just like DataFrame reader/writers) to handle JSON
> >>>>>>>>>> (and maybe,
> >>>>>>>>>> eventually, handle Parquet in the online serving setting).
> >>>>>>>>>>
> >>>>>>>>>> This would be a big project, so proposing a SPIP might be best.
> >>>>>>>>>> If people are around at the Spark Summit, that could be a good
> >>>>>>>>>> time to meet
> >>>>>>>>>> up & then post notes back to the dev list.
> >>>>>>>>>>
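> >>>>>>>>>> Purely for illustration, here is a minimal sketch of the format-parameter
> >>>>>>>>>> idea above. FormatSupport and format(...) are invented names for this
> >>>>>>>>>> sketch; they are not part of the current MLWriter/MLReader API.
> >>>>>>>>>>
> >>>>>>>>>> // Hypothetical sketch of the proposed extension, mirroring the
> >>>>>>>>>> // DataFrameReader/DataFrameWriter pattern; all names are illustrative.
> >>>>>>>>>> trait FormatSupport {
> >>>>>>>>>>   protected var source: String = "parquet"  // today's default on-disk format
> >>>>>>>>>>   def format(fmt: String): this.type = { source = fmt; this }  // e.g. "json"
> >>>>>>>>>> }
> >>>>>>>>>>
> >>>>>>>>>> // Intended usage if ML writers/readers mixed this in (hypothetical):
> >>>>>>>>>> //   model.write.format("json").save(path)
> >>>>>>>>>> //   PipelineModel.read.format("json").load(path)
> >>>>>>>>>>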
> >>>>>>>>>> On Sun, May 20, 2018 at 8:11 PM, Felix Cheung <felixcheung_m@...> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Specifically, I’d like to bring part of the discussion to Model and
> >>>>>>>>>>> PipelineModel, and the various ModelReader and SharedReadWrite
> >>>>>>>>>>> implementations that rely on SparkContext. This is a big blocker on
> >>>>>>>>>>> reusing trained models outside of Spark for online serving.
> >>>>>>>>>>>
> >>>>>>>>>>> What’s the next step? Would folks be interested in getting
> >>>>>>>>>>> together to discuss/get some feedback?
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> _____________________________
> >>>>>>>>>>> From: Felix Cheung <felixcheung_m@...>
> >>>>>>>>>>> Sent: Thursday, May 10, 2018 10:10 AM
> >>>>>>>>>>> Subject: Re: Revisiting Online serving of Spark models?
> >>>>>>>>>>> To: Holden Karau <holden@...>, Joseph Bradley <joseph@...>
> >>>>>>>>>>> Cc: dev <dev@spark.apache.org>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> Huge +1 on this!
> >>>>>>>>>>>
> >>>>>>>>>>> ------------------------------
> >>>>>>>>>>> *From:* holden.karau@... <holden.karau@...> on behalf of Holden Karau <holden@...>
> >>>>>>>>>>> *Sent:* Thursday, May 10, 2018 9:39:26 AM
> >>>>>>>>>>> *To:* Joseph Bradley
> >>>>>>>>>>> *Cc:* dev
> >>>>>>>>>>> *Subject:* Re: Revisiting Online serving of Spark models?
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On Thu, May 10, 2018 at 9:25 AM, Joseph Bradley <joseph@...> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Thanks for bringing this up Holden!  I'm a strong supporter of
> >>>>>>>>>>>> this.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Awesome! I'm glad other folks think something like this belongs
> >>>>>>>>>>> in Spark.
> >>>>>>>>>>>
> >>>>>>>>>>>> This was one of the original goals for mllib-local: to have
> >>>>>>>>>>>> local versions of MLlib models which could be deployed without
> >>>>>>>>>>>> the big
> >>>>>>>>>>>> Spark JARs and without a SparkContext or SparkSession.  There
> >>>>>>>>>>>> are related
> >>>>>>>>>>>> commercial offerings like this : ) but the overhead of
> >>>>>>>>>>>> maintaining those
> >>>>>>>>>>>> offerings is pretty high.  Building good APIs within MLlib to
> >>>>>>>>>>>> avoid copying
> >>>>>>>>>>>> logic across libraries will be well worth it.
> >>>>>>>>>>>>
> >>>>>>>>>>>> We've talked about this need at Databricks and have also been
> >>>>>>>>>>>> syncing with the creators of MLeap.  It'd be great to get this
> >>>>>>>>>>>> functionality into Spark itself.  Some thoughts:
> >>>>>>>>>>>> * It'd be valuable to have this go beyond adding transform()
> >>>>>>>>>>>> methods taking a Row to the current Models.  Instead, it would
> >>>>>>>>>>>> be ideal to
> >>>>>>>>>>>> have local, lightweight versions of models in mllib-local,
> >>>>>>>>>>>> outside of the
> >>>>>>>>>>>> main mllib package (for easier deployment with smaller & fewer
> >>>>>>>>>>>> dependencies).
> >>>>>>>>>>>> * Supporting Pipelines is important.  For this, it would be
> >>>>>>>>>>>> ideal to utilize elements of Spark SQL, particularly Rows and
> >>>>>>>>>>>> Types, which
> >>>>>>>>>>>> could be moved into a local sql package.
> >>>>>>>>>>>> * This architecture may require some awkward APIs currently to
> >>>>>>>>>>>> have model prediction logic in mllib-local, local model classes
> >>>>>>>>>>>> in
> >>>>>>>>>>>> mllib-local, and regular (DataFrame-friendly) model classes in
> >>>>>>>>>>>> mllib.  We
> >>>>>>>>>>>> might find it helpful to break some DeveloperApis in Spark 3.0
> >>>>>>>>>>>> to
> >>>>>>>>>>>> facilitate this architecture while making it feasible for 3rd
> >>>>>>>>>>>> party
> >>>>>>>>>>>> developers to extend MLlib APIs (especially in Java).
> >>>>>>>>>>>>
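> >>>>>>>>>>>> As a purely illustrative sketch of the first point: a local, lightweight
> >>>>>>>>>>>> model could look roughly like the class below. Only the
> >>>>>>>>>>>> org.apache.spark.ml.linalg types are real; LocalLogisticRegressionModel
> >>>>>>>>>>>> and its methods are invented names, not an existing or planned API.
> >>>>>>>>>>>>
> >>>>>>>>>>>> // Hypothetical mllib-local model: scoring needs no SparkContext or
> >>>>>>>>>>>> // SparkSession, only the small mllib-local linear algebra dependency.
> >>>>>>>>>>>> import org.apache.spark.ml.linalg.{Vector, Vectors}
> >>>>>>>>>>>>
> >>>>>>>>>>>> class LocalLogisticRegressionModel(coefficients: Vector, intercept: Double) {
> >>>>>>>>>>>>   def margin(features: Vector): Double = {
> >>>>>>>>>>>>     var m = intercept
> >>>>>>>>>>>>     features.foreachActive { (i, v) => m += coefficients(i) * v }
> >>>>>>>>>>>>     m
> >>>>>>>>>>>>   }
> >>>>>>>>>>>>   def predictProbability(features: Vector): Double =
> >>>>>>>>>>>>     1.0 / (1.0 + math.exp(-margin(features)))
> >>>>>>>>>>>> }
> >>>>>>>>>>>>
> >>>>>>>>>>>> // Example: new LocalLogisticRegressionModel(Vectors.dense(0.5, -0.25), 0.1)
> >>>>>>>>>>>> //   .predictProbability(Vectors.dense(1.0, 2.0))
> >>>>>>>>>>>>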
> >>>>>>>>>>> I agree this could be interesting, and it feeds into the other
> >>>>>>>>>>> discussion around when (or if) we should be considering Spark 3.0.
> >>>>>>>>>>> I _think_ we could probably do it with optional traits people could
> >>>>>>>>>>> mix in to avoid breaking the current APIs, but I could be wrong on
> >>>>>>>>>>> that point.
> >>>>>>>>>>>
> >>>>>>>>>>>> * It could also be worth discussing local DataFrames.  They
> >>>>>>>>>>>> might not be as important as per-Row transformations, but they
> >>>>>>>>>>>> would be
> >>>>>>>>>>>> helpful for batching for higher throughput.
> >>>>>>>>>>>>
> >>>>>>>>>>> That could be interesting as well.
> >>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> I'll be interested to hear others' thoughts too!
> >>>>>>>>>>>>
> >>>>>>>>>>>> Joseph
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Wed, May 9, 2018 at 7:18 AM, Holden Karau <holden@...> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Hi y'all,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> With the renewed interest in ML in Apache Spark, now seems like as
> >>>>>>>>>>>>> good a time as any to revisit the online serving situation in Spark
> >>>>>>>>>>>>> ML. DB & others have done some excellent work moving a lot of the
> >>>>>>>>>>>>> necessary tools into a local linear algebra package that doesn't
> >>>>>>>>>>>>> depend on having a SparkContext.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> There are a few different commercial and non-commercial solutions
> >>>>>>>>>>>>> around this, but currently our individual transform/predict methods
> >>>>>>>>>>>>> are private, so those solutions either need to copy or re-implement
> >>>>>>>>>>>>> them (or put themselves in org.apache.spark) to access them. How
> >>>>>>>>>>>>> would folks feel about adding a new trait for ML pipeline stages
> >>>>>>>>>>>>> that exposes transformation of single-element inputs (or local
> >>>>>>>>>>>>> collections) and that could be optionally implemented by stages
> >>>>>>>>>>>>> which support this? That way we can have less copy-and-paste code
> >>>>>>>>>>>>> that could get out of sync with our model training.
> >>>>>>>>>>>>>
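> >>>>>>>>>>>>> To give the trait idea some shape, here is a minimal sketch.
> >>>>>>>>>>>>> SingleItemTransformer, transformRow and transformLocal are invented
> >>>>>>>>>>>>> names for illustration; this is one possible design, not an existing
> >>>>>>>>>>>>> Spark API.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> // Hypothetical opt-in trait: stages that can serve online mix it in;
> >>>>>>>>>>>>> // other stages are untouched, so existing APIs don't break.
> >>>>>>>>>>>>> import org.apache.spark.ml.Transformer
> >>>>>>>>>>>>> import org.apache.spark.sql.Row
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> trait SingleItemTransformer { self: Transformer =>
> >>>>>>>>>>>>>   // Transform one input row without a SparkSession.
> >>>>>>>>>>>>>   def transformRow(row: Row): Row
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>   // Convenience for small local batches (local collections).
> >>>>>>>>>>>>>   def transformLocal(rows: Seq[Row]): Seq[Row] = rows.map(transformRow)
> >>>>>>>>>>>>> }
> >>>>>>>>>>>>>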
> >>>>>>>>>>>>> I think continuing to have online serving grow in different
> >>>>>>>>>>>>> projects is probably the right path forward (folks have different
> >>>>>>>>>>>>> needs), but I'd love to see us make it simpler for other projects
> >>>>>>>>>>>>> to build reliable serving tools.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I realize this maybe puts some of the folks in an awkward
> >>>>>>>>>>>>> position with their own commercial offerings, but hopefully if
> >>>>>>>>>>>>> we make it
> >>>>>>>>>>>>> easier for everyone the commercial vendors can benefit as
> >>>>>>>>>>>>> well.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Cheers,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Holden :)
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> --
> >>>>>>>>>>>>> Twitter: https://twitter.com/holdenkarau
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> --
> >>>>>>>>>>>> Joseph Bradley
> >>>>>>>>>>>> Software Engineer - Machine Learning
> >>>>>>>>>>>> Databricks, Inc.
> >>>>>>>>>>>> http://databricks.com/
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> --
> >>>>>>>>>>> Twitter: https://twitter.com/holdenkarau
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> --
> >>>>>>>>>> Joseph Bradley
> >>>>>>>>>> Software Engineer - Machine Learning
> >>>>>>>>>> Databricks, Inc.
> >>>>>>>>>> http://databricks.com/
> >>>>>>>>>>
> >>>>>>>>> --
> >>>>>>>>> --
> >>>>>>>>> Cheers,
> >>>>>>>>> Leif
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> --
> >>>>>>> Twitter: https://twitter.com/holdenkarau
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>
> >>>>
> >>>> --
> >>>> Twitter: https://twitter.com/holdenkarau
> >>>>
> >>> --
> > Twitter: https://twitter.com/holdenkarau
> 
> 
> 
> 
> 
> --
> Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/
> 
> ---------------------------------------------------------------------
> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
> 


---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org


Re: Revisiting Online serving of Spark models?

Posted by Maximiliano Felice <ma...@gmail.com>.
Hi!

To keep things in order, I just sent an update on an older email thread
requesting an update on this, titled "Spark model serving".

I propose we follow the discussion there. Or here, but let's not branch.

Bye!


On Tue, Jul 3, 2018 at 10:15 PM, Matei Zaharia (<ma...@gmail.com>) wrote:

> Just wondering, is there an update on this? I haven’t seen a summary of
> the offline discussion but maybe I’ve missed it.
>
> Matei
>