Posted to user@beam.apache.org by Ron Gonzalez <zl...@yahoo.com> on 2018/01/17 01:50:18 UTC

Some interesting use case

Hi,
  I was wondering if anyone has encountered or used Beam in the following manner:

  1. During machine learning training, use Beam to create the event table. The flow may consist of some joins, aggregations, row-based transformations, etc.
  2. Once the model is created, deploy the model to some scoring service via PMML (or some other scoring service).
  3. Enable the SAME transformations used in #1 on a separate engine, thereby guaranteeing that the data is transformed identically to the engine used in #1.

  I think this is a pretty interesting use case where Beam is used to guarantee portability across engines and deployments (batch to true streaming, not micro-batch). What's not clear to me is how batch joins would translate during one-by-one scoring (probably lookups), or how aggregations would work, given that some kind of history would need to be stored (and how much is kept should be configurable too).

  Thoughts?

Thanks,
Ron
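One way to make the question concrete is a small sketch (illustrative only, not tied to any particular engine; `enrich`, `RollingSum`, and the field names are hypothetical): define each transform once, run it over the whole event table at training time, and reuse the very same function one event at a time at scoring time, with the batch join becoming a keyed lookup and the aggregation backed by a bounded, configurable history.

```python
from collections import defaultdict, deque

# Hypothetical row-level transform, defined once and reused verbatim
# in both the batch (training) and one-by-one (scoring) paths.
def enrich(row, lookup):
    # In batch this would be a join; at scoring time it becomes a keyed lookup.
    return {**row, "country": lookup.get(row["user"], "unknown")}

class RollingSum:
    """Keyed aggregation with a configurable, bounded history window."""
    def __init__(self, history=3):
        self.values = defaultdict(lambda: deque(maxlen=history))

    def add(self, key, value):
        self.values[key].append(value)
        return sum(self.values[key])  # feature value as of this event

lookup = {"u1": "US", "u2": "DE"}  # the "dimension" side of the batch join

# Batch path (training): apply the transforms to the whole event table.
events = [{"user": "u1", "amount": 10}, {"user": "u2", "amount": 5},
          {"user": "u1", "amount": 7}]
agg = RollingSum(history=3)
training_rows = [{**enrich(e, lookup), "sum3": agg.add(e["user"], e["amount"])}
                 for e in events]

# Scoring path (serving): the SAME functions, applied one event at a time.
serving_agg = RollingSum(history=3)
def score_features(event):
    row = enrich(event, lookup)
    row["sum3"] = serving_agg.add(event["user"], event["amount"])
    return row
```

Because both paths share `enrich` and `RollingSum`, the two engines cannot drift apart; the open question from the thread is exactly how much of that state (`history` here) the serving side must keep.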

Re: Some interesting use case

Posted by zlgonzalez <zl...@yahoo.com>.
Thanks Boris. Yeah, we can talk about it at the BoF...
Thanks,
Ron


Sent via the Samsung Galaxy S7 active, an AT&T 4G LTE smartphone
-------- Original message --------
From: Boris Lublinsky <bo...@lightbend.com>
Date: 1/17/18 6:10 AM (GMT-08:00)
To: user@beam.apache.org
Cc: dev@beam.apache.org, Charles Chen <cc...@google.com>
Subject: Re: Some interesting use case
Ron,
If you are talking about the TensorFlow SavedModel format, I personally think that it is overkill for model serving. My preferred option is to use the traditional TF export, which can be optimized for serving. As for processing, I am using the TF Java APIs, which basically amounts to populating the tensor columns.
But if you are really interested, we can talk about it in San Jose or set up a conference call if you want to discuss it sooner.

Boris Lublinsky
FDP Architect
boris.lublinsky@lightbend.com
https://www.lightbend.com/



On Jan 16, 2018, at 10:53 PM, Ron Gonzalez <zl...@yahoo.com> wrote:

Yes, you're right. I believe this is the use case that I'm after. So if I understand correctly, transforms that do aggregations just assume that the batch of data being aggregated is passed as part of a tensor column. Is it possible to hook up a lookup call to another TensorFlow Serving servable for a join in batch mode?
Will a saved model, when loaded into a TensorFlow Serving model server, actually have the metadata definitions when retrieved using the TensorFlow Serving metadata API?
Thanks,
Ron

On Tuesday, January 16, 2018, 6:16:01 PM PST, Charles Chen <cc...@google.com> wrote:

This sounds similar to the use case for tf.Transform, a library that depends on Beam: https://github.com/tensorflow/transform
On Tue, Jan 16, 2018 at 5:51 PM Ron Gonzalez <zl...@yahoo.com> wrote:
Hi,
  I was wondering if anyone has encountered or used Beam in the following manner:

  1. During machine learning training, use Beam to create the event table. The flow may consist of some joins, aggregations, row-based transformations, etc.
  2. Once the model is created, deploy the model to some scoring service via PMML (or some other scoring service).
  3. Enable the SAME transformations used in #1 on a separate engine, thereby guaranteeing that the data is transformed identically to the engine used in #1.

  I think this is a pretty interesting use case where Beam is used to guarantee portability across engines and deployments (batch to true streaming, not micro-batch). What's not clear to me is how batch joins would translate during one-by-one scoring (probably lookups), or how aggregations would work, given that some kind of history would need to be stored (and how much is kept should be configurable too).

  Thoughts?

Thanks,
Ron

Re: Some interesting use case

Posted by Boris Lublinsky <bo...@lightbend.com>.
Ron,
If you are talking about the TensorFlow SavedModel format, I personally think that it is overkill for model serving. My preferred option is to use the traditional TF export, which can be optimized for serving.
As for processing, I am using the TF Java APIs, which basically amounts to populating the tensor columns.
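Boris describes doing this with the TF Java APIs; as a rough stdlib-Python sketch (the schema and function names below are hypothetical), "populating the tensor column" amounts to flattening a named-feature record into the fixed-order numeric vector the model expects:

```python
# Hypothetical feature schema: the order must match the order the model was
# trained with, which is exactly why the training-time schema has to be
# carried over to the serving side.
FEATURE_ORDER = ["age", "balance", "num_orders"]

def to_tensor_row(record, default=0.0):
    # Flatten a named-feature record into the model's input vector,
    # filling any missing feature with a default value.
    return [float(record.get(name, default)) for name in FEATURE_ORDER]

row = to_tensor_row({"age": 42, "balance": 99.5})  # missing feature -> default
```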

But if you are really interested, we can talk about it in San Jose or set up a conference call if you want to discuss it sooner.

Boris Lublinsky
FDP Architect
boris.lublinsky@lightbend.com
https://www.lightbend.com/

> On Jan 16, 2018, at 10:53 PM, Ron Gonzalez <zl...@yahoo.com> wrote:
> 
> Yes, you're right. I believe this is the use case that I'm after. So if I understand correctly, transforms that do aggregations just assume that the batch of data being aggregated is passed as part of a tensor column. Is it possible to hook up a lookup call to another TensorFlow Serving servable for a join in batch mode?
> 
> Will a saved model, when loaded into a TensorFlow Serving model server, actually have the metadata definitions when retrieved using the TensorFlow Serving metadata API?
> 
> Thanks,
> Ron
> 
> On Tuesday, January 16, 2018, 6:16:01 PM PST, Charles Chen <cc...@google.com> wrote:
> 
> 
> This sounds similar to the use case for tf.Transform, a library that depends on Beam: https://github.com/tensorflow/transform
> On Tue, Jan 16, 2018 at 5:51 PM Ron Gonzalez <zlgonzalez@yahoo.com> wrote:
> Hi,
>   I was wondering if anyone has encountered or used Beam in the following manner:
>  
>   1. During machine learning training, use Beam to create the event table. The flow may consist of some joins, aggregations, row-based transformations, etc...
>   2. Once the model is created, deploy the model to some scoring service via PMML (or some other scoring service).
>   3. Enable the SAME transformations used in #1 on a separate engine, thereby guaranteeing that the data is transformed identically to the engine used in #1.
> 
>   I think this is a pretty interesting use case where Beam is used to guarantee portability across engines and deployments (batch to true streaming, not micro-batch). What's not clear to me is how batch joins would translate during one-by-one scoring (probably lookups), or how aggregations would work, given that some kind of history would need to be stored (and how much is kept should be configurable too).
> 
>   Thoughts?
> 
> Thanks,
> Ron


Re: Some interesting use case

Posted by Ron Gonzalez <zl...@yahoo.com>.
Yes, you're right. I believe this is the use case that I'm after. So if I understand correctly, transforms that do aggregations just assume that the batch of data being aggregated is passed as part of a tensor column. Is it possible to hook up a lookup call to another TensorFlow Serving servable for a join in batch mode?
Will a saved model, when loaded into a TensorFlow Serving model server, actually have the metadata definitions when retrieved using the TensorFlow Serving metadata API?
Thanks,
Ron

On Tuesday, January 16, 2018, 6:16:01 PM PST, Charles Chen <cc...@google.com> wrote:

This sounds similar to the use case for tf.Transform, a library that depends on Beam: https://github.com/tensorflow/transform
On Tue, Jan 16, 2018 at 5:51 PM Ron Gonzalez <zl...@yahoo.com> wrote:

Hi,
  I was wondering if anyone has encountered or used Beam in the following manner:

  1. During machine learning training, use Beam to create the event table. The flow may consist of some joins, aggregations, row-based transformations, etc.
  2. Once the model is created, deploy the model to some scoring service via PMML (or some other scoring service).
  3. Enable the SAME transformations used in #1 on a separate engine, thereby guaranteeing that the data is transformed identically to the engine used in #1.

  I think this is a pretty interesting use case where Beam is used to guarantee portability across engines and deployments (batch to true streaming, not micro-batch). What's not clear to me is how batch joins would translate during one-by-one scoring (probably lookups), or how aggregations would work, given that some kind of history would need to be stored (and how much is kept should be configurable too).

  Thoughts?

Thanks,
Ron

Re: Some interesting use case

Posted by Charles Chen <cc...@google.com>.
This sounds similar to the use case for tf.Transform, a library that
depends on Beam: https://github.com/tensorflow/transform
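tf.Transform itself runs its full-pass "analyze" phase as a Beam pipeline and bakes the results into a serving-time TensorFlow graph; this stdlib-only sketch (the function and field names are illustrative, not tf.Transform's API) just shows the analyze/transform split that makes training and serving agree:

```python
import statistics

def analyze(dataset):
    # Full-pass statistics over the training data (the part Beam computes
    # in tf.Transform): done once, then frozen as constants for serving.
    amounts = [row["amount"] for row in dataset]
    return {"mean": statistics.mean(amounts), "stdev": statistics.pstdev(amounts)}

def transform(row, stats):
    # Row-level transform parameterized by the frozen statistics; the same
    # function (with the same constants) runs at training and serving time.
    return {"amount_z": (row["amount"] - stats["mean"]) / stats["stdev"]}

train = [{"amount": 2.0}, {"amount": 4.0}, {"amount": 6.0}]
stats = analyze(train)                              # run once, at training time
train_out = [transform(r, stats) for r in train]    # batch application
serve_out = transform({"amount": 8.0}, stats)       # one-by-one, same constants
```

Freezing the analysis output is what removes the training/serving skew the original post worries about: the serving engine never re-derives the statistics, it only replays the row-level transform.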

On Tue, Jan 16, 2018 at 5:51 PM Ron Gonzalez <zl...@yahoo.com> wrote:

> Hi,
>   I was wondering if anyone has encountered or used Beam in the following
> manner:
>
>   1. During machine learning training, use Beam to create the event table.
> The flow may consist of some joins, aggregations, row-based
> transformations, etc...
>   2. Once the model is created, deploy the model to some scoring service
> via PMML (or some other scoring service).
>   3. Enable the SAME transformations used in #1 on a separate engine,
> thereby guaranteeing that the data is transformed identically to the engine
> used in #1.
>
>   I think this is a pretty interesting use case where Beam is used to
> guarantee portability across engines and deployments (batch to true
> streaming, not micro-batch). What's not clear to me is how batch joins
> would translate during one-by-one scoring (probably lookups), or how
> aggregations would work, given that some kind of history would need to be
> stored (and how much is kept should be configurable too).
>
>   Thoughts?
>
> Thanks,
> Ron
>

Re: Some interesting use case

Posted by Boris Lublinsky <bo...@lightbend.com>.
I do have a Beam-based model serving implementation, which can take PMML or TensorFlow models.
It listens on Kafka for both model and data streams and can serve any number of models.

The model can be produced using any external application that exports a complete model pipeline.

The complete write-up and code are at https://www.lightbend.com/blog/serving-machine-learning-models-free-oreilly-ebook-from-lightbend.
The book describes Flink, Spark, Beam, Kafka Streams and Akka Streams implementations. Dean and I will show an extended Akka and Kafka Streams implementation during the training session.
I have updated it to Beam 2.2, and it has both Java and Scala (based on the Beam Java APIs) versions.
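The "model as data" pattern described above can be sketched in a few lines (illustrative stdlib Python only; the message shapes and names are hypothetical, not the actual Lightbend implementation): one consumer reads two logical Kafka streams, model updates and scoring requests, keeps the latest model per name, and scores each record with whatever model has arrived so far.

```python
models = {}  # model name -> callable scorer, replaced on every model update

def handle(message):
    if message["stream"] == "models":
        # A model update: install (or hot-swap) the named model.
        models[message["name"]] = message["scorer"]
        return None
    # A data record: score with the requested model, if one has arrived yet.
    scorer = models.get(message["model"])
    return scorer(message["record"]) if scorer else None

# Interleaved stream, as it might arrive from the two Kafka topics.
stream = [
    {"stream": "models", "name": "m1", "scorer": lambda r: r["x"] * 2},
    {"stream": "data", "model": "m1", "record": {"x": 3}},
    {"stream": "models", "name": "m1", "scorer": lambda r: r["x"] + 100},  # hot swap
    {"stream": "data", "model": "m1", "record": {"x": 3}},
]
results = [handle(m) for m in stream]  # [None, 6, None, 103]
```

The same record scores differently before and after the swap, which is the point: models are just another stream, so any number of them can be served and updated without redeploying the pipeline.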


Boris Lublinsky
FDP Architect
boris.lublinsky@lightbend.com
https://www.lightbend.com/

> On Jan 16, 2018, at 7:50 PM, Ron Gonzalez <zl...@yahoo.com> wrote:
> 
> Hi,
>   I was wondering if anyone has encountered or used Beam in the following manner:
>  
>   1. During machine learning training, use Beam to create the event table. The flow may consist of some joins, aggregations, row-based transformations, etc...
>   2. Once the model is created, deploy the model to some scoring service via PMML (or some other scoring service).
>   3. Enable the SAME transformations used in #1 on a separate engine, thereby guaranteeing that the data is transformed identically to the engine used in #1.
> 
>   I think this is a pretty interesting use case where Beam is used to guarantee portability across engines and deployments (batch to true streaming, not micro-batch). What's not clear to me is how batch joins would translate during one-by-one scoring (probably lookups), or how aggregations would work, given that some kind of history would need to be stored (and how much is kept should be configurable too).
> 
>   Thoughts?
> 
> Thanks,
> Ron