Posted to dev@beam.apache.org by Kamil Wasilewski <ka...@polidea.com> on 2020/01/14 15:22:14 UTC

[DISCUSS] Integrate Google Cloud AI functionalities

Hi all,

We’d like to implement a set of PTransforms that would allow users to use
some of the Google Cloud AI services in Beam pipelines.

Here's the full list of services and functionalities we’d like to integrate
Beam with:

* Video Intelligence [1]

* Cloud Natural Language [2]

* Cloud AI Platform Prediction [3]

* Data Masking/Tokenization [4]

* Inspecting image data for sensitive information using Cloud Vision [5]

However, we're not sure whether to put those transforms directly into Beam,
because they would require some additional GCP dependencies. One of our
ideas is a separate, optionally installable library that depends on Beam
and lives somewhere in the Beam repository (e.g. in the BEAM_ROOT/extras
directory). Do you think that is a reasonable approach? Or is it totally
fine to put them into the SDKs, just like other IOs?

If you have any other thoughts, do not hesitate to let us know.

Best,

Kamil

[1] https://cloud.google.com/video-intelligence/

[2] https://cloud.google.com/natural-language/

[3] https://cloud.google.com/ml-engine/docs/prediction-overview

[4]
https://cloud.google.com/dataflow/docs/guides/templates/provided-streaming#dlptexttobigquerystreaming

[5] https://cloud.google.com/vision/
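
For illustration, here is a minimal sketch of what one of these transforms
could look like in the Python SDK. It assumes the google-cloud-language
client library is installed; the `AnnotateText` name and its placement are
hypothetical, not an existing Beam API.

```python
import apache_beam as beam


class _AnalyzeEntitiesFn(beam.DoFn):
    """Calls the Cloud Natural Language API once per input string."""

    def setup(self):
        # Create the client once per DoFn instance rather than per element.
        from google.cloud import language_v1
        self._client = language_v1.LanguageServiceClient()

    def process(self, text):
        # The generated client accepts dict-shaped requests; the exact
        # request shape may vary between client library versions.
        yield self._client.analyze_entities(
            document={'content': text, 'type': 'PLAIN_TEXT'})


class AnnotateText(beam.PTransform):
    """Hypothetical transform wrapping Cloud Natural Language."""

    def expand(self, pcoll):
        return pcoll | beam.ParDo(_AnalyzeEntitiesFn())
```

Usage would then be as simple as `lines | AnnotateText()`, with the GCP
client dependency only needed by pipelines that actually use the transform.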

Re: [DISCUSS] Integrate Google Cloud AI functionalities

Posted by Robert Bradshaw <ro...@google.com>.
The current state is that it works, and a large amount of testing is
being added [1], but the public API is still in flux (especially the
Java-as-callee side [2] and the specification of dependencies [3, 4]).
It is being actively worked on, though.

[1] https://github.com/apache/beam/pull/10051
[2] https://lists.apache.org/thread.html/d7a7fac2615ea15dbd9e66b1fb02a95bc125f5a4f8a897acc40fe408%40%3Cdev.beam.apache.org%3E
[3] https://lists.apache.org/thread.html/6fcee7047f53cf1c0636fb65367ef70842016d57effe2e5795c4137d%40%3Cdev.beam.apache.org%3E
[4] https://docs.google.com/document/d/1L7MJcfyy9mg2Ahfw5XPhUeBe-dyvAPMOYOiFA1-kAog/edit?usp=sharing
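
As a rough illustration of the mechanism under discussion, this is what the
caller side looks like from Python today, assuming an expansion service is
already running on localhost:8097; the Java-as-callee direction in [2] goes
through the same expansion-service machinery. The URN and configuration
keys below are made up for this example and do not correspond to any
existing transform.

```python
import apache_beam as beam
from apache_beam.transforms.external import ExternalTransform
from apache_beam.transforms.external import ImplicitSchemaPayloadBuilder

with beam.Pipeline() as p:
    _ = (
        p
        | beam.Create(['gs://my-bucket/video.mp4'])  # hypothetical input
        # Ask the expansion service to expand a transform registered under
        # this (made-up) URN, passing its configuration as a schema payload.
        | ExternalTransform(
            'beam:transforms:hypothetical:video_intelligence:v1',
            ImplicitSchemaPayloadBuilder({'features': 'LABEL_DETECTION'}),
            'localhost:8097'))
```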

On Tue, Jan 21, 2020 at 2:31 AM Ismaël Mejía <ie...@gmail.com> wrote:
>
> Hello, we are in sync; I was just coming back to needing that same functionality. Last time I checked (end of November 2019), there were still many things missing. First, the External transform is not yet correctly exposed to SDK users (see the previous discussion [1] and the Jira ticket BEAM-8546 [2]).
>
> I also hit file staging issues; I am not sure yet whether those were my own mistake or something that needs to be fixed, but I will probably take a look at this soon. Max, Heejong, or anyone else more familiar with cross-language pipelines: is there any info on progress in this area?
>
> [1] https://lists.apache.org/thread.html/28f44041748deff8a587a149b4fcf0a8d13d219b32c5063979072474%40%3Cdev.beam.apache.org%3E
> [2] https://issues.apache.org/jira/browse/BEAM-8546
>
>
> On Tue, Jan 21, 2020 at 10:18 AM Michał Walenia <mi...@polidea.com> wrote:
>>
>> Is using Python from Java via ExternalTransform working and tested?

Re: [DISCUSS] Integrate Google Cloud AI functionalities

Posted by Ismaël Mejía <ie...@gmail.com>.
Hello, we are in sync; I was just coming back to needing that same
functionality. Last time I checked (end of November 2019), there were still
many things missing. First, the External transform is not yet correctly
exposed to SDK users (see the previous discussion [1] and the Jira ticket
BEAM-8546 [2]).

I also hit file staging issues; I am not sure yet whether those were my own
mistake or something that needs to be fixed, but I will probably take a look
at this soon. Max, Heejong, or anyone else more familiar with cross-language
pipelines: is there any info on progress in this area?

[1]
https://lists.apache.org/thread.html/28f44041748deff8a587a149b4fcf0a8d13d219b32c5063979072474%40%3Cdev.beam.apache.org%3E
[2] https://issues.apache.org/jira/browse/BEAM-8546


On Tue, Jan 21, 2020 at 10:18 AM Michał Walenia <mi...@polidea.com>
wrote:

> Is using Python from Java via ExternalTransform working and tested?

Re: [DISCUSS] Integrate Google Cloud AI functionalities

Posted by Michał Walenia <mi...@polidea.com>.
Is using Python from Java via ExternalTransform working and tested?

On Tue, Jan 21, 2020 at 6:50 AM Reza Rokni <re...@google.com> wrote:

> +1 for using cross language transforms.

Re: [DISCUSS] Integrate Google Cloud AI functionalities

Posted by Reza Rokni <re...@google.com>.
+1 for using cross language transforms.


Re: [DISCUSS] Integrate Google Cloud AI functionalities

Posted by Ahmet Altay <al...@google.com>.
On Wed, Jan 15, 2020 at 8:12 AM Kamil Wasilewski <
kamil.wasilewski@polidea.com> wrote:

> Based on your feedback, I think it'd be fine to deal with the problem as
> follows:
> * for Python: put the transforms into `sdks/python/apache_beam/io/gcp/ai`
> * for Java: create a `google-cloud-platform-ai` module in
> `sdks/java/extensions` folder
>
> As for cross language, we expect those transforms to be quite simple, so
> the cost of implementing them twice is not that high.
>

One option would be to implement inference in a library like tfx_bsl [1].
It comes with a generalized Beam transform that can do inference either
from a saved model file or by using a service endpoint. The service
endpoint option is already there and could support the Cloud AI APIs. If we
utilize tfx_bsl, we would leverage the existing TFX integration and avoid
creating a parallel set of transforms. Then, for Java, we could expose the
same interface through a cross-language transform and offer a unified
inference API for both languages.

[1]
https://github.com/tensorflow/tfx-bsl/blob/a9f5b6128309595570cc6212f8076e7a20063ac2/tfx_bsl/beam/run_inference.py#L78
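
A rough sketch of that idea, for reference. The import paths and proto
field names below follow later public releases of tfx_bsl and are
assumptions; they may not match the exact revision linked in [1].

```python
import apache_beam as beam
import tensorflow as tf

from tfx_bsl.public.beam import run_inference
from tfx_bsl.public.proto import model_spec_pb2

# Inference against a local saved model (hypothetical path)...
local_spec = model_spec_pb2.InferenceSpecType(
    saved_model_spec=model_spec_pb2.SavedModelSpec(
        model_path='/tmp/my_model'))

# ...or against a Cloud AI Platform Prediction endpoint, with the same
# transform downstream (hypothetical project/model/version names).
remote_spec = model_spec_pb2.InferenceSpecType(
    ai_platform_prediction_model_spec=(
        model_spec_pb2.AIPlatformPredictionModelSpec(
            project_id='my-project',
            model_name='my-model',
            version_name='v1')))

with beam.Pipeline() as p:
    _ = (
        p
        | beam.Create([tf.train.Example()])  # placeholder examples
        | run_inference.RunInference(remote_spec))
```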




Re: [DISCUSS] Integrate Google Cloud AI functionalities

Posted by Kamil Wasilewski <ka...@polidea.com>.
Based on your feedback, I think it'd be fine to deal with the problem as
follows:
* for Python: put the transforms into `sdks/python/apache_beam/io/gcp/ai`
* for Java: create a `google-cloud-platform-ai` module in the
`sdks/java/extensions` folder

As for cross-language, we expect those transforms to be quite simple, so
the cost of implementing them twice is not that high.

Thanks for your input,
Kamil
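
Whichever directory the Python transforms end up in, the extra client
libraries could stay optional via an extra in sdks/python/setup.py, in the
same spirit as the existing `[gcp]` extra. The extra name and version
bounds below are purely illustrative.

```python
import setuptools

# Hypothetical sketch for sdks/python/setup.py: an optional extra so the AI
# client libraries are only installed on demand, e.g.
#   pip install apache-beam[gcp,ai]
setuptools.setup(
    name='apache-beam',
    version='2.20.0.dev0',  # illustrative
    extras_require={
        'ai': [
            'google-cloud-language>=1.3.0,<2',
            'google-cloud-videointelligence>=1.13.0,<2',
            'google-cloud-vision>=0.38.0,<2',
        ],
    },
)
```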


Re: [DISCUSS] Integrate Google Cloud AI functionalities

Posted by Alex Van Boxel <al...@vanboxel.be>.
If it's in Java, also be careful to align with the current Google Cloud
IOs, in particular their dependencies. The Google IOs do not depend on the
newest client libraries, and that's something we sometimes struggle with
when we depend on our own client libraries. So make sure to align them.

Also note that although gRPC is vendored, the Google IOs still have their
own direct dependency on gRPC, and this is the biggest source of trouble.

 _/
_/ Alex Van Boxel



Re: [DISCUSS] Integrate Google Cloud AI functionalities

Posted by Luke Cwik <lc...@google.com>.
It depends on what language the client libraries are exposed in. For
example, if the client libraries are in Java, sdks/java/extensions makes
sense, while if it's Python, then integrating it within the gcp extension
within sdks/python/apache_beam makes sense.

Adding additional dependencies is OK depending on the licensing, and the
process is slightly different for each language.

For transforms that are complicated, there is a cross-language effort going
on so that one can execute one language's transforms within another
language's pipeline, which may remove the need to write the transforms more
than once.
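
On the Python side, that placement could follow the guarded-import pattern
the existing GCP IOs in apache_beam/io/gcp use, so the SDK still imports
cleanly when the optional client library is absent. The module path below
is hypothetical.

```python
# Hypothetical module, e.g. apache_beam/io/gcp/ai/naturallanguage.py.
# Guard the optional client import like the other GCP IOs do.
try:
    from google.cloud import language_v1
except ImportError:
    language_v1 = None


def _check_client_available():
    # Fail with an actionable message only when the transform is used.
    if language_v1 is None:
        raise ImportError(
            'google-cloud-language is not installed; run '
            '"pip install google-cloud-language" or install a Beam extra '
            'that provides it.')
```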


Re: [DISCUSS] Integrate Google Cloud AI functionalities

Posted by Ismaël Mejía <ie...@gmail.com>.
Nice idea. IO looks like a good place for them, but there is another path
that could fit this case: `sdks/java/extensions`, with some module like
`google-cloud-platform-ai` in that folder, or something like that, no?

In any case, great initiative. +1


