Posted to user@beam.apache.org by Luke Cwik <lc...@google.com> on 2020/06/10 15:32:48 UTC

Re: Running apache_beam python sdk without c/c++ libs

Most runners are written in Java, and the others are cloud offerings that
wouldn't work for your use case, which limits you to the direct runner
(not meant for production or high-performance applications). The Beam Python
SDK uses Cython for performance reasons, but I don't believe it strictly
requires it, as many unit tests run both with and without Cython enabled.
Integrations between Beam and third-party libraries may require it, though,
so it likely depends on what you plan to do.
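
One way to check whether a given install is actually using compiled modules
is to look at what file a module resolves to. The snippet below is only a
rough sketch using the standard library: the helper name
is_compiled_extension is made up for illustration, and the module name
apache_beam.coders.coder_impl is an assumption about which part of Beam gets
compiled when Cython is available.

```python
import importlib.machinery
import importlib.util


def is_compiled_extension(module_name):
    """Return True if the module resolves to a compiled extension file,
    False if it resolves to anything else (for example a .py source file),
    and None if the module cannot be located at all."""
    try:
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        # ImportError: a parent package of a dotted name is missing.
        # ValueError: an already-imported module has no usable spec.
        return None
    if spec is None:
        return None
    origin = spec.origin or ""
    # EXTENSION_SUFFIXES lists the platform's suffixes such as ".so" or ".pyd".
    return origin.endswith(tuple(importlib.machinery.EXTENSION_SUFFIXES))


if __name__ == "__main__":
    # Assumption: this module is one that Beam compiles when Cython is present.
    name = "apache_beam.coders.coder_impl"
    print(name, is_compiled_extension(name))
```

If this prints False, the module was found as plain Python source; None means
it was not found at all (for example, Beam is not installed in that
environment).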

On Wed, Jun 10, 2020 at 8:17 AM Noah Goodrich <me...@noahgoodrich.com> wrote:

> I am looking at using the Beam Python SDK in AWS Glue, but Glue doesn't
> support non-pure-Python libraries (anything that is C/C++ based).
>
> Is the Beam Python SDK / runners able to be used without any c/c++ library
> dependencies?
>

Re: Running apache_beam python sdk without c/c++ libs

Posted by Luke Cwik <lc...@google.com>.
I'm not sure. It depends on whether the Spark -> Beam Python integration
will interfere with the magic built into AWS Glue.

On Wed, Jun 10, 2020 at 8:57 AM Noah Goodrich <me...@noahgoodrich.com> wrote:

> I was hoping to use the Spark runner since Glue is just Spark with some
> magic on top. And in our specific use case, we'd be looking at working with
> S3, Kinesis, and MySQL RDS.
>
> Sounds like this is a non-starter?
>
> On Wed, Jun 10, 2020 at 9:33 AM Luke Cwik <lc...@google.com> wrote:
>
>> Most runners are written in Java while others are cloud offerings which
>> wouldn't work for your use case which limits you to use the direct runner
>> (not meant for production/high performance applications). Beam Python SDK
>> uses cython for performance reasons but I don't believe it strictly
>> requires it as many unit tests run with and without cython enabled.
>> Integrations between Beam and third party libraries may require it though
>> so it likely depends on what you plan to do.
>>
>> On Wed, Jun 10, 2020 at 8:17 AM Noah Goodrich <me...@noahgoodrich.com>
>> wrote:
>>
>>> I am looking at using the Beam Python SDK in AWS Glue but it doesn't
>>> support non-native python libraries (anything that is c/c++ based).
>>>
>>> Is the Beam Python SDK / runners able to be used without any c/c++
>>> library dependencies?
>>>
>>

Re: Running apache_beam python sdk without c/c++ libs

Posted by Noah Goodrich <me...@noahgoodrich.com>.
I was hoping to use the Spark runner since Glue is just Spark with some
magic on top. And in our specific use case, we'd be looking at working with
S3, Kinesis, and MySQL RDS.

Sounds like this is a non-starter?

On Wed, Jun 10, 2020 at 9:33 AM Luke Cwik <lc...@google.com> wrote:

> Most runners are written in Java while others are cloud offerings which
> wouldn't work for your use case which limits you to use the direct runner
> (not meant for production/high performance applications). Beam Python SDK
> uses cython for performance reasons but I don't believe it strictly
> requires it as many unit tests run with and without cython enabled.
> Integrations between Beam and third party libraries may require it though
> so it likely depends on what you plan to do.
>
> On Wed, Jun 10, 2020 at 8:17 AM Noah Goodrich <me...@noahgoodrich.com> wrote:
>
>> I am looking at using the Beam Python SDK in AWS Glue but it doesn't
>> support non-native python libraries (anything that is c/c++ based).
>>
>> Is the Beam Python SDK / runners able to be used without any c/c++
>> library dependencies?
>>
>