Posted to dev@beam.apache.org by Peter Dannemann <pb...@gmail.com> on 2020/01/05 13:18:19 UTC

Python IO Connector

I’d like to develop the Python SDK’s SQL IO connector. I was thinking it
would be easiest to use sqlalchemy to achieve maximum database engine
support, but I suppose I could also create an ABC for databases that follow
the DB API and create subclasses for each database engine that override a
connect method. What are your thoughts on the best way to do this?
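
For concreteness, a minimal sketch of the ABC idea (every name here is
hypothetical, nothing from an existing Beam API):

    import abc

    class DatabaseSource(abc.ABC):
        """Hypothetical base class for engines that follow DB API 2.0 (PEP 249)."""

        def __init__(self, query, **connection_kwargs):
            self.query = query
            self.connection_kwargs = connection_kwargs

        @abc.abstractmethod
        def connect(self):
            """Return a DB API 2.0 connection object for this engine."""

        def read(self, batch_size=1000):
            """Yield result rows as tuples via the standard cursor interface."""
            connection = self.connect()
            try:
                cursor = connection.cursor()
                cursor.execute(self.query)
                while True:
                    rows = cursor.fetchmany(batch_size)
                    if not rows:
                        break
                    for row in rows:
                        yield row
            finally:
                connection.close()

    class SqliteSource(DatabaseSource):
        def connect(self):
            import sqlite3  # stdlib; other engines need an extra dependency
            return sqlite3.connect(**self.connection_kwargs)

    class PostgresSource(DatabaseSource):
        def connect(self):
            import psycopg2  # assumed installed via an engine-specific extra
            return psycopg2.connect(**self.connection_kwargs)

    # e.g. rows = list(SqliteSource('SELECT 1', database=':memory:').read())

Each engine would only override connect(); everything else rides on the
portable cursor interface.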

Re: Python IO Connector

Posted by Brian Hulette <bh...@google.com>.
Regarding cross-language and Beam rows (and SQL!) - I have a PR up [1] that
adds an example script for using Beam's SqlTransform in Python by
leveraging the portable row coder. Unfortunately, I got stalled figuring
out how to build/stage the Java artifacts for the SQL extensions, so it
hasn't been merged yet.

I think a cross-language JdbcIO would be quite similar, except it's in
core, so there's no issue with additional jars. JdbcIO already has a
ReadRows transform that can produce a PCollection<Row>; we would just need
to add an ExternalTransformBuilder and an ExternalTransformRegistrar
implementation for that transform. PubsubIO [2] has a good example of this.

[1] https://github.com/apache/beam/pull/10055
[2]
https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubIO.java#L720
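
A rough sketch of what the resulting Python-side wrapper could look like
(the URN and config field names are made up for illustration; the real
ones would be defined by the Java registrar):

    import typing

    from apache_beam.transforms.external import (
        ExternalTransform, NamedTupleBasedPayloadBuilder)

    # Hypothetical URN; the Java ExternalTransformRegistrar would define it.
    JDBC_READ_ROWS_URN = 'beam:external:java:jdbc:read_rows:v1'

    class ReadRowsConfig(typing.NamedTuple):
        driver_class_name: str
        jdbc_url: str
        username: str
        password: str
        query: str

    def ReadRowsFromJdbc(config, expansion_service='localhost:8097'):
        """Expands to JdbcIO.ReadRows through a Java expansion service."""
        return ExternalTransform(
            JDBC_READ_ROWS_URN,
            NamedTupleBasedPayloadBuilder(config),
            expansion_service)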

Re: Python IO Connector

Posted by Lucas Magalhães <lu...@paralelocs.com.br>.
Hi Peter.

Why don't you use this external library?
https://pypi.org/project/beam-nuggets/ It already uses SQLAlchemy and is
pretty easy to use.
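
For reference, reading through beam-nuggets looks roughly like this
(connection values are placeholders; see the project's docs for the exact
API):

    import apache_beam as beam
    from beam_nuggets.io import relational_db

    source_config = relational_db.SourceConfiguration(
        drivername='postgresql',  # any SQLAlchemy-supported dialect
        host='localhost',
        port=5432,
        username='user',
        password='password',
        database='mydb',
    )

    with beam.Pipeline() as p:
        records = (
            p
            | 'ReadTable' >> relational_db.ReadFromDB(
                source_config=source_config,
                table_name='months'))  # rows come out as dicts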


-- 
Lucas Magalhães,
CTO

Paralelo CS - Consultoria e Serviços
Tel: +55 (11) 3090-5557
Cel: +55 (11) 99420-4667
lucas.magalhaes@paralelocs.com.br

www.paralelocs.com.br

Re: Python IO Connector

Posted by Luke Cwik <lc...@google.com>.
Eugene, the JdbcIO output should be updated to support Beam's schema
format, which would allow "rows" to cross the language boundaries.

If the connector is easy to write and maintain, then it makes sense to go
native. Maybe the Python version will have an easier time supporting
splitting, and hence could overtake the Java implementation in useful
features.
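
As a sketch of what splitting could look like in pure Python (everything
below is hypothetical: partition a keyed query into ranges, then let the
runner distribute the range reads):

    import apache_beam as beam

    def read_range(bounds, connect_fn, table, key_column):
        """Read one key range over a DB API connection (hypothetical helper).

        connect_fn must be picklable, e.g. a module-level function.
        """
        lo, hi = bounds
        conn = connect_fn()
        try:
            cursor = conn.cursor()
            # Identifiers are assumed trusted; the %s placeholder style
            # depends on the driver's paramstyle.
            cursor.execute(
                'SELECT * FROM {} WHERE {} >= %s AND {} < %s'.format(
                    table, key_column, key_column),
                (lo, hi))
            while True:
                rows = cursor.fetchmany(1000)
                if not rows:
                    break
                for row in rows:
                    yield row
        finally:
            conn.close()

    def ranged_read(p, connect_fn, table, key_column, lo, hi, num_splits):
        step = max(1, (hi - lo) // num_splits)
        bounds = [(s, min(s + step, hi)) for s in range(lo, hi, step)]
        return (
            p
            | beam.Create(bounds)
            | beam.Reshuffle()  # let the runner redistribute the ranges
            | beam.FlatMap(read_range, connect_fn, table, key_column))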

Re: Python IO Connector

Posted by pb...@gmail.com.
Apache Airflow went with the DB API approach as well, and it seems to have worked well for them. We will likely need to add an extras_require entry for each database engine's Python package, though, which adds some complexity, but not a lot.
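
In setup.py terms, something like this (engine and version choices purely
illustrative):

    from setuptools import setup

    setup(
        name='apache-beam',
        # ...
        extras_require={
            'postgres': ['psycopg2-binary>=2.8'],
            'mysql': ['PyMySQL>=0.9'],
            'mssql': ['pyodbc>=4.0'],
        },
    )

    # Users would then opt in per engine, e.g.:
    #   pip install "apache-beam[postgres]"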

Re: Python IO Connector

Posted by Eugene Kirpichov <jk...@google.com>.
Agreed with the above; it seems prudent to develop a pure-Python connector
for something as common as interacting with a database. It's likely easier
to achieve an idiomatic API, familiar to non-Beam Python SQL users, within
pure Python.

Developing a cross-language connector here might be plain impossible,
because rows read from a database are (at least in JDBC) not encodable:
they require a user callback to translate them to an encodable user type,
and the callback can't be in Python, because then you would have to encode
its input before handing it to Python. The same holds for the write
transform.

Not sure about sqlalchemy, though; maybe use the plain DB-API
(https://www.python.org/dev/peps/pep-0249/) instead? The Python API seems
friendlier than JDBC in the sense that it actually returns rows as tuples
of simple data types.
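
A quick self-contained illustration with the stdlib sqlite3 driver (any
PEP 249 driver behaves the same way):

    import sqlite3

    conn = sqlite3.connect(':memory:')
    conn.execute('CREATE TABLE users (id INTEGER, name TEXT)')
    conn.execute("INSERT INTO users VALUES (1, 'ada'), (2, 'grace')")

    cursor = conn.execute('SELECT id, name FROM users ORDER BY id')
    print(cursor.fetchall())  # [(1, 'ada'), (2, 'grace')] - plain tuples
    conn.close()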

Re: Python IO Connector

Posted by Robert Bradshaw <ro...@google.com>.
On Mon, Jan 6, 2020 at 1:39 PM Chamikara Jayalath <ch...@google.com>
wrote:

> Regarding cross-language transforms, we need to add better documentation,
> but for now you'll have to go with existing examples and tests. For example,
>
>
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/external/gcp/pubsub.py
>
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/external/kafka.py
>
> Note that the cross-language transforms feature is currently only
> available for the Flink Runner. Dataflow support is in development.
>

I think it works with all non-Dataflow runners, with the exception of the
Java and Go Direct runners. (It does work with the Python direct runner.)


> I'm fine with developing this natively for Python as well. AFAIK the
> Java JDBC IO connector is not super complicated, and it should be fine to
> make relatively easy-to-maintain, widely usable connectors available in
> multiple SDKs.
>

Yes, a case can certainly be made for having native connectors for
particular common/simple sources. (We certainly don't call out to
cross-language to read text files, for example.)


Re: Python IO Connector

Posted by Chamikara Jayalath <ch...@google.com>.
Regarding cross-language transforms, we need to add better documentation,
but for now you'll have to go with existing examples and tests. For example,

https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/external/gcp/pubsub.py
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/external/kafka.py

Note that the cross-language transforms feature is currently only
available for the Flink Runner. Dataflow support is in development.

I'm fine with developing this natively for Python as well. AFAIK the Java
JDBC IO connector is not super complicated, and it should be fine to make
relatively easy-to-maintain, widely usable connectors available in
multiple SDKs.

Thanks,
Cham


Re: Python IO Connector

Posted by Luke Cwik <lc...@google.com>.
+Chamikara Jayalath <ch...@google.com> +Heejong Lee <he...@google.com>


Re: Python IO Connector

Posted by pb...@gmail.com.
How do I go about doing that? From the docs, it appears cross-language
transforms are currently undocumented.
https://beam.apache.org/roadmap/connectors-multi-sdk/

Re: Python IO Connector

Posted by Luke Cwik <lc...@google.com>.
What about using a cross-language transform between Python and the
existing Java JdbcIO transform?
